This is carlo.log, the weblog of Carlo Zottmann — coder, gamer, runner and husband from Munich, Germany. There's an RSS feed, too.
October 18, 2008.
During the last few weeks, I’ve built a couple of pipes I want to share. (“Pipe”, in this context, means an application built in Yahoo! Pipes.)
Here’s one of them.
As I’ve mentioned once or twice in the past, I’ve become a more or less avid runner during the last year. One central piece of my equipment is my Nike+. I don’t run without it. I am a geek, I love numbers. The Nike+ provides me with numbers. It makes running a game.
I sync my iPod with my iTunes after each run, and the Nike site takes the raw data, crunches it, and gives me graphs, more numbers, and (this is the interesting thing) badges for my website.
Now, I don’t really care about those. But when there are badges, there must be an API which provides the raw data to them. Looking behind the scenes, I quickly found it. It’s not password-protected or secured in any way; when you set your nikeplus.nike.com profile to “public”, the API will return some of your data (run overviews, run details etc. — no personal details).
So, knowing the API URL, I’ve built a pipe which will do a few things:
When writing the pipe, I’ve made a few assumptions, namely that…
Since I am lazy, I’ll only use the data for the most recent run, so the pipe’s results will be exactly that — just a single item, your most recent run.
That being said, I found this sufficient. After adding the pipe’s RSS URL as a new “blog”-type service [1] to Friendfeed, FF will effectively trigger the pipe a few times each day, and your latest run will be added to your stream quickly. Next time you sync your iPod, the Nike site will pick up the new data, the API will return the new data to the pipe, and the new run will be added as a new Friendfeed item.
The pipe will post your runs in the following format: “[Nike+ runner name] ran [distance] [km/mi] in [time]”, for example “3R ran 11.3543 km in 1:03′54″”. The message will link to the public page of the run, a page like this. (3R is my Nike+ moniker.)
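To make the headline format concrete, here is a minimal Python sketch of how such a line could be assembled. The function name, its parameters and the idea of passing the duration in seconds are my assumptions for illustration; the pipe itself does this with Yahoo! Pipes modules, and the real API fields are not shown in this post.

```python
def format_run(runner: str, distance: float, unit: str, seconds: int) -> str:
    """Build a '<runner> ran <distance> <unit> in <time>' headline for one run."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    # Mirror the h:mm′ss″ style of the example; showing the hour even when it
    # is zero is an assumption, the post only shows a run longer than an hour.
    duration = f"{hours}:{minutes:02d}\u2032{secs:02d}\u2033"
    return f"{runner} ran {distance} {unit} in {duration}"


# 3834 seconds is 1 hour, 3 minutes and 54 seconds.
print(format_run("3R", 11.3543, "km", 3834))
```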
Now, why would I want to add my runs to Friendfeed? Well, why not? For me, running is a nice part of my life. I’m actually enjoying it, I’m keeping it casual, and I am proud of every damn kilometer mark I pass. Plus, as mentioned, I am a geek, and I like to share what I build. :)
If you have questions or suggestions, speak your mind in the comments.
[1] Friendfeed’s terminology is a bit misleading here… If you want to add an RSS feed, you’ll have to use “Blog” as the new service. Eh.
October 18, 2008.
During the last few weeks, I’ve built a couple of pipes I want to share. (“Pipe”, in this context, means an application built in Yahoo! Pipes.)
Here’s one of them.
A few weeks ago, I first tested and then bought the excellent CrossOver Games. It’s an emulator (basically a highly specialized version of Wine) which allows me to play a slate of Windows games, old and new, under OS X. So, that’s how I’ve spent big chunks of my spare time during the last few weeks: playing through the wonderful Portal and the great Half-Life 2 games. (On a related note, I’ve noticed the World of Goo demo is running flawlessly in CXG. Awesome!)
I’ve got the games via Steam, and was both delighted and highly annoyed to learn that newer Steam games offer achievements. You see, I am a sucker for achievements. I love them, even though they aren’t good for anything. I usually spend too much time trying to get this or that achievement. These meaningless little pixel badges are “awarded” for different things you manage to do in different games. You can get achievements in various games, on various platforms. For example on Xbox Live, or, as mentioned, on Steam.
So, being a male gamer in his mid-30s, I naturally like to use these superfluous thingies to brag about my mediocre gaming skills. Meaning, I want them to show up on my Friendfeed profile.
Thus, I wrote a pipe which grabs the achievements from any (public) Steam ID page (here’s mine), spitting them out in a usable format. In Friendfeed’s case, that’d be RSS. (Pipes also returns the data as JSON if you want, or even as a handy HTML badge you can put on your blog or wherever.)
I’ve then added the RSS URL of the finished pipe as a new service (type: “Blog” [1]) to Friendfeed.
So, that’s all there is to it. Maybe I am the only one caring about this type of thing, maybe not. If you have questions or suggestions, sound off in the comments. :)
[1] Friendfeed’s terminology is a bit misleading here… If you want to add an RSS feed, you’ll have to use “Blog” as the new service. Eh.
October 20, 2007.
So, (almost) everyone is totally crazy for lifestreams these days. In case you managed to get around the whole issue so far: a lifestream is basically a big bucket (i.e. web page) where all the updates and update notifications from your blog, your ADD-induced Twitter posts, your Flickr uploads etc. come together in one concise way so it’s easier for others to ignore them. Also, you only have one URL to hand out to hot women in pubs because the stream inadvertently works as a hub page, too!
Joy.
Anyways, I was talking with Hendrik about the various stream services out there, and I figured it’d be fairly easy to build something server-less using a wee bit of JavaScript and Yahoo! Pipes. So I spent around an hour doing just that.
The basic idea is that you make a pipe containing all your feeds, and then access the output using JavaScript from your page. Add some styling, and voilà.
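The merging step that the pipe takes care of can be sketched in a few lines. This is an illustration in Python rather than the JavaScript the prototype uses, and the item shape (a dict with "title" and "published") is an assumption, not the pipe's actual output format.

```python
from datetime import datetime


def merge_streams(*feeds):
    """Combine several lists of feed items into one newest-first stream."""
    merged = [item for feed in feeds for item in feed]
    return sorted(merged, key=lambda item: item["published"], reverse=True)


# Hypothetical items from two sources.
blog = [{"title": "New post", "published": datetime(2007, 10, 19, 9, 0)}]
twitter = [
    {"title": "Tweet A", "published": datetime(2007, 10, 20, 8, 30)},
    {"title": "Tweet B", "published": datetime(2007, 10, 18, 22, 15)},
]

stream = merge_streams(blog, twitter)
print([item["title"] for item in stream])
```

The page then only has to render one already sorted list, which is what keeps the whole thing server-less.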
It’s a bit rough around the edges, but hey, it’s just a prototype. It’s not life- or game-changing in any way, but I’d like to share anyways.
Update 2007-11-21: It’s mostly broken right now since I am playing around with something. Not to worry.
September 17, 2007.
Over the weekend I finally had some time to continue playing around with Yahoo! Pipes. (Turns out that quitting World of Warcraft makes your days longer. Huh. Who’d have thought…) It really is a neat toy/tool.
Alas, it is not without hitches. If you’ve worked with Pipes before, you know that it has a module named ‘Fetch Data’, but unfortunately this module is only able to deal with XML and JSON data. It will outright reject anything else, and most HTML pages fall into this category.
Here’s my story on how I managed to make Pipes process run-of-the-mill HTML pages. I couldn’t find anything on this topic on the net, so even though I don’t think I am the first guy to tackle this problem, I want to share what I found out.
A little background. I wanted to build a comic feed for Explosm.net’s Cyanide & Happiness, a comic strip for the …wicked. C&H has an official RSS feed, but it only carries links to the new comic pages. Which is fine and dandy for most people, but I am a busy guy, you know, what with my hectic web-3.1 lifestyle and all, and I really can’t afford to even lose 20 seconds by clicking through to anywhere. (Read: I’m really lazy.)
So I tried to make Pipes parse the Explosm.net comic pages and produce a Pipes-powered feed that not only had the link, but also the particular comic image for that day.
I started out with the Fetch Feed module reading the Explosm.net RSS feed. I am usually not interested in anything but the comics, so in a second step, I made use of the Filter module to ditch feed items that don’t contain the word “comic”.
So that left me with a list of feed items, each with a link, a description, a title and some other metadata.
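In plain code, the filter step boils down to something like the sketch below. Checking the title (and ignoring case) is an assumption on my part, as the post does not say which field the Filter module looks at, and the sample items are made up.

```python
def keep_comics(items, keyword="comic"):
    """Keep only the feed items whose title mentions the keyword."""
    return [item for item in items if keyword in item.get("title", "").lower()]


# Hypothetical items, shaped like what a feed reader would hand over.
feed_items = [
    {"title": "New Comic: 09.17.2007", "link": "http://example.com/comics/1"},
    {"title": "News: forum maintenance", "link": "http://example.com/news/2"},
]

comics = keep_comics(feed_items)
print([item["title"] for item in comics])
```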
That’s where the trouble started. Boy, what was I thinking?! The HTML pages are far from perfect, and they sure as hell don’t validate as XML. Then again, that’s alright! The site wasn’t made for scraping or parsing, it was made to display -tasteful- weird comic strips, and that’s enough. I like it for that. It’s just that there is no Pipes module to grab a glob of textual data and work with it. (Yet? Please, someone say “Yet”, employing a knowing undertone.)
After a while of trying to beat any of the available ‘Fetch *’ modules into submission, I was close to giving up. But then I remembered I had come across an HTML Tidy service a few months or years ago. After a few minutes, I had rediscovered it: W3C’s own Tidy service. It’ll digest any HTML page (you just pass the address), clean it up and return a valid XHTML document, which also counts as XML. Huzzah!
Well then, commence plumbing! The third module I used was the wonderful Loop module. It’s special because it will process every item in a given dataset; in this case, it iterates over all remaining feed items. I told the Loop module to use the String Builder module. There I concatenated the Tidy URL, http://cgi.w3.org/cgi-bin/tidy?forceXML=on&docAddr=, with the item.link attribute, and assigned the resulting string to a new attribute, item.link_tidy.
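Here is a rough Python equivalent of that Loop plus String Builder step. The Tidy URL and the link_tidy name come from the pipe; the percent-encoding of the link is my addition (String Builder simply concatenates), and the sample link is hypothetical.

```python
from urllib.parse import quote

TIDY_ENDPOINT = "http://cgi.w3.org/cgi-bin/tidy?forceXML=on&docAddr="


def add_tidy_link(item):
    """Return a copy of the item with a link_tidy field pointing at the Tidy service."""
    tidied = dict(item)
    # Percent-encode the page address so it survives as a query parameter.
    tidied["link_tidy"] = TIDY_ENDPOINT + quote(item["link"], safe="")
    return tidied


item = add_tidy_link({"link": "http://example.com/comics/1"})
print(item["link_tidy"])
```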
Step four consisted of another Loop, in which I would use a Fetch Data module to grab the data from the Tidy’d Explosm.net pages, which had now been transformed into neat and valid XML structures and therefore could be digested by the module. It took me a few minutes to find the exact location of the image element (body.div.table.tr.td.div.0.div.1.table.tr.td.1.div.font.0.content), which I then stored in a new attribute named item.content.
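The dotted element address works like a walk through nested objects and lists, where a number picks the n-th of several sibling elements. A small sketch of that lookup, assuming the parsed page is represented as Python dicts and lists (the tiny page below is made up and much flatter than the real one):

```python
def get_path(data, path):
    """Follow a dotted path such as 'body.div.1.font.0.content' through dicts and lists."""
    current = data
    for step in path.split("."):
        if isinstance(current, list):
            current = current[int(step)]  # numeric steps index into sibling lists
        else:
            current = current[step]  # named steps pick a child element
    return current


# Hypothetical parsed page: two divs, the second one holding the comic image.
page = {
    "body": {
        "div": [
            {"font": {"content": "navigation"}},
            {"font": [{"content": "<img src='comic.png'/>"}]},
        ]
    }
}

print(get_path(page, "body.div.1.font.0.content"))
```

A path like this breaks as soon as the page layout changes, which is why the address is so long and so brittle.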
(While we’re at it, in the future I would very much like to have the option to use XPath addressing in ‘Fetch Data’… or any related module, for that matter. Thanks in advance. :) )
Anyways, the next steps are using yet another Loop to concatenate the image HTML with the original item.description, and then another one to rewrite the original image URLs a bit so the images are served through the free and excellent Coral Content Distribution Network.
(Technically, I might not be stealing from Explosm.net, especially since they allow hotlinking of their comic strips from message boards and pretty much anywhere, but for the sake of this example and story here, and the assumption that a few people might end up cloning the Pipe — yes, I am a dreamer –, I feel a bit better knowing I leave less of a footprint on their image servers.)
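The URL rewrite can be sketched like this. The post does not show the exact rewrite rule, so the ".nyud.net" host suffix is an assumption based on how Coral addresses were commonly formed, and the image URL is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def coralize(url, suffix=".nyud.net"):
    """Rewrite a URL so its host name carries the (assumed) Coral suffix."""
    parts = urlsplit(url)
    # Only the host changes; scheme, path and query stay as they were.
    return urlunsplit(parts._replace(netloc=parts.hostname + suffix))


print(coralize("http://example.com/images/comic.png"))
```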
Then it was applying a Reverse module to flip the order of the items (a cosmetic issue), and that was it.
So, to sum up this long and rather tedious story: you can process HTML pages with Yahoo! Pipes. It’s just not as straightforward as you would expect.
Well, and if you want to take a look at it: here’s the finished Pipe — it might not look like much since the HTML preview doesn’t show the images, but the RSS output is complete.
Tip o’ the hat to Cyanide & Happiness for mindblowing hijinx, the W3C for the free & open Tidy service, and the CDN for being good people.
Now go and use Yahoo! Pipes. It’s one of the cooler products we (as a company) have released in the last few years, and quite frankly, it deserves praise.
So there.
Quick links: Finished Pipe, RSS Output
Update, 2007-09-19: I’ve slightly adjusted the element addressing in the Pipe to waterproof it a bit. The general idea and order of the modules didn’t change, though.
Update, 2007-12-07: My technique has been superseded by the official new Fetch Page module. Enjoy! :)
April 28, 2004.
Okay, so you want to use SpamBayes on your mail server in order to enjoy working with your webmail or IMAP again. Here’s how to do it.
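Install SpamBayes in your $HOME directory (the recipes below expect sb_filter.py and sb_mboxtrain.py in $HOME/bin).
Create the mail folders Spam, TrainingHam, TrainingSpam, Unsure.
Put the following into your .procmailrc: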
# ~/.procmailrc for Dreamhost
# Uses Maildir format mail directory.
## Directory for storing procmail-related files
PMDIR=$HOME/.procmail
## Set to yes when debugging
# VERBOSE=yes
# Logging
# LOGFILE=$PMDIR/log
## Remove # when debugging; set to no if you want minimal logging
# LOGABSTRACT=all
# Message directory (Courier IMAP and mutt)
MAILDIR=$HOME/Maildir
:0 fwr:hamlock
| $HOME/bin/sb_filter.py
# Messages that are so obviously spam that we should not train on them
# are put in the Spam folder.
# (Once the training has proceeded long enough, you might want to change
# the recipient folder to /dev/null in order to delete the spam that
# scored 1.0 right away.)
:0
* ^X-SpamBayes-Classification: spam; 1.00
.Spam/
# Messages scoring 0.95 to 0.99 are treated the same way
:0
* ^X-SpamBayes-Classification: spam; 0.9[5-9]
.Spam/
# Messages that are spam but we might want to train on them are copied
# to the TrainingSpam folder, the original remains in the inbox
:0 c
* ^X-SpamBayes-Classification: spam
.TrainingSpam/
# Unsure messages must be copied to the unsure folder for training
:0 c
* ^X-SpamBayes-Classification: unsure
.Unsure/
# Ham that scores between 0.02 and 0.19 is eligible for training as well
:0 c
* ^X-SpamBayes-Classification: ham; 0.0[2-9]
.TrainingHam/
:0 c
* ^X-SpamBayes-Classification: ham; 0.1[0-9]
.TrainingHam/
# Folder names for use in recipes should be of form '.Foo/' for Maildir format
### INCLUDERC=$PMDIR/spam.rc
# INCLUDERC=$PMDIR/testing.rc
# Anything that hasn't been filtered yet is delivered to your inbox by this
# recipe. The '/' at end of pathname indicates a Maildir format mailbox
:0
$HOME/Maildir/
Write the following line to a file named .cronjob:
21 15 * * * $HOME/bin/sb_mboxtrain.py -d $HOME/.hammiedb -g $HOME/Maildir/.TrainingHam -s $HOME/Maildir/.TrainingSpam >> $HOME/cronjob_sb_mboxtrain.log
Run crontab .cronjob to install the cron job. (Note that this replaces any crontab you already have.)
That’s it: You’re basically done!
What does this setup do?
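Mail that scores between 0.95 and 1.00 as spam is moved to the Spam folder.
All other mail classified as spam stays in your inbox and is copied to the TrainingSpam folder for learning purposes.
Ham that scores between 0.02 and 0.19 stays in your inbox and is copied to the TrainingHam folder for learning purposes.
Mail that SpamBayes can’t classify stays in your inbox and is copied to the Unsure folder.
Once a day (at 15:21), the cron job trains SpamBayes on the contents of the TrainingHam and TrainingSpam folders.
The first couple of days or weeks you might have to do some manual sorting in order to train SpamBayes to tell good from bad: if you have spam in your inbox, move it to the TrainingSpam folder. If your Spam or TrainingSpam folder contains good mail (ham), move it to TrainingHam. If there is mail in the “Unsure” folder, SpamBayes isn’t sure whether it’s spam or ham, so you have to sort those mails into one of the two training folders yourself.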
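If the procmail syntax is hard to read, the routing that the recipes in the .procmailrc perform can be restated as a small Python function. This is only a sketch of the decision logic for illustration: the function name and return shape are mine, procmail is what actually does the work, and the header values are assumed to look like "spam; 0.97", as the recipes suggest.

```python
import re


def route(classification):
    """Return (copy_folders, final_folder) for an X-SpamBayes-Classification value."""
    copies = []
    # Near-certain spam (0.95 to 1.00) goes straight to Spam; nothing else runs.
    if re.match(r"spam; (1\.00|0\.9[5-9])", classification):
        return copies, "Spam"
    # Any other spam is copied for training and still reaches the inbox.
    if classification.startswith("spam"):
        copies.append("TrainingSpam")
    # Unsure mail is copied to Unsure and still reaches the inbox.
    if classification.startswith("unsure"):
        copies.append("Unsure")
    # Ham scoring 0.02 to 0.19 is copied for training as well.
    if re.match(r"ham; 0\.(0[2-9]|1[0-9])", classification):
        copies.append("TrainingHam")
    return copies, "INBOX"


for value in ("spam; 1.00", "spam; 0.93", "unsure; 0.50", "ham; 0.05", "ham; 0.00"):
    print(value, "->", route(value))
```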
The initial training can be a bit tiresome, but after a relatively short period of time you’ll enjoy a spam free inbox again. By now it works so well for me and Dana that we don’t run spam filters in our desktop mail clients anymore. We just don’t need them anymore. :)
A little shortcut for the initial training: connect to your mailbox via IMAP and copy all the old spam mails from your desktop clients to the TrainingSpam folder on the server. I did that with a couple of hundred spam mails, and that sped up my initial training quite remarkably. ;) Also, once your account is trained and you want to repeat your success on another mail account, you can copy the file .hammiedb (which holds all the training data) to the new account once you have set up everything there. :)
Have fun, and good luck.
Questions?