tail -f carlo.log

Sep 25 2007

Open Facebook

(Disclaimer: The following paragraphs might be pointless and you might end up feeling I have once again stated the obvious. So… you’ve been warned.)

TechCrunch reports on Google apparently working with some unnamed industry bigshots on opening up their social network services.

Yesterday a select group of fifteen or so industry luminaries attended a highly confidential meeting at Google’s headquarters in Mountain View to discuss the company’s upcoming plans to address the “Facebook issue.” […] Google’s goal – to fight Facebook by being even more open than the Facebook Platform. If Facebook is 98% open, Google wants to be 100%.

Well, we’ll see how that’ll pan out.

I have been discussing the whole Facebook API thing with friends and co-workers over the past few weeks. You are constantly hearing about how cool it is that “Facebook is open”, that everyone can build FB applications and become wealthy and all that jazz, but most people have already figured something out — it isn’t open. It’s a goddamn closed system.

So let’s assume you want to use their rich API set to build the next killer application and get rich in the process. How’s that going to work? Yes, you can build cool stuff. Definitely. It’s just — how are you going to monetize your work? In the end, in my eyes, there are but two types of applications:

1. Advertisements, i.e. marketing widgets/apps that display links (in different forms, of course), trying to get the user to click through to a non-FB site.
2. For-fun apps like games or graffiti-wall-alikes, which completely run within the boundaries of FB and never leave them.

And that’s the thing. Either you’re building something that is basically an eye-catcher for your already existing site, not unlike a digital carnival barker, hoping to draw attention and visitors from Facebook to your site. Or you build something far more complex, inside Facebook, without a viable way to make money from it. Which might be cool from a hobbyist point of view, but is crap when you build things for a living. What’s built for Facebook stays in Facebook.

Hendrik noted the other day that if there was some sort of “Facebook points” (akin to Xbox Live Arcade points or Linden Dollars), i.e. micropayments, the situation would be far more interesting for professionals. And he’s right. Alas, there’s no such thing. Too bad.

So, to me, it seems that right now everyone developing real applications within Facebook is an unpaid semi-employee of Facebook.com. If you need to make money on the web to put food on your own table, I don’t think it is an option. There is no incentive. So far, Facebook is a purely hobbyist platform. End of story.

I might be completely wrong about this, mind you. But I still don’t understand all the applause and excitement about the prospect of writing Facebook applications. Just saying.

Sep 17 2007

Yahoo! Pipes Tutorial: How To Process HTML Pages

Over the weekend I finally had some time to continue playing around with Yahoo! Pipes. (Turns out that quitting World of Warcraft makes your days longer. Huh. Who’d have thought…) It really is a neat toy/tool.

Alas, it is not without hitches. If you’ve worked with Pipes before, you know that it has a module named ‘Fetch Data’, but unfortunately this module is only able to deal with XML and JSON data. It will outright reject anything else, and most HTML pages fall into that category.

Here’s my story on how I managed to make Pipes process run-of-the-mill HTML pages. I couldn’t find anything on this topic on the net, so even though I don’t think I am the first guy to tackle this problem, I want to share what I found out.

A little background. I wanted to build a comic feed for Explosm.net’s Cyanide & Happiness, a comic strip for the …wicked. C&H has an official RSS feed, but it only carries links to the new comic pages. Which is fine and dandy for most people, but I am a busy guy, you know, what with my hectic web-3.1 lifestyle and all, and I really can’t afford to even lose 20 seconds by clicking through to anywhere. (Read: I’m really lazy.)

So I tried to make Pipes parse the Explosm.net comic pages and produce a Pipes-powered feed that not only had the link, but also the particular comic image for that day.

I started out with the Fetch Feed module reading the Explosm.net RSS feed. I am usually not interested in anything but the comics, so in a second step, I made use of the Filter module to ditch feed items that don’t contain the word “comic”.
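For readers who think better in code, here is a rough Python sketch of what those first two modules do: read an RSS document, then keep only the items that mention “comic”. It is not how Pipes works internally. The sample feed, the example.com links and the function names are made up for illustration, and matching on the title only is an assumption, since the filter rule above doesn’t say which field was checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real feed; its actual contents are not shown here.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>New comic for today</title><link>http://example.com/comics/1/</link></item>
    <item><title>Site news</title><link>http://example.com/news/2/</link></item>
    <item><title>Another Comic</title><link>http://example.com/comics/3/</link></item>
  </channel>
</rss>"""


def fetch_items(rss_text):
    """Rough equivalent of 'Fetch Feed': turn each <item> into a dict."""
    root = ET.fromstring(rss_text)
    return [
        {child.tag: (child.text or "") for child in item}
        for item in root.iter("item")
    ]


def keep_comics(items, word="comic"):
    """Rough equivalent of 'Filter': keep items whose title contains the word.

    Assumption: case-insensitive match on the title field only.
    """
    word = word.lower()
    return [it for it in items if word in it.get("title", "").lower()]


items = fetch_items(SAMPLE_RSS)
comics = keep_comics(items)
for it in comics:
    print(it["title"], it["link"])
```

The list comprehension reads as "permit items that match", which is the same outcome as ditching the ones that don’t.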

So that left me with a list of feed items, each with a link, a description, a title and some other metadata.

That’s where the trouble started. Boy, what was I thinking?! The HTML pages are far from perfect, they sure as hell don’t validate as XML. Then again, that’s alright! The site wasn’t made for scraping or parsing, it was made to display tastefully weird comic strips, and that’s enough. I like it for that. It’s just that there is no Pipes module to grab a glob of textual data and work with it. (Yet? Please, someone say “Yet”, employing a knowing undertone.)

After a while of trying to beat any of the available ‘Fetch *’ modules into submission, I was close to giving up. But then I remembered I had come across an HTML Tidy service a few months or years ago. After a few minutes, I had rediscovered it: W3C’s own Tidy service. It’ll digest any HTML page (you just pass the address), clean it up and return a valid XHTML document, which also counts as XML. Huzzah!

Well then, commence plumbing! The third module I used was the wonderful Loop module. It’s special because it will process every item in a given dataset; in this case it’d iterate over all remaining feed items. I told the Loop module to use the String Builder module, where I concatenated the Tidy URL, http://cgi.w3.org/cgi-bin/tidy?forceXML=on&docAddr=, with the item.link attribute, and assigned the resulting string to a new attribute, item.link_tidy.
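The Loop plus String Builder step boils down to plain string concatenation per item. A minimal Python sketch of that idea follows. The Tidy prefix is the one quoted above; the sample item, the function name and the optional `encode` switch are illustrative assumptions (the step as described simply glues the two strings together, without percent-encoding).

```python
from urllib.parse import quote

# Tidy service prefix as quoted in the post.
TIDY_PREFIX = "http://cgi.w3.org/cgi-bin/tidy?forceXML=on&docAddr="


def add_tidy_link(item, encode=False):
    """Return a copy of the item with a new 'link_tidy' attribute.

    encode=True percent-encodes the page address first. That is an optional
    extra for links containing '&' or '?', not something the post describes.
    """
    link = item["link"]
    if encode:
        link = quote(link, safe="")
    return {**item, "link_tidy": TIDY_PREFIX + link}


# Hypothetical feed item, for illustration only.
items = [{"title": "Comic", "link": "http://example.com/comics/1/"}]

# The "Loop": apply the string builder to every remaining item.
tidied = [add_tidy_link(it) for it in items]
print(tidied[0]["link_tidy"])
```

Returning a copy instead of mutating the item keeps each stage of the pipeline independent, which is roughly how the modules in a Pipe behave from the outside.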

Step four consisted of another Loop, in which I would use a Fetch Data module to grab the data from the Tidy’d Explosm.net pages, which had now been transformed into neat and valid XML structures and therefore could be digested by the module. It took me a few minutes to find the exact location of the image element (body.div.table.tr.td.div.0.div.1.table.tr.td.1.div.font.0.content), which I then stored in a new attribute named item.content.
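That long dotted address is easier to picture with a toy version. The sketch below resolves a dotted path against nested Python data, treating numeric segments as list positions and everything else as keys. How Pipes represents parsed XML internally is an assumption here, and the `page` structure is a much-shortened, made-up stand-in for a real Tidy’d page.

```python
def resolve(data, path):
    """Walk a dotted path such as 'body.div.1.table.tr.td.1.content'.

    Assumption: repeated elements are lists (addressed by number) and
    single elements are dicts (addressed by tag name).
    """
    node = data
    for key in path.split("."):
        if isinstance(node, list):
            node = node[int(key)]  # numeric segment picks the n-th sibling
        else:
            node = node[key]       # named segment picks a child element
    return node


# Hypothetical, heavily simplified page structure.
page = {
    "body": {
        "div": [
            {"p": "header"},
            {"table": {"tr": {"td": [
                {"content": "navigation"},
                {"content": "comic image markup"},
            ]}}},
        ]
    }
}

print(resolve(page, "body.div.1.table.tr.td.1.content"))
```

A path like this breaks as soon as the page layout shifts by a single element, which is why an address that worked one day can come back empty the next.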

(While we’re at it, in the future I would very much like to have the option to use XPath addressing in ‘Fetch Data’… or any related module, for that matter. Thanks in advance. :) )

Anyways, the next steps used yet another Loop to concatenate the image HTML with the original item.description, and then another one to rewrite the original image URLs a bit so the images are served through the free and excellent Coral Content Distribution Network.
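Here is a hedged Python sketch of those two steps. The exact rewrite rule isn’t spelled out above, so the `.nyud.net` hostname suffix is an assumption based on how Coral was commonly used (at times a port such as :8080 was part of it, so check before relying on it). Putting the image in front of the description, the regular expression and the sample item are likewise illustrative guesses.

```python
import re
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # assumption: suffix appended to the original hostname
SRC_RE = re.compile(r'src="([^"]+)"')


def coralize(url):
    """Rewrite a URL so its host gets the Coral suffix appended.

    URLs with an explicit port, no host, or an already rewritten host
    are returned unchanged to keep the sketch simple.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if not host or parts.port is not None or host.endswith(CORAL_SUFFIX):
        return url
    return urlunsplit(
        (parts.scheme, host + CORAL_SUFFIX, parts.path, parts.query, parts.fragment)
    )


def coralize_html(html):
    """Rewrite every double-quoted src="..." attribute in an HTML fragment."""
    return SRC_RE.sub(lambda m: 'src="%s"' % coralize(m.group(1)), html)


def with_image(item):
    """Prepend the (rewritten) image HTML to the original description."""
    return {**item, "description": coralize_html(item["content"]) + item["description"]}


# Hypothetical item, for illustration only.
item = {
    "content": '<img src="http://www.example.com/files/comic.png"/>',
    "description": "Today's comic",
}
print(with_image(item)["description"])
```

Parsing the URL rather than doing a blind string replace means only the hostname changes, and paths that happen to contain the hostname stay untouched.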

(Technically, I might not be stealing from Explosm.net, especially since they allow hotlinking of their comic strips from message boards and pretty much anywhere, but for the sake of this example and story here, and the assumption that a few people might end up cloning the Pipe — yes, I am a dreamer —, I feel a bit better knowing I leave less of a footprint on their image servers.)

Then it was applying a Reverse module to flip the order of the items (a cosmetic issue), and that was it.

So, to sum up this long and rather tedious story: you can process HTML pages with Yahoo! Pipes. It’s just not as straightforward as you would expect.

Well, and if you want to take a look at it: here’s the finished Pipe — it might not look like much since the HTML preview doesn’t show the images, but the RSS output is complete.

Tip o’ the hat to Cyanide & Happiness for mindblowing hijinx, the W3C for the free & open Tidy service, and the CDN for being good people.

Now go and use Yahoo! Pipes. It’s one of the cooler products we (as a company) have released in the last few years, and quite frankly, it deserves praise.

So there.

Quick links: Finished Pipe, RSS Output

Update, 2007-09-19: I’ve slightly adjusted the element addressing in the Pipe to waterproof it a bit. The general idea and order of the modules didn’t change, tho.

Update, 2007-12-07: My technique has been superseded by the official new Fetch Page module. Enjoy! :)

Sep 15 2007

Quick Note Of The Day

I like my life. It’s not perfect, but I don’t care. It’s pretty cool. I could do so much worse, seriously.

w00t!

Sep 14 2007

New Flickr API Output Format: LOL

Looks like there is a new Flickr API output format: LOL. Not my work, no idea who built this, but I really like it.

Here’s an example of returned data (?tags=cats&format=lol):

HAI
IM IN UR BUCKETS MAKING UP FORMATS
GIMME PHOTOS FROM EVERYONE TAGGED CATS WITH GEODATA
I CAN HAS PHOTO IMG_2326
ITZ AT http://www.flickr.com/photos/blush_response/1377293729/
INVISIBLE METADATA
LOL
KTHX.
I CAN HAS PHOTO IMG_2329
ITZ AT http://www.flickr.com/photos/blush_response/1378198718/
INVISIBLE METADATA
LOL
KTHX.
I IS BORED
KTHXBYE.

Grab the data as usual; the format parameter is lol, apparently.
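In case your cat needs a head start, here is a small Python sketch that parses the sample response shown above. There is no spec to go by, so the grammar is inferred from that one example: a photo starts at "I CAN HAS PHOTO", its address follows "ITZ AT", and "KTHX." closes it. The function name and the dict layout are made up for illustration.

```python
SAMPLE = """HAI
IM IN UR BUCKETS MAKING UP FORMATS
GIMME PHOTOS FROM EVERYONE TAGGED CATS WITH GEODATA
I CAN HAS PHOTO IMG_2326
ITZ AT http://www.flickr.com/photos/blush_response/1377293729/
INVISIBLE METADATA
LOL
KTHX.
I CAN HAS PHOTO IMG_2329
ITZ AT http://www.flickr.com/photos/blush_response/1378198718/
INVISIBLE METADATA
LOL
KTHX.
I IS BORED
KTHXBYE."""

PHOTO_PREFIX = "I CAN HAS PHOTO "
URL_PREFIX = "ITZ AT "


def parse_lol(text):
    """Collect {'title', 'url'} dicts from a LOL-formatted response.

    Lines that are not understood (HAI, INVISIBLE METADATA, LOL, ...) are skipped.
    """
    photos = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(PHOTO_PREFIX):
            current = {"title": line[len(PHOTO_PREFIX):]}
        elif line.startswith(URL_PREFIX) and current is not None:
            current["url"] = line[len(URL_PREFIX):]
        elif line == "KTHX." and current is not None:
            photos.append(current)
            current = None
    return photos


photos = parse_lol(SAMPLE)
for photo in photos:
    print(photo["title"], photo["url"])
```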

So… If you always thought about teaching your cat how to parse Flickr data, now might be the right time to jump in. Progress!

Sep 10 2007

Testing Out Xbox Live

Thanks to Mike for lending me his 360 for a few days!

Carlo Zottmann, carlo@zottmann.org
München, Germany