It’s time to get techie again! As you may have noticed, I’ve recently added a new item to my sidebar content called Puede que esté en… (that’s Spanish for I might be at…, for my few Spanish-impaired readers). The location displayed below, a link to Google Maps, is an approximate rendering of my current whereabouts, and it’s aggregated from three different sources using Yahoo Pipes:
- A fixed location feed coming from a Google Spreadsheet timetable where weekdays are columns and hours are rows. This accounts for my usual daily toil between home and work.
- A feed extracted from location information for events on my public Google calendar.
- And a feed from my Twitter stream that gets location information from twits (nope, no magic involved; if a twit contains some text between asterisks, it is assumed to contain location data, and the tagged text is formatted as a Google Maps query —not true mobile location, but not invasive either, plus it allows me to cheat!)
The three streams are merged and a result is selected according to a priority rating. Current version has a simple fixed implementation: timetable data has the lowest rank, while twits have the highest. It’s a rather complex use for Yahoo Pipes, featuring a master pipe with three subpipes with a pretty high module count. Have I started to reach the limits of the Pipes platform concerning pipe intelligibility?
Musing about combined uses for Pipes and Google Spreadsheets I came across an interesting idea: both platforms, or at least a subset of them, might be isomorphic. Two equivalent languages. Could I implement the same feed mixing goodness in a more compact and expressive way by using a single on-line spreadsheet? Pipes themselves are nothing more than connectors, and they could be replaced advantageously in a spreadsheet context with cell references. These can be absolute or relative, may be constructed on the fly and are amenable to indirection (that’s the point where things get complicated in spreadsheets, and one is usually better off seeking greener pastures in real programming languages). There were limitations by both sides, to be sure. At a glance:
- Nearly every data mingling in Pipes is painful. And with painful I mean extremely painful. Dynamic data type conversion is handled by the platform and regex modules are of great help, but don’t attempt more than a handful substitutions or concatenations in the same pipe or things could go out of hands really soon. And let’s forget about numerical computations. Spreadsheets are a much more natural medium for all that jazz.
- User input means are much more accessible on Pipes than on its hypothetical Google Spreadsheets counterpart. Input on Spreadsheets means resorting to a complex API built upon HTTP and Atom extensions: the simplest data entry requires a GET request to find a base URL, an authentication procedure and a PUT with dynamically generated XML content. You may end up altering the whole of a spreadsheet remotely in that fashion, but it’s simply not a practical goal for a mashup to attempt: I only have to picture in my mind the oodles of Javascript that ought to be written —and debugged— to roll my eyes in dismay.
The Spreadsheets feature that puts it in approximate equal footing with Pipes are its functions for external data: a catalogue of spreadsheet functions that allow to retrieve content from outside the sheet for processing. The equivalent in Pipes are Source Modules: less painful to use, but at times less capable —it’s hard to beat full XPath XML data extraction, but then such a feature is overkill in most cases. On the other hand, good luck trying to synthesize feed items with Google Spreadsheets. It just doesn’t work that way. But putting differences aside, Spreadsheets should let me aggregate feeds and process them all the better. Publishing options allow me to emit a feed as a result; thus the full Pipes workflow would be realised, in a more compact way. True?
False. There is a big gotcha lurking beneath Google Spreadsheets surface. Let me demonstrate it:
- Create an empty sheet.
- In cell A1, add the function
=NOW()
. That should nicely display the current date. - Now, publish your sheet and look for the More publishing options link, way down on the dialog. Click on it, and generate an URL for a cell-based feed for cell A1.
- Open that link in any serious browser (one that supports RSS, I mean). What do you see?
The exact same date that appeared on the spreadsheet when you added the current date function. Now, wait an indeterminate amount of time, and retry. What? That’s not current any more! Plus, there is no caching involved. NOW()
is not NOW()
(I wonder if it should be called THEN()
). Trouble is, spreadsheets are completely unable to update themselves unless an actual edit takes place. And this limitation affects external data functions too. The verdict: Google Spreadsheets are totally unusable for the purpose of aggregating, processing and emitting content. Humble Pipes win big this time with its, well, pipes paradigm.
However, Pipes and Spreadsheets can work together provided any dynamic component of the intended result is implemented on the Pipes side. Spreadsheets can just take on a supporting role, a kind of two-dimensional database. Which is a shame, if you ask me. Just think of the possibilities!
Comentarios
Comenta en el blog con tu perfil en el Fediverso, simplemente contestando al post correspondiente del perfil @blog@brucknerite.net.
8 respuestas a «Pipes vs Spreadsheets: a showdown»
Wow, you nerd!
I had heard before about Yahoo Pipes, but never dared to have a look at the interface or a piece of code. Now I’m shocked! I should have known that not all web-2.0-thingies are for dummies.
[nelson-voice]Ha, ha![/nelson-voice] Pipes is a fun toy to play with… And I’m always willing to try out ideas. Sometimes I even have some remotely useful ones.
Nice job!
Maybe you have one problem: I think the sort on Yahoo Pipes should be first by date and later by priority?
Sharp eye! Those twisted little pipes get curled into spaghetti pretty fast… Thanks to your suggestion, I have amended calendar_location to return a single element (the same as twitter_location and fixed_location pipes). That way, date sorting after merging is not needed, as there will be at most one element for each priority value in the result set. Date sorting is implemented for each subpipe, before truncation (where needed) is performed.
Nada que ver, pero mira qué alegría 🙂
¡La hostia, que offtopic más cojonudo! Gracias, gracias, gracias, gracias.
Hola:
navegando por la red he visto tu blog, me he parado para descansar y lo he explorado, me gusta mucho. Ahora continuo mi viaje. Cuando quieras ven a ver mi blog.
Ciao.
Un placer tenerte por aquí. ¿Me permitirás usar tus lecciones de español al revés? Me sorprende que haya tan pocos españoles que hablen italiano, con lo rentable que resulta: curva de aprendizaje razonablemente plana y grandes posibilidades de comunicación a un salto de distancia.