Greasemonkey and microformats (4)

Once more unto the breach, dear friends! In this third fourth (already?) installment of Microformats and Greasemonkey, we’ll wonder in sheer amazement at the unifying power of our two leading subjects when carelessly wielded against mounds of unstructured ignorance. How’s that for a beginning?

The previous post ended abruptly in a cliffhanger-wannabe fashion, suggesting great things to come from the UnitFormat Greasemonkey script. While it won’t regrow your lost hair or help you make fame and fortune, there are pesky measurement unit conversion tasks it can handle for you. Let’s throw an example.

Suppose you are publishing information about your wonderful hybrid car fuel consumption habits. You should know for a fact that the usual way to do that differs fundamentally between the UK (and the USA, for that matter) and the rest of the world. I may be thinking litres per 100 kilometres, while you are used to miles per gallon. Being a semantically conscious blogger, you may annotate your data this way:

My car does usually 
<abbr title="48" class="unit mi/gal l/100km">48 miles per gallon</abbr>

Live, it would look this way: 48 miles per gallon. With UnitFormat installed, you’d see this:

48 miles per gallon*

Note: if you have already installed UnitFormat, you ought to see two asterisks. That’s all right.

This is the generated HTML code. Note that it’s in the form of another measurement unit microformat, so screen readers and other tools may have a share of the benefits: it lacks a target unit, nevertheless.

<abbr title="48" class="unit mi/gal l/100km">48 miles per gallon</abbr><sup><abbr title="4.90030384" class="unit l/100km"><a title="4.90030384 l/100km">*</a></abbr></sup>

This is a tad more difficult than just converting feet to centimetres, but by now I’d bet you’ve already realised something far more interesting. Second example:

I paid <abbr title="23220" class="unit USD EUR">$23,200</abbr> for it.

Guess what? Google Calc handles the conversion all right:

I paid $23,220* for it.

Online currency conversions at your fingertips! But wait, there’s more. Third example:

I'm expecting a lifetime average cost of 
<abbr title="0.45" class="unit USD/mi EUR/km">45 cents per mile</abbr>.

This winds up as:

I’m expecting a lifetime average cost of 45 cents per mile*.

Complex units with variable conversion factors! Google Calc let’s you specify nearly anything, but you should stay with the simplest unit names (following standards and common sense whenever possible). Composite units should not have spaces in their names (but /, * and ^ are explicitly allowed, so units like m/s^2 are possible). Aren’t microformats powerful, and isn’t life great?

Greasemonkey and microformats (3)

Welcome to the third installment in this series about microformats and Greasemonkey! In the last episode, after failing spectacularly at providing a not too long rant on the subject at hand (and even omitting completely the expected Greasemonkey bit), I boldly introduced a barebones measurement unit microformat proposal. Today, after deftly inserting another split infinitive just for the sake of it, I’ll tweak it a bit and leave it running full speed on a collision course with an unsuspecting Greasemonkey script. Let’s see what happens!

A microformat shouldn’t be a solution desperately in need of a problem (well, I suppose that could be said of any half-successful technology out there); our tiny measurement unit microformat is not. I, for a fact, coming from a southwestern-european corner without any significant metrology history before the 1800s, have a pretty tough time grokking those strange “Imperial” or “customary” units with a glorious history of intellectual conquest. Intellectual indeed should be, to come up in a snap with how many inches to the mile there are (and just which mile, to be sure?). That mental exercising must do wonders for intelligence. But what has technology in store for me and other zero-carriers and comma-displacers of the world?

In a more serious mood, it would be quite an advantage to have our browsers displaying automatic conversions for units. Microformats are the right tool for the job, without doubts. Perhaps my proposal for a measurement unit microformat won’t quite cut it, but as simple implementations go, it’s quite powerful —with a single modification. Let’s add a target unit to the class attribute, and let’s illustrate it with a verse taken from the classic hit song Route 66:

[...] More than <abbr class="unit mi km" title="2000">two thousand miles</abbr> all the way [...]

Just how long is that? Well, I for one know that a (statute) mile is somewhat around 1.6 kilometres long. The fine author (not Bobby Troup, the hypothetical microformat author) has provided an additional unit name to the class attribute of <abbr>:

unit [source_unit] [target_unit]

Mental calculation ability aside, wouldn’t be nice to have Firefox render that code snippet as:

[…] More than two thousand miles* all the way […]

Here is a simple Greasemonkey script (please, refer to the nice documentation on how to install it) able to do just that: parse a web page for measurement units and provide an alternate display for them, as specified by the page author, in a non-intrusive way.

// ==UserScript==
// @name           UnitFormat
// @namespace
// @description    Unit microformat processing and conversion.
// ==/UserScript==

// License:
// Copyright (c) 2007 Ivan Rivera
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.

// Select all <abbr> elements with an "unit" class using XPath
var allUnits = document.evaluate(
// Send everyone of them off to Google for conversion
for (var i = 0; i < allUnits.snapshotLength; i++) {

 * Queries Google via xmlHttpRequest for unit conversions. Origin and 
 * destination unit are specified in the class attribute of <abbr> elements 
 * with the following syntax:
 *      class="unit [origin] [destination]"
 * where origin and destination are mutually interchangeable units in a format 
 * acceptable to Google Calc (whichever that means). The response is handled by 
 * processResponseData(unitElem, html, altUnit). Any kind of error (parsing,
 * connectivity or otherwise) should result in the function returning without
 * side effects.
 * @param unitElem  <abbr> DOM element representing a unit microformat.
function queryGoogle(unitElem) {
    var value = unitElem.getAttribute('title');
    var classes = unitElem.getAttribute('class').split(' ');
    if (classes.length < 3) {
    var unit = classes[1];
    var alt = classes[2];
        method: 'GET',
        url: '' + value + '+' + escape(unit) + 
            '+in+' + escape(alt),
        headers: {
            'User-agent': 'Mozilla/4.0 (compatible) Greasemonkey'
        onload: function(response) {
            processResponseData(unitElem, response.responseText, alt);

 * Parses Google Calc's responses and creates an element containing the 
 * conversion in the form of an asterisk adjacent to the original unit
 * microformat. The generated HTML is unit-microformat compliant in itself,
 * but contains no target unit:
 *      <abbr title="[destination_unit_value]" class="unit [destination_unit]">
 *          <a title="[destination_unit_value] [destination_unit]">*</a>
 *      </abbr>
 * @param unitElem  <abbr> DOM element representing a unit microformat.
 * @param html      HTML text with Google Calc's response to process.
 * @param altUnit   Destination unit name.
function processResponseData(unitElem, html, altUnit) {
    var re = new RegExp(' = .+</b></td>', 'g');
    var matches = html.match(re);
    if (matches && matches.length > 0) {
        var value = parseHtmlNumber(matches[0]);
        var resultWithUnit = matches[0].split('</b>')[0];
        var badge = document.createElement('sup');
        badge.innerHTML = '<abbr title="' + value + '" class="unit ' + altUnit +
            '"><a title="' + value + ' ' + altUnit + '">*</a></abbr>';
        unitElem.parentNode.insertBefore(badge, unitElem.nextSibling);

 * Takes a number expressed as HTML text and returns the corresponding floating 
 * point number Parsing assumes a single number, maybe with presentational HTML 
 * interspersed, perhaps containing some power of ten. Not expecting more than 
 * one separator sign (decimal, maybe "," or ".").
 * @param html  HTML text containing a number.
 * @return      Parsed floating point number.
function parseHtmlNumber(html) {
    var mantissa = 0;
    var exponent = 0;
    // Let's deal with the exponent first
    if (html.indexOf('<sup>') > -1) {
        supSplit = html.split('<sup>');
        exponent = parseInt(supSplit[1]);
        html = supSplit[0];
        if (isNaN(exponent)) {
            exponent = 0;
    // Follow up with the mantissa
    html = html.replace(/<[^<>]+> ?/g, '')
        .replace(/[^0-9,.-]/g, ' ')
        .replace(/,/g, '.');
    mantissa = parseFloat(html);
    if (isNaN(mantissa)) {
        mantissa = 0;
    return mantissa * Math.pow(10, exponent);

This link to the UnitFormat script for your convenience.

The code is rather lousy on the edges, chiefly around parseHtmlNumber(html) function (regexes, yuck!), but seems to work for plenty of cases. A particular use for this dawned on me a millisecond after testing the first version of the script. Can you spot it? Until next time!

Greasemonkey and microformats (2)

Welcome back! Today I am going to rant —briefly, I promise— about microformats and try to cobble together a simple one to denote measurement units. So there!

As I said in the previous post, microformats are small pieces of self-significant HTML that may be used to embed automatically parseable information in a web page. They are just POSH, but not in the fake Port Out, Starboard Home way: any given microformat is a subset of HTML, and a significant subset (in a semantic way), at that! Let’s try to illustrate that with the simplest of microformats, rel-tag.

This microformat is so tiny it might be called a nanoformat instead. It just consists on one HTML attribute, the little known rel. Valid for links (<a> and <link> tags), rel describes the relationship of the current document to the anchor specified in the href attribute of the tag. There is a bunch of suggested possible values in Section 6.12 of the HTML 4.01 specification, and all rel-tag does, syntax-wise, is adding a value to that list: tag.

The general idea behind rel-tag is providing robots with an easy way to tag content, therefore adding some much needed common sense to searches. The tag itself can come from a variety of tag spaces, one of the most evident ones being Wikipedia. Here is a tag example (extracted from this very blog) using a self defined namespace:

<a rel="tag" href="">javascript</a>

In the process of incubating a microformat, the first thing to do is to resist any urges at design for design’s sake. These wise words notwithstanding, I’d like to tell you of this little idea of mine: what about a microformat for measurement units? I don’t want to cheat on anybody by asserting it hasn’t been proposed before, because it has. Trouble is, discussion stopped on it without arriving at a significant consensus several months ago (more than eight, less than ten). I believe there is a real world problem to solve, and not much done in the form of in-the-wild implementations. If anything is about to gain any traction, it should be simple. Dead simple. What about this?

<abbr class="unit EUR" title="1320">1&thinsp;320&nbsp;&euro;</abbr>

It’s just a simple application of two recommended design patterns: class and abbr. The former one seems to be pretty well established, however the latter has some controversy behind it. But I wouldn’t mind having a <span> element there, to be honest. An explanation could be of some help at this point.

The <abbr> element allows, by means of its title attribute, to provide a machine parseable value for the eminently presentational string inside. For (non-vision impaired) humans, the string appears as “1 320 €”: the contents of title, being a straight string representation of the value, allows robots to skip the complexities of the human mind (and, perhaps, to read the number aloud correctly).

The class attribute provides discerning parsers with two fundamental pieces of semantic information: it’s an unit, and the unit is a currency, namely euros. As a valid HTML class literal can be nearly anything (spaces and other small quirks excluded), I’d stay with unit names acceptable to one of the more popular unit name parsers out in the Net: Google Calculator. The rationale of this proposal, and the Greasemonkey bit, to be explained in a further article. See you!