Greasemonkey and microformats (5)

Leaving so soon? I was hoping to interest you on yet another release of my famous series Greasemonkey and Microformats! No, wait! Please, oh please, just a little more…

Well, for the few of you still there, here I am again at it. Today, let’s talk about something I fancy calling microformat lifecycle. It’s easy enough: you and a dozen more geeks gather around a table to despise all committee-designed monstrosities of the world and come up with a shiny new microformat. After 34 meetings. All is well and parsers flourish, but for the newfangled microformat to gain any traction, there must be some content producers slapping it into their, well, whatever is that they might be producing. Or, as Web 2.0 3.0 advocates would rather say, ‘semantically enhancing their creations’.

The key to microformats adoption would be then, to bootstrap an healthy ecosystem of both producers and consumers. Live examples can be found for the best example in established microformats: hCard. A vehicle for contact details, hCard has seen quite an adoption for several reasons:

  • It’s a semantic HTML implementation of an already established standard: RFC 2426.
  • There are readily available consumers for it. The Operator Firefox extension, which makes hCard embedded data available as vCards readable by most contact management applications, among other interesting features.
  • There are several hCard producers, the simplest one being hCard Creator. A wide sampling of the hCard ecosystem can be found in the hCard implementations page.

A main concern of microformat supporters ought to be helping the non-HTML literate masses of the world to add semantic information to their markup. But there’s a catch-22 in there, as ‘non-HTML literate masses’, as per their definition, have at best a hazy understanding of markup, and well intended pages like this hCard authoring guide aren’t to be of much help but for the most dedicated of hobbyists (well, us geeks can do with just a formal hCard description, XMDP-way, can’t we?)

As a matter of example, I’m putting my money where my mouth is and release a Greasemonkey script that adds measurement unit microformat support to Blogger‘s new post (and edit post) pages. It’s creatively called InsertUnit, and after installation (just point your Greasemonkey enabled Firefox browser to the previous link, thank you) will add a button to the Edit Html editor tab just for your measurement unit microformatting pleasure. To illustrate its workflow here are some screenshots. Who doesn’t like screenshots?

  1. First of all, select some text to denote as a measurement unit. You may skip this step and just press the damn button, already.
  2. That’s it: press the uF button. I should have come up with a better glyph, but that one (as ‘microformat’ abbreviation) is way economical, byte-wise (and not an image, if I may add).
  3. Here you can see the microformat description dialog. Any text selected in step 1 will appear into the corresponding text box, saving you some keystrokes and thus helping you avoid those dreaded RSIs.
  4. After filling the rest of the text boxes (you’ll need a scalar value for the measurement, a name for the unit —to Google’s liking— and a name for a suggested target unit —same way) you may click on the Test conversion link to do just that. If an error shows up, adjust your unit names until you get the desired result. There is no documentation on expected unit names, but Google did much work to keep it a matter of common sense.
  5. And here is the result after tapping on the OK button, in all its semantic HTML glory. Nifty, isn’t it?

Before calling it quits for the day, two remarks: you are not forced to specify a target unit name (but you should state at least a value and a unit name; the OK button doesn’t enable itself until after you’ve done so). You might be interested in marking up your units, but not keen on any particular conversion. InsertUnit is a good match with UnitFormat (as described in the third part of this post series), but by no means is a required match. You might come up with a better parser, for instance (hint, hint) one that is able to suggest target units —out of the blue, or reading from some preferences file.

Last one: now you can get this script and its natural fit from userscripts.org, here: UnitFormat (userscripts.org) and here: InsertUnit (userscripts.org).

Greasemonkey and microformats (4)

Once more unto the breach, dear friends! In this third fourth (already?) installment of Microformats and Greasemonkey, we’ll wonder in sheer amazement at the unifying power of our two leading subjects when carelessly wielded against mounds of unstructured ignorance. How’s that for a beginning?

The previous post ended abruptly in a cliffhanger-wannabe fashion, suggesting great things to come from the UnitFormat Greasemonkey script. While it won’t regrow your lost hair or help you make fame and fortune, there are pesky measurement unit conversion tasks it can handle for you. Let’s throw an example.

Suppose you are publishing information about your wonderful hybrid car fuel consumption habits. You should know for a fact that the usual way to do that differs fundamentally between the UK (and the USA, for that matter) and the rest of the world. I may be thinking litres per 100 kilometres, while you are used to miles per gallon. Being a semantically conscious blogger, you may annotate your data this way:

My car does usually 
<abbr title="48" class="unit mi/gal l/100km">48 miles per gallon</abbr>

Live, it would look this way: 48 miles per gallon. With UnitFormat installed, you’d see this:

48 miles per gallon*

Note: if you have already installed UnitFormat, you ought to see two asterisks. That’s all right.

This is the generated HTML code. Note that it’s in the form of another measurement unit microformat, so screen readers and other tools may have a share of the benefits: it lacks a target unit, nevertheless.

<abbr title="48" class="unit mi/gal l/100km">48 miles per gallon</abbr><sup><abbr title="4.90030384" class="unit l/100km"><a title="4.90030384 l/100km">*</a></abbr></sup>

This is a tad more difficult than just converting feet to centimetres, but by now I’d bet you’ve already realised something far more interesting. Second example:

I paid <abbr title="23220" class="unit USD EUR">$23,200</abbr> for it.

Guess what? Google Calc handles the conversion all right:

I paid $23,220* for it.

Online currency conversions at your fingertips! But wait, there’s more. Third example:

I'm expecting a lifetime average cost of 
<abbr title="0.45" class="unit USD/mi EUR/km">45 cents per mile</abbr>.

This winds up as:

I’m expecting a lifetime average cost of 45 cents per mile*.

Complex units with variable conversion factors! Google Calc let’s you specify nearly anything, but you should stay with the simplest unit names (following standards and common sense whenever possible). Composite units should not have spaces in their names (but /, * and ^ are explicitly allowed, so units like m/s^2 are possible). Aren’t microformats powerful, and isn’t life great?

Greasemonkey and microformats (3)

Welcome to the third installment in this series about microformats and Greasemonkey! In the last episode, after failing spectacularly at providing a not too long rant on the subject at hand (and even omitting completely the expected Greasemonkey bit), I boldly introduced a barebones measurement unit microformat proposal. Today, after deftly inserting another split infinitive just for the sake of it, I’ll tweak it a bit and leave it running full speed on a collision course with an unsuspecting Greasemonkey script. Let’s see what happens!

A microformat shouldn’t be a solution desperately in need of a problem (well, I suppose that could be said of any half-successful technology out there); our tiny measurement unit microformat is not. I, for a fact, coming from a southwestern-european corner without any significant metrology history before the 1800s, have a pretty tough time grokking those strange “Imperial” or “customary” units with a glorious history of intellectual conquest. Intellectual indeed should be, to come up in a snap with how many inches to the mile there are (and just which mile, to be sure?). That mental exercising must do wonders for intelligence. But what has technology in store for me and other zero-carriers and comma-displacers of the world?

In a more serious mood, it would be quite an advantage to have our browsers displaying automatic conversions for units. Microformats are the right tool for the job, without doubts. Perhaps my proposal for a measurement unit microformat won’t quite cut it, but as simple implementations go, it’s quite powerful —with a single modification. Let’s add a target unit to the class attribute, and let’s illustrate it with a verse taken from the classic hit song Route 66:

[...] More than <abbr class="unit mi km" title="2000">two thousand miles</abbr> all the way [...]

Just how long is that? Well, I for one know that a (statute) mile is somewhat around 1.6 kilometres long. The fine author (not Bobby Troup, the hypothetical microformat author) has provided an additional unit name to the class attribute of <abbr>:

unit [source_unit] [target_unit]

Mental calculation ability aside, wouldn’t be nice to have Firefox render that code snippet as:

[…] More than two thousand miles* all the way […]

Here is a simple Greasemonkey script (please, refer to the nice documentation on how to install it) able to do just that: parse a web page for measurement units and provide an alternate display for them, as specified by the page author, in a non-intrusive way.

// ==UserScript==
// @name           UnitFormat
// @namespace      http://brucknerite.net
// @description    Unit microformat processing and conversion.
// ==/UserScript==

// License:
//
// Copyright (c) 2007 Ivan Rivera
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
// 
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
// 
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.

// Select all <abbr> elements with an "unit" class using XPath
var allUnits = document.evaluate(
    '//abbr[contains(@class,"unit")]',
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
// Send everyone of them off to Google for conversion
for (var i = 0; i < allUnits.snapshotLength; i++) {
    queryGoogle(allUnits.snapshotItem(i));
}

/**
 * Queries Google via xmlHttpRequest for unit conversions. Origin and 
 * destination unit are specified in the class attribute of <abbr> elements 
 * with the following syntax:
 *      class="unit [origin] [destination]"
 * where origin and destination are mutually interchangeable units in a format 
 * acceptable to Google Calc (whichever that means). The response is handled by 
 * processResponseData(unitElem, html, altUnit). Any kind of error (parsing,
 * connectivity or otherwise) should result in the function returning without
 * side effects.
 *
 * @param unitElem  <abbr> DOM element representing a unit microformat.
 */
function queryGoogle(unitElem) {
    var value = unitElem.getAttribute('title');
    var classes = unitElem.getAttribute('class').split(' ');
    if (classes.length < 3) {
        return;
    }
    var unit = classes[1];
    var alt = classes[2];
    GM_xmlhttpRequest({
        method: 'GET',
        url: 'http://www.google.com/search?q=' + value + '+' + escape(unit) + 
            '+in+' + escape(alt),
        headers: {
            'User-agent': 'Mozilla/4.0 (compatible) Greasemonkey'
        },
        onload: function(response) {
            processResponseData(unitElem, response.responseText, alt);
        }
    });
}

/**
 * Parses Google Calc's responses and creates an element containing the 
 * conversion in the form of an asterisk adjacent to the original unit
 * microformat. The generated HTML is unit-microformat compliant in itself,
 * but contains no target unit:
 *      <abbr title="[destination_unit_value]" class="unit [destination_unit]">
 *          <a title="[destination_unit_value] [destination_unit]">*</a>
 *      </abbr>
 *
 * @param unitElem  <abbr> DOM element representing a unit microformat.
 * @param html      HTML text with Google Calc's response to process.
 * @param altUnit   Destination unit name.
 */
function processResponseData(unitElem, html, altUnit) {
    var re = new RegExp(' = .+</b></td>', 'g');
    var matches = html.match(re);
    if (matches && matches.length > 0) {
        var value = parseHtmlNumber(matches[0]);
        var resultWithUnit = matches[0].split('</b>')[0];
        var badge = document.createElement('sup');
        badge.innerHTML = '<abbr title="' + value + '" class="unit ' + altUnit +
            '"><a title="' + value + ' ' + altUnit + '">*</a></abbr>';
        unitElem.parentNode.insertBefore(badge, unitElem.nextSibling);
    }
}

/**
 * Takes a number expressed as HTML text and returns the corresponding floating 
 * point number Parsing assumes a single number, maybe with presentational HTML 
 * interspersed, perhaps containing some power of ten. Not expecting more than 
 * one separator sign (decimal, maybe "," or ".").
 *
 * @param html  HTML text containing a number.
 * @return      Parsed floating point number.
 */
function parseHtmlNumber(html) {
    var mantissa = 0;
    var exponent = 0;
    // Let's deal with the exponent first
    if (html.indexOf('<sup>') > -1) {
        supSplit = html.split('<sup>');
        exponent = parseInt(supSplit[1]);
        html = supSplit[0];
        if (isNaN(exponent)) {
            exponent = 0;
        }
    }
    // Follow up with the mantissa
    html = html.replace(/<[^<>]+> ?/g, '')
        .replace(/[^0-9,.-]/g, ' ')
        .replace(/,/g, '.');
    mantissa = parseFloat(html);
    if (isNaN(mantissa)) {
        mantissa = 0;
    }
    return mantissa * Math.pow(10, exponent);
}

This link to the UnitFormat script for your convenience.

The code is rather lousy on the edges, chiefly around parseHtmlNumber(html) function (regexes, yuck!), but seems to work for plenty of cases. A particular use for this dawned on me a millisecond after testing the first version of the script. Can you spot it? Until next time!