Greasemonkey and microformats (3)

Welcome to the third installment in this series about microformats and Greasemonkey! In the last episode, after failing spectacularly at providing a not too long rant on the subject at hand (and even omitting completely the expected Greasemonkey bit), I boldly introduced a barebones measurement unit microformat proposal. Today, after deftly inserting another split infinitive just for the sake of it, I’ll tweak it a bit and leave it running full speed on a collision course with an unsuspecting Greasemonkey script. Let’s see what happens!

A microformat shouldn’t be a solution desperately in need of a problem (well, I suppose that could be said of any half-successful technology out there); our tiny measurement unit microformat is not. I, for one, coming from a southwestern European corner without any significant metrology history before the 1800s, have a pretty tough time grokking those strange “Imperial” or “customary” units with their glorious history of intellectual conquest. Intellectual it must be indeed, to come up in a snap with how many inches there are to the mile (and just which mile, to be sure?). All that mental exercise must do wonders for the intelligence. But what does technology have in store for me and the other zero-carriers and comma-displacers of the world?

In a more serious mood, it would be quite an advantage to have our browsers display automatic unit conversions. Microformats are the right tool for the job, without doubt. Perhaps my proposal for a measurement unit microformat won’t quite cut it, but as simple implementations go, it’s quite powerful, given a single modification. Let’s add a target unit to the class attribute, and let’s illustrate it with a line taken from the classic hit song Route 66:

[...] More than <abbr class="unit mi km" title="2000">two thousand miles</abbr> all the way [...]

Just how long is that? Well, I for one know that a (statute) mile is somewhere around 1.6 kilometres long. The fine author (not Bobby Troup, but the hypothetical microformat author) has provided an additional unit name in the class attribute of <abbr>:

unit [source_unit] [target_unit]
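Reading that syntax mechanically takes only a few lines. What follows is a minimal sketch, not part of the script further down: parseUnitClass is a hypothetical helper, and it assumes that "unit" comes first in the class list and that the target unit may be left out.

```javascript
// Hypothetical helper: split a class attribute of the form
// "unit [source_unit] [target_unit]" into its parts.
function parseUnitClass(classAttr) {
    // Tolerate leading, trailing and repeated whitespace
    var tokens = (classAttr || '').trim().split(/\s+/);
    // Assumption: the literal marker "unit" must be the first token
    if (tokens[0] !== 'unit' || tokens.length < 2) {
        return null;
    }
    return {
        source: tokens[1],
        // Assumption: the target unit is optional
        target: tokens.length > 2 ? tokens[2] : null
    };
}

var parsed = parseUnitClass('unit mi km');
console.log(parsed.source + ' -> ' + parsed.target);
```

For the Route 66 markup, the helper returns "mi" as the source and "km" as the target; a class attribute that does not start with "unit" yields null.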

Mental calculation ability aside, wouldn’t it be nice to have Firefox render the Route 66 snippet as:

[…] More than two thousand miles* all the way […]
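Behind that asterisk would sit the converted figure: an international mile is defined as exactly 1.609344 km, so 2000 miles come to 3218.688 km. Here is a minimal sketch of that arithmetic, assuming a hypothetical FACTORS table and convert() helper; neither belongs to the script below, which hands the conversion to Google instead.

```javascript
// Hypothetical factor table: how many target units make one source unit.
// 1 international mile is defined as exactly 1.609344 km.
var FACTORS = {
    'mi>km': 1.609344,
    'km>mi': 1 / 1.609344
};

// Hypothetical helper: returns the converted value, or null when the
// source/target pair is not in the table.
function convert(value, source, target) {
    var factor = FACTORS[source + '>' + target];
    if (factor === undefined) {
        return null;
    }
    return value * factor;
}

var km = convert(2000, 'mi', 'km');
console.log(km.toFixed(3) + ' km');
```

A table like this only covers the pairs someone bothered to type in, which is a fair argument for delegating the job to a service that already knows its units.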

Here is a simple Greasemonkey script (please refer to the nice documentation on how to install it) able to do just that: parse a web page for measurement units and provide an alternate display for them, as specified by the page author, in a non-intrusive way.

// ==UserScript==
// @name           UnitFormat
// @namespace      http://brucknerite.net
// @description    Unit microformat processing and conversion.
// ==/UserScript==

// License:
//
// Copyright (c) 2007 Ivan Rivera
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
// 
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
// 
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.

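// Select all <abbr> elements whose class list contains the "unit" token,
// using XPath (a plain contains(@class,"unit") would also match "units")
var allUnits = document.evaluate(
    '//abbr[contains(concat(" ", normalize-space(@class), " "), " unit ")]',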
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
// Send every one of them off to Google for conversion
for (var i = 0; i < allUnits.snapshotLength; i++) {
    queryGoogle(allUnits.snapshotItem(i));
}

/**
 * Queries Google via GM_xmlhttpRequest for unit conversions. Origin and
 * destination units are specified in the class attribute of <abbr> elements
 * with the following syntax:
 *      class="unit [origin] [destination]"
 * where origin and destination are mutually convertible units in a format
 * acceptable to Google Calc (whatever that means). The response is handled by
 * processResponseData(unitElem, html, altUnit). Any kind of error (parsing,
 * connectivity or otherwise) should result in the function returning without
 * side effects.
 *
 * @param unitElem  <abbr> DOM element representing a unit microformat.
 */
function queryGoogle(unitElem) {
    var value = unitElem.getAttribute('title');
    // Split on any run of whitespace; "unit" is assumed to be the first class
    var classes = unitElem.getAttribute('class').trim().split(/\s+/);
    if (classes.length < 3) {
        return;
    }
    var unit = classes[1];
    var alt = classes[2];
    GM_xmlhttpRequest({
        method: 'GET',
        // encodeURIComponent replaces the deprecated escape()
        url: 'http://www.google.com/search?q=' + encodeURIComponent(value) +
            '+' + encodeURIComponent(unit) + '+in+' + encodeURIComponent(alt),
        headers: {
            'User-agent': 'Mozilla/4.0 (compatible) Greasemonkey'
        },
        onload: function(response) {
            processResponseData(unitElem, response.responseText, alt);
        }
    });
}

/**
 * Parses Google Calc's responses and creates an element containing the 
 * conversion in the form of an asterisk adjacent to the original unit
 * microformat. The generated HTML is unit-microformat compliant in itself,
 * but contains no target unit:
 *      <abbr title="[destination_unit_value]" class="unit [destination_unit]">
 *          <a title="[destination_unit_value] [destination_unit]">*</a>
 *      </abbr>
 *
 * @param unitElem  <abbr> DOM element representing a unit microformat.
 * @param html      HTML text with Google Calc's response to process.
 * @param altUnit   Destination unit name.
 */
function processResponseData(unitElem, html, altUnit) {
    var re = new RegExp(' = .+</b></td>', 'g');
    var matches = html.match(re);
    if (matches && matches.length > 0) {
        var value = parseHtmlNumber(matches[0]);
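        // Build the badge with DOM calls so that a stray quote or angle
        // bracket in the class attribute cannot inject markup
        var badge = document.createElement('sup');
        var abbr = document.createElement('abbr');
        abbr.setAttribute('title', value);
        abbr.className = 'unit ' + altUnit;
        var link = document.createElement('a');
        link.setAttribute('title', value + ' ' + altUnit);
        link.appendChild(document.createTextNode('*'));
        abbr.appendChild(link);
        badge.appendChild(abbr);
        unitElem.parentNode.insertBefore(badge, unitElem.nextSibling);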
    }
}

/**
 * Takes a number expressed as HTML text and returns the corresponding floating
 * point number. Parsing assumes a single number, maybe with presentational HTML
 * interspersed, perhaps containing some power of ten. Not expecting more than
 * one separator sign (decimal, maybe "," or ".").
 *
 * @param html  HTML text containing a number.
 * @return      Parsed floating point number.
 */
function parseHtmlNumber(html) {
    var mantissa = 0;
    var exponent = 0;
    // Let's deal with the exponent first
    if (html.indexOf('<sup>') > -1) {
        var supSplit = html.split('<sup>');
        exponent = parseInt(supSplit[1], 10);
        html = supSplit[0];
        if (isNaN(exponent)) {
            exponent = 0;
        }
    }
    // Follow up with the mantissa
    html = html.replace(/<[^<>]+> ?/g, '')
        .replace(/[^0-9,.-]/g, ' ')
        .replace(/,/g, '.');
    mantissa = parseFloat(html);
    if (isNaN(mantissa)) {
        mantissa = 0;
    }
    return mantissa * Math.pow(10, exponent);
}

Here is a link to the UnitFormat script, for your convenience.
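To get a feel for what parseHtmlNumber() makes of its input, here is a standalone sketch that copies the function (with supSplit declared locally) and feeds it a few fragments. The sample strings are my own assumptions about the shape of the markup, made up for illustration; they are not captured Google responses.

```javascript
// Copy of parseHtmlNumber() from the script, runnable outside Greasemonkey
function parseHtmlNumber(html) {
    var mantissa = 0;
    var exponent = 0;
    // Exponent: whatever integer follows the first <sup>
    if (html.indexOf('<sup>') > -1) {
        var supSplit = html.split('<sup>');
        exponent = parseInt(supSplit[1], 10);
        html = supSplit[0];
        if (isNaN(exponent)) {
            exponent = 0;
        }
    }
    // Mantissa: drop tags, blank out everything else, treat "," as "."
    html = html.replace(/<[^<>]+> ?/g, '')
        .replace(/[^0-9,.-]/g, ' ')
        .replace(/,/g, '.');
    mantissa = parseFloat(html);
    if (isNaN(mantissa)) {
        mantissa = 0;
    }
    return mantissa * Math.pow(10, exponent);
}

// Hypothetical inputs, made up for illustration
var samples = [
    ' = 3<font size=-2> </font>218.688 kilometers</b></td>', // tags inside the digits
    ' = 1.5 x 10<sup>3</sup> metres</b></td>',               // power of ten
    ' = 3,5 litres</b></td>',                                // decimal comma
    ' = 1,234.5 metres</b></td>'                             // two separators
];
for (var i = 0; i < samples.length; i++) {
    console.log(parseHtmlNumber(samples[i]));
}
```

The last sample comes back as 1.234: every comma is turned into a dot and parseFloat stops at the second separator, which is exactly the one-separator limit the function's own comment warns about.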

The code is rather rough around the edges, chiefly around the parseHtmlNumber(html) function (regexes, yuck!), but it seems to work for plenty of cases. A particular use for this dawned on me a millisecond after testing the first version of the script. Can you spot it? Until next time!


Comments

4 responses to «Greasemonkey and microformats (3)»

  1. Mortimer

    Hello,

    nice script. I also made a conversion script, but for currencies, that doesn’t really rely on a microformat to catch the currencies.

    I was wondering about doing one for the measurements, but yours looks pretty cool 😉

    I also had problems with the number parsing, like how to know if 10,333 is 10333 or 10.333 as the , has a different meaning in different locales…

    Anyway, I was wondering why the microformat specifies the DESTINATION conversion? What if I am a user and see a liquid measure in cups on a page and want to know the conversion in litres, but the author thinks oz is more appropriate?

  2. Thank you very much. It’s nice to know someone’s listening, and nicer whenever it’s possible to confront different approaches to a quite similar problem.

    As you see, relying on a microformat frees the developer from having (much) trouble with parsing. You don’t have to look for patterns in your content, instead marking up whatever might make sense —from £10 to ten quid.

    As for the target unit: my intention in specifying the microformat was to make it optional. It isn’t really needed if your consumer is smart enough to suggest a unit or let the user choose among several consistent alternatives. However, my own parser is, as you’ve noticed, rather dumb. Letting the author specify target units is just a shortcut around the fact that a generalized unit parser is quite a programming feat; it requires an understanding of SI base and derived units, their relationships to customary and Imperial units, and an algebraic expression parser with factoring and simplification abilities, at least.

    The best implementation I know of such a unit parser is the one on the HP48 calculator series (and more advanced models). Years ago I struggled to extend it (in its native language, a derivative of Forth by the name of System RPL) to process logarithmic units, namely decibels in various flavours. Incidentally, my tiny parser doesn’t support decibel conversions (e.g. from W/m^2 to dB(A)), as the Google Calculator developer team does not include any electronics engineers. Whoops, just kidding: I really don’t know…

    But I digress. I am thinking of a further version of the microformat, supporting an open-ended list of alternative units, to partially cover the use case you noticed. There’s some thinking to do concerning the UI for the results: I like them to be as non-intrusive as possible, and a list of possible conversions might not fit the bill. Food for thought, in any case.

  3. duracell

    Great work guys!

    I recently was playing around with measures while trying to mock up an hRecipe, including a conversion script.

    <abbr class="hmeasure" title="180 grams">180g</abbr>

    is what I used. I see there is duplication, but even if the content of the tag were “large cup”, the data would remain intact as a measure and a unit.

    Example.

    I too see an issue with stating a target measure. Not that it’s a bad idea, but I think the browser should know what culture and unit system it’s currently in, and the browser should provide the cultural context of the end user. It certainly cannot be guessed at from afar.

    I wonder why the browser cannot detect measures for us and offer a conversion without microformats? Perhaps a Greasemonkey script could detect measures and convert them according to the locale of the user. The browser really is the best place for the conversion to occur, for sure.

  4. Thank you! I’m afraid generic measurement unit parsing is too much of a DWIM problem for a browser to handle without the help of microformats: that’s why I believe a unit microformat would be a powerful addition to authoring platforms (as a side effect, my Greasemonkey script to convert measures can be kept maximally simple!).

    However, I am not so sure that cultural context is enough to identify target units in conversions. There are rather clear-cut cases: comparable units like yards and metres may be handled easily just by looking at the browser’s default locale. Other cases might be more difficult: you may specify 5 fl oz. in a recipe, and my locale (es) might be enough to have that automagically converted to litres (about 0.1478). That is not too meaningful a number; perhaps it would be better to return decilitres or centilitres. Do I want your acres in square metres or in hectares (it’s just comma shifting, but…)? If I talk about degrees Celsius, you may want to have them converted into Fahrenheit or Kelvin.

    Finally, there are definitely non-cultural cases: joules may be converted into ergs, calories, watt-hours or even electronvolts! Specifying target units may be a shortcut, but I believe it’s an acceptable compromise.