Greasemonkey and microformats (2)

Welcome back! Today I am going to rant (briefly, I promise) about microformats and try to cobble together a simple one to denote measurement units. So there!

As I said in the previous post, microformats are small pieces of self-describing HTML that can be used to embed automatically parseable information in a web page. They are just POSH (Plain Old Semantic HTML), though not in the sense of the apocryphal “Port Out, Starboard Home”: any given microformat is a subset of HTML, and a semantically significant subset at that! Let’s try to illustrate that with the simplest of microformats, rel-tag.

This microformat is so tiny it might be called a nanoformat instead. It consists of just one HTML attribute, the little-known rel. Valid for links (<a> and <link> tags), rel describes the relationship of the current document to the anchor specified in the tag’s href attribute. Section 6.12 of the HTML 4.01 specification suggests a bunch of possible values, and all rel-tag does, syntax-wise, is add one more value to that list: tag.

The general idea behind rel-tag is to give authors an easy, robot-readable way to tag content, thereby adding some much-needed common sense to searches. The tag itself can come from a variety of tag spaces, one of the most obvious being Wikipedia. Here is a tag example (taken from this very blog) using a self-defined tag space:

<a rel="tag" href="http://brucknerite.net/search/label/javascript">javascript</a>
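A robot consuming rel-tag only has to look for links whose rel attribute contains the value tag and then read the tag from the href. The rel-tag convention is that the tag is the last path segment of the URL, not the link text, so in the example above the tag is “javascript” because that is how the URL ends. What follows is a minimal Python sketch using only the standard library; the class and function names are hypothetical, and decoding percent-escapes (such as %20) is an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect rel-tag tags from <a> elements (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="nofollow tag"
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "tag" not in rel_values or not href:
            return
        # The tag is the last path segment of the href, not the link text
        last_segment = urlparse(href).path.rsplit("/", 1)[-1]
        # This sketch simply skips hrefs whose path ends in "/"
        if last_segment:
            # Assumption: percent-escapes such as %20 are decoded
            self.tags.append(unquote(last_segment))


def extract_rel_tags(html):
    """Return the list of tags found in an HTML fragment."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Feeding it the link above returns `["javascript"]`; a plain link without rel="tag" contributes nothing.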

In the process of incubating a microformat, the first thing to do is to resist any urge to design for design’s sake. These wise words notwithstanding, I’d like to tell you about a little idea of mine: what about a microformat for measurement units? I won’t mislead anybody by claiming it hasn’t been proposed before, because it has. Trouble is, discussion on it stalled several months ago (more than eight, fewer than ten) without reaching any significant consensus. I believe there is a real-world problem to solve here, and not much has been done in the way of in-the-wild implementations. If anything is going to gain traction, it has to be simple. Dead simple. What about this?

<abbr class="unit EUR" title="1320">1&thinsp;320&nbsp;&euro;</abbr>

It’s just a straightforward application of two recommended design patterns: class and abbr. The former seems to be well established; the latter, however, is somewhat controversial. To be honest, I wouldn’t mind using a <span> element there instead. An explanation may help at this point.

By means of its title attribute, the <abbr> element makes it possible to attach a machine-parseable value to the eminently presentational string inside it. Sighted humans see the string as “1 320 €” (a thin space separates the thousands and a non-breaking space precedes the euro sign), whereas the contents of title, a plain string representation of the value, allow robots to skip the complexities of the human mind (and, perhaps, to read the number aloud correctly).
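To make the split between the machine value (title) and the human-readable content concrete, here is a hypothetical Python helper that emits the proposed markup for an integer amount. The function name, its parameters and the thin-space grouping rule are assumptions of this sketch, chosen to reproduce the example above; decimals are deliberately not handled.

```python
from html import escape


def render_unit(value: int, unit_code: str, symbol: str) -> str:
    """Build the proposed unit markup for an integer (hypothetical helper).

    value: the machine-readable number, written verbatim into title
    unit_code: unit name added as a second class token, e.g. "EUR"
    symbol: ready-made HTML for the visible unit, e.g. "&euro;"
    """
    # Group thousands with thin spaces for human readers: 1320 -> 1&thinsp;320
    display = f"{value:,}".replace(",", "&thinsp;")
    # title keeps the plain value, so parsers never see the grouping separators
    return (
        f'<abbr class="unit {escape(unit_code)}" title="{value}">'
        f"{display}&nbsp;{symbol}</abbr>"
    )
```

Calling `render_unit(1320, "EUR", "&euro;")` yields exactly the snippet proposed above, while the title stays a bare `1320` that needs no locale-aware parsing.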

The class attribute provides discerning parsers with two fundamental pieces of semantic information: this is a unit, and the unit is a currency, namely euros. Since a valid HTML class name can be nearly anything (spaces and a few other small quirks excluded), I’d stick to unit names accepted by one of the more popular unit parsers on the Net: Google Calculator. The rationale behind this proposal, and the Greasemonkey bit, will be explained in a future article. See you!
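On the consuming side, a parser for this proposal never needs to look at the element content: it reads the value from title and the unit from class. Below is a minimal Python sketch of that idea. The names are hypothetical, and two rules are assumptions of the sketch rather than part of the proposal: the unit is taken to be the first class token other than unit, and elements whose title is not a plain number are skipped instead of guessed at.

```python
from decimal import Decimal, InvalidOperation
from html.parser import HTMLParser


class UnitParser(HTMLParser):
    """Extract (value, unit) pairs from the proposed markup (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.units = []

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "unit" not in classes:
            return
        # Assumption: the unit name is the first class token other than "unit"
        names = [c for c in classes if c != "unit"]
        title = attributes.get("title")
        if not names or title is None:
            return
        try:
            # Decimal keeps currency amounts exact
            value = Decimal(title.strip())
        except InvalidOperation:
            return  # title is not a plain number; skip rather than guess
        self.units.append((value, names[0]))


def extract_units(html):
    """Return the list of (Decimal value, unit name) pairs in a fragment."""
    parser = UnitParser()
    parser.feed(html)
    parser.close()
    return parser.units
```

Because only title and class are read, the thin spaces and the euro sign in the visible text never have to be parsed. Checking whether the unit token is a name Google Calculator understands is left to the caller.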

Posted by

Iván Rivera

Another instance of Homo sapiens.