
On choices when adding metadata to Web resources

April 2, 2009

We’ve been hearing a lot about metadata on the Web these days. Adding metadata means describing the content of resources on the Web, as opposed to just describing the way they should look. There are a few ways to describe (or annotate) content, such as microformats and RDFa, and some confusion seems to be popping up. I thought I would give my take on it.

Web languages as contracts

I see the world of publishing and consuming Web resources (e.g. pages) as a collection of contracts between you (the publisher) and an agent (the consumer, e.g. a Web browser). There are a few contract types:

  • Structure contracts: define the valid elements and the rules for structuring your Web resource with them (e.g. an HTML page can contain a title, body, divs, tables).
  • Style contracts: tell agents (e.g. Web browsers) how to visually present your resource (e.g. CSS defines terms that describe the font color, the positioning of an element on the screen, etc.).
  • Behavior contracts: define a language with events and actions for you to make your content dynamic (e.g. JavaScript allows you to react to a mouseover and change elements in the document).
  • Annotation Syntax Contracts: define a set of rules that govern how to attach your content descriptions to Web resources and elements (e.g. microformats suggest you add your content description to @class attributes of HTML elements).
  • Content Vocabulary Contracts: define the preferred terms for you to use when describing your content (e.g. vCard says that the full name of a person should be described by the term FN.)

With these contract types in mind, you can now pick and choose which specific contracts you will stick to in order to have your Structured Stylistic Behavioral Annotated Content. 😛 You may choose, for example, to publish your personal library on the Web by structuring it in HTML with CSS style descriptions, JavaScript code for interaction, RDFa as your annotation syntax and Dublin Core as your vocabulary.
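To make the personal library example a bit more concrete, here is a rough sketch in Python. The HTML fragment, the book, and the names `PAGE`, `PropertyCollector` and `extract_properties` are all made up for illustration; the fragment uses the `about` and `property` attributes from RDFa with the Dublin Core terms `dc:title` and `dc:creator`. The extractor is deliberately naive and is not a conformant RDFa processor: it only reads the text of elements that carry a `property` attribute.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: HTML for structure, RDFa attributes as the
# annotation syntax, Dublin Core terms as the content vocabulary.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="#moby-dick">
  <span property="dc:title">Moby-Dick</span> by
  <span property="dc:creator">Herman Melville</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects (property, text) pairs from elements carrying @property."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property of the element currently being read

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop

    def handle_data(self, data):
        # Take the first non-blank text after an annotated start tag.
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None


def extract_properties(html):
    collector = PropertyCollector()
    collector.feed(html)
    return collector.pairs


pairs = extract_properties(PAGE)
print(pairs)
```

Note that the extractor never mentions Dublin Core: the vocabulary lives in the page, and the syntax contract alone tells the agent where to look.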

Clearing up a confusion

The confusion out there comes from the fact that some languages encompass multiple contracts. Sometimes, by “signing” one of these multi-type contracts you might limit your ability to pick another conflicting option. But that’s not the case most of the time. Web languages are usually friendly and will let you mix and match several options.

A big discussion out there is “microformats vs. RDFa”. A microformat (µF) is usually created by a community of people interested in a given area – say, calendars or business cards – to define a Content Vocabulary and very few rules for Annotation Syntax. Some of these few rules define required terms – e.g. fn is required for hCard – or state that a certain term should appear as a child of another – e.g. latitude comes under geo. RDFa, in turn, does not define a Content Vocabulary, but focuses on a more rigid Annotation Syntax – e.g. @typeof to define the type of an entity, @property to describe an attribute of an entity. Since µFs define fewer and looser annotation rules than RDFa, µFs are quicker and easier to create. On the other hand, less rigidity makes µFs more ambiguous, and the coupling of Content Vocabulary and Annotation Syntax means each µF requires a vocabulary-specific parser, which makes the approach less generic and less scalable.
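Here is a small Python sketch of what “vocabulary-specific parser” means in practice. Everything in it is illustrative: the markup, the person, and the names `HCARD`, `HCARD_TERMS` and `HCardParser` are my own, and it covers only three hCard terms. Because the annotations share @class with ordinary styling hooks (`sidebar`, `highlight`), the only way the parser can tell them apart is to know the vocabulary in advance.

```python
from html.parser import HTMLParser

# Hypothetical hCard fragment; "sidebar" and "highlight" are plain CSS hooks.
HCARD = """
<div class="vcard sidebar">
  <span class="fn highlight">Ada Lovelace</span>
  <span class="geo">
    <span class="latitude">51.5</span>,
    <span class="longitude">-0.1</span>
  </span>
</div>
"""

# The vocabulary is hard-coded in the parser (a tiny subset, for illustration).
HCARD_TERMS = {"fn", "latitude", "longitude"}


class HCardParser(HTMLParser):
    """Naive hCard reader: keeps the text of elements whose class is a known term."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._term = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        known = [c for c in classes if c in HCARD_TERMS]
        if known:
            self._term = known[0]

    def handle_data(self, data):
        if self._term and data.strip():
            self.card[self._term] = data.strip()
            self._term = None


parser = HCardParser()
parser.feed(HCARD)
card = parser.card
print(card)
```

Supporting another microformat, say one for calendars, would mean writing another term list and probably another parser. The sketch also skips the nesting rule (latitude under geo), which a real hCard parser would have to enforce by hand.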

Looking at the conflict this way, it seems to me that the solution is simple: use RDFa syntax with µF vocabularies to get the best of both worlds.
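As a sketch of that combination, the Python below reads a business card whose terms come from the vCard vocabulary but are attached with RDFa attributes. The assumptions are mine: the prefix `v`, the namespace URI `http://www.w3.org/2006/vcard/ns#`, the sample person, and the names `CARD` and `GenericExtractor`. It is a simplification and not a full RDFa processor; it only resolves prefixes declared with `xmlns:` and reads `typeof` and `property`.

```python
from html.parser import HTMLParser

# Hypothetical card: vCard terms as the vocabulary, RDFa as the syntax.
# The class attribute is free again for styling ("highlight").
CARD = """
<div xmlns:v="http://www.w3.org/2006/vcard/ns#" typeof="v:VCard">
  <span property="v:fn">Ada Lovelace</span>
  <span class="highlight" property="v:nickname">Ada</span>
</div>
"""


class GenericExtractor(HTMLParser):
    """Vocabulary-agnostic reader for @typeof and @property (simplified)."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace URI, from xmlns:* attributes
        self.types = []       # expanded @typeof values
        self.properties = {}  # expanded @property -> text content
        self._pending = None

    def _expand(self, curie):
        # Turn "v:fn" into "<namespace>fn" when the prefix has been declared.
        prefix, _, local = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "typeof" in attrs:
            self.types.append(self._expand(attrs["typeof"]))
        if "property" in attrs:
            self._pending = self._expand(attrs["property"])

    def handle_data(self, data):
        if self._pending and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


extractor = GenericExtractor()
extractor.feed(CARD)
print(extractor.types)
print(extractor.properties)
```

Nothing in `GenericExtractor` knows about vCard. Swapping in a calendar or Dublin Core vocabulary only changes the page, not the parser, which is the genericity the µF approach gives up.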
