Semantics Anno 2008

There is a trend that opposites meet and converge to become hybrids.

Single-paradigm programming languages become multi-paradigm programming languages [static meets dynamic, procedural meets declarative, etc.].

Devices become Swiss Army knife equivalents: a mobile phone that is also a PC, a TV, a radio, etc.

The distinction between off-line and on-line is blurred: sometimes online, sometimes locally connected via peer-to-peer, sometimes globally connected to the Internet.

The operating system stretches out beyond the browser and becomes a platform for interacting with distributed services; it doesn’t matter as much where the service is. On the other hand, the browser, which we associate with the premier “human” interface to the Web, acquires runtimes that enable Web applications to run at near-native speeds, with rich graphical user interfaces and off-line caching.

This last observation is of particular interest here: as we see Web sites and services stretch out to the desktop and vice versa, we’ll start to see a semantic enrichment on both ends.

Twine[1] enriches the Web with semantic information harvested from different sites. It recognizes entities embedded in HTML templates, harvests data from these template instances and puts it into an internal semantic graph based on a custom ontology.
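To make the idea of harvesting template instances concrete, here is a minimal sketch in Python. It is not how Twine works internally (that is not public); it only assumes a made-up page template where each entity is a `div` with class `book` and each field is a `span` whose class names the property. The class names, the `book:N` identifiers and the `rdf:type` predicate label are all hypothetical choices for illustration.

```python
from html.parser import HTMLParser

# Hypothetical template instances: one <div class="book"> per entity,
# one <span class="..."> per field. The markup is invented for this sketch.
PAGE = """
<div class="book"><span class="title">Dune</span><span class="author">Frank Herbert</span></div>
<div class="book"><span class="title">Neuromancer</span><span class="author">William Gibson</span></div>
"""


class TemplateHarvester(HTMLParser):
    """Turns repeated template instances into (subject, predicate, object) triples."""

    def __init__(self):
        super().__init__()
        self.triples = []    # the harvested graph, as a flat list of triples
        self._entity = None  # identifier of the entity currently being read
        self._field = None   # property name of the span currently being read
        self._count = 0      # running counter used to mint entity identifiers

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "book":
            self._count += 1
            self._entity = f"book:{self._count}"
            self.triples.append((self._entity, "rdf:type", "Book"))
        elif tag == "span" and self._entity and css_class:
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if self._entity and self._field and text:
            self.triples.append((self._entity, self._field, text))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div":
            self._entity = None


def harvest(html):
    """Parse the page and return the list of harvested triples."""
    harvester = TemplateHarvester()
    harvester.feed(html)
    harvester.close()
    return harvester.triples
```

Calling `harvest(PAGE)` yields three triples per book: one type assertion and one triple per field. A real harvester would have to discover the template rather than be told about it, which is the hard part this sketch skips.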

This approach is extremely promising, as it amounts to a potentially massive reverse engineering of a data-rich but data-hidden Web. The Web is data-driven, but the data is locked up in back-ends and only sometimes available via Web APIs.

There is enormous potential to push data onto the Web from desktops as well (and vice versa). The ideal is to have a rich desktop UX while not relying on a desktop environment for complete data storage. The desktop can harvest metadata from files and dispatch these onto the Web.
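As a rough illustration of the desktop side, the following Python sketch harvests a few basic metadata fields from local files and serializes them into a JSON payload that could be sent to a Web service. The field names (`name`, `size_bytes`, `modified`, `media_type`) and the function names are assumptions made up for this example, and the actual dispatch over the network is deliberately left out.

```python
import json
import mimetypes
import os


def harvest_file_metadata(path):
    """Collect basic metadata about one file without reading its contents."""
    info = os.stat(path)
    media_type, _encoding = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # seconds since the epoch
        # Fall back to a generic type when the extension is unknown.
        "media_type": media_type or "application/octet-stream",
    }


def to_payload(paths):
    """Serialize the metadata of several files as one JSON document."""
    records = [harvest_file_metadata(p) for p in paths]
    return json.dumps(records, sort_keys=True)
```

The payload contains only metadata, so the files themselves can stay wherever they are stored; that matches the idea of a rich desktop experience that does not depend on the desktop for complete data storage.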

A note of interest here is that Google has begun accepting arbitrary data[2]: anyone can submit data to it. This is a pretty significant event, as it signals the beginning of a transition where Google goes from simple keyword search to semantic search. It too should apply the Twine approach to semantic harvesting, if it does not already; there’s no knowing what it does internally.

The concept of metadata becomes more and more overloaded. The emphasis of the definition of metadata, data about data, should be “data transform”, because what is really craved is a uniform abstract syntax for expressing data and a uniform semantics for interpreting it.

Microsoft’s Language Integrated Query (LINQ)[3] also plays a part in unification. It does not force data expression into a uniform syntax per se; rather, it dynamically transforms data from one form to another. It is possible to freely add new providers to LINQ and thus extend the semantic reach of LINQ to new domains, new databases and new services.
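The provider idea can be sketched outside of .NET as well. The following Python example is not LINQ and does not use any LINQ API; it is only an analogy, assuming two hypothetical "providers" (`from_csv` and `from_json`) that adapt differently encoded sources into a common stream of records, so that one `query` function works over both. All names and sample data are invented for illustration.

```python
import csv
import io
import json

# Two sources holding the same kind of data in different encodings.
CSV_DATA = "title,year\nDune,1965\nNeuromancer,1984\n"
JSON_DATA = '[{"title": "Snow Crash", "year": 1992}, {"title": "Foundation", "year": 1951}]'


def from_csv(text):
    """Provider: yield one dict per CSV row (all values arrive as strings)."""
    yield from csv.DictReader(io.StringIO(text))


def from_json(text):
    """Provider: yield one dict per element of a JSON array of objects."""
    yield from json.loads(text)


def query(source, where, select):
    """Encoding-agnostic query: filter records, then project each match."""
    return [select(record) for record in source if where(record)]


def after_1960(record):
    # int() accepts both the CSV string and the JSON number.
    return int(record["year"]) > 1960


def title_of(record):
    return record["title"]


csv_titles = query(from_csv(CSV_DATA), after_1960, title_of)
json_titles = query(from_json(JSON_DATA), after_1960, title_of)
```

Adding a new source means writing one more provider function; the query itself stays untouched, which is the property the paragraph above attributes to LINQ providers.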

Data in a particular domain may have an optimal, or perceived optimal, encoding. However, with the advent of the XML Information Set (Infoset) as an abstract syntax for XML, and the optimized binary encoding of the Infoset, it is now possible, more than ever, to stray away from domain-specific encodings in favor of uniform syntactic processing.
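The difference between an encoding and the abstract information it carries can be shown with a small Python sketch. It reduces two differently written XML documents to the same nested tuple of (name, attributes, text, children). This is only a loose approximation of the Infoset, assuming that attribute order, quote style, empty-element syntax and whitespace-only text are encoding details; the real specification defines many more information items, and the `to_abstract` helper is made up for this example.

```python
import xml.etree.ElementTree as ET

# The same information, serialized in two superficially different ways.
DOC_A = """<book year="1965" lang='en'><title>Dune</title><note/></book>"""
DOC_B = """<book lang="en" year="1965">
  <title>Dune</title>
  <note></note>
</book>"""


def to_abstract(element):
    """Reduce an element to an encoding-independent nested tuple."""
    attributes = tuple(sorted(element.attrib.items()))  # attribute order is not significant
    text = (element.text or "").strip()                 # simplification: drop surrounding whitespace
    children = tuple(to_abstract(child) for child in element)
    return (element.tag, attributes, text, children)


tree_a = to_abstract(ET.fromstring(DOC_A))
tree_b = to_abstract(ET.fromstring(DOC_B))
same_information = tree_a == tree_b
```

Any processing written against the abstract tuple works regardless of which serialization the data arrived in, and the same would hold for a binary encoding with a parser that produces the same structure.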

Microformats, which I’ve often bashed, are an attempt to hardwire domain-specific syntax into the encoding level. This is a mistake. It is a failure of editors that they are not able to facilitate domain-specific syntactic interaction on top of a general, abstract, domain-independent encoding. A domain-specific encoding is not as extensible as a generic one and will logically not allow annotation and meshing with other kinds of data to the same extent.

The ends will meet. Data will transform statically into the same encoding, but query will also become domain-agnostic and sit on top of domain-specific or semi-domain-specific models, such as the textual encoding of XML, which, with its spatial profile, is arguably not well suited for all kinds of data.

One can hope that the next year will deliver a larger semantic harvest than ever before. As a Twine user, I hope it will stretch out its arms in many new directions and enrich its presentation beyond list views.


  1. Twine
  2. Google Base
  3. Language Integrated Query (LINQ)

About xosfaere

Software Developer