Reference Reification

The web is based on addresses called URIs (short for Uniform Resource Identifiers; and soon Internationalized Resource Identifiers, or IRIs). These are the links that weave the web we all traverse daily.

But these URIs are not atomic; they are merely treated as such in their syntactic expression. They are an instance of what is popularly known as a microformat.

So what is a microformat anyway?

Microformat is a relative term: a thing that has a format is not a microformat in and of itself, but once that format is injected into an atom of another format, it becomes a microformat. That is, once a format embeds other formats inside itself, we can begin to talk about microformats.

The canonical example for this is XML and URIs.

<Person email="mailto:bent@example.com"/>

As we see here, the email attribute contains structured information, but the structure is not expressed using structured XML; instead it is simply concatenated inside an attribute.
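
To make the point concrete, here is a minimal sketch in Python (my choice of language, using only the standard library) of what a consumer of that markup has to do. The XML parser hands back an opaque string, and a second, separate parser is needed to get at the structure inside it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

doc = '<Person email="mailto:bent@example.com"/>'

# First parse: the XML parser only sees an opaque attribute string.
person = ET.fromstring(doc)
email = person.get("email")

# Second parse: a separate URI parser is needed to reach the structure inside.
parts = urlsplit(email)
scheme, address = parts.scheme, parts.path
```

Two grammars, two parsers, one value.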

This means essentially that we are injecting a domain-specific language, in its own syntax, into a domain-neutral container format – XML. It would be possible to actually abandon the concrete URI syntax and encode the abstract URI syntax into the XML Information Set datamodel directly!

This would have the consequence that the email property in our above example will have to be reformulated as an element instead; something like

<Person>
  <email>
    <URI>
      <scheme>mailto</scheme>
      …..
    </URI>
  </email>
</Person>
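
As a rough sketch of how such markup could be produced, the Python below splits a URI with the standard library and writes each non-empty component as a child element. Only the URI and scheme element names come from the example above; authority, path, query and fragment are names I have assumed for illustration (borrowed from the URI grammar), and uri_to_element is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

def uri_to_element(uri: str) -> ET.Element:
    """Encode the components of a URI as children of a <URI> element."""
    parts = urlsplit(uri)
    # Element names other than <URI> and <scheme> are assumptions.
    components = {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    root = ET.Element("URI")
    for name, value in components.items():
        if value:  # skip components that are empty or absent
            ET.SubElement(root, name).text = value
    return root

person = ET.Element("Person")
ET.SubElement(person, "email").append(uri_to_element("mailto:bent@example.com"))
markup = ET.tostring(person, encoding="unicode")
```

Note that this sketch does not distinguish an empty component from an absent one, which a faithful data model would probably need to do.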

It is worth noticing how the compact concrete URI syntax makes the markup shorter and perhaps easier to read, but it also means we lose one of the benefits of XML – its ability to encode information generically using structured markup so we can more easily parse the content.

It is clear to me that the reason XML is not used in a more structured way and we have all the microformat mess is simply because XML is a human-readable verbose format. If it were to become more verbose, it would become less readable – to humans.

The upcoming Infoset encoding, the Efficient XML Interchange (EXI) format, may change this and move us away from the microformat game, because it makes it possible to create a compact and uniformly structured encoding of URIs inside the other data.

Now the URI may not be the ideal example of this, but as a widely used identifier string that is also a structured value, it does serve as a reasonable example in this regard.

The SVG path data (the d attribute of the path element) is a better example. As machines are supposed to generate SVG, we should not care about human-readability – if we want readability we create interactive visualizations of the data (text or a drawing).
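
For illustration, here is a small Python sketch of the extra parsing step that path data forces on every consumer. The path string is a made-up example, tokenize_path is a hypothetical helper, and the regular expression is a deliberate simplification that does not cover the full SVG path grammar.

```python
import re

# Hypothetical path data for illustration; not taken from any real drawing.
path_data = "M 10 10 L 20 20 Z"

# Simplified tokens: a single command letter, or a decimal number.
TOKEN = re.compile(r"([MmLlHhVvCcSsQqTtAaZz])|(-?\d*\.?\d+(?:[eE][-+]?\d+)?)")

def tokenize_path(d):
    """Group the numbers in path data under the command letter they follow."""
    commands = []
    for letter, number in TOKEN.findall(d):
        if letter:
            commands.append((letter, []))
        elif commands:  # ignore numbers that precede any command
            commands[-1][1].append(float(number))
    return commands

commands = tokenize_path(path_data)
```

Had the same information been expressed as elements, the XML parser would already have delivered this structure.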

The Resource Description Framework (RDF) from the W3C has a concept which is somewhat related to this concept of unification: Reification.

In RDF one can create statements about statements. Normally a statement is the quintessential “primitive” of RDF, but what if you want to say something about a statement? You can choose to express the statement via other statements, thereby reifying it.
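
A minimal sketch of the idea in Python, modelling triples as plain tuples rather than using an RDF library. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms are the standard RDF reification vocabulary; every example.com URI and the reify helper are hypothetical and only there for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement, statement_id):
    """Describe one (subject, predicate, object) triple using four triples."""
    subject, predicate, obj = statement
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

# Hypothetical URIs, assumed for this example only.
statement = (
    "http://example.com/people/bent",
    "http://example.com/terms/email",
    "mailto:bent@example.com",
)
statement_id = "http://example.com/statements/1"
triples = reify(statement, statement_id)

# The statement now has an identity, so we can say something about it.
triples.append((
    statement_id,
    "http://example.com/terms/assertedBy",
    "http://example.com/people/alice",
))
```

The original triple has become a resource in its own right, which is the same move as turning the URI string into structured markup.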

I’m not saying reification and unification as shown here are always the right thing to do, but I think the world should move further away from microformats and towards unified abstract data models. If the URI is parsed into XML Information Items, then the job of the URI constructor is trivial.

As an identifier a URI may still be valuable in its canonical concrete form, but software could also create this concrete form from the structured form if necessary. It’s more cumbersome the other way around.
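
To show how little work the constructor has to do, here is a hedged Python sketch that rebuilds the concrete form from a structured one. The element names (authority, path, query, fragment) and the example.com URI are assumptions of mine, and element_to_uri is a hypothetical helper built on the standard library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlunsplit

# A structured URI; element names and values are assumed for illustration.
structured = ET.fromstring(
    "<URI><scheme>http</scheme><authority>example.com</authority>"
    "<path>/people</path><query>name=bent</query></URI>"
)

def element_to_uri(uri_element: ET.Element) -> str:
    """Concatenate the structured components back into a concrete URI."""
    names = ("scheme", "authority", "path", "query", "fragment")
    # Missing components are treated as empty strings.
    return urlunsplit(tuple(uri_element.findtext(n, default="") for n in names))

concrete = element_to_uri(structured)
```

Going from structure to string is a concatenation; going from string to structure requires a grammar.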

Should you want to create an XML or RDFS/OWL datamodel for URIs, here’s the concrete syntax for them, expressed in Augmented Backus-Naur Form (ABNF), as collected in Appendix A of RFC 3986.

Appendix A. Collected ABNF for URI

URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

hier-part     = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty

URI-reference = URI / relative-ref

absolute-URI  = scheme ":" hier-part [ "?" query ]

relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty

scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

authority     = [ userinfo "@" ] host [ ":" port ]
userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
host          = IP-literal / IPv4address / reg-name
port          = *DIGIT

IP-literal    = "[" ( IPv6address / IPvFuture ) "]"

IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

IPv6address   =                            6( h16 ":" ) ls32
              /                       "::" 5( h16 ":" ) ls32
              / [               h16 ] "::" 4( h16 ":" ) ls32
              / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
              / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
              / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
              / [ *4( h16 ":" ) h16 ] "::"              ls32
              / [ *5( h16 ":" ) h16 ] "::"              h16
              / [ *6( h16 ":" ) h16 ] "::"

h16           = 1*4HEXDIG
ls32          = ( h16 ":" h16 ) / IPv4address
IPv4address   = dec-octet "." dec-octet "." dec-octet "." dec-octet

dec-octet     = DIGIT                 ; 0-9
              / %x31-39 DIGIT         ; 10-99
              / "1" 2DIGIT            ; 100-199
              / "2" %x30-34 DIGIT     ; 200-249
              / "25" %x30-35          ; 250-255

reg-name      = *( unreserved / pct-encoded / sub-delims )

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>

segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
              ; non-zero-length segment without any colon ":"

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

query         = *( pchar / "/" / "?" )

fragment      = *( pchar / "/" / "?" )

pct-encoded   = "%" HEXDIG HEXDIG

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved      = gen-delims / sub-delims
gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
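
As a starting point for such a datamodel, the sketch below splits a URI reference into the five top-level components named in the grammar. It uses the regular expression that RFC 3986 gives in its Appendix B for this purpose; the UriComponents type and split_uri function are hypothetical names of my own. It only separates the components and does not validate them against the rules above.

```python
import re
from typing import NamedTuple, Optional

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

class UriComponents(NamedTuple):
    """Top-level parts of a URI reference; None means the part is absent."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]

def split_uri(reference: str) -> UriComponents:
    # Every group is optional, so the pattern matches any string.
    m = URI_REFERENCE.match(reference)
    return UriComponents(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))

mail = split_uri("mailto:bent@example.com")
```

Unlike a plain string, this structure keeps the difference between an absent query and an empty one, which matters if the concrete form is to be regenerated exactly.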


About xosfaere

Software Developer
This entry was posted in Datamodel, Declarative, Infoset, RDF/RDFS/OWL, Semantic Graph, Uncategorized.
