The Internet is in flux. Protocols emerge and die. Information emerges and vanishes. In this world of change, how can we say anything about anything in any useful way, if we cannot relate to representations that can be trusted over time?
There are many protocols for finding and identifying things in the world: HTTP, FTP, P2P, etc. The principal way of identifying things on the Internet is by using Uniform Resource Identifiers (URIs) or Internationalized Resource Identifiers (IRIs). These have schemes defined, typically for various representation access protocols such as HTTP and FTP. There are also representation-independent schemes such as the URN subspace of URIs.
The URN subspace can be used to express things detached from a particular access protocol. For example, one might say
urn:isbn:<my book isbn number>
urn:ssn:<my social security number>
The two URNs above can be used to find information regardless of protocol.
A more precise way to find information is to uniquely name the information representation itself. This is what hash functions do. They map an arbitrary sequence of bits to a hash code of fixed, usually shorter, length that is still long enough to reasonably ensure uniqueness in the real world.
Peer-to-peer (P2P) networks use hash codes to uniquely name representations. Without them it would be near impossible to assemble a coherent representation by downloading from multiple sources at once, simply because each file (and each part of the file) could be named arbitrarily. What data does “my book name” refer to? What if two books have the same name? Sure, we could use ISBNs, but they are bound to books and so are not generic.
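To make the multi-source point concrete, here is a minimal sketch of naming each part of a file by its hash and reassembling the file from parts offered by arbitrary peers. The function names (`split_chunks`, `chunk_names`, `assemble`) and the tiny chunk size are my own assumptions for illustration; real P2P protocols differ in their details.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for illustration only


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_names(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Name every chunk by the SHA-1 hash of its content, in file order."""
    return [hashlib.sha1(c).hexdigest() for c in split_chunks(data, size)]


def assemble(names: list[str], offered: list[bytes]) -> bytes:
    """Rebuild the data from chunks offered by any peers, in any order.

    Each offered chunk is identified only by its hash, so whatever a
    peer chose to call it is irrelevant; chunks nobody asked for are
    simply ignored.
    """
    by_name = {hashlib.sha1(c).hexdigest(): c for c in offered}
    missing = [n for n in names if n not in by_name]
    if missing:
        raise ValueError(f"{len(missing)} chunk(s) still missing")
    return b"".join(by_name[n] for n in names)
```

The list of chunk names acts as the recipe for the file: any peer that can supply bytes hashing to one of those names is a valid source for that part.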
By coupling URIs and hash codes, we can uniquely identify data.
urn:sha1:<my hash code for my file>
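As a rough sketch, such a name can be derived from the data alone. The hexadecimal encoding of the digest is my assumption here (the form above does not fix an encoding, and other encodings of the same digest are possible), and the helper names `sha1_urn` and `sha1_urn_of_stream` are hypothetical.

```python
import hashlib
from typing import BinaryIO


def sha1_urn(data: bytes) -> str:
    """Name a sequence of bytes by its SHA-1 digest (hex-encoded, by assumption)."""
    return "urn:sha1:" + hashlib.sha1(data).hexdigest()


def sha1_urn_of_stream(stream: BinaryIO, block_size: int = 65536) -> str:
    """Same name, computed block by block so large files need not fit in memory."""
    digest = hashlib.sha1()
    while True:
        block = stream.read(block_size)
        if not block:
            break
        digest.update(block)
    return "urn:sha1:" + digest.hexdigest()


print(sha1_urn(b"abc"))
# urn:sha1:a9993e364706816aba3e25717850c26c9cd0d89d
```

The name depends only on the bytes, not on where they are stored or how they were fetched.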
Hash codes can also be used to conceal information by obscuring its real identity from anyone other than the owner. For example
urn:[field:ssn],[sha1:<hash of ssn>]
urn:[field:password],[sha1:<hash of password>]
Apart from obscuring the data, hashing also has the natural benefits of relative uniqueness and compression, as the name is shorter than the bits it represents. It may be seen as lossy compression for the purpose of naming: we can compress any data to the point where we mostly can’t reverse engineer the data, but we can reasonably trust the uniqueness of the name with respect to the data it identifies.
This is a crucial benefit for the Semantic Web. It enables one to say things about other data without binding oneself to a particular access protocol or mechanism. Future search engines can still use hash codes to find any data in any namespace they know of, including ones invented in the future.
In other words: it’s future-proof, to a point.
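A small sketch of the field-plus-hash names shown earlier, assuming the value is hashed as UTF-8 text and the digest is written in hex. The helpers `obscured_name` and `matches` and the sample value are made up for illustration.

```python
import hashlib


def obscured_name(field: str, value: str) -> str:
    """Build a name of the form urn:[field:<field>],[sha1:<hash of value>]."""
    digest = hashlib.sha1(value.encode("utf-8")).hexdigest()
    return f"urn:[field:{field}],[sha1:{digest}]"


def matches(name: str, field: str, value: str) -> bool:
    """Only someone who already knows the value can confirm what the name refers to."""
    return name == obscured_name(field, value)


# Hypothetical sample value, not a real number.
name = obscured_name("ssn", "000-00-0000")
print(matches(name, "ssn", "000-00-0000"))  # True
print(matches(name, "ssn", "111-11-1111"))  # False
```

Whatever the length of the value, the hash part of the name is always 40 hex characters, which is the "compression for the purpose of naming" described above.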
The precise hashing mechanism may change over time, but the benefit of hashing is that one can still identify the hashing mechanism via name.
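Because the mechanism is part of the name, a reader of the name can pick the right algorithm without outside knowledge. A minimal sketch, assuming names of the form `urn:<algorithm>:<hex digest>` and that the algorithm label happens to match a name Python's `hashlib` knows; `parse_hash_urn` and `verify` are hypothetical helpers.

```python
import hashlib


def parse_hash_urn(name: str) -> tuple[str, str]:
    """Split 'urn:<algorithm>:<hex digest>' into its algorithm and digest parts."""
    prefix, algorithm, digest = name.split(":", 2)
    if prefix != "urn":
        raise ValueError(f"not a URN: {name!r}")
    return algorithm, digest.lower()


def verify(name: str, data: bytes) -> bool:
    """Check data against a hash name, using whichever algorithm the name declares."""
    algorithm, expected = parse_hash_urn(name)
    if algorithm not in hashlib.algorithms_available:
        raise LookupError(f"unknown hashing mechanism: {algorithm!r}")
    return hashlib.new(algorithm, data).hexdigest() == expected
```

The same `verify` call works for `urn:sha1:...` today and for a name using a newer algorithm later, as long as an implementation of that algorithm is available.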
One might even use a hash code to refer to the implementation used to hash
The hash code of the implementation would, of course, be computed using the implementation itself! The only problem here is that the exact identity of this implementation then depends, to some extent, on the implementation itself.
I’m somewhat disgusted by names such as
The idea of this HTTP URI is that it can be used to fetch the definition of this term via the protocol used to express the name of the term. That’s beneficial, of course. But it also ties the name to the current HTTP protocol. It’s only disgusting in retrospect, and not completely so.
With hash-based names instead, a search engine would be used, for each protocol, to look up the protocol-specific address of that data, or indeed even the latest cached representation.
A potential problem is, of course, that the expression of the meaning of the term itself depends on some representation language such as RDF/XML. But this is to be expected, and hopefully RDF (as RDF/XML, or better: RDF/EXI, or better still: a triple variant of RDF expressed in EXI) will long outlive any single serialization such as RDF/XML.
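The lookup role of such a search engine could be sketched as an index from hash names to whatever protocol-specific addresses are currently known. The `HashIndex` class and the example.org addresses are invented for illustration; this is an in-memory toy, not a design for a real search engine.

```python
import hashlib
from collections import defaultdict
from typing import Optional


class HashIndex:
    """Toy index: hash name -> set of protocol-specific locations."""

    def __init__(self) -> None:
        self._locations: dict[str, set[str]] = defaultdict(set)

    def register(self, data: bytes, location: str) -> str:
        """Record that `location` currently serves `data`; return the data's name."""
        name = "urn:sha1:" + hashlib.sha1(data).hexdigest()
        self._locations[name].add(location)
        return name

    def lookup(self, name: str, scheme: Optional[str] = None) -> list[str]:
        """Return known locations for a name, optionally limited to one protocol."""
        found = sorted(self._locations.get(name, ()))
        if scheme is not None:
            found = [loc for loc in found if loc.startswith(scheme + ":")]
        return found


index = HashIndex()
name = index.register(b"some book text", "http://example.org/book.txt")
index.register(b"some book text", "ftp://example.org/pub/book.txt")
print(index.lookup(name, scheme="ftp"))
# ['ftp://example.org/pub/book.txt']
```

The name stays the same when a location disappears or a new protocol is added; only the index entries change.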
Of course old names can be mapped to new names via semantics. So one can say
<urn:data:sha1:<hash>> <same-as/hash-of> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
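One way to read the statement above is as an ordinary triple that a search engine can follow in either direction. In this sketch I assume the hash is taken over the old name itself (as UTF-8 text) and use a made-up predicate label `same-as`; both are my assumptions, and the helpers `hash_name`, `alias_triple` and `aliases` are hypothetical.

```python
import hashlib

SAME_AS = "same-as"  # hypothetical predicate label


def hash_name(old_name: str) -> str:
    """Derive a protocol-independent name from an old name (assumption: hash the name text)."""
    return "urn:data:sha1:" + hashlib.sha1(old_name.encode("utf-8")).hexdigest()


def alias_triple(old_name: str) -> tuple[str, str, str]:
    """State that the hash name and the old name identify the same thing."""
    return (hash_name(old_name), SAME_AS, old_name)


def aliases(name: str, triples: list[tuple[str, str, str]]) -> set[str]:
    """Follow same-as statements one hop, in either direction."""
    found = set()
    for subject, predicate, obj in triples:
        if predicate != SAME_AS:
            continue
        if subject == name:
            found.add(obj)
        elif obj == name:
            found.add(subject)
    return found
```

With one such triple recorded per old name, data described under the old name can be found from the new name and vice versa.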
Search engines of the future may also be used to find and transcode information, even to transparently map the old web to the new web so it’s possible to switch from HTTP to some other protocol more or less seamlessly (ahem).
So for future-proofing data names, there can be only one: use hash codes. But there are ways to deal with old-school names.