1 – Extensible Markup Language (XML)
For the past couple of years I’ve been following the development of languages and data models at the W3C and elsewhere. It all began with XML which, at the time, was a bit mind-blowing for the flexibility it provided. It allowed one to not worry so much about future extensions of domain languages – one would simply extend them later, with no effect on existing software. The existing software would just read the XML elements and attributes it knew, and use them as it always had, whilst newer software would use the additional elements and attributes. XML Schema and other schema languages restrict what is possible or meaningful in a domain, which is useful, but they should be used with care so as not to preclude future scenarios. One might say: as strict as necessary, but no stricter. This rule of thumb may look familiar to those who program using Design by Contract, where the reasoning goes: require as little as possible but provide as much as possible. The reason is that the fewer requirements a function has and the more guarantees it makes about its results, the more useful it becomes – useful in terms of the cases where it can be applied and useful in terms of the properties it guarantees.
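The "read what you know, ignore the rest" behaviour can be sketched in a few lines of Python using the standard library's ElementTree. This is only an illustration: the `book` vocabulary, its `title` attribute, and the `author` and `translator` elements are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

# Version 1 of a hypothetical "book" vocabulary.
DOC_V1 = '<book title="Example"><author>A. Author</author></book>'

# Version 2 adds an attribute and an element that old software has never seen.
DOC_V2 = (
    '<book title="Example" edition="2">'
    '<author>A. Author</author>'
    '<translator>T. Translator</translator>'
    '</book>'
)

def read_book_v1(xml_text):
    """Old reader: picks out only the names it knows and ignores the rest."""
    root = ET.fromstring(xml_text)
    return {"title": root.get("title"), "author": root.findtext("author")}

old_view = read_book_v1(DOC_V1)
new_view = read_book_v1(DOC_V2)  # the extended document reads just the same
```

The old reader never enumerates everything in the document, so the extension goes unnoticed. A schema that rejected unknown elements would break exactly this property, which is the point of "as strict as necessary, but no stricter".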
One of the premier debates in the XML community has been the simple question: element or attribute? What to use, and when to use it. Some have proposed an attribute-free dialect of XML, which increases the monotonicity of XML, making it a simpler language and eliminating a path of expression. The problem with attributes is that they are not as flexible as elements. An attribute cannot be extended: it holds a single string value and can have no substructure, so it is not as much in the spirit of XML as elements are. A language feature that runs against the spirit of its own language is a contradiction in terms perhaps, but nonetheless, elements are more in the spirit of the language. Attributes are used for simple things like stating numeric properties.
In recent years, attributes have also been used to store structured information, which in turn has been the reason for coining the term microformats. Such a value is to XML what SQL is to C# – an opaque string of which the host language has no model.
There is no particular reason why these cannot be encoded in an element, but there is also no particular reason why they should be, because in this case they are unstructured. On the other hand, one might do a deep encoding of them in element syntax:
<p x="0" y="0"/>
<p x="1" y="0"/>
<p x="1" y="1"/>
<p x="0" y="1"/>
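To make the difference concrete, here is a rough Python sketch that reads the same four points from both encodings. The wrapping `shape` element and the `points` attribute with its comma-and-space syntax are assumptions made up for this example; only the `p` elements come from the snippet above.

```python
import xml.etree.ElementTree as ET

# Opaque form (hypothetical): the XML parser sees one string and nothing more.
OPAQUE = '<shape points="0,0 1,0 1,1 0,1"/>'

# Deep encoding: every point is an element the XML tools can see.
DEEP = (
    '<shape>'
    '<p x="0" y="0"/>'
    '<p x="1" y="0"/>'
    '<p x="1" y="1"/>'
    '<p x="0" y="1"/>'
    '</shape>'
)

def points_from_opaque(xml_text):
    """Needs a second, hand-written parser for the attribute's mini-language."""
    raw = ET.fromstring(xml_text).get("points")
    return [tuple(int(n) for n in pair.split(",")) for pair in raw.split()]

def points_from_deep(xml_text):
    """Relies on the XML parser alone to find the structure."""
    root = ET.fromstring(xml_text)
    return [(int(p.get("x")), int(p.get("y"))) for p in root.iter("p")]

opaque_points = points_from_opaque(OPAQUE)
deep_points = points_from_deep(DEEP)
```

Both functions yield the same list of coordinate pairs. The difference is where the knowledge of the structure lives: in the deep encoding it is visible to every generic XML tool, while in the opaque form it lives only in the code that splits the string.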
XML is intrinsically a tree language, where an element can have only one parent element and there is always a single root element that contains all the others.
So XML has a number of restrictions, limitations and oddities:
1 – it has a root
2 – it only understands trees
3 – it has both elements and attributes
4 – it isn’t possible to merge arbitrary XML documents in a meaningful way
As to point 2, remember that even though you can express graphs in XML in some domain model, the XML metamodel (the XML Information Set, or infoset) does not have any intrinsic knowledge of graphs.
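A small Python sketch of what that means in practice: the document below describes a cycle, but the parser only ever hands back a tree, and it is the application that has to interpret attribute values as references. The `graph`, `node` and `edge` names and the `id`/`to` attributes are a hypothetical domain model, not anything defined by XML itself.

```python
import xml.etree.ElementTree as ET

# A three-node cycle a -> b -> c -> a, written in a made-up vocabulary.
DOC = (
    '<graph>'
    '<node id="a"><edge to="b"/></node>'
    '<node id="b"><edge to="c"/></node>'
    '<node id="c"><edge to="a"/></node>'
    '</graph>'
)

root = ET.fromstring(DOC)

# To the parser this is just a tree: graph -> node -> edge.
# The graph only appears once the application resolves the "to" values.
adjacency = {
    node.get("id"): [edge.get("to") for edge in node.findall("edge")]
    for node in root.findall("node")
}

def has_cycle(adj, start):
    """Follow edges from start and report whether they lead back to it."""
    seen = set()
    stack = list(adj.get(start, []))
    while stack:
        current = stack.pop()
        if current == start:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(adj.get(current, []))
    return False

cycle_found = has_cycle(adjacency, "a")
```

Nothing in the parsed tree is cyclic. The cycle exists only in `adjacency`, which is built by domain-specific code.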
After XML 1.0, namespaces were introduced. They did not bring merging capabilities, but they did bring the ability to distinguish elements and attributes by global names. It is obviously conceivable that more than one person will come up with the idea of describing books in XML, and that more than one of them will use the name book; quite possibly, though, they will model books differently, with a different substructure and possibly different subnames. Namespaced elements (e.g. prefixed elements) therefore add an extra qualification to element names (hence QNames) that makes them globally unique, assuming no one uses URI references under domains and paths they do not own and control.
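As a rough illustration in Python, here are two `book` elements from two hypothetical authors living side by side in one document. The namespace URIs under example.org and example.com, the `shelf` wrapper and the differing substructures are all invented for the example. ElementTree exposes each qualified name as `{namespace-uri}local-name`.

```python
import xml.etree.ElementTree as ET

ALICE = "http://example.org/alice/books"    # hypothetical namespace URI
BOB = "http://example.com/bob/library"      # hypothetical namespace URI

DOC = (
    f'<shelf xmlns:a="{ALICE}" xmlns:b="{BOB}">'
    '<a:book><a:title>First</a:title></a:book>'
    '<b:book name="Second"/>'
    '</shelf>'
)

root = ET.fromstring(DOC)

# Same local name "book", but two different qualified names.
tags = [child.tag for child in root]

# Software written for Alice's vocabulary selects only her elements.
alice_books = root.findall(f"{{{ALICE}}}book")
alice_titles = [b.findtext(f"{{{ALICE}}}title") for b in alice_books]
```

The prefixes `a` and `b` are only local shorthand. What makes the names distinct is the URI behind each prefix, which is why uniqueness depends on people only minting names under URIs they control.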
Next up – RDF.