rod mclaughlin

XML versus YAML

  
Mon, January 22, 2007

According to Extreme Programming, programmers should always produce the simplest code necessary to solve the problem.

'What is the simplest thing that could possibly work?' (Kent Beck - Extreme Programming Explained). XML is often not that thing.
XML is eminently machine-readable, but difficult for human beings to maintain. YAML is a much simpler alternative for configuration files, communication between objects, serialization of objects, and everything else. Like XML files, YAML files can be self-describing (you can tell whether they contain what you expect), but they are easier for humans to read. YAML doesn't describe itself as self-describing, but it can be. There is one situation in which 'the simplest thing that could possibly work' is XML: when you are communicating with the outside world, and the outside world insists on XML. Otherwise, choose YAML.

www.yaml.org

I should point out that the authors of YAML specifically deny that it is a substitute for XML. That's their opinion.

Here are six ways of writing the same information, none of them better than the others:


1. <invoice id="12345"><price>6789.10</price></invoice>

2. <invoice><id value="12345"></id><price value="6789.10"></price></invoice>

3. <invoice><id>12345</id><price>6789.10</price></invoice>

4. <element name="invoice" id="12345">
<element name="price" value="6789.10">Ignore this</element>
</element>

5. <element><element><name>invoice</name><id>12345</id></element>
<element><price><value>6789.10</value></price></element></element>

6. <element>
<name>invoice</name>
<attribute name="id">12345</attribute>
<attribute name="price">6789.10</attribute>
</element>
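
To make the cost of that freedom concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It reads the first three forms above and shows that each one needs its own extraction code, even though all three carry the same two values. The function names are mine, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Forms 1 to 3 from the list above, copied verbatim.
form1 = '<invoice id="12345"><price>6789.10</price></invoice>'
form2 = '<invoice><id value="12345"></id><price value="6789.10"></price></invoice>'
form3 = '<invoice><id>12345</id><price>6789.10</price></invoice>'


def read_form1(xml_text):
    # id is an attribute of the root, price is the text of a child element.
    root = ET.fromstring(xml_text)
    return {"id": root.get("id"), "price": root.findtext("price")}


def read_form2(xml_text):
    # Both values are stored in 'value' attributes of child elements.
    root = ET.fromstring(xml_text)
    return {
        "id": root.find("id").get("value"),
        "price": root.find("price").get("value"),
    }


def read_form3(xml_text):
    # Both values are stored as the text of child elements.
    root = ET.fromstring(xml_text)
    return {"id": root.findtext("id"), "price": root.findtext("price")}


results = [read_form1(form1), read_form2(form2), read_form3(form3)]
for result in results:
    print(result)
```

Three readers for three spellings of one invoice; forms 4 to 6 would each need yet another.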


"There's more than one way to do it - but only one right way"

YAML can contain anything XML can, in about 20% of the characters.

Not only IS YAML human-readable, I would encourage people to read it. You can write anything (invoices, databases, haikus) in YAML. People can easily become accustomed to the unambiguous yet poetic style YAML allows. It is said that XML doesn't need to be human-readable, as there are some good XML viewers/editors available. But this isn't true.

I digress. My point is, YAML is better than XML for data and programs (as in SOAP and Web Services). The obvious choice of language for programs embedded in YAML is Python, since it uses indentation as structure. The only thing missing from YAML is an equivalent of DTDs for defining what a YAML document must contain. (An XML DTD is a specification which says something like "a document which conforms to this DTD must consist of an element named 'breakfast' which contains exactly three instances of the element 'egg', each of which may or may not have an attribute 'cholesterol'", and so on.) A YAML equivalent of DTDs is on its way.
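
In the meantime, a DTD-like rule can be checked by hand once the document has been loaded. The sketch below is only an illustration, not a real schema language: it assumes the YAML has already been parsed into ordinary Python dicts and lists, and it assumes the eggs are stored as a list under an 'egg' key. The function name validate_breakfast is hypothetical.

```python
def validate_breakfast(document):
    """Return a list of problems; an empty list means the document conforms."""
    if not isinstance(document, dict) or list(document) != ["breakfast"]:
        return ["document must be a mapping with the single key 'breakfast'"]

    breakfast = document["breakfast"]
    eggs = breakfast.get("egg") if isinstance(breakfast, dict) else None
    if not isinstance(eggs, list) or len(eggs) != 3:
        return ["'breakfast' must contain exactly three 'egg' entries"]

    problems = []
    for position, egg in enumerate(eggs, start=1):
        if not isinstance(egg, dict):
            problems.append(f"egg {position} must be a mapping")
            continue
        # 'cholesterol' is optional; any other key is rejected.
        extra = set(egg) - {"cholesterol"}
        if extra:
            problems.append(f"egg {position} has unexpected keys: {sorted(extra)}")
    return problems


# Stand-ins for documents that would normally come from a YAML parser.
good = {"breakfast": {"egg": [{"cholesterol": "high"}, {}, {}]}}
bad = {"breakfast": {"egg": [{}, {}]}}

print(validate_breakfast(good))
print(validate_breakfast(bad))
```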

Taking the invoice example above, it is true that, just as in XML, there are several ways you could represent the data. But unlike in XML, there is just one blindingly obvious way to do it:

invoice:
  id: 12345
  price: 6789.10

I find it difficult to describe YAML. It speaks for itself. For the definition, examples, etc., I can do no better than point you to www.yaml.org.
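
To show how directly the indentation maps onto a data structure, here is a rough sketch of a reader for a tiny subset of YAML: nested 'key: value' maps indented with spaces, and nothing else. It is my own toy, not a YAML parser. It keeps every value as a string, whereas a real parser (for example PyYAML's yaml.safe_load) would turn 12345 into a number and handles the rest of the language too.

```python
def parse_simple_yaml(text):
    """Parse nested 'key: value' maps indented with spaces (toy subset only)."""
    root = {}
    # Each stack entry is (indent of the key that owns the dict, the dict).
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = line.strip().partition(":")
        value = value.strip()
        # Climb back out to the mapping this line belongs to.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value          # leaf: store the text as-is
        else:
            child = {}
            parent[key] = child          # key with no value opens a nested map
            stack.append((indent, child))
    return root


invoice_text = "invoice:\n  id: 12345\n  price: 6789.10\n"
invoice = parse_simple_yaml(invoice_text)
print(invoice)
```

The whole structure comes from the indentation; there are no closing tags to match and no choice between attributes and child elements.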

I wouldn't say XML was slow, but do you know the one about the tortoise who was mugged by two snails? The police asked him what happened, and he said "I don't know, it was all over so quickly...".

YAML documents/streams can contain:
DOCS
BITS
CODE
DATA

They are therefore well-suited for 'web services' or 'applets' - code downloaded from websites and run immediately. A YAML stream can contain Python code, its initialization data, binary and text, and its documentation. A Python program 'stub' would be downloaded in YAML and executed on the user's machine. As its methods were invoked, it would communicate in YAML with the real program - its big brother - on the server, just like a .NET web service.
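
A very rough sketch of the client side of that idea follows. Everything in it is an assumption made for illustration: the stream is shown as an already-parsed Python dict rather than real YAML, the key names (docs, data, bits, code) simply mirror the list above, and load_stub is a hypothetical helper. Note that exec runs the downloaded code with the full privileges of the user, so this is exactly the unsafe version; nothing here is sandboxed.

```python
import base64

# Stand-in for a parsed YAML stream carrying DOCS, DATA, BITS and CODE.
stream = {
    "docs": "greet(name) returns a greeting built from the bundled template.",
    "data": {"template": "Hello, {name}!"},
    "bits": base64.b64encode(b"\x00\x01\x02").decode("ascii"),
    "code": (
        "def greet(name):\n"
        "    return DATA['template'].format(name=name)\n"
    ),
}


def load_stub(stream):
    """Run the stream's code with its data and binary payload in scope."""
    namespace = {
        "DATA": stream["data"],
        "BITS": base64.b64decode(stream["bits"]),
    }
    # WARNING: exec is not a sandbox. The code can do anything the user can.
    exec(stream["code"], namespace)
    return namespace


stub = load_stub(stream)
message = stub["greet"]("world")
print(stream["docs"])
print(message)
```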

The only thing missing is security - both encryption and code safety. The first requirement is easily met - SSL is just one possible solution - but the second, making a Python program 'safe' in the same way a Java Applet is, would require a safe class-loader type of solution, which restricts network and disk access, etc.

PS. JSON (json.org/) is like YAML but easier for non-technical people to understand. It will probably be more successful.

Portland London