Knowing the Limitations of XML Schema Validation

I recently stumbled on a blog posting by Phil Ringnalda called "a little chip in the concept" where he notes:

Still, I was a bit surprised when Xiven linked to a post to the validator mailing list, pointing out that the utterly wrong HTML <a href=""><b><a href=""></a></b></a>, which is reported as invalid in HTML, is ignored in XHTML. Nesting links is one of those basic, there's absolutely no way you can ever do this, things, but in XHTML if you put a nested link inside an inline element, the validator won't catch it. According to Hixie's answer, it's because the validator uses an XML DTD for XHTML, and an SGML DTD for HTML, and while you can say that a/b/a is wrong in an SGML DTD, you can't in an XML DTD. As he puts it, in XHTML it's XML-valid but non-compliant.

Phil has stumbled on just one of the many limitations of XML schema languages. When people first see an XML schema language, they expect to be able to use it to declaratively describe all the rules of their vocabulary. However, this is rarely the case; every XML schema language has limitations in the constraints it can express. For example, W3C XML Schema can't express constraints such as a choice between attributes (either an uptime or a downtime attribute appears on an element), DTDs can't express constraints on the range of a text value (must be an integer between 5 and 10), RELAX NG can't express identity constraints on numeric values (e.g. each book in the inventory must have a unique ISBN), and so on.

This means that developers using an XML schema language should think carefully, when designing XML applications or XML vocabularies, about which rules they can actually validate when they receive an input document. In some cases, the checks performed by schema validation may be so limited for a vocabulary that it is better to check the constraints using custom code, or at the very least to augment schema validation with some custom checks.
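To make the idea of augmenting schema validation concrete, here is a minimal sketch in Python of two custom checks for constraints mentioned above: the nested link case from Phil's post, and the choice between an uptime and a downtime attribute. It uses only the standard library's xml.etree.ElementTree. The function names and the `server` element are hypothetical, chosen for illustration, and a real application would run checks like these after (not instead of) whatever schema validation it already performs.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree puts on XHTML element names.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def find_nested_links(root):
    """Return every <a> element that has another <a> as an ancestor."""
    nested = []

    def walk(elem, inside_link):
        # Accept both namespaced XHTML and un-namespaced test documents.
        is_link = elem.tag in ("a", XHTML_NS + "a")
        if is_link and inside_link:
            nested.append(elem)
        for child in elem:
            walk(child, inside_link or is_link)

    walk(root, False)
    return nested


def has_exactly_one(elem, first, second):
    """True when exactly one of the two named attributes is present."""
    return (first in elem.attrib) != (second in elem.attrib)


# The markup from the quoted post: a link nested inside an inline element.
doc = ET.fromstring('<a href=""><b><a href=""></a></b></a>')
print(len(find_nested_links(doc)))  # 1

# Hypothetical element that wrongly carries both attributes.
server = ET.fromstring('<server uptime="12" downtime="3"/>')
print(has_exactly_one(server, "uptime", "downtime"))  # False
```

Checks of this kind are easy to write but live outside the schema, so they have to be documented and kept in sync with the vocabulary by hand.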

The fact is that many XML vocabularies are complex enough that their constraints aren't easily expressible using a conventional XML schema language. XML vocabulary designers and developers of XML applications should always be on the lookout for such cases, lest incorrect decisions be made in choosing a validation framework for incoming XML documents.
