

XSD, RELAX NG and Why We Didn't Ship System.Xml.IXmlType

Tim Bray has a post entitled More Relax in which he writes:

I often caution people against relying too heavily on schema validation. “After all,” I say, “there is lots of obvious run-time checking that schemas can’t do, for example, verifying a part number.” It turns out I was wrong; with a little extra work, you can wire in part-number validation—or pretty well anything else—to RelaxNG. Elliotte Rusty Harold explains how. Further evidence, if any were required, that RelaxNG is the world’s best schema language, and that anyone who’s using XML but not RelaxNG should be nervous.

Elliotte Rusty Harold's article shows how to plug custom datatype validation into Java RELAX NG validators. This lets one enforce complex constraints on simple types, such as "the content of an element is correctly spelled, as determined by consulting a dictionary file" or "the number is prime", to take two examples from ERH's article.
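To make the idea of a pluggable datatype concrete, here is a rough sketch in Java. Java RELAX NG validators such as Jing typically load custom datatypes through the org.relaxng.datatype interfaces (Datatype, DatatypeLibrary). To keep the example runnable without that jar, it uses a simplified, hypothetical SimpleDatatype interface instead; SimpleDatatype, PrimeDatatype, DictionaryWordDatatype and DatatypeRegistry are illustrative names, not part of any real API, and an in-memory word set stands in for ERH's dictionary file.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class CustomDatatypeDemo {
    public static void main(String[] args) {
        DatatypeRegistry registry = new DatatypeRegistry();
        registry.register("prime", new PrimeDatatype());
        registry.register("word", new DictionaryWordDatatype(Set.of("relax", "schema", "validation")));

        System.out.println(registry.validate("prime", "17"));     // true
        System.out.println(registry.validate("prime", "18"));     // false
        System.out.println(registry.validate("word", "Schema"));  // true
        System.out.println(registry.validate("word", "shcema"));  // false
    }
}

// Hypothetical, simplified stand-in for a pluggable datatype (NOT the real
// org.relaxng.datatype API): arbitrary code decides if a lexical value is valid.
interface SimpleDatatype {
    boolean isValid(String literal);
}

// "The number is prime": parse the literal as an integer and test primality.
class PrimeDatatype implements SimpleDatatype {
    public boolean isValid(String literal) {
        try {
            BigInteger n = new BigInteger(literal.trim());
            // isProbablePrime works on the absolute value, so reject n <= 0 first.
            return n.signum() > 0 && n.isProbablePrime(50);
        } catch (NumberFormatException e) {
            return false; // not an integer at all
        }
    }
}

// "Correctly spelled": look the word up in a dictionary (here an in-memory set).
class DictionaryWordDatatype implements SimpleDatatype {
    private final Set<String> words;

    DictionaryWordDatatype(Set<String> words) {
        this.words = words;
    }

    public boolean isValid(String literal) {
        return words.contains(literal.trim().toLowerCase());
    }
}

// Maps datatype names (as a schema would reference them) to implementations.
class DatatypeRegistry {
    private final Map<String, SimpleDatatype> types = new HashMap<>();

    void register(String name, SimpleDatatype type) {
        types.put(name, type);
    }

    boolean validate(String typeName, String literal) {
        SimpleDatatype type = types.get(typeName);
        if (type == null) {
            throw new IllegalArgumentException("unknown datatype: " + typeName);
        }
        return type.isValid(literal);
    }
}
```

The point of the design is that the validator only sees a name and a yes/no answer; everything interesting happens in user code, which is exactly what makes such types hard to reason about for anything other than validation.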

Early in the design of version 2.0 of the System.Xml namespace in the .NET Framework, we considered creating a System.Xml.IXmlType interface. This interface would have represented the logic for plugging one's own custom types into the XSD validation engine. After a couple of months and a number of interesting discussions between Joshua, Andy and me, we got rid of it.

There were two reasons we got rid of this functionality. The simple reason was that there wasn't much demand for it. Whenever people complained about the limitations of XSD validation, it was usually about its inability to express co-occurrence constraints (i.e. if some element or attribute has a certain value, then the expected content should be blah) and other aspects of complex type validation, rather than a need for finer-grained simple type validation.

The other reason was that many of our technologies use XSD primarily as a type system, not as a validation language. XSD schemas are already used to generate .NET Framework classes via the System.Xml.Serialization.XmlSerializer and relational tables via the System.Data.DataSet. However, there were already impedance mismatches between these domains and XSD; for example, if one defined a type as xs:nonNegativeInteger, this constraint was not honored in the C#/VB.NET classes generated by the XmlSerializer or in the relational tables created by the DataSet. Then there was the additional wrinkle that at the time we were working on XQuery, which used XSD as its type system, so if people could add their own simple types we didn't just have to worry about validation but also about how query operators would work on them. What would addition, multiplication or subtraction of such types mean? How would type promotion, casting or polymorphism work with a user's custom type defined outside the rules of XSD?
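The xs:nonNegativeInteger mismatch can be sketched in Java with the JDK's built-in XSD validator (javax.xml.validation). The schema and the quantity element are hypothetical. The validator enforces the facet, but once the value is copied into an ordinary signed integer field, nothing in the type system remembers the constraint.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class NonNegativeIntegerDemo {
    // Hypothetical schema: one <quantity> element typed as xs:nonNegativeInteger.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='quantity' type='xs:nonNegativeInteger'/>"
        + "</xs:schema>";

    static Schema compileSchema() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compileSchema();
        System.out.println(isValid(schema, "<quantity>5</quantity>"));   // true
        System.out.println(isValid(schema, "<quantity>-1</quantity>"));  // false

        // Outside the validator the facet is gone: a plain int happily holds -1.
        int quantityField = Integer.parseInt("-1");
        System.out.println(quantityField);  // -1
    }
}
```

A custom simple type would widen this gap further, since every consumer of the type system (serializer, DataSet, query engine) would need to understand semantics that exist only in user code.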

Eventually we scrapped the interface as having too much cost for too little benefit.

This reminds me of Bob DuCharme's XML 2004 talk Documents vs. Data, Schemas vs. Schemas, where he advised people on how to view RELAX NG and XSD: treat RELAX NG as a document validation language and XSD as a datatyping language. I tend to agree, although I'd probably have added a note about using XSD + Schematron for document validation so one could get the best of both worlds.
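Schematron rules are essentially XPath assertions evaluated over the document, which is what lets them express the co-occurrence constraints XSD 1.0 cannot. The sketch below imitates a single such rule with the JDK's javax.xml.xpath API; it is not a Schematron implementation, and the order/payment/cardNumber vocabulary and the rule itself are hypothetical. In an XSD + Schematron setup, an XSD validator would check datatypes first and rules like this would run afterwards.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CoOccurrenceRuleDemo {
    // Hypothetical co-occurrence rule, phrased as an XPath count of violations:
    // every payment whose method is 'card' must contain a cardNumber child.
    static final String VIOLATIONS = "count(//payment[@method='card'][not(cardNumber)])";

    static boolean satisfiesRule(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPathConstants.NUMBER yields a java.lang.Double.
        double violations = (Double) xpath.evaluate(VIOLATIONS, doc, XPathConstants.NUMBER);
        return violations == 0;
    }

    public static void main(String[] args) throws Exception {
        String card = "<order><payment method='card'><cardNumber>4111</cardNumber></payment></order>";
        String missing = "<order><payment method='card'/></order>";
        String cash = "<order><payment method='cash'/></order>";

        System.out.println(satisfiesRule(card));     // true
        System.out.println(satisfiesRule(missing));  // false
        System.out.println(satisfiesRule(cash));     // true
    }
}
```

Keeping the rule as a separate layer leaves the XSD free to serve as a type system, which is the division of labor suggested above.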