Regular Expressions and the Schema Object Model
The Schema Object Model (SOM) allows full access to the regular expressions specified by the World Wide Web Consortium (W3C) in the XML Schema Part 2: Datatypes Recommendation. In XML Schema, a regular expression is applied through a pattern facet to constrain a value space to values whose lexical representation matches the expression.
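Note that a pattern in XML Schema is implicitly anchored: it must match the entire value, not just part of it. The following C# sketch approximates that rule with System.Text.RegularExpressions by wrapping a pattern in \A and \z anchors. The class and method names (PatternSketch, MatchesWholeValue) are hypothetical, and the approach assumes the pattern uses only syntax that .NET regular expressions and the XML Schema dialect share; the two dialects are not identical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that approximates XML Schema pattern semantics with .NET Regex.
// Assumption: the pattern uses only syntax common to .NET and XML Schema regexes.
public static class PatternSketch
{
    // XML Schema patterns must match the whole value, so anchor both ends.
    // \A and \z (rather than ^ and $) also reject a value with a trailing newline.
    public static bool MatchesWholeValue(string pattern, string value)
    {
        return Regex.IsMatch(value, @"\A(?:" + pattern + @")\z");
    }

    public static void Main()
    {
        const string sku = @"\d{3}-[A-Z]{2}";
        Console.WriteLine(MatchesWholeValue(sku, "123-AB"));    // True
        Console.WriteLine(MatchesWholeValue(sku, "x123-ABx"));  // False: extra characters
        Console.WriteLine(Regex.IsMatch("x123-ABx", sku));      // True: unanchored substring match
    }
}
```

The last line shows why the anchors matter: an unanchored .NET match succeeds on the substring 123-AB, whereas a schema pattern would reject the whole value.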
The following code example creates an XML Schema that defines an SKU (Stock Keeping Unit) element whose content is restricted to strings. These strings are further restricted by a pattern: each must consist of exactly three digits, followed by a hyphen, followed by two uppercase letters (A through Z).
[Visual Basic]
Imports System.IO
Imports System
Imports System.Xml
Imports System.Xml.Schema

Class RegexSample

    Public Shared Sub ValidationCallbackOne(sender As Object, args As ValidationEventArgs)
        Console.WriteLine(args.Message)
    End Sub 'ValidationCallbackOne

    Public Shared Sub Main()
        Try
            Dim schema As New XmlSchema()

            ' <xs:element name="SKU">
            Dim skuElem As New XmlSchemaElement()
            skuElem.Name = "SKU"
            schema.Items.Add(skuElem)

            ' <xs:simpleType> (anonymous type declared inside the element)
            Dim SKUType As New XmlSchemaSimpleType()
            skuElem.SchemaType = SKUType

            ' <xs:restriction base="xs:string">
            Dim SKURestriction As New XmlSchemaSimpleTypeRestriction()
            SKURestriction.BaseTypeName = New XmlQualifiedName("string", "http://www.w3.org/2001/XMLSchema")
            SKUType.Content = SKURestriction

            ' <xs:pattern value="\d{3}-[A-Z]{2}"/>
            Dim SKUpattern As New XmlSchemaPatternFacet()
            SKUpattern.Value = "\d{3}-[A-Z]{2}"
            SKURestriction.Facets.Add(SKUpattern)

            ' Compile and print to the screen.
            ' XmlSchema.Compile is obsolete in .NET Framework 2.0 and later; XmlSchemaSet.Compile replaces it.
            schema.Compile(AddressOf ValidationCallbackOne)
            schema.Write(Console.Out)
        Catch e As Exception
            Console.WriteLine(e)
        End Try
    End Sub

End Class
[C#]
using System.IO;
using System;
using System.Xml;
using System.Xml.Schema;

class RegexSample {

    public static void ValidationCallbackOne(object sender, ValidationEventArgs args) {
        Console.WriteLine(args.Message);
    }

    public static void Main() {
        try {
            XmlSchema schema = new XmlSchema();

            // <xs:element name="SKU">
            XmlSchemaElement skuElem = new XmlSchemaElement();
            skuElem.Name = "SKU";
            schema.Items.Add(skuElem);

            // <xs:simpleType> (anonymous type declared inside the element)
            XmlSchemaSimpleType SKUType = new XmlSchemaSimpleType();
            skuElem.SchemaType = SKUType;

            // <xs:restriction base="xs:string">
            XmlSchemaSimpleTypeRestriction SKURestriction =
                new XmlSchemaSimpleTypeRestriction();
            SKURestriction.BaseTypeName =
                new XmlQualifiedName("string", "http://www.w3.org/2001/XMLSchema");
            SKUType.Content = SKURestriction;

            // <xs:pattern value="\d{3}-[A-Z]{2}"/>
            XmlSchemaPatternFacet SKUpattern = new XmlSchemaPatternFacet();
            SKUpattern.Value = "\\d{3}-[A-Z]{2}";
            SKURestriction.Facets.Add(SKUpattern);

            // Compile and print to the screen.
            // XmlSchema.Compile is obsolete in .NET Framework 2.0 and later; XmlSchemaSet.Compile replaces it.
            schema.Compile(new ValidationEventHandler(ValidationCallbackOne));
            schema.Write(Console.Out);
        } catch (Exception e) {
            Console.WriteLine(e);
        }
    }
}
The preceding code example writes the following XML Schema to the console. The encoding declaration reflects the console's output encoding, so it may differ on your system.
<?xml version="1.0" encoding="IBM437"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SKU">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="\d{3}-[A-Z]{2}" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
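The preceding example only compiles and prints the schema. To see the pattern facet at work, the following C# sketch builds the same schema, compiles it with XmlSchemaSet (the non-obsolete replacement for XmlSchema.Compile), and validates a few small instance documents through an XmlReader. The helper names (SkuValidationSketch, BuildSchemaSet, CountErrors) and the sample SKU values are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Sketch: build the SKU schema with the SOM, compile it with XmlSchemaSet,
// and validate small instance documents against it.
public static class SkuValidationSketch
{
    public static XmlSchemaSet BuildSchemaSet()
    {
        // <xs:restriction base="xs:string"> with <xs:pattern value="\d{3}-[A-Z]{2}"/>
        var restriction = new XmlSchemaSimpleTypeRestriction
        {
            BaseTypeName = new XmlQualifiedName("string", "http://www.w3.org/2001/XMLSchema")
        };
        restriction.Facets.Add(new XmlSchemaPatternFacet { Value = @"\d{3}-[A-Z]{2}" });

        // <xs:element name="SKU"> containing an anonymous <xs:simpleType>
        var schema = new XmlSchema();
        schema.Items.Add(new XmlSchemaElement
        {
            Name = "SKU",
            SchemaType = new XmlSchemaSimpleType { Content = restriction }
        });

        var set = new XmlSchemaSet();
        set.Add(schema);
        set.Compile();   // throws XmlSchemaException if the schema is invalid
        return set;
    }

    // Counts the validation messages reported for one instance document.
    public static int CountErrors(XmlSchemaSet set, string xml)
    {
        int count = 0;
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = set
        };
        settings.ValidationEventHandler += (sender, e) => count++;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }   // validation happens as the document is read
        }
        return count;
    }

    public static void Main()
    {
        XmlSchemaSet set = BuildSchemaSet();
        string[] samples = { "<SKU>123-AB</SKU>", "<SKU>123-ab</SKU>", "<SKU>12-ABC</SKU>" };
        foreach (string xml in samples)
        {
            Console.WriteLine(xml + " -> " + CountErrors(set, xml) + " message(s)");
        }
    }
}
```

With this schema, 123-AB should be accepted, while 123-ab (lowercase letters) and 12-ABC (wrong number of digits and letters) should each produce at least one validation message.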
Invalid Regular Expression Pattern Restrictions
It is possible to create a pattern that restricts a type in an invalid manner.
For example, the format of the dateTime type is CCYY-MM-DDThh:mm:ss, where CC represents the century, YY the year, MM the month, and DD the day. The value may be preceded by an optional minus (-) sign to indicate a negative year; if the sign is omitted, the year is assumed to be positive. The letter T is the date/time separator, and hh, mm, and ss represent the hour, minute, and second, respectively.
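The following C# sketch checks strings against the built-in xs:dateTime datatype, which makes the required lexical form concrete. It obtains the datatype through XmlSchemaType.GetBuiltInSimpleType and relies on XmlSchemaDatatype.ParseValue throwing for invalid input; the class and method names are hypothetical. A full value such as 2004-05-31T13:20:00 is expected to be accepted, and a bare time such as 13:20:00, which lacks the date and the T separator, to be rejected.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

// Sketch: ask the built-in xs:dateTime datatype whether a string is a valid lexical form.
public static class DateTimeFormSketch
{
    public static bool IsDateTime(string value)
    {
        XmlSchemaDatatype dateTime =
            XmlSchemaType.GetBuiltInSimpleType(XmlTypeCode.DateTime).Datatype;
        try
        {
            // ParseValue throws when the string is not a valid xs:dateTime.
            dateTime.ParseValue(value, new NameTable(), null);
            return true;
        }
        catch (Exception ex) when (ex is XmlSchemaException || ex is FormatException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Full CCYY-MM-DDThh:mm:ss form.
        Console.WriteLine(IsDateTime("2004-05-31T13:20:00"));
        // Time only: no date part and no T separator.
        Console.WriteLine(IsDateTime("13:20:00"));
    }
}
```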
The following XML Schema example attempts to restrict the dateTime type so that it resembles the time type.
<xs:simpleType name="time">
  <xs:restriction base="xs:dateTime">
    <xs:pattern value="\d\d:\d\d(:\d\d)?" />
  </xs:restriction>
</xs:simpleType>
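The restriction in the preceding example is invalid because it does not define a subset of the dateTime type. A string such as 12:30:00 matches the pattern but is not a valid dateTime, because every dateTime value must include a date and the T separator; conversely, no valid dateTime matches the pattern. Instead of a subset, the restriction describes a different set of values that does not overlap with the set of potential values of the dateTime type.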
**Note** The processor does not warn you if you create an invalid pattern restriction. However, it fails to validate any instance document whose content does not match the base type of the restriction.
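The following C# sketch loads the time restriction shown earlier into an XmlSchemaSet and validates two values against it. The global element start is a hypothetical addition so that instance documents have something to hold the value; the class and helper names are also illustrative. It prints the number of messages reported while compiling the schema, so you can check the note's claim against your own .NET version, followed by the validation message count for each value. Both values are expected to fail: 12:30:00 matches the pattern but is not a valid dateTime, and 2004-05-31T12:30:00 is a valid dateTime that does not match the pattern.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Sketch: load the invalid "time" restriction and validate sample values against it.
// The global element "start" is a hypothetical addition used only to hold a value.
public static class InvalidRestrictionSketch
{
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:simpleType name='time'>
    <xs:restriction base='xs:dateTime'>
      <xs:pattern value='\d\d:\d\d(:\d\d)?' />
    </xs:restriction>
  </xs:simpleType>
  <xs:element name='start' type='time' />
</xs:schema>";

    // Compiles the schema and reports how many compile-time messages were raised.
    public static XmlSchemaSet BuildSchemaSet(out int schemaMessages)
    {
        int messages = 0;
        var set = new XmlSchemaSet();
        set.ValidationEventHandler += (sender, e) => messages++;
        using (XmlReader reader = XmlReader.Create(new StringReader(Xsd)))
        {
            set.Add(null, reader);
        }
        set.Compile();
        schemaMessages = messages;
        return set;
    }

    // Counts the validation messages reported for one instance document.
    public static int CountErrors(XmlSchemaSet set, string xml)
    {
        int count = 0;
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = set
        };
        settings.ValidationEventHandler += (sender, e) => count++;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return count;
    }

    public static void Main()
    {
        XmlSchemaSet set = BuildSchemaSet(out int schemaMessages);
        Console.WriteLine("Schema compile messages: " + schemaMessages);

        // Matches the pattern, but is not a valid xs:dateTime.
        int timeOnly = CountErrors(set, "<start>12:30:00</start>");
        // A valid xs:dateTime, but does not match the pattern.
        int fullValue = CountErrors(set, "<start>2004-05-31T12:30:00</start>");

        Console.WriteLine("12:30:00 -> " + timeOnly + " message(s)");
        Console.WriteLine("2004-05-31T12:30:00 -> " + fullValue + " message(s)");
    }
}
```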
For more information about regular expressions and XML Schemas, see the appendix on regular expressions in the W3C XML Schema Part 2: Datatypes Recommendation, located at http://www.w3.org/TR/xmlschema-2/#regexs.