Share via


Importing the Microsoft Office 2003 schemas into the database

As promised earlier I'm going to give you an example using a real-world schema. I've chosen to work with the Office 2003 schemas but I'm going to limit myself to Word and Excel.

First, you need to install the schemas from here.

In the T-SQL samples below I'll always assume that the installation was performed at the default location, under "C:\Program Files\Microsoft Office 2003 Developer Resources".

First, we're going to create a collection for all the schemas necessary in order to validate a Word 2003 document. This is fairly easy. For each schema file in folder "WordprocessingML Schemas" we do the following

  1. Load the contents into an XML variable
  2. If necessary remove the xsd:unique, xsd:key, xsd:keyref nodes
  3. Remove any top level processing instruction

When this is done we concatenate all those schema documents by converting them to nvarchar(MAX), using the string concatenation, and converting the result back to XML

Finally, for all lax wildcards we replace the value of the processContents attribute to "skip" and the resulting XML can be used to create the xml schema collection.

 

DECLARE @aml XML

SELECT @aml = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\aml.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @aml.modify('delete //xs:unique')

SET @aml.modify('delete /processing-instruction()')

DECLARE @w10 XML

SELECT @w10 = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\w10.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @w10.modify('delete /processing-instruction()')

DECLARE @wordnetaux XML

SELECT @wordnetaux = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\wordnetaux.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @wordnetaux.modify('delete /processing-instruction()')

DECLARE @xsdlib XML

SELECT @xsdlib = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\xsdlib.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @xsdlib.modify('delete /processing-instruction()')

DECLARE @office XML

SELECT @office = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\office.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @office.modify('delete /processing-instruction()')

DECLARE @vml XML

SELECT @vml = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\vml.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @vml.modify('delete /processing-instruction()')

DECLARE @wordnet XML

SELECT @wordnet = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\wordnet.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @wordnet.modify('delete /processing-instruction()')

DECLARE @xmlSchemas XML

SET @xmlSchemas = CONVERT(XML,

  CONVERT(nvarchar(MAX), @aml) +

  CONVERT(nvarchar(MAX), @w10) +

  CONVERT(nvarchar(MAX), @wordnetaux) +

  CONVERT(nvarchar(MAX), @xsdlib) +

  CONVERT(nvarchar(MAX), @office) +

  CONVERT(nvarchar(MAX), @vml) +

  CONVERT(nvarchar(MAX), @wordnet))

DECLARE @iCount int

DECLARE @i int

SET @iCount = @xmlSchemas.value('count(//@processContents[. = "lax"])','int')

SET @i = 0

WHILE (@i < @iCount)

      BEGIN

            SET @xmlSchemas.modify('replace value of (//@processContents[. = "lax"])[1] with "skip"')

            SET @i = @i + 1

      END

CREATE XML SCHEMA COLLECTION Word2003 AS @xmlSchemas

go

 

Importing the Excel schemas is a similar process, with two differences.

First, not all the schema files in folder "SpreadsheetML Schemas" are unicode. For those who are not, they must be loaded using the OPENROWSET parameter SINGLE_CLOB (as opposed to SINGLE_NCLOB).

The second difference is that one of the schemas imports another schema that's not on disk. File udc.xsd makes references to components from the Soap Envelope schema and relies on the schemaLocation attribute to provide the schema processor with the location of that schema. Unfortunately SQL Server 2005, even though it doesn't reject the schemaLocation attribute, cannot use it to retrieve a missing schema. The solution is very low-tech. It consisted in manually retrieving the schema from the web and fixing it by hand (doubling-up the single quotes for example) so it can be copy-pasted into our T-SQL script. There are probably better solutions, such as making use of the CLR to retrieve the schema directly from the web site, but for a single missing schema, it was easier to deal with it manually.

DECLARE @c XML

SELECT @c = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\c.xsd', SINGLE_CLOB) as results(xmlCol)

SET @c.modify('delete /processing-instruction()')

DECLARE @dt XML

SELECT @dt = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\dt.xsd', SINGLE_CLOB) as results(xmlCol)

SET @dt.modify('delete /processing-instruction()')

DECLARE @excel XML

SELECT @excel = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\excel.xsd', SINGLE_CLOB) as results(xmlCol)

SET @excel.modify('delete /processing-instruction()')

DECLARE @excel2003xml XML

SELECT @excel2003xml = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\excel2003xml.xsd', SINGLE_CLOB) as results(xmlCol)

SET @excel2003xml.modify('delete /processing-instruction()')

DECLARE @excelss XML

SELECT @excelss = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\excelss.xsd', SINGLE_CLOB) as results(xmlCol)

SET @excelss.modify('delete /processing-instruction()')

DECLARE @office XML

SELECT @office = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\office.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @office.modify('delete /processing-instruction()')

DECLARE @rowsetschema XML

SELECT @rowsetschema = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\rowsetschema.xsd', SINGLE_CLOB) as results(xmlCol)

SET @rowsetschema.modify('delete /processing-instruction()')

DECLARE @rowsset XML

SELECT @rowsset = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\rowsset.xsd', SINGLE_CLOB) as results(xmlCol)

SET @rowsset.modify('delete /processing-instruction()')

DECLARE @schema XML

SELECT @schema = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\schema.xsd', SINGLE_CLOB) as results(xmlCol)

SET @schema.modify('delete /processing-instruction()')

DECLARE @udc XML

SELECT @udc = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\udc.xsd', SINGLE_CLOB) as results(xmlCol)

SET @udc.modify('delete /processing-instruction()')

DECLARE @udcsoap XML

SELECT @udcsoap = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\udcsoap.xsd', SINGLE_CLOB) as results(xmlCol)

SET @udcsoap.modify('delete /processing-instruction()')

DECLARE @udcxmlfile XML

SELECT @udcxmlfile = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\udcxmlfile.xsd', SINGLE_CLOB) as results(xmlCol)

SET @udcxmlfile.modify('delete /processing-instruction()')

DECLARE @vml XML

SELECT @vml = xmlCol FROM OPENROWSET(Bulk 'C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\SpreadsheetML Schemas\vml.xsd', SINGLE_NCLOB) as results(xmlCol)

SET @vml.modify('delete /processing-instruction()')

-- udc.xsd has the following import

-- <xsd:import namespace="https://schemas.xmlsoap.org/soap/envelope/" schemaLocation="https://schemas.xmlsoap.org/soap/envelope/" />

-- I retrieved the schema from the web and manually doubled up the single quotes; I also removed the big comment at the beginning.

DECLARE @soapenvelope XML

SET @soapenvelope = ' <xs:schema xmlns:xs="https://www.w3.org/2001/XMLSchema"

           xmlns:tns="https://schemas.xmlsoap.org/soap/envelope/"

           targetNamespace="https://schemas.xmlsoap.org/soap/envelope/" >

    

  <!-- Envelope, header and body -->

  <xs:element name="Envelope" type="tns:Envelope" />

  <xs:complexType name="Envelope" >

    <xs:sequence>

      <xs:element ref="tns:Header" minOccurs="0" />

      <xs:element ref="tns:Body" minOccurs="1" />

      <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded" processContents="lax" />

    </xs:sequence>

    <xs:anyAttribute namespace="##other" processContents="lax" />

  </xs:complexType>

  <xs:element name="Header" type="tns:Header" />

  <xs:complexType name="Header" >

    <xs:sequence>

      <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded" processContents="lax" />

    </xs:sequence>

    <xs:anyAttribute namespace="##other" processContents="lax" />

  </xs:complexType>

 

  <xs:element name="Body" type="tns:Body" />

  <xs:complexType name="Body" >

    <xs:sequence>

      <xs:any namespace="##any" minOccurs="0" maxOccurs="unbounded" processContents="lax" />

    </xs:sequence>

    <xs:anyAttribute namespace="##any" processContents="lax" >

      <xs:annotation>

      <xs:documentation>

            Prose in the spec does not specify that attributes are allowed on the Body element

            </xs:documentation>

      </xs:annotation>

      </xs:anyAttribute>

  </xs:complexType>

      

  <!-- Global Attributes. The following attributes are intended to be usable via qualified attribute names on any complex type referencing them. -->

  <xs:attribute name="mustUnderstand" >

     <xs:simpleType>

     <xs:restriction base=''xs:boolean''>

      <xs:pattern value=''0|1'' />

      </xs:restriction>

   </xs:simpleType>

  </xs:attribute>

  <xs:attribute name="actor" type="xs:anyURI" />

  <xs:simpleType name="encodingStyle" >

    <xs:annotation>

      <xs:documentation>

      ''encodingStyle'' indicates any canonicalization conventions followed in the contents of the containing element. For example, the value ''https://schemas.xmlsoap.org/soap/encoding/'' indicates the pattern described in SOAP specification

      </xs:documentation>

      </xs:annotation>

    <xs:list itemType="xs:anyURI" />

  </xs:simpleType>

  <xs:attribute name="encodingStyle" type="tns:encodingStyle" />

  <xs:attributeGroup name="encodingStyle" >

    <xs:attribute ref="tns:encodingStyle" />

  </xs:attributeGroup>

  <xs:element name="Fault" type="tns:Fault" />

  <xs:complexType name="Fault" final="extension" >

    <xs:annotation>

      <xs:documentation>

      Fault reporting structure

      </xs:documentation>

      </xs:annotation>

    <xs:sequence>

      <xs:element name="faultcode" type="xs:QName" />

      <xs:element name="faultstring" type="xs:string" />

      <xs:element name="faultactor" type="xs:anyURI" minOccurs="0" />

      <xs:element name="detail" type="tns:detail" minOccurs="0" />

    </xs:sequence>

  </xs:complexType>

  <xs:complexType name="detail">

    <xs:sequence>

      <xs:any namespace="##any" minOccurs="0" maxOccurs="unbounded" processContents="lax" />

    </xs:sequence>

    <xs:anyAttribute namespace="##any" processContents="lax" />

  </xs:complexType>

</xs:schema>

'

SET @soapenvelope.modify('delete /processing-instruction()')

DECLARE @xmlSchemas XML

SET @xmlSchemas = CONVERT(XML,

  CONVERT(nvarchar(MAX), @c) +

  CONVERT(nvarchar(MAX), @dt) +

  CONVERT(nvarchar(MAX), @excel) +

  CONVERT(nvarchar(MAX), @excel2003xml) +

  CONVERT(nvarchar(MAX), @excelss) +

  CONVERT(nvarchar(MAX), @office) +

  CONVERT(nvarchar(MAX), @rowsetschema) +

  CONVERT(nvarchar(MAX), @rowsset) +

  CONVERT(nvarchar(MAX), @schema) +

  CONVERT(nvarchar(MAX), @udc) +

  CONVERT(nvarchar(MAX), @udcsoap) +

  CONVERT(nvarchar(MAX), @udcxmlfile) +

  CONVERT(nvarchar(MAX), @vml) +

  CONVERT(nvarchar(MAX), @soapenvelope))

DECLARE @iCount int

DECLARE @i int

SET @iCount = @xmlSchemas.value('count(//@processContents[. = "lax"])','int')

SET @i = 0

WHILE (@i < @iCount)

      BEGIN

            SET @xmlSchemas.modify('replace value of (//@processContents[. = "lax"])[1] with "skip"')

            SET @i = @i + 1

      END

CREATE XML SCHEMA COLLECTION Excel2003 AS @xmlSchemas

go

 

 

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.