WWSAPI to WCF interop 8: invalid XML characters (part 1)

Although any Unicode character can be carried in an XML document, not every character is legal under the XML 1.0 specification, which is the version used by SOAP and supported by WWSAPI. As the Char production copied below shows, control characters below 0x20 (32) are invalid except for tab (#x9), line feed (#xA) and carriage return (#xD). The surrogate range and #xFFFE/#xFFFF are excluded as well.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
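To make the production concrete, here is a minimal sketch of the same range check in C. The function name is_xml10_char is made up for illustration. It is not part of WWSAPI and is not how WWSAPI implements the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the Unicode code point cp
 * matches the XML 1.0 Char production quoted above. */
bool is_xml10_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

With this helper, tab (0x9) and space (0x20) pass, while NUL (0x0), vertical tab (0xB), a lone surrogate such as 0xD800, and 0xFFFE are all rejected.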

WCF’s XML reader and writer will process characters that are not considered legal (as long as they are properly represented as character references), but WWSAPI enforces the XML rule and rejects them. For example, if a WS_XML_READER using the text encoding reads a string that contains \u0000 (represented as the character reference &#x0;), it fails with the error code WS_E_INVALID_FORMAT and the error string “The character reference '&#x0;' is not valid.”
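The following sketch shows, in plain C, why a numeric character reference can be well-formed and still name an illegal character. The names check_char_ref and is_xml10_char are hypothetical, and the code only illustrates the XML 1.0 rule. It does not reproduce WWSAPI's parser.

```c
#include <stdbool.h>

/* Range check for the XML 1.0 Char production. */
static bool is_xml10_char(unsigned long cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical helper. Examines a numeric character reference such as
 * "&#65;" or "&#x41;" and returns:
 *    1  if it is well-formed and names a legal XML 1.0 character,
 *    0  if it is well-formed but names an illegal character,
 *   -1  if the text is not a numeric character reference at all. */
int check_char_ref(const char *ref)
{
    if (ref[0] != '&' || ref[1] != '#')
        return -1;

    const char *p = ref + 2;
    unsigned long base = 10;
    if (*p == 'x') {            /* hexadecimal form uses a lowercase x */
        base = 16;
        p++;
    }
    if (*p == ';')
        return -1;              /* no digits */

    unsigned long cp = 0;
    for (; *p != ';'; p++) {
        unsigned long d;
        if (*p >= '0' && *p <= '9')
            d = (unsigned long)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            d = (unsigned long)(*p - 'a') + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            d = (unsigned long)(*p - 'A') + 10;
        else
            return -1;          /* also catches a missing ';' */
        if (cp <= 0x10FFFF)     /* stop accumulating once out of range */
            cp = cp * base + d;
    }
    if (p[1] != '\0')
        return -1;              /* trailing text after ';' */

    return is_xml10_char(cp) ? 1 : 0;
}
```

Under these assumptions, check_char_ref("&#x41;") returns 1, while check_char_ref("&#x0;") and check_char_ref("&#0;") return 0. That second case is the one a strict XML 1.0 reader refuses.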

 

If your solution uses a WWSAPI client to communicate with a WCF service and the service might return strings containing arbitrary characters, you have to turn off invalid character checking in WWSAPI. The corresponding properties are WS_XML_READER_PROPERTY_ALLOW_INVALID_CHARACTER_REFERENCES for the XML reader and WS_XML_WRITER_PROPERTY_ALLOW_INVALID_CHARACTER_REFERENCES for the XML writer. Since you typically don’t deal with the XML reader and writer directly when making calls or sending and receiving messages, you set these properties through a message property, which in turn is set through another property. For example, if you are using WS_SERVICE_PROXY, here is what you need to do:

 

        // define the writer property array and put it in a writer properties struct
        BOOL allowInvalidCharacters = TRUE;
        WS_XML_WRITER_PROPERTY writerPropertyArray[] = {
            {WS_XML_WRITER_PROPERTY_ALLOW_INVALID_CHARACTER_REFERENCES, &allowInvalidCharacters, sizeof(allowInvalidCharacters)}
        };
        WS_XML_WRITER_PROPERTIES writerProperties = {writerPropertyArray, WsCountOf(writerPropertyArray)};

        // define the reader property array and put it in a reader properties struct
        WS_XML_READER_PROPERTY readerPropertyArray[] = {
            {WS_XML_READER_PROPERTY_ALLOW_INVALID_CHARACTER_REFERENCES, &allowInvalidCharacters, sizeof(allowInvalidCharacters)}
        };
        WS_XML_READER_PROPERTIES readerProperties = {readerPropertyArray, WsCountOf(readerPropertyArray)};

        // set the reader/writer properties in message property array, which is then put in a message properties struct
        WS_MESSAGE_PROPERTY messagePropertyArray[] = {
            {WS_MESSAGE_PROPERTY_XML_WRITER_PROPERTIES, &writerProperties, sizeof(writerProperties)},
            {WS_MESSAGE_PROPERTY_XML_READER_PROPERTIES, &readerProperties, sizeof(readerProperties)}
        };
        WS_MESSAGE_PROPERTIES messageProperties = {messagePropertyArray, WsCountOf(messagePropertyArray)};

        // now set the message properties in the proxy property array
        WS_PROXY_PROPERTY proxyPropertyArray[] = {
            {WS_PROXY_PROPERTY_MESSAGE_PROPERTIES, &messageProperties, sizeof(messageProperties)}
        };

 

Then you pass the proxy property array, together with its element count, to WsCreateServiceProxy. Similarly, if you use WS_SERVICE_HOST on a WWSAPI server and want to allow invalid characters, you set the same message properties through WS_SERVICE_ENDPOINT_PROPERTY_MESSAGE_PROPERTIES on the corresponding service endpoint.
