Special Character Conversion when Writing XML Content
The XmlWriter includes a method, WriteRaw, which allows you to write out raw markup manually. This method prevents special characters from being escaped. This is in contrast to the WriteString method, which escapes some characters to their equivalent entity references. The characters that are escaped are given in section 2.4, Character Data and Markup, and section 3.3.3, Attribute-Value Normalization, of the Extensible Markup Language (XML) 1.0 (fourth edition) recommendation. If the WriteString method is called when writing an attribute value, it escapes ' and ". Character values 0x0-0x1F are encoded as numeric character entities &#x0; through &#x1F;, except for the white space characters 0x9, 0xA, and 0xD.
Therefore, the guiding principle for choosing between WriteString and WriteRaw is that WriteString walks through every character looking for characters that must be escaped, whereas WriteRaw writes exactly what it is given.
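The difference described above can be seen by writing the same string three ways. The following is a minimal sketch rather than part of the original sample: the class name EscapeSketch, its method names, and the element and attribute names e and a are hypothetical, and the output is captured in a StringWriter so that it can be inspected.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
static class EscapeSketch
{
    // Element content written with WriteString: markup characters are escaped.
    public static string TextWithWriteString(string value)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("e");
        w.WriteString(value);
        w.WriteEndElement();
        w.Close();
        return sw.ToString();
    }

    // Attribute value written with WriteString: the quote character is escaped too.
    public static string AttributeWithWriteString(string value)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("e");
        w.WriteStartAttribute(null, "a", null);
        w.WriteString(value);
        w.WriteEndAttribute();
        w.WriteEndElement();
        w.Close();
        return sw.ToString();
    }

    // Element content written with WriteRaw: the string is copied unchanged.
    public static string TextWithWriteRaw(string value)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("e");
        w.WriteRaw(value);
        w.WriteEndElement();
        w.Close();
        return sw.ToString();
    }

    static void Main()
    {
        string value = "a < b & \"c\"";
        Console.WriteLine(TextWithWriteString(value));      // <e>a &lt; b &amp; "c"</e>
        Console.WriteLine(AttributeWithWriteString(value)); // <e a="a &lt; b &amp; &quot;c&quot;" />
        Console.WriteLine(TextWithWriteRaw(value));         // <e>a < b & "c"</e>
    }
}
```

Only the WriteRaw result contains unescaped markup characters, so it is the caller's responsibility to pass WriteRaw text that is already valid XML.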
The WriteNode method copies everything from the node on which the reader is currently positioned to the writer. The reader is then advanced to the next sibling node for further processing. The WriteNode method is a quick way to copy information from one document into another.
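As a rough illustration of this behavior, the sketch below copies one element with WriteNode and then checks where the reader ended up. The class name WriteNodeSketch, the method CopyFirstItem, and the item and id names in the sample XML are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
static class WriteNodeSketch
{
    // Copies the first <item> element to a writer and reports the id
    // attribute of the node the reader is positioned on afterwards.
    public static string CopyFirstItem(string sourceXml, out string nextId)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(sourceXml));
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);

        reader.ReadToFollowing("item");      // position the reader on the first <item>
        writer.WriteNode(reader, false);     // copy the element, its attributes, and its content
        nextId = reader.GetAttribute("id");  // the reader has moved on to the next sibling

        writer.Close();
        reader.Close();
        return sw.ToString();
    }

    static void Main()
    {
        string nextId;
        string copied = CopyFirstItem(
            "<root><item id=\"1\">a</item><item id=\"2\">b</item></root>", out nextId);
        Console.WriteLine(copied);  // <item id="1">a</item>
        Console.WriteLine(nextId);  // 2
    }
}
```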
The following table shows the supported NodeTypes for the WriteNode method.
Node type | Description |
---|---|
Element | Writes out the element node and all attribute nodes. |
Attribute | No operation. Use WriteStartAttribute or WriteAttributeString to write the attribute. |
Text | Writes out the text node. |
CDATA | Writes out the CDATA section node. |
EntityReference | Writes out the entity reference node. |
ProcessingInstruction | Writes out the processing instruction node. |
Comment | Writes out the comment node. |
DocumentType | Writes out the document type node. |
Whitespace | Writes out the Whitespace node. |
SignificantWhitespace | Writes out the SignificantWhitespace node. |
EndElement | No operation. |
EndEntity | No operation. |
The following example shows the difference between the WriteString and WriteRaw methods when given the "<" character. This code sample uses WriteString.
Dim w As New XmlTextWriter(Console.Out)
w.WriteStartElement("myRoot")
w.WriteString("<")
w.WriteEndElement()
XmlTextWriter w = new XmlTextWriter(Console.Out);
w.WriteStartElement("myRoot");
w.WriteString("<");
w.WriteEndElement();
Output
<myRoot>&lt;</myRoot>
This code sample uses WriteRaw, and the output has an illegal character as the element content.
w.WriteStartElement("myRoot")
w.WriteRaw("<")
w.WriteEndElement()
w.WriteStartElement("myRoot");
w.WriteRaw("<");
w.WriteEndElement();
Output
<myRoot><</myRoot>
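One way to confirm that the WriteRaw output is not well-formed is to parse both results. The sketch below does this with XmlDocument.LoadXml; the class name WellFormedSketch and its methods are hypothetical and are not part of the original samples.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
static class WellFormedSketch
{
    // Writes <myRoot> containing "<", using WriteRaw or WriteString.
    public static string Write(bool useRaw)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("myRoot");
        if (useRaw)
            w.WriteRaw("<");
        else
            w.WriteString("<");
        w.WriteEndElement();
        w.Close();
        return sw.ToString();
    }

    // Returns true if the string parses as an XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsWellFormed(Write(false))); // True: WriteString escaped the character
        Console.WriteLine(IsWellFormed(Write(true)));  // False: WriteRaw left a bare "<" in the content
    }
}
```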
The following example shows how to convert an XML document from an element-centric design to an attribute-centric design. You can also convert an attribute-centric document back to an element-centric document. An element-centric document has many elements but few attributes. An attribute-centric document has fewer elements, because values that would be child elements in an element-centric design are instead attributes of their parent element. So there are fewer elements, but more attributes per element.
This sample is useful if you have designed the XML data in either mode, as this will allow it to be converted to the other mode.
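To make the two designs concrete, the following sketch writes the same customer record in both shapes. It is not the conversion program itself; the class name ShapeSketch and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
static class ShapeSketch
{
    // Element-centric: each value is a child element.
    public static string ElementCentric()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("Customer");
        w.WriteElementString("firstname", "Jerry");
        w.WriteElementString("lastname", "Larson");
        w.WriteEndElement();
        w.Close();
        return sw.ToString();
    }

    // Attribute-centric: each value is an attribute of the parent element.
    public static string AttributeCentric()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("Customer");
        w.WriteAttributeString("firstname", "Jerry");
        w.WriteAttributeString("lastname", "Larson");
        w.WriteEndElement();
        w.Close();
        return sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ElementCentric());
        // <Customer><firstname>Jerry</firstname><lastname>Larson</lastname></Customer>
        Console.WriteLine(AttributeCentric());
        // <Customer firstname="Jerry" lastname="Larson" />
    }
}
```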
The following XML is an element-centric document. The elements contain no attributes.
Input - centric.xml
<?xml version='1.0' encoding='UTF-8'?>
<root>
  <Customer>
    <firstname>Jerry</firstname>
    <lastname>Larson</lastname>
    <Order>
      <OrderID>Ord-12345</OrderID>
      <OrderDetail>
        <Quantity>1301</Quantity>
        <UnitPrice>$3000</UnitPrice>
        <ProductName>Computer</ProductName>
      </OrderDetail>
    </Order>
  </Customer>
</root>
The following sample application does the conversion.
' The program will convert an element-centric document to an
' attribute-centric document or attribute-centric to element-centric.
Imports System
Imports System.Xml
Imports System.IO
Imports System.Text
Imports System.Collections
Class ModeConverter
Private bufferSize As Integer = 2048
Friend Class ElementNode
Private _name As [String]
Private _prefix As [String]
Private _namespace As [String]
Private _startElement As Boolean
Friend Sub New()
Me._name = Nothing
Me._prefix = Nothing
Me._namespace = Nothing
Me._startElement = False
End Sub 'New
Friend Sub New(prefix As [String], name As [String], [nameSpace] As [String])
Me._name = name
Me._prefix = prefix
Me._namespace = [nameSpace]
End Sub 'New
Public ReadOnly Property name() As [String]
Get
Return _name
End Get
End Property
Public ReadOnly Property prefix() As [String]
Get
Return _prefix
End Get
End Property
Public ReadOnly Property [nameSpace]() As [String]
Get
Return _namespace
End Get
End Property
Public Property startElement() As Boolean
Get
Return _startElement
End Get
Set
_startElement = value
End Set
End Property
End Class 'ElementNode
' Entry point. args(0) is the mode, args(1) is the source file, and
' args(2) is the target file. Environment.GetCommandLineArgs is not used
' because its first element is the program name, which would shift the indexes.
Public Shared Sub Main(args() As [String])
Dim modeConverter As New ModeConverter()
If args.Length < 3 OrElse args(0) = "?" Then
modeConverter.Usage()
Return
End If
Dim sourceFile As New FileStream(args(1), FileMode.Open, FileAccess.Read, FileShare.Read)
Dim targetFile As New FileStream(args(2), FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite)
If args(0) = "-a" Then
modeConverter.ConertToAttributeCentric(sourceFile, targetFile)
Else
modeConverter.ConertToElementCentric(sourceFile, targetFile)
End If
Return
End Sub 'Main
Public Sub Usage()
Console.WriteLine("? This help message " + ControlChars.Lf)
Console.WriteLine("Convert -mode sourceFile, targetFile " + ControlChars.Lf)
Console.WriteLine(ControlChars.Tab + " mode: e element centric" + ControlChars.Lf)
Console.WriteLine(ControlChars.Tab + " mode: a attribute centric" + ControlChars.Lf)
End Sub 'Usage
Public Sub ConertToAttributeCentric(sourceFile As FileStream, targetFile As FileStream)
' The stack tracks the elements that have been read but not yet closed.
Dim stack As New Stack()
Dim reader As New XmlTextReader(sourceFile)
reader.Read()
Dim writer As New XmlTextWriter(targetFile, reader.Encoding)
writer.Formatting = Formatting.Indented
Do
Select Case reader.NodeType
Case XmlNodeType.XmlDeclaration
writer.WriteStartDocument((Nothing = reader.GetAttribute("standalone") Or "yes" = reader.GetAttribute("standalone")))
Case XmlNodeType.Element
Dim element As New ElementNode(reader.Prefix, reader.LocalName, reader.NamespaceURI)
If 0 = stack.Count Then
writer.WriteStartElement(element.prefix, element.name, element.nameSpace)
element.startElement = True
End If
stack.Push(element)
Case XmlNodeType.Attribute
Throw New Exception("We should never be here!")
Case XmlNodeType.Text
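' The element on top of the stack held this text, so it becomes an attribute
' of its parent. Variables are declared here because a VB Case block has its own scope.
Dim attribute As ElementNode = CType(stack.Pop(), ElementNode)
Dim parent As ElementNode = CType(stack.Peek(), ElementNode)
If Not parent.startElement Then
writer.WriteStartElement(parent.prefix, parent.name, parent.nameSpace)
parent.startElement = True
End If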
writer.WriteStartAttribute(attribute.prefix, attribute.name, attribute.nameSpace)
writer.WriteRaw(reader.Value)
reader.Read() 'jump over the EndElement
Case XmlNodeType.EndElement
writer.WriteEndElement()
stack.Pop()
Case XmlNodeType.CDATA
writer.WriteCData(reader.Value)
Case XmlNodeType.Comment
writer.WriteComment(reader.Value)
Case XmlNodeType.ProcessingInstruction
writer.WriteProcessingInstruction(reader.Name, reader.Value)
Case XmlNodeType.EntityReference
writer.WriteEntityRef(reader.Name)
Case XmlNodeType.Whitespace
writer.WriteWhitespace(reader.Value)
Case XmlNodeType.None
writer.WriteRaw(reader.Value)
Case XmlNodeType.SignificantWhitespace
writer.WriteWhitespace(reader.Value)
Case XmlNodeType.DocumentType
writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value)
Case XmlNodeType.EndEntity
Case Else
Console.WriteLine("UNKNOWN Node Type = " & CInt(reader.NodeType))
End Select
Loop While reader.Read()
writer.WriteEndDocument()
reader.Close()
writer.Flush()
writer.Close()
End Sub 'ConertToAttributeCentric
' Use the WriteNode to simplify the process.
Public Sub ConertToElementCentric(sourceFile As FileStream, targetFile As FileStream)
Dim reader As New XmlTextReader(sourceFile)
reader.Read()
Dim writer As New XmlTextWriter(targetFile, reader.Encoding)
writer.Formatting = Formatting.Indented
Do
Select Case reader.NodeType
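Case XmlNodeType.Element
writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
' Record this before moving to the attributes. An empty element
' produces no EndElement node, so it must be closed here.
Dim isEmpty As Boolean = reader.IsEmptyElement
If reader.MoveToFirstAttribute() Then
Do
writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
writer.WriteRaw(reader.Value)
writer.WriteEndElement()
Loop While reader.MoveToNextAttribute()
reader.MoveToElement()
End If
If isEmpty Then
writer.WriteEndElement()
End If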
Case XmlNodeType.Attribute
Throw New Exception("We should never be here!")
Case XmlNodeType.Whitespace
writer.WriteWhitespace(reader.Value)
Case XmlNodeType.EndElement
writer.WriteEndElement()
Case XmlNodeType.Text
Throw New Exception("The input document is not an attribute-centric document" + ControlChars.Lf)
Case Else
Console.WriteLine(reader.NodeType)
writer.WriteNode(reader, False)
End Select
Loop While reader.Read()
reader.Close()
writer.Flush()
writer.Close()
End Sub 'ConertToElementCentric
End Class 'ModeConverter
// The program will convert an element-centric document to an
// attribute-centric document or attribute-centric to element-centric.
using System;
using System.Xml;
using System.IO;
using System.Text;
using System.Collections;
class ModeConverter {
private const int bufferSize=2048;
internal class ElementNode {
String _name;
String _prefix;
String _namespace;
bool _startElement;
internal ElementNode() {
this._name = null;
this._prefix = null;
this._namespace = null;
this._startElement = false;
}
internal ElementNode(String prefix, String name, String nameSpace) {
this._name = name;
this._prefix = prefix;
this._namespace = nameSpace;
}
public String name{
get { return _name; }
}
public String prefix{
get { return _prefix; }
}
public String nameSpace{
get { return _namespace; }
}
public bool startElement{
get { return _startElement; }
set { _startElement = value;}
}
}
public static void Main(String[] args) {
ModeConverter modeConverter = new ModeConverter();
if (args.Length < 3 || args[0] == "?") {
modeConverter.Usage();
return;
}
FileStream sourceFile = new FileStream(args[1], FileMode.Open, FileAccess.Read, FileShare.Read);
FileStream targetFile = new FileStream(args[2], FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
if (args[0] == "-a") {
modeConverter.ConertToAttributeCentric(sourceFile, targetFile);
} else {
modeConverter.ConertToElementCentric(sourceFile, targetFile);
}
return;
}
public void Usage() {
Console.WriteLine("? This help message \n");
Console.WriteLine("Convert -mode sourceFile, targetFile \n");
Console.WriteLine("\t mode: e element centric\n");
Console.WriteLine("\t mode: a attribute centric\n");
}
public void ConertToAttributeCentric(FileStream sourceFile, FileStream targetFile) {
// The stack tracks the elements that have been read but not yet closed.
Stack stack = new Stack();
XmlTextReader reader = new XmlTextReader(sourceFile);
reader.Read();
XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
writer.Formatting = Formatting.Indented;
do {
switch (reader.NodeType) {
case XmlNodeType.XmlDeclaration:
writer.WriteStartDocument(null == reader.GetAttribute("standalone") || "yes" == reader.GetAttribute("standalone"));
break;
case XmlNodeType.Element:
ElementNode element = new ElementNode(reader.Prefix, reader.LocalName, reader.NamespaceURI);
if (0 == stack.Count) {
writer.WriteStartElement(element.prefix, element.name, element.nameSpace);
element.startElement=true;
}
stack.Push(element);
break;
case XmlNodeType.Attribute:
throw new Exception("We should never be here!");
case XmlNodeType.Text:
ElementNode attribute = new ElementNode();
attribute = (ElementNode)stack.Pop();
element = (ElementNode)stack.Peek();
if (!element.startElement) {
writer.WriteStartElement(element.prefix, element.name, element.nameSpace);
element.startElement=true;
}
writer.WriteStartAttribute(attribute.prefix, attribute.name, attribute.nameSpace);
writer.WriteRaw(reader.Value);
reader.Read(); //jump over the EndElement
break;
case XmlNodeType.EndElement:
writer.WriteEndElement();
stack.Pop();
break;
case XmlNodeType.CDATA:
writer.WriteCData(reader.Value);
break;
case XmlNodeType.Comment:
writer.WriteComment(reader.Value);
break;
case XmlNodeType.ProcessingInstruction:
writer.WriteProcessingInstruction(reader.Name, reader.Value);
break;
case XmlNodeType.EntityReference:
writer.WriteEntityRef( reader.Name);
break;
case XmlNodeType.Whitespace:
writer.WriteWhitespace(reader.Value);
break;
case XmlNodeType.None:
writer.WriteRaw(reader.Value);
break;
case XmlNodeType.SignificantWhitespace:
writer.WriteWhitespace(reader.Value);
break;
case XmlNodeType.DocumentType:
writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
break;
case XmlNodeType.EndEntity:
break;
default:
Console.WriteLine("UNKNOWN Node Type = " + ((int)reader.NodeType));
break;
}
} while (reader.Read());
writer.WriteEndDocument();
reader.Close();
writer.Flush();
writer.Close();
}
// Use the WriteNode to simplify the process.
public void ConertToElementCentric(FileStream sourceFile, FileStream targetFile) {
XmlTextReader reader = new XmlTextReader(sourceFile);
reader.Read();
XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
writer.Formatting = Formatting.Indented;
do {
switch (reader.NodeType) {
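case XmlNodeType.Element:
writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
// Record this before moving to the attributes. An empty element
// produces no EndElement node, so it must be closed here.
bool isEmpty = reader.IsEmptyElement;
if (reader.MoveToFirstAttribute()) {
do {
writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
writer.WriteRaw(reader.Value);
writer.WriteEndElement();
} while(reader.MoveToNextAttribute());
reader.MoveToElement();
}
if (isEmpty) {
writer.WriteEndElement();
}
break;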
case XmlNodeType.Attribute:
throw new Exception("We should never be here!");
case XmlNodeType.Whitespace:
writer.WriteWhitespace(reader.Value);
break;
case XmlNodeType.EndElement:
writer.WriteEndElement();
break;
case XmlNodeType.Text:
throw new Exception("The input document is not an attribute-centric document\n");
default:
Console.WriteLine(reader.NodeType);
writer.WriteNode(reader, false);
break;
}
} while (reader.Read());
reader.Close();
writer.Flush();
writer.Close();
}
}
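After the code is compiled, run it from the command line by typing <compiled name> -a centric.xml <output file name>. The output file does not need to exist. Because the program opens it with FileMode.Create, the file is created if it is missing and overwritten if it is present.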
For the following output, assuming the C# program was compiled to centric_cs, the command line is C:\centric_cs -a centric.xml centric_out.xml.
The mode of -a tells the application to convert the input XML to attribute-centric, while a mode of -e will change it to element-centric. The output below is the new attribute-centric output generated using the -a mode. The elements now contain attributes instead of nested elements.
Output: centric_out.xml
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<root>
  <Customer firstname="Jerry" lastname="Larson">
    <Order OrderID="Ord-12345">
      <OrderDetail Quantity="1301" UnitPrice="$3000" ProductName="Computer" />
    </Order>
  </Customer>
</root>
See Also
Concepts
Well-Formed XML Creation with the XmlTextWriter