White Space
The World Wide Web Consortium (W3C) XML specification normalizes different line-ending conventions to a single convention but preserves all other white space, except in attribute values. XML also provides a set of tools that documents can use to signal to applications if white space must be preserved.
White Space and the XML Declaration
According to the current XML 1.0 standard, white space is not allowed before the XML declaration.
<?xml version="1.0"?>
<BOOK>
<BOOKNAME>XML</BOOKNAME>
</BOOK>
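If white space appears before the XML declaration, the parser no longer recognizes it as the XML declaration and treats it as an ordinary processing instruction. The information it carries, particularly the encoding, may not be used by the parser. Because the processing instruction target name xml is reserved, a conforming parser can also reject the document outright.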
For more information about the XML declaration, see XML Declaration.
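As a quick check of this rule outside MSXML, the following sketch uses Python's standard xml.etree.ElementTree module. Python is an assumption made for illustration only; it is not part of MSXML. The helper name parses is hypothetical. The Expat-based parser behind ElementTree accepts the document above but rejects the same document when a single space precedes the XML declaration.

```python
import xml.etree.ElementTree as ET

GOOD = '<?xml version="1.0"?>\n<BOOK>\n<BOOKNAME>XML</BOOKNAME>\n</BOOK>'
BAD = " " + GOOD  # one space before the XML declaration


def parses(text):
    """Return True if the parser accepts text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses(GOOD))
print(parses(BAD))
```

Other parsers may report the problem differently, so treat the leading white space as an error to avoid rather than relying on any particular recovery behavior.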
White Space in Element Content
XML parsers are required to report all white space that appears in element content within a document. For this reason, the following three documents are different to an XML parser.
<document>
<data>1</data>
<data>2</data>
<data>3</data>
</document>
and:
<document><data>1</data><data>2</data><data>3</data></document>
and:
<document><data>1</data> <data>2</data> <data>3</data></document>
For some applications, the values of the three data points matter more than the pretty-printing. For document-oriented XML applications, white space preservation can be critical.
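The difference between the three documents can be made visible by listing the child nodes that a DOM parser builds for the root element. The sketch below uses Python's standard xml.dom.minidom module as a stand-in for a white space preserving parser (an assumption for illustration; MSXML behaves this way only when white space preservation is turned on). The helper name child_summary is hypothetical.

```python
from xml.dom import minidom

DOCS = [
    "<document>\n<data>1</data>\n<data>2</data>\n<data>3</data>\n</document>",
    "<document><data>1</data><data>2</data><data>3</data></document>",
    "<document><data>1</data> <data>2</data> <data>3</data></document>",
]


def child_summary(xml_text):
    """List the root's children: element names, or repr() of text node data."""
    root = minidom.parseString(xml_text).documentElement
    return [
        node.tagName if node.nodeType == node.ELEMENT_NODE else repr(node.data)
        for node in root.childNodes
    ]


for doc in DOCS:
    summary = child_summary(doc)
    print(len(summary), summary)
```

The first document yields white space text nodes before, between, and after the data elements; the second yields only the three elements; the third yields single-space text nodes between the elements.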
Document authors can use the xml:space attribute to identify portions of documents where white space is considered important. Style sheets can also use the xml:space attribute as a hook to preserve white space in presentation. However, because many XML applications do not understand the xml:space attribute, its use is considered advisory.
The xml:space attribute accepts two values.
default
This value allows the application to handle white space as necessary. Not including an xml:space attribute produces the same result as using the default value.
preserve
This value instructs the application to maintain white space as is, suggesting that it might have meaning.
The values of xml:space attributes apply to all descendants of the element containing the attribute unless overridden by one of the child elements.
For example, the following documents specify the same white space behavior.
<poem xml:space="default">
<author>
<givenName>Alix</givenName>
<familyName>Krakowski</familyName>
</author>
<verse xml:space="preserve">
<line>Roses are red,</line>
<line>Violets are blue.</line>
<signature xml:space="default">-Alix</signature>
</verse>
</poem>
and:
<poem xml:space="default">
<author xml:space="default">
<givenName xml:space="default">Alix</givenName>
<familyName xml:space="default">Krakowski</familyName>
</author>
<verse xml:space="preserve">
<line xml:space="preserve">Roses are red,</line>
<line xml:space="preserve">Violets are blue.</line>
<signature xml:space="default">-Alix</signature>
</verse>
</poem>
In both examples, the application is notified that all of the white space in the lines of the poem must be preserved, but that white space in other parts of the document can be handled as necessary.
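The inheritance rule can be sketched as a small tree walk that carries the nearest xml:space value down to each descendant. The example uses Python's standard xml.etree.ElementTree module, which exposes xml:space under the reserved XML namespace URI. The function name effective_space and the variable names are hypothetical, and the code only computes the advisory value; it does not change how any parser treats white space.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix by its namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_space(elem, inherited="default", path=""):
    """Yield (path, effective xml:space value) for elem and its descendants."""
    value = elem.get(XML_SPACE, inherited)  # own attribute wins over inherited
    here = path + "/" + elem.tag
    yield here, value
    for child in elem:
        yield from effective_space(child, value, here)


INHERITED = """<poem xml:space="default">
<author><givenName>Alix</givenName><familyName>Krakowski</familyName></author>
<verse xml:space="preserve">
<line>Roses are red,</line>
<line>Violets are blue.</line>
<signature xml:space="default">-Alix</signature>
</verse>
</poem>"""

EXPLICIT = """<poem xml:space="default">
<author xml:space="default"><givenName xml:space="default">Alix</givenName><familyName xml:space="default">Krakowski</familyName></author>
<verse xml:space="preserve">
<line xml:space="preserve">Roses are red,</line>
<line xml:space="preserve">Violets are blue.</line>
<signature xml:space="default">-Alix</signature>
</verse>
</poem>"""

inherited_result = list(effective_space(ET.fromstring(INHERITED)))
explicit_result = list(effective_space(ET.fromstring(EXPLICIT)))

for path, value in inherited_result:
    print(path, value)
print(inherited_result == explicit_result)
```

Both documents produce the same list of effective values: preserve for the verse and its line elements, and default everywhere else, including the signature that overrides its parent.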
Like its language-indicating counterpart, xml:lang, the xml:space attribute must be declared in a document type definition (DTD) if used in a validating environment. The xml namespace does not need to be declared because it is reserved by the XML specification.
By default, Microsoft XML Core Services (MSXML) does not honor the xml:space attribute. If an application must honor the xml:space attribute, the preserveWhiteSpace property of the DOMDocument object must be set to True prior to parsing.
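// Create the DOM document object (the ProgID shown targets MSXML 5.0).
var xmldoc = new ActiveXObject("Msxml2.DOMDocument.5.0");
// Set this before calling load() so that white space is kept during parsing.
xmldoc.preserveWhiteSpace = true;
// url is assumed to hold the location of the XML document to parse.
xmldoc.load(url);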
MSXML also provides settings that let you delegate application white space handling to the parser. For more information, see the topic entitled "White Space and the DOM" in MSXML SDK documentation.
Note: Preserving white space information can significantly increase the size of Document Object Model (DOM) trees because of the overhead involved in preserving white space nodes between elements.
White Space in Attributes
Although XML processors preserve all white space in element content, they frequently normalize it in attribute values. Tabs, carriage returns, line feeds, and spaces are each reported as a single space. In certain types of attributes, they also trim white space that comes before or after the main body of the value and reduce runs of white space within the value to single spaces. (If a DTD is available, this trimming will be performed on all attributes that are not of type CDATA.)
For example, an XML document might contain the following:
<whiteSpaceLoss note1="this is a note." note2="this
is
a
note.">
An XML parser reports both attribute values as "this is a note.", converting the line breaks to single spaces.
Note: Neither DOM nor SAX in MSXML3 normalizes white space. In MSXML6, DOM does not normalize white space, but SAX does.
If there is a DTD for the document, attributes that are declared to be of types other than CDATA have spaces removed from the beginning and end of the attribute value; all white space clusters inside the value are replaced with single spaces. If there is no DTD, the parser assumes that all attributes are of type CDATA.
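The CDATA case (no DTD) can be checked with a parser that applies the normalization described by the XML specification. The sketch below uses Python's standard xml.etree.ElementTree module as an illustrative assumption; as the note above indicates, MSXML DOM may report the values differently. The note3 attribute is a hypothetical addition that shows leading and trailing spaces surviving when an attribute is treated as CDATA.

```python
import xml.etree.ElementTree as ET

# No DTD is present, so the parser treats every attribute as CDATA.
SOURCE = (
    '<whiteSpaceLoss note1="this is a note." '
    'note2="this\nis\na\nnote." '
    'note3="  tab\there  "/>'
)

attrs = ET.fromstring(SOURCE).attrib
for name, value in attrs.items():
    print(name, repr(value))
```

Each line break or tab becomes one space, so note1 and note2 compare equal, while the padding around note3 is kept because trimming applies only to attributes declared with a type other than CDATA.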
End of Line Handling
XML processors treat the character sequence Carriage Return-Line Feed (CRLF) like single CR or LF characters. All are reported as a single LF character. Applications can save documents using the appropriate line-ending convention.
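A short way to observe this normalization is to parse element content that mixes all three line-ending conventions. The sketch again assumes Python's standard xml.etree.ElementTree module purely for illustration; the element name text and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# One element whose content mixes CRLF, bare CR, and bare LF line endings.
MIXED = "<text>one\r\ntwo\rthree\nfour</text>"

content = ET.fromstring(MIXED).text
print(repr(content))
```

The parsed content contains only LF characters between the four words, so an application that needs a platform-specific convention has to reintroduce it when saving the document.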