Frequently Asked Questions on XML in .NET - Part 1
As part of a separate task, the XML team came up with a list of frequently encountered issues in System.XML, mainly points that we found interesting because they caused a lot of difficulty for our users. The questions range from rarely used (or misused) methods to difficult XML constructs, with a particular focus on scenarios that are hard to debug. When we had completed the exercise, it occurred to the team that we should publish the list.
What follows is the first post in a multi-part series in which we outline each question, explain the correct usage, and provide some sample code. Our motives are not entirely selfless, though: if there are good questions we have missed, we hope to hear about them in the comments. Let us know!
Q1: Invalid Literals
There are a number of reserved characters in XML that cannot be included as literals in XML text, such as “&” and “<”. These characters must be escaped, and the XML standard provides three ways of doing so: character references (&#38;), entity references (&amp;) and CDATA sections.
This will seem like basic XML 101 material to experienced XML users, but it can be a real pain point for those new to XML. The problem is compounded by the fact that the error messages returned by the XML processor can be confusing. It is also an easy mistake for a new user to make, because these characters often appear in ordinary text that serves as the source of the XML data. Strictly speaking, only & and < are always forbidden as literals in element text and attribute values; > must be escaped when it appears in the sequence ]]>, and the two quote characters only need escaping inside an attribute value delimited by the same quote. All five have predefined entity references:
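To see that the three methods are interchangeable, here is a minimal sketch that parses the same text, "A & B", written each way. It uses Python's standard `xml.etree.ElementTree` module rather than System.Xml, purely as an illustration; any conforming XML parser should report the same text for all three.

```python
import xml.etree.ElementTree as ET

# The text "A & B" written with each of the three escaping methods.
DOCS = {
    "character reference": "<t>A &#38; B</t>",
    "entity reference": "<t>A &amp; B</t>",
    "CDATA section": "<t><![CDATA[A & B]]></t>",
}

# Parse each document and keep the element's text content.
results = {method: ET.fromstring(xml).text for method, xml in DOCS.items()}

for method, text in results.items():
    print(f"{method:20} -> {text!r}")  # every line ends with 'A & B'
```

A CDATA section has one limitation of its own: it cannot contain the sequence ]]>, because that sequence ends the section.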
· & (&amp;)
· < (&lt;)
· > (&gt;)
· ' (&apos;)
· " (&quot;)
The following table lists some incorrect literal strings, the correct escaped form of each, and the exception message thrown for the incorrect form. Every message includes a line number and position, which greatly assists debugging.
Incorrect String | Correct Usage | Exception Message
A & B | A &amp; B | An error occurred while parsing EntityName. Line X, position Y.
A &c B | A &amp;c B | ' ' is an unexpected token. The expected token is ';'. Line X, position Y.
A &# B | A &amp;# B | Invalid syntax for a decimal numeric entity reference. Line X, position Y.
A < B | A &lt; B | Name cannot begin with the ' ' character, hexadecimal value 0x20. Line X, position Y.
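The same failures are easy to reproduce outside .NET. This sketch feeds each incorrect and corrected string from the table to Python's `xml.etree.ElementTree` parser, which stands in for the XML processor here. Its error messages are worded differently from the System.Xml messages above, but like them it reports where the problem is: `ParseError.position` holds a (line, column) pair. The helper name `check` is illustrative only.

```python
import xml.etree.ElementTree as ET

# (incorrect, corrected) pairs from the table above.
CASES = [
    ("A & B", "A &amp; B"),
    ("A &c B", "A &amp;c B"),
    ("A &# B", "A &amp;# B"),
    ("A < B", "A &lt; B"),
]

def check(text):
    """Parse text wrapped in <t>...</t>.

    Returns None if it is well-formed, otherwise the (line, column)
    position that the parser reports for the error.
    """
    try:
        ET.fromstring(f"<t>{text}</t>")
        return None
    except ET.ParseError as err:
        return err.position

for bad, good in CASES:
    print(f"{bad!r:10} error at {check(bad)}; {good!r:14} -> {check(good)}")
```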
Refer to section 2.4 of the XML Standard for more information on invalid literals.
Note: A later post will cover invalid XML characters in general and how to handle them correctly.
Topics for future FAQ posts include, but are not limited to:
- Conformance Levels
- Reporting Validation Warnings
- Correct Encodings
- ProhibitDTD setting
- XLinq Bridge Classes
Shayne Burgess
Program Manager | Data Programmability
Comments
Anonymous
September 15, 2008
Thanks. This is the sort of information that is very helpful. Everyone gets HTML encoding wrong the first time, and in this case HTML URLs are particularly problematic: URLs use "&" to delimit query string parameters, but a bare "&" is not valid XML. Please continue to include more blogs on common errors. Often the XML parser error messages are cryptic and too technical to display to the user.
Anonymous
September 25, 2008
Check the Part I and Part II of the Frequently asked questions around XML and .NET integration.