Evil encoding configuration

This post is the final one for the XmlWriter and StreamWriter encoding series (see parts one and two first).

If you ran the original example, you'll notice that the last combination of encodings we wrote produced this output.

Stream Encoding: (no stream)
  XML Encoding:  Norwegian (IA5)

Unhandled Exception: System.Text.EncoderFallbackException: Unable to translate Unicode character \u0023 at index 55 to specified code page.
   at System.Text.EncoderExceptionFallbackBuffer.Fallback(Char charUnknown, Int32 index)
   at System.Xml.CharEntityEncoderFallbackBuffer.Fallback(Char charUnknown, Int32 index)
   at System.Text.EncoderFallbackBuffer.InternalFallback(Char ch, Char*& chars)
   at System.Text.SBCSCodePageEncoding.GetBytes(Char* chars, Int32 charCount, Byte* bytes, Int32 byteCount, EncoderNLS encoder)
   at System.Text.EncoderNLS.Convert(Char* chars, Int32 charCount, Byte* bytes, Int32 byteCount, Boolean flush, Int32& charsUsed, Int32& bytesUsed, Boolean& completed)
   at System.Text.EncoderNLS.Convert(Char[] chars, Int32 charIndex, Int32 charCount, Byte[] bytes, Int32 byteIndex, Int32 byteCount, Boolean flush, Int32& charsUsed, Int32& bytesUsed, Boolean& completed)
   at System.Xml.XmlEncodedRawTextWriter.EncodeChars(Int32 startOffset, Int32 endOffset, Boolean writeAllToStream)
   at System.Xml.XmlEncodedRawTextWriter.FlushBuffer()
   at System.Xml.XmlEncodedRawTextWriter.Flush()
   at System.Xml.XmlWellFormedWriter.Flush()
   at Cs.Cs.WriteXml(XmlWriter xmlWriter)
   at Cs.Cs.WriteEncodedXml(Encoding streamEncoding, Encoding xmlEncoding, Stream stream)

This is very much an edge case, and it takes a couple of minutes to figure out what's going on.

Let's start from the code that produces this problem:

  Encoding muhaha = Encoding.GetEncoding(
    "x-IA5-Norwegian",
    new EncoderExceptionFallback(),
    new DecoderExceptionFallback());

This encoding is built up as follows. First, it names the x-IA5-Norwegian encoding. Then, it specifies that if a character cannot be mapped to this encoding, an exception should be thrown. Typically you can configure encodings to fall back to writing a best-fit character or a generic '?' character, but depending on your system, this may be the wrong thing to do: for example, you can no longer reliably round-trip data. So, to be safe, we're setting the encoding to fail whenever a character cannot be mapped.
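To make the trade-off concrete, here is a minimal sketch that contrasts a replacement fallback with an exception fallback. It uses us-ascii rather than x-IA5-Norwegian so that it runs without any extra code page support; the class and method names (FallbackDemo, RoundTripLossy, StrictEncodeFails) are made up for illustration.

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    // Round-trips text through US-ASCII, replacing unmappable characters with '?'.
    public static string RoundTripLossy(string text)
    {
        Encoding lossy = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        return lossy.GetString(lossy.GetBytes(text));
    }

    // Returns true if encoding the text throws under the exception fallback.
    public static bool StrictEncodeFails(string text)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strict.GetBytes(text);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // The 'é' silently becomes '?', so the original text cannot be recovered.
        Console.WriteLine(RoundTripLossy("caf\u00E9"));
        // With the exception fallback, the same input fails loudly instead.
        Console.WriteLine(StrictEncodeFails("caf\u00E9"));
    }
}
```

The lossy version quietly changes the data, which is exactly the round-trip problem mentioned above; the strict version trades that for an exception you have to deal with.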

Now, XML has a pretty nifty way of dealing with characters that cannot be directly encoded, by using character references. This allows you to write &#x9; instead of a tab character, for example. So even if the encoding doesn't support a character, XML can still represent it using this escape hatch.
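As a rough illustration of that escape hatch, the sketch below writes a non-ASCII character through an XmlWriter configured for ASCII output. The names CharRefDemo and WriteAsAscii are hypothetical, and the exact form of the reference (hexadecimal or decimal) is up to the writer, so the comment does not promise a specific string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class CharRefDemo
{
    // Writes <a>text</a> using ASCII output and returns the resulting markup.
    public static string WriteAsAscii(string text)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,
            OmitXmlDeclaration = true
        };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString("a", text);
            }
            // The bytes are pure ASCII, so decoding them as ASCII is lossless.
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        // 'é' is not in ASCII; the writer should emit a character reference
        // for it rather than a raw character or a '?'.
        Console.WriteLine(WriteAsAscii("caf\u00E9"));
    }
}
```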

And there's the rub. x-IA5-Norwegian is one of the extremely rare encodings that doesn't have the '#' character in its repertoire! So when the XML writer sees a character that's not in the encoding (I used '#' itself for extra irony points), it tries to write the reference, and then the encoder fails again to write '#'. At that point, the writer gives up and allows the exception to bubble out unhandled, which in our simple program just prints the exception to the console.
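For anyone who wants to try this without the full sample from the earlier parts, here is a stripped-down sketch of the failing combination. It is not the original program: EvilEncodingRepro and TryWrite are hypothetical names, and the RegisterProvider call assumes a modern .NET runtime where legacy code pages have to be enabled explicitly (on the .NET Framework that line should not be needed). Whether the catch branch is actually hit may depend on your runtime version, so the code reports either outcome.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EvilEncodingRepro
{
    // Tries to write <root>text</root> in x-IA5-Norwegian and describes the outcome.
    public static string TryWrite(string text)
    {
        // Assumption: modern .NET, where legacy code pages need this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding muhaha = Encoding.GetEncoding(
            "x-IA5-Norwegian",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());

        var settings = new XmlWriterSettings { Encoding = muhaha };
        try
        {
            using (var stream = new MemoryStream())
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString("root", text);
                // Flushing forces the buffered characters through the encoder.
                writer.Flush();
            }
            return "Written without error.";
        }
        catch (EncoderFallbackException ex)
        {
            // CharUnknown is the character the encoder could not translate.
            return "Encoder gave up on U+" + ((int)ex.CharUnknown).ToString("X4");
        }
    }

    public static void Main()
    {
        // '#' is both the unmappable character and the one needed to escape it.
        Console.WriteLine(TryWrite("#"));
    }
}
```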

Enjoy!

Comments

  • Anonymous
    March 22, 2010
    I don't have any problems with it...