Understanding Encodings

Microsoft Silverlight will reach end of support after October 2021.

A character encoding system describes the rules by which a character set can be translated into numbers. Any character encoding system consists of two separate components:

  • An encoder, which translates a sequence of characters into a sequence of numeric values (a sequence of bytes).

  • A decoder, which translates a sequence of bytes into a series of characters.
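A short round trip illustrates the encoder/decoder split. The sketch below uses Python's built-in codecs purely as a language-neutral illustration. It is not the .NET Framework for Silverlight API, and the sample string is an arbitrary choice.

```python
# Illustration only: Python's codecs stand in for an encoder/decoder pair.
text = "Gr\u00fc\u00dfe"  # "Grüße": five characters, two of them non-ASCII

# Encoder: a sequence of characters becomes a sequence of bytes.
encoded = text.encode("utf-8")

# Decoder: the same bytes are translated back into characters.
decoded = encoded.decode("utf-8")

print(len(text), "characters ->", len(encoded), "bytes")
print("round trip preserved the text:", decoded == text)
```

The character count and the byte count differ because each of the two non-ASCII characters occupies two bytes in UTF-8.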

Encodings in the .NET Framework for Silverlight

The .NET Framework for Silverlight includes two encoding classes:

  • The UTF8Encoding class, which uses UTF-8 encoding to represent a character in one to four bytes. UTF8Encoding has been tuned to be as fast as possible and should be faster than any other encoding.

  • The UnicodeEncoding class, which uses UTF-16 encoding to represent a character in either two or four bytes. UTF-16 encoded bytes can be in either little-endian format (least significant byte first) or big-endian format (most significant byte first). For example, the space character (\u0020) is encoded as 0x20 0x00 in little-endian format and as 0x00 0x20 in big-endian format. Internally, the .NET Framework stores text using UTF-16 encoding in a little-endian format.

Both of these classes inherit from the Encoding class.
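The byte counts and byte orders described above can be checked with a small sketch. It uses Python's "utf-8", "utf-16-le", and "utf-16-be" codecs as stand-ins for the two encodings. The helper hex_bytes and the sample characters are assumptions chosen for illustration, and none of this is the UTF8Encoding or UnicodeEncoding API itself.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated hex values, e.g. '0x20 0x00'."""
    return " ".join(f"0x{b:02X}" for b in data)

# Sample characters: U+0041, U+00E9, U+20AC, and U+1F600 (outside the BMP).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# UTF-8 uses one to four bytes per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# UTF-16 uses two bytes, or four for characters that need a surrogate pair.
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

# The space character (U+0020) in both byte orders.
space_le = hex_bytes(" ".encode("utf-16-le"))
space_be = hex_bytes(" ".encode("utf-16-be"))

print("UTF-8 byte counts: ", utf8_lengths)
print("UTF-16 byte counts:", utf16_lengths)
print("space, little-endian:", space_le)
print("space, big-endian:   ", space_be)
```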

If you require an encoding that is not available in the .NET Framework for Silverlight, you have two options:

Choosing a Fallback Strategy

When a method tries an encoding or decoding operation but no mapping exists, it must implement a fallback strategy, which determines how the failed mapping should be handled. There are two types of fallback strategies:

  • Default

    If the attempt to encode a character fails, it is replaced by the byte sequence for the REPLACEMENT CHARACTER (U+FFFD). This is 0xFD 0xFF for little-endian Unicode, 0xFF 0xFD for big-endian Unicode, and 0xEF 0xBF 0xBD for UTF-8. The default fallback is used with all Encoding objects except those instantiated by calling the UTF8Encoding.UTF8Encoding(Boolean, Boolean) and UnicodeEncoding.UnicodeEncoding(Boolean, Boolean, Boolean) constructors with the throwOnInvalidBytes parameter set to true.

  • Application-defined

    If an Encoding object is instantiated by calling the UTF8Encoding.UTF8Encoding(Boolean, Boolean) or UnicodeEncoding.UnicodeEncoding(Boolean, Boolean, Boolean) constructor with the throwOnInvalidBytes parameter set to true, an EncoderFallbackException is thrown if an encoding method cannot successfully map a character to a byte sequence, and a DecoderFallbackException is thrown if a decoding method cannot successfully map a byte sequence to a character. By handling the exception, the application can define a substitute byte sequence for an encoding operation or a specific replacement character for a decoding operation.
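The two strategies can be sketched on the decoding side as follows. This is an analogy in Python, not the .NET API: Python's errors="replace" handler stands in for the default fallback, strict decoding that raises UnicodeDecodeError stands in for a decoder created with throwOnInvalidBytes set to true, and decode_with_substitute is a hypothetical helper showing how an application might choose its own replacement after catching the exception.

```python
# Byte sequences for U+FFFD REPLACEMENT CHARACTER in each encoding.
replacement_bytes = {
    "utf-16-le": "\ufffd".encode("utf-16-le"),  # FD FF
    "utf-16-be": "\ufffd".encode("utf-16-be"),  # FF FD
    "utf-8": "\ufffd".encode("utf-8"),          # EF BF BD
}

invalid = b"ab\xffcd"  # 0xFF can never appear in well-formed UTF-8

# Default-style fallback: the undecodable byte becomes U+FFFD.
replaced = invalid.decode("utf-8", errors="replace")

def decode_with_substitute(data: bytes, substitute: str = "?") -> str:
    """Decode strictly, handling each failure with an application-chosen substitute."""
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            pieces.append(data[pos:].decode("utf-8"))  # strict: raises on bad input
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into the slice being decoded.
            pieces.append(data[pos:pos + exc.start].decode("utf-8"))
            pieces.append(substitute)
            pos += exc.end
    return "".join(pieces)

print(repr(replaced))
print(repr(decode_with_substitute(invalid)))
```

Handling the exception costs more code than relying on the default replacement, but it lets the application decide what appears in place of the invalid data, or abort the operation entirely.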