What's with Encoding.GetMaxByteCount() and Encoding.GetMaxCharCount()? Part 2
A little over a year ago I wrote What's with Encoding.GetMaxByteCount() and Encoding.GetMaxCharCount()? to address the question "Why does GetMaxCharCount(1) for my favorite Encoding return 2 instead of 1?" (The short answer: the Decoder or Encoder could have stored data from a previous call.)
As a follow-up, what about the special case of zero? It seems that GetMaxByteCount(0) and GetMaxCharCount(0) should always return 0. The answer, again, comes down to the encoder/decoder state and the fallback.
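As a quick illustration, here is a small C# sketch. The exact numbers are an implementation detail of the encoding and its default fallback, so treat the printed values as what I'd expect on .NET 2.0 rather than a contract:

using System;
using System.Text;

class ZeroCountDemo
{
    static void Main()
    {
        // Even for an input count of zero, the maximums leave room for data
        // a previous call may have buffered plus whatever the fallback emits.
        // (Values observed with the default UTF-8 fallbacks; illustrative only.)
        Console.WriteLine(Encoding.UTF8.GetMaxCharCount(0));   // prints 1
        Console.WriteLine(Encoding.UTF8.GetMaxByteCount(0));   // prints 3
    }
}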
Consider a call to Decoder.GetChars() whose input ends with a UTF-8 lead byte. The decoder is going to remember that lead byte, expecting the next call to GetChars() to contain the remaining byte(s) necessary to decode a complete UTF-8 sequence.
However, if the next call passes in an empty input buffer yet asks the decoder to flush, then the decoder has to process that lonely lead byte anyway. This happens, for example, at the end of a stream of input. In that case, the decoder calls the fallback for the lone lead byte, which by default for UTF-8 now returns U+FFFD. So even with an empty input buffer, UTF-8 can return a character.
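Here's a minimal C# sketch of that scenario. The byte value 0xE2 is just an arbitrary 3-byte lead byte chosen for illustration:

using System;
using System.Text;

class DanglingLeadByteDemo
{
    static void Main()
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[8];

        // 0xE2 is the lead byte of a 3-byte UTF-8 sequence. With flush == false
        // the decoder stores it and produces no characters yet.
        int count = decoder.GetChars(new byte[] { 0xE2 }, 0, 1, chars, 0, false);
        Console.WriteLine(count);                        // 0

        // An empty input buffer with flush == true forces the decoder to deal
        // with the dangling lead byte; the default fallback substitutes U+FFFD.
        count = decoder.GetChars(new byte[0], 0, 0, chars, 0, true);
        Console.WriteLine(count);                        // 1
        Console.WriteLine("U+{0:X4}", (int)chars[0]);    // U+FFFD
    }
}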
Similar cases happen with most other encodings, although a few encodings never have leftover bytes when decoding.