Making a StreamWriter usable even after given garbage characters

I recently got a question from a customer using a StreamWriter with a UTF-8 encoding. The StreamWriter threw an EncoderFallbackException on an attempt to write “garbage” Unicode characters. For example, an attempt to write U+DFC9, which is only half of a Unicode character (a lone surrogate, not a complete surrogate pair), threw an EncoderFallbackException.

That part seemed fine, since the input was bogus. However, after that exception was thrown, the StreamWriter instance became effectively unusable; even calling WriteLine() on it threw an EncoderFallbackException. So the customer asked how to keep the writer usable after the exception.
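The behavior described above can be reproduced with a short sketch. This is not the customer's code. The class and method names are made up for illustration, and it assumes a strict UTF-8 encoding (throwOnInvalidBytes set to true) with AutoFlush turned on so that each write is encoded immediately.

```csharp
using System;
using System.IO;
using System.Text;

static class StrictWriterDemo
{
    // Hypothetical helper: returns how many writes threw EncoderFallbackException.
    public static int CountFailedWrites()
    {
        var stream = new MemoryStream();
        // throwOnInvalidBytes: true makes the encoder throw on invalid input.
        var strictUtf8 = new UTF8Encoding(false, true);
        // The writer is deliberately not disposed here, because disposing
        // would try to flush the buffered bad character once more.
        var writer = new StreamWriter(stream, strictUtf8);
        writer.AutoFlush = true; // assumption: encode on every write

        int failures = 0;
        writer.Write("good data"); // valid text, encodes without error

        try { writer.Write('\uDFC9'); } // lone low surrogate
        catch (EncoderFallbackException) { failures++; }

        try { writer.WriteLine(); } // valid, but the bad char is still buffered
        catch (EncoderFallbackException) { failures++; }

        return failures;
    }
}
```

With AutoFlush off, the same exception should surface later instead, when the buffer is flushed or the writer is closed.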

This behavior seems bad, but it isn't a bug: by default, the StreamWriter becomes unusable after getting bogus data. This was a design decision (from long ago) to make the stream classes tolerant of encoding errors when reading (StreamReader) but very strict when writing (StreamWriter). The bogus character stays in the writer's buffer, so anything you do subsequently (Flush(), Close(), etc.) hits the encoding error again. The idea is that when you encounter an initial error, you should probably be concerned about the fidelity of the rest of the stream, so it is better to bail out as soon as you detect that the data is corrupt.

In this case, the customer was fine with not writing the garbage characters, but didn't want the StreamWriter to become unusable, for example to avoid losing the data written before the error.

Fortunately, there's a solution: the encoding's EncoderFallback property can be set to emit fallback characters instead of throwing an exception. In this example, the encoding's fallback behavior was to throw an exception; however, you can set the property to use a replacement character, e.g.: encoding.EncoderFallback = EncoderFallback.ReplacementFallback. The property can only be set on a writable Encoding instance, such as one returned by Clone(). Then, instead of throwing an EncoderFallbackException, the writer replaces the bogus characters with the fallback and continues to be usable.
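As a minimal sketch of that fix, the code below clones a strict UTF-8 encoding, switches the clone to the replacement fallback, and hands it to the StreamWriter. The class and method names are hypothetical, and the MemoryStream stands in for whatever stream the customer was actually writing to.

```csharp
using System;
using System.IO;
using System.Text;

static class LenientWriterDemo
{
    // Hypothetical helper: writes text containing a lone surrogate and
    // returns what ended up in the stream, decoded back to a string.
    public static string WriteWithReplacement()
    {
        // Start from a strict encoding, like the one that was throwing.
        var strictUtf8 = new UTF8Encoding(false, true);

        // Clone() returns a writable copy, so its fallback can be changed.
        var lenientUtf8 = (Encoding)strictUtf8.Clone();
        lenientUtf8.EncoderFallback = EncoderFallback.ReplacementFallback;

        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, lenientUtf8))
        {
            writer.Write("before ");
            writer.Write('\uDFC9'); // lone surrogate, replaced instead of throwing
            writer.Write(" after"); // the writer is still usable
        }

        // MemoryStream.ToArray() still works after the stream is closed.
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

An alternative that avoids the clone is to construct the encoding with throwOnInvalidBytes set to false, which also selects a replacement fallback rather than an exception.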
