What's my Encoding Called?
There is a bit of confusion about the System.Text.Encoding names, primarily "Which name do I use for my Encoding?"
The Encoding class has three name properties: BodyName, WebName and HeaderName, and the EncodingInfo objects returned by Encoding.GetEncodings have an additional Name property. The examples in the MSDN documentation print a table of these values, excerpted here:
EncodingInfo            Encoding
Name          CodePage  BodyName     HeaderName    WebName       EncodingName
shift_jis     932       iso-2022-jp  iso-2022-jp   shift_jis     Japanese (Shift-JIS)
windows-1250  1250      iso-8859-2   windows-1250  windows-1250  Central European (Windows)
windows-1251  1251      koi8-r       windows-1251  windows-1251  Cyrillic (Windows)
Windows-1252  1252      iso-8859-1   Windows-1252  Windows-1252  Western European (Windows)
windows-1253  1253      iso-8859-7   windows-1253  windows-1253  Greek (Windows)
windows-1254  1254      iso-8859-9   windows-1254  windows-1254  Turkish (Windows)
csISO2022JP   50221     iso-2022-jp  iso-2022-jp   csISO2022JP   Japanese (JIS-Allow 1 byte Kana)
iso-2022-kr   50225     iso-2022-kr  euc-kr        iso-2022-kr   Korean (ISO)
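If you want to see these columns for the encodings on your own machine, something along these lines should do it. This is only a sketch in the spirit of the MSDN sample, not a copy of it: the class name, method name and column widths are my own choices, and the set of encodings you get back depends on the runtime (newer .NET versions list only a handful unless a code page provider is registered).

```csharp
using System;
using System.Text;

public static class EncodingNameTable
{
    // Builds one line per encoding reported by Encoding.GetEncodings().
    // The column widths are arbitrary and only there for readability.
    public static string BuildTable()
    {
        StringBuilder table = new StringBuilder();
        table.AppendLine(string.Format("{0,-14}{1,-10}{2,-13}{3,-14}{4,-14}{5}",
            "Name", "CodePage", "BodyName", "HeaderName", "WebName", "EncodingName"));

        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            // EncodingInfo carries Name and CodePage; the other names live on Encoding.
            Encoding encoding = info.GetEncoding();
            table.AppendLine(string.Format("{0,-14}{1,-10}{2,-13}{3,-14}{4,-14}{5}",
                info.Name, info.CodePage, encoding.BodyName,
                encoding.HeaderName, encoding.WebName, encoding.EncodingName));
        }

        return table.ToString();
    }

    public static void Main()
    {
        Console.Write(BuildTable());
    }
}
```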
The short answer is that if you use the WebName in code that round-trips the encoding, like Encoding.GetEncoding(myEncoding.WebName), you'll end up with the same encoding you started with. The WebName is also the same name that is used by EncodingInfo.Name. The WebName is the name you should use if you need to recreate the same encoding later (fallbacks and other optional flags would be lost, but otherwise the behavior would be the same).
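Here is a minimal sketch of that round trip. The class and method names are made up for illustration, and I've used iso-8859-1 because it is available everywhere; the Windows code pages in the table may need a code page provider registered on newer .NET runtimes.

```csharp
using System;
using System.Text;

public static class WebNameRoundTrip
{
    // Recreates an encoding from nothing but its WebName.
    // Fallbacks and other optional settings on the original are NOT carried over.
    public static Encoding Recreate(Encoding original)
    {
        return Encoding.GetEncoding(original.WebName);
    }

    public static void Main()
    {
        Encoding original = Encoding.GetEncoding("iso-8859-1");

        // Store just the name (for example in a config file), then recreate later.
        string storedName = original.WebName;
        Encoding recreated = Encoding.GetEncoding(storedName);

        Console.WriteLine("{0} -> code page {1}", storedName, recreated.CodePage);
        Console.WriteLine("Same code page: {0}", original.CodePage == recreated.CodePage);
    }
}
```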
So if you're "supposed" to use WebName, what are the BodyName and the HeaderName for?
The idea behind the Header and Body names is to support e-mail applications. Not all encodings behave well in the body or header of an e-mail, so the encodings these names point to should be used there instead. For example, if you have the Encoding from Encoding.GetEncoding("iso-2022-kr") at the bottom of the list, then the WebName would allow you to round-trip that name. However, if you had data that you wanted to encode in the header of an e-mail, you should call Encoding.GetEncoding(myEncoding.HeaderName) and use that encoding for the header data.
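In code, picking the header or body encoding looks roughly like this. It's a sketch with hypothetical helper names (ForHeader, ForBody), and it assumes iso-2022-kr is available on your runtime, which is true on the .NET Framework but on newer .NET versions requires a registered code page provider; the example falls back to a message if it isn't there.

```csharp
using System;
using System.Text;

public static class MailEncodingPicker
{
    // Returns the encoding suggested for e-mail header data.
    public static Encoding ForHeader(Encoding encoding)
    {
        return Encoding.GetEncoding(encoding.HeaderName);
    }

    // Returns the encoding suggested for e-mail body data.
    public static Encoding ForBody(Encoding encoding)
    {
        return Encoding.GetEncoding(encoding.BodyName);
    }

    public static void Main()
    {
        try
        {
            // Assumption: this name resolves on the current runtime.
            Encoding korean = Encoding.GetEncoding("iso-2022-kr");

            Console.WriteLine("Round-trip name: {0}", korean.WebName);
            Console.WriteLine("Use for header:  {0}", ForHeader(korean).WebName);
            Console.WriteLine("Use for body:    {0}", ForBody(korean).WebName);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("iso-2022-kr is not available on this runtime.");
        }
    }
}
```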
The key is that the Header and Body names describe which encoding to use to support a similar set of characters to the Encoding in question. Of course I'd recommend using UTF-8, or in the worst case UTF-7, whenever possible in e-mail. For that matter I'd recommend using Unicode whenever possible, but sometimes older protocols or other limitations prevent that.
Comments
- Anonymous
November 06, 2006
I was asked about our use of the windows "ansi" code page names, as used in things like MIME types, http