How to use character encoding classes in .NET

05/10/2023

This article explains how to use the classes that .NET provides for encoding and decoding text by using various encoding schemes. The instructions assume that you have read the introduction to character encoding in .NET.

Encoders and decoders

.NET provides encoding classes that encode and decode text by using various encoding systems. For example, the UTF8Encoding class describes the rules for encoding to, and decoding from, UTF-8. .NET uses UTF-16 encoding (represented by the UnicodeEncoding class) for string instances. Encoders and decoders are available for other encoding schemes as well.

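Encoding and decoding can also include validation. For example, the UnicodeEncoding class checks all char instances in the surrogate range to make sure that they form valid surrogate pairs. A fallback strategy determines how an encoder handles invalid characters and how a decoder handles invalid bytes.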

Warning

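.NET encoding classes provide a way to store and convert character data. Don't use them to store binary data in string form. Depending on the encoding used, converting binary data to string form with the encoding classes can introduce unexpected behavior and produce inaccurate or corrupted data. To convert binary data to a string form, use the Convert.ToBase64String method.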
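As a rough illustration of the warning above, the following sketch converts a small byte array to Base64 text and back, and contrasts that with pushing the same bytes through the ASCII encoding. The class name BinaryToTextSketch, its helper methods, and the sample payload are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class BinaryToTextSketch
{
   // Converts arbitrary bytes to a Base64 string. The conversion is lossless.
   public static string ToText(byte[] data) => Convert.ToBase64String(data);

   // Restores the original bytes from a Base64 string.
   public static byte[] FromText(string text) => Convert.FromBase64String(text);

   public static void Main()
   {
      // Hypothetical binary payload; 0xFF and 0xFE are not valid ASCII bytes.
      byte[] data = { 0x00, 0x41, 0xFF, 0xFE };

      string base64 = ToText(data);
      Console.WriteLine(base64);
      Console.WriteLine(BitConverter.ToString(FromText(base64)));

      // For contrast: the ASCII decoder substitutes the bytes it cannot
      // decode, so re-encoding the string does not restore the payload.
      byte[] viaAscii = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));
      Console.WriteLine(BitConverter.ToString(viaAscii));
   }
}
```

The Base64 path returns the four original bytes, whereas the ASCII path loses the last two values.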

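All character encoding classes in .NET inherit from the System.Text.Encoding class, which is an abstract class that defines the functionality common to all character encodings. To access the individual encoding objects implemented in .NET, do the following: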

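Use the static properties of the Encoding class, which return objects that represent the standard character encodings available in .NET (ASCII, UTF-7, UTF-8, UTF-16, and UTF-32). For example, the Encoding.Unicode property returns a UnicodeEncoding object. Each object uses replacement fallback to handle strings that it cannot encode and bytes that it cannot decode. For more information, see Replacement Fallback.
Call the encoding's class constructor. Objects for the ASCII, UTF-7, UTF-8, UTF-16, and UTF-32 encodings can be instantiated in this way. By default, each object uses replacement fallback to handle strings that it cannot encode and bytes that it cannot decode, but you can specify that an exception should be thrown instead. For more information, see Replacement Fallback and Exception Fallback.
Call the Encoding(Int32) constructor and pass it an integer that represents the encoding. Because this constructor is protected, it can be called only from a class that derives from Encoding. Standard encoding objects use replacement fallback, and code page and double-byte character set (DBCS) encoding objects use best-fit fallback, to handle strings that they cannot encode and bytes that they cannot decode. For more information, see Best-Fit Fallback.
Call the Encoding.GetEncoding method, which returns any standard, code page, or DBCS encoding available in .NET. Overloads let you specify a fallback object for both the encoder and the decoder.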
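The access paths listed above might look like the following in code. This is a minimal sketch rather than a recommended pattern: the class name EncodingAccessSketch and the IsValidUtf8 helper are hypothetical, and the example sticks to the Unicode encodings so that it does not depend on any code page being available.

```csharp
using System;
using System.Text;

public static class EncodingAccessSketch
{
   // Returns true if the bytes are well-formed UTF-8. Uses a UTF8Encoding
   // constructed to throw on invalid bytes instead of substituting U+FFFD.
   public static bool IsValidUtf8(byte[] bytes)
   {
      Encoding strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                             throwOnInvalidBytes: true);
      try
      {
         strictUtf8.GetString(bytes);
         return true;
      }
      catch (DecoderFallbackException)
      {
         return false;
      }
   }

   public static void Main()
   {
      // Static property: a shared object that uses replacement fallback.
      Encoding utf16 = Encoding.Unicode;
      Console.WriteLine(utf16.WebName);

      // GetEncoding, by name or by code page number (65001 is UTF-8).
      Encoding byName = Encoding.GetEncoding("utf-8");
      Encoding byCodePage = Encoding.GetEncoding(65001);
      Console.WriteLine($"{byName.WebName} {byCodePage.WebName}");

      // Constructor: the byte 0xFF never appears in well-formed UTF-8.
      Console.WriteLine(IsValidUtf8(new byte[] { 0x41, 0xFF }));
   }
}
```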

You can retrieve information about all the encodings available in .NET by calling the Encoding.GetEncodings method. .NET supports the character encoding schemes listed in the following table.

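Encoding class	Description
ASCII	Encodes a limited range of characters by using the lower seven bits of a byte. Because this encoding supports only character values from `U+0000` through `U+007F`, in most cases it is inadequate for internationalized applications.
UTF-7	Represents characters as sequences of 7-bit ASCII characters. Non-ASCII Unicode characters are represented by an escape sequence of ASCII characters. UTF-7 supports protocols such as email and newsgroups. However, UTF-7 is not particularly secure or robust. In some cases, changing one bit can radically alter the interpretation of an entire UTF-7 string. In other cases, different UTF-7 strings can encode the same text. For sequences that include non-ASCII characters, UTF-7 requires more space than UTF-8, and encoding and decoding are slower. Consequently, use UTF-8 instead of UTF-7 if possible.
UTF-8	Represents each Unicode code point as a sequence of one to four bytes. UTF-8 supports 8-bit data sizes and works well with many existing operating systems. For the ASCII range of characters, UTF-8 is identical to ASCII encoding, and it also allows a broader set of characters. However, for Chinese-Japanese-Korean (CJK) scripts, UTF-8 can require three bytes for each character, so the data can be larger than with UTF-16. Sometimes the amount of ASCII data, such as HTML tags, offsets the size increase for the CJK range.
UTF-16	Represents each Unicode code point as a sequence of one or two 16-bit integers (code units). Most common Unicode characters require only one UTF-16 code unit, although Unicode supplementary characters (U+10000 and greater) require two UTF-16 surrogate code units. Both little-endian and big-endian byte orders are supported. UTF-16 encoding is used by the common language runtime to represent Char and String values, and by the Windows operating system to represent `WCHAR` values.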
UTF-32	Represents each Unicode code point as a 32-bit integer. Both little-endian and big-endian byte orders are supported. UTF-32 encoding is used when an application needs to avoid the surrogate behavior of UTF-16 encoding and can afford the larger encoded size. A single glyph rendered on a display can still be encoded with more than one UTF-32 character.
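ANSI/ISO encoding	Provides support for a variety of code pages. On Windows operating systems, code pages are used to support a specific language or group of languages. For a table that lists the code pages supported by .NET, see the Encoding class. You can retrieve an encoding object for a particular code page by calling the Encoding.GetEncoding(Int32) method. A code page contains 256 code points and is zero-based. In most code pages, code points 0 through 127 represent the ASCII character set, whereas the characters represented by code points 128 through 255 differ from one code page to another. For example, code page 1252 provides the character codes for Latin writing systems, including English, German, and French. The last 128 code points in code page 1252 contain the accented characters. Code page 1253 provides the character codes that are required in the Greek writing system. The last 128 code points in code page 1253 contain the Greek characters. As a result, an application that relies on ANSI code pages cannot store Greek and German in the same text stream unless it includes an identifier that indicates the referenced code page.
Double-byte character set (DBCS) encoding	Supports languages, such as Chinese, Japanese, and Korean, that contain more than 256 characters. In a DBCS, a pair of code points (a double byte) represents each character. The Encoding.IsSingleByte property returns `false` for DBCS encodings. You can retrieve an encoding object for a particular DBCS by calling the Encoding.GetEncoding(Int32) method. When an application handles DBCS data, the first byte of a DBCS character (the lead byte) is processed in combination with the trail byte that immediately follows it. Because a single pair of double-byte code points can represent different characters depending on the code page, this scheme does not allow two languages, such as Japanese and Chinese, to be combined in the same data stream.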

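By using these encodings, you can work with Unicode characters as well as with the encodings that are most commonly used in legacy applications. In addition, you can create a custom encoding by defining a class that derives from Encoding and overriding its members.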
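To see which of the encodings in the table are actually available at run time, the Encoding.GetEncodings method mentioned earlier can be enumerated. The sketch below is only an illustration: the class name EncodingListSketch and the IsListed helper are hypothetical, and the set of entries printed depends on the runtime and on any registered encoding providers.

```csharp
using System;
using System.Text;

public static class EncodingListSketch
{
   // Returns true if an encoding with the given name is reported by
   // Encoding.GetEncodings. The comparison ignores case.
   public static bool IsListed(string name)
   {
      foreach (EncodingInfo info in Encoding.GetEncodings())
      {
         if (string.Equals(info.Name, name, StringComparison.OrdinalIgnoreCase))
            return true;
      }
      return false;
   }

   public static void Main()
   {
      // Print the code page number, name, and display name of each encoding.
      foreach (EncodingInfo info in Encoding.GetEncodings())
         Console.WriteLine($"{info.CodePage,-6} {info.Name,-20} {info.DisplayName}");

      bool utf8Listed = IsListed("utf-8");
      Console.WriteLine($"utf-8 listed: {utf8Listed}");
   }
}
```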

.NET Core encoding support

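By default, the only encodings that .NET Core makes available are code page 28591 and the Unicode encodings, such as UTF-8 and UTF-16. No other code page encodings are available. However, you can add to your app the code page encodings found in standard Windows apps that target .NET. For more information, see the CodePagesEncodingProvider topic.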
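A minimal sketch of adding the code page encodings might look like the following. It assumes that the CodePagesEncodingProvider type is available to the app (on older .NET Core versions this may require referencing the System.Text.Encoding.CodePages package); the class name CodePageSketch and the GetWindows1252 helper are hypothetical.

```csharp
using System;
using System.Text;

public static class CodePageSketch
{
   // Registers the code page provider, then asks for code page 1252.
   public static Encoding GetWindows1252()
   {
      Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
      return Encoding.GetEncoding(1252);
   }

   public static void Main()
   {
      Encoding cp1252 = GetWindows1252();
      Console.WriteLine(cp1252.WebName);

      // LATIN SMALL LETTER A WITH DIAERESIS (U+00E4) occupies a single
      // byte in code page 1252.
      byte[] bytes = cp1252.GetBytes("\u00e4");
      Console.WriteLine(BitConverter.ToString(bytes));
   }
}
```

Registration is typically done once, early in application startup, before any call that requests a code page encoding.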

Selecting an encoding class

If you have the opportunity to choose the encoding that your application uses, use a Unicode encoding, preferably either UTF8Encoding or UnicodeEncoding. (.NET also supports a third Unicode encoding, UTF32Encoding.)

If you are planning to use an ASCII encoding (ASCIIEncoding), choose UTF8Encoding instead. The two encodings behave identically for the ASCII character set, but UTF8Encoding has the following advantages:

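It can represent every Unicode character, whereas ASCIIEncoding supports only the Unicode character values from U+0000 through U+007F.
It provides error detection and better security.
It has been tuned to be as fast as possible and is generally faster than other encodings. Even for content that is entirely ASCII, operations performed with UTF8Encoding are faster than operations performed with ASCIIEncoding.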

Consider using ASCIIEncoding only for legacy applications. However, even for legacy applications, UTF8Encoding might be a better choice for the following reasons (assuming default settings):

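If your application has content that is not strictly ASCII and encodes it with ASCIIEncoding, each non-ASCII character is encoded as a question mark (?). If the application then decodes this data, the information is lost.
If your application has content that is not strictly ASCII and encodes it with UTF8Encoding, the result seems unintelligible if it is interpreted as ASCII. However, if the application then uses a UTF-8 decoder to decode this data, the data round-trips successfully.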
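The two cases above can be checked with a short round-trip test. This is a sketch with a hypothetical class name (AsciiVersusUtf8Sketch), a hypothetical RoundTrip helper, and a sample string chosen for illustration; both encodings are used with their default settings.

```csharp
using System;
using System.Text;

public static class AsciiVersusUtf8Sketch
{
   // Encodes the text and immediately decodes it with the same encoding.
   public static string RoundTrip(Encoding encoding, string text) =>
      encoding.GetString(encoding.GetBytes(text));

   public static void Main()
   {
      // "café": the last character, U+00E9, is outside the ASCII range.
      string text = "caf\u00e9";

      string viaAscii = RoundTrip(Encoding.ASCII, text);
      string viaUtf8 = RoundTrip(Encoding.UTF8, text);

      Console.WriteLine($"ASCII preserved the text: {viaAscii == text}");
      Console.WriteLine($"UTF-8 preserved the text: {viaUtf8 == text}");
   }
}
```

With ASCIIEncoding the final character comes back as a question mark, so the first comparison is false; with UTF8Encoding the original string is recovered.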

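In a web application, the characters sent to the client in response to a web request should reflect the encoding used on the client. In most cases, set the HttpResponse.ContentEncoding property to the value returned by the HttpRequest.ContentEncoding property so that text is displayed in the encoding that the user expects.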

Using an encoding object

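An encoder converts a string of characters (most commonly, Unicode characters) to its numeric (byte) equivalent. For example, you might use an ASCII encoder to convert Unicode characters to ASCII so that they can be displayed at the console. To perform the conversion, call the Encoding.GetBytes method. To determine how many bytes are needed to store the encoded characters before you perform the encoding, call the GetByteCount method.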

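The following example uses a single byte array to encode strings in two separate operations. It maintains an index that indicates the starting position in the byte array for the next set of ASCII-encoded bytes. It calls the ASCIIEncoding.GetByteCount(String) method to ensure that the byte array is large enough to accommodate the encoded string. It then calls the ASCIIEncoding.GetBytes(String, Int32, Int32, Byte[], Int32) method to encode the characters in the string.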

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      string[] strings= { "This is the first sentence. ",
                          "This is the second sentence. " };
      Encoding asciiEncoding = Encoding.ASCII;

      // Create array of adequate size.
      byte[] bytes = new byte[49];
      // Create index for current position of array.
      int index = 0;

      Console.WriteLine("Strings to encode:");
      foreach (var stringValue in strings) {
         Console.WriteLine($"   {stringValue}");

         int count = asciiEncoding.GetByteCount(stringValue);
         if (count + index >=  bytes.Length)
            Array.Resize(ref bytes, bytes.Length + 50);

         int written = asciiEncoding.GetBytes(stringValue, 0,
                                              stringValue.Length,
                                              bytes, index);

         index = index + written;
      }
      Console.WriteLine("\nEncoded bytes:");
      Console.WriteLine("{0}", ShowByteValues(bytes, index));
      Console.WriteLine();

      // Decode the ASCII-encoded byte array to a string.
      string newString = asciiEncoding.GetString(bytes, 0, index);
      Console.WriteLine($"Decoded: {newString}");
   }

   private static string ShowByteValues(byte[] bytes, int last )
   {
      string returnString = "   ";
      for (int ctr = 0; ctr <= last - 1; ctr++) {
         if (ctr % 20 == 0)
            returnString += "\n   ";
         returnString += String.Format("{0:X2} ", bytes[ctr]);
      }
      return returnString;
   }
}
// The example displays the following output:
//       Strings to encode:
//          This is the first sentence.
//          This is the second sentence.
//
//       Encoded bytes:
//
//          54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
//          6E 74 65 6E 63 65 2E 20 54 68 69 73 20 69 73 20 74 68 65 20
//          73 65 63 6F 6E 64 20 73 65 6E 74 65 6E 63 65 2E 20
//
//       Decoded: This is the first sentence. This is the second sentence.

Imports System.Text

Module Example
    Public Sub Main()
        Dim strings() As String = {"This is the first sentence. ",
                                    "This is the second sentence. "}
        Dim asciiEncoding As Encoding = Encoding.ASCII

        ' Create array of adequate size.
        Dim bytes(50) As Byte
        ' Create index for current position of array.
        Dim index As Integer = 0

        Console.WriteLine("Strings to encode:")
        For Each stringValue In strings
            Console.WriteLine("   {0}", stringValue)

            Dim count As Integer = asciiEncoding.GetByteCount(stringValue)
            If count + index >= bytes.Length Then
                Array.Resize(bytes, bytes.Length + 50)
            End If
            Dim written As Integer = asciiEncoding.GetBytes(stringValue, 0,
                                                            stringValue.Length,
                                                            bytes, index)

            index = index + written
        Next
        Console.WriteLine()
        Console.WriteLine("Encoded bytes:")
        Console.WriteLine("{0}", ShowByteValues(bytes, index))
        Console.WriteLine()

        ' Decode the ASCII-encoded byte array to a string.
        Dim newString As String = asciiEncoding.GetString(bytes, 0, index)
        Console.WriteLine("Decoded: {0}", newString)
    End Sub

    Private Function ShowByteValues(bytes As Byte(), last As Integer) As String
        Dim returnString As String = "   "
        For ctr As Integer = 0 To last - 1
            If ctr Mod 20 = 0 Then returnString += vbCrLf + "   "
            returnString += String.Format("{0:X2} ", bytes(ctr))
        Next
        Return returnString
    End Function
End Module
' The example displays the following output:
'       Strings to encode:
'          This is the first sentence.
'          This is the second sentence.
'       
'       Encoded bytes:
'       
'          54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
'          6E 74 65 6E 63 65 2E 20 54 68 69 73 20 69 73 20 74 68 65 20
'          73 65 63 6F 6E 64 20 73 65 6E 74 65 6E 63 65 2E 20
'       
'       Decoded: This is the first sentence. This is the second sentence.

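A decoder converts a byte array that reflects a particular character encoding into a set of characters, either in a character array or in a string. To decode a byte array into a character array, call the Encoding.GetChars method. To decode a byte array into a string, call the GetString method. To determine how many characters are needed to store the decoded bytes before you perform the decoding, call the GetCharCount method.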

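The following example encodes three strings and then decodes them into a single array of characters. It maintains an index that indicates the starting position in the character array for the next set of decoded characters. It calls the GetCharCount method to ensure that the character array is large enough to accommodate all the decoded characters. It then calls the ASCIIEncoding.GetChars(Byte[], Int32, Int32, Char[], Int32) method to decode the byte array.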

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      string[] strings = { "This is the first sentence. ",
                           "This is the second sentence. ",
                           "This is the third sentence. " };
      Encoding asciiEncoding = Encoding.ASCII;
      // Array to hold encoded bytes.
      byte[] bytes;
      // Array to hold decoded characters.
      char[] chars = new char[50];
      // Create index for current position of character array.
      int index = 0;

      foreach (var stringValue in strings) {
         Console.WriteLine($"String to Encode: {stringValue}");
         // Encode the string to a byte array.
         bytes = asciiEncoding.GetBytes(stringValue);
         // Display the encoded bytes.
         Console.Write("Encoded bytes: ");
         for (int ctr = 0; ctr < bytes.Length; ctr++)
            Console.Write(" {0}{1:X2}",
                          ctr % 20 == 0 ? Environment.NewLine : "",
                          bytes[ctr]);
         Console.WriteLine();

         // Decode the bytes to a single character array.
         int count = asciiEncoding.GetCharCount(bytes);
         if (count + index >=  chars.Length)
            Array.Resize(ref chars, chars.Length + 50);

         int written = asciiEncoding.GetChars(bytes, 0,
                                              bytes.Length,
                                              chars, index);
         index = index + written;
         Console.WriteLine();
      }

      // Instantiate a single string containing all the decoded characters
      // (index is the number of characters written to the array).
      string decodedString = new string(chars, 0, index);
      Console.WriteLine("Decoded string: ");
      Console.WriteLine(decodedString);
   }
}
// The example displays the following output:
//    String to Encode: This is the first sentence.
//    Encoded bytes:
//    54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
//    6E 74 65 6E 63 65 2E 20
//
//    String to Encode: This is the second sentence.
//    Encoded bytes:
//    54 68 69 73 20 69 73 20 74 68 65 20 73 65 63 6F 6E 64 20 73
//    65 6E 74 65 6E 63 65 2E 20
//
//    String to Encode: This is the third sentence.
//    Encoded bytes:
//    54 68 69 73 20 69 73 20 74 68 65 20 74 68 69 72 64 20 73 65
//    6E 74 65 6E 63 65 2E 20
//
//    Decoded string:
//    This is the first sentence. This is the second sentence. This is the third sentence.

Imports System.Text

Module Example
    Public Sub Main()
        Dim strings() As String = {"This is the first sentence. ",
                                    "This is the second sentence. ",
                                    "This is the third sentence. "}
        Dim asciiEncoding As Encoding = Encoding.ASCII
        ' Array to hold encoded bytes.
        Dim bytes() As Byte
        ' Array to hold decoded characters.
        Dim chars(50) As Char
        ' Create index for current position of character array.
        Dim index As Integer

        For Each stringValue In strings
            Console.WriteLine("String to Encode: {0}", stringValue)
            ' Encode the string to a byte array.
            bytes = asciiEncoding.GetBytes(stringValue)
            ' Display the encoded bytes.
            Console.Write("Encoded bytes: ")
            For ctr As Integer = 0 To bytes.Length - 1
                Console.Write(" {0}{1:X2}", If(ctr Mod 20 = 0, vbCrLf, ""),
                                            bytes(ctr))
            Next
            Console.WriteLine()

            ' Decode the bytes to a single character array.
            Dim count As Integer = asciiEncoding.GetCharCount(bytes)
            If count + index >= chars.Length Then
                Array.Resize(chars, chars.Length + 50)
            End If
            Dim written As Integer = asciiEncoding.GetChars(bytes, 0,
                                                            bytes.Length,
                                                            chars, index)
            index = index + written
            Console.WriteLine()
        Next

        ' Instantiate a single string containing all the decoded characters
        ' (index is the number of characters written to the array).
        Dim decodedString As New String(chars, 0, index)
        Console.WriteLine("Decoded string: ")
        Console.WriteLine(decodedString)
    End Sub
End Module
' The example displays the following output:
'    String to Encode: This is the first sentence.
'    Encoded bytes:
'    54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
'    6E 74 65 6E 63 65 2E 20
'    
'    String to Encode: This is the second sentence.
'    Encoded bytes:
'    54 68 69 73 20 69 73 20 74 68 65 20 73 65 63 6F 6E 64 20 73
'    65 6E 74 65 6E 63 65 2E 20
'    
'    String to Encode: This is the third sentence.
'    Encoded bytes:
'    54 68 69 73 20 69 73 20 74 68 65 20 74 68 69 72 64 20 73 65
'    6E 74 65 6E 63 65 2E 20
'    
'    Decoded string:
'    This is the first sentence. This is the second sentence. This is the third sentence.

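The encoding and decoding methods of a class derived from Encoding are designed to work on a complete set of data; that is, all the data to be encoded or decoded is supplied in a single method call. In some cases, however, data arrives in a stream, and the data to be encoded or decoded is available only from separate read operations. The encoding or decoding operation must then remember any state saved from its previous invocation. Methods of classes derived from Encoder and Decoder can handle encoding and decoding operations that span multiple method calls.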

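An Encoder object for a particular encoding is available from that encoding's Encoding.GetEncoder method. A Decoder object for a particular encoding is available from that encoding's Encoding.GetDecoder method. For decoding operations, note that classes derived from Decoder include a Decoder.GetChars method, but they do not have a method that corresponds to Encoding.GetString.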
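On the encoding side, a stateful Encoder might be used as in the following sketch, which encodes text that arrives in chunks and carries a pending high surrogate from one call to the next. The class name ChunkedEncoderSketch, the EncodeInChunks helper, and the way the input is split are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ChunkedEncoderSketch
{
   // Encodes text that arrives in several chunks with a single Encoder,
   // so that state (such as a pending high surrogate) carries over
   // from one call to the next.
   public static byte[] EncodeInChunks(Encoding encoding, params string[] chunks)
   {
      Encoder encoder = encoding.GetEncoder();
      var result = new List<byte>();

      for (int i = 0; i < chunks.Length; i++)
      {
         char[] chars = chunks[i].ToCharArray();
         // Flush only on the last chunk; earlier calls may hold data back.
         bool flush = i == chunks.Length - 1;

         int needed = encoder.GetByteCount(chars, 0, chars.Length, flush);
         byte[] buffer = new byte[needed];
         int written = encoder.GetBytes(chars, 0, chars.Length, buffer, 0, flush);
         for (int j = 0; j < written; j++)
            result.Add(buffer[j]);
      }
      return result.ToArray();
   }

   public static void Main()
   {
      // The surrogate pair U+D800 U+DC05 is split across two chunks.
      byte[] chunked = EncodeInChunks(Encoding.UTF8, "A\uD800", "\uDC05");
      byte[] whole = Encoding.UTF8.GetBytes("A\uD800\uDC05");

      Console.WriteLine(BitConverter.ToString(chunked));
      Console.WriteLine(BitConverter.ToString(whole));
   }
}
```

Because the encoder holds the high surrogate until the low surrogate arrives, the chunked result should match the bytes produced by encoding the whole string in one call.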

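The following example illustrates the difference between using the Encoding.GetString and Decoder.GetChars methods to decode a Unicode byte array. The example encodes a string that contains some Unicode characters to a file, and then uses the two decoding methods to decode the file ten bytes at a time. The surrogate pair occupies the tenth and eleventh UTF-16 code units of the string (bytes 19 through 22 of the file), so the read boundary after byte 20 splits it and its two halves are decoded in separate method calls. As the output shows, the Encoding.GetString method cannot decode these bytes correctly and instead replaces them with U+FFFD (REPLACEMENT CHARACTER). The Decoder.GetChars method, on the other hand, decodes the byte array successfully and recovers the original string.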

using System;
using System.IO;
using System.Text;

public class Example
{
   public static void Main()
   {
      // Use default replacement fallback for invalid encoding.
      UnicodeEncoding enc = new UnicodeEncoding(true, false, false);

      // Define a string with various Unicode characters.
      string str1 = "AB YZ 19 \uD800\udc05 \u00e4";
      str1 += "Unicode characters. \u00a9 \u010C s \u0062\u0308";
      Console.WriteLine("Created original string...\n");

      // Convert string to byte array.
      byte[] bytes = enc.GetBytes(str1);

      FileStream fs = File.Create(@".\characters.bin");
      BinaryWriter bw = new BinaryWriter(fs);
      bw.Write(bytes);
      bw.Close();

      // Read bytes from file.
      FileStream fsIn = File.OpenRead(@".\characters.bin");
      BinaryReader br = new BinaryReader(fsIn);

      const int count = 10;            // Number of bytes to read at a time.
      byte[] bytesRead = new byte[10]; // Buffer (byte array).
      int read;                        // Number of bytes actually read.
      string str2 = String.Empty;      // Decoded string.

      // Try using Encoding object for all operations.
      do {
         read = br.Read(bytesRead, 0, count);
         str2 += enc.GetString(bytesRead, 0, read);
      } while (read == count);
      br.Close();
      Console.WriteLine("Decoded string using UnicodeEncoding.GetString()...");
      CompareForEquality(str1, str2);
      Console.WriteLine();

      // Use Decoder for all operations.
      fsIn = File.OpenRead(@".\characters.bin");
      br = new BinaryReader(fsIn);
      Decoder decoder = enc.GetDecoder();
      char[] chars = new char[50];
      int index = 0;                   // Next character to write in array.
      int written = 0;                 // Number of chars written to array.
      do {
         read = br.Read(bytesRead, 0, count);
         if (index + decoder.GetCharCount(bytesRead, 0, read) - 1 >= chars.Length)
            Array.Resize(ref chars, chars.Length + 50);

         written = decoder.GetChars(bytesRead, 0, read, chars, index);
         index += written;
      } while (read == count);
      br.Close();
      // Instantiate a string with the decoded characters.
      string str3 = new String(chars, 0, index);
      Console.WriteLine("Decoded string using UnicodeEncoding.Decoder.GetString()...");
      CompareForEquality(str1, str3);
   }

   private static void CompareForEquality(string original, string decoded)
   {
      bool result = original.Equals(decoded);
      Console.WriteLine("original = decoded: {0}",
                        original.Equals(decoded, StringComparison.Ordinal));
      if (! result) {
         Console.WriteLine("Code points in original string:");
         foreach (var ch in original)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));
         Console.WriteLine();

         Console.WriteLine("Code points in decoded string:");
         foreach (var ch in decoded)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));
         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//    Created original string...
//
//    Decoded string using UnicodeEncoding.GetString()...
//    original = decoded: False
//    Code points in original string:
//    0041 0042 0020 0059 005A 0020 0031 0039 0020 D800 DC05 0020 00E4 0055 006E 0069 0063 006F
//    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
//    0020 0073 0020 0062 0308
//    Code points in decoded string:
//    0041 0042 0020 0059 005A 0020 0031 0039 0020 FFFD FFFD 0020 00E4 0055 006E 0069 0063 006F
//    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
//    0020 0073 0020 0062 0308
//
//    Decoded string using UnicodeEncoding.Decoder.GetString()...
//    original = decoded: True

Imports System.IO
Imports System.Text

Module Example
    Public Sub Main()
        ' Use default replacement fallback for invalid encoding.
        Dim enc As New UnicodeEncoding(True, False, False)

        ' Define a string with various Unicode characters.
        Dim str1 As String = String.Format("AB YZ 19 {0}{1} {2}",
                                           ChrW(&hD800), ChrW(&hDC05), ChrW(&h00e4))
        str1 += String.Format("Unicode characters. {0} {1} s {2}{3}",
                              ChrW(&h00a9), ChrW(&h010C), ChrW(&h0062), ChrW(&h0308))
        Console.WriteLine("Created original string...")
        Console.WriteLine()

        ' Convert string to byte array.                     
        Dim bytes() As Byte = enc.GetBytes(str1)

        Dim fs As FileStream = File.Create(".\characters.bin")
        Dim bw As New BinaryWriter(fs)
        bw.Write(bytes)
        bw.Close()

        ' Read bytes from file.
        Dim fsIn As FileStream = File.OpenRead(".\characters.bin")
        Dim br As New BinaryReader(fsIn)

        Const count As Integer = 10      ' Number of bytes to read at a time. 
        Dim bytesRead(9) As Byte         ' Buffer (byte array).
        Dim read As Integer              ' Number of bytes actually read. 
        Dim str2 As String = ""          ' Decoded string.

        ' Try using Encoding object for all operations.
        Do
            read = br.Read(bytesRead, 0, count)
            str2 += enc.GetString(bytesRead, 0, read)
        Loop While read = count
        br.Close()
        Console.WriteLine("Decoded string using UnicodeEncoding.GetString()...")
        CompareForEquality(str1, str2)
        Console.WriteLine()

        ' Use Decoder for all operations.
        fsIn = File.OpenRead(".\characters.bin")
        br = New BinaryReader(fsIn)
        Dim decoder As Decoder = enc.GetDecoder()
        Dim chars(50) As Char
        Dim index As Integer = 0         ' Next character to write in array.
        Dim written As Integer = 0       ' Number of chars written to array.
        Do
            read = br.Read(bytesRead, 0, count)
            If index + decoder.GetCharCount(bytesRead, 0, read) - 1 >= chars.Length Then
                Array.Resize(chars, chars.Length + 50)
            End If
            written = decoder.GetChars(bytesRead, 0, read, chars, index)
            index += written
        Loop While read = count
        br.Close()
        ' Instantiate a string with the decoded characters.
        Dim str3 As New String(chars, 0, index)
        Console.WriteLine("Decoded string using UnicodeEncoding.Decoder.GetString()...")
        CompareForEquality(str1, str3)
    End Sub

    Private Sub CompareForEquality(original As String, decoded As String)
        Dim result As Boolean = original.Equals(decoded)
        Console.WriteLine("original = decoded: {0}",
                          original.Equals(decoded, StringComparison.Ordinal))
        If Not result Then
            Console.WriteLine("Code points in original string:")
            For Each ch In original
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()

            Console.WriteLine("Code points in decoded string:")
            For Each ch In decoded
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'    Created original string...
'    
'    Decoded string using UnicodeEncoding.GetString()...
'    original = decoded: False
'    Code points in original string:
'    0041 0042 0020 0059 005A 0020 0031 0039 0020 D800 DC05 0020 00E4 0055 006E 0069 0063 006F
'    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
'    0020 0073 0020 0062 0308
'    Code points in decoded string:
'    0041 0042 0020 0059 005A 0020 0031 0039 0020 FFFD FFFD 0020 00E4 0055 006E 0069 0063 006F
'    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
'    0020 0073 0020 0062 0308
'    
'    Decoded string using UnicodeEncoding.Decoder.GetString()...
'    original = decoded: True

Choosing a fallback strategy

When a method tries to encode or decode a character but no mapping exists, it must implement a fallback strategy that determines how the failed mapping is handled. There are three types of fallback strategies:

Best-Fit Fallback
Replacement Fallback
Exception Fallback

Important

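The most common problems in encoding operations occur when a Unicode character cannot be mapped to a particular code page encoding. The most common problems in decoding operations occur when an invalid byte sequence cannot be translated into valid Unicode characters. For these reasons, you should know which fallback strategy a particular encoding object uses. Whenever possible, specify the fallback strategy that an encoding object uses when you instantiate the object.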
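One way to make the fallback strategy explicit at instantiation time is sketched below. It requests an ASCII encoding object with exception fallback for both directions, so that unmappable characters are reported instead of silently substituted. The class name StrictEncodingSketch and the TryEncodeAscii helper are hypothetical; ASCII is used only because it is available without registering a code page provider.

```csharp
using System;
using System.Text;

public static class StrictEncodingSketch
{
   // Tries to encode the text as ASCII. Returns false, rather than
   // substituting a replacement, if any character has no ASCII mapping.
   public static bool TryEncodeAscii(string text, out byte[] bytes)
   {
      // The fallback strategy is chosen when the object is retrieved.
      Encoding strictAscii = Encoding.GetEncoding("us-ascii",
                                                  new EncoderExceptionFallback(),
                                                  new DecoderExceptionFallback());
      try
      {
         bytes = strictAscii.GetBytes(text);
         return true;
      }
      catch (EncoderFallbackException)
      {
         bytes = new byte[0];
         return false;
      }
   }

   public static void Main()
   {
      byte[] bytes;
      Console.WriteLine(TryEncodeAscii("plain text", out bytes));

      // INFINITY (U+221E) has no ASCII mapping, so encoding is refused.
      Console.WriteLine(TryEncodeAscii("\u221e", out bytes));
   }
}
```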

Best-Fit Fallback

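When a character does not have an exact match in the target encoding, the encoder can try to map it to a similar character. (Best-fit fallback is mostly an encoding issue rather than a decoding issue. Very few code pages contain characters that cannot be mapped successfully to Unicode.) Best-fit fallback is the default for the code page and double-byte character set encodings that are retrieved by the Encoding.GetEncoding(Int32) and Encoding.GetEncoding(String) overloads.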

Note

In theory, the Unicode encoding classes provided in .NET (UTF8Encoding, UnicodeEncoding, and UTF32Encoding) support every character in every character set, so you can use them to eliminate best-fit fallback issues.

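Best-fit strategies vary from one code page to another. For example, for some code pages, full-width Latin characters map to the more common half-width Latin characters. For other code pages, this mapping is not made. Even under an aggressive best-fit strategy, some characters have no conceivable fit in some encodings. For example, a Chinese ideograph has no reasonable mapping to code page 1252. In this case, a replacement string is used. By default, this string is a single QUESTION MARK (U+003F).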

Note

Best-fit strategies are not documented in detail. However, several code pages are documented at the Unicode Consortium's website. For a description of how to interpret the mapping files, see the readme.txt file in that folder.

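The following example uses code page 1252 (the Windows code page for Western European languages) to illustrate best-fit mapping and its drawbacks. It first calls the Encoding.GetEncoding(Int32) method to retrieve an encoding object for code page 1252. By default, this object uses best-fit mapping for Unicode characters that it does not support. The example then instantiates a string that contains three non-ASCII characters separated by spaces: CIRCLED LATIN CAPITAL LETTER S (U+24C8), SUPERSCRIPT FIVE (U+2075), and INFINITY (U+221E). As the output shows, when the string is encoded, the three original non-space characters are replaced by QUESTION MARK (U+003F), DIGIT FIVE (U+0035), and DIGIT EIGHT (U+0038). DIGIT EIGHT is a particularly poor replacement for the unsupported INFINITY character, and QUESTION MARK indicates that no mapping was available for the original character.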

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      // Get an encoding for code page 1252 (Western Europe character set).
      Encoding cp1252 = Encoding.GetEncoding(1252);

      // Define and display a string.
      string str = "\u24c8 \u2075 \u221e";
      Console.WriteLine("Original string: " + str);
      Console.Write("Code points in string: ");
      foreach (var ch in str)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine("\n");

      // Encode a Unicode string.
      Byte[] bytes = cp1252.GetBytes(str);
      Console.Write("Encoded bytes: ");
      foreach (byte byt in bytes)
         Console.Write("{0:X2} ", byt);
      Console.WriteLine("\n");

      // Decode the string.
      string str2 = cp1252.GetString(bytes);
      Console.WriteLine("String round-tripped: {0}", str.Equals(str2));
      if (! str.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));
      }
   }
}
// The example displays the following output:
//       Original string: Ⓢ ⁵ ∞
//       Code points in string: 24C8 0020 2075 0020 221E
//
//       Encoded bytes: 3F 20 35 20 38
//
//       String round-tripped: False
//       ? 5 8
//       003F 0020 0035 0020 0038

Imports System.Text

Module Example
    Public Sub Main()
        ' Get an encoding for code page 1252 (Western Europe character set).
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        ' Define and display a string.
        Dim str As String = String.Format("{0} {1} {2}", ChrW(&h24c8), ChrW(&H2075), ChrW(&h221E))
        Console.WriteLine("Original string: " + str)
        Console.Write("Code points in string: ")
        For Each ch In str
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Encode a Unicode string.
        Dim bytes() As Byte = cp1252.GetBytes(str)
        Console.Write("Encoded bytes: ")
        For Each byt In bytes
            Console.Write("{0:X2} ", byt)
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Decode the string.
        Dim str2 As String = cp1252.GetString(bytes)
        Console.WriteLine("String round-tripped: {0}", str.Equals(str2))
        If Not str.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
        End If
    End Sub
End Module
' The example displays the following output:
'       Original string: Ⓢ ⁵ ∞
'       Code points in string: 24C8 0020 2075 0020 221E
'       
'       Encoded bytes: 3F 20 35 20 38
'       
'       String round-tripped: False
'       ? 5 8
'       003F 0020 0035 0020 0038

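Best-fit mapping is the default behavior for an Encoding object that encodes Unicode data into code page data, and some legacy applications rely on this behavior. However, most new applications should avoid best-fit behavior for security reasons. For example, applications should not put a domain name through a best-fit encoding.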

Note

You can also implement a custom best-fit fallback mapping for an encoding. For more information, see the Implementing a Custom Fallback Strategy section.

If best-fit fallback is the default for an encoding object, you can choose another fallback strategy when you retrieve the Encoding object by calling the Encoding.GetEncoding(Int32, EncoderFallback, DecoderFallback) or Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) overload. The following example replaces each character that cannot be mapped to code page 1252 with an asterisk (*).

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding cp1252r = Encoding.GetEncoding(1252,
                                  new EncoderReplacementFallback("*"),
                                  new DecoderReplacementFallback("*"));

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine();

      byte[] bytes = cp1252r.GetBytes(str1);
      string str2 = cp1252r.GetString(bytes);
      Console.WriteLine("Round-trip: {0}", str1.Equals(str2));
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//       Round-trip: False
//       * * *
//       002A 0020 002A 0020 002A

Imports System.Text

Module Example
    Public Sub Main()
        Dim cp1252r As Encoding = Encoding.GetEncoding(1252,
                                           New EncoderReplacementFallback("*"),
                                           New DecoderReplacementFallback("*"))

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()

        Dim bytes() As Byte = cp1252r.GetBytes(str1)
        Dim str2 As String = cp1252r.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       Round-trip: False
'       * * *
'       002A 0020 002A 0020 002A

Replacement Fallback

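When a character has no exact match in the target scheme and there is no appropriate character that it can be mapped to, the application can specify a replacement character or string. This is the default behavior for the Unicode decoder, which replaces any two-byte sequence that it cannot decode with REPLACEMENT_CHARACTER (U+FFFD). It is also the default behavior of the ASCIIEncoding class, which replaces each character that it cannot encode or decode with a question mark. The following example illustrates character replacement for the Unicode string from the previous example. As the output shows, each character that cannot be encoded as an ASCII byte value is replaced by 0x3F, the ASCII code for a question mark.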

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding enc = Encoding.ASCII;

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine("\n");

      // Encode the original string using the ASCII encoder.
      byte[] bytes = enc.GetBytes(str1);
      Console.Write("Encoded bytes: ");
      foreach (var byt in bytes)
         Console.Write("{0:X2} ", byt);
      Console.WriteLine("\n");

      // Decode the ASCII bytes.
      string str2 = enc.GetString(bytes);
      Console.WriteLine("Round-trip: {0}", str1.Equals(str2));
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//
//       Encoded bytes: 3F 20 3F 20 3F
//
//       Round-trip: False
//       ? ? ?
//       003F 0020 003F 0020 003F

Imports System.Text

Module Example
    Public Sub Main()
        Dim enc As Encoding = Encoding.ASCII

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Encode the original string using the ASCII encoder.
        Dim bytes() As Byte = enc.GetBytes(str1)
        Console.Write("Encoded bytes: ")
        For Each byt In bytes
            Console.Write("{0:X2} ", byt)
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Decode the ASCII bytes.
        Dim str2 As String = enc.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       
'       Encoded bytes: 3F 20 3F 20 3F
'       
'       Round-trip: False
'       ? ? ?
'       003F 0020 003F 0020 003F

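.NET includes the EncoderReplacementFallback and DecoderReplacementFallback classes, which substitute a replacement string when a character does not map exactly in an encoding or decoding operation. By default, this replacement string is a question mark, but you can call a class constructor overload to choose a different string. Typically, the replacement string is a single character, although this is not a requirement. The following example changes the behavior of the code page 1252 encoder by instantiating an EncoderReplacementFallback object that uses an asterisk (*) as the replacement string.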

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding cp1252r = Encoding.GetEncoding(1252,
                                  new EncoderReplacementFallback("*"),
                                  new DecoderReplacementFallback("*"));

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine();

      byte[] bytes = cp1252r.GetBytes(str1);
      string str2 = cp1252r.GetString(bytes);
      Console.WriteLine("Round-trip: {0}", str1.Equals(str2));
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//       Round-trip: False
//       * * *
//       002A 0020 002A 0020 002A

Imports System.Text

Module Example
    Public Sub Main()
        Dim cp1252r As Encoding = Encoding.GetEncoding(1252,
                                           New EncoderReplacementFallback("*"),
                                           New DecoderReplacementFallback("*"))

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()

        Dim bytes() As Byte = cp1252r.GetBytes(str1)
        Dim str2 As String = cp1252r.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       Round-trip: False
'       * * *
'       002A 0020 002A 0020 002A

Note

You can also implement a replacement class for an encoding. For more information, see the Implementing a Custom Fallback Strategy section.

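In addition to QUESTION MARK (U+003F), the Unicode REPLACEMENT CHARACTER (U+FFFD) is commonly used as a replacement string, particularly when decoding byte sequences that cannot be translated into Unicode characters. However, you are free to choose any replacement string, and it can contain multiple characters.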
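The two points above can be tried out with a minimal sketch. The class and method names (ReplacementDemo, DecodeUtf8, RoundTripAscii) are illustrative only and are not part of .NET. The first call relies on the default replacement fallback of Encoding.UTF8; the second supplies a three-character replacement string.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class ReplacementDemo
{
   // Decodes bytes as UTF-8 using the default replacement fallback of Encoding.UTF8.
   public static string DecodeUtf8(byte[] bytes)
   {
      return Encoding.UTF8.GetString(bytes);
   }

   // Encodes to ASCII and decodes again, using a caller-supplied replacement string.
   public static string RoundTripAscii(string text, string replacement)
   {
      Encoding enc = Encoding.GetEncoding("us-ascii",
                                          new EncoderReplacementFallback(replacement),
                                          new DecoderReplacementFallback(replacement));
      return enc.GetString(enc.GetBytes(text));
   }

   public static void Main()
   {
      // 0xFF never occurs in well-formed UTF-8, so it is replaced with U+FFFD.
      string decoded = DecodeUtf8(new byte[] { 0x41, 0xFF, 0x42 });
      foreach (char ch in decoded)
         Console.Write("{0:X4} ", (ushort) ch);
      Console.WriteLine();

      // A replacement string may contain more than one character.
      Console.WriteLine(RoundTripAscii("\u24C8 \u2075 \u221E", "(?)"));
   }
}
// The example displays the following output:
//       0041 FFFD 0042
//       (?) (?) (?)
```

A multi-character replacement string makes substitutions easier to spot in logs, but it also changes the length of the encoded data, so it may not suit fixed-width formats.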

Exception Fallback

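Instead of providing a best-fit fallback or a replacement string, an encoder can throw an EncoderFallbackException if it is unable to encode a set of characters, and a decoder can throw a DecoderFallbackException if it is unable to decode a byte array. To throw an exception in encoding and decoding operations, pass an EncoderExceptionFallback object and a DecoderExceptionFallback object, respectively, to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method. The following example illustrates exception fallback with the ASCIIEncoding class.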

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding enc = Encoding.GetEncoding("us-ascii",
                                          new EncoderExceptionFallback(),
                                          new DecoderExceptionFallback());

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine("\n");

      // Encode the original string using the ASCII encoder.
      byte[] bytes = {};
      try {
         bytes = enc.GetBytes(str1);
         Console.Write("Encoded bytes: ");
         foreach (var byt in bytes)
            Console.Write("{0:X2} ", byt);

         Console.WriteLine();
      }
      catch (EncoderFallbackException e) {
         Console.Write("Exception: ");
         if (e.IsUnknownSurrogate())
            Console.WriteLine("Unable to encode surrogate pair 0x{0:X4} 0x{1:X4} at index {2}.",
                              Convert.ToUInt16(e.CharUnknownHigh),
                              Convert.ToUInt16(e.CharUnknownLow),
                              e.Index);
         else
            Console.WriteLine("Unable to encode 0x{0:X4} at index {1}.",
                              Convert.ToUInt16(e.CharUnknown),
                              e.Index);
         return;
      }
      Console.WriteLine();

      // Decode the ASCII bytes.
      try {
         string str2 = enc.GetString(bytes);
         Console.WriteLine("Round-trip: {0}", str1.Equals(str2));
         if (! str1.Equals(str2)) {
            Console.WriteLine(str2);
            foreach (var ch in str2)
               Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

            Console.WriteLine();
         }
      }
      catch (DecoderFallbackException e) {
         Console.Write("Unable to decode byte(s) ");
         foreach (byte unknown in e.BytesUnknown)
            Console.Write("0x{0:X2} ", unknown);

         Console.WriteLine($"at index {e.Index}");
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//
//       Exception: Unable to encode 0x24C8 at index 0.

Imports System.Text

Module Example
    Public Sub Main()
        Dim enc As Encoding = Encoding.GetEncoding("us-ascii",
                                                   New EncoderExceptionFallback(),
                                                   New DecoderExceptionFallback())

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Encode the original string using the ASCII encoder.
        Dim bytes() As Byte = {}
        Try
            bytes = enc.GetBytes(str1)
            Console.Write("Encoded bytes: ")
            For Each byt In bytes
                Console.Write("{0:X2} ", byt)
            Next
            Console.WriteLine()
        Catch e As EncoderFallbackException
            Console.Write("Exception: ")
            If e.IsUnknownSurrogate() Then
                Console.WriteLine("Unable to encode surrogate pair 0x{0:X4} 0x{1:X4} at index {2}.",
                                  Convert.ToUInt16(e.CharUnknownHigh),
                                  Convert.ToUInt16(e.CharUnknownLow),
                                  e.Index)
            Else
                Console.WriteLine("Unable to encode 0x{0:X4} at index {1}.",
                                  Convert.ToUInt16(e.CharUnknown),
                                  e.Index)
            End If
            Exit Sub
        End Try
        Console.WriteLine()

        ' Decode the ASCII bytes.
        Try
            Dim str2 As String = enc.GetString(bytes)
            Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
            If Not str1.Equals(str2) Then
                Console.WriteLine(str2)
                For Each ch In str2
                    Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
                Next
                Console.WriteLine()
            End If
        Catch e As DecoderFallbackException
            Console.Write("Unable to decode byte(s) ")
            For Each unknown As Byte In e.BytesUnknown
                Console.Write("0x{0:X2} ", unknown)
            Next
            Console.WriteLine("at index {0}", e.Index)
        End Try
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       
'       Exception: Unable to encode 0x24C8 at index 0.

Note

You can also implement a custom exception handler for an encoding operation. For more information, see the Implementing a Custom Fallback Strategy section.

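The EncoderFallbackException and DecoderFallbackException objects provide the following information about the condition that caused the exception:

The EncoderFallbackException object includes an IsUnknownSurrogate method, which indicates whether the character or characters that cannot be encoded represent an unknown surrogate pair (in which case the method returns true) or an unknown single character (in which case the method returns false). The characters in the surrogate pair are available from the EncoderFallbackException.CharUnknownHigh and EncoderFallbackException.CharUnknownLow properties. The unknown single character is available from the EncoderFallbackException.CharUnknown property. The EncoderFallbackException.Index property indicates the position in the string of the first character that could not be encoded.
The DecoderFallbackException object includes a BytesUnknown property that returns the array of bytes that cannot be decoded. The DecoderFallbackException.Index property indicates the starting position of the unknown bytes.

Although the EncoderFallbackException and DecoderFallbackException objects provide adequate diagnostic information about the exception, they do not provide access to the encoding or decoding buffer. Therefore, invalid data cannot be replaced or corrected within the encoding or decoding method.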
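As a rough sketch of how the decoder-side diagnostics might be consumed, the following code reports the undecodable bytes instead of trying to repair them. The DecoderDiagnostics class and its Describe method are hypothetical names. The sketch assumes that a UTF8Encoding constructed with throwOnInvalidBytes set to true is an acceptable stand-in for passing a DecoderExceptionFallback object to Encoding.GetEncoding.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class DecoderDiagnostics
{
   // Returns a description of the first undecodable byte sequence,
   // or null if the bytes decode cleanly.
   public static string Describe(byte[] bytes)
   {
      // The second argument (throwOnInvalidBytes) selects exception fallback.
      var utf8 = new UTF8Encoding(false, true);
      try {
         utf8.GetString(bytes);
         return null;
      }
      catch (DecoderFallbackException e) {
         // Only diagnostic data is available here; the decoding buffer is not.
         return String.Format("bytes {0} at index {1}",
                              BitConverter.ToString(e.BytesUnknown), e.Index);
      }
   }

   public static void Main()
   {
      // 0xFF is not valid anywhere in a UTF-8 sequence.
      Console.WriteLine(Describe(new byte[] { 0x48, 0x69, 0xFF, 0x21 }));
   }
}
```

Because the exception carries no reference to the buffer, a caller that wants to recover typically has to decode again with a different fallback strategy or correct the input before retrying.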

Implementing a Custom Fallback Strategy

In addition to the best-fit mapping that is implemented internally by code pages, .NET includes the following classes for implementing a fallback strategy:

EncoderReplacementFallback and EncoderReplacementFallbackBuffer replace characters in encoding operations.
DecoderReplacementFallback and DecoderReplacementFallbackBuffer replace characters in decoding operations.
EncoderExceptionFallback and EncoderExceptionFallbackBuffer throw an EncoderFallbackException when a character cannot be encoded.
DecoderExceptionFallback and DecoderExceptionFallbackBuffer throw a DecoderFallbackException when a byte sequence cannot be decoded.

In addition, you can implement a custom solution that uses best-fit fallback, replacement fallback, or exception fallback by following these steps:

Derive a class from EncoderFallback for encoding operations, and from DecoderFallback for decoding operations.
Derive a class from EncoderFallbackBuffer for encoding operations, and from DecoderFallbackBuffer for decoding operations.
For exception fallback, if the predefined EncoderFallbackException and DecoderFallbackException classes do not meet your needs, derive a class from an exception object such as Exception or ArgumentException.

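Deriving from EncoderFallback or DecoderFallback

To implement a custom fallback solution, you must create a class that inherits from EncoderFallback for encoding operations, and from DecoderFallback for decoding operations. Instances of these classes are passed to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method and serve as the intermediary between the encoding class and the fallback implementation.

When you create a custom fallback solution for an encoder or decoder, you must implement the following members:

The EncoderFallback.MaxCharCount or DecoderFallback.MaxCharCount property, which returns the maximum number of characters that the best-fit, replacement, or exception fallback can return to replace a single character. For a custom exception fallback, its value is zero.
The EncoderFallback.CreateFallbackBuffer or DecoderFallback.CreateFallbackBuffer method, which returns your custom EncoderFallbackBuffer or DecoderFallbackBuffer implementation. The method is called by the encoder when it encounters the first character that it cannot encode, or by the decoder when it encounters the first byte that it cannot decode.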

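Deriving from EncoderFallbackBuffer or DecoderFallbackBuffer

To implement a custom fallback solution, you must also create a class that inherits from EncoderFallbackBuffer for encoding operations, and from DecoderFallbackBuffer for decoding operations. Instances of these classes are returned by the CreateFallbackBuffer method of the EncoderFallback and DecoderFallback classes. The EncoderFallback.CreateFallbackBuffer method is called by the encoder when it encounters the first character that it cannot encode, and the DecoderFallback.CreateFallbackBuffer method is called by the decoder when it encounters one or more bytes that it cannot decode. The EncoderFallbackBuffer and DecoderFallbackBuffer classes provide the fallback implementation. Each instance represents a buffer that contains the fallback characters that replace the character that cannot be encoded or the byte sequence that cannot be decoded.

When you create a custom fallback solution for an encoder or decoder, you must implement the following members:

The EncoderFallbackBuffer.Fallback or DecoderFallbackBuffer.Fallback method. EncoderFallbackBuffer.Fallback is called by the encoder to provide the fallback buffer with information about the character that it cannot encode. Because the character to be encoded may be a surrogate pair, this method is overloaded. One overload is passed the character to be encoded and its index in the string. The second overload is passed the high and low surrogates along with their index in the string. The DecoderFallbackBuffer.Fallback method is called by the decoder to provide the fallback buffer with information about the bytes that it cannot decode. This method is passed the array of bytes that cannot be decoded, along with the index of the first byte. The fallback method should return true if the fallback buffer can supply one or more best-fit or replacement characters; otherwise, it should return false. For an exception fallback, the fallback method should throw an exception.
The EncoderFallbackBuffer.GetNextChar or DecoderFallbackBuffer.GetNextChar method, which is called repeatedly by the encoder or decoder to get the next character from the fallback buffer. When all fallback characters have been returned, the method should return U+0000.
The EncoderFallbackBuffer.Remaining or DecoderFallbackBuffer.Remaining property, which returns the number of characters remaining in the fallback buffer.
The EncoderFallbackBuffer.MovePrevious or DecoderFallbackBuffer.MovePrevious method, which moves the current position in the fallback buffer to the previous character.
The EncoderFallbackBuffer.Reset or DecoderFallbackBuffer.Reset method, which reinitializes the fallback buffer.

If the fallback implementation is a best-fit fallback or a replacement fallback, the classes derived from EncoderFallbackBuffer and DecoderFallbackBuffer also maintain two private instance fields: the exact number of characters in the buffer, and the index of the next character in the buffer to return.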
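The members listed above can be seen together in a small decoder-side sketch. HexDecoderFallback, HexDecoderFallbackBuffer, and HexFallbackDemo are hypothetical names chosen for illustration. The strategy of rendering each undecodable byte as "[XX]" is only one possible design, and the MaxCharCount value assumes that no single fallback call receives more than four bytes.

```csharp
using System;
using System.Text;

// Hypothetical custom replacement fallback: each undecodable byte becomes "[XX]".
public class HexDecoderFallback : DecoderFallback
{
   // Four characters per byte, assuming at most four bytes per fallback call.
   public override int MaxCharCount { get { return 16; } }

   public override DecoderFallbackBuffer CreateFallbackBuffer()
   {
      return new HexDecoderFallbackBuffer();
   }
}

public class HexDecoderFallbackBuffer : DecoderFallbackBuffer
{
   string chars = String.Empty;   // Replacement characters for the current fallback
   int pos = 0;                   // Index of the next character to return

   public override bool Fallback(byte[] bytesUnknown, int index)
   {
      var sb = new StringBuilder();
      foreach (byte b in bytesUnknown)
         sb.AppendFormat("[{0:X2}]", b);
      chars = sb.ToString();
      pos = 0;
      return chars.Length > 0;
   }

   public override char GetNextChar()
   {
      // U+0000 signals that all fallback characters have been returned.
      return pos < chars.Length ? chars[pos++] : '\u0000';
   }

   public override bool MovePrevious()
   {
      if (pos == 0) return false;
      pos--;
      return true;
   }

   public override int Remaining { get { return chars.Length - pos; } }

   public override void Reset()
   {
      chars = String.Empty;
      pos = 0;
   }
}

public static class HexFallbackDemo
{
   public static string Decode(byte[] bytes)
   {
      Encoding enc = Encoding.GetEncoding("us-ascii",
                                          new EncoderReplacementFallback("?"),
                                          new HexDecoderFallback());
      return enc.GetString(bytes);
   }

   public static void Main()
   {
      // 0xE9 is outside the 7-bit ASCII range, so the decoder falls back.
      Console.WriteLine(Decode(new byte[] { 0x48, 0x69, 0xE9, 0x21 }));
   }
}
// The example displays the following output:
//       Hi[E9]!
```

Here the two private fields are the replacement string itself (whose length gives the character count) and the position of the next character to return, which keeps GetNextChar, MovePrevious, and Remaining consistent with one another.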

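An EncoderFallback example

An earlier example used replacement fallback to replace Unicode characters that did not correspond to ASCII characters with an asterisk (*). The following example instead uses a custom best-fit fallback implementation to provide a better mapping of non-ASCII characters.

The following code defines a class named CustomMapper that is derived from EncoderFallback to handle the best-fit mapping of non-ASCII characters. Its CreateFallbackBuffer method returns a CustomMapperFallbackBuffer object, which provides the EncoderFallbackBuffer implementation. The CustomMapper class uses a Dictionary<TKey,TValue> object to store the mappings of unsupported Unicode characters (the key value) and their corresponding 8-bit characters (each stored in two consecutive bytes of a 64-bit integer). To make this mapping available to the fallback buffer, the CustomMapper instance is passed as a parameter to the CustomMapperFallbackBuffer class constructor. Because the longest mapping is the string "INF" for the Unicode character U+221E, the MaxCharCount property returns 3.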

public class CustomMapper : EncoderFallback
{
   public string DefaultString;
   internal Dictionary<ushort, ulong> mapping;

   public CustomMapper() : this("*")
   {
   }

   public CustomMapper(string defaultString)
   {
      this.DefaultString = defaultString;

      // Create table of mappings
      mapping = new Dictionary<ushort, ulong>();
      mapping.Add(0x24C8, 0x53);
      mapping.Add(0x2075, 0x35);
      mapping.Add(0x221E, 0x49004E0046);
   }

   public override EncoderFallbackBuffer CreateFallbackBuffer()
   {
      return new CustomMapperFallbackBuffer(this);
   }

   public override int MaxCharCount
   {
      get { return 3; }
   }
}

Public Class CustomMapper : Inherits EncoderFallback
    Public DefaultString As String
    Friend mapping As Dictionary(Of UShort, ULong)

    Public Sub New()
        Me.New("*")
    End Sub

    Public Sub New(ByVal defaultString As String)
        Me.DefaultString = defaultString

        ' Create table of mappings
        mapping = New Dictionary(Of UShort, ULong)
        mapping.Add(&H24C8, &H53)
        mapping.Add(&H2075, &H35)
        mapping.Add(&H221E, &H49004E0046)
    End Sub

    Public Overrides Function CreateFallbackBuffer() As System.Text.EncoderFallbackBuffer
        Return New CustomMapperFallbackBuffer(Me)
    End Function

    Public Overrides ReadOnly Property MaxCharCount As Integer
        Get
            Return 3
        End Get
    End Property
End Class

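The following code defines the CustomMapperFallbackBuffer class, which is derived from EncoderFallbackBuffer. The dictionary of best-fit mappings defined in the CustomMapper instance is available through the instance passed to the class constructor. For a single character, the Fallback method loads either the mapped characters from the dictionary or, if the character has no mapping, the default string, and then returns true. It returns false for surrogate pairs, or if characters from a previous fallback are still waiting to be returned. For each fallback, the private count variable indicates the number of characters that remain to be returned, and the private index variable indicates the position in the string buffer, charsToReturn, of the next character to return.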

public class CustomMapperFallbackBuffer : EncoderFallbackBuffer
{
   int count = -1;                   // Number of characters to return
   int index = -1;                   // Index of character to return
   CustomMapper fb;
   string charsToReturn;

   public CustomMapperFallbackBuffer(CustomMapper fallback)
   {
      this.fb = fallback;
   }

   public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
   {
      // Do not try to map surrogates to ASCII.
      return false;
   }

   public override bool Fallback(char charUnknown, int index)
   {
      // Return false if there are already characters to map.
      if (count >= 1) return false;

      // Determine number of characters to return.
      charsToReturn = String.Empty;

      ushort key = Convert.ToUInt16(charUnknown);
      if (fb.mapping.ContainsKey(key)) {
         byte[] bytes = BitConverter.GetBytes(fb.mapping[key]);
         int ctr = 0;
         foreach (var byt in bytes) {
            if (byt > 0) {
               ctr++;
               charsToReturn += (char) byt;
            }
         }
         count = ctr;
      }
      else {
         // Return default.
         charsToReturn = fb.DefaultString;
         count = 1;
      }
      this.index = charsToReturn.Length - 1;

      return true;
   }

   public override char GetNextChar()
   {
      // We'll return a character if possible, so subtract from the count of chars to return.
      count--;
      // If count is less than zero, we've returned all characters.
      if (count < 0)
         return '\u0000';

      this.index--;
      return charsToReturn[this.index + 1];
   }

   public override bool MovePrevious()
   {
      // Original: if count >= -1 and pos >= 0
      if (count >= -1) {
         count++;
         return true;
      }
      else {
         return false;
      }
   }

   public override int Remaining
   {
      get { return count < 0 ? 0 : count; }
   }

   public override void Reset()
   {
      count = -1;
      index = -1;
   }
}

Public Class CustomMapperFallbackBuffer : Inherits EncoderFallbackBuffer

    Dim count As Integer = -1        ' Number of characters to return
    Dim index As Integer = -1        ' Index of character to return
    Dim fb As CustomMapper
    Dim charsToReturn As String

    Public Sub New(ByVal fallback As CustomMapper)
        MyBase.New()
        Me.fb = fallback
    End Sub

    Public Overloads Overrides Function Fallback(ByVal charUnknownHigh As Char, ByVal charUnknownLow As Char, ByVal index As Integer) As Boolean
        ' Do not try to map surrogates to ASCII.
        Return False
    End Function

    Public Overloads Overrides Function Fallback(ByVal charUnknown As Char, ByVal index As Integer) As Boolean
        ' Return false if there are already characters to map.
        If count >= 1 Then Return False

        ' Determine number of characters to return.
        charsToReturn = String.Empty

        Dim key As UShort = Convert.ToUInt16(charUnknown)
        If fb.mapping.ContainsKey(key) Then
            Dim bytes() As Byte = BitConverter.GetBytes(fb.mapping.Item(key))
            Dim ctr As Integer
            For Each byt In bytes
                If byt > 0 Then
                    ctr += 1
                    charsToReturn += Chr(byt)
                End If
            Next
            count = ctr
        Else
            ' Return default.
            charsToReturn = fb.DefaultString
            count = 1
        End If
        Me.index = charsToReturn.Length - 1

        Return True
    End Function

    Public Overrides Function GetNextChar() As Char
        ' We'll return a character if possible, so subtract from the count of chars to return.
        count -= 1
        ' If count is less than zero, we've returned all characters.
        If count < 0 Then Return ChrW(0)

        Me.index -= 1
        Return charsToReturn(Me.index + 1)
    End Function

    Public Overrides Function MovePrevious() As Boolean
        ' Original: if count >= -1 and pos >= 0
        If count >= -1 Then
            count += 1
            Return True
        Else
            Return False
        End If
    End Function

    Public Overrides ReadOnly Property Remaining As Integer
        Get
            Return If(count < 0, 0, count)
        End Get
    End Property

    Public Overrides Sub Reset()
        count = -1
        index = -1
    End Sub
End Class
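The packed mapping values, and the reason GetNextChar reads charsToReturn from the end toward the start, can be checked in isolation with a short sketch. MappingUnpackDemo and Unpack are hypothetical names; the loop mirrors the byte-unpacking step in the Fallback method above. The behavior described assumes a little-endian platform (BitConverter.IsLittleEndian is true), where BitConverter.GetBytes returns the least significant byte first.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class MappingUnpackDemo
{
   // Returns the non-zero bytes of a packed mapping value as characters,
   // in the order in which BitConverter.GetBytes yields them.
   public static string Unpack(ulong packed)
   {
      string result = String.Empty;
      foreach (byte b in BitConverter.GetBytes(packed))
         if (b > 0)
            result += (char) b;
      return result;
   }

   public static void Main()
   {
      // 0x49004E0046 packs 'I' (0x49), 'N' (0x4E), and 'F' (0x46).
      string unpacked = Unpack(0x49004E0046);
      Console.WriteLine("Little-endian: {0}", BitConverter.IsLittleEndian);
      Console.WriteLine("Unpacked order: {0}", unpacked);

      // Reading the unpacked string backward, as GetNextChar does,
      // restores the intended order on a little-endian platform.
      char[] reversed = unpacked.ToCharArray();
      Array.Reverse(reversed);
      Console.WriteLine("Returned order: {0}", new string(reversed));
   }
}
```

On a little-endian platform the unpacked order is "FNI" and the returned order is "INF". On a big-endian platform the two would be swapped, so a portable implementation might store the replacement text as a string rather than as a packed integer.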

The following code instantiates a CustomMapper object and passes it to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method. The output indicates that the best-fit fallback implementation successfully handles the three non-ASCII characters in the original string.

using System;
using System.Collections.Generic;
using System.Text;

class Program
{
   static void Main()
   {
      Encoding enc = Encoding.GetEncoding("us-ascii", new CustomMapper(), new DecoderExceptionFallback());

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      for (int ctr = 0; ctr <= str1.Length - 1; ctr++) {
         Console.Write("{0} ", Convert.ToUInt16(str1[ctr]).ToString("X4"));
         if (ctr == str1.Length - 1)
            Console.WriteLine();
      }
      Console.WriteLine();

      // Encode the original string using the ASCII encoder.
      byte[] bytes = enc.GetBytes(str1);
      Console.Write("Encoded bytes: ");
      foreach (var byt in bytes)
         Console.Write("{0:X2} ", byt);

      Console.WriteLine("\n");

      // Decode the ASCII bytes.
      string str2 = enc.GetString(bytes);
      Console.WriteLine("Round-trip: {0}", str1.Equals(str2));
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}

Imports System.Text
Imports System.Collections.Generic

Module Module1

    Sub Main()
        Dim enc As Encoding = Encoding.GetEncoding("us-ascii", New CustomMapper(), New DecoderExceptionFallback())

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&H24C8), ChrW(&H2075), ChrW(&H221E))
        Console.WriteLine(str1)
        For ctr As Integer = 0 To str1.Length - 1
            Console.Write("{0} ", Convert.ToUInt16(str1(ctr)).ToString("X4"))
            If ctr = str1.Length - 1 Then Console.WriteLine()
        Next
        Console.WriteLine()

        ' Encode the original string using the ASCII encoder.
        Dim bytes() As Byte = enc.GetBytes(str1)
        Console.Write("Encoded bytes: ")
        For Each byt In bytes
            Console.Write("{0:X2} ", byt)
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Decode the ASCII bytes.
        Dim str2 As String = enc.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
