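Using Unicode Encoding

Article
08/21/2008

Updated: November 2007

Applications that target the common language runtime use encoding to map character representations from the native character scheme (Unicode) to other schemes. Applications use decoding to map characters from non-native schemes (non-Unicode) to the native scheme. The System.Text namespace provides several classes that allow your applications to encode and decode characters. Understanding Encodings provides an introduction to these classes.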

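Unicode Transformation Formats

The Unicode Standard assigns a code point (a number) to each character in every supported script. A Unicode Transformation Format (UTF) is a way to encode that code point. The Unicode Standard version 3.2 uses the UTFs and other encodings defined in the following table. Regardless of the encoding used, .NET Framework strings are internally native UTF-16 strings.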

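Unicode UTF-32 encoding
Represents Unicode characters as sequences of 32-bit integers. Your application can use the UTF32Encoding class to convert characters to and from UTF-32 encoding.

UTF-32 is useful when an application wants to avoid the surrogate behavior of UTF-16 and the size of the encoded data is not a primary concern. Note that a single "glyph" rendered on a display can still be encoded with more than one UTF-32 character. The supplementary characters affected by surrogate behavior are currently far less common than Unicode BMP characters.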
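Unicode UTF-16 encoding
Represents Unicode characters as sequences of 16-bit integers. Your application can use the UnicodeEncoding class to convert characters to and from UTF-16 encoding.

UTF-16 is often used natively, as in the .NET char type, the Windows WCHAR type, and other common types. Most commonly used Unicode code points take only one UTF-16 code unit (2 bytes). Unicode supplementary characters, U+10000 and above, require two UTF-16 surrogate code units.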
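Unicode UTF-8 encoding
Represents Unicode characters as sequences of 8-bit bytes. Your application can use the UTF8Encoding class to convert characters to and from UTF-8 encoding.

UTF-8 encodes in 8-bit data units and works well with many existing operating systems. For the ASCII range of characters, UTF-8 is identical to ASCII encoding, while also supporting a much broader set of characters. However, for CJK scripts, UTF-8 can require three bytes for each character, potentially producing larger data than UTF-16. Note that the amount of ASCII data mixed in, such as HTML tags, sometimes justifies the increased size for the CJK range.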
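Unicode UTF-7 encoding
Represents Unicode characters as sequences of 7-bit ASCII characters. Your application can use the UTF7Encoding class to convert characters to and from UTF-7 encoding. Non-ASCII Unicode characters are represented by an escape sequence of ASCII characters.

UTF-7 supports certain protocols that require it, most often e-mail and newsgroup protocols. However, UTF-7 is not particularly secure or robust. In some cases, changing one bit can radically alter the interpretation of an entire UTF-7 string. In other cases, different UTF-7 strings can encode the same text. For sequences that include non-ASCII characters, UTF-7 is much less space-efficient than UTF-8, and encoding and decoding are slower. Consequently, your applications should generally prefer UTF-8 to UTF-7.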
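ASCII encoding
Encodes the Latin alphabet as single 7-bit ASCII characters. Because this encoding supports only character values from U+0000 through U+007F, in most cases it is inadequate for internationalized applications. Your application can use the ASCIIEncoding class to convert characters to and from ASCII encoding. For examples of using this class in code, see Encoding Base Types.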
ANSI/ISO encodings
Used for non-Unicode encodings. The Encoding class supports a wide range of ANSI/ISO encodings.
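
The size trade-offs described in the table can be checked directly. The following is a minimal sketch, not part of the original samples: the class name EncodingSizes and the three sample characters are chosen here purely for illustration. It counts the bytes that UTF-8, UTF-16, and UTF-32 need for an ASCII letter, a CJK ideograph, and a supplementary character.

```csharp
using System;
using System.Text;

// Hypothetical class name, used only for this illustration.
public static class EncodingSizes
{
    public static void Main()
    {
        // An ASCII letter, a CJK ideograph (U+4E2D), and a supplementary
        // character (U+1D11E), which a .NET string stores as a surrogate pair.
        string[] samples = { "A", "\u4e2d", "\U0001D11E" };

        Encoding utf8 = new UTF8Encoding(false);             // no byte order mark
        Encoding utf16 = new UnicodeEncoding(false, false);  // little-endian, no BOM
        Encoding utf32 = new UTF32Encoding(false, false);    // little-endian, no BOM

        foreach (string s in samples)
        {
            // s.Length counts UTF-16 code units, not Unicode code points.
            Console.WriteLine(
                "U+{0:X4}: chars={1}, UTF-8={2}, UTF-16={3}, UTF-32={4}",
                char.ConvertToUtf32(s, 0), s.Length,
                utf8.GetByteCount(s), utf16.GetByteCount(s), utf32.GetByteCount(s));
        }
    }
}
```

With these inputs the ASCII letter takes 1, 2, and 4 bytes, the CJK ideograph takes 3, 2, and 4 bytes, and the supplementary character takes 4 bytes in all three encodings while occupying two char values in the string.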

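Passing Binary Data in Strings

Random collections of numbers, whether bytes or characters, do not make valid strings or valid Unicode. Your application cannot convert an arbitrary byte array to Unicode and back and expect the result to match. Some characters and code point sequences are illegal in Unicode 5.0 and cannot be converted with any of the Unicode encodings. If your application must pass binary data in string form, it should use base 64 or another format designed specifically for that purpose.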
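
To make the point concrete, here is a small sketch under stated assumptions: the class name BinaryInString and the four sample bytes are illustrative only. It compares a round trip through Encoding.UTF8 with a round trip through Convert.ToBase64String and Convert.FromBase64String.

```csharp
using System;
using System.Text;

// Hypothetical class name, used only for this illustration.
public static class BinaryInString
{
    public static void Main()
    {
        // Arbitrary binary data; 0xFF and 0xFE never appear in valid UTF-8.
        byte[] data = { 0x00, 0xFF, 0xFE, 0x41 };

        // Treating the bytes as UTF-8 text: invalid bytes are replaced while
        // decoding, so the original values cannot be recovered.
        string asText = Encoding.UTF8.GetString(data);
        byte[] fromText = Encoding.UTF8.GetBytes(asText);
        Console.WriteLine("UTF-8 round trip preserved data: {0}",
            BitConverter.ToString(fromText) == BitConverter.ToString(data));

        // Base 64 is designed to carry binary data as text without loss.
        string base64 = Convert.ToBase64String(data);
        byte[] restored = Convert.FromBase64String(base64);
        Console.WriteLine("Base 64 round trip preserved data: {0}",
            BitConverter.ToString(restored) == BitConverter.ToString(data));
    }
}
```

The first line reports False and the second reports True, which is why base 64 is the safer choice for carrying binary payloads in a string.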

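Using the Encoding Class

Your application can use the GetEncoding method to return an encoding object for a specified encoding. The application can then use the GetBytes method to convert a Unicode string to its byte representation in that encoding.

The following code example uses the GetEncoding method to create a target encoding object for a specified code page. The GetBytes method is called on the target encoding object to convert a Unicode string to its byte representation in the target encoding. The byte representation of the string in the specified code page is then displayed.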

Imports System
Imports System.IO
Imports System.Globalization
Imports System.Text

Public Class Encoding_UnicodeToCP
   Public Shared Sub Main()
      ' Converts ASCII characters to bytes.
      ' Displays the string's byte representation in the 
      ' specified code page.
      ' Code page 1252 represents Latin characters.
      PrintCPBytes("Hello, World!", 1252)
      ' Code page 932 represents Japanese characters.
      PrintCPBytes("Hello, World!", 932)
      
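      ' Converts Japanese characters to bytes.
      ' Visual Basic has no \u escape sequence, so the string is built with ChrW.
      Dim japanese As String = ChrW(&H307B) & "," & ChrW(&H308B) & "," & _
         ChrW(&H305A) & "," & ChrW(&H3042) & "," & ChrW(&H306D)
      PrintCPBytes(japanese, 1252)
      PrintCPBytes(japanese, 932)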
   End Sub

   Public Shared Sub PrintCPBytes(str As String, codePage As Integer)
      Dim targetEncoding As Encoding
      Dim encodedChars() As Byte      
      
      ' Gets the encoding for the specified code page.
      targetEncoding = Encoding.GetEncoding(codePage)
      
      ' Gets the byte representation of the specified string.
      encodedChars = targetEncoding.GetBytes(str)
      
      ' Prints the bytes.
      Console.WriteLine("Byte representation of '{0}' in CP '{1}':", _
         str, codePage)
      Dim i As Integer
      For i = 0 To encodedChars.Length - 1
         Console.WriteLine("Byte {0}: {1}", i, encodedChars(i))
      Next i
   End Sub
End Class

using System;
using System.IO;
using System.Globalization;
using System.Text;

public class Encoding_UnicodeToCP
{
   public static void Main()
   {
      // Converts ASCII characters to bytes.
      // Displays the string's byte representation in the 
      // specified code page.
      // Code page 1252 represents Latin characters.
      PrintCPBytes("Hello, World!",1252);
      // Code page 932 represents Japanese characters.
      PrintCPBytes("Hello, World!",932);

      // Converts Japanese characters to bytes.
      PrintCPBytes("\u307b,\u308b,\u305a,\u3042,\u306d",1252);
      PrintCPBytes("\u307b,\u308b,\u305a,\u3042,\u306d",932);
   }

   public static void PrintCPBytes(string str, int codePage)
   {
      Encoding targetEncoding;
      byte[] encodedChars;

      // Gets the encoding for the specified code page.
      targetEncoding = Encoding.GetEncoding(codePage);

      // Gets the byte representation of the specified string.
      encodedChars = targetEncoding.GetBytes(str);

      // Prints the bytes.
      Console.WriteLine("Byte representation of '{0}' in Code Page '{1}':",
         str, codePage);
      for (int i = 0; i < encodedChars.Length; i++)
         Console.WriteLine("Byte {0}: {1}", i, encodedChars[i]);
   }
}

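Note:
If you use this code in a console application, the specified Unicode text elements might not be displayed correctly. Support for Unicode characters in the console environment varies with the version of the Windows operating system that is running.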
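
The example above only encodes. As a hedged sketch of the opposite direction, the following code decodes the bytes again with GetString and checks whether the original text survives. The class name RoundTrip and the helper EncodeThenDecode are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical class and helper names, used only for this illustration.
public static class RoundTrip
{
    // Encodes text to bytes and decodes those bytes with the same encoding.
    public static string EncodeThenDecode(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string text = "\u307b\u308b";   // two hiragana characters

        // UTF-8 can represent every Unicode character, so the text survives.
        Console.WriteLine(EncodeThenDecode(Encoding.UTF8, text) == text);

        // ASCII cannot represent hiragana; each character becomes '?'.
        Console.WriteLine(EncodeThenDecode(Encoding.ASCII, text) == text);
        Console.WriteLine(EncodeThenDecode(Encoding.ASCII, text) == "??");
    }
}
```

This prints True, False, and True. A target encoding that cannot represent a character substitutes a replacement character by default instead of failing, so data loss of this kind can go unnoticed.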

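You can use these methods in an ASP.NET application to determine the encoding to use for response characters. The application should set the ContentEncoding property to the value returned by the appropriate method. The following code example shows how to set HttpResponse.ContentEncoding.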

' Explicitly sets ContentEncoding to UTF-8.
Response.ContentEncoding = Encoding.UTF8

' Sets ContentEncoding using the name of an encoding.
Response.ContentEncoding = Encoding.GetEncoding(name)

' Sets ContentEncoding using a code page number.
Response.ContentEncoding = Encoding.GetEncoding(codepageNumber)

// Explicitly sets the encoding to UTF-8.
Response.ContentEncoding = Encoding.UTF8;

// Sets ContentEncoding using the name of an encoding.
Response.ContentEncoding = Encoding.GetEncoding(name);

// Sets ContentEncoding using a code page number.
Response.ContentEncoding = Encoding.GetEncoding(codepageNumber);

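For most ASP.NET applications, you should match the HttpResponse.ContentEncoding property to the HttpRequest.ContentEncoding property so that text is displayed in the encoding the user expects.

For more information about using encodings in ASP.NET, see the Multiple Encodings sample in the Common Tasks QuickStart and the Setting Culture and Encoding sample in the ASP.NET QuickStart.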

See Also

Concepts

Understanding Encodings

Unicode in the .NET Framework
