Comparing and Sorting Data for a Specific Culture
Updated: August 2011
Conventions for sorting and ordering data vary from culture to culture. For example, sort order may be case-sensitive or case-insensitive. It may be based on phonetics or on the visual representation of characters. In East Asian languages, characters are sorted by the stroke and radical of ideographs. Sorting also depends on the order languages and cultures use for the alphabet. For example, the Danish language has an "Æ" character that it sorts after "Z" in the alphabet. The German language also has this character, but sorts it like "ae", after "A" in the alphabet. A world-ready application must be able to compare and sort data on a per-culture basis to support culture-specific and language-specific sorting conventions.
Note: In some scenarios, culture-sensitive behavior is not desirable. For more information about when and how to perform culture-insensitive operations, see Culture-Insensitive String Operations.
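As a minimal sketch of the culture-insensitive case mentioned in the note, the following C# fragment compares strings by code unit value with StringComparison.Ordinal and StringComparison.OrdinalIgnoreCase. The class and method names (OrdinalComparisonSketch, CompareOrdinal, EqualsIgnoreCase) are hypothetical and used only for illustration.

```csharp
using System;

public static class OrdinalComparisonSketch
{
    // Compares two strings by UTF-16 code unit values; the current culture is not consulted.
    public static int CompareOrdinal(string left, string right)
    {
        return String.Compare(left, right, StringComparison.Ordinal);
    }

    // Tests equality by code unit values after ordinal case folding.
    public static bool EqualsIgnoreCase(string left, string right)
    {
        return String.Equals(left, right, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // 'A' (U+0041) has a lower code unit value than 'Æ' (U+00C6),
        // so the ordinal result is negative regardless of the current culture.
        Console.WriteLine(CompareOrdinal("Apple", "Æble") < 0);

        // Typical use: identifiers such as file names or keys, not text shown to users.
        Console.WriteLine(EqualsIgnoreCase("FILE.TXT", "file.txt"));
    }
}
```

Ordinal comparisons of this kind suit identifiers and protocol data; text that is displayed to users in sorted order generally still calls for the culture-sensitive methods described in the rest of this topic.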
Comparing Strings
The CompareInfo class provides methods you can use to perform culture-sensitive string comparisons. The CultureInfo class has a CompareInfo property that gets an instance of the CompareInfo class. A CompareInfo object defines how to compare and sort strings for a specific culture. The String.Compare method uses the information in a culture's CompareInfo object to compare strings.
The following example illustrates how the String.Compare method evaluates two strings ("Apple" and "Æble") differently, depending on the culture used for the comparison. First, the System.Threading.Thread.CurrentThread.CurrentCulture property is set to da-DK for the Danish (Denmark) culture. The Danish language treats the character "Æ" as an individual letter and sorts it after "Z" in the alphabet. Therefore, the string "Æble" is determined to be greater than "Apple" for the Danish (Denmark) culture. Next, the System.Threading.Thread.CurrentThread.CurrentCulture property is set to en-US for the English (United States) culture. The English language treats the character "Æ" as a special symbol and sorts it before the letter "A" in the alphabet. Therefore, the string "Æble" is determined to be less than "Apple" for the English (United States) culture.
Imports System.Globalization
Imports System.Threading
Public Class TestClass
Public Shared Sub Main()
Dim str1 As String = "Apple"
Dim str2 As String = "Æble"
' Set the current culture to Danish in Denmark.
Thread.CurrentThread.CurrentCulture = New CultureInfo("da-DK")
Dim result1 As Integer = [String].Compare(str1, str2)
Console.WriteLine("When the CurrentCulture is ""da-DK"",")
Console.WriteLine("the result of comparing {0} with {1} is: {2}",
str1, str2, result1)
Console.WriteLine()
' Set the current culture to English in the U.S.
Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
Dim result2 As Integer = [String].Compare(str1, str2)
Console.WriteLine("When the CurrentCulture is ""en-US"",")
Console.WriteLine("the result of comparing {0} with {1} is: {2}",
str1, str2, result2)
End Sub
End Class
' The example displays the following output:
' When the CurrentCulture is "da-DK",
' the result of comparing Apple with Æble is: -1
'
' When the CurrentCulture is "en-US",
' the result of comparing Apple with Æble is: 1
using System;
using System.Globalization;
using System.Threading;
public class CompareStringSample
{
public static void Main()
{
string str1 = "Apple";
string str2 = "Æble";
// Sets the CurrentCulture to Danish in Denmark.
Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
// Compares the two strings.
int result1 = String.Compare(str1, str2);
Console.WriteLine("\nWhen the CurrentCulture is \"da-DK\",\nthe " +
"result of comparing {0} with {1} is: {2}", str1, str2,
result1);
// Sets the CurrentCulture to English in the U.S.
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
// Compares the two strings.
int result2 = String.Compare(str1, str2);
Console.WriteLine("\nWhen the CurrentCulture is \"en-US\",\nthe " +
"result of comparing {0} with {1} is: {2}", str1, str2,
result2);
}
}
// The example displays the following output:
// When the CurrentCulture is "da-DK",
// the result of comparing Apple with Æble is: -1
//
// When the CurrentCulture is "en-US",
// the result of comparing Apple with Æble is: 1
For more information about comparing strings, see Comparing Strings.
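The comparisons above change the thread's current culture. As a hedged alternative sketch, a CompareInfo object obtained from a specific CultureInfo can be used directly, which leaves the current culture untouched. The names ExplicitCultureCompareSketch and CompareFor are hypothetical. The sketch prints only the sign of each result, and no output is listed here because culture-sensitive results can vary with the runtime and the operating system's collation data.

```csharp
using System;
using System.Globalization;

public static class ExplicitCultureCompareSketch
{
    // Compares two strings with the rules of the named culture,
    // without modifying Thread.CurrentThread.CurrentCulture.
    public static int CompareFor(string cultureName, string left, string right)
    {
        CompareInfo compareInfo = new CultureInfo(cultureName).CompareInfo;
        return compareInfo.Compare(left, right, CompareOptions.None);
    }

    public static void Main()
    {
        foreach (string name in new[] { "da-DK", "en-US" })
        {
            // Only the sign of the result is meaningful, so normalize it before printing.
            int sign = Math.Sign(CompareFor(name, "Apple", "Æble"));
            Console.WriteLine("{0}: comparing Apple with Æble gives {1}", name, sign);
        }
    }
}
```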
Using Alternate Sort Orders
Some cultures support more than one sort order. For example, the zh-CN (Chinese - PRC) culture supports two sort orders: by pronunciation (default) and by stroke count. When you create a CultureInfo object by using a culture name (for example, zh-CN), the default sort order is used. To specify the alternate sort order, create a CultureInfo object by calling the CultureInfo.CultureInfo(Int32) or CultureInfo.CultureInfo(Int32, Boolean) constructors and using the identifier for the alternate sort order, and then obtain a CompareInfo object from the CompareInfo property to use in string comparisons. Alternatively, you can create a CompareInfo object directly by using the CompareInfo.GetCompareInfo method, and specify the identifier for the alternate sort order.
The following table lists the cultures that support alternate sort orders and the identifiers for the default and alternate sort orders.
Culture name | Culture | Default sort name and identifier | Alternate sort name and identifier |
---|---|---|---|
es-ES | Spanish (Spain) | International: 0x00000C0A | Traditional: 0x0000040A |
zh-TW | Chinese (Taiwan) | Stroke Count: 0x00000404 | Bopomofo: 0x00030404 |
zh-CN | Chinese (PRC) | Pronunciation: 0x00000804 | Stroke Count: 0x00020804 |
zh-HK | Chinese (Hong Kong SAR) | Stroke Count: 0x00000c04 | Stroke Count: 0x00020c04 |
zh-SG | Chinese (Singapore) | Pronunciation: 0x00001004 | Stroke Count: 0x00021004 |
zh-MO | Chinese (Macao SAR) | Pronunciation: 0x00001404 | Stroke Count: 0x00021404 |
ja-JP | Japanese (Japan) | Default: 0x00000411 | Unicode: 0x00010411 |
ko-KR | Korean (Korea) | Default: 0x00000412 | Korean Xwansung - Unicode: 0x00010412 |
de-DE | German (Germany) | Dictionary: 0x00000407 | Phone Book Sort DIN: 0x00010407 |
hu-HU | Hungarian (Hungary) | Default: 0x0000040e | Technical Sort: 0x0001040e |
ka-GE | Georgian (Georgia) | Traditional: 0x00000437 | Modern Sort: 0x00010437 |
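The following sketch shows one way the identifiers in the table might be used with the CompareInfo.GetCompareInfo(Int32) method, taking the German dictionary and phone book sorts as the example. It assumes a platform on which these identifiers are available (the .NET Framework on Windows); elsewhere the call may throw, so the hypothetical helper TryGetCompareInfo returns null in that case. The SortId and LanguageId helpers are also hypothetical and rely on the Windows locale identifier layout (language ID in the low 16 bits, sort ID in bits 16 through 19), which is background knowledge rather than something this topic states. No output is listed because the comparison results depend on the platform's collation data.

```csharp
using System;
using System.Globalization;

public static class AlternateSortSketch
{
    // Extracts the sort ID (bits 16-19) from a Windows locale identifier.
    public static int SortId(int lcid)
    {
        return (lcid >> 16) & 0xF;
    }

    // Extracts the language ID (low 16 bits) from a Windows locale identifier.
    public static int LanguageId(int lcid)
    {
        return lcid & 0xFFFF;
    }

    // Returns a CompareInfo for the identifier, or null if the platform does not provide it.
    public static CompareInfo TryGetCompareInfo(int lcid)
    {
        try
        {
            return CompareInfo.GetCompareInfo(lcid);
        }
        catch (ArgumentException)
        {
            // CultureNotFoundException derives from ArgumentException.
            return null;
        }
        catch (PlatformNotSupportedException)
        {
            return null;
        }
    }

    public static void Main()
    {
        const int GermanDictionary = 0x00000407; // default sort for de-DE
        const int GermanPhoneBook = 0x00010407;  // alternate sort for de-DE

        CompareInfo dictionary = TryGetCompareInfo(GermanDictionary);
        CompareInfo phoneBook = TryGetCompareInfo(GermanPhoneBook);
        if (dictionary == null || phoneBook == null)
        {
            Console.WriteLine("One of the sort orders is not available on this platform.");
            return;
        }

        // Print only the sign of each result; the two sort orders may disagree.
        Console.WriteLine("Dictionary: {0}", Math.Sign(dictionary.Compare("Müller", "Mueller")));
        Console.WriteLine("Phone book: {0}", Math.Sign(phoneBook.Compare("Müller", "Mueller")));
    }
}
```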
Searching Strings
You can call the overloaded CompareInfo.IndexOf method to retrieve the zero-based index of a character or substring within a specified string. The method returns -1 if the character or substring is not found. When searching for a specified character, the IndexOf overloads that accept a parameter of type CompareOptions may perform the comparison differently from the method overloads that do not accept this parameter. The method overloads without this parameter perform a culture-sensitive, case-sensitive search. For example, a Unicode value that represents a precomposed character such as the ligature "Æ" (\u00C6) might be considered equivalent to any occurrence of its components in the correct sequence, such as "AE" (\u0041\u0045), depending on the culture. To perform an ordinal (culture-insensitive) search for exact Unicode values, use one of the CompareInfo.IndexOf overloads that take a parameter of type CompareOptions and set the parameter to Ordinal.
You can also call overloads of the String.IndexOf method that search for a character to perform an ordinal (culture-insensitive) search. Note that the overloads of this method that search for a string perform a culture-sensitive search.
The following example illustrates the difference in the results returned by the CompareInfo.IndexOf method depending on culture. The example creates a CultureInfo object for the Danish (Denmark) and English (United States) cultures and uses the overloads of the CompareInfo.IndexOf method to search for the character "æ" in the strings "æble" and "aeble". For the Danish (Denmark) culture, the CompareInfo.IndexOf(String, Char) method and the CompareInfo.IndexOf(String, Char, CompareOptions) method that has a comparison option of CompareOptions.Ordinal return the same value for each string. This indicates that the character "æ" is considered equivalent only to the Unicode value \u00E6. For the English (United States) culture, the two overloads return different results when searching for "æ" in the string "aeble". This indicates that the culture-sensitive comparison performed by the CompareInfo.IndexOf(String, Char) method evaluates the character "æ" as equivalent to its components "a" and "e".
Imports System.Globalization
Imports System.Threading
Public Class Example
Public Shared Sub Main()
Dim str1 As String = "æble"
Dim str2 As String = "aeble"
Dim find As Char = "æ"c
' Create CultureInfo objects representing the Danish (Denmark)
' and English (United States) cultures.
Dim cultures() As CultureInfo = { CultureInfo.CreateSpecificCulture("da-DK"),
CultureInfo.CreateSpecificCulture("en-US") }
For Each ci In cultures
Thread.CurrentThread.CurrentCulture = ci
Dim result1 As Integer = ci.CompareInfo.IndexOf(str1, find)
Dim result2 As Integer = ci.CompareInfo.IndexOf(str2, find)
Dim result3 As Integer = ci.CompareInfo.IndexOf(str1, find, _
CompareOptions.Ordinal)
Dim result4 As Integer = ci.CompareInfo.IndexOf(str2, find, _
CompareOptions.Ordinal)
Console.WriteLine("The current culture is {0}",
CultureInfo.CurrentCulture.Name)
Console.WriteLine()
Console.WriteLine(" CompareInfo.IndexOf(string, char) method:")
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str1, result1)
Console.WriteLine()
Console.WriteLine(" CompareInfo.IndexOf(string, char) method:")
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str2, result2)
Console.WriteLine()
Console.WriteLine(" CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method")
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str1, result3)
Console.WriteLine()
Console.WriteLine(" CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method")
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str2, result4)
Console.WriteLine()
Next
End Sub
End Class
' The example displays the following output:
' The current culture is da-DK
'
' CompareInfo.IndexOf(string, char) method:
' Position of æ in the string æble: 0
'
' CompareInfo.IndexOf(string, char) method:
' Position of æ in the string aeble: -1
'
' CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
' Position of æ in the string æble: 0
'
' CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
' Position of æ in the string aeble: -1
'
' The current culture is en-US
'
' CompareInfo.IndexOf(string, char) method:
' Position of æ in the string æble: 0
'
' CompareInfo.IndexOf(string, char) method:
' Position of æ in the string aeble: 0
'
' CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
' Position of æ in the string æble: 0
'
' CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
' Position of æ in the string aeble: -1
using System;
using System.Globalization;
using System.Threading;
public class Example
{
public static void Main()
{
string str1 = "æble";
string str2 = "aeble";
char find = 'æ';
// Create CultureInfo objects representing the Danish (Denmark)
// and English (United States) cultures.
CultureInfo[] cultures = { CultureInfo.CreateSpecificCulture("da-DK"),
CultureInfo.CreateSpecificCulture("en-US") };
foreach (var ci in cultures) {
Thread.CurrentThread.CurrentCulture = ci;
int result1 = ci.CompareInfo.IndexOf(str1, find);
int result2 = ci.CompareInfo.IndexOf(str2, find);
int result3 = ci.CompareInfo.IndexOf(str1, find,
CompareOptions.Ordinal);
int result4 = ci.CompareInfo.IndexOf(str2, find,
CompareOptions.Ordinal);
Console.WriteLine("\nThe current culture is {0}",
CultureInfo.CurrentCulture.Name);
Console.WriteLine("\n CompareInfo.IndexOf(string, char) method:");
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str1, result1);
Console.WriteLine("\n CompareInfo.IndexOf(string, char) method:");
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str2, result2);
Console.WriteLine("\n CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str1, result3);
Console.WriteLine("\n CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
Console.WriteLine(" Position of {0} in the string {1}: {2}",
find, str2, result4);
Console.WriteLine();
}
}
}
// The example displays the following output:
// The current culture is da-DK
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string aeble: -1
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string aeble: -1
//
//
// The current culture is en-US
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string aeble: 0
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string aeble: -1
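As noted earlier in this section, the String.IndexOf overloads that search for a character are ordinal, while the overloads that search for a string are culture-sensitive unless told otherwise. The following sketch contrasts them by passing an explicit StringComparison value to the string overload. The class and method names are hypothetical. The culture-sensitive result is printed without an expected value because it depends on the current culture and the platform's collation data.

```csharp
using System;

public static class StringIndexOfSketch
{
    // Char overload: an ordinal search for the exact code unit.
    public static int FindChar(string text, char value)
    {
        return text.IndexOf(value);
    }

    // String overload with an explicit ordinal comparison.
    public static int FindOrdinal(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    // String overload with an explicit culture-sensitive comparison.
    public static int FindCultureSensitive(string text, string value)
    {
        return text.IndexOf(value, StringComparison.CurrentCulture);
    }

    public static void Main()
    {
        // Ordinal searches match only the exact value U+00E6.
        Console.WriteLine(FindChar("aeble", 'æ'));     // -1
        Console.WriteLine(FindOrdinal("aeble", "æ"));  // -1
        Console.WriteLine(FindOrdinal("æble", "æ"));   // 0

        // Culture-sensitive search: the result may differ by culture and platform.
        Console.WriteLine(FindCultureSensitive("aeble", "æ"));
    }
}
```

Stating the StringComparison value explicitly, as this sketch does, makes the intended behavior visible at the call site instead of depending on which overload happens to be chosen.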
Sorting Strings
You can use some of the overloads of the Array.Sort method to sort arrays based on the current culture. The following example creates an array of three strings. First, it sets the System.Threading.Thread.CurrentThread.CurrentCulture property to en-US and calls the Array.Sort(Array) method. The resulting sort order is based on sorting conventions for the English (United States) culture. Next, the example sets the System.Threading.Thread.CurrentThread.CurrentCulture property to da-DK and calls the Array.Sort method again. Notice how the resulting sort order differs from the en-US results because it uses the sorting conventions for Danish (Denmark).
Imports System.Globalization
Imports System.IO
Imports System.Threading
Public Class TextToFile
Public Shared Sub Main()
' Create and initialize a new array to store the strings.
Dim stringArray() As String = { "Apple", "Æble", "Zebra"}
' Displays the values of the array.
Console.WriteLine("The original string array:")
PrintIndexAndValues(stringArray)
' Set the CurrentCulture to "en-US".
Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
' Sort the values of the Array.
Array.Sort(stringArray)
' Display the values of the array.
Console.WriteLine("After sorting for the ""en-US"" culture:")
PrintIndexAndValues(stringArray)
' Set the CurrentCulture to "da-DK".
Thread.CurrentThread.CurrentCulture = New CultureInfo("da-DK")
' Sort the values of the Array.
Array.Sort(stringArray)
' Displays the values of the Array.
Console.WriteLine("After sorting for the culture ""da-DK"":")
PrintIndexAndValues(stringArray)
End Sub
Public Shared Sub PrintIndexAndValues(myArray() As String)
For i As Integer = myArray.GetLowerBound(0) To myArray.GetUpperBound(0)
Console.WriteLine("[{0}]: {1}", i, myArray(i))
Next
Console.WriteLine()
End Sub
End Class
' The example displays the following output:
' The original string array:
' [0]: Apple
' [1]: Æble
' [2]: Zebra
'
' After sorting for the "en-US" culture:
' [0]: Æble
' [1]: Apple
' [2]: Zebra
'
' After sorting for the culture "da-DK":
' [0]: Apple
' [1]: Zebra
' [2]: Æble
using System;
using System.Globalization;
using System.Threading;
public class ArraySort
{
public static void Main(String[] args)
{
// Create and initialize a new array to store the strings.
string[] stringArray = { "Apple", "Æble", "Zebra"};
// Display the values of the array.
Console.WriteLine( "The original string array:");
PrintIndexAndValues(stringArray);
// Set the CurrentCulture to "en-US".
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
// Sort the values of the array.
Array.Sort(stringArray);
// Display the values of the array.
Console.WriteLine("After sorting for the culture \"en-US\":");
PrintIndexAndValues(stringArray);
// Set the CurrentCulture to "da-DK".
Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
// Sort the values of the Array.
Array.Sort(stringArray);
// Display the values of the array.
Console.WriteLine("After sorting for the culture \"da-DK\":");
PrintIndexAndValues(stringArray);
}
public static void PrintIndexAndValues(string[] myArray)
{
for (int i = myArray.GetLowerBound(0); i <=
myArray.GetUpperBound(0); i++ )
Console.WriteLine("[{0}]: {1}", i, myArray[i]);
Console.WriteLine();
}
}
// The example displays the following output:
// The original string array:
// [0]: Apple
// [1]: Æble
// [2]: Zebra
//
// After sorting for the culture "en-US":
// [0]: Æble
// [1]: Apple
// [2]: Zebra
//
// After sorting for the culture "da-DK":
// [0]: Apple
// [1]: Zebra
// [2]: Æble
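The examples above rely on the current culture of the thread. As a hedged sketch of an alternative, an explicit comparer can be passed to Array.Sort so that the sort order does not depend on thread state. StringComparer.Create builds a comparer for a given culture, and StringComparer.Ordinal sorts by code unit value. The names ExplicitComparerSortSketch, SortFor, and SortOrdinal are hypothetical. Only the ordinal result is annotated, because the culture-sensitive order depends on the platform's collation data.

```csharp
using System;
using System.Globalization;

public static class ExplicitComparerSortSketch
{
    // Returns a sorted copy that uses the named culture's case-sensitive rules.
    public static string[] SortFor(string cultureName, string[] values)
    {
        string[] copy = (string[])values.Clone();
        Array.Sort(copy, StringComparer.Create(new CultureInfo(cultureName), false));
        return copy;
    }

    // Returns a copy sorted by UTF-16 code unit values.
    public static string[] SortOrdinal(string[] values)
    {
        string[] copy = (string[])values.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    public static void Main()
    {
        string[] words = { "Zebra", "Æble", "Apple" };

        Console.WriteLine("en-US:   " + String.Join(", ", SortFor("en-US", words)));
        Console.WriteLine("da-DK:   " + String.Join(", ", SortFor("da-DK", words)));

        // Ordinal order follows code units: A (U+0041), Z (U+005A), Æ (U+00C6).
        Console.WriteLine("Ordinal: " + String.Join(", ", SortOrdinal(words)));
        // Ordinal: Apple, Zebra, Æble
    }
}
```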
Using Sort Keys
The .NET Framework uses sort keys to support culturally sensitive sort operations. Each character in a string is given several categories of sort weights, including alphabetic, case, and diacritic. A sort key provides a repository of these weights for a particular string. For example, a sort key might contain a string of alphabetic weights, followed by a string of case weights, and so on. For additional information about sort keys, see Unicode Technical Standard #10: Unicode Collation Algorithm.
In the .NET Framework, the SortKey class maps strings to their sort keys. You can use the CompareInfo.GetSortKey method to create a sort key for a string that you specify. The result is a sequence of bytes that can differ depending on the culture of the CompareInfo object and the CompareOptions value specified. For example, if you specify the value CompareOptions.IgnoreCase when creating a sort key, a string comparison operation using the sort key is case-insensitive.
After you create a sort key for a string, you can pass it as a parameter to methods provided by the SortKey class. The SortKey.Compare method lets you compare sort keys. Because this method performs a simple byte-by-byte comparison, it is much faster than the String.Compare method. If your application performs a large number of sorting operations, you can improve its performance by generating and storing sort keys for all the strings that it uses. When a sort or comparison operation is required, the application can use the sort keys instead of the strings.
The following example creates sort keys for two strings (str1 and str2) when the CurrentCulture property is set to da-DK. It compares the two strings by using the SortKey.Compare method and displays the results. The method returns a negative integer if str1 is less than str2, 0 (zero) if str1 and str2 are equal, and a positive integer if str1 is greater than str2. Next, the example sets the System.Threading.Thread.CurrentThread.CurrentCulture property to en-US and creates new sort keys for the same strings. The example compares the sort keys and displays the results. Notice that the sort results differ based on the setting for the current culture. Although the results of the following example are identical to the results of the Comparing Strings example earlier in this topic, using the SortKey.Compare method is faster than using the String.Compare method.
Imports System.Globalization
Imports System.Threading
Public Class SortKeySample
Public Shared Sub Main()
Dim str1 As [String] = "Apple"
Dim str2 As [String] = "Æble"
' Set the CurrentCulture to "da-DK".
Dim dk As New CultureInfo("da-DK")
Thread.CurrentThread.CurrentCulture = dk
' Create a culturally sensitive sort key for str1.
Dim sc1 As SortKey = dk.CompareInfo.GetSortKey(str1)
' Create a culturally sensitive sort key for str2.
Dim sc2 As SortKey = dk.CompareInfo.GetSortKey(str2)
' Compare the two sort keys and display the results.
Dim result1 As Integer = SortKey.Compare(sc1, sc2)
Console.WriteLine("When the current culture is ""da-DK"",")
Console.WriteLine("the result of comparing {0} with {1} is: {2}",
str1, str2, result1)
Console.WriteLine()
' Set the CurrentCulture to "en-US".
Dim enus As New CultureInfo("en-US")
Thread.CurrentThread.CurrentCulture = enus
' Create a culturally sensitive sort key for str1.
Dim sc3 As SortKey = enus.CompareInfo.GetSortKey(str1)
' Create a culturally sensitive sort key for str2.
Dim sc4 As SortKey = enus.CompareInfo.GetSortKey(str2)
' Compare the two sort keys and display the results.
Dim result2 As Integer = SortKey.Compare(sc3, sc4)
Console.WriteLine("When the CurrentCulture is ""en-US"",")
Console.WriteLine("the result of comparing {0} with {1} is: {2}",
str1, str2, result2)
End Sub
End Class
' The example displays the following output:
' When the current culture is "da-DK",
' the result of comparing Apple with Æble is: -1
'
' When the CurrentCulture is "en-US",
' the result of comparing Apple with Æble is: 1
using System;
using System.Threading;
using System.Globalization;
public class SortKeySample
{
public static void Main(String[] args)
{
String str1 = "Apple";
String str2 = "Æble";
// Set the CurrentCulture to "da-DK".
CultureInfo dk = new CultureInfo("da-DK");
Thread.CurrentThread.CurrentCulture = dk;
// Create a culturally sensitive sort key for str1.
SortKey sc1 = dk.CompareInfo.GetSortKey(str1);
// Create a culturally sensitive sort key for str2.
SortKey sc2 = dk.CompareInfo.GetSortKey(str2);
// Compare the two sort keys and display the results.
int result1 = SortKey.Compare(sc1, sc2);
Console.WriteLine("When the CurrentCulture is \"da-DK\",");
Console.WriteLine("the result of comparing {0} with {1} is: {2}\n",
str1, str2, result1);
// Set the CurrentCulture to "en-US".
CultureInfo enus = new CultureInfo("en-US");
Thread.CurrentThread.CurrentCulture = enus ;
// Create a culturally sensitive sort key for str1.
SortKey sc3 = enus.CompareInfo.GetSortKey(str1);
// Create a culturally sensitive sort key for str2.
SortKey sc4 = enus.CompareInfo.GetSortKey(str2);
// Compare the two sort keys and display the results.
int result2 = SortKey.Compare(sc3, sc4);
Console.WriteLine("When the CurrentCulture is \"en-US\",");
Console.WriteLine("the result of comparing {0} with {1} is: {2}",
str1, str2, result2);
}
}
// The example displays the following output:
// When the CurrentCulture is "da-DK",
// the result of comparing Apple with Æble is: -1
//
// When the CurrentCulture is "en-US",
// the result of comparing Apple with Æble is: 1
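The advice earlier in this section about generating and storing sort keys can be sketched as follows. The hypothetical SortBySortKey helper creates one sort key per string and then sorts the strings by those keys, so each comparison during the sort is a SortKey.Compare call. The hypothetical KeysMatch helper illustrates the effect of CompareOptions.IgnoreCase on the generated keys. The sketch assumes .NET Framework 4.5 or later for Comparer<T>.Create.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class StoredSortKeySketch
{
    // Sorts a copy of the values by sort keys that are generated once per string.
    public static string[] SortBySortKey(string[] values, CompareInfo compareInfo,
                                         CompareOptions options)
    {
        string[] items = (string[])values.Clone();
        SortKey[] keys = new SortKey[items.Length];
        for (int i = 0; i < items.Length; i++)
        {
            keys[i] = compareInfo.GetSortKey(items[i], options);
        }

        // Sort the keys and reorder the strings alongside them.
        Array.Sort(keys, items, Comparer<SortKey>.Create(SortKey.Compare));
        return items;
    }

    // Reports whether two strings produce equal sort keys under the given options.
    public static bool KeysMatch(string left, string right, CompareInfo compareInfo,
                                 CompareOptions options)
    {
        return SortKey.Compare(compareInfo.GetSortKey(left, options),
                               compareInfo.GetSortKey(right, options)) == 0;
    }

    public static void Main()
    {
        CompareInfo danish = new CultureInfo("da-DK").CompareInfo;
        string[] sorted = SortBySortKey(new[] { "Zebra", "Æble", "Apple" },
                                        danish, CompareOptions.None);
        Console.WriteLine(String.Join(", ", sorted));

        // With IgnoreCase, keys for strings that differ only in case compare as equal.
        Console.WriteLine(KeysMatch("apple", "APPLE", danish, CompareOptions.IgnoreCase));
    }
}
```

In a real application the keys would typically be stored next to the strings and reused across many sort or comparison operations, which is where the saving described above comes from.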
Normalization
You can normalize strings to uppercase or lowercase before sorting them. Rules for string sorting and casing are language-specific, and rules vary even within Latin script-based languages. Only a few languages (including English) provide a sort order that matches the order of the code points; for example, A [65] comes before B [66]. For this reason, do not rely on code points to perform accurate sorting and string comparisons.
The .NET Framework supports all Unicode normalization forms, and does not enforce or guarantee a specific form of normalization. You are responsible for choosing the appropriate normalization for your applications.
For more information about string normalization, see Normalization and Sorting.
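As a minimal sketch of choosing a normalization form yourself, the following fragment uses the String.Normalize and String.IsNormalized methods with the NormalizationForm enumeration. Two strings that render identically but use different code point sequences are unequal under an ordinal comparison until both are normalized to the same form. The names NormalizationSketch and OrdinalEqualsAfterNormalizing are hypothetical, and the choice of Form C here is only an example.

```csharp
using System;
using System.Text;

public static class NormalizationSketch
{
    // Normalizes both strings to the same form and then compares them ordinally.
    public static bool OrdinalEqualsAfterNormalizing(string left, string right,
                                                     NormalizationForm form)
    {
        return String.Equals(left.Normalize(form), right.Normalize(form),
                             StringComparison.Ordinal);
    }

    public static void Main()
    {
        string precomposed = "\u00E9";  // "é" as a single code point
        string decomposed = "e\u0301";  // "e" followed by a combining acute accent

        // The code point sequences differ, so an ordinal comparison reports inequality.
        Console.WriteLine(String.Equals(precomposed, decomposed, StringComparison.Ordinal));

        // After normalizing both strings to Form C, the sequences are identical.
        Console.WriteLine(OrdinalEqualsAfterNormalizing(precomposed, decomposed,
                                                        NormalizationForm.FormC));

        // IsNormalized reports whether a string is already in the given form.
        Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormC));
    }
}
```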
See Also
Concepts
Culture-Insensitive String Operations
Change History
Date | History | Reason |
---|---|---|
August 2011 | Revised extensively. | Information enhancement. |