How to: Use LINQ to query strings

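Strings are stored as a sequence of characters, so you can query them with LINQ like any other sequence. This article presents several example queries that search strings for specific characters or words, filter strings, or combine queries with regular expressions.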

How to query for characters in a string

The following example queries a string to determine how many numeric digits it contains.

string aString = "ABCDE99F-J74-12-89A";

// Select only those characters that are numbers
var stringQuery = from ch in aString
                  where Char.IsDigit(ch)
                  select ch;

// Execute the query
foreach (char c in stringQuery)
    Console.Write(c + " ");
Console.WriteLine();

// Call the Count method on the existing query.
int count = stringQuery.Count();
Console.WriteLine($"Count = {count}");

// Select all characters before the first '-'
var stringQuery2 = aString.TakeWhile(c => c != '-');

// Execute the second query
foreach (char c in stringQuery2)
    Console.Write(c);
Console.WriteLine();
/* Output:
  9 9 7 4 1 2 8 9
  Count = 8
  ABCDE99F
*/

The preceding queries show how a string can be treated as a sequence of characters.
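The same digit query can also be written in method syntax. The following is a minimal sketch, not part of the original sample; the variable names sample, digitCount, and digitsOnly are chosen here for illustration.

```csharp
using System;
using System.Linq;

string sample = "ABCDE99F-J74-12-89A";

// Count(predicate) filters and counts in a single pass over the characters.
int digitCount = sample.Count(char.IsDigit);

// Where + ToArray collects the matching characters into a new string.
string digitsOnly = new string(sample.Where(char.IsDigit).ToArray());

Console.WriteLine($"{digitCount} digits: {digitsOnly}");
// Prints: 8 digits: 99741289
```

Passing char.IsDigit as a method group avoids writing a lambda; query syntax and method syntax compile to the same calls.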

How to count occurrences of a word in a string

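The following example shows how to use a LINQ query to count the occurrences of a specified word in a string. To perform the count, the example first calls the Split method to create an array of words. The Split method has a performance cost. If counting words is the only operation you perform on the string, consider using the Matches or IndexOf methods instead.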

string text = """
    Historically, the world of data and the world of objects 
    have not been well integrated. Programmers work in C# or Visual Basic 
    and also in SQL or XQuery. On the one side are concepts such as classes, 
    objects, fields, inheritance, and .NET APIs. On the other side 
    are tables, columns, rows, nodes, and separate languages for dealing with 
    them. Data types often require translation between the two worlds; there are 
    different standard functions. Because the object world has no notion of query, a 
    query can only be represented as a string without compile-time type checking or 
    IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to 
    objects in memory is often tedious and error-prone. 
    """;

string searchTerm = "data";

// Convert the string into an array of words.
// Line breaks are included as separators because the text spans multiple lines.
char[] separators = ['.', '?', '!', ' ', ';', ':', ',', '\r', '\n'];
string[] source = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

// Create the query.  Use the InvariantCultureIgnoreCase comparison to match "data" and "Data"
var matchQuery = from word in source
                 where word.Equals(searchTerm, StringComparison.InvariantCultureIgnoreCase)
                 select word;

// Count the matches, which executes the query.
int wordCount = matchQuery.Count();
Console.WriteLine($"""{wordCount} occurrence(s) of the search term "{searchTerm}" were found.""");
/* Output:
   3 occurrence(s) of the search term "data" were found.
*/

The preceding query shows how you can treat a string as a sequence of words after splitting it.
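When a count is all you need, you can avoid Split altogether. The following sketch counts a term with Regex.Matches and with String.IndexOf. The sample sentence and the names sampleText, term, regexCount, and indexOfCount are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string sampleText = "Data types differ. Transferring data between the two worlds of data is tedious.";
string term = "data";

// Regex approach: the \b anchors restrict matches to whole words.
int regexCount = Regex.Matches(sampleText, $@"\b{Regex.Escape(term)}\b", RegexOptions.IgnoreCase).Count;

// IndexOf approach: counts every occurrence of the substring,
// including occurrences embedded in longer words such as "database".
int indexOfCount = 0;
int position = 0;
while ((position = sampleText.IndexOf(term, position, StringComparison.OrdinalIgnoreCase)) >= 0)
{
    indexOfCount++;
    position += term.Length;
}

Console.WriteLine($"Regex: {regexCount}, IndexOf: {indexOfCount}");
// Prints: Regex: 3, IndexOf: 3
```

Neither approach allocates an array of words. The two counts agree here only because the sample contains no longer word that embeds "data".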

How to sort or filter text data by any word or field

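The following example shows how to sort lines of structured text, such as comma-separated values, by any field in the line. The field can be specified dynamically at run time. Assume that the fields in scores.csv represent a student's ID number, followed by a series of four test scores: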

111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91

The following query sorts the lines by the score of the first exam, which is stored in the second column:

// Create an IEnumerable data source
string[] scores = File.ReadAllLines("scores.csv");

// Change this to any value from 0 to 4.
int sortField = 1;

Console.WriteLine($"Sorted highest to lowest by field [{sortField}]:");

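// Split each line and sort on fields[sortField].
// Parse the field so the sort is numeric; a string sort would place "100" before "99".
var scoreQuery = from line in scores
                 let fields = line.Split(',')
                 orderby int.Parse(fields[sortField]) descending
                 select line;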

foreach (string str in scoreQuery)
{
    Console.WriteLine(str);
}
/* Output (if sortField == 1):
   Sorted highest to lowest by field [1]:
    116, 99, 86, 90, 94
    120, 99, 82, 81, 79
    111, 97, 92, 81, 60
    114, 97, 89, 85, 82
    121, 96, 85, 91, 60
    122, 94, 92, 91, 91
    117, 93, 92, 80, 87
    118, 92, 90, 83, 78
    113, 88, 94, 65, 91
    112, 75, 84, 91, 39
    119, 68, 79, 88, 92
    115, 35, 72, 91, 70
 */

The preceding query shows how you can manipulate strings by splitting them into fields and querying the individual fields.
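The same technique also supports filtering by a field. The following sketch keeps only the rows whose first test score exceeds a cutoff. To stay self-contained, it uses an in-memory array with a few rows copied from the sample data instead of reading scores.csv; the names rows, filterField, threshold, and highScores are hypothetical.

```csharp
using System;
using System.Linq;

// A few rows copied from the sample data, held in memory for illustration.
string[] rows =
[
    "111, 97, 92, 81, 60",
    "112, 75, 84, 91, 39",
    "115, 35, 72, 91, 70",
    "116, 99, 86, 90, 94",
];

int filterField = 1;   // index of the first test score
int threshold = 90;    // assumed cutoff, for illustration only

// Keep rows whose selected field exceeds the threshold, highest score first.
var highScores = (from row in rows
                  let fields = row.Split(',')
                  let score = int.Parse(fields[filterField])
                  where score > threshold
                  orderby score descending
                  select row).ToList();

foreach (string line in highScores)
{
    Console.WriteLine(line);
}
// Prints:
// 116, 99, 86, 90, 94
// 111, 97, 92, 81, 60
```

int.Parse accepts the leading space that remains after splitting on the comma, so no explicit Trim call is needed.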

How to query for sentences that contain specific words

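The following example shows how to find sentences in a block of text that contain every word in a specified set. Although the array of search terms is hard-coded, it could also be populated dynamically at run time. The query returns the sentences that contain the words "Historically", "data", and "integrated".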

string text = """
Historically, the world of data and the world of objects 
have not been well integrated. Programmers work in C# or Visual Basic 
and also in SQL or XQuery. On the one side are concepts such as classes, 
objects, fields, inheritance, and .NET APIs. On the other side 
are tables, columns, rows, nodes, and separate languages for dealing with 
them. Data types often require translation between the two worlds; there are 
different standard functions. Because the object world has no notion of query, a 
query can only be represented as a string without compile-time type checking or 
IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to 
objects in memory is often tedious and error-prone.
""";

// Split the text block into an array of sentences.
string[] sentences = text.Split(['.', '?', '!']);

// Define the search terms. This list could also be dynamically populated at run time.
string[] wordsToMatch = [ "Historically", "data", "integrated" ];

// Find sentences that contain all the terms in the wordsToMatch array.
// Note that the number of terms to match is not specified at compile time.
char[] separators = ['.', '?', '!', ' ', ';', ':', ','];
var sentenceQuery = from sentence in sentences
                    let w = sentence.Split(separators,StringSplitOptions.RemoveEmptyEntries)
                    where w.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Count()
                    select sentence;

foreach (string str in sentenceQuery)
{
    Console.WriteLine(str);
}
/* Output (the sentence keeps the line break from the source text):
Historically, the world of data and the world of objects 
have not been well integrated
*/

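The query first splits the text into sentences, and then splits each sentence into an array of strings that hold each word. For each of these arrays, the Distinct method removes all duplicate words, and then the query performs an Intersect operation on the word array and the wordsToMatch array. If the count of the intersection equals the count of the wordsToMatch array, every search term was found in the sentence, and the original sentence is returned.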

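The call to Split uses punctuation marks as separators in order to remove them from the string. If you didn't remove punctuation, you could have, for example, a string "Historically," (with its trailing comma) that wouldn't match "Historically" in the wordsToMatch array. You might need additional separators, depending on the types of punctuation found in the source text.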
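The effect of leaving punctuation in place can be seen in a small sketch. The shortened sentence and the names required, spaceOnly, wordSeparators, and cleaned are assumptions made for illustration.

```csharp
using System;
using System.Linq;

string sentence = "Historically, the world of data has not been well integrated";
string[] required = ["Historically", "data", "integrated"];

// Splitting on spaces only leaves the comma attached: "Historically,".
string[] spaceOnly = sentence.Split(' ', StringSplitOptions.RemoveEmptyEntries);
bool matchWithPunctuation = required.All(term => spaceOnly.Contains(term));

// Adding ',' to the separators strips it, so every term matches.
char[] wordSeparators = [' ', ','];
string[] cleaned = sentence.Split(wordSeparators, StringSplitOptions.RemoveEmptyEntries);
bool matchWithoutPunctuation = required.All(term => cleaned.Contains(term));

Console.WriteLine($"Spaces only: {matchWithPunctuation}");
Console.WriteLine($"Spaces and commas: {matchWithoutPunctuation}");
// Prints:
// Spaces only: False
// Spaces and commas: True
```

This sketch uses All with Contains instead of Distinct and Intersect; both approaches check that every search term appears in the sentence.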

How to combine LINQ queries with regular expressions

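This example shows how to use the Regex class to create a regular expression for more complex matching in text strings. The LINQ query makes it easy to filter for exactly the files that you want to search with the regular expression, and to shape the results.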

string startFolder = """C:\Program Files\dotnet\sdk""";
// Or
// string startFolder = "/usr/local/share/dotnet/sdk";

// Take a snapshot of the file system.
var fileList = from file in Directory.GetFiles(startFolder, "*.*", SearchOption.AllDirectories)
                let fileInfo = new FileInfo(file)
                select fileInfo;

// Create the regular expression to find SDK and workload names.
// Note: the unescaped '.' characters match any single character.
System.Text.RegularExpressions.Regex searchTerm =
    new System.Text.RegularExpressions.Regex(@"microsoft.net.(sdk|workload)");

// Search the contents of each .txt file.
// Remove the extension filter to search every file.
// This query produces a list of files where a match
// was found, and a list of the matchedValues in that file.
// Note: The range variable "match" is explicitly typed in the nested query.
// This is required where MatchCollection is only a
// nongeneric IEnumerable collection.
var queryMatchingFiles =
    from file in fileList
    where file.Extension == ".txt"
    let fileText = File.ReadAllText(file.FullName)
    let matches = searchTerm.Matches(fileText)
    where matches.Count > 0
    select new
    {
        name = file.FullName,
        matchedValues = from System.Text.RegularExpressions.Match match in matches
                        select match.Value
    };

// Execute the query.
Console.WriteLine($"""The term "{searchTerm}" was found in:""");

foreach (var v in queryMatchingFiles)
{
    // Trim the path a bit, then write
    // the file name in which a match was found.
    string s = v.name.Substring(startFolder.Length - 1);
    Console.WriteLine(s);

    // For this file, write out all the matching strings
    foreach (var v2 in v.matchedValues)
    {
        Console.WriteLine($"  {v2}");
    }
}

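You can also query the MatchCollection object that a Regex search returns. This example produces only the value of each match in the results, but you can also use LINQ to filter, sort, and group that collection. On .NET Framework, MatchCollection is a nongeneric IEnumerable collection, so you must explicitly state the type of the range variable in the query. On .NET Core 2.0 and later versions, MatchCollection also implements IEnumerable<Match>, so the explicit type is optional there.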
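As an illustration of grouping a MatchCollection, the following sketch counts how often each distinct match value occurs. The input string, the case-sensitive pattern, and the names input, found, and counts are assumptions made for this example and differ from the sample above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "Microsoft.NET.Sdk, Microsoft.NET.Workload, Microsoft.NET.Sdk";
MatchCollection found = Regex.Matches(input, @"Microsoft\.NET\.(Sdk|Workload)");

// Cast<Match>() yields a typed sequence on any target framework.
var counts = found.Cast<Match>()
                  .GroupBy(m => m.Value)
                  .OrderByDescending(g => g.Count())
                  .Select(g => new { Value = g.Key, Count = g.Count() })
                  .ToList();

foreach (var entry in counts)
{
    Console.WriteLine($"{entry.Value}: {entry.Count}");
}
// Prints:
// Microsoft.NET.Sdk: 2
// Microsoft.NET.Workload: 1
```

Cast<Match>() plays the same role in method syntax that the explicitly typed range variable plays in query syntax.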