如何在目录树中查询重复文件 (LINQ) (Visual Basic)
有时,具有相同名称的文件可能位于多个文件夹中。 例如,在 Visual Studio 安装文件夹下,多个文件夹中都有 readme.htm 文件。 此示例显示如何在指定根文件夹下查询此类重复文件名。 第二个示例显示如何查询大小和创建时间都匹配的文件。
示例
Module QueryDuplicateFileNames
Public Sub Main()
Dim path As String = "C:\Program Files\Microsoft Visual Studio 9.0\Common7"
QueryDuplicates1(path)
' Uncomment to run this query instead
' QueryDuplicates2(path)
End Sub
Sub QueryDuplicates1(ByVal root As String)
Dim dir As New System.IO.DirectoryInfo(root)
Dim duplicates = From aFile In dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories) _
Order By aFile.Name _
Group aFile By aFile.Name Into newGroup = Group _
Where newGroup.Count() >= 2 _
Select newGroup
' Page the display so that the results can be read.
Dim trimLength = root.Length
PageOutput(duplicates, trimLength)
End Sub
Sub QueryDuplicates2(ByVal root As String)
' This time a composite key is used. This sub finds all files
' that have been copied into multiple subfolders.
Dim dir As New System.IO.DirectoryInfo(root)
Dim duplicates = From aFile In Dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories) _
Order By aFile.Name _
Group aFile By aFile.Name, aFile.CreationTime, aFile.Length Into newGroup = Group _
Where newGroup.Count() >= 2 _
Select newGroup
' Page the display so that the results can be read.
Dim trimLength = root.Length
PageOutput(duplicates, trimLength)
End Sub
' Pages console display for large query results. No more than one group per page.
' This sub specifically works with group queries of FileInfo objects
' but can be modified for any type.
Sub PageOutput(ByVal groupQuery, ByVal charsToSkip)
' "3" = 1 line for extension key + 1 for "Press any key" + 1 for input cursor.
Dim numLines As Integer = Console.WindowHeight - 3
' Flag to indicate whether there are more results to display
Dim goAgain As Boolean = True
For Each fg As IEnumerable(Of System.IO.FileInfo) In groupQuery
' Start a new extension at the top of a page.
Dim currentLine As Integer = 0
Do While (currentLine < fg.Count())
Console.Clear()
' Get the next page of results
' No more than one filename per page
Dim resultPage = From file In fg _
Skip currentLine Take numLines
' Execute the query. Trim the paths in the output.
For Each line In resultPage
Console.WriteLine(vbTab & line.FullName.Substring(charsToSkip))
Next
' Advance the current position
currentLine = numLines + currentLine
' Give the user a chance to break out of the loop
Console.WriteLine("Press any key for next page or the 'End' key to exit.")
Dim key As ConsoleKey = Console.ReadKey().Key
If key = ConsoleKey.End Then
goAgain = False
Exit For
End If
Loop
Next
End Sub
End Module
第一个查询使用简单键来确定匹配;此查询可以找到名称相同但内容可能不同的文件。 第二个查询使用复合键来匹配 FileInfo 对象的 3 个属性。 此查询更可能找到名称相同且内容相似或相同的文件。
编译代码
创建 Visual Basic 控制台应用程序项目,其中包含用于 System.Linq 命名空间的 Imports
语句。