다음을 통해 공유


Sort and Filter CSV Files with DataTable and DataView


Introduction

Working with data a common need is to manage streams that come from different software. Sometimes there's the necessity to make available for a program the data created by another one, and many times the phase of data exchange is entrusted to text files, like comma separated values files (or CSV). That's perfectly OK when the informations can be acquired the way they present themselves, but if we need to further elaborate them before importing, a text file is not the most comfortable format to work with. That's when DataTables and DataViews from the .NET Framework could come in handy, helping us to manage our data in a tabular form. In this article, we'll see some brief examples using Visual Basic .NET

A simple scenario

Consider a sample pipe-separated CSV file like the following:

1.Name|Surname|Age|Occupation|City
2.John|Doe|30|Developer|New York
3.Jack|NoName|25|DBA|Los Angeles
4.Mario|Mario|42|Plumber|Unknown
5.Laura|Green|25|Developer|Unknown

We have a first line of field headers and four rows of data. Filtering or sorting those data, while mantaining the original textual format, will be a not-so-quick task. We could import the file in a Excel worksheet, in a SQL Server table, and so on, but we may not have access to those tools. Maybe we must develop an interpreter, which takes the original file and produces a second file with data revisited: possible scenarios are many.

In that case a DataTable is surely what we need. Like the MSDN says, the DataTable «represents a table of in-memory data». It is a sort of virtual table, in which we can store data in tabular form (i.e. columns and rows), relying on the peculiarities of such a structure (data access, relations, and so on). A DataTable can be bound to any control (WPF or WinForms) on which the DataSource or ItemSource property is available.

Create a DataTable from a CSV file

Let's create and populate a DataTable with the data above. For the sake of immediacy, i've writter a short snippet to create a file containing our sample data. Consider the following:

01.Const sampleFile As String  = "c:\temp\sample.txt"
02. 
03.'-- Create sample data, writing them to c:\temp\sample.txt
04.Dim _sampleData As String  = "Name|Surname|Age|Occupation|City"  & Environment.NewLine & _
05.                "John|Doe|30|Developer|New York" & Environment.NewLine & _
06.                "Jack|NoName|25|DBA|Los Angeles" & Environment.NewLine & _
07.                "Mario|Mario|42|Plumber|Unknown" & Environment.NewLine & _
08.                "Laura|Green|25|Developer|Unknown"
09. 
10.IO.File.WriteAllText(sampleFile, _sampleData)

The code above will create a file named "sample.txt" under the "c:\temp\ folder, writing the content of _sampleData in it. Starting from such a file, we need to initialize the DataTable, and create the columns representing our data. Since we have the columns names on the first row of the file, we can write a snippet like this:

1.'-- Create a datatable from our text file 
2.Dim dt As New DataTable("Sample")
3. 
4.'-- Opens sample file, read first line, assign 
5.For Each  l As  String In  IO.File.ReadLines(sampleFile)(0).Split("|")
6.     dt.Columns.Add(l)
7.Next

We've called our DataTable "Sample". Then, looping on the string array obtained by splitting by pipe the first line of our file, we've extracted each field name (Name, Surname, Age, Occupation, City). For each of those, a call to the Add function of the DataTable's DataColumnCollection will create the column named as the passed parameter (in our case, the field name itself).

The above creates the structure of our table: now we must fill it with data. That means we will apply a logic similar to the one used for columns, this time on rows, using the file lines from the second to the last one.

1.'-- Read sample data as rows
2.Dim nRow As Boolean  = False
3.For Each  l As  String In  IO.File.ReadLines(sampleFile)
4.    If Not (nRow) Then  nRow = True  : Continue For
5.    dt.Rows.Add(l.Split("|"))
6.Next

Easy enough, we loop on all the file lines (skipping the first, used for column headers), each time adding a row to our DataTable. Since one of the Rows.Add overloads accepts an array of Objects, we can use the array obtained by splitting the read line by its separator (l.Split("|")). At the end of the loop, our DataTable will have a valid structure, and valid data too.

DataTable as DataSource

We can test our DataTable against a DataGridView (or any other control accepting a DataSource), to check everything is OK. I've added to my Form a DataGridView. We can bind the DataTable to the grid by doing:

1.'-- The DataTable could be used as a Data Source
2.DataGridView1.DataSource = dt

Running our program, the result will be:

We can observe everything went OK: our data are correctly shown. They are presented the way the DataTable was populated: no filter applied, no particular sort order. Each line is found at the index it has in the file (i.e. "John Doe" is the first row, "Laura Green" the last one).

Use DataView to sort and filter data ​

As stated by MSDN, a DataView «represents a databindable, customized view of a DataTable for sorting, filtering, searching, editing, and navigation. The DataView does not store data, but instead represents a connected view of its corresponding DataTable». DataViews allow us to customize the way our data are presented.

Let's say we want to order our DataTable by the Age columns, showing the older people first. We could do that easily, by writing:

1.Dim dv As New DataView(dt)
2.dv.Sort = "Age DESC"

And then binding our grid to the DataView

1.DataGridView1.DataSource = dv

If we run our program now, we'll see the rows will be sorted as requested. Sorting can be done on multiple columns: for example, if we want to sort by descendant Age and descending Name, the above Sort code will be changed to:

1.dv.Sort = "Age DESC, Name DESC"

Any other column could be added to Sort property, separating each one of them by a comma. DESC and ASC predicates determines the sort direction, namely descending (from greater to smaller) and ascending (from smaller to greater).

In the screenshot, note the descending sort by Age, and the secondary descending sort by Name in case of equal Age value. DataViews filter's capabilities are pretty straightforward too. Using the RowFilter property, we could write concise expression to determine the data we want to show. For example, let's say we want to extract only those records in which the city is Unknown. We could write:

1.dv.RowFilter = "City = 'Unknown'"

And the result will be

A multi-filter can be set by joining different expression with the logic predicates AND and OR. The expression: 

1.dv.RowFilter = "Age < 30 OR Age=42"

Will return the following

Because we have only two records where Age is < 30, and only one in which it is = 42.

From DataView to Text

Now, let's suppose we want to create a second CSV file, containing only those records in which the field City contains a blank space, sorted by ascending Age field. A simple way to export our data could be:

01.Dim dw As New DataView(dt)
02.dw.Sort = "Age ASC"
03.dw.RowFilter = "City LIKE '% %'"
04. 
05.Dim lines As String  = ""
06.For Each  r As  DataRowView In  dw
07.    lines &= String.Join("|", r.Row.ItemArray) & Environment.NewLine
08.Next
09. 
10.IO.File.WriteAllText("c:\temp\output.txt", lines)

In other words, we declare a DataView based on the previously filled DataTable, specifying Sort and RowFilter properties. Then, executing a loop on each row of the view, we create a pipe-separated string obtained by joining each field of the row. Finally, the text data will be written to an arbitrary text file.

Source Code

The sample code used in the article can be downloaded here: https://code.msdn.microsoft.com/Sort-and-Filter-CSV-files-ee8affda

References

Other languages

The article is available in the following localizations: