Importing a Table from WordprocessingML to SpreadsheetML

In a previous post, I showed you how to import charts from spreadsheets to a Wordprocessing document. Today, as promised in my last post, I am going to show you how to import a table from a Wordprocessing document into a spreadsheet.

Solution

To import a table from a Wordprocessing document to a spreadsheet we need to take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region of the table to be imported
  2. Open up the Word document via the Open XML SDK
  3. Open up an output Excel workbook, which will be used to contain the imported table, via the Open XML SDK
  4. Locate the content control that contains the Word table and access the table object
  5. Convert the Word table into an Excel spreadsheet table. By convert I am mean manually create an Excel table based on the Word table data
  6. Add a table definition part within the Excel workbook
  7. Give the added table definition a unique name and id
  8. Specify the range of the table based on the inserted data
  9. Add column header information to the table definitions part
  10. Add a style reference to the table definition
  11. Save changes made to the Excel workbook

My post will talk about using version 2 of the SDK.

For the sake of this post, let's say I am starting with the following Word document and table (notice that the table is contained within a content control):

Also, let's say I am starting with a completely empty Excel workbook, which only contains a style definition for tables:

If you just want to jump straight into the code, feel free to download this solution here.

The Code

As described in the solution section above, steps two and three require us to open the Word and Excel files. Below is the code snippet necessary to accomplish these tasks:

//Open Word document using (WordprocessingDocument myDoc = WordprocessingDocument.Open("tables.docx", true)) { MainDocumentPart mainPart = myDoc.MainDocumentPart; //Open spreadhseet using (SpreadsheetDocument mySpreadsheet = SpreadsheetDocument.Open("output.xlsx", true)) { WorkbookPart workbookPart = mySpreadsheet.WorkbookPart; //Import a specific table from the document into a specific sheet within the spreadsheet ImportWordTable(mainPart, "WorldPopulationTable", workbookPart, "Sheet1"); } }

I created a generic method called ImportWordTable, which will accomplish the rest of the steps in the solution section above. The next step is to get access to the Word table object found within the content control, which can be accomplished with the following code:

//Find the content control that contains the specific table SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>() .Where(s => s.SdtProperties.GetFirstChild<Alias>().Val.Value .Equals(sdtName)).First(); Table tbl = sdt.SdtContentBlock.GetFirstChild<Table>();

Now that we have access to the Word table our next task is to create an Excel table based on the data contained within the Word table. This task can be accomplished with the following code snippet:

static Spreadsheet.SheetData ConvertWordTableToExcel(Table tbl) { //sheetdata contains data for table Spreadsheet.SheetData sheetdata = new Spreadsheet.SheetData(); //For every row in my Word table we need to create a row in Excel foreach (TableRow row in tbl.Descendants<TableRow>()) { numRows++; Spreadsheet.Row sheetRow = new Spreadsheet.Row(); //For every Word cell create an Excel cell foreach (TableCell cell in row.Descendants<TableCell>()) { Spreadsheet.Cell sheetCell = new Spreadsheet.Cell(); string textValue = cell.InnerText; double numValue; //Check to see if value is a number rather than a string //First row should be strings because I am going to create a table if ((numRows != 1) && (Double.TryParse(textValue, System.Globalization.NumberStyles.Any, null, out numValue))) { Spreadsheet.CellValue v = new Spreadsheet.CellValue(); textValue = numValue.ToString(); v.Text = textValue; sheetCell.Append(v); } else //value is a string { sheetCell.DataType = Spreadsheet.CellValues.InlineString; Spreadsheet.InlineString inlineString = new Spreadsheet.InlineString(); Spreadsheet.Text t = new Spreadsheet.Text(); t.Text = textValue; inlineString.Append(t); sheetCell.Append(inlineString); } //Need to keep track of column headers for the table //definitions part if (numRows == 1) colHeaders.Add(textValue); sheetRow.Append(sheetCell); }      sheetdata.Append(sheetRow); } return sheetdata; }

The code snippet above simply creates an Excel spreadsheet row and cell for every Word table row and cell. A spreadsheet cell is a bit special as compared to a Word cell because a spreadsheet cell defines the datatype of the cell content. In the example above, we have two types of spreadsheet cells:

  1. Inline string cells
  2. Number value cells

Since this solution is about creating an Excel table, we need to take care of a couple of extra things. For one, we need to make sure that the first row of the table has only inline string cells, since we will be identifying this row as our table header. Secondly, we need to keep track of the header row strings because we need to reuse these strings within the table definitions part.

At the end of this step we have generated a valid Excel spreadsheet that contains all of the data from the Word table:

We could have called this job done at this point, but we can further improve this output by formatting the data as a table.

Looking at step #6 in the solution section above, we need to add a reference to a table definitions part within the worksheet where the data exists. Here is the code snippet that creates a new table definition part and adds a reference to that part from the worksheet that contains the table data:

//Add the table definitions part to make the imported table look like a table TableDefinitionPart tableDefPart = worksheetPart.AddNewPart<TableDefinitionPart>(relId); //Reference the added table part Spreadsheet.TableParts tableParts = new Spreadsheet.TableParts(); Spreadsheet.TablePart tablePart = new Spreadsheet.TablePart(); tablePart.Id = relId; tableParts.Append(tablePart); worksheetPart.Worksheet.Append(tableParts);

The last major set of steps is to create the appropriate xml within our new table definitions part. Every table definition needs a unique name, unique id, display name, and appropriate data range reference in order to get everything to work properly. The following code snippet accomplishes these tasks:

int id = 1; tableDefPart.Table = new Spreadsheet.Table(); tableDefPart.Table.Id = 1; tableDefPart.Table.Name = "Table1"; tableDefPart.Table.DisplayName = "Table1"; char endCol = 'A'; //Note that this approach is good for columns up to Z //Excel can contain more columns, but 26 columns should //be good enough for this demo for (int i = 1; i < numCols; i++) endCol++; //Specify the range of the table string reference = "A1:" + endCol + numRows; tableDefPart.Table.Reference = reference;

One advantage of an Excel table vs. a Word table is an Excel table allows you to filter values within a column. It's pretty easy to add this functionality to our table with the following code:

//Make sure the table has the ability to filter Spreadsheet.AutoFilter autoFilter = new Spreadsheet.AutoFilter(); autoFilter.Reference = reference; Spreadsheet.TableColumns tableColumns = new Spreadsheet.TableColumns(); tableColumns.Count = numCols;

Now let's add the table header information:

//Add the column headers to the table definition part foreach (string s in colHeaders) { Spreadsheet.TableColumn tableColumn = new Spreadsheet.TableColumn(); tableColumn.Id = (uint)id; tableColumn.Name = s; id++; tableColumns.Append(tableColumn); }

Almost done! Let's make this table look good by applying a table style. Once we apply the style we can append all the appropriate elements together and save our changes. These tasks can be accomplished with the following code:

//Apply a nice table style (contained within my template) Spreadsheet.TableStyleInfo tableStyleInfo = new Spreadsheet.TableStyleInfo(); tableStyleInfo.Name = "TableStyleMedium9"; tableStyleInfo.ShowRowStripes = true; tableDefPart.Table.Append(autoFilter, tableColumns, tableStyleInfo); tableDefPart.Table.Save();

End Result

Putting everything together and running this code, we end up with an Excel workbook, which contains a table imported from my Word document.

Here is a screenshot of the final workbook (notice the filtering capability):

Zeyad Rajabi

Comments

  • Anonymous
    April 01, 2009
    PingBack from http://www.anith.com/?p=25612

  • Anonymous
    April 14, 2009
    Hey, Thanks for the sample code, it's really showing me a lot of what is possible with this SDK. I'm having problems getting a specific content control in my own project using the above code. I'm using the new April 2009 release of the SDK, and trying to use it in an ASP Web App ( if that makes any difference ). When I use the code above to get the SdtBlock, I get the following error: 'DocumentFormat.OpenXml.Wordprocessing.SdtBlock' does not contain a definition for 'SdtProperties' and no extension method 'SdtProperties' accepting a first argument of type 'DocumentFormat.OpenXml.Wordprocessing.SdtBlock' could be found (are you missing a using directive or an assembly reference?) If I compile your code it will work, until I do a clean, then a recompile, then I get the same error. I'm assuming there's some kind of versioning error occurring, but if you can offer any suggestions as to how I can get it working, I'd be really appreciative.

  • Anonymous
    May 05, 2009
    In previous posts, like Importing a Table from Word to Excel , I showed you how to retrieve content within

  • Anonymous
    May 19, 2009
    One of the big changes we made in the Open XML SDK v2 April 2009 CTP was improving the Low Level DOM