Extending the Reach of Excel Services with the Open XML SDK

Note: Updated post to point to a YouTube video that is shorter than 10 minutes

In Brian's last blog he talked about Open XML and Office Services as a one-two punch to complete many types of solutions. I am going to start writing some posts that showcase these two technologies working together. Excel is a great tool for connecting to multiple types of data sources, like databases or XML files. Unfortunately, Excel has no concept of connecting to other types of Office documents, like Word files. This fact is important because information workers are pumping out more and more Office documents that contain lots of relevant business data. These documents are becoming just as important as data contained within databases. In today's post, I am going to show you how to extend the reach of Excel Services and hosted charts to gather data from within a library of Word documents.

If you want to jump straight into the code, feel free to download this solution here.

Scenario

As a Program Manager (PM) in Office, one of our tasks is to write specifications for features that we are designing for the product. These specifications exist in different stages depending on how far along we are in the design. For example, a stage 0 specification is essentially just a placeholder document with no content. A stage 5 specification is a document that contains the full design for a feature and is ready to be coded by developers. Along the way we inevitably run into open issues that must be addressed before the feature can be coded properly. In order to help us track open issues within a document we included a custom style called "Issue" as part of our specification template. This style applies a yellowish background color to a paragraph so that the paragraph easily sticks out. Here is a screenshot of what this style looks like:

image

Whenever a PM encounters an open issue he/she applies this custom style to the content within the specification. Wouldn't it be great if you can generate a dynamic chart that shows the number of open issues for every PM per given specification stage? This chart would represent the relative health of a particular specification library.

Solution

The scenario I listed above talks about interrogating a repository of Word documents and counting the number of open issues within each of the documents. In order to accomplish this scenario we will need to take the following actions:

  1. Create a template Excel spreadsheet that represents the look and feel of the chart we want to display
  2. Add an Excel Services web part to a SharePoint site that displays a chart hosted in our template Excel spreadsheet
  3. For every Word document in a SharePoint library find the following information:
    1. Program Manager name assigned to the specification
    2. Specification stage
    3. The number of open issues within the specification
  4. When all the data has been gathered, inject the data into the appropriate location in the template Excel spreadsheet
  5. Create a web part button that allows users to refresh the chart (the refresh button will perform steps 3 and 4)
    1. I am going to skip on showing this part since the link to my solution includes source code that shows you how to do this task

The Template

For the sake of this post and the scenario described above I have created the following template Excel spreadsheet:

image

This spreadsheet contains only one sheet with a blank chart. The chart will display the number of open issues across all the program managers listed within a SharePoint library broken down by the specification status. The chart is given a specific name so that we can reference it within our Excel Services web part. Add this template document to a SharePoint library.

Creating an Excel Services Web Part in SharePoint

Note: I am going to show you how to create a web part within SharePoint 2010, but you should be able to do the same thing within SharePoint 2007 (since Excel Services is not new to 2010).

On a SharePoint site where you are an admin, click on the menu command with your name and select Personalize this Page option:

image

At this point you have the ability to change the look of the page. Click on the Add a Web Part link:

image

In the Categories section select Office Client Applications. In the Web Parts section select Excel Web Access:

image

Now we need to link our Excel Services web part to our template Excel spreadsheet. Click the link to open the tool pane, which allows you to reference the correct Excel spreadsheet. In the Excel Services web part properties pane add the following information:

  • Name of the referenced workbook
  • Name of the chart to be added in the Named Item field

image

At this point we have an Excel Services web part that shows the chart in our Excel spreadsheet template.

Gathering Data from a Library of Word Documents

The first step in this task is to iterate through all the Word documents in a given specification library. Here is a code snippet to accomplish this task:

SPWeb myWebSite = SPContext.Current.Web;   SPDocumentLibrary specLibrary = myWebSite .Lists["Sample Spec Library"] as SPDocumentLibrary;   //Read all Word documents in the library foreach (SPListItem item in specLibrary.Items) { byte[] byteArray = item.File.OpenBinary();   using (MemoryStream mem = new MemoryStream()) { mem.Write(byteArray, 0, (int)byteArray.Length);   //Read the Word document into the Open XML SDK using (WordprocessingDocument myDoc = WordprocessingDocument.Open(mem, true)) { //Do something } } }

Now that we have access to the Word documents we need to count the number of paragraphs that have the style "Issue" applied. I have already written a post on how to accomplish this task. Check out retrieving Word content based on styles. I am going to reuse the same methods I talked about in that previous post for this scenario. I will reuse the method called GetParagraphListByStyle, which returns a list of paragraphs given a main document part and a specific style name.

Gathering the PM associated with the specification and the spec status is a matter of retrieving content within a specific content control. In my solution I am going to store the PM name, the number of open issues in the specification and the status of the specification within a hash table. Here is the code snippet I used to store the information:

Hashtable specTable = new Hashtable();   //Read all Word documents in the library foreach (SPListItem item in specLibrary.Items) { string pmName = ""; int numOpenIssues = 0; int status = 0;   ...   //Read the Word document into the Open XML SDK using (WordprocessingDocument myDoc = WordprocessingDocument.Open(mem, true)) { MainDocumentPart mainPart = myDoc.MainDocumentPart;   //Find out the program manager name associated with the spec Word.DataBinding binding = mainPart.Document .Descendants<Word.DataBinding>() .Where(d => d.XPath.Value.Contains("PM2")) .First();   pmName = binding.Ancestors<Word.SdtCell>().First().InnerText;   //Get all paragraphs in the spec that have a style //applied on the paragraphs called "Issue" IEnumerable<Word.Paragraph> paraIssueList = GetParagraphListByStyle(mainPart, "Issue");   numOpenIssues = paraIssueList.Count();   //Find out the spec status (either 0, 1, 2, 3, 4, or 5) IEnumerable<Word.Paragraph> paraSpecStatusList = GetParagraphListByStyle(mainPart, "SpecStatus");   foreach (Word.Paragraph p in paraSpecStatusList) { string specStatus = p.InnerText; if (specStatus.Contains("0")) status = 0; else if (specStatus.Contains("1")) status = 1; else if (specStatus.Contains("2")) status = 2; else if (specStatus.Contains("3")) status = 3; else if (specStatus.Contains("4")) status = 4; else if (specStatus.Contains("5")) status = 5; }   //Once we find all the data lets add it to a hashtable //We will use this data to build up our chart if (specTable.Contains(pmName)) { int[] specStatusValues = (int[])specTable[pmName]; specStatusValues[status] += numOpenIssues; } else { int[] specStatusValues = { 0, 0, 0, 0, 0, 0 }; specStatusValues[status] += numOpenIssues; specTable.Add(pmName, specStatusValues); } } }

Notice that the specification status can be one of six values, from Stage 0 to Stage 5.

Adding Data to my Spreadsheet

At this point we have all the data from our document library. All that is left is to insert this data into our Excel spreadsheet. This part of the solution is analogous to my previous post on document assembly for SpreadsheetML. We are simply going to create new rows and cells in our spreadsheet for every piece of data within our hash table. Notice that we are adding two types of cells: value based cells and text based cells. These two types of cells are stored differently in Excel. Here is the code snippet used to accomplish this task:

void AddDataToExcel(Hashtable specTable) { SPWeb myWebsite = SPContext.Current.Web;   SPDocumentLibrary templateLibrary = myWebsite .Lists["Excel Template"] as SPDocumentLibrary;   SPFile template = myWebsite .GetFile(@"https://wordteamserver/excelservices/Excel%20Template/template.xlsx"); template.CopyTo(outputFileName, true);   SPFile output = myWebsite.GetFile(template);   byte[] byteArray = output.OpenBinary();   //Open the output workbook so we can add data to it using (MemoryStream mem = new MemoryStream()) { mem.Write(byteArray, 0, (int)byteArray.Length);   using (SpreadsheetDocument myWorkbook = SpreadsheetDocument.Open(mem, true)) { //Get the first sheet in the workbook WorkbookPart workbookPart = myWorkbook.WorkbookPart; WorksheetPart worksheetPart = GetWorkSheetPart(workbookPart, "Sheet1");   //Sheet data contains all the data in the rows/cells SpreadSheet.SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SpreadSheet.SheetData>();   //Go through our spec hashtable so we can add the data to workbook IDictionaryEnumerator enumerator = specTable.GetEnumerator(); while (enumerator.MoveNext()) { string pmName = (string)enumerator.Key; int[] specStatusValues = (int[])enumerator.Value;   //create a new row for each program manager SpreadSheet.Row r = new SpreadSheet.Row();   //Add a cell to the row with the program manager name //note that the name is a string SpreadSheet.Cell firstCell = new SpreadSheet.Cell(); firstCell.DataType = SpreadSheet.CellValues.InlineString; SpreadSheet.InlineString inlineString = new SpreadSheet.InlineString(); SpreadSheet.Text t = new SpreadSheet.Text(); t.Text = pmName; inlineString.Append(t); firstCell.Append(inlineString); r.Append(firstCell);   //Let's add values to the other cells //specifically we will add the number of open issues //across the different spec status values for (int i = 0; i < 6; i++) { SpreadSheet.Cell sheetCell = new SpreadSheet.Cell(); SpreadSheet.CellValue v = new SpreadSheet.CellValue(); string textValue = specStatusValues[i].ToString(); v.Text = textValue; sheetCell.Append(v); r.Append(sheetCell); } sheetData.Append(r); } worksheetPart.Worksheet.Save();   //Once we have added all the values we need to update the //chart ranges FixChartData(worksheetPart, specTable); }   //Add output file to library SPFolder fldr = myWebsite.Folders["Excel Template"]; fldr.Files.Add(outputFileName, mem, true); } }

The GetWorkSheetPart method is the same method I wrote about in my previous post on how to copy a worksheet within a workbook.

Fix Chart Data

The last step in the solution is to fix the chart data. Again this code snippet is taken directly from my previous post on document assembly for SpreadsheetML. In any case, here is the code snippet used to accomplish this task:

//Fix the range of the chart data void FixChartData(WorksheetPart worksheetPart, Hashtable specTable) { int totalCount = specTable.Count + 1;   ChartPart chartPart = worksheetPart.GetPartsOfType<DrawingsPart>() .First().ChartParts.First();   foreach (Charts.Formula formula in chartPart.ChartSpace .Descendants<Charts.Formula>()) { //Range needs to be updated such that we reference the number of rows if (formula.Text.Contains("$2")) { string s = formula.Text.Split('$')[1]; formula.Text += ":$" + s + "$" + totalCount; } }   chartPart.ChartSpace.Save(); }

End Result

Running this code I will end up with a chart that represents the number of open issues across a library of Word documents. Clicking Update Chart will ensure the chart reflects the data contained within the library.

image

I am excited about this solution because it shows how you can extend the reach of our services to accomplish other types of solutions. In this case, having Excel reflect data contained within Word documents.

Zeyad Rajabi

Comments

  • Anonymous
    December 05, 2009
    Hi, I'm using the open xml format as the base data for a program I'm writing, most of the content I work with is best edited in word, but other parts can't be manipulated in word at all. So, my problem is I need to let the user have the document open both in my application, so he can manipulate the custom data, and in word, so he can manipulate the ordinary data. Is this possible? What I'm having to do at the moment is use a modified text editor to edit the word parts in my program, the user can still edit text in word, but he has to close my program first, which is annoying, to say nothing of the fact that the text editor has a habit of breaking the document format. So my question is, do you know of a good way for me to manipulate the document in my program (which could be a word addin, but isn't at the moment) while the document is open in word?

  • Anonymous
    December 05, 2009
    Interesting scenario Joshua. Take a look at the video in the following post: http://blogs.msdn.com/brian_jones/archive/2009/11/23/open-xml-sdk-word-automation-services-presentation-at-pdc.aspx. Check out the last demo, where I showed people how to manipulate Word documents while a user has the Word document open at the same time.

  • Anonymous
    December 07, 2009
    Thanks, I'll watch the demo when I've got a little time (and a better internet connection) but I had another question, does that use Office.Interop, or DocumentFormat.OpenXml? I was finding OpenXml to be really easy to work with, but Office.Interop has been a little trickier.

  • Anonymous
    December 08, 2009
    Joshua - The solutions that I talk about in this blog all use the Open XML SDK. In other words, there is no automation used within the client.