Introduction to Excel XML Part 1: creating a simple table
I started this blog as a way of responding to news about the new formats, as well as give some insights into the goals and design decisions behind our XML support in Office. I hadn't really planned on doing any tutorial level posts, but I think it actually might we worthwhile to get more people up to speed on XML in Office. I’m going to start to mix in some more introductory type posts for those not as familiar with XML. Today, I wanted to provide a really simple post for people that aren’t yet familiar with Excel’s XML support. As I start to build up some of these examples, I’ll eventually lead into some previews of what similar files would look like with the new formats.
Excel’s SpreadsheetML has been around for about 4 years now (they actually started development of it about 6 years ago). The SpreadsheetML format is pretty easy to work with, as it was designed to be human readable and editable. For example, the files are “pretty printed”, which we don’t plan to do with the new formats in Office 12. You can use Office XP or 2003 for this example since they both have support for SpreadsheetML in Excel.
Let’s create a really simple table that looks like this (ignore the <TBODY> tags if they show up):
First Name | Last Name | Phone Number |
Brian | Jones | (425) 123-4567 |
Chad | Rothschiller | (425) 123-4567 |
Shawn | Villaron | (425) 123-4567 |
Scott | Walker | (425) 123-4567 |
Kevin | Boske | (425) 123-4567 |
You can use any text editor to do this. For the sake of this example, let's just use notepad. Create a blank file called "test.xml" and do the following steps.
Step 1: XML declaration and root element
First thing to do in your file is to type the following:
<?xml version="1.0"?>
This just declares that the file is an XML document adhering to the 1.0 version of the XML spec. That should go at the top of all your XML files. Next thing to do is to add the root element for the document. XML files always have one and only one root element that contains the rest of the document. For SpreadsheetML, the root element is <Workbook>. So, after the XML declaration, add that element so that your file now looks like this:
<?xml version="1.0"?>
<Workbook>
</Workbook>
Step 2: Declare the namespace and add a prefix to the root element
Most XML documents out there today have a namespace associated with them. Declaring the namespace of an XML file makes it a lot easier for people parsing your XML to know what type of XML they are dealing with. Even if Office we have a number of different uses of XML. In order to know when you are parsing a Word XML file as opposed to an Excel XML file, one easy way to tell is to look at the namespace. In Office XP when we created the SpreadsheetML schema, we were still using namespaces in the form "urn:schemas-microsoft-com:office". Going forward, we'll use URL namespaces, like we did with WordML in Office 2003 ("https://schemas.microsoft.com/office" for example). By adding the namespace declaration to the spreadsheet, your file should now look like this:
<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">
</Workbook>
The last thing we'll do for the namespace is to use a prefix, rather than the default. Since the attributes are qualified for the SpreadsheetML schema, we need to do this if we are going to use any attributes. Let's use "ss" as the prefix. We'll add "ss:" in front of all of our elements, and in our namespace declaration, we'll update it to say that the namespace applies to everything with an "ss:" in front of it, instead of just applying to the default XML elements.
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
</ss:Workbook>
Notice that the namespace declaration says xmlns:ss= instead of just xmlns= . That's saying that anything with an "ss:" in front of it applies to the spreadsheet namespace.
Step 3: Adding a worksheet
OK, we now have an empty workbook. The next thing we'll do is to declare the spreadsheet grid within the workbook. As you know, workbooks can have multiple worksheets, but we'll just declare one. In addition, let's declare a table inside the worksheet. The Table is where all the grid data will go:
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>
Step 4: Adding the header row
The first row in the table we wanted to generate has "First Name", "Last Name" and "Phone Number" in the three columns. Let's add a <Row> tag as well as three <Cell> tags. The actual content of the cell is contained within a <Data> tag, so let's add that as well:
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>
You now have a template for the table that you can open directly in Excel. Go ahead and open it in Excel and take a look. Not too exciting, but it's a start.
Step 5: Adjust the column widths
Notice that the widths of the columns are too narrow for the content. Let's add some XML to our file to specify what width we want the columns represented as.
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>
Now open the file again in Excel. Notice that the column widths are wider and that the text now fits. There is another attribute you can set on the Column element that tells it to use autofit for the widths. This only works for numbers and dates though. Since our cells are strings, we need to explicitly set the width.
Step 6: Adding the rest of the data
OK, now add those additional rows of data. This should be pretty easy. Just select that first "Row" element and copy it. Then paste it 5 more times so you have 6 total rows. Now go through and update the values of the Rows. If you are familiar with XSLT, you'll see at this point how you could easily generate an XSLT that could be applied to a data set to transform it into SpreadsheetML. Just repeat the row tag for each row in your data set and add the values in each Cells Data tag. After applying all the data, your XML should look like this:
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Brian</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Jones</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Chad</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Rothschiller</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Shawn</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Villaron</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Scott</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Walker</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Kevin</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Boske</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>
You should now have the full table in Excel.
Step 7: Add some formatting to the first row
For the last step, let's format the first row with bold text, so that it's clear that it's our header row. All we need to do here is generate a style that has bold text, and then reference that style with the first row. So, the first thing we'll do is add the following XML in front of the Worksheet tag:
<ss:Styles>
<ss:Style ss:ID="1">
<ss:Font ss:Bold="1"/>
</ss:Style>
</ss:Styles>
That created a style who's ID is "1" and has bold applied to it. Next, update the first Row element to reference the style ID 1. The Row should now look like this:
<ss:Row ss:StyleID="1">
Your XML should now look like this:
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Styles>
<ss:Style ss:ID="1">
<ss:Font ss:Bold="1"/>
</ss:Style>
</ss:Styles>
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Row ss:StyleID="1" >
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Brian</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Jones</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Chad</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Rothschiller</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Shawn</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Villaron</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Scott</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Walker</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Kevin</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Boske</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>
In closing
Like I said, that was a pretty simple example, but I wanted to start posting some information for folks new to Office XML (or even new to XML in general). I'll write up a similar example with Word, and then I'll start to get into some of the more complex details of the schemas.
The new XML formats for Excel will look different than what I just showed you with SpreadsheetML, but there will also be some similarities. It's good to become a little familiar with the existing schemas, and in a few months I'll start posting a lot more about the new schemas.
-Brian
Comments
Anonymous
June 27, 2005
Thanks a ton for posting this. Open format will bring in lots of goodies.Anonymous
June 30, 2005
Here's some more information for those of you new to XML or at least Office XML. I mentioned in the previous...Anonymous
July 05, 2005
This post is for those of you interested in learning the basics behind WordprocessingML. That’s the schema...Anonymous
July 06, 2005
とりあえずここを読んでおけばよさそうな感じです。Brian Jones: Office XML Formats
現時点で以下の記事があがっています。
Excel:Introduction to Excel...Anonymous
September 07, 2005
Great job. Thanks a bunch.Anonymous
September 22, 2005
Great Job.
However can you post some, preferably not so primitive real Excel and PowerPoint documents as you did for Word?
That would really give us a better picture!
ThanksAnonymous
November 20, 2005
This is brilliant, thanks a lot mate.
I had been looking for a way to generate an Excel spreadsheet using an Oracle database function and your XML examples gave me exactly what I needed.
Cheers !Anonymous
January 04, 2006
Its very good article.
I got one problem with the above code. I want to place a Table inside a cell. How can I place a Table inside a Cell.
Please can any one give suggest for the above problem with some idea.
Thanks in advance.
Regards,
VijayAnonymous
February 01, 2006
Its too good,I just got the correct information I actually needed.Thanks alot.
regards,
karthikAnonymous
February 13, 2006
I'm glad this is working for everyone! :-)
Vijay, maybe I'm misunderstanding your question, but I don't believe what you are asking for is possible in Excel. Word supports nested tables, but Excel does not.
-BrianAnonymous
April 11, 2006
Brian,
I am looking at the XML created by the 2007 Microsoft Office Beta-1 Technical Refresh and it looks very different from your sample XML. Sheets are stored as separate parts. Data within the sheet is within a <sheetData> element. Columns are inside the <cols> element and are represented as <col> elements etc.
Please explain. Do your samples need to be updated or I am missing something very basic.
Thanks
AlexAnonymous
April 11, 2006
Hi Alex. This example is using the XML format from Excel XP & 2003. The new SpreadsheetML Open XML format is actually significantly different.
I just made a post covering some of the basics of the PresentationML format which is also new. I'll try to get something up around SpreadsheetML shortly.
-BrianAnonymous
July 03, 2008
PingBack from http://www.zacharysnow.com/2008/07/04/excel-xml/Anonymous
January 18, 2009
PingBack from http://www.keyongtech.com/2359812-reading-an-excel-file