Working with the shared string table
This topic discusses the Open XML SDK SharedStringTable class and how it relates to the Open XML File Format SpreadsheetML schema. For more information about the overall structure of the parts and elements that make up a SpreadsheetML document, see Structure of a SpreadsheetML document.
SharedStringTable in SpreadsheetML
The following information from the ISO/IEC 29500
specification introduces the SharedStringTable
(<sst/>
) element.
An instance of this part type contains one occurrence of each unique string that occurs on all worksheets in a workbook.
A package shall contain exactly one Shared String Table part
The root element for a part of this content type shall be sst.
A workbook can contain thousands of cells containing string (non-numeric) data. Furthermore, this data is very likely to be repeated across many rows or columns. The goal of implementing a single string table that is shared across the workbook is to improve performance in opening and saving the file by only reading and writing the repetitive information once.
© ISO/IEC 29500: 2016
Shared strings optimize space requirements when the spreadsheet contains multiple instances of the same string. Spreadsheets that contain business or analytical data often contain repeating strings. If these strings were stored using inline string markup, the same markup would appear over and over in the worksheet. While this is a valid approach, there are several downsides. First, the file requires more space on disk because of the redundant content. Moreover, loading and saving also takes longer.
To optimize the use of strings in a spreadsheet, SpreadsheetML stores a single instance of the string in a table, called the shared string table. The cells then reference the string by index instead of storing the value inline in the cell value. Excel always creates a shared string table when it saves a file. However, using the shared string table is not required to create a valid SpreadsheetML file. If you are creating a spreadsheet document programmatically and the spreadsheet contains a small number of strings, or does not contain any repeating strings, the optimizations usually gained from the shared string table might be negligible in these cases.
The shared strings table is a separate part inside the package. Each workbook contains only one shared string table part that contains strings that can appear multiple times in one sheet or in multiple sheets.
The following table lists the common Open XML SDK classes used when working with the SharedStringTable class.
SpreadsheetML Element | Open XML SDK Class |
---|---|
<si/> |
SharedStringItem |
<t/> |
Text |
Open XML SDK SharedStringTable Class
The Open XML SDK SharedStringTable
class
represents the paragraph (<sst/>
) element
defined in the Open XML File Format schema for SpreadsheetML documents.
Use the SharedStringTable
class to
manipulate individual <sst/>
elements in a
SpreadsheetML document.
Shared String Item Class
The SharedStringItem
class represents the
shared string item (<si/>
) element which
represents an individual string in the shared string table.
If the string is a simple string with formatting applied at the cell level, then the shared string item contains a single text element used to express the string. However, if the string in the cell is more complex ─ for example, if the string has formatting applied at the character level ─ then the string item consists of multiple rich text runs that are used collectively to express the string.
For example, the following XML code is the shared string table for a
worksheet that contains text formatted at the cell level and at the
character level. The first three strings ("Cell A1", "Cell B1", and "My
Cell") are from cells that are formatted at the cell level and only the
text is stored in the shared string table. The next two strings ("Cell
A2" and "Cell B2") contain character level formatting. The word "Cell"
is formatted differently from "A2" and "B2", therefore the formatting
for the cells is stored along with the text within the shared string
item using the RichTextRun
(<r/>
) and RunProperties
(<rPr/>
) elements. To preserve the white space in
between the text that is formatted differently, the space
attribute of the text
(<t/>
) element
is set equal to preserve
. For more
information about the rich text run and run properties elements, see the
ISO/IEC 29500 specification.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="6" uniqueCount="5">
<si>
<t>Cell A1</t>
</si>
<si>
<t>Cell B1</t>
</si>
<si>
<t>My Cell</t>
</si>
<si>
<r>
<rPr>
<sz val="11"/>
<color rgb="FFFF0000"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Cell</t>
</r>
<r>
<rPr>
<sz val="11"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> </t>
</r>
<r>
<rPr>
<b/>
<sz val="11"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>A2</t>
</r>
</si>
<si>
<r>
<rPr>
<sz val="11"/>
<color rgb="FF00B0F0"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Cell</t>
</r>
<r>
<rPr>
<sz val="11"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> </t>
</r>
<r>
<rPr>
<i/>
<sz val="11"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>B2</t>
</r>
</si>
</sst>
Text Class
The Text
class represents the text (<t/>
) element which represents the text content
shown as part of a string.
Open XML SDK Code Example
The following code takes a String
and a
SharedStringTablePart
and verifies if the
specified text exists in the shared string table. If the text does not
exist, it is added as a shared string item to the shared string table.
For more information about how to use the SharedStringTable
class to programmatically
insert text into a cell, see How to: Insert text into a cell in a spreadsheet document.
static int InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
// If the part does not contain a SharedStringTable, create one.
if (shareStringPart.SharedStringTable is null)
{
shareStringPart.SharedStringTable = new SharedStringTable();
}
int i = 0;
// Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
if (item.InnerText == text)
{
return i;
}
i++;
}
// The text does not exist in the part. Create the SharedStringItem and return its index.
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
return i;
}