Get the contents of a document part from a package
This topic shows how to use the classes in the Open XML SDK for Office to retrieve the contents of a document part in a Wordprocessing document programmatically.
Packages and Document Parts
An Open XML document is stored as a package, whose format is defined by ISO/IEC 29500. The package can have multiple parts with relationships between them, and these relationships determine the category of the document. A document is a word-processing document if its package-relationship item contains a relationship to a main document part. It is a presentation document if that item contains a relationship to a presentation part, and a spreadsheet document if it contains a relationship to a workbook part. In this how-to topic, you will use a word-processing document package.
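Inside the package, the package-level relationship has the same officeDocument type for all three kinds of document. The content type of the part it targets identifies that part as a main document, presentation, or workbook part. The following minimal sketch is not part of the SDK sample. It makes the rule concrete by reading a package directly with System.IO.Compression and LINQ to XML instead of the Open XML SDK. It assumes a Transitional-conformance package whose main part has an Override entry in [Content_Types].xml. It recognizes only the three standard main-part content types, so macro-enabled and template variants return "unknown". PackageInspector and Classify are hypothetical names used for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class PackageInspector
{
    // Relationship type of the package-level relationship that targets the main part
    // (Transitional conformance; Strict documents use a different URI).
    const string OfficeDocumentRel =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    static readonly XNamespace RelNs = "http://schemas.openxmlformats.org/package/2006/relationships";
    static readonly XNamespace CtNs = "http://schemas.openxmlformats.org/package/2006/content-types";

    // Classifies a package as "word-processing", "presentation", "spreadsheet", or "unknown"
    // from the content type of the part that its officeDocument relationship points to.
    public static string Classify(Stream package)
    {
        using var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true);

        // The package-relationship item is stored at /_rels/.rels.
        string? target = Load(zip, "_rels/.rels").Root?
            .Elements(RelNs + "Relationship")
            .FirstOrDefault(r => (string?)r.Attribute("Type") == OfficeDocumentRel)?
            .Attribute("Target")?.Value;
        if (target is null) return "unknown";

        // Package-level targets are relative to the package root; part names start with "/".
        string partName = "/" + target.TrimStart('/');

        // Look up the part's content type among the Override entries of [Content_Types].xml.
        string? contentType = Load(zip, "[Content_Types].xml").Root?
            .Elements(CtNs + "Override")
            .FirstOrDefault(o => string.Equals((string?)o.Attribute("PartName"), partName,
                                               StringComparison.OrdinalIgnoreCase))?
            .Attribute("ContentType")?.Value;

        return contentType switch
        {
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" => "word-processing",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml" => "presentation",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" => "spreadsheet",
            _ => "unknown",
        };
    }

    // Loads one ZIP entry as XML, failing if the entry is absent.
    static XDocument Load(ZipArchive zip, string entryName)
    {
        ZipArchiveEntry entry = zip.GetEntry(entryName)
            ?? throw new InvalidDataException($"Package has no {entryName} entry.");
        using Stream stream = entry.Open();
        return XDocument.Load(stream);
    }
}
```

Passing a FileStream opened on a .docx file to Classify should return "word-processing".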
Getting a WordprocessingDocument Object
The code starts by opening a package file. It passes a file name to one of the overloaded Open methods (Visual Basic .NET Shared method or C# static method) of the WordprocessingDocument class. This overload takes a string and a Boolean value that specifies whether the file should be opened for editing. In this case, the Boolean value is false, so the file is opened in read-only mode to avoid accidental changes.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
Starting with v3.0.0, the Close() method has been removed in favor of relying on the using statement, which ensures that the Dispose() method is called automatically when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement. The WordprocessingDocument class closes the package as part of its IDisposable implementation, and Dispose() is called automatically when you exit the block. As a result, you do not have to call Dispose() explicitly as long as you use a using statement. When a package is opened for editing with the default AutoSave setting, Dispose() also saves pending changes, so an explicit call to Save() is not required either. A package opened read-only, as in this topic, has no changes to save.
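The following sketch illustrates the disposal pattern and assumes the DocumentFormat.OpenXml NuGet package, v3.x. The hypothetical DisposalDemo class writes a document inside a using block without calling Save(). It then reads the document back with a C# 8 using declaration, which disposes the object when the enclosing method returns rather than at a closing brace.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DisposalDemo
{
    // Creates a one-paragraph document. The package is editable and AutoSave
    // defaults to true, so disposing it at the closing brace saves the new part.
    public static void CreateSample(string path)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
            mainPart.Document = new Document(new Body(new Paragraph(new Run(new Text("Example text.")))));
        } // Dispose() runs here; no Save() call is written.
    }

    // Reads the body text back with a C# 8 using declaration:
    // wordDoc is disposed when the method returns.
    public static string ReadBodyText(string path)
    {
        using WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false);
        return wordDoc.MainDocumentPart?.Document?.Body?.InnerText ?? string.Empty;
    }
}
```

Both forms call Dispose(). The using declaration avoids an extra level of nesting when the object should live until the end of the method.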
Structure of a WordprocessingML Document
The basic document structure of a WordprocessingML document consists of the document and body elements, followed by one or more block-level elements such as p, which represents a paragraph. A paragraph contains one or more r elements. The r stands for run, which is a region of text with a common set of properties, such as formatting. A run contains one or more t elements. The t element contains a range of text. The following code example shows the WordprocessingML markup for a document that contains the text "Example text."
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Example text.</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
Using the Open XML SDK, you can create document structure and content using strongly-typed classes that correspond to WordprocessingML elements. You will find these classes in the DocumentFormat.OpenXml.Wordprocessing namespace. The following table lists the class names of the classes that correspond to the document, body, p, r, and t elements.
WordprocessingML Element | Open XML SDK Class | Description |
---|---|---|
<document/> | Document | The root element for the main document part. |
<body/> | Body | The container for the block-level structures such as paragraphs, tables, annotations, and others specified in the ISO/IEC 29500 specification. |
<p/> | Paragraph | A paragraph. |
<r/> | Run | A run. |
<t/> | Text | A range of text. |
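The following minimal sketch assumes the DocumentFormat.OpenXml package and shows how the classes in the table nest. The hypothetical StructureDemo.Build method creates the same tree as the "Example text." markup in memory, passing each child element to its parent's constructor.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class StructureDemo
{
    // Each constructor argument becomes a child element, so this nesting mirrors
    // document > body > p > r > t in the markup.
    public static Document Build(string text) =>
        new Document(new Body(new Paragraph(new Run(new Text(text)))));
}
```

Reading the OuterXml property of the returned Document, for example StructureDemo.Build("Example text.").OuterXml, gives the serialized markup. You can compare it with the WordprocessingML shown earlier.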
For more information about the overall structure of the parts and elements of a WordprocessingML document, see Structure of a WordprocessingML document.
Comments Element
In this how-to, you are going to work with comments. Therefore, it is useful to familiarize yourself with the structure of the <comments/> element. The following information from the ISO/IEC 29500 specification can be useful when working with this element.
This element specifies all of the comments defined in the current document. It is the root element of the comments part of a WordprocessingML document. Consider the following WordprocessingML fragment for the content of a comments part in a WordprocessingML document:
<w:comments>
  <w:comment … >
    …
  </w:comment>
</w:comments>
The comments element contains the single comment specified by this document in this example.
© ISO/IEC 29500: 2016
The following XML schema fragment defines the contents of this element.
<complexType name="CT_Comments">
  <sequence>
    <element name="comment" type="CT_Comment" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
</complexType>
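The CT_Comments content model is a sequence of zero or more comment elements. To show how that maps to code, the following sketch parses a comments part with LINQ to XML rather than the SDK's typed classes. It assumes the part's XML is already available as a string. For each w:comment it reads the w:id and w:author attributes and concatenates the text of its w:t elements. CommentsReader is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CommentsReader
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Lists each w:comment child of the w:comments root as (id, author, plain text).
    public static List<(string Id, string Author, string Text)> Read(string commentsXml)
    {
        XElement root = XElement.Parse(commentsXml);
        return root.Elements(W + "comment")
            .Select(c => (
                Id: (string?)c.Attribute(W + "id") ?? "",
                Author: (string?)c.Attribute(W + "author") ?? "",
                // Joins every w:t descendant; text from separate paragraphs is concatenated with no separator.
                Text: string.Concat(c.Descendants(W + "t").Select(t => t.Value))))
            .ToList();
    }
}
```

The string returned by the GetCommentsFromDocument method in this topic is one possible input.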
How the Sample Code Works
After you have opened the source file for reading, you get the MainDocumentPart through the MainDocumentPart property and, from it, a reference to the WordprocessingCommentsPart part of the document. Both properties return null when the part does not exist. Because the package is open in read-only mode, the code cannot add a missing part, so it returns null in that case.
static string? GetCommentsFromDocument(string document)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
    {
        // Both properties return null when the part is missing.
        WordprocessingCommentsPart? commentsPart = wordDoc.MainDocumentPart?.WordprocessingCommentsPart;

        if (commentsPart is null)
        {
            return null;
        }
You can then use a StreamReader object to read the contents of the WordprocessingCommentsPart part of the document and return them.
        using (StreamReader streamReader = new StreamReader(commentsPart.GetStream()))
        {
            return streamReader.ReadToEnd();
        }
    }
}
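The StreamReader approach returns the part as raw XML. For typed access instead, the following sketch assumes the DocumentFormat.OpenXml package, v3.x. It reads the part's Comments root element and enumerates its Comment children. TypedCommentsReader and ListComments are hypothetical names, and the "id: author: text" output format is an arbitrary choice for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TypedCommentsReader
{
    // Returns one "id: author: text" line per w:comment, or an empty list when the
    // document has no comments part. The package is opened read-only.
    public static List<string> ListComments(string path)
    {
        using WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false);
        Comments? comments = wordDoc.MainDocumentPart?.WordprocessingCommentsPart?.Comments;
        if (comments is null)
        {
            return new List<string>();
        }

        // Comment.Id and Comment.Author are StringValue attributes; InnerText joins the comment's text.
        return comments.Elements<Comment>()
            .Select(c => $"{c.Id?.Value}: {c.Author?.Value}: {c.InnerText}")
            .ToList();
    }
}
```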
Sample Code
The following code retrieves the contents of a WordprocessingCommentsPart part contained in a word-processing document package. You can run the program by calling the GetCommentsFromDocument method as shown in the following example, which prints the returned XML.
string document = args[0];
Console.WriteLine(GetCommentsFromDocument(document));
Following is the complete code example in C#.
using DocumentFormat.OpenXml.Packaging;
using System;
using System.IO;

string document = args[0];
Console.WriteLine(GetCommentsFromDocument(document) ?? "The document has no comments part.");

// Returns the XML of the document's comments part, or null if the document has none.
static string? GetCommentsFromDocument(string document)
{
    // Open the package read-only; the using statement disposes it at the closing brace.
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
    {
        // A read-only package cannot gain new parts, so report a missing part as null
        // instead of calling AddMainDocumentPart or AddNewPart.
        WordprocessingCommentsPart? commentsPart = wordDoc.MainDocumentPart?.WordprocessingCommentsPart;
        if (commentsPart is null)
        {
            return null;
        }

        // Read the raw XML of the comments part.
        using (StreamReader streamReader = new StreamReader(commentsPart.GetStream()))
        {
            return streamReader.ReadToEnd();
        }
    }
}