Retrieving the Paragraphs and Their Styles
In this example, we write a query that retrieves the paragraph nodes from a WordprocessingML document. It also identifies the style of each paragraph.
This query builds on the query in the previous example, Finding the Default Paragraph Style, which retrieves the default style from the list of styles. This information is required so that the query can identify the style of paragraphs that do not have a style explicitly set. Paragraph styles are set through the w:pPr
element; if a paragraph does not contain this element, it is formatted with the default style.
This topic explains the significance of some pieces of the query, then shows the query as part of a complete, working example.
Example
The source of the query to retrieve all the paragraphs in a document and their styles is as follows:
xDoc.Root.Element(w + "body").Descendants(w + "p")
xDoc.Root.<w:body>...<w:p>
This expression is similar to the source of the query in the previous example, Finding the Default Paragraph Style. The main difference is that it uses the Descendants axis instead of the Elements axis. The query uses the Descendants axis because in documents that have sections, the paragraphs will not be the direct children of the body element; rather, the paragraphs will be two levels down in the hierarchy. By using the Descendants axis, the code will work of whether or not the document uses sections.
Example
The query uses a let clause to determine the element that contains the style node. If there is no element, then styleNode
is set to null:
let styleNode = para.Elements(w + "pPr").Elements(w + "pStyle").FirstOrDefault()
Let styleNode As XElement = para.<w:pPr>.<w:pStyle>.FirstOrDefault()
The let clause first uses the Elements axis to find all elements named pPr
, then uses the Elements extension method to find all child elements named pStyle
, and finally uses the FirstOrDefault standard query operator to convert the collection to a singleton. If the collection is empty, styleNode
is set to null. This is a useful idiom to look for the pStyle
descendant node. Note that if the pPr
child node does not exist, the code does nor fail by throwing an exception; instead, styleNode
is set to null, which is the desired behavior of this let clause.
The query projects a collection of an anonymous type with two members, StyleName
and ParagraphNode
.
Example
This example processes a WordprocessingML document, retrieving the paragraph nodes from a WordprocessingML document. It also identifies the style of each paragraph. This example builds on the previous examples in this tutorial. The new query is called out in comments in the code below.
You can find instructions for creating the source document for this example in Creating the Source Office Open XML Document.
This example uses classes found in the WindowsBase assembly. It uses types in the System.IO.Packaging namespace.
const string fileName = "SampleDoc.docx";
const string documentRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
const string stylesRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";
const string wordmlNamespace = "https://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = wordmlNamespace;
XDocument xDoc = null;
XDocument styleDoc = null;
using (Package wdPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
{
PackageRelationship docPackageRelationship = wdPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault();
if (docPackageRelationship != null)
{
Uri documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), docPackageRelationship.TargetUri);
PackagePart documentPart = wdPackage.GetPart(documentUri);
// Load the document XML in the part into an XDocument instance.
xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()));
// Find the styles part. There will only be one.
PackageRelationship styleRelation = documentPart.GetRelationshipsByType(stylesRelationshipType).FirstOrDefault();
if (styleRelation != null)
{
Uri styleUri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri);
PackagePart stylePart = wdPackage.GetPart(styleUri);
// Load the style XML in the part into an XDocument instance.
styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()));
}
}
}
string defaultStyle =
(string)(
from style in styleDoc.Root.Elements(w + "style")
where (string)style.Attribute(w + "type") == "paragraph"&&
(string)style.Attribute(w + "default") == "1"
select style
).First().Attribute(w + "styleId");
// Following is the new query that finds all paragraphs in the
// document and their styles.
var paragraphs =
from para in xDoc
.Root
.Element(w + "body")
.Descendants(w + "p")
let styleNode = para
.Elements(w + "pPr")
.Elements(w + "pStyle")
.FirstOrDefault()
select new
{
ParagraphNode = para,
StyleName = styleNode != null ?
(string)styleNode.Attribute(w + "val") :
defaultStyle
};
foreach (var p in paragraphs)
Console.WriteLine("StyleName:{0}", p.StyleName);
Imports <xmlns:w="https://schemas.openxmlformats.org/wordprocessingml/2006/main">
Module Module1
Private Function GetStyleOfParagraph(ByVal styleNode As XElement, ByVal defaultStyle As String) As String
If (styleNode Is Nothing) Then
Return defaultStyle
Else
Return styleNode.@w:val
End If
End Function
Sub Main()
Dim fileName = "SampleDoc.docx"
Dim documentRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
Dim stylesRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
Dim wordmlNamespace = "https://schemas.openxmlformats.org/wordprocessingml/2006/main"
Dim xDoc As XDocument = Nothing
Dim styleDoc As XDocument = Nothing
Using wdPackage As Package = Package.Open(fileName, FileMode.Open, FileAccess.Read)
Dim docPackageRelationship As PackageRelationship = wdPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault()
If (docPackageRelationship IsNot Nothing) Then
Dim documentUri As Uri = PackUriHelper.ResolvePartUri(New Uri("/", UriKind.Relative), docPackageRelationship.TargetUri)
Dim documentPart As PackagePart = wdPackage.GetPart(documentUri)
' Load the document XML in the part into an XDocument instance.
xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()))
' Find the styles part. There will only be one.
Dim styleRelation As PackageRelationship = documentPart.GetRelationshipsByType(stylesRelationshipType).FirstOrDefault()
If (styleRelation IsNot Nothing) Then
Dim styleUri As Uri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri)
Dim stylePart As PackagePart = wdPackage.GetPart(styleUri)
' Load the style XML in the part into an XDocument instance.
styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()))
End If
End If
End Using
Dim defaultStyle As String = _
( _
From style In styleDoc.Root.<w:style> _
Where style.@w:type = "paragraph" And _
style.@w:default = "1" _
Select style _
).First().@w:styleId
' Following is the new query that finds all paragraphs in the
' document and their styles.
Dim paragraphs = _
From para In xDoc.Root.<w:body>...<w:p> _
Let styleNode As XElement = para.<w:pPr>.<w:pStyle>.FirstOrDefault() _
Select New With { _
.ParagraphNode = para, _
.StyleName = GetStyleOfParagraph(styleNode, defaultStyle) _
}
For Each p In paragraphs
Console.WriteLine("StyleName:{0}", p.StyleName)
Next
End Sub
End Module
This example produces the following output when applied to the document described in Creating the Source Office Open XML Document.
StyleName:Heading1
StyleName:Normal
StyleName:Normal
StyleName:Normal
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Normal
StyleName:Normal
StyleName:Normal
StyleName:Code
Next Steps
In the next topic, Retrieving the Text of the Paragraphs, we will create a query to retrieve the text of paragraphs.
See Also
Concepts
Tutorial: Manipulating Content in a WordprocessingML Document