检索段落及其样式
在本示例中,我们编写一个从 WordprocessingML 文档检索段落节点的查询。它还标识每个段落的样式。
此查询建立在前一示例查找默认段落样式中的查询的基础之上,该查询从样式列表中检索默认样式。这些信息是必需的,以便查询能标识未显式设置样式的段落样式。段落样式通过 w:pPr 元素来设置;如果段落未包含此元素,则它会使用默认样式的格式。
本主题解释某些查询的意义,然后作为一个完整工作示例的一部分来显示查询。
示例
检索文档中所有段落及其样式的查询的源如下所示:
xDoc.Root.Element(w + "body").Descendants(w + "p")
xDoc.Root.<w:body>...<w:p>
此表达式与前一示例查找默认段落样式中的查询的源相似。主要区别是它使用 Descendants 轴,而不是 Elements 轴。该查询使用 Descendants 轴,因为在包含节的文档中,段落将不是正文元素的直接子元素;而是在层次结构中的两个级别之下。通过使用 Descendants 轴,无论文档是否使用节,代码都能运行。
示例
该查询使用 let 子句来确定包含样式节点的元素。如果没有任何元素,则将 styleNode 设置为 null:
let styleNode = para.Elements(w + "pPr").Elements(w + "pStyle").FirstOrDefault()
Let styleNode As XElement = para.<w:pPr>.<w:pStyle>.FirstOrDefault()
let 子句首先使用 Elements 轴,查找所有名为 pPr 的元素,然后使用 Elements 扩展方法,查找所有名为 pStyle 的子元素,最后使用 FirstOrDefault 标准查询运算符,将集合转换为单一实例。如果集合为空,则将 styleNode 设置为 null。这是一个用于查找 pStyle 子代节点的有用方法。请注意,如果 pPr 子节点不存在,代码不会通过引发异常而运行失败;相反,它会将 styleNode 设置为 null,这是此 let 子句的期望行为。
该查询投影一个具有两个成员 StyleName 和 ParagraphNode 的匿名类型的集合。
示例
本示例处理一个 WordprocessingML 文档,它从 WordprocessingML 文档中检索段落节点。它还标识每个段落的样式。本示例以本教程中前面的一些示例为基础构建。下面代码中的注释标识出了这个新查询。
在创建源 Office Open XML 文档中,对本示例源文档的创建进行了说明。
本示例使用 WindowsBase 程序集中的类。它使用 System.IO.Packaging 命名空间中的类型。
const string fileName = "SampleDoc.docx";
const string documentRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
const string stylesRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";
const string wordmlNamespace = "https://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = wordmlNamespace;
XDocument xDoc = null;
XDocument styleDoc = null;
using (Package wdPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
{
PackageRelationship docPackageRelationship = wdPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault();
if (docPackageRelationship != null)
{
Uri documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), docPackageRelationship.TargetUri);
PackagePart documentPart = wdPackage.GetPart(documentUri);
// Load the document XML in the part into an XDocument instance.
xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()));
// Find the styles part. There will only be one.
PackageRelationship styleRelation = documentPart.GetRelationshipsByType(stylesRelationshipType).FirstOrDefault();
if (styleRelation != null)
{
Uri styleUri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri);
PackagePart stylePart = wdPackage.GetPart(styleUri);
// Load the style XML in the part into an XDocument instance.
styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()));
}
}
}
string defaultStyle =
(string)(
from style in styleDoc.Root.Elements(w + "style")
where (string)style.Attribute(w + "type") == "paragraph"&&
(string)style.Attribute(w + "default") == "1"
select style
).First().Attribute(w + "styleId");
// Following is the new query that finds all paragraphs in the
// document and their styles.
var paragraphs =
from para in xDoc
.Root
.Element(w + "body")
.Descendants(w + "p")
let styleNode = para
.Elements(w + "pPr")
.Elements(w + "pStyle")
.FirstOrDefault()
select new
{
ParagraphNode = para,
StyleName = styleNode != null ?
(string)styleNode.Attribute(w + "val") :
defaultStyle
};
foreach (var p in paragraphs)
Console.WriteLine("StyleName:{0}", p.StyleName);
Imports <xmlns:w="https://schemas.openxmlformats.org/wordprocessingml/2006/main">
Module Module1
Private Function GetStyleOfParagraph(ByVal styleNode As XElement, ByVal defaultStyle As String) As String
If (styleNode Is Nothing) Then
Return defaultStyle
Else
Return styleNode.@w:val
End If
End Function
Sub Main()
Dim fileName = "SampleDoc.docx"
Dim documentRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
Dim stylesRelationshipType = "https://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
Dim wordmlNamespace = "https://schemas.openxmlformats.org/wordprocessingml/2006/main"
Dim xDoc As XDocument = Nothing
Dim styleDoc As XDocument = Nothing
Using wdPackage As Package = Package.Open(fileName, FileMode.Open, FileAccess.Read)
Dim docPackageRelationship As PackageRelationship = wdPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault()
If (docPackageRelationship IsNot Nothing) Then
Dim documentUri As Uri = PackUriHelper.ResolvePartUri(New Uri("/", UriKind.Relative), docPackageRelationship.TargetUri)
Dim documentPart As PackagePart = wdPackage.GetPart(documentUri)
' Load the document XML in the part into an XDocument instance.
xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()))
' Find the styles part. There will only be one.
Dim styleRelation As PackageRelationship = documentPart.GetRelationshipsByType(stylesRelationshipType).FirstOrDefault()
If (styleRelation IsNot Nothing) Then
Dim styleUri As Uri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri)
Dim stylePart As PackagePart = wdPackage.GetPart(styleUri)
' Load the style XML in the part into an XDocument instance.
styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()))
End If
End If
End Using
Dim defaultStyle As String = _
( _
From style In styleDoc.Root.<w:style> _
Where style.@w:type = "paragraph" And _
style.@w:default = "1" _
Select style _
).First().@w:styleId
' Following is the new query that finds all paragraphs in the
' document and their styles.
Dim paragraphs = _
From para In xDoc.Root.<w:body>...<w:p> _
Let styleNode As XElement = para.<w:pPr>.<w:pStyle>.FirstOrDefault() _
Select New With { _
.ParagraphNode = para, _
.StyleName = GetStyleOfParagraph(styleNode, defaultStyle) _
}
For Each p In paragraphs
Console.WriteLine("StyleName:{0}", p.StyleName)
Next
End Sub
End Module
当此示例应用于 创建源 Office Open XML 文档 中说明的文档时,会生成以下输出。
StyleName:Heading1
StyleName:Normal
StyleName:Normal
StyleName:Normal
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Normal
StyleName:Normal
StyleName:Normal
StyleName:Code
后续步骤
在下一个主题检索段落的文本中,我们将创建一个查询,以检索段落的文本。