Separating Out the Code and Comments
Next, we want to separate the code and comments from the rest of the text. The first step is to add a new boolean member to our anonymous type that indicates whether the paragraph is either code or a comment:
.Select(p =>
    new {
        ParagraphNode = p,
        Style = GetParagraphStyle(p),
        CommentOrCode =
            GetParagraphStyle(p) == "Code" ||
            GetParagraphStyle(p) == "CommentText",
        ParaText =
            p
            .Elements(w + "r")
            .Elements(w + "t")
            .StringConcatenate(t => (string)t)
    }
)
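For readers who want to run this projection in isolation, here is a minimal self-contained sketch. Several pieces are assumptions rather than code from this series: the namespace value bound to w, the stand-in body of GetParagraphStyle (including its "Default" fallback), the use of string.Concat in place of the StringConcatenate extension method, and the hypothetical helpers Para, SampleBody, and Describe, which exist only to build and print a tiny in-memory document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CommentOrCodeSketch
{
    // Assumption: the WordprocessingML main namespace.
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumed stand-in for GetParagraphStyle: returns the style name,
    // or "Default" when the paragraph has no style element.
    public static string GetParagraphStyle(XElement para)
    {
        return (string)para
            .Elements(w + "pPr")
            .Elements(w + "pStyle")
            .Attributes(w + "val")
            .FirstOrDefault() ?? "Default";
    }

    // Hypothetical helper: builds one paragraph with an optional style.
    static XElement Para(string style, string text)
    {
        return new XElement(w + "p",
            style == null ? null :
                new XElement(w + "pPr",
                    new XElement(w + "pStyle", new XAttribute(w + "val", style))),
            new XElement(w + "r", new XElement(w + "t", text)));
    }

    // Hypothetical sample data, loosely modeled on the document in this post.
    public static XElement SampleBody()
    {
        return new XElement(w + "body",
            Para("Heading1", "This is a heading."),
            Para(null, "This is some normal text."),
            Para("Code", "using System;"),
            Para("CommentText", "<!-- validation instructions go here -->"));
    }

    // Same projection as in the text, followed by formatting into strings.
    public static List<string> Describe(XElement body)
    {
        return body
            .Descendants(w + "p")
            .Select(p =>
                new {
                    Style = GetParagraphStyle(p),
                    CommentOrCode =
                        GetParagraphStyle(p) == "Code" ||
                        GetParagraphStyle(p) == "CommentText",
                    // string.Concat stands in for StringConcatenate here.
                    ParaText = string.Concat(
                        p.Elements(w + "r").Elements(w + "t").Select(t => (string)t))
                })
            .Select(p => p.CommentOrCode + " " + p.Style + " " + p.ParaText)
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Describe(SampleBody()))
            Console.WriteLine(line);
    }
}
```

With this sample data, only the last two paragraphs should be flagged True.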
There are two ways to implement this. We can do it as above, which calls GetParagraphStyle up to three times for each paragraph: once for Style, and once or twice for CommentOrCode, because || short-circuits when the first comparison is true. Given that the query is evaluated lazily and GetParagraphStyle is moderately efficient, this is OK. Another approach is like this:
wordDoc
    .Element(w + "body")
    .Descendants(w + "p")
    .Select(p =>
        new {
            ParagraphNode = p,
            Style = GetParagraphStyle(p),
            ParaText =
                p
                .Elements(w + "r")
                .Elements(w + "t")
                .StringConcatenate(t => (string)t)
        }
    )
    .Select(p =>
        new {
            ParagraphNode = p.ParagraphNode,
            Style = p.Style,
            CommentOrCode =
                p.Style == "Code" ||
                p.Style == "CommentText",
            ParaText = p.ParaText
        }
    )
    .ForEach(
        p =>
            Console.WriteLine("{0} {1} {2}",
                p.CommentOrCode.ToString().PadRight(6),
                p.Style.PadRight(12),
                p.ParaText
            )
    );
This is also lazily evaluated. It creates two anonymous-type instances for each paragraph instead of one, but it calls GetParagraphStyle only once per paragraph. Either approach is O(n), so it doesn't really matter. The second approach is probably slightly more efficient, but I would have to do more experiments to say for sure. Certainly, if GetParagraphStyle were expensive, the second approach would be better.
When run, this outputs:
False Heading1 This is a heading.
False Default
False Default This is some normal test.
False Default
False Default See the following code for an example of how to do something:
False Default
True Code using System;
True CommentText <Test SnipId="000101" TestId="0001" Lang="C#9">
True CommentText <!-- validation instructions go here -->
True CommentText </Test>
True Code using System.Collections.Generic;
True Code using System.Text;
True Code using System.Query;
True Code using System.Xml.XLinq;
True Code using System.Data.DLinq;
True Code
True Code namespace WordMLReader
True Code {
True Code class Program
True Code {
True Code static void Main(string[] args)
True Code {
True Code Console.WriteLine("Hello");
True Code }
True Code }
True Code }
False Default
False Default This is more text.
False Default
True Code using System.Text;
True CommentText <Test SnipId="000201" TestId="0002" Lang="C#9">
True CommentText <!-- validation instructions go here -->
True CommentText </Test>
True Code using System.Query;
True Code using System.Xml.XLinq;
True Code using System.Data.DLinq;
True Code
True Code namespace WordMLReader
True Code {
True Code class Program
True Code {
False Default
This is what we expected.
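To make the earlier comparison of the two approaches a little more concrete, here is a rough sketch that counts how often the style lookup runs in each. It is a simplification under stated assumptions: paragraphs are modeled as plain strings, and GetStyle is a hypothetical counting stand-in for GetParagraphStyle, not the real helper. It measures call counts only, not elapsed time.

```csharp
using System;
using System.Linq;

public static class StyleCallCount
{
    static int calls;

    // Hypothetical stand-in for GetParagraphStyle that counts its invocations.
    static string GetStyle(string style)
    {
        calls++;
        return style;
    }

    // First approach: one projection that looks up the style repeatedly.
    public static int CountSingleProjection(string[] styles)
    {
        calls = 0;
        styles
            .Select(s => new {
                Style = GetStyle(s),
                CommentOrCode = GetStyle(s) == "Code" || GetStyle(s) == "CommentText"
            })
            .ToList(); // force the lazy query to run
        return calls;
    }

    // Second approach: look up the style once, then reuse it in a second projection.
    public static int CountChainedProjection(string[] styles)
    {
        calls = 0;
        styles
            .Select(s => new { Style = GetStyle(s) })
            .Select(p => new {
                Style = p.Style,
                CommentOrCode = p.Style == "Code" || p.Style == "CommentText"
            })
            .ToList(); // force the lazy query to run
        return calls;
    }

    public static void Main()
    {
        var styles = new[] { "Heading1", "Default", "Code", "CommentText" };
        Console.WriteLine(CountSingleProjection(styles));  // prints 11
        Console.WriteLine(CountChainedProjection(styles)); // prints 4
    }
}
```

The first count is 11 rather than 12 because || short-circuits for the "Code" paragraph, so its second comparison never runs.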
Next: Retrieving the Two Code/Comment Groups