Create a Concordance for a Word Document from C# using Automation
I was talking to my son about some of his homework assignments. He had to write several essays. We talked about the power of Microsoft Word and how I had to count the words manually when writing a 5000 word essay on a typewriter, but Word shows the word count easily.
Then we talked about the utility of a concordance and how to create one.
Below is a simple C# program that you can run that will open a Word document (or you can have it create a new one, and have it type some text), retrieve the text, count the occurrences of each word into a dictionary, then output the result into an Excel spreadsheet. The sample needs nothing more than office installed and Visual Studio. I suspect that any version of each will suffice.
The code uses C# dynamic for which types are not checked at compile time, and you don’t get intellisense.
Of course, you can accomplish the same thing with VBA in a macro (Excel or Word).
Years ago, I wrote the code to enable office intellisense in Visual FoxPro, even though it has no strong typing like C# dynamic. It still works today
Under the covers, dynamic and Visual Foxpro user something called OLE Automation which is still very useful today.
Of course, the code below can be modified to have an exclusion list (the number of “the”s in a document may not be interesting) and even to directly insert the concordance data directly into the target document.
<code sample>
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;
namespace WpfApplication1
{
// Start Visual Studio
// File->New->Project->C# WPF application
// paste in this code
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
this.Loaded += (ol, el) =>
{
try
{
/*
when i don't know how to manipulate Word or Excel, i do this:
View->Macro->Record a macro
then edit the macro
or i look at MSDN documentation:
Excel: https://msdn.microsoft.com/en-us/library/office/ff194068.aspx
Word: https://msdn.microsoft.com/en-us/library/office/ff837519.aspx
or open a VS developer command prompt and
run oleview on "C:\Program Files (x86)\Microsoft Office\Office16\Excel.exe"
or open oleview, navigate to TypeLibraries->Microsoft Word 16.0 Object Library
*/
var dict = new Dictionary<string, int>();
var dtStart = DateTime.Now;
// also works with
// VisualFoxPro.Application
// PowerPoint.Application
// Outlook.Application
//
var typeWord = Type.GetTypeFromProgID("Word.Application");
dynamic oWord = Activator.CreateInstance(typeWord);
oWord.Visible = true;
//dynamic oDoc = oWord.documents.add(); // to create a new document
dynamic oDoc = oWord.RecentFiles(1).Open();
// you can type text like so
//oWord.Selection.TypeText("Hello there");
// save the file
//var filename = System.IO.Path.Combine(
// Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
// "temp.docx");
//oDoc.saveas(filename);
int nWrdCnt = 0;
// expand the selection to everything
oWord.Selection.WholeStory();
// get all the text of the document
var allText = oWord.Selection.Text as string;
// split the text into an array of words
var words = allText.Split(new[] { ' ' });
foreach (var txt in words)
{
// Is it in our dictionary?
if (dict.ContainsKey(txt))
{
dict[txt]++;
}
else
{
dict[txt] = 1;
}
nWrdCnt++;
}
oWord.Quit();
// it's actually faster to output the data as a CSV file,
// then start excel with it,
// but that's less flexible
//*
var sb = new StringBuilder();
// add a header
sb.AppendLine("Word, Count");
// iterate through the dictionary in descending order by value
foreach (var kvp in dict.OrderByDescending(kvp => kvp.Value))
{
sb.AppendLine($@"""{kvp.Key}"",{kvp.Value}");
}
var fileName = System.IO.Path.Combine(
System.IO.Path.GetTempPath(),
"t.csv");
System.IO.File.WriteAllText(fileName, sb.ToString());
System.Diagnostics.Process.Start(fileName);
/*/
var typeExcel = Type.GetTypeFromProgID("Excel.Application");
dynamic oExcel = Activator.CreateInstance(typeExcel);
var oWrkBook = oExcel.Workbooks.Add();
var oSheet = oWrkBook.ActiveSheet;
int nRow = 1;
foreach (var kvp in dict.OrderByDescending(kvp => kvp.Value))
{
oSheet.Cells[1][nRow] = kvp.Key.StartsWith("=") ? @"'=" : kvp.Key;
oSheet.Cells[2][nRow] = kvp.Value;
nRow++;
}
oExcel.Visible = true;
//*/
this.Title = $"# of words = {nWrdCnt} Duration = {(DateTime.Now - dtStart).TotalSeconds}";
var lv = new ListView()
{
ItemsSource = from kvp in dict
orderby kvp.Value descending
select new
{
V = $"{kvp.Value} {kvp.Key}"
}
};
this.Content = lv;
}
catch (Exception ex)
{
this.Content = ex.ToString();
}
};
}
}
}
</code sample>
Comments
- Anonymous
February 28, 2016
Very nice thought process and its implementation too.