Analysing Column Data using C# (... with a little Custom Drawing thrown in for fun)

When presented with a table of data in Excel, sometimes it is useful to be able to learn some quick facts about it. One quick fact which is often useful is the distribution of values within a particular column. For example, if we have a column called Region it is useful to  know which regions occur more often than others. Perhaps 50% of the column is North America, 30% Europe and 19% Pacific. Excel has all the tools we need to do this but we just need to tie them together in a way that suits us.

Additionally, it would be nice if we displayed this information graphically; perhaps using some form of simple histogram to compliment the numerical values and provide a tiny bit of visualisation.

In this post, we'll write a C# addin that does such a thing.

Here's a screenshot of the addin we'll build.

Getting Started

We won't cover the basics of getting up-and-running with C# Excel addins in this post. I have already covered that briefly here. Instead, we'll fast forward to the stage where we've already got our basic addin project and now we're ready to add the fun bits.

Note: This addin uses the ExcelUtility class I published previously in this post.

Designing the Custom Ribbon

We'll add a custom tab to the Ribbon called "Excel Extensions". On this tab, we'll place a button titled "Analyse Column" inside a group titled "Table Extensions".

Here's our Ribbon XML:

<?xml version="1.0" encoding="utf-8" ?>

<customUI onLoad="OnRibbonLoad" xmlns="https://schemas.microsoft.com/office/2006/01/customui">

  <ribbon>

    <tabs>

      <tab id="ExcelExtensionsTab" label="Excel Extensions">

        <group id="TableCommandsGroup" label="Table Extensions">

          <button id="ColumnAnalysis" onAction="OnColumnAnalysisClicked" label="Analyse Column" getImage="GetColumnAnalysisButtonImage" getEnabled="GetColumnAnalysisEnabled"

 supertip="Analyse a table column and view statistical details about the values in the column."

 size="large"/>

        </group>

      </tab>

    </tabs>

  </ribbon>

</customUI>

We'll store this in an XML file as part of our solution. Then we'll link the XML file into our assembly at compile time as an embedded resource i.e., the XML is compiled into our assembly. This means that we don't have to distribute the XML file with our addin and we use the .NET Assembly class to access the XML resource at runtime. 

Here's our implemention of the IRibbonExtensibility.GetCustomUI method:

public string GetCustomUI(string RibbonID) {

  Assembly assembly = Assembly.GetExecutingAssembly();

  using (System.IO.Stream stream = assembly.GetManifestResourceStream ("ExcelExtensions.ExcelExtensionsRibbon.xml")) {

                return new System.IO.StreamReader(stream).ReadToEnd();

   }

}

It would be nice if our Analyse Column button was enabled only when the user has clicked inside a table. To do this, we need to interact with the Ribbon API a little more.  

In our XML, we specify a getEnabled="GetColumnAnalysisEnabled" attribute on the button. This tells the Ribbon that whenever it needs to draw our button it should first call the GetColumnAnalysisEnabled method. This method will return a true/false value indicating whether the button should be displayed as enabled or not.

The Ribbon will call GetColumnAnalysisEnabled when it first displays our custom tab. But it won't call it again. So, how do we toggle the state of our button dynamically at runtime?

When our addin is first loaded, we are given a reference to the main Ribbon instance. This is an object of type IRibbonUI. One of the methods on this interface is Invalidate. When we call this method, the Ribbon invokes all of our getEnabled functions. It is our responsibility to make sure that these functions return the correct true/false value.

This seems a little weird to begin with. It is a side affect of the declarative nature of the Ribbon API; that is, we are not given pointers or references to the Ribbon controls at runtime so we need some callback approach to enable us to change the controls' appearances at runtime. 

The Ribbon doesn't know anything about our application logic so it doesn't know when we want to enable/disable buttons or dynamically change the Ribbon. To avoid constantly calling into all addins, the Ribbon declares methods by which all addins can call into it and force it to redraw part or all of their custom Ribbon tabs.

Because we want to enable/disable our button based on the cell that the user selects, we want the Ribbon to redraw our button every time the sheet selection changes. When the user selects a cell, we want to force the Ribbon to invoke our GetColumnAnalysisEnabled method. In this method we will determine whether or not the user clicked inside a table and return true/false appropriately.

First, let's sink the SheetSelectionChange event and invalidate the Ribbon whenever SheetSelectionChange fires.

protected void SinkExcelEvents() {

  _application.SheetSelectionChange += new Excel.AppEvents_SheetSelectionChangeEventHandler(_application_SheetSelectionChange);

}

   

void _application_SheetSelectionChange(object Sh, Excel.Range Target) {

  _ribbonUI.Invalidate();

}

Next, we implement the GetColumnAnalysisEnabled method. This method returns true if the selected cell is inside a table and false otherwise.

public bool GetColumnAnalysisEnabled(IRibbonControl Control) {

  return ColumnAnalysisExtension.EnableUI(this);

}

public static bool EnableUI(Connect Connection) {

  Excel.ListObjects listObjects = ((Excel.Worksheet)Connection.ExcelApp.ActiveSheet).ListObjects;

  Excel.Range activeCell = Connection.ExcelApp.ActiveCell;

  foreach (Excel.ListObject listObj in listObjects) {

    if (listObj.SourceType == Excel.XlListObjectSourceType.xlSrcRange) {

      List<object> param = new List<object>();

      Excel.Range intersection = ExcelUtility.ApplicationInvoke<Excel.Range>(Connection.ExcelApp, "Intersect", activeCell, listObj.Range);

      if (intersection != null){

        return true;

      }

    }

  return false;

}

The GetColumnAnalysisEnabled method is defined on our Connect class (the class that implements IDTExtensibility2 and IRibbonExtensibility). But we've seperated the implementation of the column analysis extension into a different class called ColumnAnalysisExtension.

So, the Ribbon invokes Connect.GetColumnAnalysisEnabled which in turn invokes ColumnAnalysisExtension.EnableUI.  

We determine whether or not the active cell is in a table by looping through all the tables in the book and calculating the intersection of each table's range with the active cell. If the intersection is non null then the active cell is in the table's range; otherwise, it's not. 

So, when the user clicks inside a table our button is enabled and when they click outside a table the button is disabled.

Calculating the Distribution of Values in a Column

There's no point writing our own code to count the occurances of values in an Excel range when we can just get Excel to do it. So, we'll make use of the existing COUNTA and COUNTIF worksheet functions to perform the actual calculations.

For each value in the column we'll gather two values: the number of occurances and the percentage of occurances. We'll store this data in a Dictionary<string, List<double>> object.

Therefore, we need to write a method that does the following:

  1. Count the number of rows in the column by evaluating =COUNTA(<column range>); 
  2. Determine all the unique values in the column;
  3. For each unique value:
    1. evaluate =COUNTIF(<column range>, <value>) to calculate the number of occurances of the value in the column;
    2. Store the number of occurances and the percentage of occurances in our dictionary object;

Here's the code:

protected void CalculateStats() {

  string eval = "=COUNTA(" + ExcelUtility.RangeInvoke<string>(_columnRange, "Address",

                null, null, Excel.XlReferenceStyle.xlA1) + ")";

  double totalCount = (double)_connection.ExcelApp.Evaluate(eval);

  

  _stats = new Dictionary<string, List<double>>();

  foreach (Excel.Range cell in _columnRange.Cells) {   

    try {

          string val = cell.Value2.ToString();

          if (!_stats.ContainsKey(val)) {

            _stats.Add(val, new List<double>());

          }

    }

    catch (Exception) {

    }

  }

 

  foreach (KeyValuePair<string, List<double>> entry in _stats) {

    eval = "=COUNTIF(" + ExcelUtility.RangeInvoke<string>(_columnRange, "Address",

           null, null, Excel.XlReferenceStyle.xlA1) + ", \"" + entry.Key + "\")";

    double count = (double)_connection.ExcelApp.Evaluate(eval);

    entry.Value.Add(count);

    entry.Value.Add(count / totalCount);

  }

}

Displaying the Data using a DataGridView Control and Custom Drawn Cells

The next stage is to display the data to the user. For this, we'll write a WinForm which contains a DataGridView control. In the DataGridView we'll list each value along with the value's statistics. However, inserting rows of data into a DataGridView control isn't that interesting so we'll not discuss that here. Instead we'll concentrate on the nicer bit: using custom drawn DataGridView cells.

One of the big advantages of using .NET to write Excel addins is that we have access to all the .NET framework class library (FCL). This brings about some exciting possibilities for Excel addins, one of which is using WinForms to build good looking and modern user interfaces.

The DataGridView control has a really nice feature which makes it very easy for us to perform custom drawing for cells in the control. It allows us to define a custom column class which the DataGridView will use as if it was an inbuilt column class. 

Then we define our custom cell class. This is the class which defines the cells in our custom column. The cell class extends one of the inbuilt DataGridViewXXXCell classes (such as DataGridViewTextBoxCell). All we have to do is override the Paint method of the base class and perform the custom drawing that we want to do.

Custom drawing doesn't get much easier than this! 

Let's remind ourselves what we want to draw.

We want to draw a rectangle whose size represents the percentage value of the item. So, North America has a value of 50.8% so we want the rectangle to fill (horizontally) 50.8% of the cell. In other words, we want something similar to conditional formatting in Excel, but in a DataGridView control. 

Notice that we also want to display the value itself as a number and we still want the cell borders and alternating background colours to be drawn.

The first step, is to define our custom column class. We'll call this DataGridViewBarChartColumn and its definition is as follows:

public class DataGridViewBarChartColumn : DataGridViewColumn {

        public DataGridViewBarChartColumn() {

            this.CellTemplate = new DataGridViewBarChartCell();

            this.ReadOnly = true;

        }

    }

The CellTemplate property specifies the class which defines the cells that occur in the column. In this case its DataGridViewBarChartCell. This is our custom cell class and it is in this class that we do the custom drawing.  

The DataGridViewBarChartCell definition is as follows.

class DataGridViewBarChartCell : DataGridViewTextBoxCell {

        protected Color _barColor = Color.FromArgb(50, 0, 0, 0);

        protected override void Paint(Graphics graphics,

            Rectangle clipBounds,

            Rectangle cellBounds,

            int rowIndex,

            DataGridViewElementStates elementState,

            object value,

            object formattedValue,

       string errorText,

            DataGridViewCellStyle cellStyle,

            DataGridViewAdvancedBorderStyle advancedBorderStyle,

            DataGridViewPaintParts paintParts) {

            base.Paint(graphics,

                clipBounds,

             cellBounds,

                rowIndex,

                elementState,

                value,

                formattedValue,

                errorText,

                cellStyle,

                advancedBorderStyle,

                paintParts);

            float val = 0.0F;

            try {

                val = Convert.ToSingle(value);

                val /= 100.0F;

            }

            catch (InvalidCastException InvalidCastEx) {

                return;

            }

            RectangleF barRect = cellBounds;

            barRect.Width *= val;

            using (SolidBrush brush = new SolidBrush(_barColor)) {

                graphics.FillRectangle(brush, barRect);

            }

        }

    }

DataGridViewBarChartCell is a subclass of the inbuilt DataGridViewTextBoxCell class. Extending the inbuilt class means that all we have to do is override the paint behaviour of the DataGridViewTextBoxCell class and let the DataGridViewTextBoxCell class handle everything else.

In the Paint method you can see that we are automatically provided with a Graphics object and a bounding rectangle in which to paint. All we need to do is calculate the width of the rectangle and then call Graphics.FillRectangle.

Finally, notice that the first thing we do in the Paint method is call base.Paint. This means we don't have to care about drawing the cell's value or the background or the borders because the base class will do that for us. All we do is then draw on top of what the base class draws. If we didn't call base.Paint then the cells would only contain the filled rectangle and nothing else.

Hooking it all Together

To make the DataGridView control use our custom column class, we set the type of the appropriate column in the DataGridView control to DataGridViewBarChartColumn.  

The final stage is to display our window when the user clicks the button in the Ribbon. Looking back at our Ribbon XML we defined the onAction callback for the button as onAction="OnColumnAnalysisClicked". This means that the Ribbon will call the OnColumnAnalysisClicked method on our addin when the user clicks the button, so, all we need to do is define the OnColumnAnalysisClicked method and make it public.

public void OnColumnAnalysisClicked(IRibbonControl Control) {

  ColumnAnalysisExtension ext = new ColumnAnalysisExtension(this);

}

In this method we are creating a new ColumnAnalysisExtension object. The constructor of this class simply calls our CalculateStats method and then displays the window containing the DataGridView control (it does one or two other things too).

public ColumnAnalysisExtension(Connect Connection) {

  _connection = Connection;

  BuildColumnRange();

  if (_columnRange != null) {

   CalculateStats();

    DisplayStatsWindow();

  }

}

Summary

  • Use the IRibbonUI.Invalidate method to force the Ribbon to redraw your custom Ribbon controls when you need their state to change;
  • To avoid having to ship your Ribbon XML, compile it as an embedded resource into your assembly;
  • Using .NET to write user interfaces offers exciting UI possibilities, such as easily-to-code custom drawn controls;
  • To custom draw cells in the DataGridView control, subclass the DataGridViewColumn class and one of the inbuilt DataGridViewXXXCell classes;
  • The code for the custom drawing should override the Paint method of the custom cell's base class;

Download the Excel Extensions Addin

The Excel Extensions addin and full source code can be downloaded from the MSDN Code Gallery here.

There are two downloads which can be downloaded from the Releases tab. There are: ExcelExtensionsSrc.zip and ExcelExtensions.msi.

ExcelExtensionsSrc.zip contains the complete source code and ExcelExtensions.msi is the set up program.

Over time, I'll be adding additional feature to the addin.

Comments

  • Anonymous
    March 11, 2008
    I'm currently away and cannot test your code but what about transparency when the value reaches 80 or 90%? TIA

  • Anonymous
    April 10, 2008
    When the data bar overlaps with the number in the cell, we can still read the number. That is, the shading is not completely opaque. The colour of the bar is: Color.FromArgb(50, 0, 0, 0) Gabhan.

  • Anonymous
    June 24, 2008
    The comment has been removed

  • Anonymous
    September 21, 2008
    I installed your set in Office 2007 machine, but how to enble the add-in.. it is still disable even if i added some values in column.

  • Anonymous
    September 22, 2008
    Dileep, the values need to be inside a Table. You can create a Table by selecting the range of data and clicking on the "Format as Table" button on the Home Ribbon tab.