Jaa


Using OMPM Part 1 - Identifying Document Conversion Candidates and Estimating Storage Savings

Brought to you by our compatibility guru Curtis Sawin.

Still using OMPM? Give Office Telemetry Dashboard a try!

Overview

Using Office Migration Planning Manager (OMPM) to identify “high-risk” binary format documents—that is, xls, doc, and ppt files that will have conversion issues--is a great way to determine which documents should not be converted after an Office 2010 deployment. Conversely, you also use OMPM to identify “low-risk” binary documents that are good candidates for conversion to Open XML, and then use another OMPM tool, ofc.exe, to bulk convert all the identified documents. Bulk conversions, you ask? Although many companies tell us they aren’t interested in bulk converting documents, and we actually recommend against converting documents as part of an Office 2010 deployment project, there are still good reasons to bulk convert: namely, reducing your network storage needs by 50% or more. This can translate into significant (and measurable) financial savings.

To convert or not to convert

The overview seemed to provide a bit of a contradiction. First, we say “don’t convert as part of an Office deployment project,” and then we immediately get excited about some benefits of conversion. What gives?

First of all, converting documents as part of an Office deployment project can lead to problems, especially broken links. Office 2010 uses Compatibility Mode when opening binary files and disables certain features that are not backwards compatible, ensuring that the binary files remain compatible with previous versions of Office. Using Compatibility Mode requires no effort or configuration on your part, and users can continue working in their files as they did with previous versions of Office. So, if the new features are not desired in existing files, why do the extra work? Part of the Office 2010 migration project should be to identify faster, simpler ways to migrate to the new platform. Not taking on work that doesn’t add business value is a great way to save money. So, when preparing to deploy Office 2010, make sure you’re focusing only on tasks that are “deployment enablers.”

Spoiler alert: Using OMPM (or other tools) to scan your environment for document conversion issues is NOT a “deployment enabling” task. More on this later.

Once you’ve migrated to Office 2010, you’re in a position to take advantage of the new features in all new documents you create. This is also a great time to determine if performing bulk document conversion is worth the effort. Leveraging features that enhance productivity (helloooo Paste Preview and SmartArt!) and reducing your network storage needs are great “environment optimization” tasks that allow you to realize the value of your investment in Office 2010.

Using OMPM to identify low risk files – aka, conversion candidates – is a post-Office deployment task that helps you determine if bulk conversion is worth the effort. The data gathered by OMPM can be used to help determine the return on investment (ROI) of bulk conversion. Meaning, by using OMPM, you can help answer the question “What will my storage savings be if I convert a bunch of documents?” In an environment where the IT organization implements a charge-back model for network storage, using OMPM could also help answer the question “How much money will I save if I convert my docs?” Often, IT organizations are viewed as “costing the business money.” Using OMPM can help change that perception to demonstrate how IT is “saving the business money.”

So what are “low risk” files?

Documents that are good candidates for conversion are ones where the projected impact to the business is minimal. When defining “low risk” for your organization, you could apply specific business rules, such as “exclude recently modified documents,” and combine that with specific data returned from OMPM, such as “exclude documents with yellow or red document conversion issues” and “exclude documents with macro issues.” What remains are documents that meet the following criteria:

  • Docs that haven’t been modified in n days (e.g., n = 30)
  • Docs that have only “green” conversion issues identified by OMPM
  • Docs without any conversion issues identified by OMPM
  • Docs that do not have macro issues identified by OMPM

OMPM categorizes potential document conversion issues into categories that indicate the severity of the issues. “Green” issues are mostly benign and will most likely have no impact. Examples of such green issues are Excel files that use labels in formulas (which are automatically converted in Excel 2010) or have charts in them. “Yellow” or “Red” issues are potentially more severe, and conversion of such documents may result in data or functionality loss.

Files with macro issues fall into two categories: (1) files that contain macros that use object model items that are either changed, removed, or deprecated since previous versions of Office, and (2) macros that call functions that are not specifically marked as compatible with 64-bit versions of Office. While OMPM doesn’t provide detailed information about the impact of these macro issues, we are excluding them from our potential conversion list…to reduce our risk.

Macro issues are listed separately from Red, Yellow, and Green issues. Meaning, a file could have 3,245 macro issues identified and listed as a “green” document for conversion issues. Or it could be listed as file with no conversion issues. This is an important distinction to note when we build our “low risk” filter.

We are also excluding recently modified files. For instance, we won’t convert any file that has been modified in the past 30 days. This will further ensure that we are only converting documents that will most likely never be used. Even if some of them will be used, we’re only converting ones with virtually no conversion risk. If 30 days isn’t sufficient for your environment, increase that number.

Note: the OMPM reporting tool only exposes the “last modified” date of files. It does not provide information about the “last accessed” date of files. Boo!

How To…

The basic process to convert your documents is below:

Step 1 – Gather your data:

  • Identify network storage that contains Office documents
  • Download the OMPM toolset
  • Scan for Office documents on network storage using OffScan.exe
  • Create an OMPM database
  • Import the scan results in the OMPM database using ImportScans.bat

To learn more about how to do these steps above (except for identifying your storage), check out our TechNet documentation for OMPM.

Step 2 – Analyze your data:

To determine which files are conversion candidates:

  • Use the OMPM reporting tool to create a “low risk” filter.

To estimate the storage savings of conversion:

  • Execute a “low risk” SQL query in SQL Server Management Studio to retrieve file size data

Step 3 – Do the work:

To convert the conversion candidates:

  • Export the filtered list using the OMPM reporting tool
  • Configure OFC.ini to use the filtered list and execute OFC.exe.

This article will cover step 2. A separate article will discuss step 3. Why two separate articles? Step 2 is all about determining “Is it worth it to convert my documents?” If you determine that it’s worth it (from a financial perspective), go on to step 3, which is about performing the work. If you determine it’s not worth it, stop reading, and provide the data to you management that justifies your decision. You’ve now just saved yourself and your company some effort, which you’ve determined is not worth potential savings. 

Creating the low risk filter in the OMPM Reporting Tool

Open up the OMPM Reporting tool (OMPM.accdr) and select the OMPM Compatibility Link. In the Select a File Filter section, navigate to the bottom of the section and select the Customize SQL button. The following WHERE clause can be used to meet or criteria:

WHERE MaxIssueLevelID > 2 AND DATEADD(d,-30,GETDATE()) > ModifiedDate AND FileID not in (SELECT FileID from Uv_FilterMacroIssue)

After adding the query, selecting the Apply Filter button will return all the files that meet the criteria.

Here’s what it looks like in the OMPM reporting tool

In the above example, about 74% of my documents are conversion candidates, as they meet my low-risk criteria.

If you’re not familiar with SQL, the above may look a bit confusing. The table below breaks down the query.

Estimating the potential storage savings (or determining your ROI)

Now that I know how many documents (and what percentage of my documents) are conversion candidates, it would be valuable to know my estimated storage savings if I decided to convert them. While our documentation on TechNet indicates the Open XML files are “up to 75% smaller” than binary files, in the field we’re seeing 50-60% reduction in size when converting documents. For planning purposes, we recommend you estimate a 50% reduction in storage needs when converting document to the Open XML format.

The OMPM reporting tool doesn’t provide the cumulative size of all files listed. So you could copy and paste (blecch!) all the data from the OMPM reporting tool (using the Scanned Files tab) into Excel (for example) and then sum up the File Size column. Not too elegant and quite cumbersome.

An easier method is to connect to the OMPM database using SQL Server Management Studio (SSMS) and execute a query against the database directly. Here are some simple steps you can perform to do this:

1. Open SQL Server Management Studio.
2. Connect to the server that contains the OMPM database.
3. Select the New Query button.
4. In the query editor window, copy and paste the below text:

SELECT

     SUM(Cast(Size as BigInt))/1024/1024/1024

FROM

     Uv_File

WHERE

     (MaxIssueLevel = 'No Issues' or MaxIssueLevel = 'Green')

AND

     DATEADD(d,-30,GETDATE()) > ModifiedDate AND

     FileID not in (SELECT FileID from Uv_FilterMacroIssue)

This is very similar to the filter used in the OMPM Reporting tool, except we have access to more tables and views. As a result, we are able to use the MaxIssueLevel fields to specify “Green” and “No Issues” rather than the MaxIssueLevelID values. This makes the query a bit easier to understand.

The result is a single value that represents the total number of gigabytes used by all of the files that are conversion candidates. The amount of data were evaluating could be huge, and the file size is represented in bytes. This is why we’re using the CAST function to convert the File Size data into a data type that can handle very large numbers. We’re then dividing by 1024 three times to convert the bytes to kilobytes, then megabytes, then gigabytes.

Here’s an example of this query and the resulting sum.

In the above screen shot, we see that 44 GB are being used by all of our low-risk documents. To estimate the storage savings, we’ll use a very simple formula:

 

This gives you the ability to determine, “Is it worth it?” Meaning, we can take the estimated storage savings and determine if the ROI of performing a bulk file conversion is worth the investment. Again, if you’re in an environment where the IT department implements a chargeback model for network storage, you can now estimate the amount of money you will be saving your customers. Typically, chargebacks are recurring, so you would provide the amount of money you would save…per year! (Side note: Some people LOVE to see 3-, 4-, or 5-year forecasts. Being able to provide “Here is my 5-year savings forecast for this project” could really make people notice the value of document conversion!)

Summary

The recommended use for OMPM is to identify documents which are good candidates for conversion and then convert those documents. This is an activity that is best undertaken after Office 2010 has been deployed. In short, use OMPM as an analysis tool to help determine if there is an adequate return on investment for a document conversion effort.

Comments

  • Anonymous
    January 01, 2003
    Andy, Couple things. You could separate your scans into multiple, smaller scans and import the scan results into separate databases.  Also, the OMPM reporting tool (OMPM.accdr) has a limit of 1,000,000 rows to display. So keep in mind that creating multiple databases and executing multiple scans increases the overhead/effort of scanning your documents.  So make sure you're doing it for the right reasons. Meaning, if you're trying to figure out which files are good candidates for conversion, then OMPM is great.  If you're looking to OMPM to help answer the question "Will my files work in Office 2007/2010" you're gonna be sad; as the red, yellow, and green information provided by OMPM won't help you get closer to determining if your stuff works. Curtis

  • Anonymous
    May 14, 2012
    Can you suggest an alternative to using SQL Server Express that doesn't have a maximum of 100,000 scannable files limit but avoiding the costs of the full SQL server? I will need something that runs into millions of scannable files. I was thinking of something like MySQL but I am by no means a DB expert so any advice will be greatfully received.

  • Anonymous
    October 23, 2012
    Thanks for your responce.   Great articles by the way :) Thanks

  • Anonymous
    November 13, 2012
    Me again.  I keep getting the following error on one set of folders I am scanning HRESULT = 0x80070008 Can anyone shed any light onto what might be the cause?  I have stopped the scan and restarted from another machine but this started again. Any ideas? Thanks