Sdílet prostřednictvím


Super Powers for Excel – Offloading Excel Calculations to a Cluster or the Cloud

Background

 

Many, many companies and individuals use Excel. Informal estimates have the number of Excel users at over 500 million. Excel is used for everything from making a grocery list to managing a 15 billion dollar portfolio of complex financial instruments to mission planning for the de-orbiting of a satellite.

 

It is the ubiquitous application in financial services across the Insurance, Capital Markets and Banking verticals is Excel. Other industries, including Pharmaceutical, Health Care and finance departments within large multinationals also are significant users of Excel. 

 

Some users of Excel stretch the limits of its capabilities, both of the application itself and also the limitations of the hardware they are using to perform their calculations. These aren’t typical users of Excel but “Power users” that are creating spreadsheets that are so big and complicated that many of them run for hours, days and sometimes weeks. They are in great need of a solution that enables them to reach their answer significantly sooner than they are currently able.

 

Today, such users are going through great pains to optimize their solutions to reduce the time to solution but have limited methods and solutions to do so. Most will opt for the traditional approach of simply buying a bigger, more powerful machine to run on. In the past, going from a single core desktop running at 1 GHz and 1 GB of ram and moving to a single core desktop running at 3 Ghz and 3 GB of ram provided a very attractive performance multiplier without ever having to “touch” the spreadsheet that was suffering.

 

The problem is that clock speeds have leveled off over the past 2-5 years and computers with significantly higher clock speeds just aren’t available. In 1989, some of the earliest PC-based servers were running on the order of 33Mhz. Fast forward to 2005, just as dual-core processors were going mainstream, you could buy a Pentium 4 x86 server that ran at 3.4GHz. This increase in clock speed allowed a very simple/affordable path by which a user could easily “scale up”.

 

While Moore’s law has been going and continues to go strong, clock frequencies have stopped increasing due to the basic physics of power density on silicon. Moore’s law states that the number of transistors that can be placed on an IC doubles every 2 yrs. For example, today you could buy an x86 server running up to four 6-core Intel Xeon processors, at 2.67 GHz, slower than our single-core x86 servers 5 years ago. 

The traditional “scale up” approach has now turned into a parallel, or “scale out” problem. Scaling out could be running in parallel across multiple cores and/or scaling out beyond the computer/node to running in parallel across multiple computers/nodes and lastly, scaling out could be scaling out from your environment to run in parallel across multiple virtual machines in the cloud.

 

Many applications like Excel and the applications written for Excel, weren’t originally designed to run in parallel and need to be re-engineered and re-thought in order to take advantage of a parallel architecture.

 

HPC (High Performance Computing) Services for Excel

 

Many organizations expressed their challenges with long running workbooks and asked Microsoft to look into tools/methods Microsoft could provide in order to reduce run times. The result was HPC Services for Excel.

 

HPC Services for Excel is a comprehensive set of tools for running Excel spreadsheet calculations on a Windows HPC 2008 R2 cluster. With HPC Services for Excel, you can build applications that run Excel workbooks in parallel for faster overall calculation.

 

The basic premise behind HPC Services for Excel is being able to take calculations that are taking too long to perform in Excel running on a single computer and “offloading” or moving those calculations to run on computers that are part of a High Performance Computer cluster. Microsoft has a product, Windows HPC 2008 Server, R2 which can be installed and configured on a group of commodity computers aka “nodes”, networked together, to run computationally intense work. A Windows HPC system is comprised of a head or “master” node and one or more “compute” nodes. The Master node acts as the brains of the cluster managing what work gets scheduled to be run on the compute nodes and providing a single point of management and administration of the cluster.

 

Example:

 

Below is an example of results from a project for a firm in the capital markets industry. The firm has spreadsheet based calculation designed to price 250 derivative securities.

 

The securities are complex. The models used to price them are also complex – in addition to the spreadsheet calculations, each valuation includes some VBA code, and some third party and external libraries that are called during each pricing calculation.

 

For 250 securities, the spreadsheet takes over an hour on the desktop. By moving the calculations to a 64 core cluster, the firm is able to reduce this to 65 seconds.

 

For each security, the calculation to price it originally took approximately 15 seconds. The graph below show that by going to a 4 core cluster, on average each security takes about 4 seconds. You can see as the application is moved to a greater number of cores, scaling is not linear and this is due to overhead and structuring the calculations. Even though the result isn’t get straight line performance scaling, exceptional performance improvement is gained as cores are added. The increase achieved by this method permits the firm to change how they use this spreadsheet in practice.

 

 

Two Distinct Methods of Calculations: Cluster-Enabled User Defined Functions (UDFs) and HPC Workbook calculations

 

Method 1: Cluster enabled User Defined Functions (UDFs)

 

The first method for accelerating an Excel spreadsheet is by “offloading” certain long running, parallelizable calculations via UDFs to external compute resources. 

 

This method works as follows: A user has a spreadsheet that has a few long running functions that can be run in parallel as UDFs. From the client, the user will execute the calculation of the workbook. When the workbook goes to execute the UDF function that is to be run in parallel, Excel will send a request to the HPC scheduler to execute the UDFs on the cluster. The scheduler will then dispatch a request to one or more compute nodes. On each compute node, the UDF will execute and return its value to the users excel spreadsheet.

 

What is a UDF? A function created by a user that can be included in an excel extension library which in turn allows it to be called from excel like any other excel function. UDFs have been around since 1992 in Excel 4.0. Multi-threading for UDFs was added in Excel 2007 and now Cluster-Enabled UDFs are possilbe in Excel 2010.

 

How do they work? In this case, the UDFs are in XLL files (which are windows Dynamic Link Library (DLL) files with extensions to support the excel interface). When a UDF XLL file is loaded using the excel XLL add-in manager, the DLL will be loaded into the excel process running on the desktop. Using the Excel API, Excel will find the names of the functions that can be typed in the spreadsheet and it will get function pointers to those functions in the library code. XLL use is widespread including many commercially available XLL's for specific uses, particularly prevalent in the financial services industry. The user defined functions can be very flexible. The API supports writing functions that take a wide range of parameters (max. 30) (discrete values, references to cells, ranges, any excel data type) and can return any excel data type.

Overview of running a (cluster enabled) UDF:

Beginning with the 2007 Microsoft Office system, Excel provided support for multi-threaded recalculation (MTR)–enabled UDFs. Office Excel 2007 was the first Excel release to support multi-threaded calculation on client computers with more than one processing core. MTR-enabled UDFs can be run in parallel on multiple processing cores, resulting in better workbook performance. This was one of Microsoft’s first steps in providing a “scale out” method of parallelizing a spreadsheet albeit on a single computer.

 

Office Excel 2010 extends this model to the cluster by enabling Office Excel 2010 UDFs to run in a Windows HPC cluster. In the cluster, UDFs work much like traditional UDFs, except that the calculation is performed by one or more servers. This offers much more flexible potential than running multi-threaded MTR enabled UDF’s in that the scaling can be much greater given the ability to scale beyond a single shared memory computer. Multiple servers can be used to evaluate functions simultaneously. As far as users are concerned, there is no difference between a client computer function and a function running in the cluster.

Pros and Cons of Cluster-Enabled UDFs

Cluster-Enabled UDF Pros

Cluster-Enabled UDF Cons

Performance scaling

Limited on functions (subset of Excel API)

Readily enabled or disabled

Has some overhead

Can Re-use/Modify existing UDF’s/C/C++ codes

Requires some development (C/C++)

 

Requires (re)compilation

Method 2: HPC Workbook Calculations

 

The second method of increasing the performance of Excel spreadsheets is to run entire workbooks on cluster compute nodes.

 

This method works as follows: A user has a spreadsheet that some portion of which can be run in parallel. From the client, the user will execute the calculation of the workbook. Excel will send a request to the HPC scheduler to execute the workbook on the cluster. The scheduler will then dispatch a request to one or more compute nodes. The entire work book is sent to each selected compute node. On each compute node, the service will start an instance of excel (a full excel process will run on the node). It will load the excel workbook and then it will calculate the portion of the spreadsheet that the user wants to have executed in parallel. It is essentially using excel on the compute nodes as a calculation engine.

 

For this method, HPC Workbook Calculations, there are two important things to keep in mind:

  1. 1.      The entire workbook will be calculated on each of the cluster compute nodes
  2. 2.      For this to work, the calculation(s) must be able to be partitioned in some way so they can be run in parallel and those parallel components can be re-assembled

 

Offloading entire workbooks can offer some unique advantages and flexibility because a user can basically run anything and do anything that they would do on the desktop - run VBA, add-ins, XLLs, UDFs. A user can utilize external resources such as databases, files, dependent workbooks, etc. as part of the offloaded calculations. The integrity of the calculations is assured given the same workbook is run on the cluster as is run on the desktop. They are calculated the same way and will get the same results

 

Pros and Cons of HPC Workbook Calculations

HPC Workbook Calculation Pros

HPC Workbook Calculation Cons

Performance increase provided by running parallel calculations

Requires workbook modifications/development

Multiple programming models are available

Only improves iterative calculation portion of workbooks – Monte Carlo simulations, stochastic calculations, parametric sweeps, etc. 

Can be programmed by Users/VBA developers

Requires full instance of Excel to be installed on each compute node

Provides full API access

 

 

When is HPC Services for Excel the appropriate solution?

 

The answer to this is both simple and complex. In a nutshell, the first and most important qualifier is that HPC Services for Excel is likely an appropriate solution if a user has an Excel spreadsheet that takes too long to calculate. How long is too long? That completely depends on the use case. In some cases too long may be days, in others it may be hours, in others it may be minutes. In any case, this solution is only appropriate if some significant portion of calculation time is being taken by a portion of the spreadsheet that can be run in parallel. HPC Services for Excel is also best suited for groups that want to support legacy Excel code and legacy models and ideally have models that are used repeatedly i.e. it may not be worthwhile to the time to create a solution that is only executed one time.

 

UDFs or Workbook Model? Which method is appropriate?

 

UDFs tend to be most appropriate when there is a calculation or function in the spreadsheet that is very computationally intensive and time consuming. Ideally, it would be mostly independent and does not require interaction with external data. Workbook calculations tend to be a good solution when the spreadsheet calculation involves complex interaction between the cells, VBA and potentially external resources. Running Workbook calculations tend to be a bit more flexible in that you can essentially do anything you would do on the desktop on the cluster. 

 

Getting Beyond the Box – Offloading to a Cluster or Cloud

 

All methods of offloading Excel to increase performance require a resource to offload to. Most of what has been described has been offloading to a Microsoft Windows Server High Performance Computing (HPC) cluster. This is the simplest approach for a user to use HPC Services for Excel, in particular if the user is part of the same corporate network as the cluster. The implementation is simpler and there is minimal/no concern over security. Another option is to offload to a local HPC cluster, in particular connect and offload to a local HPC Server headnode which in turn offloads the work to Azure based compute nodes. This provides a great deal of flexibility for scaling in particular if the local HPC cluster is used heavily and doesn’t provide the resources needed to achieve the user’s desired performance. A third option is to offload computations directly to Azure. This requires a HPC cluster to be configured in Azure including a head node and (one or more) compute nodes. The Excel client would connect to and offload work to the Azure cloud based HPC cluster in a similar manner in which it would utilize a local HPC cluster. This third method is will become much simpler to implement in the near future, than it currently is, given the availability of persistent Virtual Machines and VPN capabilities in Azure.

 

Summary

 

It is important in either case that the solution fits in with user’s workflow and user’s environment as seamlessly as possible. They should be able to run the same spreadsheet as before, with minimal interaction, just simply have it run much faster.