Best Practices for Using the HPC Macros to Run Excel Workbooks on a Cluster
The HPC Services for Excel provide a framework of Visual Basic for Applications (VBA) macros that you can implement for a Microsoft Excel workbook to enable the workbook calculations to run in parallel on a Windows HPC cluster. The HPC macros integrate with the SOA client (ExcelClient) and SOA service (ExcelService) that are included in the HPC Services for Excel.
For an overview of development scenarios, see Developing Excel workbooks that can run on a cluster. For a list of resources, see Windows HPC and Microsoft Excel: Resources for Building Cluster-Ready Workbooks.
See also the narrated PowerPoint presentation: Best practices on using Windows HPC Server 2008 R2 for Excel 2010.
Best practices
HPC macro framework versioning
In Windows HPC Server 2008 R2, the HPC macros provide a framework for running Excel workbooks on a cluster through the ExcelClient. Because this is a developer framework that we expect to support in future versions of HPC Server, we have developed a comprehensive versioning plan to ensure that your development investment is sustained across future releases of the product.
To adhere to the versioning guidelines we expect to follow in the future, VBA developers can implement the HPC_GetVersion macro in their workbooks. For Windows HPC Server 2008 R2 (which includes the first version of the HPC Services for Excel), the function must return the value "1.0". With this information, the ExcelClient knows that the workbook has implemented version 1.0 of the HPC macro framework which supports the partition/execute/merge pattern for distributed computation of the Excel workbook. This allows future versions of the ExcelClient to provide expected behavior even if features are added or changed.
When a new version of the HPC macro framework is released, workbooks that want to use the new framework will have to return the corresponding version from HPC_GetVersion (for example, "2.0" rather than "1.0"). Workbooks can support multiple macro versions by returning a list of versions (as an array of strings or an array of Variants). If such a workbook is run on a computer with the 1.0 release installed, the 1.0 macros are used, while a computer with the 2.0 release installed uses the 2.0 macro specification.
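For example, a workbook that implements only the 1.0 framework might report its version as follows (a minimal sketch; the macro name HPC_GetVersion comes from the framework, but the exact signature shown here is an assumption):

Public Function HPC_GetVersion() As Variant
    ' Tell ExcelClient that this workbook implements version 1.0 of the HPC macro framework.
    HPC_GetVersion = "1.0"
End Function

A workbook that supported both 1.0 and a later 2.0 framework could instead return a Variant array, for example HPC_GetVersion = Array("1.0", "2.0").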
Optimizing the partition/execute/merge model
In the Partition/Execute/Merge model, the ExcelClient generates some number (potentially large) of SOA requests based on the local evaluation of a Partition macro. These requests are handled by a cluster with some number (potentially large) of cores all running the Execute macro in parallel. The results of these operations are then fed back to the local workbook, where they are sent to the Merge macro for final processing and result aggregation.
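To make the pattern concrete, a workbook that uses this model implements macros along the following lines. This is a simplified sketch: the HPC_-prefixed macro names and signatures are assumptions modeled on HPC_GetVersion, the work items (the numbers 1 through 100) are placeholders, and returning Null from the Partition macro is assumed to signal that there is no more work.

' Module-level counters shared by the macros (sketch).
Private nextItem As Long
Private mergedCount As Long

Public Sub HPC_Initialize()
    nextItem = 0
    mergedCount = 0
End Sub

Public Function HPC_Partition() As Variant
    ' Runs on the desktop: produce the data for the next request.
    nextItem = nextItem + 1
    If nextItem > 100 Then
        HPC_Partition = Null        ' no more work
    Else
        HPC_Partition = nextItem    ' data for one request
    End If
End Function

Public Function HPC_Execute(data As Variant) As Variant
    ' Runs on the cluster: calculate the result for one request.
    HPC_Execute = data * data
End Function

Public Sub HPC_Merge(data As Variant)
    ' Runs on the desktop: fold one result back into the workbook.
    mergedCount = mergedCount + 1
    ActiveSheet.Cells(mergedCount, 1).Value2 = data
End Sub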
The Partition macro should execute very quickly in relation to the Execute macro
If Partition does not produce requests quickly relative to the time that Execute requires, a sufficiently large allocation of cores can experience idle time because a core will complete its request before a new request is ready to be processed. That is, Partition will not be able to generate requests fast enough to keep all the cores busy.
For example, let’s say that each request (Execute macro) takes 1 second to run, and you send 100 requests across 10 cores. If the Partition macro is instantaneous, then the total time required at 100% efficiency (ignoring communication and other processing) would be 10 seconds (100 requests times 1 second per request, divided by 10 cores). If Partition takes 0.1 second to generate each request, then all of the requests are sent in 10 seconds and all processing completes in 11 seconds (the cluster keeps up with the requests as they arrive, and only one more Execute call is needed after the last request is sent), resulting in 90.9% efficiency. If Partition takes 1 second to produce each request, then the whole operation requires 101 seconds, resulting in 9.9% efficiency*.
The Merge macro should execute very quickly in relation to the Execute macro
If Merge does not handle responses quickly relative to the time that Execute requires, a backlog of responses will build up, potentially causing delays in operation completion.
For example, let’s say that the Partition macro is instantaneous, each request (Execute macro) takes 1 second to run, and you send 100 requests across 10 cores. If Merge runs instantly, then the total time required at 100% efficiency (ignoring communication and other processing) would be 10 seconds. However, if Merge takes 0.1 second to handle each response, the same 90.9% efficiency results, because merging the responses by itself takes a full 10 seconds. If Merge takes 1 second per response, then efficiency* drops to 9.9%, because 100 seconds are required just to merge the responses into the workbook.
* Note that efficiency in these examples is defined as (#requests * time to run the Execute macro) / (#cores * time to complete all work), that is, the ideal parallel run time divided by the actual run time. This is intended as a parallel efficiency rating: the work being done in Partition and Merge might be useful work, but it must be done serially on the local computer.
Avoid repeating work in the Partition and Merge macros
Some operations that you include in the Partition or Merge macros might take a reasonable amount of time to complete, but might not provide significant benefit in being redone with each Partition or Merge call. Examples of such operations are chart updates and external database accesses.
Use the Initialize macro to pull initial data from an external database
For example, take a workbook that relies on pulling a large amount of data from an external database to generate the work items that Partition returns. If this is implemented in the Partition macro, all of the information is reloaded from the database every time Partition is called, even though most of that data is not needed for that particular call. A more efficient approach is to load all of the information from the database in the Initialize macro and then let Partition simply return the values it needs (for example, cell values that have already been populated), which is a much cheaper operation.
For example, if you need to load a large data set from a database, create the database connection and perform the query in the Initialize macro. Then in the Partition macro, you can just move to the next record. You can close the database connection in the Finalize macro.
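A sketch of this pattern using ADO is shown below. The connection string, query, and column names are placeholders, the HPC_-prefixed macro names and signatures are assumptions, and Partition is assumed to signal completion by returning Null.

' Module-level connection and recordset shared by the HPC macros (sketch).
Private conn As Object
Private rs As Object

Public Sub HPC_Initialize()
    ' Open the connection and run the expensive query once, up front.
    Set conn = CreateObject("ADODB.Connection")
    conn.Open "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI"
    Set rs = conn.Execute("SELECT TradeId, Notional FROM Trades")
End Sub

Public Function HPC_Partition() As Variant
    If rs.EOF Then
        HPC_Partition = Null    ' no more work
    Else
        ' Package one record as the data for one request, then advance.
        HPC_Partition = Array(rs.Fields("TradeId").Value, rs.Fields("Notional").Value)
        rs.MoveNext
    End If
End Function

Public Sub HPC_Finalize()
    rs.Close
    conn.Close
End Sub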
Similarly, if you are writing to a log file in the Merge macro, consider opening the file in the Initialize macro. In the Merge macro you can write results one at a time, and then you can close the log file in the Finalize macro.
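As a minimal sketch (the file path is a placeholder, data is assumed to be the Merge macro's response argument and a simple value, and logFile is a module-level Integer), in the Initialize macro:

logFile = FreeFile
Open "C:\Temp\results.log" For Append As #logFile

In the Merge macro:

Print #logFile, data

And in the Finalize macro:

Close #logFile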
Avoid updating charts or screens until processing is complete
For example, take a workbook that receives responses in the Merge macro and places them into a chart that takes a long time to redraw. You might not want to redraw the chart with every Merge call if nobody is viewing the chart as it changes, or if there is no beneficial insight gleaned from watching the chart progress toward completion. Instead, it might make sense to set the chart object's Visible property to False in the Initialize macro and reset it to True in the Finalize macro, to avoid redrawing the chart each time. Disabling and re-enabling Application.ScreenUpdating in the same way accomplishes the same goal.
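If the chart is embedded as a chart object named "Chart 1" (a placeholder name), you can hide it in the Initialize macro with:

ActiveSheet.ChartObjects("Chart 1").Visible = False

and show it again in the Finalize macro with:

ActiveSheet.ChartObjects("Chart 1").Visible = True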
For example, in the Initialize macro, you can turn off screen updates with:
Application.ScreenUpdating = False
In the Finalize macro, you can turn on screen updates with:
Application.ScreenUpdating = True
Avoid recalculating the spreadsheet in the Merge macro
If you are inserting results into the spreadsheet in the Merge macro, you might inadvertently recalculate the spreadsheet – if the spreadsheet is very complicated, this can slow down processing.
By default, Excel will recalculate the spreadsheet whenever you change the value of a cell – including if you update the cell from a VBA macro. Excel will recalculate only those parts of the spreadsheet which are affected by the change – so if the cell sits by itself, this won’t be significant. But if the cell is used as part of a larger calculation, or as chart data, then the recalculation may be slow.
If you are updating any spreadsheet cells in the Merge macro, consider disabling automatic recalculation. You can disable automatic recalculation in the Initialize macro. Then in the Finalize macro you can recalculate the entire spreadsheet, and re-enable automatic calculation.
In the Initialize macro, you can turn off automatic calculation with:
Application.Calculation = xlCalculationManual
You can then recalculate the spreadsheet and re-enable automatic calculation in the Finalize macro with:
Application.Calculate
Application.Calculation = xlCalculationAutomatic
Insert blocks of data instead of individual values
When you update spreadsheet cells from VBA, this involves a number of internal steps – and it may be slower than you expect. It is almost always better to change multiple cells as a single block, instead of one at a time.
For example, if the Execute macro returns a large number of values – an Array of 100 numbers, for example – there are two ways to insert this data into the spreadsheet. You can insert them one at a time, in a loop:
For i = 1 To 100 Step 1
Cells(1, i).Value2 = v(i)
Next i
Or, you can insert the full range as an array:
Range(Cells(1, 1), Cells(1, 100)).Value2 = v
Inserting one cell takes about the same amount of time as inserting a block of cells. So if you have an array of 100 values, the first way of doing this – inserting cells one at a time in a loop – can take up to 100 times as long as the second approach, inserting the block of cells as an array.
Use data structures to temporarily hold information in memory
If collecting the information for each request (which is what you do in the Partition macro) is very time consuming, or if you need to do very complex post-processing of results (in the Merge macro), consider using data structures to store information in memory during the calculation.
For example, if each of your calculation requests in the Partition macro requires a very complicated database query, this can slow down processing and result in poor performance. In this case, it may be a good idea to perform this complex processing in the Initialize macro before the calculation starts.
If you perform this complex processing in the Initialize macro, you can store each of the calculation requests in a data structure in VBA, such as an Array of Variants. Then in the Partition macro, you can simply return the next entry in the Array. That will help ensure that the Partition macro is as fast as possible.
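A sketch of this approach follows. The way the requests are built in Initialize is a placeholder for the expensive query, the HPC_-prefixed macro names and signatures are assumptions, and Partition is assumed to signal completion by returning Null.

' Module-level request queue built once in HPC_Initialize (sketch).
Private requests() As Variant
Private nextRequest As Long

Public Sub HPC_Initialize()
    Dim i As Long
    ReDim requests(1 To 100)
    For i = 1 To 100
        requests(i) = i    ' placeholder: build each request from the complex query here
    Next i
    nextRequest = 0
End Sub

Public Function HPC_Partition() As Variant
    nextRequest = nextRequest + 1
    If nextRequest > UBound(requests) Then
        HPC_Partition = Null                   ' no more work
    Else
        HPC_Partition = requests(nextRequest)  ' hand out the next prepared request
    End If
End Function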
Similarly, if you must perform very complex post-processing on results as they arrive, it may be advisable to store the results in a data structure in the Merge macro, and then process the results in the Finalize macro after the calculation is complete.
For example, if you are storing results in a database, but each database write requires a complex insert statement, it may be better to move this processing to the Finalize macro. You can create a data structure – such as an Array of Variants – and in the Merge macro, simply insert each result into the Array. Then in the Finalize macro, you can loop through the Array and perform the necessary database write operations all at once.
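A sketch of deferring the database writes in this way follows. The table name, column name, and the conn object (assumed to have been opened in the Initialize macro) are placeholders, and the HPC_-prefixed macro names and signatures are assumptions.

' Module-level result buffer filled by HPC_Merge and drained in HPC_Finalize (sketch).
Private results() As Variant
Private resultCount As Long
Private conn As Object    ' assumed to be opened in HPC_Initialize (not shown)

Public Sub HPC_Merge(data As Variant)
    ' Just buffer the result; the expensive insert happens later.
    resultCount = resultCount + 1
    ReDim Preserve results(1 To resultCount)
    results(resultCount) = data
End Sub

Public Sub HPC_Finalize()
    Dim i As Long
    For i = 1 To resultCount
        conn.Execute "INSERT INTO Results (ResultValue) VALUES (" & results(i) & ")"
    Next i
    conn.Close
End Sub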