

MATLAB 7.8.0 (R2009a) on Windows HPC Server 2008

MATLAB (Matrix Laboratory) is an extremely popular tool for analysing matrices. It is widely used in educational institutions and in the finance industry. Matrix analysis is, by nature, a computationally intensive task, which makes it a perfect candidate for parallel processing.

 

MATLAB, like most ISV applications listed under supported applications on Microsoft's HPC website, already has some integration work done. The hooks are there; users just need to take the additional step of performing the configuration, and then they are ready to leverage the power of parallel computing!

 

So what I am going to do is walk through the steps required to configure MATLAB 7.8.0 (R2009a) to use Windows HPC Server 2008 for parallel processing.

 

The very first step is to launch the application. After launching it, this is what most of you will see.

 

MATLAB Initial Screen

 

You will notice that at the top, on the menu bar, there is a "Parallel" option. Click on it and then select "Configurations Manager", and you should see the following screen. You should see only one option for now; we are going to add another.

 

 

On that screen, you need to click the "File" option. You should see the following.

 

MATLAB Parallel Configuration Option 2

 

You will see lots of other options, but for now we will just focus on the "ccs" option. Selecting the "ccs" option brings you to the screen shown below.

 

MATLAB CCS/HPC Parallel Configuration Screen

 

For now, we can leave the "Configuration Name" as it is. It simply lets you identify your Windows HPC configuration. It becomes useful if you have several Windows HPC clusters, because you can then give each cluster's configuration a meaningful name that you can recognise.

 

Now let's explore the various options that are available to you. I will try to explain what each option means as we go along.

 

First option, "Root Directory of MATLAB installation for workers (ClusterMatlabRoot)": this option simply points to the directory where MATLAB is installed on the compute nodes. I have to stress that for this to work properly, MATLAB should be installed on all the nodes with a common path, i.e. if you install to C:\MATLAB on one node, MATLAB on all the other nodes should be installed to C:\MATLAB as well. Otherwise, the job/task/simulation will FAIL when you start it. This is where your Node Templates come in really useful for maintaining a consistent image across the cluster.

 

Second option, "Number of workers available to scheduler (ClusterSize)": this is simply the number of MATLAB workers available to you on the cluster, bearing in mind the number of nodes, the number of licenses you have, and so on.

 

Third option, "Directory where job data is stored (DataLocation)": this is where your input and output files will go. I recommend choosing a shared folder for this. If you do not, the following scenario is what you can expect.

 

You will have to copy ALL your input data to a consistent location on ALL the compute nodes, i.e. if you choose C:\data, then C:\data MUST be available on all the compute nodes, whether or not a given node ends up being used, because you never know which node will be selected to run the simulation. Secondly, ALL your output data will go to that location, which simply means that you will have to log onto each compute node to check whether your output is on that node. And if the location is not consistent, your job will fail. Personally, I do not want to do that if I have a cluster of 128 nodes. So, do yourself a favour and choose a shared folder.
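
Before entering a shared folder as the DataLocation, it can be worth confirming that the folder is reachable and writable. The snippet below is only a minimal sketch: the UNC path `\\headnode01\matlabdata` and the file name `write_test.tmp` are hypothetical placeholders, and the check only covers the machine it runs on (typically your MATLAB client), not the compute nodes.

```matlab
% Hypothetical UNC path; replace it with your own shared folder.
dataLocation = '\\headnode01\matlabdata';

% exist(...) returns 7 when the name refers to a folder.
if exist(dataLocation, 'dir') ~= 7
    error('Shared folder %s is not reachable from this machine.', dataLocation);
end

% Create and delete a small file to confirm write access.
testFile = fullfile(dataLocation, 'write_test.tmp');
fid = fopen(testFile, 'w');
if fid == -1
    error('Cannot write to %s; check the share permissions.', dataLocation);
end
fclose(fid);
delete(testFile);

disp('Shared folder is reachable and writable from this machine.');
```

The account that runs jobs on the compute nodes needs the same access to the share, so a check like this is a first sanity test rather than a guarantee.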

 

Fourth option, "CCS scheduler hostname (SchedulerHostname)": for this option, you just need to enter the hostname of your head node. If you enter a hostname, please make sure that it is resolvable by DNS.

 

Last option, "Workers run in SOA mode for distributed jobs (UseSOAJobSubmission)": setting this option tells MATLAB to use SOA mode jobs, one of the newer features of Windows HPC Server 2008. Personally, I have not tried this option. If you do, and you find that it helps speed up your calculations, please drop me a comment; I would love to hear about it.
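
For readers who prefer the command line, the same five settings can probably be applied to a scheduler object directly. The following is a sketch, not a replacement for the dialog: it assumes the R2009a Parallel Computing Toolbox, where `findResource` with the scheduler type `'ccs'` returns a CCS/HPC scheduler object, and it assumes the property names match the names shown in parentheses in the dialog. Every value (host name, paths, worker count) is a hypothetical placeholder.

```matlab
% Locate a CCS/HPC scheduler object (scheduler type 'ccs' in R2009a).
sched = findResource('scheduler', 'type', 'ccs');

% All values below are hypothetical; substitute your own cluster details.
set(sched, 'SchedulerHostname', 'headnode01');            % head node, must resolve via DNS
set(sched, 'ClusterMatlabRoot', 'C:\MATLAB');             % same install path on every node
set(sched, 'DataLocation', '\\headnode01\matlabdata');    % shared folder for job data
set(sched, 'ClusterSize', 8);                             % workers available to you
set(sched, 'UseSOAJobSubmission', false);                 % SOA mode left off in this sketch

% Display the resulting property values for inspection.
get(sched)
```

Settings applied this way live only on the scheduler object in the current session, whereas the Configurations Manager stores them under a configuration name that you can reuse.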

 

So to sum up, after you are done with the configuration, it should look something like this!

 

MATLAB CCS/WHPCS Parallel Configuration Completed.

 

For now, these are the options that you need to set in order to get started. Click "OK" and you will be taken back to the "Configurations Manager" screen. On that screen, you should now see two options: the original "Local" and your recently added "ccsconfig1", as shown below.

 

MATLAB Configurations Manager with CCS/WHPCS Option

 

Now, I have to say that MATLAB provides an excellent function for validating cluster configurations; it helps you check whether the cluster is running properly. So right now, you can click "Start Validation" and let MATLAB verify that the configuration is all in order.

 

So once you click "Start Validation", you should see the following.

 

MATLAB Validation 01

 

MATLAB Validation 02

 

MATLAB Validation 03

 

MATLAB Validation 04
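
As an extra manual check alongside the built-in validation, you could submit a trivial distributed job yourself. This is a minimal sketch assuming the R2009a job API (`findResource`, `createJob`, `createTask`, `submit`, `waitForState`, `getAllOutputArguments`, `destroy`) and a configuration named "ccsconfig1" as created above; the task itself (summing three numbers) is just an illustrative placeholder.

```matlab
% Look up the scheduler through the saved configuration.
sched = findResource('scheduler', 'configuration', 'ccsconfig1');

% Create a job with a single task: evaluate sum([1 2 3]) on a worker,
% requesting one output argument.
job = createJob(sched);
createTask(job, @sum, 1, {[1 2 3]});

% Send the job to the cluster and block until it has finished.
submit(job);
waitForState(job, 'finished');

% Collect the outputs (a cell array with one row per task) and show the first.
results = getAllOutputArguments(job);
disp(results{1});

% Remove the job and its data from the DataLocation.
destroy(job);
```

If the result cell comes back empty, the task most likely failed on the worker, and the job's files in the DataLocation folder are a good place to start looking.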

 

From this point on, your configuration is basically complete and you are ready to leverage your cluster. However, there are still some things that you need to do: you need to "parallelise" your scripts. For more information, you might want to consult the MATLAB Parallel Computing Toolbox documentation. The site is here. You can even download the PDFs for offline viewing.
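
To give a rough idea of what "parallelising" a script can look like, here is a small sketch using `matlabpool` and `parfor` from the R2009a Parallel Computing Toolbox. The configuration name "ccsconfig1" comes from the steps above; the pool size of 4, the loop count, and the matrix size are hypothetical values chosen purely for illustration.

```matlab
% Open a pool of 4 workers on the cluster using the saved configuration.
matlabpool open ccsconfig1 4

n = 200;                  % number of independent iterations
results = zeros(1, n);    % preallocate the sliced output variable

% Each iteration is independent of the others, so parfor can
% distribute the iterations across the workers in the pool.
parfor i = 1:n
    A = rand(100);                      % random 100-by-100 matrix
    results(i) = max(abs(eig(A)));      % largest eigenvalue magnitude
end

% Release the workers when done.
matlabpool close
```

The loop body only qualifies for `parfor` because no iteration depends on another; loops that carry state from one iteration to the next need to be restructured first.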