UT MPI development with CCS

posted Saturday, February 17, 2007 2:53 PM by dongarra | 1 Comments

• FT-MPI

FT-MPI is a full implementation of the MPI 1.2 specification that provides
process-level fault tolerance at the MPI API level. FT-MPI was developed as
part of the HARNESS (Heterogeneous Adaptive Reconfigurable Networked SyStem)
project, with the goal of giving end users a communication library with an
MPI API that benefits from the fault tolerance already present in the
HARNESS system.

Current Status

 FT-MPI currently compiles under Cygwin, Windows Subsystem for UNIX
Applications (SUA), and native Windows. The daemons cannot yet be started
automatically, because the only supported method (SSH) is not natively
available in the Windows environment. Once the daemons have been started
manually, however, we have been able to spawn as many applications as
necessary. Because the daemons are started manually, security is provided
by the Windows user log-on. For communication, we currently support only
BSD-like TCP, i.e. using read and write; there is no support yet for any
direct WinSock2 functions.
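
To illustrate what "BSD-like TCP using read and write" involves, here is a minimal sketch, not taken from the FT-MPI sources. It assumes a POSIX-style environment such as Cygwin or SUA. On a stream socket, read and write may transfer fewer bytes than requested, so a library typically wraps them in loops. The helper names write_all, read_all and roundtrip_demo are hypothetical, and the demo uses a local socketpair as a stand-in for a connected TCP socket.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying after short writes and EINTR.
   Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes from fd, retrying after short reads and EINTR.
   Returns 0 on success, -1 on error or if the peer closed the
   connection before len bytes arrived. */
int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;              /* end of stream before full message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical demo: send a short message across a local socket pair
   and check that it arrives intact. Returns 0 on success. */
int roundtrip_demo(void)
{
    int sv[2];
    const char msg[] = "ping";
    char reply[sizeof msg];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (write_all(sv[0], msg, sizeof msg) == 0 &&
        read_all(sv[1], reply, sizeof reply) == 0 &&
        memcmp(msg, reply, sizeof msg) == 0)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling roundtrip_demo() from a test program should return 0 on a POSIX system. The same loops apply unchanged to a descriptor returned by connect or accept on a TCP socket.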

Future Work

 Most of our future work will focus on Open MPI. We plan to tighten the
security for starting applications and to provide full support for the
XML format supported by the Windows batch scheduler, memory and processor
affinity, support for the Windows registry, and completely dynamic MPI
libraries and internal modules. Moreover, we know that performance can be
improved by at least another 20%.

Additional information about FT-MPI can be found on the website at
https://icl.cs.utk.edu/ftmpi/.

• Open MPI

 Open MPI is an open source implementation of both the MPI-1 and MPI-2
standards. It combines technologies and resources from several other
projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build
the best MPI library available.

Current Status

 As with FT-MPI, we have compiled Open MPI under Cygwin, Windows
Subsystem for UNIX Applications (SUA), and native Windows. Native Windows
is the most used and best tested build environment. We provide solution
and project files for Visual C Express that build Open MPI as either a
static or a dynamic library. Support for C++, Fortran 77, and Fortran 90
is built automatically. We can start daemons locally using Windows
functionality (spawn and/or CreateProcess), and we can start jobs on the
cluster with CCS (using submit). So far, the only available communication
framework is built on top of WinSock2, but work on Direct Socket is in
progress. The Visual C compiler (VC) is used as the backend for mpicc,
which allows us to compile user applications in a normal environment.

 Integration with the parallel debugger is in progress, but the lack of
comprehensive documentation makes this task difficult. We face the same
problem in accessing the high performance socket interface: the sparse
documentation available on MSDN or the Web does not provide enough insight
for a smooth transition.

 Performance comparisons with Microsoft MPI have shown Open MPI to be
roughly 10% faster over both shared memory and TCP. No application
benchmark has yet been run to compare the two MPI implementations further.

Future Work

 Once support for Direct Socket is complete, we will benchmark again; we
expect a larger performance gap between the two MPI libraries. We still
need to define the behavior of MPI when a failure occurs at the process
level.

For more information about Open MPI, visit the website at
https://icl.cs.utk.edu/open-mpi/.