HPC Institute, University of Tennessee - Project and Hardware
posted Friday, October 20, 2006 4:56 PM by DennisCr
High Performance Compute Clustering with Windows
University of Tennessee
Innovative Computing Laboratory
Computer Science Department
Jack Dongarra
Windows Cluster Project
People
Jack Dongarra
George Bosilca
Dave Cronk
Julien Langou
Piotr Luszczek
Projects:
1. Numerical Linear Algebra Algorithms and Software
a. LAPACK, ScaLAPACK, ATLAS
b. Self Adapting Numerical Algorithms (SANS) Effort
c. Generic Code Optimization
d. LAPACK For Clusters – easy access to clusters
2. Heterogeneous Distributed Computing
a. NetSolve, FT-MPI, Open-MPI
3. Performance Evaluation
a. PAPI, HPC Challenge, Top500
4. Software Repositories
a. Netlib
LAPACK
1. Used by Matlab, Mathematica, Numeric Python,…
2. Tuned versions provided by vendors (AMD, Apple, Compaq, Cray, Fujitsu, Hewlett-Packard, Hitachi, IBM, Intel, MathWorks, NAG, NEC, PGI, SUN, Visual Numerics), by Microsoft, and by most Linux distributions (Fedora, Debian, Cygwin, ...).
3. Ongoing work: performance, accuracy, extended precision, ease of use (a minimal call from C is sketched after this list)
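For readers who have never called LAPACK directly, the sketch below shows the bare minimum from C: solving a small linear system with the Fortran routine DGESV. This is a generic illustration, not project code; the trailing underscore in the symbol name and the -llapack -lblas link flags are assumptions typical of GNU toolchains, and vendor libraries expose the same routine through their own headers.

#include <stdio.h>

/* Fortran LAPACK routine DGESV: solves A*x = b via LU factorization.
   The trailing underscore is the usual name mangling with gfortran;
   it is platform dependent. Link with e.g. -llapack -lblas. */
extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void) {
    /* 2x2 system; LAPACK expects column-major storage. */
    int n = 2, nrhs = 1, lda = 2, ldb = 2, ipiv[2], info;
    double a[4] = { 3.0, 1.0,     /* column 1 */
                    1.0, 2.0 };   /* column 2 */
    double b[2] = { 9.0, 8.0 };   /* right-hand side, overwritten with x */

    dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info);
    if (info == 0)
        printf("x = (%g, %g)\n", b[0], b[1]);   /* expect (2, 3) */
    else
        printf("dgesv failed, info = %d\n", info);
    return 0;
}

The tuned vendor versions listed above conform to this same interface, which is why an application built on LAPACK can typically switch to a vendor library just by relinking.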
ScaLAPACK
1. Parallel implementation of LAPACK, scaling on parallel hardware from tens to hundreds to thousands of processors
2. Ongoing work: match the functionality of current LAPACK
3. Ongoing work: target new architectures and new parallel environments, for example a port to the Microsoft HPC cluster solution (a process-grid setup sketch follows this list)
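As a rough illustration of how a ScaLAPACK run starts, the sketch below initializes the 2-D BLACS process grid over which ScaLAPACK distributes matrices block-cyclically. It is a generic example, not project code: the prototypes are declared by hand because the C BLACS header name varies between installations, and the 2 x 2 grid assumes the program is launched on four MPI processes.

#include <stdio.h>

/* C interface to the BLACS (shipped with ScaLAPACK); prototypes are
   declared by hand since the header name differs between installs. */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern void Cblacs_exit(int doneflag);

int main(void) {
    int iam, nprocs, ctxt, nprow = 2, npcol = 2, myrow, mycol;

    Cblacs_pinfo(&iam, &nprocs);                  /* my rank, process count */
    Cblacs_get(0, 0, &ctxt);                      /* default system context */
    Cblacs_gridinit(&ctxt, "Row", nprow, npcol);  /* 2 x 2 process grid     */
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    /* ScaLAPACK routines (e.g. PDGESV) would now operate on matrices
       distributed block-cyclically over this grid. */
    printf("process %d of %d is at grid position (%d,%d)\n",
           iam, nprocs, myrow, mycol);

    Cblacs_gridexit(ctxt);
    Cblacs_exit(0);
    return 0;
}

Run with e.g. mpirun -np 4 so that every process fits in the 2 x 2 grid.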
LAPACK for Clusters (LFC)
1. Provides most of ScaLAPACK's functionality from serial clients (Matlab, Python, Mathematica)
FT-MPI and Open-MPI
1. Define the behavior of MPI in the event a failure occurs at the process level.
2. FT-MPI is based on MPI 1.3 (plus some MPI-2 features), with a fault-tolerance model similar to what was done in PVM.
3. Complete reimplementation, not based on other implementations.
a. Gives the application the possibility to recover from a process failure (see the error-handler sketch after this list).
b. A regular, non-fault-tolerant MPI program will run under FT-MPI.
c. What FT-MPI does not do:
i. Recover user data (e.g. automatic check-pointing)
ii. Provide transparent fault tolerance
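To make the distinction concrete, here is a minimal, generic MPI sketch of the hook an application needs before it can react to failures at all: replacing the default MPI_ERRORS_ARE_FATAL error handler with MPI_ERRORS_RETURN and checking return codes. The FT-MPI-specific communicator recovery modes (rebuild, shrink, etc.) sit on top of this and are not shown here.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, rc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ask MPI to return error codes instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int token = rank;
    rc = MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        /* Under FT-MPI the application could rebuild or shrink the
           communicator here and carry on; a plain MPI-1/2 library only
           lets us report the error and shut down cleanly. */
        fprintf(stderr, "rank %d: broadcast failed (rc=%d)\n", rank, rc);
    }

    MPI_Finalize();
    return 0;
}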
Performance Application Programming Interface (PAPI)
1. A portable library for accessing the hardware performance counters found on modern processors
2. Provides a standardized list of performance metrics (preset events) across platforms; a minimal usage sketch follows this list
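The sketch below is a generic PAPI low-level example: it builds an event set with two common preset events and reads them around a toy loop. Which presets actually exist depends on the processor's counter hardware, so real code should check the return value of each PAPI call.

#include <stdio.h>
#include <papi.h>

int main(void) {
    int eventset = PAPI_NULL;
    long long counts[2];

    /* Initialize the PAPI library and check the version. */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI init failed\n");
        return 1;
    }

    /* Build an event set with two preset counters; availability of
       these presets depends on the processor. */
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles           */
    PAPI_add_event(eventset, PAPI_TOT_INS);   /* instructions completed */

    PAPI_start(eventset);
    volatile double s = 0.0;                  /* region of interest     */
    for (int i = 0; i < 1000000; i++)
        s += i * 0.5;
    PAPI_stop(eventset, counts);

    printf("cycles = %lld, instructions = %lld\n", counts[0], counts[1]);
    return 0;
}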
KOJAK (Joint with Felix Wolf)
1. Software package for the automatic performance analysis of parallel applications
2. Message passing and multi-threading (MPI and/or OpenMP)
3. Parallel performance
4. CPU and memory performance
Posters for Related Projects
· FT-MPI
· HPCC
· Kojak
· Open MPI
· PAPI
· Top500