MPI fails when mixing Intel and AMD

Jeff Faust
2025-01-09T16:50:10.44+00:00

The following code works fine when every machine is Intel or every machine is AMD. It fails when I launch it from an Intel machine across two worker nodes, the local machine and an AMD machine.

#include <iostream>
#include <mpi.h>

int main()
{
    int argc = 0;
    MPI_Init(&argc, nullptr);

    const int count = 100;
    for (int i = 0; i < count; ++i)
    {
        std::cout << " Attempting Barrier " << i + 1 << std::endl;
        MPI_Barrier(MPI_COMM_WORLD);
        std::cout << " Completed Barrier " << i + 1 << std::endl;
    }

    MPI_Finalize();
}

mpiexec -l -hosts 2 localhost amd_machine -wdir "\network\path" \path-to-exe

This fails consistently after loop 3, with the output:

[0] Attempting Barrier 1
[1] Attempting Barrier 1
[0] Completed Barrier 1
[0] Attempting Barrier 2
[1] Completed Barrier 1
[0] Completed Barrier 2
[1] Attempting Barrier 2
[0] Attempting Barrier 3
[0] Completed Barrier 3
[0] Attempting Barrier 4
[1] Completed Barrier 2
[1] Attempting Barrier 3
[1] Completed Barrier 3
[1] Attempting Barrier 4

job aborted:
[ranks] message

[0] terminated

[1] fatal error
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(MPI_COMM_WORLD) failed
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (errno 10060)
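Because the two ranks' output is interleaved, it is hard to see at a glance where each rank stopped. The following is only a rough sketch for reading the log, not part of the failing program. `last_line_per_rank` is a hypothetical helper name, and it assumes every message sits on its own line and starts with the `[rank]` prefix that `mpiexec -l` adds.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of MPI or mpiexec): returns the last message
// seen for each rank in text where every line starts with "[rank]".
std::map<int, std::string> last_line_per_rank(const std::string& log)
{
    std::map<int, std::string> last;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line))
    {
        // Expect "[<digits>]" at the very start of the line.
        if (line.empty() || line[0] != '[') continue;
        const std::size_t close = line.find(']');
        if (close == std::string::npos || close == 1) continue;

        bool numeric = true;
        for (std::size_t i = 1; i < close; ++i)
        {
            if (!std::isdigit(static_cast<unsigned char>(line[i]))) numeric = false;
        }
        if (!numeric) continue;  // skips headers such as "[ranks] message"

        const int rank = std::stoi(line.substr(1, close - 1));

        // Drop the prefix and any spaces that follow it.
        const std::size_t start = line.find_first_not_of(' ', close + 1);
        last[rank] = (start == std::string::npos) ? std::string() : line.substr(start);
    }
    return last;
}
```

Applied to the lines that precede "job aborted:", this gives "Attempting Barrier 4" for both rank 0 and rank 1. Each rank entered the fourth barrier and neither reported completing it, so the order in which the two ranks' lines appear in the combined output should not be read as the order in which they ran.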

