Tags: c, segmentation-fault, mpi

MPI: Spanning Tree Segmentation Fault Issue


I'm grappling with an issue in an MPI program I'm developing to construct a spanning tree across multiple processes. Despite several days of troubleshooting, I've hit a roadblock: the program consistently crashes with a segmentation fault. As someone relatively new to MPI programming, I'm asking the community for help diagnosing the issue and finding a solution.

[DESKTOP:18804] *** Process received signal ***
[DESKTOP:18804] Signal: Segmentation fault (11)
[DESKTOP:18804] Signal code: Address not mapped (1)
[DESKTOP:18804] Failing at address: 0xffffffffffffffff
*** stack smashing detected ***: terminated
[DESKTOP:18809] *** Process received signal ***
[DESKTOP:18809] Signal: Aborted (6)
[DESKTOP:18809] Signal code:  (-6)
[DESKTOP:18809] *** Process received signal ***
[DESKTOP:18809] Signal: Segmentation fault (11)
[DESKTOP:18809] Signal code: Address not mapped (1)
[DESKTOP:18809] Failing at address: 0xffffffffffffffff
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 7 with PID 0 on node DESKTOP-HUHQT64 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[DESKTOP:18784] *** Process received signal ***
[DESKTOP:18784] Signal: Segmentation fault (11)
[DESKTOP:18784] Signal code: Address not mapped (1)
[DESKTOP:18784] Failing at address: (nil)
[DESKTOP:18784] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f5008733520]
[DESKTOP:18784] [ 1] /lib/x86_64-linux-gnu/libpmix.so.2(PMIx_server_finalize+0x928)[0x7f5005eee5b8]
[DESKTOP:18784] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so(ext3x_server_finalize+0x386)[0x7f5006052ff6]
[DESKTOP:18784] [ 3] /lib/x86_64-linux-gnu/libopen-rte.so.40(pmix_server_finalize+0xa0)[0x7f5008a4d5e0]
[DESKTOP:18784] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_hnp.so(+0x66b4)[0x7f50084f26b4]
[DESKTOP:18784] [ 5] /lib/x86_64-linux-gnu/libopen-rte.so.40(orte_finalize+0x65)[0x7f5008a23385]
[DESKTOP:18784] [ 6] mpirun(+0x1350)[0x55d954018350]
[DESKTOP:18784] [ 7] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f500871ad90]
[DESKTOP:18784] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f500871ae40]
[DESKTOP:18784] [ 9] mpirun(+0x1415)[0x55d954018415]
[DESKTOP:18784] *** End of error message ***

The program implements a distributed spanning tree algorithm using MPI processes. Each process represents a node in the tree. The algorithm updates each node's table with parent information as it traverses the tree structure. When a node receives the updated table from its children, it merges this information with its own and sends the updated table back to its parent. This process continues until the root node is reached, resulting in a complete spanning tree with each node's table filled with parent information.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_TIME_TO_WAIT 30
#define TABLE_SIZE 100

// tag 0 : parent -> children (children updates table with the parent)
// tag 1:  children -> parent (this is the new table)
// tag 2 : children -> node (already have a parent)

// Run: mpirun -np 12 ./a.out
//  DO NOT USE DIRECTLY. USE getNeighbors() INSTEAD
int graph[][2] = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 11}, {11, 5}, {5, 6}, {6, 7}, {7, 8}, {8, 9}, {9, 10}, {10, 9}, {9, 8}, {8, 7}, {7, 6}, {6, 5}, {5, 11}, {11, 4}, {4, 3}, {3, 2}, {2, 1}, {1, 0}, {9, 5}, {5, 9}, {5, 3}, {3, 5}, {0, 2}, {2, 0}, {9, 7}, {7, 9}};

// getNeighbors returns a 100 int vector.
// position 0 represents the number of neighbors + 1
// neigh[i] for i > 0 represent the ranks of the neighbors
// your program should only send messages to processes on this list

int *getNeighbors(int myRank)
{
    int *neigh = malloc(sizeof(int) * 100);
    int i;
    int j = 1;
    int sizeOfGraph = sizeof(graph) / sizeof(int) / 2;
    for (i = 0; i < sizeOfGraph; i++) {
        if (graph[i][0] == myRank) {
            neigh[j] = graph[i][1];
            j++;
        }
    }
    neigh[0] = j;
    return neigh;
}

void printNeighbors(int *neigh, int rank, int neighSize)
{
    char aux[1000];
    aux[0] = 0;
    sprintf(aux + strlen(aux), "Rank %2i neighbors: ", rank);
    for (int i = 1; i < neighSize; i++) {
        sprintf(aux + strlen(aux), " %2i,", neigh[i]);
    }
    sprintf(aux + strlen(aux) - 1, "\n");  // also deletes the last comma
    printf("%s", aux);
}

int getInitRank(int argc, char **argv)
{
    if (argc < 2) {
        printf("Usage: ./a.out <Init Node> (usually 3)\n");
        exit(1);
    }

    int initRank = atoi(argv[1]);
    return initRank;
}

void updateTable(int **table, int *childTable, int count)
{
    for (int i = 0; i < count; i++) {
        if (((*table)[i] == -1) && (childTable[i] != -1)) {
            (*table)[i] = childTable[i];
        }
    }
}

void printTable(int **table, int rank, int count)
{
    printf("\nRANK: %d\n", rank);
    for (int i = 0; i < count; i++) {
        printf("Node: %2d\tParent:%d\n", i, (*table)[i]);
    }
}

void sendToChildren(int **table, int *neigh, int rank, int neighSize)
{
    // sends to all children
    MPI_Request req[neighSize];
    for (int i = 1; i < neighSize; i++) {
        if (neigh[i] != (*table)[rank]) {
            MPI_Isend(*table, TABLE_SIZE, MPI_INT, neigh[i], 0, MPI_COMM_WORLD, &(req[i]));
            printf("rank:%d sent to rank:%d\n", rank, neigh[i]);
        }
    }
}

void sendToParent(int **table, int rank)
{
    // sends to parent
    MPI_Send(*table, TABLE_SIZE, MPI_INT, (*table)[rank], 1, MPI_COMM_WORLD);
    printf("rank:%d sent to parent rank:%d\n", rank, (*table)[rank]);
}

void recvParent(int **table, int *neigh, int rank, int neighSize)
{
    MPI_Status status;
    MPI_Request req[neighSize];
    int flags[neighSize];
    int timeWaited = 0;
    int stillWait = 1;
    int tmpTables[neighSize][TABLE_SIZE];

    for (int i = 1; i < neighSize; i++) {
        flags[i] = 0;
    }

    // waits for a parent
    for (int i = 1; i < neighSize; i++) {
        MPI_Irecv(tmpTables[i], TABLE_SIZE, MPI_INT, neigh[i], 0, MPI_COMM_WORLD, &(req[i]));
    }

    while ((timeWaited < MAX_TIME_TO_WAIT) && stillWait) {
        for (int i = 1; i < neighSize; i++) {
            MPI_Test(&(req[i]), &(flags[i]), &status);

            if (flags[i]) {
                printf("rank:%d recived from parent rank:%d\n", rank, neigh[i]);
                memcpy(*table, tmpTables[i], sizeof(int) * TABLE_SIZE);
                (*table)[rank] = neigh[i];
                stillWait = 0;
                break;
            }
        }
        if (stillWait == 1) {
            timeWaited++;
            sleep(1);
        } else {
            break;
        }
    }

    // tells other neighbours that already have a parent
    for (int i = 1; i < neighSize; i++) {
        if (neigh[i] != (*table)[rank]) {
            MPI_Isend(*table, TABLE_SIZE, MPI_INT, neigh[i], 2, MPI_COMM_WORLD, &(req[i]));
            printf("rank:%d informed rank:%d that already has a parent\n", rank, neigh[i]);
        }
    }
}

void recvChildren(int **table, int *neigh, int rank, int neighSize)
{
    MPI_Status status;
    MPI_Request req[neighSize];
    int flags[neighSize];
    int timeWaited = 0;
    int stillWait = 1;
    int tmpTable[neighSize][TABLE_SIZE];

    for (int i = 1; i < neighSize; i++) {
        for (int j = 0; j < TABLE_SIZE; j++) {
            tmpTable[i][j] = -1;
        }
    }

    for (int i = 1; i < neighSize; i++) {
        flags[i] = 0;
    }

    // waits for response
    for (int i = 1; i < neighSize; i++) {
        if (neigh[i] != (*table)[rank]) {
            MPI_Irecv(tmpTable[i], TABLE_SIZE, MPI_INT, neigh[i], MPI_ANY_TAG, MPI_COMM_WORLD, &(req[i]));
        }
    }

    while ((timeWaited < MAX_TIME_TO_WAIT) && stillWait) {
        stillWait = 0;
        for (int i = 1; i < neighSize; i++) {
            if (neigh[i] != (*table)[rank]) {
                if (flags[i] == 0) {
                    MPI_Test(&(req[i]), &(flags[i]), &status);

                    if (flags[i]) {
                        if (status.MPI_TAG == 1) {
                            printf("rank:%d recived from child rank:%d\n", rank, neigh[i]);
                            updateTable(table, tmpTable[i], TABLE_SIZE);
                        } else if (status.MPI_TAG == 2) {
                            printf("rank:%d recived from not his child rank:%d\n", rank, neigh[i]);
                        }
                    } else {
                        stillWait = 1;
                    }
                }
            }
        }

        if (stillWait == 1) {
            timeWaited++;
            sleep(1);
        } else {
            break;
        }
    }
}

int main(int argc, char *argv[])
{
    int rank;
    int nProcesses;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nProcesses);

    int initRank = getInitRank(argc, argv);

    int *neigh = getNeighbors(rank);
    int neighSize = neigh[0];
    // printNeighbors(neigh, rank, neighSize);
    int *table = (int *)malloc(TABLE_SIZE * sizeof(int));
    if (table == NULL) {
        printf("Error allocating memory\n");
        exit(1);
    }
    for (int i = 0; i < TABLE_SIZE; i++) {
        table[i] = -1;
    }

    // initRank is the root
    if (rank == initRank) {
        sendToChildren(&table, neigh, rank, neighSize);
        recvChildren(&table, neigh, rank, neighSize);
        printTable(&table, rank, nProcesses);
    } else {
        recvParent(&table, neigh, rank, neighSize);
        sendToChildren(&table, neigh, rank, neighSize);
        recvChildren(&table, neigh, rank, neighSize);
        printTable(&table, rank, nProcesses);
        sendToParent(&table, rank);
    }

    free(table);
    free(neigh);

    MPI_Finalize();
    return 0;
}

Despite my efforts to debug using gdb, valgrind, and AddressSanitizer, I've hit a dead end. Being relatively new to these tools and debugging techniques, I'm reaching out for some guidance and assistance. Any help would be greatly appreciated!

EDIT: I've revised the code and the error output in the post to reflect the suggested changes and the error currently being generated.

I've changed the code as suggested by @SvenNilsson and it works better, but in some situations I still get a segmentation fault, for example when init_node is 7. Here is a visual representation of the graph used for testing in my program: [image]

As requested by @GillesGouaillardet, I've also added the mpirun and mpicc commands I use. The program is compiled with:

mpicc spanningTree.c -o spanningTree -Wall -O3 -g3

And I run it with:

mpirun --oversubscribe -n 12 ./spanningTree <init_node>

The crucial part of the output, aside from the error message, is the incomplete table printed by the initializing node.

RANK: 7
Node:  0    Parent:2
Node:  1    Parent:2
Node:  2    Parent:3
Node:  3    Parent:5
Node:  4    Parent:11
Node:  5    Parent:6
Node:  6    Parent:7
Node:  7    Parent:-1
Node:  8    Parent:-1
Node:  9    Parent:-1
Node: 10    Parent:-1
Node: 11    Parent:5

Solution

  • The problem is that a node waiting for a parent posts a receive operation for every neighbor, but only ever completes one of them, so the remaining requests are still pending when the subroutine returns. This can lead to undefined behavior. The solution is to post a single MPI_Irecv with MPI_ANY_SOURCE as the source parameter: MPI_Irecv(auxTable, TABLE_SIZE, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req). The lesson learned here is that every send needs a corresponding receive.
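
For reference, here is a minimal sketch of how recvParent() could look with a single receive. It keeps the question's table layout, tags, and constants (TABLE_SIZE, MAX_TIME_TO_WAIT); the timeout handling with MPI_Cancel and the MPI_Waitall on the tag-2 sends are my own additions to make sure no request outlives its buffer, not part of the accepted answer.

// Sketch only: same interface and constants as in the question.
void recvParent(int **table, int *neigh, int rank, int neighSize)
{
    MPI_Status status;
    MPI_Request req;
    int auxTable[TABLE_SIZE];
    int flag = 0;
    int timeWaited = 0;

    // one receive is enough: whichever neighbor reaches this rank first becomes its parent
    MPI_Irecv(auxTable, TABLE_SIZE, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);

    while (timeWaited < MAX_TIME_TO_WAIT && !flag) {
        MPI_Test(&req, &flag, &status);
        if (!flag) {
            timeWaited++;
            sleep(1);
        }
    }

    if (flag) {
        // adopt the sender as parent and take over its table
        memcpy(*table, auxTable, sizeof(int) * TABLE_SIZE);
        (*table)[rank] = status.MPI_SOURCE;
        printf("rank:%d received from parent rank:%d\n", rank, status.MPI_SOURCE);
    } else {
        // no parent arrived in time: cancel the receive instead of leaving it pending
        MPI_Cancel(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    // tell the other neighbors that a parent has already been chosen (tag 2),
    // and complete the sends before returning so no request is left dangling
    MPI_Request sendReq[neighSize];
    int nSends = 0;
    for (int i = 1; i < neighSize; i++) {
        if (neigh[i] != (*table)[rank]) {
            MPI_Isend(*table, TABLE_SIZE, MPI_INT, neigh[i], 2, MPI_COMM_WORLD, &sendReq[nSends++]);
            printf("rank:%d informed rank:%d that it already has a parent\n", rank, neigh[i]);
        }
    }
    MPI_Waitall(nSends, sendReq, MPI_STATUSES_IGNORE);
}

In the original recvParent(), the unmatched MPI_Irecv requests keep pointing at tmpTables, a stack array that goes out of scope when the function returns; a message arriving later is then written into memory the process no longer owns, which is consistent with the "stack smashing detected" and "Address not mapped" reports above.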