Bootstrap method: mpi

The “mpi” bootstrap method allows you to initialize a Flock group from an MPI communicator. This is particularly useful for HPC applications that are already using MPI for process management.

When to use

Use the “mpi” bootstrap method when:

  • Your application is already using MPI

  • You want all MPI ranks to automatically form a group

  • You need the group membership to match your MPI communicator

  • You’re deploying on HPC systems with MPI launchers

Prerequisites

To use the MPI bootstrap method, Flock must be compiled with MPI support:

spack install mochi-flock +mpi +bedrock

Configuration

In Bedrock configuration:

{
    "libraries": [
        "libflock-bedrock-module.so"
    ],
    "providers": [
        {
            "type": "flock",
            "name": "my_flock_provider",
            "provider_id": 42,
            "config": {
                "bootstrap": "mpi",
                "group": {
                    "type": "static",
                    "config": {}
                },
                "file": "mygroup.flock"
            }
        }
    ]
}

When you launch your Bedrock application with MPI (e.g., mpirun -n 4 bedrock ...), all ranks will automatically form a group with each other.

If you want only some of the ranks to be part of the group, for instance ranks 0, 1, 2, and 3, add the following in the provider’s “config” field:

"mpi_ranks": [0, 1, 2, 3]

In C code

To use MPI bootstrap programmatically, you can rely on the flock_group_view_init_from_mpi helper function from the flock/flock-bootstrap-mpi.h header file.

/*
 * (C) 2024 The University of Chicago
 *
 * See COPYRIGHT in top-level directory.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <mpi.h>
#include <margo.h>
#include <flock/flock-server.h>
#include <flock/flock-bootstrap-mpi.h>

int main(int argc, char** argv)
{
    int ret;
    int rank, size;

    // Initialize MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Initialize Margo
    margo_instance_id mid = margo_init("na+sm", MARGO_SERVER_MODE, 0, 0);
    assert(mid);

    // Initialize provider args
    struct flock_provider_args args = FLOCK_PROVIDER_ARGS_INIT;
    flock_group_view_t initial_view = FLOCK_GROUP_VIEW_INITIALIZER;
    args.initial_view = &initial_view;

    // Bootstrap using MPI
    uint16_t provider_id = 42;
    ret = flock_group_view_init_from_mpi(
        mid, provider_id, MPI_COMM_WORLD, &initial_view);

    if(ret != FLOCK_SUCCESS) {
        fprintf(stderr, "Rank %d: Failed to initialize view from MPI\n", rank);
        margo_finalize(mid);
        MPI_Finalize();
        return -1;
    }

    printf("Rank %d: Bootstrapped group via MPI\n", rank);
    printf("Rank %d: Group size: %zu\n", rank, initial_view.members.size);

    // Configure with static backend
    const char* config = "{ \"group\":{ \"type\":\"static\", \"config\":{} } }";

    // Register provider
    flock_provider_t provider;
    ret = flock_provider_register(mid, provider_id, config, &args, &provider);
    assert(ret == FLOCK_SUCCESS);

    printf("Rank %d: Flock provider registered\n", rank);

    // Synchronize before finalizing
    MPI_Barrier(MPI_COMM_WORLD);

    if(rank == 0) {
        printf("All ranks initialized successfully\n");
    }

    // Wait for finalize (only rank 0 waits, others finalize immediately)
    if(rank == 0) {
        margo_wait_for_finalize(mid);
    } else {
        margo_finalize(mid);
    }

    MPI_Finalize();
    return 0;
}

The flock_group_view_init_from_mpi function takes:

  • The Margo instance

  • The provider ID

  • An MPI communicator (typically MPI_COMM_WORLD)

  • A pointer to the group view to initialize

This function performs an MPI collective operation to gather all addresses and construct a group view containing all MPI ranks in the specified communicator.

Important

The “mpi_ranks” field is only processed by Bedrock; it has no effect when using the C API. The C API lets you build exactly the communicator the group should use, whereas Bedrock always uses MPI_COMM_WORLD and therefore needs a way to specify which of its ranks should join the group.
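One way to get the equivalent of “mpi_ranks” from C is to split MPI_COMM_WORLD and pass the resulting sub-communicator to flock_group_view_init_from_mpi. The sketch below implements only the membership test, in plain C so that it compiles without MPI. The helper name rank_in_group is hypothetical and not part of Flock; the MPI calls appear in the trailing comment as an untested outline that reuses the variables from the listing above.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Flock): returns 1 if `rank` appears
 * in the list `ranks` of length `n`, and 0 otherwise. */
static int rank_in_group(int rank, const int* ranks, size_t n)
{
    for(size_t i = 0; i < n; i++) {
        if(ranks[i] == rank) return 1;
    }
    return 0;
}

/* Outline of how this could be combined with MPI (assumes <mpi.h> and the
 * variables rank, mid, provider_id, initial_view from the listing above;
 * not compiled here):
 *
 *   const int wanted[] = {0, 1, 2, 3};
 *   int color = rank_in_group(rank, wanted, 4) ? 0 : MPI_UNDEFINED;
 *   MPI_Comm group_comm;
 *   MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group_comm);
 *   if(group_comm != MPI_COMM_NULL) {
 *       ret = flock_group_view_init_from_mpi(
 *           mid, provider_id, group_comm, &initial_view);
 *   }
 */
```

Ranks that pass MPI_UNDEFINED as the color receive MPI_COMM_NULL from MPI_Comm_split. They are outside the group and should skip the bootstrap call.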

How it works

The MPI bootstrap process:

  1. Each MPI rank determines its Margo address

  2. An MPI_Allgather collective exchanges addresses between all ranks

  3. Each rank constructs an identical group view with all members

  4. The group is initialized with this view

Because this uses MPI collectives, every rank in the communicator must call the bootstrap function. The call blocks until all ranks have entered it, so if any rank skips it, the others will hang.
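The address exchange in step 2 can be pictured as follows. MPI_Allgather requires every rank to contribute the same number of bytes, so a common approach is to copy each address string into a fixed-width slot. The sketch below is a single-process illustration in plain C, not Flock's actual implementation: ADDR_SLOT, pack_address, and slot_at are hypothetical names, and the slot width is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Assumed slot width; a real implementation would pick its own bound. */
#define ADDR_SLOT 256

/* Copy `addr` into a zero-padded slot of ADDR_SLOT bytes.
 * Returns 0 on success, -1 if the address plus its terminator does not fit. */
static int pack_address(const char* addr, char slot[ADDR_SLOT])
{
    size_t len = strlen(addr);
    if(len >= ADDR_SLOT) return -1;
    memset(slot, 0, ADDR_SLOT);
    memcpy(slot, addr, len);
    return 0;
}

/* Return the address stored in slot `rank` of a gathered buffer that
 * holds one ADDR_SLOT-byte slot per rank, in rank order. */
static const char* slot_at(const char* gathered, int rank)
{
    return gathered + (size_t)rank * ADDR_SLOT;
}
```

With MPI, each rank would call pack_address on its own address and then MPI_Allgather(slot, ADDR_SLOT, MPI_CHAR, gathered, ADDR_SLOT, MPI_CHAR, comm). Afterwards slot_at(gathered, i) yields the address of rank i on every rank, which is why all ranks end up with the same view.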

Example usage

Compile your application with MPI support:

$ mpicc -o server server.c $(pkg-config --cflags --libs flock-server margo)

Launch with mpirun. The output below is illustrative; the exact messages depend on what your program prints:

$ mpirun -n 4 ./server
[Rank 0] Server running with 4 group members
[Rank 1] Server running with 4 group members
[Rank 2] Server running with 4 group members
[Rank 3] Server running with 4 group members

All four processes will have identical group views containing all four members.