Execution Streams and Pools

In this tutorial, you will learn how to create multiple execution streams for parallel execution and understand how pools distribute work among them. This is essential for achieving high performance in Mochi applications and understanding how Bedrock configures Argobots for your services.

Key Concepts

Execution Streams (xstreams)

Execution streams are OS-level threads that execute Argobots work units (ULTs). Each execution stream runs independently and can execute work in parallel with other streams. Think of them as worker threads.

Best Practice: Create one execution stream per CPU core for optimal performance. Creating more execution streams than cores can lead to contention and reduced performance.

Pools

Pools are work queues that hold ULTs waiting to be executed. Each execution stream has at least one pool that it pulls work from. Pools can be:

  • Private: Only one execution stream accesses the pool (lower overhead)

  • Shared: Multiple execution streams can access the pool (enables work-stealing)

Pool Types

Argobots provides several predefined pool types:

  • ABT_POOL_FIFO: First-In-First-Out queue (simple, low overhead)

  • ABT_POOL_FIFO_WAIT: FIFO with blocking wait when empty

  • ABT_POOL_RANDWS: Random work-stealing pool (best for load balancing)

You will see in a latter tutorial how to create new pool types. Margo, for instance, provides two additional pool types (that can be initialized via Margo’s API): “prio_wait” and “earliest_first”.

Pool Access Modes

Access modes control thread safety and determine which execution streams can access a pool:

  • ABT_POOL_ACCESS_PRIV: Single Producer, Single Consumer (private to one xstream)

  • ABT_POOL_ACCESS_SPSC: Single Producer, Single Consumer (optimized lock-free)

  • ABT_POOL_ACCESS_MPSC: Multiple Producers, Single Consumer

  • ABT_POOL_ACCESS_SPMC: Single Producer, Multiple Consumers

  • ABT_POOL_ACCESS_MPMC: Multiple Producers, Multiple Consumers (full thread-safe)

Work Distribution Strategies

  1. Fixed Allocation: Each execution stream has its own private pool. ULTs are statically assigned to pools and never migrate. Simple but can lead to load imbalance.

  2. Work-Stealing: Execution streams share pools. When an execution stream’s pool is empty, it can “steal” work from other pools. Better load balancing but higher overhead due to synchronization.

Fixed Allocation Example

In fixed allocation, each execution stream has its own private pool. ULTs are assigned to specific pools and will only execute on that pool’s execution stream.

 1/*
 2 * Fixed allocation: Each execution stream has its own private pool
 3 * ULTs are statically assigned to pools and cannot migrate
 4 */
 5
 6#include <stdio.h>
 7#include <stdlib.h>
 8#include <abt.h>
 9
10#define NUM_XSTREAMS 4
11#define NUM_THREADS 16
12
13typedef struct {
14    int thread_id;
15} thread_arg_t;
16
17void thread_func(void *arg)
18{
19    int thread_id = ((thread_arg_t *)arg)->thread_id;
20    int xstream_rank;
21
22    /* Get the rank of the execution stream running this ULT */
23    ABT_xstream_self_rank(&xstream_rank);
24
25    printf("ULT %2d executing on ES %d (fixed allocation)\n",
26           thread_id, xstream_rank);
27}
28
29int main(int argc, char **argv)
30{
31    int i;
32    ABT_xstream xstreams[NUM_XSTREAMS];
33    ABT_pool pools[NUM_XSTREAMS];
34    ABT_thread threads[NUM_THREADS];
35    thread_arg_t thread_args[NUM_THREADS];
36
37    /* Initialize Argobots */
38    ABT_init(argc, argv);
39
40    printf("=== Fixed Allocation Example ===\n");
41    printf("Creating %d execution streams with private pools\n\n", NUM_XSTREAMS);
42
43    /* Get the primary execution stream */
44    ABT_xstream_self(&xstreams[0]);
45
46    /* Get the primary execution stream's default pool */
47    ABT_xstream_get_main_pools(xstreams[0], 1, &pools[0]);
48
49    /* Create additional execution streams (each gets its own pool automatically) */
50    for (i = 1; i < NUM_XSTREAMS; i++) {
51        ABT_xstream_create(ABT_SCHED_NULL, &xstreams[i]);
52        /* Get the default pool for this execution stream */
53        ABT_xstream_get_main_pools(xstreams[i], 1, &pools[i]);
54    }
55
56    /* Create ULTs and assign them to pools in round-robin fashion */
57    for (i = 0; i < NUM_THREADS; i++) {
58        int pool_id = i % NUM_XSTREAMS;
59        thread_args[i].thread_id = i;
60
61        /* Each ULT is added to a specific pool and will only execute on
62           that pool's execution stream */
63        ABT_thread_create(pools[pool_id], thread_func, &thread_args[i],
64                          ABT_THREAD_ATTR_NULL, &threads[i]);
65    }
66
67    /* Wait for all ULTs to complete */
68    for (i = 0; i < NUM_THREADS; i++) {
69        ABT_thread_join(threads[i]);
70        ABT_thread_free(&threads[i]);
71    }
72
73    /* Join and free secondary execution streams */
74    for (i = 1; i < NUM_XSTREAMS; i++) {
75        ABT_xstream_join(xstreams[i]);
76        ABT_xstream_free(&xstreams[i]);
77    }
78
79    printf("\nAll ULTs completed\n");
80    printf("Note: Each ULT executed only on its assigned execution stream\n");
81
82    /* Finalize Argobots */
83    ABT_finalize();
84
85    return 0;
86}

Expected output (order may vary):

=== Fixed Allocation Example ===
Creating 4 execution streams with private pools

ULT  0 executing on ES 0 (fixed allocation)
ULT  4 executing on ES 0 (fixed allocation)
ULT  8 executing on ES 0 (fixed allocation)
ULT 12 executing on ES 0 (fixed allocation)
ULT  1 executing on ES 1 (fixed allocation)
ULT  5 executing on ES 1 (fixed allocation)
...
All ULTs completed
Note: Each ULT executed only on its assigned execution stream

Notice that ULT IDs are grouped by execution stream: 0,4,8,12 on ES 0; 1,5,9,13 on ES 1; etc.

Key Points

Private Pools

Each execution stream automatically gets a default private pool when created with ABT_xstream_create(ABT_SCHED_NULL, &xstream). Passing ABT_SCHED_NULL as scheduler will make Argobots instantiate a new default scheduler with a default (private) pool. We retrieve this pool using ABT_xstream_get_main_pools().

Static Assignment

ULTs are assigned to pools in round-robin fashion. Once assigned, a ULT will only execute on that pool’s execution stream. This is simple and has low overhead since pools don’t need synchronization.

Advantages:
  • Lower overhead (no lock contention)

  • Predictable execution (ULT always runs on same execution stream)

  • Better cache locality

Disadvantages:
  • Load imbalance: Some execution streams may finish early while others are still busy

  • No dynamic load balancing

Work-Stealing Example

In work-stealing, execution streams can access multiple pools. When an execution stream runs out of work in its first pool, it can steal work from other pools.

  1/*
  2 * Work-stealing: Execution streams share pools and can steal work from each other
  3 * This improves load balancing when some execution streams finish their work early
  4 */
  5
  6#include <stdio.h>
  7#include <stdlib.h>
  8#include <unistd.h>
  9#include <abt.h>
 10
 11#define NUM_XSTREAMS 4
 12#define NUM_THREADS 16
 13
 14typedef struct {
 15    int thread_id;
 16} thread_arg_t;
 17
 18void thread_func(void *arg)
 19{
 20    int thread_id = ((thread_arg_t *)arg)->thread_id;
 21    int xstream_rank;
 22
 23    /* Get the rank of the execution stream running this ULT */
 24    ABT_xstream_self_rank(&xstream_rank);
 25
 26    printf("ULT %2d executing on ES %d", thread_id, xstream_rank);
 27
 28    /* Simulate varying work amounts */
 29    if (thread_id % 4 == 0) {
 30        printf(" (heavy work)");
 31        usleep(100000); /* 100ms */
 32    } else {
 33        printf(" (light work)");
 34        usleep(10000);  /* 10ms */
 35    }
 36    printf("\n");
 37}
 38
 39int main(int argc, char **argv)
 40{
 41    int i, j;
 42    ABT_xstream xstreams[NUM_XSTREAMS];
 43    ABT_pool pools[NUM_XSTREAMS];
 44    ABT_sched scheds[NUM_XSTREAMS];
 45    ABT_thread threads[NUM_THREADS];
 46    thread_arg_t thread_args[NUM_THREADS];
 47
 48    /* Initialize Argobots */
 49    ABT_init(argc, argv);
 50
 51    printf("=== Work-Stealing Example ===\n");
 52    printf("Creating %d execution streams with shared pools\n\n", NUM_XSTREAMS);
 53
 54    /* Create pools with work-stealing capability
 55       ABT_POOL_ACCESS_MPMC = Multiple Producers, Multiple Consumers
 56       This allows multiple execution streams to access the pool */
 57    for (i = 0; i < NUM_XSTREAMS; i++) {
 58        ABT_pool_create_basic(ABT_POOL_FIFO,           /* Pool kind: FIFO */
 59                              ABT_POOL_ACCESS_MPMC,    /* Access: thread-safe */
 60                              ABT_TRUE,                /* Automatic free */
 61                              &pools[i]);
 62    }
 63
 64    /* Create schedulers that can access ALL pools
 65       Each scheduler can steal work from other pools when its own pool is empty */
 66    for (i = 0; i < NUM_XSTREAMS; i++) {
 67        ABT_pool *sched_pools = (ABT_pool *)malloc(sizeof(ABT_pool) * NUM_XSTREAMS);
 68
 69        /* Pool priority order: own pool first, then others in round-robin */
 70        for (j = 0; j < NUM_XSTREAMS; j++) {
 71            sched_pools[j] = pools[(i + j) % NUM_XSTREAMS];
 72        }
 73
 74        /* Create a scheduler with access to all pools */
 75        ABT_sched_create_basic(ABT_SCHED_DEFAULT,      /* Default scheduler */
 76                               NUM_XSTREAMS,            /* Number of pools */
 77                               sched_pools,             /* Array of pools */
 78                               ABT_SCHED_CONFIG_NULL,   /* Default config */
 79                               &scheds[i]);
 80        free(sched_pools);
 81    }
 82
 83    /* Set up the primary execution stream with its scheduler */
 84    ABT_xstream_self(&xstreams[0]);
 85    ABT_xstream_set_main_sched(xstreams[0], scheds[0]);
 86
 87    /* Create secondary execution streams with their schedulers */
 88    for (i = 1; i < NUM_XSTREAMS; i++) {
 89        ABT_xstream_create(scheds[i], &xstreams[i]);
 90    }
 91
 92    /* Create ULTs and add them to pools */
 93    printf("Creating %d ULTs (some with heavy work)...\n\n", NUM_THREADS);
 94    for (i = 0; i < NUM_THREADS; i++) {
 95        int pool_id = i % NUM_XSTREAMS;
 96        thread_args[i].thread_id = i;
 97
 98        /* ULTs are initially added to specific pools, but can be stolen
 99           by other execution streams if those streams run out of work */
100        ABT_thread_create(pools[pool_id], thread_func, &thread_args[i],
101                          ABT_THREAD_ATTR_NULL, &threads[i]);
102    }
103
104    /* Wait for all ULTs to complete */
105    for (i = 0; i < NUM_THREADS; i++) {
106        ABT_thread_join(threads[i]);
107        ABT_thread_free(&threads[i]);
108    }
109
110    /* Join and free secondary execution streams */
111    for (i = 1; i < NUM_XSTREAMS; i++) {
112        ABT_xstream_join(xstreams[i]);
113        ABT_xstream_free(&xstreams[i]);
114    }
115
116    printf("\nAll ULTs completed\n");
117    printf("Note: ULTs may have executed on different execution streams\n");
118    printf("      due to work-stealing for better load balancing\n");
119
120    /* Finalize Argobots */
121    ABT_finalize();
122
123    return 0;
124}

Expected output (order will vary significantly):

=== Work-Stealing Example ===
Creating 4 execution streams with shared pools
Creating 16 ULTs (some with heavy work)...

ULT  1 executing on ES 1 (light work)
ULT  2 executing on ES 2 (light work)
ULT  0 executing on ES 0 (heavy work)
ULT  3 executing on ES 3 (light work)
ULT  5 executing on ES 2 (light work)
...
All ULTs completed
Note: ULTs may have executed on different execution streams
      due to work-stealing for better load balancing

Notice that ULTs are not strictly grouped by execution stream. Execution streams that finish their light work may steal heavy work from others.

Key Points

Creating Shared Pools
ABT_pool_create_basic(ABT_POOL_FIFO,           /* Pool type */
                      ABT_POOL_ACCESS_MPMC,    /* Access mode */
                      ABT_TRUE,                /* Automatic free */
                      &pools[i]);

The ABT_POOL_ACCESS_MPMC access mode makes the pool thread-safe, allowing multiple execution streams to safely push and pop work.

Scheduler with Multiple Pools
/* Pool priority order: own pool first, then others */
for (j = 0; j < NUM_XSTREAMS; j++) {
    sched_pools[j] = pools[(i + j) % NUM_XSTREAMS];
}
ABT_sched_create_basic(ABT_SCHED_DEFAULT, NUM_XSTREAMS,
                       sched_pools, ABT_SCHED_CONFIG_NULL, &scheds[i]);

Each scheduler gets access to all pools, ordered differently. The scheduler first checks its first pool, then tries others in order. This enables work-stealing.

Varying Workloads

The example simulates varying work amounts to demonstrate work-stealing. ULTs with heavy work take longer, allowing execution streams with light work to steal from pools that still have pending ULTs.

Advantages:
  • Better load balancing (idle execution streams steal work)

  • More efficient use of resources

  • Handles dynamic and unpredictable workloads well

Disadvantages:
  • Higher overhead (synchronization costs)

  • Less predictable execution

  • Potential cache thrashing from migration

Understanding Pool Access Modes

Choosing the right access mode is critical for performance:

ABT_POOL_ACCESS_PRIV / SPSC

Use for private pools accessed by only one execution stream. Lowest overhead.

/* Default pools created by ABT_xstream_create() are private */
ABT_xstream_create(ABT_SCHED_NULL, &xstream);
ABT_POOL_ACCESS_MPSC

Use when multiple execution streams create work but only one executes it. Common in producer-consumer patterns.

ABT_POOL_ACCESS_MPMC

Use for work-stealing pools where multiple execution streams both produce and consume work. Highest overhead but most flexible.

ABT_pool_create_basic(ABT_POOL_FIFO, ABT_POOL_ACCESS_MPMC,
                      ABT_TRUE, &pool);

When to Use Each Strategy

Use Fixed Allocation When:
  • Workload is balanced and predictable

  • Each task belongs to a specific domain (e.g., processing separate data partitions)

  • Minimizing overhead is critical

  • Cache locality is important

  • Example: Margo RPC handlers on dedicated pools

Use Work-Stealing When:
  • Workload is unbalanced or unpredictable

  • Task execution times vary significantly

  • Maximizing throughput is more important than latency

  • You have more tasks than execution streams

  • Example: Task-parallel algorithms, recursive divide-and-conquer

Mochi/Bedrock Connection

Understanding execution streams and pools is crucial for configuring Mochi services through Bedrock and Margo. Bedrock and Margo configurations allow you to:

  • Create custom pools for different types of work

  • Assign RPC handlers or providers to specific pools

  • Configure work-stealing for load balancing

  • Set pool access modes for optimal performance

Example Bedrock pool configuration:

{
    "argobots": {
        "pools": [
            {
                "name": "rpc_pool",
                "kind": "fifo",
                "access": "mpmc"
            },
            {
                "name": "io_pool",
                "kind": "fifo_wait",
                "access": "mpmc"
            }
        ],
        "xstreams": [
            {
                "name": "rpc_xstream",
                "scheduler": {
                    "type": "basic_wait",
                    "pools": ["rpc_pool"]
                }
            },
            {
                "name": "io_xstream",
                "scheduler": {
                    "type": "basic_wait",
                    "pools": ["io_pool", "rpc_pool"]
                }
            }
        ]
    }
}

This configuration creates dedicated pools and execution streams for RPC and I/O operations, with the I/O execution stream able to steal work from the RPC pool.

Common Pitfalls

Too Many Execution Streams

Creating more execution streams than CPU cores often reduces performance due to core overloading, context switching overhead, and cache contention.

/* WRONG: 100 execution streams on a 4-core machine */
for (i = 0; i < 100; i++) {
    ABT_xstream_create(ABT_SCHED_NULL, &xstreams[i]);
}

Best Practice: Use sched_getaffinity() or environment variables to determine the number of available cores.

Forgetting to Join Execution Streams

Always join and free secondary execution streams before finalizing Argobots:

/* Must join and free all secondary execution streams */
for (i = 1; i < num_xstreams; i++) {
    ABT_xstream_join(xstreams[i]);
    ABT_xstream_free(&xstreams[i]);
}
Wrong Pool Access Mode

Using ABT_POOL_ACCESS_PRIV for a shared pool causes data races:

/* WRONG: Private access mode for shared pool */
ABT_pool_create_basic(ABT_POOL_FIFO, ABT_POOL_ACCESS_PRIV,
                      ABT_TRUE, &shared_pool);
/* Multiple xstreams accessing this pool = undefined behavior */

Fix: Use ABT_POOL_ACCESS_MPMC for work-stealing pools.

Not Freeing Pools Created with ABT_pool_create_basic

If you create pools manually and set automatic to ABT_FALSE, you must free them:

ABT_pool_create_basic(ABT_POOL_FIFO, ABT_POOL_ACCESS_MPMC,
                      ABT_FALSE, &pool);  /* Manual free */
/* ... use pool ... */
ABT_pool_free(&pool);  /* Must free manually */

API Reference

This tutorial covered the following Argobots functions:

Execution Stream Functions
  • int ABT_xstream_create(ABT_sched sched, ABT_xstream *newxstream)

    Create a new execution stream with the specified scheduler. Pass ABT_SCHED_NULL to use the default scheduler.

  • int ABT_xstream_create_basic(ABT_sched_predef predef, int num_pools, ABT_pool *pools, ABT_sched_config config, ABT_xstream *newxstream)

    Create an execution stream with a predefined scheduler and specific pools.

  • int ABT_xstream_join(ABT_xstream xstream)

    Wait for an execution stream to terminate. Must be called before freeing.

  • int ABT_xstream_free(ABT_xstream *xstream)

    Free an execution stream. Must call ABT_xstream_join() first.

  • int ABT_xstream_self_rank(int *rank)

    Get the rank of the current execution stream (0 for primary, 1+ for secondary).

  • int ABT_xstream_set_main_sched(ABT_xstream xstream, ABT_sched sched)

    Set the main scheduler for an execution stream.

Pool Functions
  • int ABT_pool_create_basic(ABT_pool_kind kind, ABT_pool_access access, ABT_bool automatic, ABT_pool *newpool)

    Create a pool with predefined kind and access mode.

    Parameters:
    • kind: Pool type (FIFO, FIFO_WAIT, RANDWS)

    • access: Access mode (PRIV, SPSC, MPSC, SPMC, MPMC)

    • automatic: If ABT_TRUE, pool is automatically freed

    • newpool: Output handle for the created pool

  • int ABT_pool_free(ABT_pool *pool)

    Free a pool (only if created with automatic = ABT_FALSE).

Scheduler Functions
  • int ABT_sched_create_basic(ABT_sched_predef predef, int num_pools, ABT_pool *pools, ABT_sched_config config, ABT_sched *newsched)

    Create a scheduler with predefined type and specific pools.