Execution Streams and Pools
In this tutorial, you will learn how to create multiple execution streams for parallel execution and understand how pools distribute work among them. This is essential for achieving high performance in Mochi applications and understanding how Bedrock configures Argobots for your services.
Key Concepts
- Execution Streams (xstreams)
Execution streams are OS-level threads that execute Argobots work units (ULTs). Each execution stream runs independently and can execute work in parallel with other streams. Think of them as worker threads.
Best Practice: Create one execution stream per CPU core for optimal performance. Creating more execution streams than cores can lead to contention and reduced performance.
- Pools
Pools are work queues that hold ULTs waiting to be executed. Each execution stream has at least one pool that it pulls work from. Pools can be:
Private: Only one execution stream accesses the pool (lower overhead)
Shared: Multiple execution streams can access the pool (enables work-stealing)
- Pool Types
Argobots provides several predefined pool types:
ABT_POOL_FIFO: First-In-First-Out queue (simple, low overhead)
ABT_POOL_FIFO_WAIT: FIFO with blocking wait when empty
ABT_POOL_RANDWS: Random work-stealing pool (best for load balancing)
You will see in a later tutorial how to create new pool types. Margo, for instance, provides two additional pool types (that can be initialized via Margo’s API): “prio_wait” and “earliest_first”.
- Pool Access Modes
Access modes control thread safety and determine which execution streams can access a pool:
ABT_POOL_ACCESS_PRIV: Private (producer and consumer are the same xstream)
ABT_POOL_ACCESS_SPSC: Single Producer, Single Consumer (optimized lock-free)
ABT_POOL_ACCESS_MPSC: Multiple Producers, Single Consumer
ABT_POOL_ACCESS_SPMC: Single Producer, Multiple Consumers
ABT_POOL_ACCESS_MPMC: Multiple Producers, Multiple Consumers (fully thread-safe)
Work Distribution Strategies
Fixed Allocation: Each execution stream has its own private pool. ULTs are statically assigned to pools and never migrate. Simple but can lead to load imbalance.
Work-Stealing: Execution streams share pools. When an execution stream’s pool is empty, it can “steal” work from other pools. Better load balancing but higher overhead due to synchronization.
Fixed Allocation Example
In fixed allocation, each execution stream has its own private pool. ULTs are assigned to specific pools and will only execute on that pool’s execution stream.
/*
 * Fixed allocation: Each execution stream has its own private pool
 * ULTs are statically assigned to pools and cannot migrate
 */

#include <stdio.h>
#include <stdlib.h>
#include <abt.h>

#define NUM_XSTREAMS 4
#define NUM_THREADS 16

typedef struct {
    int thread_id;
} thread_arg_t;

void thread_func(void *arg)
{
    int thread_id = ((thread_arg_t *)arg)->thread_id;
    int xstream_rank;

    /* Get the rank of the execution stream running this ULT */
    ABT_xstream_self_rank(&xstream_rank);

    printf("ULT %2d executing on ES %d (fixed allocation)\n",
           thread_id, xstream_rank);
}

int main(int argc, char **argv)
{
    int i;
    ABT_xstream xstreams[NUM_XSTREAMS];
    ABT_pool pools[NUM_XSTREAMS];
    ABT_thread threads[NUM_THREADS];
    thread_arg_t thread_args[NUM_THREADS];

    /* Initialize Argobots */
    ABT_init(argc, argv);

    printf("=== Fixed Allocation Example ===\n");
    printf("Creating %d execution streams with private pools\n\n", NUM_XSTREAMS);

    /* Get the primary execution stream */
    ABT_xstream_self(&xstreams[0]);

    /* Get the primary execution stream's default pool */
    ABT_xstream_get_main_pools(xstreams[0], 1, &pools[0]);

    /* Create additional execution streams (each gets its own pool automatically) */
    for (i = 1; i < NUM_XSTREAMS; i++) {
        ABT_xstream_create(ABT_SCHED_NULL, &xstreams[i]);
        /* Get the default pool for this execution stream */
        ABT_xstream_get_main_pools(xstreams[i], 1, &pools[i]);
    }

    /* Create ULTs and assign them to pools in round-robin fashion */
    for (i = 0; i < NUM_THREADS; i++) {
        int pool_id = i % NUM_XSTREAMS;
        thread_args[i].thread_id = i;

        /* Each ULT is added to a specific pool and will only execute on
           that pool's execution stream */
        ABT_thread_create(pools[pool_id], thread_func, &thread_args[i],
                          ABT_THREAD_ATTR_NULL, &threads[i]);
    }

    /* Wait for all ULTs to complete */
    for (i = 0; i < NUM_THREADS; i++) {
        ABT_thread_join(threads[i]);
        ABT_thread_free(&threads[i]);
    }

    /* Join and free secondary execution streams */
    for (i = 1; i < NUM_XSTREAMS; i++) {
        ABT_xstream_join(xstreams[i]);
        ABT_xstream_free(&xstreams[i]);
    }

    printf("\nAll ULTs completed\n");
    printf("Note: Each ULT executed only on its assigned execution stream\n");

    /* Finalize Argobots */
    ABT_finalize();

    return 0;
}
Expected output (order may vary):
=== Fixed Allocation Example ===
Creating 4 execution streams with private pools
ULT 0 executing on ES 0 (fixed allocation)
ULT 4 executing on ES 0 (fixed allocation)
ULT 8 executing on ES 0 (fixed allocation)
ULT 12 executing on ES 0 (fixed allocation)
ULT 1 executing on ES 1 (fixed allocation)
ULT 5 executing on ES 1 (fixed allocation)
...
All ULTs completed
Note: Each ULT executed only on its assigned execution stream
Notice that ULT IDs are grouped by execution stream: 0,4,8,12 on ES 0; 1,5,9,13 on ES 1; etc.
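The grouping follows directly from the round-robin expression i % NUM_XSTREAMS used when creating the ULTs. The sketch below reproduces only that mapping in plain C, without Argobots; the helper names pool_for_ult and print_assignment are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Pool index a ULT is pushed to when pools are filled in round-robin
 * order, mirroring the example's `i % NUM_XSTREAMS`.
 * (Hypothetical helper, not part of the Argobots API.) */
int pool_for_ult(int ult_id, int num_pools)
{
    return ult_id % num_pools;
}

/* Print one line per pool listing the ULT IDs assigned to it. */
void print_assignment(int num_pools, int num_ults)
{
    for (int p = 0; p < num_pools; p++) {
        printf("pool %d:", p);
        for (int id = 0; id < num_ults; id++) {
            if (pool_for_ult(id, num_pools) == p)
                printf(" %d", id);
        }
        printf("\n");
    }
}
```

Calling print_assignment(4, 16) lists IDs 0, 4, 8, 12 for pool 0 and 1, 5, 9, 13 for pool 1, matching the grouping seen in the output above.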
Key Points
- Private Pools
Each execution stream automatically gets a default private pool when created with ABT_xstream_create(ABT_SCHED_NULL, &xstream). Passing ABT_SCHED_NULL as the scheduler makes Argobots instantiate a new default scheduler with a default (private) pool. We retrieve this pool using ABT_xstream_get_main_pools().
- Static Assignment
ULTs are assigned to pools in round-robin fashion. Once assigned, a ULT will only execute on that pool’s execution stream. This is simple and has low overhead since pools don’t need synchronization.
- Advantages:
Lower overhead (no lock contention)
Predictable execution (ULT always runs on same execution stream)
Better cache locality
- Disadvantages:
Load imbalance: Some execution streams may finish early while others are still busy
No dynamic load balancing
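To make the load-imbalance point concrete, here is a minimal sketch that adds up how long each execution stream stays busy under fixed round-robin assignment. It is plain C and does not call Argobots. The cost model (every fourth ULT takes 100 ms, all others 10 ms) is an assumption borrowed from the workload of the work-stealing example, and the function names are hypothetical.

```c
/* Assumed cost model for illustration: every fourth ULT is "heavy"
 * (100 ms), all others are "light" (10 ms). */
int ult_cost_ms(int ult_id)
{
    return (ult_id % 4 == 0) ? 100 : 10;
}

/* Sum the work pinned to each stream when ULT i always runs on stream
 * i % num_xstreams. Fills busy_ms[] (one entry per stream) and returns
 * the busiest stream's total, i.e. the time until all work is done. */
int fixed_makespan_ms(int num_ults, int num_xstreams, int *busy_ms)
{
    int max = 0;

    for (int s = 0; s < num_xstreams; s++)
        busy_ms[s] = 0;

    for (int i = 0; i < num_ults; i++)
        busy_ms[i % num_xstreams] += ult_cost_ms(i);

    for (int s = 0; s < num_xstreams; s++) {
        if (busy_ms[s] > max)
            max = busy_ms[s];
    }
    return max;
}
```

With 16 ULTs on 4 streams, all four heavy ULTs land on stream 0 (400 ms of work) while each of the other streams has only 40 ms, so three streams sit idle for most of the run.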
Work-Stealing Example
In work-stealing, execution streams can access multiple pools. When an execution stream runs out of work in its first pool, it can steal work from other pools.
/*
 * Work-stealing: Execution streams share pools and can steal work from each other
 * This improves load balancing when some execution streams finish their work early
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <abt.h>

#define NUM_XSTREAMS 4
#define NUM_THREADS 16

typedef struct {
    int thread_id;
} thread_arg_t;

void thread_func(void *arg)
{
    int thread_id = ((thread_arg_t *)arg)->thread_id;
    int xstream_rank;

    /* Get the rank of the execution stream running this ULT */
    ABT_xstream_self_rank(&xstream_rank);

    printf("ULT %2d executing on ES %d", thread_id, xstream_rank);

    /* Simulate varying work amounts */
    if (thread_id % 4 == 0) {
        printf(" (heavy work)");
        usleep(100000); /* 100ms */
    } else {
        printf(" (light work)");
        usleep(10000); /* 10ms */
    }
    printf("\n");
}

int main(int argc, char **argv)
{
    int i, j;
    ABT_xstream xstreams[NUM_XSTREAMS];
    ABT_pool pools[NUM_XSTREAMS];
    ABT_sched scheds[NUM_XSTREAMS];
    ABT_thread threads[NUM_THREADS];
    thread_arg_t thread_args[NUM_THREADS];

    /* Initialize Argobots */
    ABT_init(argc, argv);

    printf("=== Work-Stealing Example ===\n");
    printf("Creating %d execution streams with shared pools\n\n", NUM_XSTREAMS);

    /* Create pools with work-stealing capability
       ABT_POOL_ACCESS_MPMC = Multiple Producers, Multiple Consumers
       This allows multiple execution streams to access the pool */
    for (i = 0; i < NUM_XSTREAMS; i++) {
        ABT_pool_create_basic(ABT_POOL_FIFO,        /* Pool kind: FIFO */
                              ABT_POOL_ACCESS_MPMC, /* Access: thread-safe */
                              ABT_TRUE,             /* Automatic free */
                              &pools[i]);
    }

    /* Create schedulers that can access ALL pools
       Each scheduler can steal work from other pools when its own pool is empty */
    for (i = 0; i < NUM_XSTREAMS; i++) {
        ABT_pool *sched_pools = (ABT_pool *)malloc(sizeof(ABT_pool) * NUM_XSTREAMS);

        /* Pool priority order: own pool first, then others in round-robin */
        for (j = 0; j < NUM_XSTREAMS; j++) {
            sched_pools[j] = pools[(i + j) % NUM_XSTREAMS];
        }

        /* Create a scheduler with access to all pools */
        ABT_sched_create_basic(ABT_SCHED_DEFAULT,     /* Default scheduler */
                               NUM_XSTREAMS,          /* Number of pools */
                               sched_pools,           /* Array of pools */
                               ABT_SCHED_CONFIG_NULL, /* Default config */
                               &scheds[i]);
        free(sched_pools);
    }

    /* Set up the primary execution stream with its scheduler */
    ABT_xstream_self(&xstreams[0]);
    ABT_xstream_set_main_sched(xstreams[0], scheds[0]);

    /* Create secondary execution streams with their schedulers */
    for (i = 1; i < NUM_XSTREAMS; i++) {
        ABT_xstream_create(scheds[i], &xstreams[i]);
    }

    /* Create ULTs and add them to pools */
    printf("Creating %d ULTs (some with heavy work)...\n\n", NUM_THREADS);
    for (i = 0; i < NUM_THREADS; i++) {
        int pool_id = i % NUM_XSTREAMS;
        thread_args[i].thread_id = i;

        /* ULTs are initially added to specific pools, but can be stolen
           by other execution streams if those streams run out of work */
        ABT_thread_create(pools[pool_id], thread_func, &thread_args[i],
                          ABT_THREAD_ATTR_NULL, &threads[i]);
    }

    /* Wait for all ULTs to complete */
    for (i = 0; i < NUM_THREADS; i++) {
        ABT_thread_join(threads[i]);
        ABT_thread_free(&threads[i]);
    }

    /* Join and free secondary execution streams */
    for (i = 1; i < NUM_XSTREAMS; i++) {
        ABT_xstream_join(xstreams[i]);
        ABT_xstream_free(&xstreams[i]);
    }

    printf("\nAll ULTs completed\n");
    printf("Note: ULTs may have executed on different execution streams\n");
    printf("      due to work-stealing for better load balancing\n");

    /* Finalize Argobots */
    ABT_finalize();

    return 0;
}
Expected output (order will vary significantly):
=== Work-Stealing Example ===
Creating 4 execution streams with shared pools
Creating 16 ULTs (some with heavy work)...
ULT 1 executing on ES 1 (light work)
ULT 2 executing on ES 2 (light work)
ULT 0 executing on ES 0 (heavy work)
ULT 3 executing on ES 3 (light work)
ULT 5 executing on ES 2 (light work)
...
All ULTs completed
Note: ULTs may have executed on different execution streams
due to work-stealing for better load balancing
Notice that ULTs are not strictly grouped by execution stream. Execution streams that finish their light work may steal heavy work from others.
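Which pool a stream turns to once its own is empty depends on the order in which pools were handed to its scheduler. The following sketch computes that order with the same (i + j) % NUM_XSTREAMS expression as the example. It is plain C with no Argobots calls, and pool_scan_order is a hypothetical helper name.

```c
/* Fill order[] with the pool indices that scheduler `sched_id` checks,
 * own pool first: order[j] = (sched_id + j) % num_pools.
 * (Hypothetical helper mirroring the example's sched_pools[] setup.) */
void pool_scan_order(int sched_id, int num_pools, int *order)
{
    for (int j = 0; j < num_pools; j++)
        order[j] = (sched_id + j) % num_pools;
}
```

For four pools, scheduler 2 checks pools 2, 3, 0, 1 in that order, so each stream starts stealing from a different neighbor rather than all streams converging on pool 0 first.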
Key Points
- Creating Shared Pools
ABT_pool_create_basic(ABT_POOL_FIFO,        /* Pool type */
                      ABT_POOL_ACCESS_MPMC, /* Access mode */
                      ABT_TRUE,             /* Automatic free */
                      &pools[i]);
The ABT_POOL_ACCESS_MPMC access mode makes the pool thread-safe, allowing multiple execution streams to safely push and pop work.
- Scheduler with Multiple Pools
/* Pool priority order: own pool first, then others */
for (j = 0; j < NUM_XSTREAMS; j++) {
    sched_pools[j] = pools[(i + j) % NUM_XSTREAMS];
}
ABT_sched_create_basic(ABT_SCHED_DEFAULT, NUM_XSTREAMS, sched_pools,
                       ABT_SCHED_CONFIG_NULL, &scheds[i]);
Each scheduler gets access to all pools, ordered differently. The scheduler first checks its first pool, then tries others in order. This enables work-stealing.
- Varying Workloads
The example simulates varying work amounts to demonstrate work-stealing. ULTs with heavy work take longer, allowing execution streams with light work to steal from pools that still have pending ULTs.
- Advantages:
Better load balancing (idle execution streams steal work)
More efficient use of resources
Handles dynamic and unpredictable workloads well
- Disadvantages:
Higher overhead (synchronization costs)
Less predictable execution
Potential cache thrashing from migration
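The load-balancing benefit can be estimated with a toy model of the example, written in plain C without Argobots. It is a sketch under stated assumptions, not a description of the real scheduler: ULT i starts in pool i % 4, heavy ULTs cost 100 ms and light ones 10 ms (as in the usleep() calls), an idle stream takes the oldest ULT from the first non-empty pool in its scan order, and stealing itself costs nothing. All names are hypothetical.

```c
#define N_XS   4   /* execution streams, one pool each */
#define N_ULTS 16

/* Assumed cost model, mirroring the example's usleep() calls. */
static int cost_ms(int ult_id)
{
    return (ult_id % 4 == 0) ? 100 : 10;
}

/* Simulate the streams draining the pools and return the time (in ms)
 * at which the last ULT finishes. */
int stealing_makespan_ms(void)
{
    int head[N_XS];            /* next ULT ID waiting in each pool */
    int free_at[N_XS] = { 0 }; /* time at which each stream is idle */
    int remaining = N_ULTS;
    int makespan = 0;

    /* Pool p initially holds ULTs p, p + N_XS, p + 2 * N_XS, ... */
    for (int p = 0; p < N_XS; p++)
        head[p] = p;

    while (remaining > 0) {
        /* The stream that becomes idle first acts next
         * (ties go to the lowest index). */
        int s = 0;
        for (int k = 1; k < N_XS; k++) {
            if (free_at[k] < free_at[s])
                s = k;
        }

        /* Scan pools starting with the stream's own pool. */
        for (int j = 0; j < N_XS; j++) {
            int p = (s + j) % N_XS;
            if (head[p] < N_ULTS) {
                free_at[s] += cost_ms(head[p]);
                head[p] += N_XS;
                remaining--;
                break;
            }
        }

        if (free_at[s] > makespan)
            makespan = free_at[s];
    }
    return makespan;
}
```

Under these assumptions the model finishes at 140 ms, whereas keeping the four heavy ULTs pinned to stream 0 would take 4 x 100 ms = 400 ms. Real runs will differ because synchronization and migration costs are ignored here.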
Understanding Pool Access Modes
Choosing the right access mode is critical for performance:
- ABT_POOL_ACCESS_PRIV / SPSC
Use for private pools accessed by only one execution stream. Lowest overhead.
/* Default pools created by ABT_xstream_create() are private */
ABT_xstream_create(ABT_SCHED_NULL, &xstream);
- ABT_POOL_ACCESS_MPSC
Use when multiple execution streams create work but only one executes it. Common in producer-consumer patterns.
- ABT_POOL_ACCESS_MPMC
Use for work-stealing pools where multiple execution streams both produce and consume work. Highest overhead but most flexible.
ABT_pool_create_basic(ABT_POOL_FIFO, ABT_POOL_ACCESS_MPMC, ABT_TRUE, &pool);
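The guidance above amounts to a small decision table keyed on how many execution streams push to a pool and how many pop from it. The sketch below encodes that table in plain C. The function suggest_access_mode is hypothetical, and it returns the constant names as strings so that it compiles without abt.h.

```c
/* Suggest an access mode from the number of distinct execution streams
 * that push to (producers) and pop from (consumers) a pool.
 * same_xstream is nonzero when the only producer is also the only
 * consumer. (Hypothetical helper for illustration only.) */
const char *suggest_access_mode(int producers, int consumers,
                                int same_xstream)
{
    if (producers <= 1 && consumers <= 1)
        return same_xstream ? "ABT_POOL_ACCESS_PRIV"
                            : "ABT_POOL_ACCESS_SPSC";
    if (consumers <= 1)
        return "ABT_POOL_ACCESS_MPSC";
    if (producers <= 1)
        return "ABT_POOL_ACCESS_SPMC";
    return "ABT_POOL_ACCESS_MPMC";
}
```

For the work-stealing example, where every stream may pop from every pool, this yields ABT_POOL_ACCESS_MPMC. When in doubt, the more permissive mode is the safer choice, at the cost of extra synchronization.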
When to Use Each Strategy
- Use Fixed Allocation When:
Workload is balanced and predictable
Each task belongs to a specific domain (e.g., processing separate data partitions)
Minimizing overhead is critical
Cache locality is important
Example: Margo RPC handlers on dedicated pools
- Use Work-Stealing When:
Workload is unbalanced or unpredictable
Task execution times vary significantly
Maximizing throughput is more important than latency
You have more tasks than execution streams
Example: Task-parallel algorithms, recursive divide-and-conquer
Mochi/Bedrock Connection
Understanding execution streams and pools is crucial for configuring Mochi services through Bedrock and Margo. Bedrock and Margo configurations allow you to:
Create custom pools for different types of work
Assign RPC handlers or providers to specific pools
Configure work-stealing for load balancing
Set pool access modes for optimal performance
Example Bedrock pool configuration:
{
"argobots": {
"pools": [
{
"name": "rpc_pool",
"kind": "fifo",
"access": "mpmc"
},
{
"name": "io_pool",
"kind": "fifo_wait",
"access": "mpmc"
}
],
"xstreams": [
{
"name": "rpc_xstream",
"scheduler": {
"type": "basic_wait",
"pools": ["rpc_pool"]
}
},
{
"name": "io_xstream",
"scheduler": {
"type": "basic_wait",
"pools": ["io_pool", "rpc_pool"]
}
}
]
}
}
This configuration creates dedicated pools and execution streams for RPC and I/O operations, with the I/O execution stream able to steal work from the RPC pool.
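To check a configuration like this by hand, it helps to read each scheduler's "pools" array as a priority list. The sketch below mirrors the two xstream entries as a C table and looks up where a pool sits in a given stream's list. It does not parse JSON or call Bedrock or Margo; the struct, table, and function names are hypothetical.

```c
#include <string.h>

#define MAX_POOLS 4

/* One entry per xstream in the configuration above. */
struct xstream_cfg {
    const char *name;
    int num_pools;
    const char *pools[MAX_POOLS]; /* in scheduler priority order */
};

static const struct xstream_cfg CONFIG[] = {
    { "rpc_xstream", 1, { "rpc_pool" } },
    { "io_xstream",  2, { "io_pool", "rpc_pool" } },
};

/* Position of `pool` in the pool list of `xstream` (0 = checked first),
 * or -1 if that xstream never pulls from the pool. */
int pool_priority(const char *xstream, const char *pool)
{
    int n = (int)(sizeof(CONFIG) / sizeof(CONFIG[0]));

    for (int x = 0; x < n; x++) {
        if (strcmp(CONFIG[x].name, xstream) != 0)
            continue;
        for (int p = 0; p < CONFIG[x].num_pools; p++) {
            if (strcmp(CONFIG[x].pools[p], pool) == 0)
                return p;
        }
    }
    return -1;
}
```

Here pool_priority("io_xstream", "rpc_pool") returns 1, meaning the I/O stream serves RPC work only after its own pool, while pool_priority("rpc_xstream", "io_pool") returns -1 because the RPC stream never touches I/O work.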
Common Pitfalls
- Too Many Execution Streams
Creating more execution streams than CPU cores often reduces performance due to core overloading, context switching overhead, and cache contention.
/* WRONG: 100 execution streams on a 4-core machine */
for (i = 0; i < 100; i++) {
    ABT_xstream_create(ABT_SCHED_NULL, &xstreams[i]);
}
Best Practice: Use sched_getaffinity() or environment variables to determine the number of available cores.
- Forgetting to Join Execution Streams
Always join and free secondary execution streams before finalizing Argobots:
/* Must join and free all secondary execution streams */
for (i = 1; i < num_xstreams; i++) {
    ABT_xstream_join(xstreams[i]);
    ABT_xstream_free(&xstreams[i]);
}
- Wrong Pool Access Mode
Using ABT_POOL_ACCESS_PRIV for a shared pool causes data races:
/* WRONG: Private access mode for shared pool */
ABT_pool_create_basic(ABT_POOL_FIFO, ABT_POOL_ACCESS_PRIV, ABT_TRUE, &shared_pool);
/* Multiple xstreams accessing this pool = undefined behavior */
Fix: Use ABT_POOL_ACCESS_MPMC for work-stealing pools.
- Not Freeing Pools Created with ABT_pool_create_basic
If you create pools manually and set automatic to ABT_FALSE, you must free them:
ABT_pool_create_basic(ABT_POOL_FIFO, ABT_POOL_ACCESS_MPMC, ABT_FALSE, &pool); /* Manual free */
/* ... use pool ... */
ABT_pool_free(&pool); /* Must free manually */
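Returning to the first pitfall, one way to avoid oversubscription is to cap the requested number of execution streams at the number of online cores. The sketch below uses sysconf(_SC_NPROCESSORS_ONLN) as a simpler alternative to sched_getaffinity(); this constant is widely available on Linux and macOS but is not guaranteed everywhere, and unlike sched_getaffinity() it ignores CPU affinity masks. The helper names are hypothetical.

```c
#include <unistd.h>

/* Number of online cores reported by the system, or 1 if the query
 * fails. Assumes _SC_NPROCESSORS_ONLN is supported on this platform. */
int online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n < 1) ? 1 : (int)n;
}

/* Cap a requested stream count at `cores`, never going below 1. */
int clamp_xstreams(int requested, int cores)
{
    if (requested < 1)
        return 1;
    return (requested > cores) ? cores : requested;
}
```

A program would then size its xstreams array with clamp_xstreams(requested, online_cores()) before calling ABT_xstream_create() in a loop, remembering that the primary execution stream already counts as one.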
API Reference
This tutorial covered the following Argobots functions:
- Execution Stream Functions
int ABT_xstream_create(ABT_sched sched, ABT_xstream *newxstream)
Create a new execution stream with the specified scheduler. Pass ABT_SCHED_NULL to use the default scheduler.
int ABT_xstream_create_basic(ABT_sched_predef predef, int num_pools, ABT_pool *pools, ABT_sched_config config, ABT_xstream *newxstream)
Create an execution stream with a predefined scheduler and specific pools.
int ABT_xstream_join(ABT_xstream xstream)
Wait for an execution stream to terminate. Must be called before freeing.
int ABT_xstream_free(ABT_xstream *xstream)
Free an execution stream. Must call ABT_xstream_join() first.
int ABT_xstream_self_rank(int *rank)
Get the rank of the current execution stream (0 for primary, 1+ for secondary).
int ABT_xstream_set_main_sched(ABT_xstream xstream, ABT_sched sched)
Set the main scheduler for an execution stream.
- Pool Functions
int ABT_pool_create_basic(ABT_pool_kind kind, ABT_pool_access access, ABT_bool automatic, ABT_pool *newpool)
Create a pool with predefined kind and access mode.
- Parameters:
kind: Pool type (FIFO, FIFO_WAIT, RANDWS)
access: Access mode (PRIV, SPSC, MPSC, SPMC, MPMC)
automatic: If ABT_TRUE, pool is automatically freed
newpool: Output handle for the created pool
int ABT_pool_free(ABT_pool *pool)
Free a pool (only if created with automatic = ABT_FALSE).
- Scheduler Functions
int ABT_sched_create_basic(ABT_sched_predef predef, int num_pools, ABT_pool *pools, ABT_sched_config config, ABT_sched *newsched)
Create a scheduler with predefined type and specific pools.