Database snapshot and restore
In addition to migration, Yokan supports snapshotting a live database to a directory accessible through the local filesystem, and later restoring it into a provider. This is intended for HPC use cases where databases run against fast local storage on compute nodes (RAM, NVMe) and need to be periodically checkpointed to a more durable shared location such as a parallel filesystem.
Unlike migration, snapshot/restore:
does not require Yokan to be compiled with REMI;
does not require a second running provider;
preserves the source database by default (no transfer of ownership);
uses ordinary filesystem I/O via std::filesystem::copy, so any destination reachable through a POSIX path will work.
How it works
A snapshot is a directory containing:
A manifest file, yokan-snapshot.json, that records the backend type, the original database configuration, and the list of files that make up the database.
A data/ subdirectory that holds those files (copies of the backing store for on-disk backends, a serialized dump for in-memory backends).
Restore reads the manifest, copies the files out of data/ into a working
root path (typically a local SSD), then re-opens the database from that root
and attaches it to the target provider.
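Restore depends on this layout, so a caller may want to check it before passing a directory to restore. The sketch below uses a hypothetical helper, looks_like_snapshot, which is not part of the Yokan API. It only checks that yokan-snapshot.json exists as a regular file and that data/ exists as a directory. It does not parse the manifest or verify the file list recorded in it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical pre-flight check, not part of the Yokan API. Returns true if
 * `snap_dir` contains a regular file named yokan-snapshot.json and a data/
 * subdirectory. It does not parse the manifest or check its file list. */
bool looks_like_snapshot(const char* snap_dir)
{
    assert(snap_dir != NULL);
    char path[4096];
    struct stat st;
    int n;

    /* The manifest must exist and be a regular file. */
    n = snprintf(path, sizeof(path), "%s/yokan-snapshot.json", snap_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return false; /* formatting error or path too long */
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return false;

    /* The data/ subdirectory must exist and be a directory. */
    n = snprintf(path, sizeof(path), "%s/data", snap_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return false;
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return false;

    return true;
}
```

For example, a caller could decline to call yk_provider_restore_database on any directory for which this check returns false.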
Snapshot API
struct yk_snapshot_options {
    const char* extra_config; // reserved for backend-specific knobs
    size_t      xfer_size;    // reserved; chunk size for the copy loop
};

yk_return_t yk_provider_snapshot_database(
    yk_provider_t provider,
    const char* dest_path,
    bool remove_source,
    const struct yk_snapshot_options* options);
provider: the source provider whose currently attached database will be snapshotted.
dest_path: a local directory path where the snapshot will be written. It is created if it does not exist.
remove_source: if true, the source database is destroyed after a successful snapshot (as with migration). If false, the database remains attached and continues to serve.
options: optional; may be NULL.
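Each call writes a complete snapshot to dest_path, so a periodic checkpointing loop needs a separate destination for each checkpoint. This section does not say what happens when dest_path already holds a snapshot. One conservative pattern is therefore a fresh, zero-padded directory per step. The helper checkpoint_path and the ckpt-NNNNNN naming scheme below are hypothetical and not part of Yokan.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Yokan): formats the dest_path for
 * checkpoint number `step` under `base_dir`, e.g. "<base_dir>/ckpt-000042".
 * Zero-padding keeps lexicographic and chronological order identical.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int checkpoint_path(const char* base_dir, unsigned step,
                    char* out, size_t out_size)
{
    assert(base_dir != NULL && out != NULL);
    size_t len = strlen(base_dir);
    /* Avoid a doubled separator if base_dir already ends in '/'. */
    const char* sep = (len > 0 && base_dir[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, out_size, "%s%sckpt-%06u", base_dir, sep, step);
    if (n < 0 || (size_t)n >= out_size)
        return -1; /* formatting error or truncated */
    return 0;
}
```

In a time-stepping loop, the resulting buffer would be passed as dest_path, for example yk_provider_snapshot_database(provider, path, false, NULL). Because the step number is zero-padded, sorting the directory names also sorts the checkpoints by step.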
Restore API
struct yk_restore_options {
    const char* new_root;     // required: working root for the restored DB
    const char* extra_config; // optional JSON merged into recovered db_config
    size_t      xfer_size;    // reserved
};

yk_return_t yk_provider_restore_database(
    yk_provider_t provider,
    const char* src_path,
    const struct yk_restore_options* options);
provider: the provider to attach the restored database to. If a database is already attached, it is destroyed first.
src_path: a directory previously produced by yk_provider_snapshot_database.
options->new_root (required): the local directory the restored database will operate against. The snapshot files are first copied from src_path/data/ into new_root, and the database is then opened there. This keeps the snapshot on the parallel filesystem pristine and ensures that any subsequent writes go to local storage. Calling restore without options (or with new_root == NULL) returns YOKAN_ERR_INVALID_ARGS.
options->extra_config: a JSON object whose fields are merged into the database configuration recorded in the manifest. This is useful for adjusting backend-specific settings (e.g. cache sizes) on restore.
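Restore performs the copy from src_path/data/ into new_root itself, using std::filesystem::copy as noted earlier. The plain-C sketch below shows what that step involves for a single file. It uses a fixed chunk size, which is the role the reserved xfer_size field is described as having. The function copy_file_chunked is hypothetical and is not how Yokan implements the copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Illustrative sketch (hypothetical, not Yokan code): copy the file at `src`
 * to `dst` in chunks of at most `chunk` bytes with POSIX read()/write().
 * Returns 0 on success and -1 on any error. */
int copy_file_chunked(const char* src, const char* dst, size_t chunk)
{
    assert(src != NULL && dst != NULL && chunk > 0);
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    char* buf = malloc(chunk);
    int rc = (buf != NULL) ? 0 : -1;
    while (rc == 0) {
        ssize_t n = read(in, buf, chunk);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted: retry the read */
            rc = -1;
            break;
        }
        if (n == 0)
            break; /* end of file */
        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) continue; /* retry the write */
                rc = -1;
                break;
            }
            off += w;
        }
    }
    free(buf);
    close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```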
Snapshot/restore example
The following example creates a database, snapshots it without removing the source, then restores the snapshot into a second provider and verifies that all keys round-trip:
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <margo.h>
#include <yokan/server.h>
#include <yokan/client.h>
#include <yokan/database.h>
int main(int argc, char** argv)
{
    (void)argc; (void)argv;

    // Initialize Margo (na+sm is fine for a single-process demo).
    margo_instance_id mid = margo_init("na+sm", MARGO_SERVER_MODE, 0, 0);
    assert(mid);

    hg_addr_t addr;
    hg_return_t hret = margo_addr_self(mid, &addr);
    assert(hret == HG_SUCCESS);
    (void)hret;

    // Provider 1: the source, with a map database.
    yk_provider_t src_provider;
    struct yk_provider_args args = YOKAN_PROVIDER_ARGS_INIT;
    const char* src_config = "{ \"database\": { \"type\": \"map\" } }";
    yk_return_t yret = yk_provider_register(
        mid, 1, src_config, &args, &src_provider);
    assert(yret == YOKAN_SUCCESS);
    (void)yret;

    // Provider 2: the destination, which starts with no database.
    yk_provider_t dst_provider;
    yret = yk_provider_register(mid, 2, "{}", &args, &dst_provider);
    assert(yret == YOKAN_SUCCESS);

    yk_client_t client;
    yret = yk_client_init(mid, &client);
    assert(yret == YOKAN_SUCCESS);

    yk_database_handle_t dbh_src;
    yret = yk_database_handle_create(client, addr, 1, true, &dbh_src);
    assert(yret == YOKAN_SUCCESS);

    // Populate the source database.
    printf("Populating source database...\n");
    for(int i = 0; i < 10; i++) {
        char key[16], value[16];
        snprintf(key, sizeof(key), "key%05d", i);
        snprintf(value, sizeof(value), "value%05d", i);
        yret = yk_put(dbh_src, 0, key, strlen(key), value, strlen(value));
        assert(yret == YOKAN_SUCCESS);
    }
    printf(" Inserted 10 key/value pairs\n");

    // Snapshot the database to a local directory. Imagine this path lives
    // on a parallel filesystem (Lustre, GPFS, etc.) in production. With
    // remove_source = false, the source database stays live.
    const char* snap_dir = "/tmp/yokan-snapshot-example";
    struct yk_snapshot_options snap_opts = { NULL /*extra_config*/,
                                             0    /*xfer_size*/ };
    printf("\nSnapshotting source to %s...\n", snap_dir);
    yret = yk_provider_snapshot_database(
        src_provider, snap_dir, false /*remove_source*/, &snap_opts);
    assert(yret == YOKAN_SUCCESS);
    printf(" Snapshot complete\n");

    // The source database is still live.
    {
        char buf[32]; size_t vsize = sizeof(buf);
        memset(buf, 0, sizeof(buf));
        yret = yk_get(dbh_src, 0, "key00003", 8, buf, &vsize);
        assert(yret == YOKAN_SUCCESS);
        printf(" Source still serves: key00003 -> %.*s\n", (int)vsize, buf);
    }
    yk_database_handle_release(dbh_src);

    // Restore the snapshot into the (empty) destination provider. The
    // restored database will operate against new_root, typically a local
    // SSD path, so subsequent writes don't touch the parallel filesystem.
    const char* restored_root = "/tmp/yokan-restored-example";
    struct yk_restore_options rest_opts = {
        restored_root, // new_root: required
        NULL,          // extra_config
        0              // xfer_size
    };
    printf("\nRestoring snapshot into provider 2 (working root: %s)...\n",
           restored_root);
    yret = yk_provider_restore_database(dst_provider, snap_dir, &rest_opts);
    assert(yret == YOKAN_SUCCESS);
    printf(" Restore complete\n");

    // Verify that all keys round-trip on the destination.
    yk_database_handle_t dbh_dst;
    yret = yk_database_handle_create(client, addr, 2, true, &dbh_dst);
    assert(yret == YOKAN_SUCCESS);
    printf("\nVerifying restored data at destination...\n");
    for(int i = 0; i < 10; i++) {
        char key[16], expected[16], buf[32];
        snprintf(key, sizeof(key), "key%05d", i);
        snprintf(expected, sizeof(expected), "value%05d", i);
        size_t vsize = sizeof(buf);
        memset(buf, 0, sizeof(buf));
        yret = yk_get(dbh_dst, 0, key, strlen(key), buf, &vsize);
        assert(yret == YOKAN_SUCCESS);
        assert(vsize == strlen(expected));
        assert(memcmp(buf, expected, vsize) == 0);
    }
    printf(" All 10 key/value pairs successfully restored\n");

    // Clean up.
    yk_database_handle_release(dbh_dst);
    yk_client_finalize(client);
    margo_addr_free(mid, addr);
    margo_finalize(mid);

    printf("\nSnapshot/restore example completed successfully!\n");
    return 0;
}
Behavior and guarantees
Snapshot consistency. While yk_provider_snapshot_database runs, it
holds the same lock that database migration takes: writers are blocked for
the duration of the file copy. For large on-disk databases this can be a
non-trivial pause; plan snapshot frequency accordingly. In-memory backends
serialize the full database to a temporary file before copying, so the lock
is held for the serialize-then-copy interval.
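The advice to plan snapshot frequency can be put in numbers. Assume the copy runs at a steady rate and dominates the time the lock is held. Then the pause is roughly the database size divided by the copy bandwidth. The fraction of time writers are blocked is that pause divided by the snapshot interval. The hypothetical helper below is a back-of-envelope model under those assumptions. It ignores serialization time for in-memory backends.

```c
#include <assert.h>

/* Back-of-envelope model (hypothetical helper): assumes the copy runs at a
 * steady `copy_bytes_per_sec` and dominates the lock hold time, ignoring any
 * serialization step. Returns the fraction of wall-clock time writers are
 * blocked when a snapshot of `db_bytes` is taken every `interval_sec` s. */
double writer_blocked_fraction(double db_bytes, double copy_bytes_per_sec,
                               double interval_sec)
{
    assert(copy_bytes_per_sec > 0.0 && interval_sec > 0.0);
    double pause_sec = db_bytes / copy_bytes_per_sec; /* lock hold time */
    if (pause_sec >= interval_sec)
        return 1.0; /* snapshots run back to back: writers always blocked */
    return pause_sec / interval_sec;
}
```

For example, a 64 GB database copied at 2 GB/s pauses writers for about 32 s. Snapshotting every 10 minutes (600 s) then blocks them roughly 5.3% of the time. These figures are illustrative, not measurements.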
Source preservation. With remove_source = false (the typical
checkpoint pattern), the database remains attached, and writers resume as
soon as the snapshot completes. With remove_source = true, the behavior matches
migration: the in-memory state or on-disk files are cleared, the database
is detached, and the provider returns YOKAN_ERR_INVALID_DATABASE for
subsequent operations.
Restore atomicity. Restore destroys any pre-existing database on the
target provider before the new one is attached. Once it returns
YOKAN_SUCCESS, the new database is live and serving. If restore fails
partway through, the target provider’s previous database is already gone
(the destroy step happens before recovery is attempted) — callers should
treat this as a clean-slate scenario rather than expecting rollback.
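A failed restore leaves the target provider without a database, so retrying with an older snapshot starts from the same clean slate each time. The sketch below shows that fallback. restore_fn, restore_newest_usable, and stub_restore are all hypothetical. In real code the callback would wrap yk_provider_restore_database and map YOKAN_SUCCESS to 0. Here it is abstracted, with a stub that fails on paths containing "corrupt", so that the sketch compiles without Yokan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: returns 0 if restoring from `snap_dir`
 * succeeded. Real code would wrap yk_provider_restore_database() here and
 * map YOKAN_SUCCESS to 0. */
typedef int (*restore_fn)(const char* snap_dir, void* ctx);

/* Try snapshots from newest to oldest. `snaps` is ordered oldest first
 * (for example, zero-padded checkpoint names sorted lexicographically).
 * A failed restore leaves the provider with no database, so each retry
 * starts from the same clean slate. Returns the index of the snapshot that
 * was restored, or -1 if every attempt failed. */
int restore_newest_usable(const char* const* snaps, int count,
                          restore_fn restore, void* ctx)
{
    assert(restore != NULL && (count == 0 || snaps != NULL));
    for (int i = count - 1; i >= 0; i--) {
        if (restore(snaps[i], ctx) == 0)
            return i;
    }
    return -1;
}

/* Stand-in for the real restore call, used only to exercise the loop:
 * counts attempts in *(int*)ctx and fails on any path containing "corrupt". */
int stub_restore(const char* snap_dir, void* ctx)
{
    ++*(int*)ctx;
    return strstr(snap_dir, "corrupt") != NULL ? -1 : 0;
}
```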
Backend compatibility
Snapshot and restore reuse the same per-backend machinery that powers migration:
Snapshot uses each backend’s startMigration() to obtain the file list and acquire the necessary locks.
Restore uses each backend’s recover() static factory to re-open the database from the copied files.
Backends that don’t implement these (currently berkeleydb and null)
return YOKAN_ERR_OP_UNSUPPORTED from snapshot. All other built-in
backends — map, unordered_map, set, unordered_set, array,
log, leveldb, rocksdb, lmdb, gdbm, tkrzw, and
unqlite — support snapshot/restore.
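A caller may want to decide up front whether to schedule snapshots for a database. One way is to check the database's configured type against the list above. snapshot_supported is a hypothetical helper that simply mirrors this list as documented here. The authoritative answer remains whether snapshot returns YOKAN_ERR_OP_UNSUPPORTED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Backend types listed in this section as supporting snapshot/restore.
 * This mirrors the documentation at the time of writing and may drift. */
static const char* const snapshot_backends[] = {
    "map", "unordered_map", "set", "unordered_set", "array", "log",
    "leveldb", "rocksdb", "lmdb", "gdbm", "tkrzw", "unqlite",
};

/* Hypothetical helper: true if `type` (the "type" field of a database
 * config, e.g. "rocksdb") appears in the list above. */
bool snapshot_supported(const char* type)
{
    assert(type != NULL);
    size_t n = sizeof(snapshot_backends) / sizeof(snapshot_backends[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(type, snapshot_backends[i]) == 0)
            return true;
    return false;
}
```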
Using snapshot/restore with Bedrock
Snapshot and restore are exposed as the snapshot() and restore()
overrides on the Bedrock yokan component. Bedrock’s
ProviderManager::snapshotProvider and restoreProvider (or the
equivalent admin RPCs) route into these.
For restore through Bedrock, pass the working directory in the JSON options
under the key new_root:
{
    "new_root": "/local/ssd/yokan-restored",
    "extra_config": {}
}
Comparison with migration
| Aspect | Migration | Snapshot / restore |
|---|---|---|
| Requires REMI | Yes | No |
| Requires a second provider | Yes (destination) | No |
| Transfer mechanism | Mercury / REMI | Filesystem copy |
| Source preserved | No (always cleared) | Optional |
| Typical use case | Relocation, load balancing | Checkpoint to PFS |
Snapshot/restore is the right tool when source and destination are reachable through the same filesystem (compute node and shared PFS). Migration is the right tool when shipping a database across nodes without a shared filesystem.