Debugging a Mochi application
Figuring out why a particular error happens down in Mercury or why a Mochi service is performing poorly can be difficult. In this tutorial we will see what users can do to help diagnose problems with Mochi codes.
First steps: Trouble Initializing
Mercury can use the Libfabric (OFI) transport layer for inter-node RPC
messaging. Sometimes configuring libfabric can be tricky.
The most common issue people see when starting with
Mochi is Margo failing to initialize. This is almost always
due to libfabric not being compiled with the right providers.
For instance, if you need
tcp, libfabric will need to be
compiled with fabrics=tcp,rxm.
If misconfigured, Margo and Mercury will try to help you out by reporting an error like this:
    1  # [1035629.626564] mercury->fatal: [error] [..]/src/na/na_ofi.c:1807
    2  # na_ofi_provider_check(): Requested OFI provider "verbs;ofi_rxm" (derived from "verbs"
    3  protocol) is not available. Please re-compile libfabric with support for
    4  "verbs;ofi_rxm" or use one of the following available providers:
    5  tcp;ofi_rxm udp tcp sockets
    6  # [1035629.626629] mercury->fatal: [error] [..]/src/na/na.c:327
    7  # NA_Initialize_opt(): No suitable plugin found that matches verbs
    8  [error] Could not initialize hg_class
Note in line 2 the “Requested OFI provider” – that is the protocol Mercury tried to use. In line 5, Mercury reports the providers supported by this build of libfabric.
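When such errors end up buried in long job logs, it can help to pull out the two pieces of information that matter. The following is a minimal sketch in Python, assuming the message keeps the wording shown in the error above; the function name parse_provider_error is hypothetical and not part of Mercury or Margo.

```python
import re

# Sample text copied from the error shown above (line numbers removed).
SAMPLE_LOG = (
    '# [1035629.626564] mercury->fatal: [error] [..]/src/na/na_ofi.c:1807\n'
    '# na_ofi_provider_check(): Requested OFI provider "verbs;ofi_rxm" (derived from "verbs"\n'
    'protocol) is not available. Please re-compile libfabric with support for\n'
    '"verbs;ofi_rxm" or use one of the following available providers:\n'
    'tcp;ofi_rxm udp tcp sockets\n'
    '# [1035629.626629] mercury->fatal: [error] [..]/src/na/na.c:327\n'
    '# NA_Initialize_opt(): No suitable plugin found that matches verbs\n'
    '[error] Could not initialize hg_class\n'
)


def parse_provider_error(log_text):
    """Return (requested, available) from a provider error, or None if absent.

    Hypothetical helper: it relies only on the message wording shown above.
    """
    requested = re.search(r'Requested OFI provider "([^"]+)"', log_text)
    # The provider list follows "available providers:" up to the end of its line.
    available = re.search(r'available providers:\s*([^\n#]+)', log_text)
    if requested is None or available is None:
        return None
    return requested.group(1), available.group(1).split()


requested, available = parse_provider_error(SAMPLE_LOG)
print("requested:", requested)
print("available:", ", ".join(available))
```

If the requested provider is not in the available list, either rebuild libfabric with that provider or initialize Margo with one of the listed protocols.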
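On some systems, only the “compute nodes” have, for example, InfiniBand cards, so the providers available on a login node may differ from those available where your service actually runs. You can verify which providers are available on a given node with the
fi_info -l command.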
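To compare nodes, the check can be scripted and run on each of them. The sketch below is in Python and assumes that fi_info -l prints each provider as an unindented "name:" line followed by indented details; that format is an assumption and may vary between libfabric versions. The function names are hypothetical.

```python
import shutil
import subprocess


def parse_provider_list(fi_info_output):
    """Extract provider names from `fi_info -l` style output.

    Assumes (not guaranteed) that each provider appears as an unindented
    "name:" line followed by indented detail lines.
    """
    providers = []
    for line in fi_info_output.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            providers.append(line.rstrip()[:-1])
    return providers


def available_providers():
    """Run `fi_info -l` on this node; return None if it is missing or fails."""
    if shutil.which("fi_info") is None:
        return None
    result = subprocess.run(["fi_info", "-l"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_provider_list(result.stdout)


if __name__ == "__main__":
    providers = available_providers()
    if providers is None:
        print("fi_info not found or failed on this node")
    else:
        print("providers on this node:", ", ".join(providers))
```

Running this on both a login node and a compute node shows whether a provider such as verbs is only present on the latter.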
Enabling logging in Mercury
More information can be obtained from Mercury by making sure
to build it with the
+debug variant. Once this is done,
setting the HG_LOG_LEVEL environment variable to
debug may provide more information about what Mercury
is trying to do and why it failed.
More information on logging with Mercury can be found in the Mercury documentation.
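As an illustration, debug logging can be requested for a single run without changing the shell environment. This Python sketch assumes that Mercury reads its log level from an environment variable named HG_LOG_LEVEL; check the Mercury documentation for your version to confirm. The helper names are hypothetical.

```python
import os
import subprocess


def mercury_debug_env(base_env=None):
    """Return a copy of the environment with Mercury debug logging requested."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed variable name; it is not spelled out in the text above.
    env["HG_LOG_LEVEL"] = "debug"
    return env


def run_with_mercury_debug(command):
    """Run `command` (a list of strings) with debug logging enabled.

    Returns the exit code of the child process.
    """
    return subprocess.run(command, env=mercury_debug_env()).returncode
```

For example, run_with_mercury_debug(["./my_server", "tcp"]) would launch a hypothetical server binary with the variable set. The extra output is only expected if Mercury was built with the +debug variant.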