Troubleshooting on NeSI
The advice on this page applies to the Globus backend, which is the
primary supported configuration. The Paramiko (SSH/SFTP) backend is
experimental and has had little stability work; if it is failing, the most
useful step is usually to confirm that you can SSH into the remote machine
manually with the configured key and that tmux is available there.
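The manual check described above can be sketched as a small shell function. This is only an illustration: the function name `check_remote_tmux`, the example username, hostname and key path are all hypothetical, so substitute the values from your own configuration.

```shell
# check_remote_tmux USER@HOST KEYFILE
# Hypothetical helper (not part of RJM): logs in with the given key and
# prints the path to tmux on the remote machine, or reports a failure.
check_remote_tmux() {
    target="$1"
    key="$2"
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a key problem shows up as an error rather than a hang.
    if ! ssh -i "$key" -o BatchMode=yes "$target" 'command -v tmux'; then
        echo "SSH login failed or tmux not found on $target" >&2
        return 1
    fi
}

# Example call (placeholder values, replace with your own):
# check_remote_tmux your_username@remote.example.org ~/.ssh/id_ed25519
```

If the function prints a path to tmux, both the key and tmux are in place; if it fails, run the same `ssh` command without `-o BatchMode=yes` to see whether the login or the tmux lookup is the problem.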
The NeSI platform occasionally experiences stability issues related to the filesystems, Slurm, networking and Globus. RJM attempts to handle these issues by retrying commands that have failed, but it is not always successful.
If RJM isn't working well, first try running the rjm_health_check program (the -ll debug option prints additional
output that can be useful for debugging):
rjm_health_check -ll debug
There is no longer any need to run rjm_restart, as we are now using the NeSI-managed Globus Compute endpoint.
If you encounter problems, please contact NeSI Support and mention that you are using
Globus Transfer and Compute via the RemoteJobManager tool. You could also include the output from rjm_health_check -ll debug.
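One way to capture the health check output for a support request is to save it to a file while still seeing it on screen. The sketch below assumes a POSIX-style shell; the function name `save_health_check` and the default log file name are hypothetical choices, not part of RJM.

```shell
# save_health_check [LOGFILE]
# Hypothetical helper (not part of RJM): runs the health check with debug
# logging and writes a copy of everything it prints to LOGFILE.
save_health_check() {
    logfile="${1:-rjm_health_check.log}"
    # 2>&1 merges stderr into stdout so that log messages written to
    # either stream end up in the file; tee also echoes them on screen.
    rjm_health_check -ll debug 2>&1 | tee "$logfile"
}

# Example call:
# save_health_check rjm_health_check.log
```

The resulting file can then be attached to the message you send to NeSI Support.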