Troubleshooting on NeSI

The NeSI platform occasionally experiences stability issues related to the filesystems, Slurm, networking and Globus. RJM attempts to handle these issues by retrying failed commands, but it is not always successful.

If RJM isn't working well, first try running the rjm_health_check program (the -ll debug option prints additional output that can be useful for debugging):

rjm_health_check -ll debug

There is no longer any need to run rjm_restart, as we are now using the NeSI-managed Globus Compute endpoint.

If you encounter problems, please contact NeSI Support and mention that you are using Globus Transfer and Compute via the RemoteJobManager tool. You could also include the output from rjm_health_check -ll debug.
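
To make the health check output easy to attach to a support request, one option is to save it to a file while still showing it on screen. The following is a minimal sketch, assuming a POSIX shell; the save_output helper and the log file name are made up for this example and are not part of RJM.

```shell
# save_output LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not part of RJM): runs COMMAND, sends both stdout and
# stderr through tee so the output is shown on screen and written to LOGFILE.
save_output() {
    logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
}
```

With the helper defined, running save_output rjm_health_check.log rjm_health_check -ll debug would write the debug output to rjm_health_check.log in the current directory. Note that the exit status is the one from tee, not from the command being run.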