This document describes the Nagios plugins mainly used to monitor NorduGrid ARC compute elements and related resources, but some probes should also be usable to test non-ARC resources. The package includes commands to do
The following chapters will cover the probes related to each of these topics. This chapter will describe common configuration and options.
Acknowledgements. This work is co-funded by the EC EMI project under the FP7 Collaborative Projects Grant Agreement Nr. INFSO-RI-261611.
The configuration is merged from a list of INI-format files, where settings from later files take precedence and missing files are ignored. By default, the files considered are
/etc/nagios/plugins.ini
/etc/nagios/plugins/arcnagios-dist.ini
/etc/nagios/plugins/arcnagios.ini
/etc/nagios/plugins/arcnagios-local.ini
but this can be overridden by setting the environment variable $ARCNAGIOS_CONFIG to a colon-separated list. arcnagios-dist.ini is distributed with the plugins and contains a small collection of predefined tests for the CE and infosys probes.
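The merge rules above can be sketched in a few lines of Python. This is a minimal illustration, not the plugins' own loader: the helper names config_paths and load_config are hypothetical, and the sketch assumes that the standard configparser semantics (later files win, unreadable files are skipped) match what the probes do.

```python
import configparser
import os

# Default search list, copied from the text above.
DEFAULT_CONFIG_FILES = [
    "/etc/nagios/plugins.ini",
    "/etc/nagios/plugins/arcnagios-dist.ini",
    "/etc/nagios/plugins/arcnagios.ini",
    "/etc/nagios/plugins/arcnagios-local.ini",
]


def config_paths(environ=os.environ):
    """Return the files to consider (hypothetical helper).

    A non-empty ARCNAGIOS_CONFIG replaces the default list and is
    split on colons.
    """
    override = environ.get("ARCNAGIOS_CONFIG")
    if override:
        return override.split(":")
    return list(DEFAULT_CONFIG_FILES)


def load_config(paths):
    """Merge the given INI files in order (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # read() skips files that cannot be opened, and a value read from a
    # later file replaces the same value read from an earlier one.
    parser.read(paths)
    return parser
```

With this sketch, load_config(config_paths()) would yield one merged view in which a setting in arcnagios-local.ini overrides the same setting in arcnagios-dist.ini.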
Each probe has a main configuration section, which is named after the probe. In this section you can provide defaults for command-line options. The name of the configuration variable corresponding to an option is obtained by stripping the initial “--” and replacing “-” with “_”, e.g. “--home-dir” becomes “home_dir”.
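The naming rule can be written out as a small function. The name option_to_variable is hypothetical and only restates the rule given above; it is not part of the plugins.

```python
def option_to_variable(option):
    """Map a long command-line option to its configuration variable name.

    Hypothetical helper: strips the leading "--" and replaces each "-"
    with "_", as described in the text.
    """
    if not option.startswith("--"):
        raise ValueError("expected a long option starting with '--'")
    return option[2:].replace("-", "_")
```

For example, option_to_variable("--home-dir") gives "home_dir", matching the example in the text.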
The following options are common to all probes:
The check_arcce and check_gridstorage probes require a proxy certificate to succeed. The probes will maintain a proxy when provided with an X509 certificate and key. You can place these in a common section:
[gridproxy]
default_voms = <voms>
user_key = <path>
user_cert = <path>
#user_proxy = <path> # Optionally override the path of the generated proxy.
The probes which require an X509 proxy have a --voms=<voms> option to specify the VOMS server to contact instead of default_voms. When a user_key and user_cert pair is given, the default user_proxy path is unique to the selected VOMS.
To use a pre-initialized proxy, make sure user_key and user_cert are not set. You will probably want to use a non-default location for the proxy. Either point to it with the environment variable X509_USER_PROXY or set it in the configuration file:
[gridproxy]
user_proxy = <path>
If you use several VOs which require different certificates, you can replace the above section with one section gridproxy.<voms> per <voms> and use the --voms option to select which section to use. These sections don’t have the default_voms setting.
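One way to picture the section selection is the sketch below. It is an assumption about the lookup order, not a description of the plugins' code: the helper gridproxy_section is hypothetical, and it assumes that the VOMS name comes from --voms if given, else from default_voms, and that a matching gridproxy.<voms> section is preferred over the plain gridproxy section.

```python
import configparser


def gridproxy_section(config, voms=None):
    """Return the name of the section holding the proxy settings.

    Hypothetical helper; the lookup order is an assumption.
    `voms` stands for the value of the --voms option, if any.
    """
    if voms is None and config.has_section("gridproxy"):
        # Fall back to default_voms from the common section, if present.
        voms = config.get("gridproxy", "default_voms", fallback=None)
    if voms is not None:
        per_vo = "gridproxy.%s" % voms
        if config.has_section(per_vo):
            return per_vo
    return "gridproxy"
```

The returned name could then be used to look up user_key, user_cert, or user_proxy with config.get.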
The configuration file of these probes should not be generated, or have parts substituted, from an untrusted source without proper filtering. In particular, the job tests pick up shell code to be executed on cluster nodes from configuration variables, and the ARIS tests use the Python interpreter to evaluate custom expressions.
The following instructions apply to check_arcce_submit, check_arcce_monitor, check_arcce_clean, check_aris, check_egiis, check_arcglue2, and check_arcstorage. They also apply to the deprecated check_arcinfosys and check_arcce. The other probes can be invoked from the command-line without special attention.
For testing and debugging, it can be convenient to invoke the probes manually as a regular user. This can be done as follows. Choose a directory where you can store run-time state. Below, we use /tmp, but it may be tidier to create a fresh directory. Then, create a configuration like
[DEFAULT]
plugins_spooldir = /tmp
[gridproxy]
default_voms = <your-vo>
[gridproxy.your-vo]
user_proxy = /tmp/x509up_u<your-user-id>
substituting suitable values for the <your-*> meta-variables. You may need to add additional settings depending on what you test, of course. After acquiring a proxy certificate (if needed) and pointing to the new configuration file,
arcproxy -S <your-vo>
export ARCNAGIOS_CONFIG=<your-config>
the probes can now be run as
check_arcce_submit --how-invoked=manual ...
check_arcce_monitor --how-invoked=manual ...
check_arcce_clean --how-invoked=manual ...
check_egiis --how-invoked=manual ...
check_aris --how-invoked=manual ...
check_arcglue2 --how-invoked=manual ...
The main purpose of the --how-invoked=manual option is to tell the probe that any passive results shall be printed to the screen rather than submitted to the Nagios command pipe. It is not strictly needed for active-only probes.
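For repeated manual testing, the test configuration shown earlier can also be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: render_manual_config is a hypothetical helper, it assumes a Unix system (os.getuid), and it assumes that the proxy lives at /tmp/x509up_u<your-user-id> as in the example configuration.

```python
import configparser
import io
import os


def render_manual_config(vo, uid=None, spooldir="/tmp"):
    """Return the text of a manual-testing configuration.

    Hypothetical helper mirroring the example configuration above;
    `vo` replaces <your-vo> and `uid` replaces <your-user-id>.
    """
    if uid is None:
        uid = os.getuid()  # Unix only
    config = configparser.ConfigParser()
    config["DEFAULT"] = {"plugins_spooldir": spooldir}
    config["gridproxy"] = {"default_voms": vo}
    config["gridproxy.%s" % vo] = {"user_proxy": "/tmp/x509up_u%d" % uid}
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

The returned text can be written to a file of your choice, after which ARCNAGIOS_CONFIG is pointed at that file as described above.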
The following probes are deprecated. They will be removed in a future release.