The sections "The Role of tm.rdb" through "The Role of tm.watchd" briefly describe the CRE master daemons: tm.rdb, tm.mpmd, and tm.watchd. The sections beginning with "The Role of tm.omd" describe the nodal daemons: tm.omd and tm.spmd.
tm.rdb is the resource database daemon. It runs on the master node and implements the resource database used by the other parts of the CRE. This database represents the state of the cluster and the jobs running on it.
When the cluster configuration changes, the tm.rdb daemon must be restarted so that the database reflects the new conditions. For example, if a node is added to a partition, tm.rdb must be restarted for the change to take effect.
tm.mpmd is the master process-management daemon. It runs on the master node and services user (client) requests made via the mprun command. It also interacts with the resource database via calls to tm.rdb and coordinates the operations of the nodal client daemons.
tm.watchd is the cluster watcher daemon. It runs on the master node and monitors the states of cluster resources and jobs and, as necessary:
Marks individual nodes as online or offline by periodically executing remote procedure calls (RPCs) to all of the nodes.
Clears stale resource database (rdb) locks.
If the -Yk option has been enabled, aborts jobs that have processes on nodes determined to be down. This option is disabled by default.
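The three duties listed above can be pictured as one pass of a monitoring loop. The following Python sketch is illustrative only and is not the tm.watchd implementation: ClusterState, Job, watch_once, the probe callback, and the lock_ttl staleness rule are all hypothetical names and assumptions introduced here, and the abort_on_down flag merely stands in for the behavior enabled by -Yk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Job:
    job_id: str
    nodes: Set[str]          # nodes on which the job has processes
    aborted: bool = False


@dataclass
class ClusterState:
    # Hypothetical stand-in for the resource database; not the CRE schema.
    online: Dict[str, bool] = field(default_factory=dict)
    locks: Dict[str, float] = field(default_factory=dict)  # lock name -> time acquired
    jobs: List[Job] = field(default_factory=list)


def watch_once(state: ClusterState,
               nodes: List[str],
               probe: Callable[[str], bool],
               now: float,
               lock_ttl: float,
               abort_on_down: bool = False) -> ClusterState:
    """Run one monitoring pass over the cluster state."""
    # 1. Mark each node online or offline based on a probe (an RPC in the CRE).
    for node in nodes:
        state.online[node] = probe(node)

    # 2. Clear stale locks. Treating "older than lock_ttl" as stale is an
    #    assumption of this sketch; the source does not define staleness.
    for name in [n for n, t in state.locks.items() if now - t > lock_ttl]:
        del state.locks[name]

    # 3. Optionally abort jobs with processes on down nodes (off by default).
    if abort_on_down:
        down = {n for n, up in state.online.items() if not up}
        for job in state.jobs:
            if job.nodes & down:
                job.aborted = True
    return state
```

With abort_on_down left at its default of False, a pass only updates node status and locks, which mirrors the option being disabled by default.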
tm.omd is the object-monitoring daemon. It runs on all the nodes in the cluster, including the master node, and continually updates the database with dynamic information concerning the nodes, most notably their load. It also initializes the database with static information about the nodes, such as their host names and network interfaces, when the CRE starts up.
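The split between static information, written once at startup, and dynamic information, refreshed continually, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tm.omd code: the dictionary db stands in for the resource database, and register_node, refresh_node, and the read_load callback are hypothetical names.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the resource database.
Database = Dict[str, Dict[str, object]]


def register_node(db: Database, hostname: str, interfaces: List[str]) -> None:
    """Write static facts about a node once, at startup."""
    db[hostname] = {"hostname": hostname, "interfaces": list(interfaces)}


def refresh_node(db: Database, hostname: str,
                 read_load: Callable[[], float]) -> None:
    """Update the dynamic facts; meant to be called repeatedly."""
    db[hostname]["load"] = read_load()


db: Database = {}
register_node(db, "node0", ["hme0"])        # static: set once
refresh_node(db, "node0", lambda: 0.25)     # dynamic: first sample
refresh_node(db, "node0", lambda: 1.50)     # dynamic: overwritten by a later sample
```

On a Unix system, read_load could be something like `lambda: os.getloadavg()[0]`; it is passed in as a callback here so that the sketch does not depend on the platform.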
tm.spmd is the slave process-management daemon. It runs on all the compute nodes of the cluster and, as necessary:
Handles spawning and termination of nodal processes in response to requests from tm.mpmd.
In conjunction with mprun, handles multiplexing of stdio streams for nodal processes.
Interacts with the resource database via calls to tm.rdb.
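To make the idea of multiplexing stdio streams concrete, the sketch below merges output fragments from several processes into a single stream of complete lines, each tagged with the rank of the process that produced it. It is a generic Python illustration, not the mechanism used by tm.spmd and mprun: the multiplex function, the (rank, text) fragment format, and the "[rank]" tag are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def multiplex(chunks: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Merge (rank, text) fragments, in arrival order, into tagged lines.

    Partial lines are buffered per rank so that output from different
    processes is never interleaved in the middle of a line.
    """
    pending: Dict[int, str] = {}
    for rank, text in chunks:
        buf = pending.get(rank, "") + text
        *lines, rest = buf.split("\n")   # rest is the unfinished tail, if any
        pending[rank] = rest
        for line in lines:
            yield f"[{rank}] {line}"
    # Flush any unterminated output left when the streams end.
    for rank in sorted(pending):
        if pending[rank]:
            yield f"[{rank}] {pending[rank]}"


merged = list(multiplex([(0, "hel"), (1, "world\n"), (0, "lo\n"), (1, "tail")]))
```

Here the fragments "hel" and "lo\n" from rank 0 are reassembled into one line even though output from rank 1 arrived between them.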