Overview
The High Availability module is an optional set of components that together provide continual monitoring and automatic disaster recovery of NMS sites. The High Availability module provides:
Continual monitoring of the state of each critical component within each NMS site.
Automatic disaster recovery from one NMS site to another.
A comprehensive set of web pages for reporting the state of each site and its components and configuration of the sites being monitored.
The main components involved in the automatic failover and recovery process are defined below.
NMS Agent
The NMS Agent runs on each NMS Services server and is responsible for monitoring NMS back-end services and reporting their state to the NMS Monitor module. In a dual-environment configuration, both administrative users will have an NMS Agent instance.
NMS Monitor
The NMS Monitor runs on a WebLogic managed server at each site. It periodically requests the current status of the site from the NMS Agents, WebLogic managed servers, and databases; stores the status of the sites in a ZooKeeper cluster; coordinates with the other NMS Monitors in the system; determines the state of each site; and triggers automatic failover and recovery of the system through Site Guard.
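As a rough illustration of how per-component reports might be reduced to a single site state, consider the Python sketch below. The type names, the component labels, and the reduction rule (any failed component marks the site as down) are assumptions made for this example only; the rules the NMS Monitor actually applies are not described in this section and may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Health(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ComponentStatus:
    name: str       # illustrative label, e.g. "nms-agent" or "database"
    health: Health


def site_state(components: List[ComponentStatus]) -> Health:
    """Reduce per-component health to one site-level state.

    Assumed rule (not taken from the product): any DOWN component marks the
    site DOWN; otherwise any UNKNOWN marks it UNKNOWN; otherwise it is UP.
    A site with no reports at all is treated as UNKNOWN.
    """
    if not components:
        return Health.UNKNOWN
    healths = {c.health for c in components}
    if Health.DOWN in healths:
        return Health.DOWN
    if Health.UNKNOWN in healths:
        return Health.UNKNOWN
    return Health.UP


# Example polling result for one site (hypothetical component names).
reports = [
    ComponentStatus("nms-agent", Health.UP),
    ComponentStatus("managed-server", Health.UP),
    ComponentStatus("database", Health.DOWN),
]
state = site_state(reports)
```

Treating a missing report as UNKNOWN rather than DOWN is a deliberate choice in this sketch: it avoids triggering a failover solely because a status request timed out.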
CESEJB, NMS-WS
The CESEJB and NMS-WS deployments provide REST APIs to allow the NMS Monitor to request the current status of the deployment.
WebLogic Server
The WebLogic servers are used to deploy the NMS Monitor. An NMS Monitor managed server is installed on each WebLogic instance. The WebLogic Admin Server reports the state of all managed servers and their applications to the NMS Monitor.
Database Server
The NMS Monitor interrogates the databases to determine the active and staging environment for each site.
EMCLI
The Enterprise Manager Command Line Interface (emcli) is installed on each WebLogic server. The NMS Monitor uses it to read site and operation plans from Site Guard and to request the automatic execution of a failover operation plan.
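The two kinds of emcli call described above, reading plans and submitting one for execution, could be assembled as in the following Python sketch. It assumes the Site Guard verbs get_operation_plans and submit_operation_plan with a -name option; check the verb names and options against the EMCLI reference for the installed Enterprise Manager release. The plan name is hypothetical, and the sketch only builds the command lines without running them.

```python
import shlex
from typing import List


def list_plans_command() -> List[str]:
    """Argument list for reading the configured operation plans.

    Assumes the Site Guard verb `get_operation_plans`.
    """
    return ["emcli", "get_operation_plans"]


def submit_plan_command(plan_name: str) -> List[str]:
    """Argument list for requesting execution of a named operation plan.

    Assumes the Site Guard verb `submit_operation_plan` and its `-name` option.
    """
    if not plan_name:
        raise ValueError("plan_name must not be empty")
    return ["emcli", "submit_operation_plan", f"-name={plan_name}"]


# Hypothetical plan name, used only for illustration.
command = submit_plan_command("failover_site1_to_site2")
command_line = shlex.join(command)  # shell-safe string, e.g. for logging
```

Keeping each command as an argument list rather than a single string means it can be passed to subprocess.run without shell quoting problems if a plan name contains spaces.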
Site Guard
Oracle Site Guard is used for disaster recovery to initiate switchover or failover from an NMS instance at one site to an NMS instance at another site. NMS High Availability requires either a single Enterprise Manager instance with Site Guard configured or a separate Enterprise Manager instance at each NMS site with identical Site Guard configuration. The single-instance option is recommended because the per-site option is more complicated to set up. For full details of Site Guard and its configuration, see “Site Guard”.
ZooKeeper
An Apache ZooKeeper cluster stores the site status information collated by each of the NMS Monitors. This allows each monitor to see the complete state of all NMS sites in the system and so make an informed decision on the best disaster recovery plan to execute.
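The idea of a shared status store can be sketched in Python as follows. A plain dictionary stands in for the ZooKeeper cluster, and the path layout, record format, state values, and selection rule are all assumptions made for illustration; none of them is taken from the product.

```python
import json
from typing import Dict, Optional


def status_path(site: str) -> str:
    # Hypothetical znode layout; the real paths are not documented here.
    return f"/nms-ha/sites/{site}/status"


def publish_status(store: Dict[str, bytes], site: str, state: str) -> None:
    """Write one site's state, as a monitor might after each polling cycle."""
    record = {"site": site, "state": state}
    store[status_path(site)] = json.dumps(record).encode("utf-8")


def read_all_statuses(store: Dict[str, bytes]) -> Dict[str, str]:
    """Return {site: state} for every status entry in the shared store."""
    statuses = {}
    for path, raw in store.items():
        if path.endswith("/status"):
            record = json.loads(raw.decode("utf-8"))
            statuses[record["site"]] = record["state"]
    return statuses


def choose_recovery_site(store: Dict[str, bytes], failed_site: str) -> Optional[str]:
    """Pick a healthy site other than the failed one, or None if there is none.

    Assumed rule: the alphabetically first site reporting "up" is chosen, so
    every monitor reading the same store reaches the same answer.
    """
    candidates = sorted(
        site
        for site, state in read_all_statuses(store).items()
        if site != failed_site and state == "up"
    )
    return candidates[0] if candidates else None


# In-memory stand-in for the cluster, populated by two monitors.
store: Dict[str, bytes] = {}
publish_status(store, "site-a", "down")
publish_status(store, "site-b", "up")
target = choose_recovery_site(store, "site-a")
```

Because every monitor reads the same records and applies the same deterministic rule, they agree on the recovery target without exchanging further messages.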