Recover a KMA

The cluster allows the system to recover from a KMA failure, as long as there is at least one functioning KMA in the cluster.

OKM uses a cluster of at least two KMAs to reduce the risk of disruptions and assist in recovery. Clustering KMAs allows you to replicate database entries and balance workloads. If a component fails, it can be easily replaced and restored. When designing an encryption and archive strategy, you should ensure that critical data is replicated and vaulted off-site (see Example Scenarios for Recovering Data). If at least one KMA remains operational, you can recover a single KMA without impacting the rest of the cluster.

The following sections address scenarios that require recovery of a single KMA.

KMA Recovery Following a Software Upgrade

Software upgrades do not require a repair or a recovery; however, a KMA may be out of service while the upgrade takes place. The cluster allows the upgrade to occur without interrupting the active encryption agents. You can download the new software concurrently on all KMAs in the cluster, but activating the new software requires the KMA to reboot. To prevent an interruption, stagger the KMA reboots so that at least one KMA in the cluster is always active. As each KMA returns to online status, any database updates made while it was offline are replicated to it, and all KMAs in the cluster re-synchronize.
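The staggered reboot described above can be illustrated with a small simulation. This is only a sketch of the ordering rule (reboot one KMA at a time and never take the last active KMA offline). It does not call any OKM interface, and the Kma class and the staggered_reboot function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Kma:
    """Hypothetical stand-in for a KMA; not part of any OKM API."""
    name: str
    online: bool = True


def staggered_reboot(cluster):
    """Reboot KMAs one at a time and return the lowest number of
    KMAs that were online at any point during the procedure."""
    min_online = sum(k.online for k in cluster)
    for kma in cluster:
        others_online = sum(k.online for k in cluster if k is not kma)
        if others_online == 0:
            # Rebooting now would leave agents with no KMA to contact.
            raise RuntimeError(f"no other active KMA while rebooting {kma.name}")
        kma.online = False  # KMA reboots to activate the new software
        min_online = min(min_online, sum(k.online for k in cluster))
        kma.online = True   # KMA is back online and re-synchronizes
    return min_online


cluster = [Kma("kma1"), Kma("kma2"), Kma("kma3")]
lowest = staggered_reboot(cluster)
```

In this sketch, a three-KMA cluster never drops below two active KMAs, and a single-KMA "cluster" is rejected because a reboot would leave no active KMA.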

KMA Recovery Following a Network Disconnection

When a KMA disconnects from the management network, such as when activating new software, the remaining KMAs in the cluster attempt to contact it and report communication errors in the audit event log. Agents continue to communicate with other KMAs across the network. Usually these are other KMAs attached to the same service network. However, because agents may be attached to the management network, they first attempt to work with the KMAs in their own configured site and, if necessary, contact any reachable KMA within the cluster. When the KMA reconnects to the network, any database updates made while it was disconnected are replicated to it, and all KMAs in the cluster re-synchronize.
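The agent behavior described above (prefer KMAs in the agent's own configured site, then fall back to any reachable KMA in the cluster) can be sketched as a simple ordering function. This is an illustration of the preference rule only, not the agent's actual selection logic; the failover_order function, the tuple layout, and the site names are assumptions made for the example.

```python
def failover_order(agent_site, kmas):
    """Return the names of reachable KMAs in the order an agent
    would try them: same-site KMAs first, then all others.

    kmas is a list of (name, site, reachable) tuples; this layout is
    a hypothetical representation chosen for the example.
    """
    reachable = [k for k in kmas if k[2]]
    same_site = [name for name, site, _ in reachable if site == agent_site]
    other_sites = [name for name, site, _ in reachable if site != agent_site]
    return same_site + other_sites


kmas = [
    ("kma1", "east", False),  # disconnected from the network
    ("kma2", "east", True),
    ("kma3", "west", True),
]
order = failover_order("east", kmas)
```

Here an agent configured for the "east" site skips the disconnected kma1, tries kma2 first, and falls back to kma3 in the other site.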

KMA Recovery Following a Hardware Failure

If a hardware failure occurs, first delete the KMA from the cluster so that the remaining KMAs stop attempting to communicate with it. If the KMA console is still accessible, you can reset the KMA. The reset operation returns the unit to its factory defaults and offers the option to scrub the server's hard disk as an extra security precaution. The customer is responsible for disposition of the failed server. An Oracle service representative can repair the KMA server and add it to the cluster as described in the Oracle Key Manager 3 Installation and Service Manual, PN E48395-xx. Once the KMA is added to the cluster, the database replicates, the KMAs in the cluster re-synchronize, and the new KMA becomes an active member of the cluster.