Sun Cluster Geographic Edition System Administration Guide

Chapter 8 Monitoring and Validating the Sun Cluster Geographic Edition Software

This chapter describes the files and tools that you can use to monitor and validate the Sun Cluster Geographic Edition software.

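This chapter contains the following sections:

  - Monitoring the Runtime Status of the Sun Cluster Geographic Edition Software
  - Viewing the Sun Cluster Geographic Edition Log Messages
  - Displaying Configuration Information for Partnerships and Protection Groups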

Monitoring the Runtime Status of the Sun Cluster Geographic Edition Software

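You can display the runtime status of the local Sun Cluster Geographic Edition enabled cluster by using the geoadm status command. The output is organized in sections that report the cluster name, each partnership with its heartbeats, each protection group, and any pending operations, as the sample output later in this section shows.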

You must be assigned the Basic Solaris User RBAC rights profile to run the geoadm status command. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC.

For example, an administrator runs the geoadm status command on cluster-paris and the following information is displayed:


phys-paris-1# geoadm status

Cluster: cluster-paris

Partnership "paris-newyork-ps": OK
   Partner clusters    : cluster-newyork
   Synchronization     : OK
   ICRM Connection     : OK

   Heartbeat "paris-to-newyork" monitoring "cluster-newyork": OK
      Heartbeat plug-in "ping_plugin"    : Inactive
      Heartbeat plug-in "tcp_udp_plugin" : OK

Protection group "tcpg"     : OK
   Partnership              : "paris-newyork-ps"
   Synchronization          : OK

   Cluster cluster-paris    : OK
      Role                  : Primary
      PG activation state   : Activated
      Configuration         : OK
      Data replication      : OK
      Resource groups       : OK

   Cluster cluster-newyork  : OK
      Role                  : Secondary
      PG activation state   : Activated
      Configuration         : OK
      Data replication      : OK
      Resource groups       : OK

Pending Operations
Protection Group     : "tcpg"
Operation            : start
        

The information displayed shows that the protection group, tcpg, is activated on both the primary cluster, cluster-paris, and the secondary cluster, cluster-newyork. Data is replicating between the partner clusters and both partners are synchronized.
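On a cluster with several partnerships and protection groups, this output can be long. The following is a minimal sketch, not part of the product, that reduces saved or piped geoadm status output to the lines whose value indicates a problem. The function name geo_flag_problems is hypothetical, and the choice of which Table 8–1 values count as problems (Error, Degraded, Mismatch, Unknown, Offline, No-Response) is an assumption of this sketch. Inactive and Deactivated are deliberately not flagged because they can be normal. The sketch also assumes that each status line ends in ": value", as in the sample output above, and that a POSIX awk is available (on Solaris, set AWK to nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk).

```shell
#!/bin/sh
# Sketch only: print the status lines from `geoadm status` output (read on
# stdin) whose value is one of the assumed problem values.
# AWK may be set to choose the awk implementation; it defaults to "awk".
geo_flag_problems() {
    "${AWK:-awk}" '
    {
        # The value is whatever follows the last colon on the line.
        n = split($0, parts, ":")
        if (n < 2) next
        value = parts[n]
        gsub(/^[ \t]+|[ \t]+$/, "", value)
        # Assumed problem values, taken from Table 8-1.
        if (value ~ /^(Error|Degraded|Mismatch|Unknown|Offline|No-Response)$/) {
            line = $0
            sub(/^[ \t]+/, "", line)
            print line
        }
    }'
}
```

For example, geoadm status | geo_flag_problems prints nothing when none of the assumed problem values appears in the output.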

The following table describes the meaning of the status values.

Table 8–1 Status Value Descriptions

Field 

Value Descriptions 

Partnership

OK – The partners are connected.

Error – The connection between the partner clusters is lost.

Degraded – The partnership has been successfully created but a connection with the partner cluster has not yet been established. This status value occurs when the partnership has been created and the partner cluster has not been configured.

Synchronization

OK – The configuration information is synchronized between partner clusters.

Error – The configuration information differs between the partner clusters. For a partnership synchronization error, resynchronize the partnership. For a protection group synchronization error, resynchronize the protection group.

For information about resynchronizing a partnership, see Resynchronizing a Partnership.

For information about resynchronizing a protection group, see the Sun Cluster Geographic Edition data replication guide for the data replication product that you use.

Mismatch – Configuration information has been created separately on the clusters. The configuration information must be replaced by a copy of the configuration information from the partner cluster. You can synchronize the protection group configuration by using the geopg get command.

Unknown – Information is not accessible because the partners are disconnected or because some components of the protection group cannot be reached.

ICRM Connection

OK – The Intercluster Resource Management (ICRM) module is running properly.

Error – The ICRM module on the local cluster is unable to communicate with the ICRM module on the remote cluster.

Heartbeat

OK – Heartbeat checks are running and the partner cluster responds within the specified timeout and retry periods.

Offline – Heartbeat checks are not running.

Error – Heartbeat checks are running but the partner is not responding and retries have timed out.

Degraded – Heartbeat checks are running but one of the primary plug-ins is degraded or not running.

Heartbeat plug-in

OK – Responses are being received from the partner.

Inactive – The plug-in is not in use but is on standby to retry contacting the partner if the other plug-ins receive no response.

No-Response – Partner cluster is not responding.

Protection group (overall protection group state)

OK – The synchronization state is OK and the state of the protection group on each cluster is OK.

Degraded – The synchronization state is OK. The state of the protection group is Degraded on one or both clusters in the partnership.

Unknown – The synchronization state or the state of the protection group on one or both clusters is unavailable. The protection group can be online or offline.

Error – The synchronization state or the state of the protection group on one or both clusters is in Error. The protection group can be online or offline.

Protection group > Cluster (state of protection group on each cluster)

OK – The state of all the protection group components, such as configuration data, data replication, or resource groups, is OK, None, or N/A on the cluster.

Degraded – The state of one or more of the protection group components is in the Degraded state on the cluster.

Unknown – The state of some components of the protection group, such as configuration data, data replication, or resource groups, is unavailable.

Error – The state of some components of the protection group, such as configuration data, data replication, or resource groups, is in Error.

Protection group > Cluster > Role

Primary – The cluster is the Primary for this protection group.

Secondary – The cluster is the Secondary for this protection group.

Unknown – Information is not accessible because the partners are disconnected or because some components of the protection group cannot be reached.

Protection group > Cluster > PG activation state

Activated – The protection group is activated.

Deactivated – The protection group is deactivated.

Unknown – Information is not accessible because the partners are disconnected or because some components of the protection group cannot be reached.

Protection group > Cluster > Configuration

OK – Protection group configuration has been validated without errors on the cluster.

Error – Protection group configuration validation resulted in errors on the cluster. You need to revalidate the protection group. For information about validating a protection group, see the Sun Cluster Geographic Edition data replication guide for the data replication product that you use.

Unknown – Information is not accessible because the partners are disconnected or because some components of the protection group cannot be reached.

Protection group > Cluster > Data replication

None – Data replication is not configured.

OK – Data replication is running and data is synchronized with the partner cluster when the protection group is activated. Replication is suspended when the protection group is deactivated. This state represents data replication on this cluster and does not reflect the overall state of data replication. This state is mapped from the corresponding state in the data replication subsystem.

Degraded – Data is not replicated and is not synchronized with the partner cluster when the protection group is activated. New writes succeed but are not replicated. This state represents data replication on this cluster and does not reflect the overall state of data replication. This state is mapped from the corresponding state in the data replication subsystem.

Error – Data replication from the primary cluster to the secondary cluster is in error if the data replication subsystem reports an error or if data replication is not suspended when the protection group is deactivated. This state represents data replication on this cluster and does not reflect the overall state of data replication. This state is mapped from the corresponding state in the data replication subsystem.

Unknown – Information is not accessible because the partners are disconnected or because some components of the protection group cannot be reached.

N/A – The data replication state of the protection group could not be mapped. Data replication is in a valid state on its own but in an Error state for the protection group. This state is available only if you are using Sun StorageTek Availability Suite data replication.

Protection group > Cluster > Resource groups

None – No resource group is protected by this protection group.

OK – If the cluster has the Primary role, all resource groups are online when the protection group is activated or unmanaged when the protection group is deactivated. If the cluster has the Secondary role, all resource groups are unmanaged.

Error – If the cluster has the Primary role, not all resource groups are online when the protection group is activated or unmanaged when the protection group is deactivated. If the cluster has the Secondary role, not all resource groups are unmanaged.

Unknown – Information is not accessible because the partners are disconnected or because some components of the protection group cannot be reached.

For more specific information about checking the runtime status of replication, see the Sun Cluster Geographic Edition data replication guide for the data replication product that you use.
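The rules in Table 8–1 for the overall protection group state can be read as a small decision procedure. The sketch below is illustrative only and does not claim to show how the software computes the state. The function name pg_overall_state, the precedence of Error over Unknown, and the Unclassified fallback for combinations that the table does not describe (for example, a Mismatch synchronization state) are all assumptions.

```shell
#!/bin/sh
# Sketch only: derive an overall protection group state from the
# synchronization state and the per-cluster protection group states.
# Usage: pg_overall_state SYNC_STATE LOCAL_STATE PARTNER_STATE
pg_overall_state() {
    sync_state=$1
    local_state=$2
    partner_state=$3

    # Assumed precedence: any Error wins, then any Unknown.
    for s in "$sync_state" "$local_state" "$partner_state"; do
        if [ "$s" = "Error" ]; then echo "Error"; return 0; fi
    done
    for s in "$sync_state" "$local_state" "$partner_state"; do
        if [ "$s" = "Unknown" ]; then echo "Unknown"; return 0; fi
    done

    # OK and Degraded both require an OK synchronization state.
    if [ "$sync_state" = "OK" ]; then
        if [ "$local_state" = "OK" ] && [ "$partner_state" = "OK" ]; then
            echo "OK"; return 0
        fi
        if [ "$local_state" = "Degraded" ] || [ "$partner_state" = "Degraded" ]; then
            echo "Degraded"; return 0
        fi
    fi

    # Placeholder label for combinations that Table 8-1 does not describe.
    echo "Unclassified"
}
```

With the values from the sample output earlier in this section, pg_overall_state OK OK OK prints OK.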

Viewing the Sun Cluster Geographic Edition Log Messages

All the Sun Cluster Geographic Edition components produce messages that are stored in log files.

Information about loading, running, and stopping Sun Cluster Geographic Edition components in the common agent container is recorded in a set of numbered log files. The most recently logged messages are in file 0, followed by file 1 and then file 2.

System log messages are stored in the /var/adm/messages log file.

Each cluster node keeps separate copies of the previous log files. The combined log files on all cluster nodes form a complete snapshot of the currently logged information. The log messages of the Sun Cluster Geographic Edition modules are updated on the node where the Sun Cluster Geographic Edition software is currently active. The data replication control-log messages are updated on the node where the data replication resource is currently Online.
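Because each node keeps its own copies, checking the logs usually means repeating the same search on every node. A minimal sketch of such a search follows. The function name recent_matches is hypothetical, and the search pattern is a placeholder: this chapter does not state how Sun Cluster Geographic Edition messages are tagged in /var/adm/messages, so choose a pattern that matches the messages on your system. The sketch assumes a tail command that accepts the POSIX -n option.

```shell
#!/bin/sh
# Sketch only: print the last COUNT lines of FILE that match PATTERN,
# ignoring case.
# Usage: recent_matches FILE PATTERN COUNT
recent_matches() {
    file=$1
    pattern=$2
    count=$3
    # grep selects the matching lines; tail keeps only the newest ones,
    # because the log file is written in chronological order.
    grep -i "$pattern" "$file" | tail -n "$count"
}

# Example call (the pattern "geo" is an assumed placeholder):
#   recent_matches /var/adm/messages geo 20
```

Run the same call on each cluster node to build up the complete picture that the combined log files provide.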

Displaying Configuration Information for Partnerships and Protection Groups

You can display the current local cluster partnership configuration, including a list of all partnerships that are defined between the local cluster and remote clusters.

You can also display the current configuration of a specific protection group or of all the protection groups that are defined on a cluster.

How to Display Configuration Information About Partnerships

  1. Log in to a cluster node.

    You must be assigned the Basic Solaris User RBAC rights profile to complete this procedure. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC.

  2. Display information about the partnership.


    # geops list [partnershipname]
    
    partnershipname

    Specifies the name of the partnership. If you do not specify a partnership, then the geops list command displays information about all partnerships.

    For information about the names and values that are supported by Sun Cluster Geographic Edition software, see Appendix B, Legal Names and Values of Sun Cluster Geographic Edition Entities.


Example 8–1 Displaying Partnership Configuration Information

This example displays configuration information about the partnership between local cluster-paris and remote cluster-newyork.


# geops list paris-newyork-ps

How to Display Configuration Information About Protection Groups

  1. Log in to a cluster node.

    You must be assigned the Basic Solaris User RBAC rights profile to complete this procedure. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC.

  2. Display information about a protection group.


    # geopg list [protectiongroupname]
    
    protectiongroupname

    Specifies the name of a protection group.

    If you do not specify a protection group, then the command lists information about all the protection groups that are configured on your system.


Example 8–2 Displaying Configuration Information About a Protection Group

This example displays configuration information for avspg, which is configured on cluster-paris.


# geopg list avspg
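To compare the configuration over time, or between partner clusters, it can help to save the output of the commands in this chapter to files. The following sketch shows one way to do that. The function name save_output and the /var/tmp/geo-snapshot directory are hypothetical. Only the geoadm status, geops list, and geopg list commands come from this guide.

```shell
#!/bin/sh
# Sketch only: run a command and store its output (stdout and stderr)
# in DIR/NAME.txt, creating DIR if needed.
# Usage: save_output DIR NAME COMMAND [ARGS...]
save_output() {
    dir=$1
    name=$2
    shift 2
    mkdir -p "$dir" || return 1
    "$@" > "$dir/$name.txt" 2>&1
}

# Example calls (the directory is an assumed location):
#   save_output /var/tmp/geo-snapshot status       geoadm status
#   save_output /var/tmp/geo-snapshot partnerships geops list
#   save_output /var/tmp/geo-snapshot pgs          geopg list
```

The saved files can then be compared with a tool such as diff after a configuration change.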