StorageTek Automated Cartridge System Library Software High Availability 8.3 Cluster Installation, Configuration, and Operation
Release 8.3
E51939-02
9 ACSLS Cluster Operation

Solaris Cluster is designed to achieve automatic system recovery under severe failure scenarios by transferring operational control from one server node to the other. Most failures in a Solaris system, however, do not require a full system switch-over to recover; such routine faults are handled quickly and automatically without the involvement of Solaris Cluster. But if any other severe fault should impact ACSLS operation on the active server node, ACSLS HA instructs Solaris Cluster to switch control over to the alternate node.

Once it is started, ACSLS HA probes the system once every minute, watching for any of the following events to occur:

  • Lost communication with the ACSLS filesystem

  • Lost communication with all redundant public Ethernet ports

  • Lost and unrecoverable communication with a specified library

Any of these events triggers a Cluster failover. Solaris Cluster also fails over if any fatal system condition occurs on the active server node.

Starting Cluster Control of ACSLS

To activate Cluster failover control:

# cd /opt/ACSLSHA/util
# ./start_acslsha.sh -h <logical hostname> -g <IPMP group> -z acslspool

This action initiates Cluster control of ACSLS. Solaris Cluster monitors the system, probing once each minute to verify the health of ACSLS specifically and the Solaris system in general. Any condition deemed fatal triggers failover action to the alternate node.

To check cluster status of the ACSLS resource group:

# clrg status

The display will:

  • Reveal the status of each node.

  • Identify which node is the active node.

  • Reveal whether failover action is suspended.
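For a two-node configuration, the display resembles the following (the node names here are hypothetical; yours will differ):

```
=== Cluster Resource Groups ===

Group Name      Node Name      Suspended     Status
----------      ---------      ---------     ------
acsls-rg        node1          No            Online
                node2          No            Offline
```

The node shown as Online is the active node, and the Suspended column shows whether failover action is currently suspended.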

ACSLS Operation and Maintenance Under Cluster Control

Once cluster control has been activated, you can operate ACSLS in normal fashion. You start and stop ACSLS services with the standard acsss control utility, just as you would on a stand-alone ACSLS server. Operation is administered with these standard acsss commands:

acsss enable
acsss disable
acsss db

Manually starting or stopping acsss services with these commands does not cause Solaris Cluster to intervene with failover action, nor does the use of Solaris SMF commands (such as svcadm). Whenever acsss services are aborted or interrupted, it is SMF, not Cluster, that is primarily responsible for restarting these services.
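For example, you can list the acsss services that SMF is managing and their current state directly (the exact service names vary by ACSLS release):

```
# svcs -a | grep acsls
```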

Solaris Cluster intervenes to restore control on the adjacent node only under the following circumstances:

  • Lost communication with the ACSLS filesystem

  • Lost communication with all redundant public Ethernet ports

  • Lost and unrecoverable communication with a specified library
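The per-minute probe can be pictured as a simple health check over these conditions. The following is an illustrative sketch only, with stand-in checks and paths, not the actual ACSLS HA probe implementation:

```shell
#!/bin/sh
# Illustrative sketch of a periodic health probe (not the real ACSLS HA code).

# fs_ok: stands in for the ACSLS filesystem check; here it merely tests
# that the given mount point exists and is readable.
fs_ok() {
    [ -d "$1" ] && [ -r "$1" ]
}

# lib_ok: placeholder for the library-communication check; the real probe
# queries the attached library, while this stub always passes.
lib_ok() {
    return 0
}

# A real probe would also test every redundant public Ethernet port.
if fs_ok /tmp && lib_ok; then
    status=healthy
else
    status=degraded
fi
echo "probe: $status"
```

Only the conditions listed above, when unrecoverable, prompt ACSLS HA to request a failover.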

Suspending Cluster Control

If you suspect that your maintenance activity might trigger an unwanted cluster failover event, you can suspend cluster control of the acsls resource group.

To suspend Cluster control:

# clrg suspend acsls-rg

While the resource group is suspended, Solaris Cluster makes no attempt to switch control to the adjacent node, no matter what conditions might otherwise trigger such action.

This enables you to make more invasive repairs to the system, even while library production may be in full operation.

If the active node happens to reboot while in suspended mode, it will not mount the acslspool after the reboot, and ACSLS operation will be halted. To clear this condition, you should resume Cluster control.

To resume Cluster control:

# clrg resume acsls-rg

If the shared disk resource is mounted on the current node, normal operation resumes. But if Solaris Cluster discovers upon activation that the zpool is not mounted, it immediately switches control to the adjacent node. If the adjacent node is not accessible, control switches back to the current node, and Cluster attempts to mount the acslspool and start ACSLS services on this node.
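The resume-time decision sequence just described can be summarized as follows. This is an illustrative sketch with stubbed inputs, not Solaris Cluster's actual logic:

```shell
#!/bin/sh
# decide_action: mirrors the resume-time decision described above.
#   $1 = "yes" if acslspool is mounted on the current node
#   $2 = "yes" if the adjacent node is accessible
decide_action() {
    if [ "$1" = yes ]; then
        echo "resume normal operation on this node"
    elif [ "$2" = yes ]; then
        echo "switch control to the adjacent node"
    else
        echo "mount acslspool and start ACSLS on this node"
    fi
}

decide_action yes yes    # normal case: pool already mounted here
```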

Powering Down the ACSLS HA Cluster

The following procedure provides a safe power-down sequence when it is necessary to power down the ACSLS HA system.

  1. Determine the active node in the cluster.

    # clrg status
    

    Look for the online node.

  2. Log in as root to the active node and halt Solaris Cluster control of the ACSLS resource group.

    # clrg suspend acsls-rg
    
  3. Switch to user acsss and shut down the acsss services:

    # su - acsss
    $ acsss shutdown
    
  4. Log out as acsss and gracefully power down the node.

    $ exit
    # init 5
    
  5. Log in to the alternate node and power it down with init 5.

  6. Power down the shared disk array using the physical power switch.

Powering Up a Suspended ACSLS Cluster System

To restore ACSLS operation on the node that was active before a controlled shutdown, use the following procedure:

  1. Power on both nodes locally using the physical power switch or remotely using the Sun Integrated Lights Out Manager.

  2. Power on the shared disk array.

  3. Log in to either node as root.

  4. If you attempt to log in as acsss or to list the $ACS_HOME directory, you will find that the shared disk resource is not mounted on either node. To resume cluster monitoring, run the following command:

    # clrg resume acsls-rg
    

    With this action, Solaris Cluster mounts the shared disk to the node that was active when you brought the system down. It should also automatically restart the acsss services, and normal operation should resume.

Creating a Single Node Cluster

There may be occasions where ACSLS must continue operation from a standalone server environment on one node while the other node is being serviced. This would apply in situations of hardware maintenance, an operating system upgrade, or an upgrade to Solaris Cluster.

Use the following procedure to create a standalone ACSLS server.

  1. Reboot the desired node in non-cluster mode.

    # reboot -- -x
    

    To boot into non-cluster mode from the Open Boot Prom (OBP) on SPARC servers:

    ok boot -x
    

    On x86 servers, it is necessary to edit the GRUB boot menu:

    1. Power on the system.

    2. When the GRUB boot menu appears, press e (edit).

    3. From the submenu, using the arrow keys, select kernel /platform/i86pc/multiboot. When this is selected, press e.

    4. In edit mode, append -x to the multiboot option (kernel /platform/i86pc/multiboot -x) and press Return.

    5. With the multiboot -x option selected, press b to boot with that option.

  2. Once the boot cycle is complete, log in as root and import the ACSLS zpool.

    # zpool import acslspool
    

    If the disk resource remains tied to the other node, use the -f (force) option:

    # zpool import -f acslspool
    
  3. Bring up the acsss services.

    # su - acsss
    $ acsss enable
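Once the services are enabled, you can confirm the standalone configuration with standard commands, for example:

```
# zpool status acslspool
# su - acsss
$ acsss status
```

Here, zpool status verifies that acslspool is imported and healthy on this node, and acsss status reports the state of each acsss service.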