N1 Provisioning Server 3.1, Blades Edition, Troubleshooting Guide

Farm Activation and Transition Issues and Solutions

This section describes how to debug farm activation failures. If a farm fails to complete a request successfully, an error state is set on the farm. To determine the error state, run the command /opt/terraspring/sbin/farm -l. The second-to-last column in the command output indicates the error state of the farm.
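As a minimal sketch, the error-state column can be pulled out of the listing with awk. This assumes that the farm -l output is whitespace-separated with one farm per line, which this guide does not confirm. The function name second_to_last_column is hypothetical and is not part of the product.

```shell
# Print the second-to-last whitespace-separated field of every input line.
# Assumption: farm -l prints one farm per line in whitespace-separated columns.
# Any header line in the output is passed through the same filter.
second_to_last_column() {
    awk 'NF >= 2 { print $(NF-1) }'
}

# Usage on the control plane server:
#   /opt/terraspring/sbin/farm -l | second_to_last_column
```

Lines with fewer than two fields are skipped, so blank lines in the listing do not produce output.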

Two error states are set for informational purposes only:

You might encounter the following problems during farm activation or when a farm transitions from one state to another. To reissue a request to a farm that is in an error state, add the -f option to the farm command, for example farm -af farm-id. This option clears the farm error so that the request can be processed.

Problem:

When you run the /opt/terraspring/sbin/farm -l command, the farm state displays as NEW/NEW_CONFIG/20. This value indicates that your farm could not be allocated.

Solution:

Farm allocation fails if the farm requires more resources of a particular type than are available. To identify the resources that are preventing allocation of your farm, run the following command: /opt/terraspring/sbin/rsck farm-id. Retry farm activation after rsck reports that enough resources are available to allocate your farm.
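The check-then-retry sequence can be sketched as two small wrappers. The function names and the RSCK_CMD and FARM_CMD variables are hypothetical conveniences. The sketch also assumes that farm -af, the example given earlier in this section for reissuing a request with the error cleared, is the appropriate form for retrying activation.

```shell
# Command paths from this guide; override the variables to test with stubs.
RSCK_CMD=${RSCK_CMD:-/opt/terraspring/sbin/rsck}
FARM_CMD=${FARM_CMD:-/opt/terraspring/sbin/farm}

# Step 1: show what rsck reports for the farm. Read the report yourself;
# this sketch does not interpret the output or the exit status of rsck.
check_farm_resources() {
    "$RSCK_CMD" "$1"
}

# Step 2: once enough resources are reported, reissue the request.
# Assumption: -af is the reissue form of the activation request.
reactivate_farm() {
    "$FARM_CMD" -af "$1"
}

# Usage:
#   check_farm_resources farm-id
#   reactivate_farm farm-id
```

The two steps are kept separate on purpose so that an operator reviews the rsck report before the activation request is reissued.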

Problem:

During the dispatch phase of activation, all devices in the farm are powered on for the final boot. If the farm fails to dispatch, an error message appears in the Control Center window. Details about the error are in the debug log file /var/adm/tspr.debug.
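To narrow down the relevant entries in the debug log, a helper such as the following might be useful. It is a sketch only: the format of /var/adm/tspr.debug is not described in this guide, so the helper simply matches a fixed string that you supply (for example a farm ID, assuming the log mentions it). The function name and the DEBUG_LOG variable are hypothetical.

```shell
# Debug log path from this guide; override DEBUG_LOG to read a copy.
DEBUG_LOG=${DEBUG_LOG:-/var/adm/tspr.debug}

# Print the last COUNT lines (default 20) of the log that contain PATTERN.
# PATTERN is matched as a fixed string, not as a regular expression.
recent_debug_lines() {
    pattern=$1
    count=${2:-20}
    grep -F -- "$pattern" "$DEBUG_LOG" | tail -n "$count"
}

# Usage:
#   recent_debug_lines farm-id
#   recent_debug_lines farm-id 50
```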

Solution:

One of the following two solutions might apply:

Problem:

Farm update failure. If the farm fails to update, an error message appears in the Control Center window. Details about the error are in the debug log file /var/adm/tspr.debug.

Solution:

The farm update procedure is very similar to the activation procedure. The farm first transitions from the ACTIVE state to the UPDATE state, and then through the WIRED and DISPATCHED states back to the ACTIVE state. During the transition to the UPDATE state, the farm attempts to allocate newly requested resources. Debug failures during this transition in the same manner as allocation problems. To debug failures in the later transitions from UPDATE through WIRED and DISPATCHED to ACTIVE, refer to the two preceding problems.
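The update sequence described above can be written down as a small lookup, which may help when you compare the state shown by farm -l against the state you expect next. The function name next_update_state is hypothetical; only the state names and their order come from this guide.

```shell
# Given the current state of a farm that is being updated, print the state
# it is expected to enter next: ACTIVE -> UPDATE -> WIRED -> DISPATCHED -> ACTIVE.
next_update_state() {
    case $1 in
        ACTIVE)     echo UPDATE ;;
        UPDATE)     echo WIRED ;;
        WIRED)      echo DISPATCHED ;;
        DISPATCHED) echo ACTIVE ;;
        *)          echo "unknown state: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   next_update_state UPDATE
```

A farm that stops before reaching UPDATE points to an allocation problem, while a farm that stops at a later step points to the subsequent transitions.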

Problem:

In farm STANDBY state, scrubbing of removed disks fails.

Solution:

Setting up a device for scrubbing follows the same procedure as preparing a device for a disk copy. Refer to the third problem in Farm Wiring Issues and Solutions to debug this problem.

Problem:

In farm STANDBY state, you are unable to move a device into a VLAN.

Solution:

Refer to the first problem in Farm Wiring Issues and Solutions to debug this problem.

Problem:

In farm STANDBY state, you are unable to change the power state of a device.

Solution:

Refer to the debugging method discussed in the second problem in Farm Wiring Issues and Solutions.

Problem:

Snapshot of disk images fails. The completion of the snapshot request leaves the farm in an error state.

Solution:

A disk snapshot is prepared in much the same way as a disk copy to a server disk. Follow the instructions in the third problem in Farm Wiring Issues and Solutions to debug this problem.

In addition, the snapshot process attempts to reserve disk space for the snapshot image on the image server before taking the snapshot. If the image server is nearly full, the snapshot process might fail. In this case, remove old images from the server or back them up to a separate storage device. Use the image command to delete old images: /opt/terraspring/sbin/image -d image-id.
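When several old images need to be removed, the delete command can be wrapped in a loop. The following is a sketch, not a product tool: the function name, the IMAGE_CMD variable, and the DRY_RUN switch are all hypothetical. By default the sketch only prints the commands so that you can review the image IDs before anything is deleted.

```shell
# Command path from this guide; override IMAGE_CMD to test with a stub.
IMAGE_CMD=${IMAGE_CMD:-/opt/terraspring/sbin/image}

# Delete each image ID given as an argument.
# With DRY_RUN=1 (the default) the commands are printed, not run.
# With DRY_RUN=0 the loop stops at the first deletion that fails.
delete_images() {
    for image_id in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $IMAGE_CMD -d $image_id"
        else
            "$IMAGE_CMD" -d "$image_id" || return 1
        fi
    done
}

# Usage:
#   delete_images image-id-1 image-id-2            # preview only
#   DRY_RUN=0; delete_images image-id-1 image-id-2 # actually delete
```

Back up any images that you still need before running the loop with DRY_RUN=0.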

Problem:

Farm deactivation failed. The completion of the deactivation request leaves the farm in an error state.

Solution:

Farm deactivation follows the same steps as moving a farm into the STANDBY state, except that no snapshot images are created for removed devices. Follow the farm STANDBY debugging advice when troubleshooting farms that failed to deactivate.

Problem:

A replacement device could not be allocated for device failover support. The completion of the “replace failed device” request leaves the farm in an error state.

Solution:

To replace a failed device, a backup device must be available. If no devices remain in the free pool, the replacement request fails because a device cannot be allocated. Use the following command to verify that a replacement device is available: /opt/terraspring/sbin/device -LFr device-id.

Problem:

The replacement device could not be provisioned for device failover support. The completion of the “replace failed device” request leaves the farm in an error state.

Solution:

A replacement device will be provisioned in the same way newly allocated devices are provisioned during initial farm activation and farm updates. Refer to Farm Wiring Issues and Solutions to debug these problems.

Problem:

When executing the command /opt/terraspring/sbin/request -lf farm-id, requests show as queued but do not get processed.
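To see at a glance how many requests are still waiting, the request listing can be piped through a counter such as the one below. This is a sketch under a stated assumption: that each queued request appears on its own line and that the line contains the word "queued" in some letter case. The output format of request -lf is not documented in this guide, and the function name count_queued is hypothetical.

```shell
# Count input lines that contain the word "queued", ignoring case.
# Assumption: request -lf prints one request per line and marks queued
# requests with that word.
count_queued() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function from reporting that as a failure.
    grep -ci 'queued' || true
}

# Usage:
#   /opt/terraspring/sbin/request -lf farm-id | count_queued
```

Running the pipeline twice a few minutes apart shows whether the count is shrinking or whether the queue is stalled.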

Solution:

Verify the following items:

Problem:

You are unable to deactivate a farm that contains a faulty blade. Both farm deactivation and replace device requests fail.

Solution:

Type the following command to mark the faulty blade as FAILED: device -sB blade-id. Then reissue the deactivation request.