As with any complex system, errors can occur when a farm transitions from one state to another. You must be able to remedy these errors quickly. Use the following general strategy to resolve an error state:
Determine that the farm request failed.
Diagnose the problem by determining the error state.
Fix the problem, for example, by replacing a failed server, freeing farm resources, or resolving a networking issue. Then run the farm -af command to activate the farm.
Alternatively, bypass the problem, for example, by deleting the request to return the farm to its prior condition, or by deleting the farm and starting over.
Every device in a logical server farm is continuously monitored for availability, and the monitoring facility raises an alert when a device fails. The N1 Provisioning Server software then automatically brings up another identically configured physical device to replace the failed device. In these cases, failover is expected behavior and no error message is generated.
Most error states can be diagnosed and resolved by the administrator. However, in some rare cases, error states must be resolved by a Sun Service provider.
At a high level, failures fall into three types: resource-layer failures (device and networking failures, configuration errors, or insufficient available resources), software configuration errors, and software or control plane errors. The following list describes potential failure points in farm activation:
Not enough free resources are available to complete the action
Provisionable equipment server (PES) configuration issues
Network problems
Wiring problems
Other points of failure exist. Given the variety of devices and systems involved, there are many possible failure points to investigate. However, you know you have a problem if any of the following situations occur:
The Control Center shows a failed status in the Message section of the Farm Request dialog of the Administration screen.
The Control Center shows a failed request in the Farm Details section of the Main and Editor screens.
When you run the farm -l farm_ID command, the farm ERROR value is a nonzero number other than 1000, and the farm is not in the desired state.
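The farm -l check above combines two conditions, which the following minimal Python sketch expresses as a single predicate. It assumes you have already read the farm ERROR value, the current state, and the desired state from the farm -l farm_ID output by some means; the FarmStatus class, the request_failed function, and the state strings used here are hypothetical placeholders, not part of the N1 Provisioning Server software, and the sketch does not parse the real command output.

```python
from dataclasses import dataclass


@dataclass
class FarmStatus:
    """Hypothetical holder for values read from `farm -l farm_ID`."""
    farm_id: int
    error: int          # the farm ERROR value
    state: str          # current farm state (placeholder strings)
    desired_state: str  # state the farm request was meant to reach


# Per the documented rule, an ERROR of 0 or 1000 does not by itself signal failure.
NON_FAILURE_CODES = {0, 1000}


def request_failed(status: FarmStatus) -> bool:
    """Return True when ERROR is nonzero and not 1000 AND the farm
    is not in its desired state."""
    return (status.error not in NON_FAILURE_CODES
            and status.state != status.desired_state)


if __name__ == "__main__":
    status = FarmStatus(farm_id=12, error=7, state="new", desired_state="active")
    print(request_failed(status))
```

Both conditions must hold: a farm that reports a nonzero ERROR but has nevertheless reached its desired state is not flagged by this rule.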