For a detailed description of farm operation failure scenarios, refer to Troubleshooting Problems with Farm Operations in N1 Provisioning Server 3.1, Blades Edition, System Administration Guide.
When a farm operation succeeds:
The Control Center shows a completed status in the Message section of the Farm Request dialog of the Administration screen.
The farm -l farm_ID command shows an ERROR of 0, and the farm state reflects the desired state for that operation.
When a farm operation fails:
The Control Center shows a failed status in the Message section of the Farm Request dialog of the Administration screen.
The farm ERROR is a nonzero number (other than 1000) and the farm is not in the desired state. An ERROR of 1000 is not an error; it means that a farm operation is in progress.
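The ERROR rules above can be summarized in a few lines of code. The following is a minimal sketch, not part of the product: the function name classify_farm_error is hypothetical, and it assumes the ERROR value has already been read from the farm -l farm_ID output as an integer (the output format is not described here, so no parsing is attempted).

```python
def classify_farm_error(error_code: int) -> str:
    """Interpret a farm ERROR value using the rules described above.

    Hypothetical helper: the caller is assumed to have extracted the
    integer ERROR value from the `farm -l farm_ID` output already.
    """
    if error_code == 0:
        # ERROR 0: the operation succeeded.
        return "succeeded"
    if error_code == 1000:
        # ERROR 1000 is not an error; an operation is still in progress.
        return "in progress"
    # Any other nonzero value indicates a failed operation.
    return "failed"
```

A result of "succeeded" should still be confirmed against the farm state, because success also requires that the farm is in the desired state for the operation.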
Run the farm -Lt farm_ID command to extract messages related to the specified farm from the log files.
If the farm has been assigned to an SP (as shown by the farm -l farm_ID command), look at the /var/adm/messages file and the /var/adm/tspr.debug file on the owning SP for any error messages for the farm.
Check the /var/adm/messages file and the /var/adm/tspr.debug file on the SP running the Master Segment Manager for any critical error messages for the farm.
The following example shows how a message appears in the log:
Oct 30 00:16:47 sp4 java[506]: [ID 289794 user.info] TSPR [sev=okay] [apps=770034] TCPEventHandler:dispatch...
See Chapter 6, Error Messages in N1 Provisioning Server 3.1, Blades Edition, System Administration Guide.
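When many log lines must be reviewed, it can help to split each message into its fields. The following sketch parses a line shaped like the example above. The pattern is inferred from that single sample only, so other messages might not match it. The names LOG_PATTERN and parse_tspr_line are hypothetical, and the field names (for example, apps) simply mirror the labels in the sample without interpreting them.

```python
import re

# Pattern inferred from the single sample line shown above (an assumption;
# real log files may contain lines in other shapes).
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"\[ID (?P<msg_id>\d+) (?P<facility>\w+)\.(?P<level>\w+)\] "
    r"TSPR \[sev=(?P<sev>[^\]]+)\] "
    r"\[apps=(?P<apps>[^\]]+)\] "
    r"(?P<text>.*)$"
)


def parse_tspr_line(line):
    """Return a dict of fields for a TSPR log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# The sample message from the text, including its trailing ellipsis.
sample = (
    "Oct 30 00:16:47 sp4 java[506]: [ID 289794 user.info] "
    "TSPR [sev=okay] [apps=770034] TCPEventHandler:dispatch..."
)
fields = parse_tspr_line(sample)
```

For the sample line, fields["host"] is "sp4" and fields["sev"] is "okay". Lines that do not follow this shape return None, so they can be skipped or inspected by hand.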
Use the following tools to help pinpoint the problem:
Monitor the farm activation process through the Control Center Farm Requests dialog of the Administration screen. During the activation process, a message reports when a device is added successfully to the farm. See if you can identify a device that failed.
Use the terminal server, or the serial port of the device if the terminal server is not available, as a console to connect to a specific device and obtain diagnostic information. Until the farm device is activated, the only way to connect to the device is through the console connection.
After you determine the cause of the error and take any necessary corrective actions (for example, replacing a failed server, freeing farm resources, or resolving networking issues), you can rerun the farm operation. Use the -f option to clear the error. For example, if a farm activation failed, run the farm -af farm_ID command.
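If retries are scripted, the -f variants can be assembled programmatically. The following sketch only builds the argument list and does not run anything. The helper name farm_retry_command is hypothetical, and it accepts only the option letters that this section shows combined with -f (a, p, and d). Whether other farm options accept -f is not stated here, so they are rejected.

```python
from typing import List

# Option letters that this section shows combined with -f:
# -af (activate), -pf (clear error state), -df (deactivate).
FORCEABLE_OPTIONS = {"a", "p", "d"}


def farm_retry_command(option: str, farm_id: str) -> List[str]:
    """Build the argv for rerunning a farm operation with the -f option.

    Hypothetical helper: it only constructs the argument list; running it
    requires a host where the farm command is installed.
    """
    if option not in FORCEABLE_OPTIONS:
        raise ValueError("unsupported option for -f: %r" % option)
    if not farm_id:
        raise ValueError("farm_id must not be empty")
    return ["farm", "-" + option + "f", farm_id]
```

For example, farm_retry_command("a", "farm_ID") returns the argument list for farm -af farm_ID, which could then be passed to subprocess.run on a host where the farm command is available.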
Inadequate Resources
If you have determined that the cause of the error is inadequate resources, and you cannot free resources to fix the problem, perform the following steps:
Run the farm -pf farm_ID command to clear the error state. This command clears the internal state. However, this change is not reflected in the Control Center.
Open the farm in the Control Center Editor, and select the last “good” farm configuration from Farm Details on the left-hand side of the screen.
Make any changes necessary to this version of the farm in the Editor and click Commit.
Abandon Request and Start Over
You might decide to abandon the farm and deactivate it by using the farm -df farm_ID command. This command clears the farm resources and brings the farm to the deactivated state. You can then delete the farm by using the farm -D farm_ID command. You can then save the farm under a different name by using the Save As option in the File menu, and activate the saved farm.
The Control Center reflects the current farm status because it is automatically synchronized with the control plane.