N1 Provisioning Server 3.1, Blades Edition, System Administration Guide

Troubleshooting Farm Operations

The following table describes N1 Provisioning Server troubleshooting issues related to farm operations. This list is not exhaustive. The message log file indicates whether your problem relates to farm operations.

Table 7–2 Troubleshooting Farm Operations


Problem	Possible Cause	Action
Configuration exception	Invalid Farm Markup Language (FML) for the farm.	Contact http://sun.com/service/contacting/index.html for assistance.
Dynamic Host Configuration Protocol (DHCP) exception	Cannot create DHCP configuration for the farm.	This problem can occur if the host name of the control plane server cannot be determined. Make sure the server has the proper network configuration.
Domain Name Service (DNS) exception	Cannot create the DNS configuration for the farm.	This problem can occur if the host name of the control plane server cannot be determined. Make sure the server has the proper network configuration. This problem also can occur if the database connectivity is lost. To verify that the control plane database (CPDB) is running and is accepting connections, run any command, such as `farm -l`. If no proper output displays, the CPDB is not running.
`NoMoreResources` exception	Not enough resources are available to allocate the farm.	Check the message to see which resource is exhausted. Use one of the following methods to provision more resources: add resources and add their information to the database; free up resources by deactivating existing farms; or flex down a farm to free up the required devices. Then restart the farm operation.
I/O exception	An I/O error occurred while performing a task such as manipulating disks, configuring monitoring, or accessing files.	Depending on the specific place where the exception is thrown, take appropriate action that should be part of the message with the exception, such as checking network connections.
SQL exception	A database error occurred.	This problem can occur if the database connectivity is lost. To verify that the CPDB is running and is accepting connections, run any command, such as `farm -l`. If no proper output displays, the CPDB is not running. Report all other database errors to http://sun.com/service/contacting/index.html.
Exceptions other than SQL, such as `IllegalState`, `IllegalArgument`, `IllegalAccess`, and so on	Internal error.	Contact http://sun.com/service/contacting/index.html for assistance.
Farm activation fails.	Check the log file for any critical errors that might point to the cause.	Depending on what type of error message was received, take appropriate action to activate the farm.
Farm activation fails during allocation because not enough subnets are available	The subnet size might be too large.	You have the option to choose the size of the external subnet for the farm. If you select a size that is not currently in the database, you need to add a subnet by running the `subnet` command.
Farm deployment fails because `named` could not restart	The JVM™ software caches the configuration of the `nsswitch.conf` file, which describes which database to use for host lookups. If DNS was not part of the `nsswitch.conf` file's host entry at the time the segment manager was started, all host lookups that cannot be resolved using the `/etc/hosts` file will fail. See the `tspr.debug` log for a detailed message describing the error.	Ensure that the entry for hosts lookup in the `/etc/nsswitch.conf` file reads as follows: `hosts: files dns`. Restart the segment manager by running the command `/opt/terraspring/sbin/sm -start`. Then reactivate the farm.
Server is booting, but is unable to get its IP address through DHCP.	The DHCP daemon might not be running or the media access control (MAC) address of the server is incorrect.	Ensure that the details listed in the database are correct for that server. Run the `device -lv device-ID` command and check the MAC addresses and switch ports. Verify that the DHCP daemon is running and is answering requests by first running `ps -ef | grep dhcp`, then looking in the `tspr.debug` file to see whether DHCP messages are logged. Connect to the switch and ensure that those ports are connected. Depending on the switch, run either the `sh cam dyn port` or `sh.mac dyn` command to ensure that the correct MAC address of the server appears. Check that the Ethernet interface on the control plane server appears as connected on the switch and that it is running as an 802.1q trunk port, with a native virtual local area network (VLAN) of `1`.
The control plane server cannot create the DHCP configuration for the farm and receives a message indicating an unknown host.	The network configuration of the control plane server might be incorrect.	Check the network configuration of the control plane server. Check database connectivity and the file system that contains the DNS configuration.
The control plane server cannot create the DNS configuration for the farm and receives a message indicating an unknown host.	The network configuration of the control plane server might be incorrect.	Check the network configuration of the control plane server. Check database connectivity and the file system that contains the DNS configuration.
The Control Center does not display a farm correctly after it has been updated.	Two possible causes: the update request has not yet completed in the CPDB, which may take a few minutes; or the update request has failed.	Reconfigure the farm in the Editor dialog and resubmit the update request. Reissue the `farm -af farm ID` command from the command line. Then submit another farm update request from the Control Center Editor dialog.
Replace device requests are generated intermittently even though the devices are running and able to `ping` successfully.	The Ethernet port speeds might not be set to the correct value.	Ensure that the Ethernet port speeds and duplex setting are set to the same values on all sides (control plane server, switch, and device).
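
The `hosts` entry check described for the `named` restart failure can be scripted. The following is a minimal sketch in Python, assuming the usual `database: source ...` line format of `nsswitch.conf`. The function names `hosts_sources` and `hosts_entry_has_dns` are hypothetical and are not part of N1 Provisioning Server.

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources listed on the 'hosts:' line, or [] if none is found."""
    for raw_line in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def hosts_entry_has_dns(nsswitch_text):
    """Return True if 'dns' is one of the sources on the 'hosts:' line."""
    return "dns" in hosts_sources(nsswitch_text)


sample = "passwd: files\nhosts: files dns\n"
print(hosts_sources(sample))        # ['files', 'dns']
print(hosts_entry_has_dns(sample))  # True
```

To check a real system, pass the contents of `/etc/nsswitch.conf` to `hosts_entry_has_dns`. If the function returns `False`, correct the entry and restart the segment manager as described in the table, because the JVM software caches the earlier configuration.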

Farm Error Status Codes

Every farm has an associated error status code that indicates whether the farm is currently in an abnormal state.

The error status code of 0 represents a healthy state.
The error status code of 1000 means that the farm manager is processing a request.
An error status code other than 0 or 1000 means that the farm has an error.

During the request process, the farm's internal state changes whenever a transition completes successfully. If the farm fails to transition from one internal state to another, the farm's internal state is not changed and the farm is set with an error status. The value of the error status code is the value of the internal state that the farm failed to reach.

For example, if the farm failed the transition from state ALLOCATED (20) to state WIRED (30), the farm is still left with the internal state ALLOCATED (20) and the farm error status code is set to 30 to represent the failed state WIRED. Just before the code is set to 30, it is 1000 to indicate that the request is in progress.
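
The rules above can be sketched as a small Python helper. This is an illustration only, not the farm manager's implementation. The function names are hypothetical, and the state table contains only the two states named in this section (ALLOCATED and WIRED); other internal state values are not listed here.

```python
HEALTHY = 0         # error status code of a healthy farm
IN_PROGRESS = 1000  # the farm manager is processing a request

# Only the states named in the text above; the full state table is not given here.
KNOWN_STATES = {20: "ALLOCATED", 30: "WIRED"}


def describe_error_status(code):
    """Translate a farm error status code into a short description."""
    if code == HEALTHY:
        return "healthy"
    if code == IN_PROGRESS:
        return "request in progress"
    name = KNOWN_STATES.get(code, "unknown state")
    return f"error: transition to {name} ({code}) failed"


def status_after_failed_transition(current_state, target_state):
    """Return (internal_state, error_status) after a failed transition.

    The internal state is unchanged, and the error status code takes the
    value of the state that the farm failed to reach.
    """
    return current_state, target_state


state, error_status = status_after_failed_transition(20, 30)
print(state, error_status)                  # 20 30
print(describe_error_status(error_status))  # error: transition to WIRED (30) failed
```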

Whenever a farm error occurs, a critical error message is generated in the system log file `/var/adm/messages`.

Caution –

The farm manager will not process any further farm requests until the error condition is resolved and the error status code is cleared to 0.