The following section provides information about issues that you might encounter when you are validating the installation of your N1 Provisioning Server 3.1, Blades Edition software.
Problem: A resource pool server did not pass the final validation test. The installation log file (/var/opt/terraspring/install/run/install.log) contains the specific error message. The installer displays the following message, along with a series of options from which you can choose:
Installation may have failed due to incorrect user input or some other correctable error.
During the final validation test, the installer attempts to boot all available resource pool server devices. During this process, the resource pool server will boot on the image provisioning network. This process relies on correctly configured data and control layer switches as well as proper configuration of the Boot Loader Image, DHCP, and BIND.
If a resource pool server fails the validation test, you should verify the following items:
Verify that the SNMP public community string on the chassis switch matches the admin password.
Verify that VLAN 8 is defined on all chassis switches and the data layer switch. Also ensure that VLAN 8 is added to the list of allowed VLANs on all trunked ports.
Verify that the Boot Loader Image has been set up properly.
Use the following command to verify that the Provisioning Server has an interface configured on the image VLAN (skge800 when using Syskonnect; ce8000 when using GigaSwift).
/sbin/ifconfig -a
Use the following command to verify that traffic on the image subnet interface on the Provisioning Server is not blocked by IPF.
/usr/sbin/ipfstat -io
To further debug this problem, you might need to look at the traffic on the image subnet interface and monitor the resource pool server device during boot up on the console port. Use the snoop utility.
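The interface check in the list above can be scripted. The following is a minimal sketch and not part of the product. The function name has_interface is hypothetical, and the sketch assumes that ifconfig -a starts each interface entry with the interface name followed by a colon, as Solaris does.

```shell
# Hypothetical helper: succeed if the named interface appears in
# `ifconfig -a` output supplied on standard input.
# Assumption: each interface entry starts with "<name>:" at the
# beginning of a line.
has_interface() {
    grep "^$1:" >/dev/null
}

# Example use on the Provisioning Server (GigaSwift naming assumed):
#   /sbin/ifconfig -a | has_interface ce8000 || echo "no image VLAN interface"
```

The output is discarded with a redirect rather than a quiet option so that the helper also works with older grep implementations.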
Problem: During installation, the final validation test (pestest) is taking too long to run, and you do not want to wait for it to finish.
Solution: You can stop the test by pressing Control-C or by sending a kill signal. Stopping the test will not harm anything. However, the blades will not have been validated for farm use. If any of the blades has a problem, farm activation could fail later. You should let the test finish so that it can detect any problems with the blades, such as defective hardware.
If you stop the test, the pestest tool restores the state of each blade to the state that the blade was in before the pestest was run. For example, a blade might have an initial state of FREE. During the validation test, the blade might acquire the USED state. However, if you were to kill or cancel the test before the test finishes, pestest would set the blade back to FREE before exiting.
Problem: You think that the validation test (pestest) failed, but you are not certain because you do not recognize the failure message.
Solution: When the validation test fails, messages similar to the following are displayed on your screen:
50306: test FAILED: Reason was: - Cannot save state information for 50306: Blade S6 seems to be faulty
50111: test FAILED: Reason was: - PES 50111 did not become active in 120 seconds
Warning: 1 Blade(s) timed out and did not complete the test.
Some Blades (1) in your I-Fabric have failed the validation test and are not usable by the N1 Provisioning Server. This may probably be due to some configuration issues in the I-Fabric. Please diagnose and fix the problem before using these Blades.
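If you save the pestest output to a file, you can list the failed device IDs with a small filter. This is only a sketch: the function name failed_devices is hypothetical, and it assumes that each failure is reported on a line that begins with the numeric device ID followed by ": test FAILED:", as in the sample messages.

```shell
# Hypothetical helper: read saved pestest output on standard input and
# print the device ID of every failed blade, one per line, no duplicates.
# Assumption: a failure line begins with "<device-id>: test FAILED:".
failed_devices() {
    sed -n 's/^\([0-9][0-9]*\): test FAILED:.*/\1/p' | sort -u
}

# Example use (pestest.out is a hypothetical file name):
#   failed_devices < pestest.out
```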
Problem: During the validation test (pestest), the following message is displayed for some blades:
device-id: test FAILED: Reason was: - Cannot save state information for device-id: Blade Sn seems to be faulty
Solution: This blade is defective and needs to be replaced as soon as possible. Follow these steps:
Type the following command to see the properties of the blade:
# /opt/terraspring/sbin/device -l device-id
Where device-id is the device ID shown in the error message.
Examine the FARM_ID column.
If the FARM_ID column does not contain a hyphen (-), the blade is part of a farm.
If the blade is part of a farm, type the following command to replace the failed blade in the farm with another blade that has similar attributes:
# /opt/terraspring/sbin/replacedevice farm-id failed-device-id
To find the shelf ID and IP address for this blade, use the console command.
# /opt/terraspring/sbin/console failed-device-id
In the following example, s2 is the shelf ID and 10.5.141.50 is the IP address.
# console 50102
Console Information
====================
IP address of Terminal-Server(Service Controller): 10.5.141.50
Port(Blade) ID: s2
#
Telnet to the shelf and type the following command to inform the shelf controller that the blade is to be prepared for removal:
# replace fru Sn
Where Sn is the shelf ID from the previous step.
In response to this command, a blue LED will light on the blade that is to be removed.
At the front panel of the blade shelf, remove the defective blade, which is the blade with the lit blue LED.
Insert a good blade into the blade shelf to replace the defective blade.
To detect the new blade and update the information in the database, type the following command:
# /opt/terraspring/sbin/shelfsync
To retest the blades, type the following command:
# /opt/terraspring/sbin/pestest
If you do not replace a defective blade, you must mark that blade as FAILED. Otherwise, later farm activation could fail if that defective blade is used in the farm. Use the following command:
# /opt/terraspring/sbin/device -sB device-id
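In the replacement procedure above, the shelf controller address and the shelf ID come from the console command output. The following sketch extracts both values. The function names console_ip and console_shelf_id are hypothetical, and the sketch assumes that the output labels match the sample console output in this section exactly.

```shell
# Hypothetical helpers: read `console <device-id>` output on standard
# input and print one field from it.
# Assumption: the labels are exactly those in the sample console output.

# Print the IP address of the terminal server (service controller).
console_ip() {
    sed -n 's/^IP address of Terminal-Server(Service Controller): *//p'
}

# Print the shelf ID (the "Port(Blade) ID" field) of the blade.
console_shelf_id() {
    sed -n 's/^Port(Blade) ID: *//p'
}

# Example use:
#   /opt/terraspring/sbin/console 50102 | console_shelf_id
```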
Problem: During the validation test (pestest), the following message is displayed for some blades:
device-id: test FAILED: Reason was: - PES device-id did not become active in 120 seconds
Solution: This generic message indicates that the blade was unable to boot in the time allowed. If only some of the blades fail with this message, the cause could be defective hardware, network congestion, or network interference. Try to retest specific blades by using the following command:
# /opt/terraspring/sbin/pestest -d device-id
If those blades still fail with the same message after several retests, the cause is likely to be defective hardware or network interference from another machine on the network. Before you create farms, you must either troubleshoot and fix the problem or mark these blades as FAILED so that they are not used in a farm. To mark a blade as FAILED, use the following command:
# /opt/terraspring/sbin/device -sB device-id
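The retest of a single blade can be repeated in a loop. The following sketch is not part of the product. The function name retest_blade and the PESTEST variable are hypothetical. Because the exit status of pestest is not described here, the sketch judges each run only by whether its output contains the text "test FAILED".

```shell
# Hypothetical helper: retest one blade up to a given number of times.
# Returns 0 as soon as one run prints no "test FAILED" line,
# 1 if every run fails, and 2 if the pestest command cannot be found.
# PESTEST can be overridden, for example to try the logic with a stub.
PESTEST=${PESTEST:-/opt/terraspring/sbin/pestest}

retest_blade() {
    device_id=$1
    attempts=${2:-3}      # default of 3 attempts is an arbitrary choice
    command -v "$PESTEST" >/dev/null 2>&1 || return 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if ! "$PESTEST" -d "$device_id" 2>&1 | grep "test FAILED" >/dev/null; then
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}
```

For example, retest_blade 50111 3 retests blade 50111 up to three times. If the function returns 1, you can then decide whether to troubleshoot further or to mark the blade as FAILED with the device -sB command.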
Problem: During the validation test (pestest), the following message is displayed for all blades:
###: test FAILED: Reason was: - PES ### did not become active in 120 seconds
Solution: If all blades fail the validation test, pestest prints a generic message that tells you to diagnose and fix the problem before the blades can be used. This issue has three likely causes and solutions:
The network switch on the data plane is not configured properly. Follow these steps to check the configuration:
Verify that all the switch ports that connect the data plane switch to the blade shelves are set to “trunk.”
Verify that the image VLAN is created as VLAN 8.
Verify that the switch port that connects the N1 Provisioning Server machine to the data plane switch is set to VLAN 8.
For more information about configuring the switches, see Chapter 3, "N1 Provisioning Server System and Network Preparation," in the N1 Provisioning Server 3.1, Blades Edition, Installation Guide.
A network problem prevents the blades from communicating with the N1 Provisioning Server machine. To resolve this issue, verify that no other server on the same network is configured as a DHCP server. Another DHCP server could be sending DHCP NAK messages to the blades, which prevents them from acquiring an IP address from the N1 Provisioning Server machine.
Hardware connections are loose or otherwise not correct. Verify that all cables are properly connected.