This chapter describes known issues related to running the N1 Provisioning Server 3.1, Blades Edition product.
This section describes known issues related to the N1 Provisioning Server Control Center.
In certain cases, when you navigate from the Administration screen to the Editor screen and the session times out, you might not be able to log into the Control Center.
Workaround: Close the browser, open another browser, and log in again.
If you are in the Administration screen and the session times out, when you log in again, the window might display incorrectly. If this happens, close the browser and log in again.
To vary the session time-out, modify the time-out setting in the /opt/terraspring/gw/war/WEB-INF/web.xml file. Look for the session-timeout tag. The value is specified in minutes.
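Before you edit the file, it can help to confirm the current value. The following is a minimal sketch, assuming the session-timeout element sits on a single line of web.xml, which is the usual layout. The helper name get_session_timeout is hypothetical and is not part of the product.

```shell
# Print the numeric value of the first <session-timeout> element in a
# web.xml file. Assumption: the opening tag, value, and closing tag are
# all on one line.
get_session_timeout() {
    sed -n 's:.*<session-timeout>[[:space:]]*\([0-9][0-9]*\)[[:space:]]*</session-timeout>.*:\1:p' "$1" | head -n 1
}

# Usage (value is in minutes):
#   get_session_timeout /opt/terraspring/gw/war/WEB-INF/web.xml
```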
If you click the Find button in the Find Farm dialog box after the Find Farm session has timed out, the Find Farm window displays in full.
Workaround: Increase the time-out value. To do so, modify the time-out setting in the web.xml file in the /opt/terraspring/gw/war/WEB-INF directory. Look for the session-timeout tag. The value is specified in minutes.
When a farm update fails, changes do not always roll back to the last good state of the farm.
Workaround: Follow these steps:
To clear the error state, contact the farm using the farm -pf farm-ID command.
View the farm in the farm editor.
From the farm editor, select the latest farm update request from the Request History panel on the left. The correct farm information will display.
Choose Commit from the File menu to resubmit the farm update. A newly updated request is issued.
Unblock the new request. The correct farm information will display in the farm editor after the request is completed.
Do not bookmark any page except the welcome page.
In some dialog pages, for example, Select: Disk Image, the application needs to download information from the server. If you click any buttons on the page before that download is complete, a script error occurs.
Workaround: Wait until the page has finished downloading all information from the server before taking any actions.
When you create a new disk by flexing a server group, the snapshot button is not disabled as it should be. When you click that button, an error message appears.
When you delete snapshots and images from the Control Center, they are only marked as deleted. They are not yet deleted from the I-Fabric. Until you delete them from the I-Fabric, you will not be able to create snapshots and images with the same names as the ones marked as deleted.
To purge snapshots and images from the I-Fabric, issue the image -lR command from the control plane server to view a list of the images marked as deleted. Then issue the image -d command to delete them from the I-Fabric. See the image command man page for details.
The bill of materials (BOM) dialog box shows information about devices managed by the N1 Provisioning Server and about devices of a known type that are not managed. The BOM dialog box does not include information about unmanaged devices of an unknown type.
Sometimes when you reload a current farm, the graphics in the navigation bar do not display correctly.
Workaround: Refresh the page.
The Control Center requires the Microsoft™ Internet Explorer version 6.0 web browser with 128-bit encryption.
Initially, a device with only a single connection was requested and allocated. Later a second connection to the device was requested, but the request could not be granted because the allocated device only has one connection. No new device is allocated to fulfill this request.
Workaround 1: Always connect both physical interfaces to subnets even if you do not need the two interfaces. Connecting both interfaces guarantees that a device with two connected interfaces is allocated to the farm. The secondary interface can then be reconfigured at a later time once a purpose has been identified.
Workaround 2: When a farm update fails due to the lack of resources after connecting a secondary interface of a server, bring the farm into active state by changing the farm design back to its previous design and resubmitting a farm update request. When the farm is in active state, snapshot the device to which you would like to add a second interface. Once the snapshot is completed, update the farm by removing the original server to which you would like to add a second interface and replace the deleted server with a new server and initialize its disk from the snapshot image. Connect both physical interfaces of the new device to a subnet and submit the farm update request. A new server will be allocated to the farm with two connected interfaces.
Workaround 3: Add a second switch to the chassis that hosts the device requiring the second connection. Update the device using the shelfsync command. Then resubmit the farm update request.
Workaround 4: Instead of using the secondary interface of the server, update the farm using a virtual interface on the primary interface. This workaround will force the bandwidth of the primary interface to be shared across the eth0 and virtual interfaces.
An imported farm can have the appropriate device types, but the implied number of connections might be incorrect.
Workaround: When importing a farm design from a .feml file, make sure that the device types match those in the current I-Fabric. In particular, make sure that the number of ports is correct for each device. If the number of ports for a device is not correct, change the device type within the device configuration dialog box before you submit the new design.
The following message appears when you try to unblock a farm request from the Pending request page:
Operation failed... may have been caused by ifabric misconfiguration
Workaround: Run the request again. After a few tries, it will succeed.
This section describes known issues that are associated with administering the N1 Provisioning Server 3.1, Blades Edition server.
When a farm goes into standby mode, disk copy operations for all servers belonging to that farm start simultaneously. However, when a failure occurs during the first disk copy operation, for example, not enough disk space, the other disk copy operations continue until all are completed, and then the failure is reported.
Workaround: None.
When no more provisionable servers are available in the Control Plane database, you will not be able to move a server from one server group to another in a single farm operation.
Workaround: Remove the server from the first server group and update the farm. Then add the server to the second server group and update the farm.
The server might be provisioned by another farm between removing it from the first server group and adding it to the second. Consequently a “no more resources” exception will occur.
An mls -a command might inaccurately report the agent on a server as being marked DOWN.
Workaround: Wait 60 seconds and try again to confirm the status of the node. The normal monitoring of the node by the control plane server is not affected by this condition, and monitoring will accurately report a failed node.
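The wait-and-recheck advice can be sketched as a small helper. This is an illustration only: the function name confirm_down is hypothetical, and it assumes that the status command prints the word DOWN for an affected agent, as the description above suggests. The exact output format of mls -a is not shown in this document.

```shell
# confirm_down DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND twice, DELAY_SECONDS apart. Succeed (exit 0) only if the
# output contains DOWN on both runs, so a transient report is ignored.
confirm_down() {
    delay=$1
    shift
    "$@" | grep -q 'DOWN' || return 1
    sleep "$delay"
    "$@" | grep -q 'DOWN'
}

# Usage:
#   confirm_down 60 mls -a && echo "agent still reported DOWN"
```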
Typically, after bench configuration, the ports on the shelf are in trunk mode. This configuration prevents you from moving an unmanaged device into a farm VLAN.
Workaround: Change the unmanaged device port from trunk mode to hybrid mode. Then, add the unmanaged device to the VLAN.
Make sure that there are no active requests in the system before stopping the control plane server. If the control plane server fails, existing Farm Manager processes sometimes do not exit gracefully. Before restarting the control plane server, stop all remaining Farm Manager processes by performing the following steps:
Check the existence of any Farm Manager processes by executing the following command:
/usr/ucb/ps -auxwww | grep -i "com.terraspring.cs.fm"
Use the UNIX™ kill command to stop any remaining Farm Manager processes.
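The two steps above can be combined so that the process IDs are extracted for you. This is a sketch rather than a product tool: the function name farm_manager_pids is hypothetical, and it assumes the BSD-style ps layout in which the process ID is the second column.

```shell
# Read ps -auxwww style output on stdin and print the PID (second column)
# of every line that mentions the Farm Manager class prefix. The match is
# case-insensitive, and the grep process itself is skipped.
farm_manager_pids() {
    awk 'tolower($0) ~ /com\.terraspring\.cs\.fm/ && $0 !~ /grep/ { print $2 }'
}

# Usage (review the list before killing anything):
#   /usr/ucb/ps -auxwww | farm_manager_pids
#   /usr/ucb/ps -auxwww | farm_manager_pids | xargs kill
```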
The power command with the -off option is similar to the UNIX command poweroff, which powers off the device on which it is issued. When using the power command with the -off option, be sure to have a blank space between power and -off, otherwise you will power off the control plane server.
Do not independently power on or off provisionable servers in an existing farm.
The gigabit Ethernet card of the Provisioning Server machine must be assigned an instance of 0.
If you change the balancing policy of an active farm (for example, from round-robin to wt-round-robin), the farm goes through the update process. However, after the process is completed, the load balancer configuration still shows the initial policy (for example, round-robin).
Workaround: If you have already defined virtual IPs and you want to change their policies, follow these steps:
Delete the virtual IPs whose policy you want to change, and submit the request.
Change the policy to the newly desired policy.
Re-create the virtual IPs, and submit the request.
If a load balancer is in a chassis that has only one switch (ssc0) installed, the software still expects to configure the second switch (eth1).
Workaround: For this release, load balancing is supported only in dual-switch shelves.
If you try to update the locator URL of an existing snapshot image using the command image -u -l, you see an error message. The error message differs based on whether the database is Oracle or PostgreSQL.
For Oracle, you see the following message:
Locator URL 'nfs://3001//images/master-images/solaris9u5-i86pc-flash' already exists!
For PostgreSQL, you see the following message:
ERROR: duplicate key violates unique constraint "imglocator_unique"
Workaround: None. This is a cosmetic bug and will likely be fixed in a later release.
You defined a device with a specific type in an active server farm (for example, an x86 farm with the name “Server1”). You want to change the type (for example, to sparc), but keep the name. If you fail to commit the delete request before you add the new device, the add request fails with a unique constraint exception, stating that there is another device in the farm that has the same name.
Workaround: You have to do the removal and addition in two separate updates. Follow these steps:
Activate a one-server x86 farm from the Control Center (CC). For example, with the name "Server1" with eth0 connected to a subnet.
After the farm is activated, log onto the CC and delete the x86 server.
Submit the farm to commit this change.
Add a SPARC Solaris server that has the same name ("Server1").
Commit the change (send the update request) from the CC.
When using the PostgreSQL database, backupdb generates invalid backup data. As a result, restoredb fails because of the invalid data.
Workaround: None.
The image command allows you to delete an image (image -d) or modify an image (image -u) even if the image is in use. However, synchronizing this change with the Control Center will fail if the image is in use.
Workaround 1: Use the Control Center to delete or modify an image.
Workaround 2: Verify that the image is not in use before you use the image command. To verify whether an image is in use, follow these steps:
Type the following command to get a list of active farms: farm -l
For each farm that is in a state other than CREATED, type the following command to determine what images it uses: lr -lv fmid | grep Image
where fmid is the farm identifier provided in step 1.
For each farm that is in the CREATED state, type the following command to obtain its FML and save it to a temporary file: farm -lv fmid > /tmp/fmlfmid
Look through the temporary file and search for all occurrences of the string <diskimage. An image ID will be on the next line as shown below:
<disk id="10" location="internal" name="Disk B" size="30000000000" type="local">
<diskimage type="system">
6
</diskimage>
<client-info id="11" object-id="10">
The list of unique image IDs from steps 2 and 3 are the images that are currently in use.
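Searching the saved FML files by hand can be tedious when there are many farms. The following sketch automates the search under one assumption: the image ID appears alone on the line that follows each <diskimage tag, as in the excerpt above. The function name fml_image_ids is hypothetical.

```shell
# Print the unique image IDs found in one or more FML files (or stdin).
# For every line containing "<diskimage", the next line is read and
# stripped of everything except digits. An ID that shares a line with
# the tag itself is not detected.
fml_image_ids() {
    awk '/<diskimage/ {
        if ((getline line) > 0) {
            gsub(/[^0-9]/, "", line)
            if (line != "") print line
        }
    }' "$@" | sort -un
}

# Usage, after saving FML with: farm -lv fmid > /tmp/fmlfmid
#   fml_image_ids /tmp/fml*
```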
When you use the command image -d and the image ID that you provide does not exist, a Java exception occurs.
Workaround: Run the command again with the correct image ID.
After upgrading from N1 Provisioning Server 3.0 Blades Edition, Update 1, to N1 Provisioning Server 3.1, Blades Edition, running the command request -lv causes a runtime exception for some of the requests that were created in the previous version of the product.
Workaround: None. You will not be able to view request details of certain requests that were filed before the upgrade.
When you use the image wizard to delete an account image, a Java exception occurs.
Workaround: Use the following command to delete the account image: image -d image-id
When you follow the instructions given by the image wizard to shut down your server, a replacePhysicalDevice request might be QUEUED. The image wizard does not tell you to delete that request. If you do not delete the QUEUED request, you cannot continue with the image process because the replacePhysicalDevice request will block the snapshot request from executing.
Workaround: Delete the replacePhysicalDevice request.
The image server size is determined during installation by querying the file system. The size is maintained in the database as an attribute of the image server device. This value is static and does not reflect changes that are made to the file system outside of the scope of N1 Provisioning Server. The following changes are outside of the scope:
If the images file system is being used for some other purpose and files other than those created by the N1 Provisioning Server software are copied onto it, the actual size of the image repository decreases. However, the size value in the database does not reflect the decrease in size for the image repository.
In this case, the software might allow a snapshot operation to proceed because the software assumes that enough space exists. However, the snapshot operation might fail due to lack of actual space on the filesystem.
If the partition is extended, the original partition is replaced by a new partition on a secondary disk, or files that are unknown to the N1 Provisioning Server software are removed from the images file system, the actual size of the images file system might increase. However, the size value in the database does not reflect the increase in size for the image repository.
In this case, the software might not allow a snapshot to proceed because the software assumes that enough space does not exist on the file system, even though sufficient space exists. The failure in this case could occur before the snapshot data is copied or after the snapshot data is copied.
In both cases, the symptom might not reflect the cause. The error that you see might not be clear enough to determine that the problem is due to incorrect image server size in the database. The error might be buried in the /var/adm/tspr.debug log file.
Workaround: If you see an unexplained snapshot error, follow these steps to determine whether the cause of the problem is a size inconsistency between the database and the actual file system:
Determine the device ID of the image server using the following command:
# /opt/terraspring/sbin/device -Lr
Determine the image repository size in the N1 Provisioning Server database using the following command:
# /opt/terraspring/sbin/device -lv device-id | grep imsvsize
where device-id is the ID that you determined in the previous step.
Determine the total size of all the images that are known to the N1 Provisioning Server repository.
To get a verbose listing of all the images, type the following command:
# image -lv > tmpfile
Look through the tmpfile and note all the size values in the “Image Locations” section for each image.
Add all the values in the previous step to arrive at the total size of all the images that are known to the repository.
Subtract the values from the two previous steps to determine the total available space in the image server as perceived by the N1 Provisioning Server software.
Determine the size of the actual filesystem using the following command:
# df -k path-to-images-filesystem
Determine the available space in bytes for the actual filesystem by multiplying the value under “avail” in the df output by 1024.
If the value from step 4 (the perceived space) differs from the value in step 6 (the actual space), a size inconsistency exists. To resolve this inconsistency, follow these steps:
Add the actual available size (from step 6) and the total size of images (from step 3c). This total provides the new value for the imsvsize attribute in the N1 Provisioning Server repository.
Update the imsvsize attribute in the N1 Provisioning Server database with the new value from the previous step using the following command:
# device -sA imsvsize new-imsvsize-value device-id
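The arithmetic in the steps above can be sketched as three small helpers. The function names are hypothetical, and the sketch assumes a shell with 64-bit integer arithmetic, because repository sizes in bytes exceed the 32-bit range.

```shell
# All sizes are in bytes unless noted otherwise.

# perceived_free IMSVSIZE TOTAL_IMAGE_BYTES
# Space the software believes is free: database size minus known images.
perceived_free() {
    echo $(( $1 - $2 ))
}

# actual_free AVAIL_KB
# Convert the "avail" column of df -k (kilobytes) to bytes.
actual_free() {
    echo $(( $1 * 1024 ))
}

# new_imsvsize AVAIL_KB TOTAL_IMAGE_BYTES
# Corrected imsvsize: actual free space plus total size of known images.
new_imsvsize() {
    echo $(( $1 * 1024 + $2 ))
}
```

If perceived_free and actual_free print different values, pass the output of new_imsvsize to the device -sA imsvsize command shown above.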
When two farms are created simultaneously on a newly installed data center, sometimes one of the farms may fail with the following error message:
[MSG8300 ] Sql Error::ORA-00955: name is already used by an existing object
Workaround: Resubmit the farm from the Control Center.
In an N1 Provisioning Server 3.1, Blades Edition installation running the PostgreSQL database, the command farm -Lt farm-id sometimes stops printing the log messages.
Workaround: Kill the log tail process and rerun it.
During installation when pestest is run or during runtime when farm activation is taking place, you might see the following message on your screen or in the debug log:
device-id: test FAILED: Reason was: - Cannot save state information for device-id: Blade Sn seems to be faulty
Workaround: To prevent problems with later farm activation, you must do one of the following:
Replace the defective blade. This blade is defective and needs to be replaced as soon as possible. Follow these steps:
Type the following command to see the properties of the blade that is referred to by device-id in the message:
# /opt/terraspring/sbin/device -l device-id
Examine the FARM_ID column.
If the FARM_ID column does not contain a hyphen (-), the blade is part of a farm.
If the blade is part of a farm, type the following command to replace the failed blade in the farm with another blade that has similar attributes:
# /opt/terraspring/sbin/replacedevice farm-id failed-device-id
To find the ID of the shelf that houses this blade, type the following command:
# /opt/terraspring/sbin/device -l device-id
Look for a line similar to the following:
cpu:sun-b100s-blade (- -) 50100:s0 ==> pwr:sun-b1600-pwr (- -) 50160:s0
In this example, the device ID for the shelf is 50160.
To determine the IP address for the shelf, type the following command:
# /opt/terraspring/sbin/device -lv shelf-device-id
Look for the field ipaddress: to obtain the IP address of the shelf.
Telnet to the shelf and type the following command to inform the shelf controller that the blade is to be prepared for removal:
# replacefru Sn
In response to this command, a blue LED will light on the blade.
Approach the blade shelf front panel and remove the defective blade.
The defective blade will have a blue LED that differentiates the defective blade from other blades in the shelf.
Insert a good blade into the blade shelf to replace the defective blade.
To detect the new blade and update the information in the database, type the following command:
# /opt/terraspring/sbin/shelfsync
To retest the blades, type the following command:
# /opt/terraspring/sbin/pestest
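In the replacement procedure above, the shelf device ID is read by eye from the connection line in the device -l output. As a sketch, the ID can also be extracted with sed. The function name shelf_id_from_link is hypothetical, and the pattern assumes that the connection line has the same shape as the example in the procedure, with the shelf on the right of the ==> arrow and its type starting with pwr:.

```shell
# Read device -l output on stdin and print the device ID of the shelf,
# taken from a line such as:
#   cpu:sun-b100s-blade (- -) 50100:s0 ==> pwr:sun-b1600-pwr (- -) 50160:s0
# The ID is the number before the colon on the right side of the arrow.
shelf_id_from_link() {
    sed -n 's/.*==>[[:space:]]*pwr:[^[:space:]]*[[:space:]]*([^)]*)[[:space:]]*\([0-9][0-9]*\):.*/\1/p'
}

# Usage:
#   /opt/terraspring/sbin/device -l device-id | shelf_id_from_link
```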
Mark the blade as FAILED. If you choose not to replace the defective blade, you must mark that blade as FAILED. Otherwise, your farm activation could fail if the defective blade is used in the farm. Follow these steps:
Type the following command to see the properties of the blade:
# /opt/terraspring/sbin/device -l device-id
Examine the FARM_ID column.
If the FARM_ID column does not contain a hyphen (-), the blade is part of a farm.
Type the following command to replace the failed blade in the farm with another blade that has similar attributes:
# /opt/terraspring/sbin/replacedevice farm-id failed-device-id
Examine the STATE column.
If the STATE is not set to FAILED, type the following command to set the state to FAILED:
# /opt/terraspring/sbin/device -sB device-id
After a snapshot failure, farm activation or update fails.
Workaround: Remove the farm's configuration information from the image-copy-subnet section of the /etc/dhcpd.conf file. Then reboot the server and reactivate the farm to restore the state that existed before the snapshot.
A Sun Fire B10n blade can be part of a high-availability load-balancing pair. In other words, the device is the child of a logical device whose type is a subtype of the halb device type. If you run the shelfsync command on the shelf that contains this blade, the shelfsync command reports the device as a newly discovered device. If you then choose to add this new device, the command reports a message while adding the device to the database. The message tells you that a device with the same MAC address is present in the database.
Workaround: Ignore the message.
The N1 Provisioning Server 3.1, Blades Edition product has disabled support for FTP images and FTP image servers due to the need to support flash and JumpStart images.
Workaround: You can enable FTP support. However, be aware of the following caveats:
Only one protocol (FTP or NFS) can be supported at a time. Therefore, the N1 datacenter where FTP support is enabled will now be able to provision and snapshot only via FTP.
Flash and JumpStart images cannot be supported in the FTP-enabled N1 datacenter. As a result, all flash and JumpStart images must be deleted.
Any attempts to perform a flash or JumpStart snapshot in the FTP-enabled N1 datacenter will result in an error and fail in an unknown way. Such an operation will not be supported.
Attempts to perform flash or JumpStart provisioning might work but will not be supported.
Make sure that you accept the caveats listed above.
To determine the device ID of the image server, type the following command:
# /opt/terraspring/sbin/device -Lr is
In the example shown below, the device id is 3001:
# /opt/terraspring/sbin/device -Lr is
DEVICE_ID PARENT_ID STATUS FARM_ID TYPE
3001 - USED - cpu:sun-svr-420R-idb (Sun 420R)
1 devices found.
To verify the current protocol being used by the image server, type the following command:
# /opt/terraspring/sbin/device -lv image-server-device-id
In the example shown below, the protocol is nfs:
# /opt/terraspring/sbin/device -lv 3001
Device ID: 3001, state: USED, owner: -, type: cpu:sun-svr-420R-idb (Sun 420R)
Device Attributes:
make: Sun
name: ps1
imsvsize: 67372343296
halclass: com.terraspring.drivers.sun.SunSysKonnect
nicvips: 1000
role: ispdb
model: 420R
basepath: /images
compressionratio:8
protocol: nfs
...
To change the protocol attribute to FTP, type the following command:
# /opt/terraspring/sbin/device -sA protocol ftp image-server-device-id
Determine the username and password that will be used to connect to the image server via FTP.
You may create a new username and password for this purpose.
In the following example, the username is set to n1psftpu and the password is set to n1psftpp.
# useradd n1psftpu
# passwd n1psftpu
New Password:
Re-enter new Password:
passwd: password successfully changed for n1psftpu
To encrypt the password, type the following command:
# /opt/terraspring/sbin/encrypter password
Note the output in the following example.
# encrypter n1psftpp
ptMSB/T9fNm8Borrjxl/gw==
To add the ftp_user and ftp_password attributes to the image server device in the database, type the following command once for each attribute:
# /opt/terraspring/sbin/device -sA attribute-name attribute-value image-server-device-id
Note that the encrypted password must be used as the value for the ftp_password attribute, as shown in the following example.
# /opt/terraspring/sbin/device -sA ftp_user n1psftpu 3001
# /opt/terraspring/sbin/device -sA ftp_password 'ptMSB/T9fNm8Borrjxl/gw==' 3001
To verify your changes, type the following command:
# /opt/terraspring/sbin/device -lv image-server-device-id
To determine the list of disk images and other images, type the following command:
# /opt/terraspring/sbin/image -l
The following example shows two images.
# /opt/terraspring/sbin/image -l
IMAGE_ID IMAGE_NAME CUSTOMER SIZE OS TYPE \
STATE LOCATION
1 rh-linux-i86pc-disk-img __grid__ 30000000000 linux disk_image \
READY nfs://3001//images/master-images/rh-linux-i86pc-disk-img
6 solaris9u5-i86pc-flash __grid__ 1500000000 solaris flash \
READY nfs://3001//images/master-images/solaris9u5-i86pc-flash
For each disk image, convert the protocol in the URLs to FTP.
Follow these steps:
To ensure that the image file is not deleted, rename the image file on the image server to a temporary name on the image server.
# mv /images/master-images/rh-linux-i86pc-disk-img \
/images/master-images/rh-linux-i86pc-disk-img.bak
To delete the NFS URL from the image information in the database, type the command /opt/terraspring/sbin/image -dL nfs-url image-id.
# /opt/terraspring/sbin/image -dL \
nfs://3001//images/master-images/rh-linux-i86pc-disk-img 1
Image id is: 1
Delete URL nfs://3001//images/master-images/rh-linux-i86pc-disk-img for this image (y/n)? y
Deleting image content at: nfs://3001//images/master-images/rh-linux-i86pc-disk-img \
size: 1532913330 ip: 10.52.53.1 State: done
Deleted locator URL: nfs://3001//images/master-images/rh-linux-i86pc-disk-img
Rename the image back to the original name on the image server.
# mv /images/master-images/rh-linux-i86pc-disk-img.bak \
/images/master-images/rh-linux-i86pc-disk-img
To add the FTP URL to the image database, type the command /opt/terraspring/sbin/image -uL ftp-url image-id.
The FTP URL is the same URL as the NFS URL, except for the protocol part, which is modified to ftp.
# /opt/terraspring/sbin/image -uL \
ftp://3001//images/master-images/rh-linux-i86pc-disk-img 1
Updated image: 1
To update the state of the FTP URL, type the command /opt/terraspring/sbin/imagesync --nosync image-id.
# /opt/terraspring/sbin/imagesync --nosync 1
Image 1 forcibly marked as synchronized
Type the following command to verify that the protocol in the URL has indeed been changed to ftp:
# /opt/terraspring/sbin/image -lv image-id
For example:
# /opt/terraspring/sbin/image -lv 1
IMAGE_ID IMAGE_NAME CUSTOMER SIZE OS TYPE \
STATE LOCATION
1 rh-linux-i86pc-disk-img __grid__ 30000000000 linux disk_image \
READY ftp://3001//images/master-images/rh-linux-i86pc-disk-img
Description: RedHat Linux 2.1 AS, disk image, with snet NIC
Architecture: i86pc
Last Updated: 2004-02-12 23:19:01.0
Image Locations:
ID STATE SIZE LOCATION
26 done 1532913330 ftp://3001//images/master-images/rh-linux-i86pc-disk-img
For each flash or JumpStart image, type the following command to delete the image:
# /opt/terraspring/sbin/image -d image-id
Before you delete the flash or JumpStart images, ensure that none of the images are in use as explained in image Command Does not Check Whether an Image is In Use (4892852 and 5002045). If an image is in use, deactivate and delete any farms that are using the image before you delete the image. If you decide not to do so, please note that future snapshots of the server disks on which these images have been deployed must be taken as disk_image even if the Control Center seems to allow a flash snapshot. See the caveats that precede this task.
# /opt/terraspring/sbin/image -d 6
Delete Image 6 (y/n)? y
Queueing request to delete image ...
Request (id: 74) submitted.
Waiting for request 74 to complete... .
Deleting image content at: nfs://3001//images/master-images/solaris9u5-i86pc-flash size: 647191212 ip: 10.52.53.1 State: done
FTP is now enabled for both provisioning and taking snapshots of images in the datacenter.
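When many disk images must be converted, the per-image steps of the procedure above are easy to mistype. The following sketch derives the FTP URL from the NFS URL and prints the command sequence for one image without running anything. Both function names are hypothetical. The printed commands are the ones shown in the procedure, and the image -dL command still prompts for confirmation when you run it.

```shell
# nfs_to_ftp_url NFS_URL
# Print the same URL with the protocol part changed from nfs to ftp.
nfs_to_ftp_url() {
    case "$1" in
        nfs://*) printf 'ftp://%s\n' "${1#nfs://}" ;;
        *) echo "not an NFS URL: $1" >&2; return 1 ;;
    esac
}

# print_ftp_conversion_plan IMAGE_ID NFS_URL IMAGE_PATH
# Print, in order, the commands that convert one disk image to FTP.
# IMAGE_PATH is the image file's path on the image server.
print_ftp_conversion_plan() {
    plan_id=$1
    plan_nfs=$2
    plan_path=$3
    plan_ftp=$(nfs_to_ftp_url "$plan_nfs") || return 1
    cat <<EOF
mv $plan_path $plan_path.bak
/opt/terraspring/sbin/image -dL $plan_nfs $plan_id
mv $plan_path.bak $plan_path
/opt/terraspring/sbin/image -uL $plan_ftp $plan_id
/opt/terraspring/sbin/imagesync --nosync $plan_id
EOF
}

# Usage:
#   print_ftp_conversion_plan 1 \
#       nfs://3001//images/master-images/rh-linux-i86pc-disk-img \
#       /images/master-images/rh-linux-i86pc-disk-img
```

Printing the plan first lets you review each command against the image -l listing before you change the database.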