N1 Provisioning Server 3.1, Blades Edition, Release Notes

Server Administration Issues

This section describes known issues that are associated with administering the N1 Provisioning Server 3.1, Blades Edition server.

Failures During Disk Copy Operations (4849694)

When a farm goes into standby mode, disk copy operations for all servers belonging to that farm start simultaneously. However, when a failure occurs during the first disk copy operation (for example, because of insufficient disk space), the other disk copy operations continue until all are completed, and only then is the failure reported.

Workaround: None.

“No More Resources” Exception When No Servers Available in Control Plane Database (4849699)

When no more provisionable servers are available in the Control Plane database, you will not be able to move a server from one server group to another in a single farm operation.

Workaround: Remove the server from the first server group and update the farm. Then add the server to the second server group and update the farm.

Note –

Another farm might provision the server after you remove it from the first server group and before you add it to the second. If that happens, a “no more resources” exception occurs.

`mls` Command Does not Show Accurate Status (4849719)

An mls -a command might inaccurately report the agent on a server as being marked DOWN.

Workaround: Wait 60 seconds and try again to confirm the status of the node. The normal monitoring of the node by the control plane server is not affected by this condition, and monitoring will accurately report a failed node.

Moving an Unmanaged Device Into Farm VLAN Fails (4856867)

Typically, after bench configuration, the ports on the shelf are in trunk mode. This configuration prevents you from moving an unmanaged device into a farm VLAN.

Workaround: Change the unmanaged device port from trunk mode to hybrid mode. Then, add the unmanaged device to the VLAN.

Deleting Active Requests Before Stopping the Control Plane Server (4856872)

Make sure that there are no active requests in the system before stopping the control plane server. If the control plane server fails, existing Farm Manager processes sometimes do not exit gracefully. Before restarting the control plane server, stop all remaining Farm Manager processes by performing the following steps:

Check the existence of any Farm Manager processes by executing the following command:
/usr/ucb/ps -auxwww | grep -i "com.terraspring.cs.fm"
Use the UNIX kill command to stop any remaining Farm Manager processes.
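
The two steps above can be sketched as a small shell helper. The function name fm_pids is hypothetical and is not part of the product. It only filters ps output passed on standard input, and it assumes the usual /usr/ucb/ps -auxwww layout in which the PID is the second column.

```shell
# Hypothetical helper (not part of N1 Provisioning Server): print the PIDs of
# Farm Manager processes found in ps output read from standard input.
# Assumes the PID is the second whitespace-separated column. Lines that
# contain "grep" are dropped so that the search itself is not listed.
fm_pids() {
    grep -i 'com\.terraspring\.cs\.fm' | grep -v 'grep' | awk '{ print $2 }'
}

# Possible usage on the control plane server:
#   /usr/ucb/ps -auxwww | fm_pids
# Review the list, then stop each remaining process with: kill pid
```

Listing the PIDs first and stopping them in a separate step keeps the operation reviewable, which seems prudent because the filter is only a text match.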

Power Command Issues (4857749)

The power command with the -off option is similar to the UNIX command poweroff, which powers off the device on which it is issued. When using the power command with the -off option, be sure to leave a blank space between power and -off. Otherwise, the command is interpreted as poweroff and powers off the control plane server.

Changing Power State of Provisionable Servers Used in an Existing Farm (4919199)

Do not independently power on or off provisionable servers in an existing farm.

Gigabit Ethernet Card Instance Assignment (4924060)

The gigabit Ethernet card of the Provisioning Server machine must be assigned an instance of 0.

Load Balancer Configuration Does Not Get Updated When Only the Balancing Policy has Changed (4998087)

If you change the balancing policy of an active farm (for example, from round-robin to wt-round-robin), the farm goes through the update process. However, after the process is completed, the load balancer configuration still shows the initial policy (for example, round-robin).

Workaround: If you have already defined virtual IPs and you want to change their policies, follow these steps:

Delete the virtual IPs whose policy you want to change, and submit the request.
Change the policy to the newly desired policy.
Re-create the virtual IPs, and submit the request.

Load Balancer Configuration Failed Trying to Configure Nonexistent Second Switch (`eth1`) (4998088)

If a load balancer is in a chassis that has only one switch (ssc0) installed, the software still expects to configure the second switch (eth1).

Workaround: For this release, load balancing is supported only in dual-switch shelves.

Error Message When Updating Locator URL (5002040)

If you try to update the locator URL of an existing snapshot image using the command image -u -l, you see an error message. The error message differs based on whether the database is Oracle or PostgreSQL.

For Oracle, you see the following message:

Locator URL 'nfs://3001//images/master-images/solaris9u5-i86pc-flash' already exists!

For PostgreSQL, you see the following message:

ERROR: duplicate key violates unique constraint "imglocator_unique"

Workaround: None. This is a cosmetic bug and will likely be fixed in a later release.

Error When Replacing a Device With a Different Type but the Same Name (5002041)

Suppose that you defined a device of a specific type in an active server farm (for example, an x86 server with the name “Server1”) and you want to change the type (for example, to SPARC) but keep the name. If you do not commit the delete request before you add the new device, the add request fails with a unique constraint exception stating that another device in the farm has the same name.

Workaround: You have to do the removal and addition in two separate updates. Follow these steps:

Activate a one-server x86 farm from the Control Center (CC). For example, name the server "Server1" and connect eth0 to a subnet.
After the farm is activated, log in to the CC and delete the x86 server.
Submit the farm to commit this change.
Add a SPARC Solaris server that has the same name ("Server1").
Commit the change (send the update request) from the CC.

Using `backupdb` Creates Invalid Data for PostgreSQL Database `restoredb` (5002042)

When using the PostgreSQL database, backupdb generates invalid backup data. As a result, restoredb fails because of the invalid data.

Workaround: None.

`image` Command Does not Check Whether an Image is In Use (4892852 and 5002045)

The image command allows you to delete an image (image -d) or modify an image (image -u) even if the image is in use. However, synchronizing this change with the Control Center will fail if the image is in use.

Workaround 1: Use the Control Center to delete or modify an image.

Workaround 2: Verify that the image is not in use before you use the image command. To verify whether an image is in use, follow these steps:

1. Type the following command to get a list of active farms: farm -l
2. For each farm that is in a state other than CREATED, type the following command to determine what images it uses: lr -lv fmid | grep Image

  where fmid is the farm identifier provided in step 1.
3. For each farm that is in the CREATED state, type the following command to obtain its FML and save it to a temporary file: farm -lv fmid > /tmp/fmlfmid

  Look through the temporary file and search for all occurrences of the string <diskimage. An image ID will be on the next line as shown below:
```
<disk id="10" location="internal" name="Disk B" size="30000000000" type="local">
<diskimage type="system">
 6
</diskimage>
<client-info id="11" object-id="10">
```

The list of unique image IDs from steps 2 and 3 are the images that are currently in use.
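
The search in step 3 can be sketched as a small filter. The function name fml_image_ids is hypothetical, and the sketch assumes that each image ID sits alone on the line that follows a <diskimage line, as in the sample above.

```shell
# Hypothetical helper: print the unique image IDs that follow a "<diskimage"
# line in FML read from standard input. Assumes the ID is alone on the next
# line, as in the sample FML; surrounding whitespace is stripped.
fml_image_ids() {
    awk '
        want { gsub(/[ \t\r]/, ""); if ($0 != "") print; want = 0 }
        /<diskimage/ { want = 1 }
    ' | sort -u
}

# Possible usage for a farm in the CREATED state:
#   farm -lv fmid | fml_image_ids
```

For the sample FML shown above, the filter prints 6. Closing </diskimage> tags do not trigger the match because the pattern requires < to be followed directly by diskimage.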

Using `image -d` to Delete a Nonexisting Image Causes Java Exception (5002046)

When you use the command image -d and the image ID that you provide does not exist, a Java exception occurs.

Workaround: Run the command again with the correct image ID.

Upgrade Causes `incompatible class` Exceptions (5002048)

After upgrading from N1 Provisioning Server 3.0 Blades Edition, Update 1, to N1 Provisioning Server 3.1, Blades Edition, running the command request -lv causes a runtime exception for some of the requests that were created in the previous version of the product.

Workaround: None. You will not be able to view request details of certain requests that were filed before the upgrade.

Using Image Wizard to Delete an Account Image Causes Java Exception (5002051)

When you use the image wizard to delete an account image, a Java exception occurs.

Workaround: Use the following command to delete the account image: image -d image-id

Image Wizard Stops Executing Due to Queued `replacePhysicalDevice` Request (5002052)

When you follow the instructions given by the image wizard to shut down your server, a replacePhysicalDevice request might be QUEUED. The image wizard does not tell you to delete that request. If you do not delete the QUEUED request, you cannot continue with the image process because the replacePhysicalDevice request will block the snapshot request from executing.

Workaround: Delete the replacePhysicalDevice request.

Image Command Fails to Create New Image with “Insufficient Disk Space” Error Although Space is Available (4989527)

The image server size is determined during installation by querying the file system. The size is maintained in the database as an attribute of the image server device. This value is static and does not reflect changes that are made to the file system outside of the scope of N1 Provisioning Server. The following changes are outside of the scope:

If the images file system is being used for some other purpose and files other than those created by the N1 Provisioning Server software are copied onto it, the actual size of the image repository decreases. However, the size value in the database does not reflect the decrease in size for the image repository.

In this case, the software might allow a snapshot operation to proceed because the software assumes that enough space exists. However, the snapshot operation might fail due to lack of actual space on the filesystem.
If the partition is extended, the original partition is replaced by a new partition on a secondary disk, or files that are unknown to the N1 Provisioning Server software are removed from the images file system, the actual size of the images file system might increase. However, the size value in the database does not reflect the increase in size for the image repository.

In this case, the software might not allow a snapshot to proceed because the software assumes that enough space does not exist on the file system, even though sufficient space exists. The failure in this case could occur before the snapshot data is copied or after the snapshot data is copied.

In both cases, the symptom might not reflect the cause. The error that you see might not be clear enough to determine that the problem is due to incorrect image server size in the database. The error might be buried in the /var/adm/tspr.debug log file.

Workaround: If you see an unexplained snapshot error, follow these steps to determine whether the cause of the problem is a size inconsistency between the database and the actual file system:

1. Determine the device ID of the image server using the following command:
  # /opt/terraspring/sbin/device -Lr
2. Determine the image repository size in the N1 Provisioning Server database using the following command:
  # /opt/terraspring/sbin/device -lv device-id | grep imsvsize
  where device-id is the ID that you determined in the previous step.
3. Determine the total size of all the images that are known to the N1 Provisioning Server repository.
  a. To get a verbose listing of all the images, type the following command:
    # image -lv > tmpfile
  b. Look through the tmpfile and note all the size values in the “Image Locations” section for each image.
  c. Add all the values in the previous step to arrive at the total size of all the images that are known to the repository.
4. Subtract the total image size (step 3) from the image repository size (step 2) to determine the total available space in the image server as perceived by the N1 Provisioning Server software.
5. Determine the size of the actual file system using the following command:
  # df -k path-to-images-filesystem
6. Determine the available space in bytes for the actual file system by multiplying the value under “avail” in the df output by 1024.

If the value from step 4 (the perceived space) differs from the value in step 6 (the actual space), a size inconsistency exists. To resolve this inconsistency, follow these steps:

Add the actual available size (from step 6) and the total size of images (from step 3c). This total provides the new value for the imsvsize attribute in the N1 Provisioning Server repository.
Update the imsvsize attribute in the N1 Provisioning Server database with the new value from the previous step using the following command:
# device -sA imsvsize new-imsvsize-value device-id
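
The arithmetic in this workaround can be sketched with three small shell functions. The function names are hypothetical and the sketch only does the calculations; it does not call any product command. It assumes a shell with 64-bit integer arithmetic, because the sizes are in bytes.

```shell
# Hypothetical helpers illustrating the arithmetic in the steps above.
# All arguments are plain integers; sizes are in bytes unless noted.

# Space the software believes is free: imsvsize minus total size of known images.
perceived_free() {   # usage: perceived_free imsvsize total-image-bytes
    echo $(( $1 - $2 ))
}

# Space that is really free: the "avail" column of df -k is in kilobytes.
actual_free() {      # usage: actual_free avail-kilobytes
    echo $(( $1 * 1024 ))
}

# Corrected imsvsize: actual free space plus the total size of known images.
new_imsvsize() {     # usage: new_imsvsize avail-kilobytes total-image-bytes
    echo $(( $1 * 1024 + $2 ))
}
```

If perceived_free and actual_free disagree for your values, the result of new_imsvsize is the number to pass to device -sA imsvsize.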

Activating Two Farms Simultaneously Fails (4989529)

When two farms are created simultaneously on a newly installed data center, one of the farms sometimes fails with the following error message:

[MSG8300 ] Sql Error::ORA-00955: name is already used by an existing object

Workaround: Resubmit the farm from the Control Center.

`farm -Lt` Does not Tail the Log (4997346)

In an N1 Provisioning Server 3.1, Blades Edition installation running the PostgreSQL database, the command farm -Lt farm-id sometimes stops printing the log messages.

Workaround: Kill the log tail process and rerun it.

Minor Faults Reported by SC Causes N1 Provisioning Server to Classify Blades as Failed/Unusable (4998378)

During installation when pestest is run or during runtime when farm activation is taking place, you might see the following message on your screen or in the debug log:

device-id: test FAILED: Reason was:  - Cannot save state information for device-id: Blade Sn seems to be faulty

Workaround: To prevent problems with later farm activation, you must do one of the following:

Replace the defective blade. This blade is defective and needs to be replaced as soon as possible. Follow these steps:
1. Type the following command to see the properties of the blade that is referred to by device-id in the message:
  # /opt/terraspring/sbin/device -l device-id
2. Examine the FARM_ID column.
  
  If the FARM_ID column does not contain a hyphen (-), the blade is part of a farm.
  
  If the blade is part of a farm, type the following command to replace the failed blade in the farm with another blade that has similar attributes:
  # /opt/terraspring/sbin/replacedevice farm-id failed-device-id
3. To find the ID of the shelf that houses this blade, type the following command:
  # /opt/terraspring/sbin/device -l device-id
  Look for a line similar to the following:
  cpu:sun-b100s-blade (- -) 50100:s0 ==> pwr:sun-b1600-pwr (- -) 50160:s0
  In this example, the device ID for the shelf is 50160.
4. To determine the IP address for the shelf, type the following command:
  # /opt/terraspring/sbin/device -lv shelf-device-id
  Look for the field ipaddress: to obtain the IP address of the shelf.
5. Telnet to the shelf and type the following command to inform the shelf controller that the blade is to be prepared for removal:
  # replacefru Sn
  In response to this command, a blue LED will light on the blade.
6. Approach the blade shelf front panel and remove the defective blade.
  
  The defective blade will have a blue LED that differentiates the defective blade from other blades in the shelf.
7. Insert a good blade into the blade shelf to replace the defective blade.
8. To detect the new blade and update the information in the database, type the following command:
  # /opt/terraspring/sbin/shelfsync
9. To retest the blades, type the following command:
  # /opt/terraspring/sbin/pestest
Mark the blade as FAILED. If you choose not to replace the defective blade, you must mark that blade as FAILED. Otherwise, your farm activation could fail if the defective blade is used in the farm. Follow these steps:
1. Type the following command to see the properties of the blade:
  # /opt/terraspring/sbin/device -l device-id
2. Examine the FARM_ID column.
  
  If the FARM_ID column does not contain a hyphen (-), the blade is part of a farm.
  
  Type the following command to replace the failed blade in the farm with another blade that has similar attributes:
  # /opt/terraspring/sbin/replacedevice farm-id failed-device-id
3. Examine the STATE column.
  
  If the STATE is not set to FAILED, type the following command to set the state to FAILED:
  # /opt/terraspring/sbin/device -sB device-id
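
Finding the shelf device ID in step 3 of the blade replacement procedure can be sketched as a text filter. The function name shelf_id is hypothetical, and the sketch assumes that the connection line has the same shape as the sample line shown in that step, with the shelf listed after ==> as a pwr: device.

```shell
# Hypothetical helper: print the shelf device ID from `device -l` output read
# from standard input. Assumes a connection line shaped like
#   cpu:... (- -) <blade-id>:<slot> ==> pwr:... (- -) <shelf-id>:<slot>
# and prints nothing if no such line is present.
shelf_id() {
    sed -n 's/.*==> *pwr:.* \([0-9][0-9]*\):[^ ]* *$/\1/p'
}

# Possible usage:
#   /opt/terraspring/sbin/device -l device-id | shelf_id
```

For the sample line in step 3, the filter prints 50160, which can then be passed to device -lv to look up the ipaddress field.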

Snapshot Failure Leaves Information in `dhcpd.conf` Configuration File (4998415)

After a snapshot failure, farm activation or update fails.

Workaround: Remove the configuration information for the farm from the image-copy-subnet section of the /etc/dhcpd.conf file. Then, reboot the server and reactivate the farm to restore the state that existed before the snapshot.

`shelfsync` Command Wants to Add a B10`n` Blade That Already Exists (5006442)

A Sun Fire B10n blade can be part of a high-availability load-balancing pair. In other words, the device is the child of a logical device whose type is a subtype of device type halb. If you run the shelfsync command on the shelf that contains this blade, the shelfsync command reports the device as a newly discovered device. If you then choose to add this new device, the command reports a message while adding the device to the database. The message tells you that a device with the same MAC address is present in the database.

Workaround: Ignore the message.

Product Does Not Clearly Support FTP Images and Image Servers (5003423)

The N1 Provisioning Server 3.1, Blades Edition product has disabled support for FTP images and FTP image servers due to the need to support flash and JumpStart images.

Workaround: You can enable FTP support. However, be aware of the following caveats:

Only one protocol (FTP or NFS) can be supported at a time. Therefore, the N1 datacenter where FTP support is enabled will now be able to provision and snapshot only via FTP.
Flash and JumpStart images cannot be supported in the FTP-enabled N1 datacenter. As a result, all flash and JumpStart images must be deleted.
Any attempts to perform a flash or JumpStart snapshot in the FTP-enabled N1 datacenter will result in an error and fail in an unknown way. Such an operation will not be supported.
Attempts to perform flash or JumpStart provisioning might work but will not be supported.

To Enable FTP in an N1 Datacenter

Steps

Make sure that you accept the caveats listed above.

To determine the device ID of the image server, type the following command:

# /opt/terraspring/sbin/device -Lr is

In the example shown below, the device id is 3001:

# /opt/terraspring/sbin/device -Lr is
 DEVICE_ID  PARENT_ID STATUS   FARM_ID    TYPE
      3001          - USED     -          cpu:sun-svr-420R-idb (Sun 420R)
1 devices found.

To verify the current protocol being used by the image server, type the following command:

# /opt/terraspring/sbin/device -lv image-server-device-id

In the example shown below, the protocol is nfs:

# /opt/terraspring/sbin/device -lv 3001
Device ID: 3001, state: USED, owner: -, type: cpu:sun-svr-420R-idb (Sun 420R)
  Device Attributes:
    make:           Sun
    name:           ps1
    imsvsize:       67372343296
    halclass:       com.terraspring.drivers.sun.SunSysKonnect
    nicvips:        1000
    role:           ispdb
    model:          420R
    basepath:       /images
    compressionratio:8
    protocol:       nfs
...

To change the protocol attribute to FTP, type the following command:

# /opt/terraspring/sbin/device -sA protocol ftp image-server-device-id

Determine the username and password that will be used to connect to the image server via FTP.

You may create a new username and password for this purpose.

In the following example, the username is set to n1psftpu and the password is set to n1psftpp.
# useradd n1psftpu
# passwd n1psftpu
New Password:
Re-enter new Password:
passwd: password successfully changed for n1psftpu

To encrypt the password, type the following command:

# /opt/terraspring/sbin/encrypter password

Note the output in the following example.

# encrypter n1psftpp
ptMSB/T9fNm8Borrjxl/gw==

To add the ftp_user and ftp_password attributes to the image server device in the database, type the following command once for each attribute:
# /opt/terraspring/sbin/device -sA attribute-name attribute-value image-server-device-id
Note that the encrypted password must be used as the value for the ftp_password attribute, as shown in the following example.
# /opt/terraspring/sbin/device -sA ftp_user n1psftpu 3001
# /opt/terraspring/sbin/device -sA ftp_password 'ptMSB/T9fNm8Borrjxl/gw==' 3001
Tip –
To verify your changes, type the following command:
# /opt/terraspring/sbin/device -lv image-server-device-id

To determine the list of disk images and other images, type the following command:

# /opt/terraspring/sbin/image -l

The following example shows two images.

# /opt/terraspring/sbin/image -l
IMAGE_ID IMAGE_NAME               CUSTOMER         SIZE       OS           TYPE            \
STATE     LOCATION
1        rh-linux-i86pc-disk-img  __grid__         30000000000 linux       disk_image      \
READY     nfs://3001//images/master-images/rh-linux-i86pc-disk-img
6        solaris9u5-i86pc-flash   __grid__         1500000000 solaris      flash           \
READY     nfs://3001//images/master-images/solaris9u5-i86pc-flash

For each disk image, convert the protocol in the URLs to FTP.

Follow these steps:

To ensure that the image file is not deleted, rename the image file on the image server to a temporary name on the image server.
# mv /images/master-images/rh-linux-i86pc-disk-img   \
/images/master-images/rh-linux-i86pc-disk-img.bak

To delete the NFS URL from the image information in the database, type the command /opt/terraspring/sbin/image -dL nfs-url image-id.

# /opt/terraspring/sbin/image -dL   \
nfs://3001//images/master-images/rh-linux-i86pc-disk-img 1
    Image id is: 1
    Delete URL nfs://3001//images/master-images/rh-linux-i86pc-disk-img for this image (y/n)? y
    Deleting image content at: nfs://3001//images/master-images/rh-linux-i86pc-disk-img   \
size: 1532913330   ip: 10.52.53.1   State: done
    Deleted locator URL: nfs://3001//images/master-images/rh-linux-i86pc-disk-img

Rename the image back to the original name on the image server.

# mv /images/master-images/rh-linux-i86pc-disk-img.bak   \
/images/master-images/rh-linux-i86pc-disk-img

To add the FTP URL to the image database, type the command /opt/terraspring/sbin/image -uL ftp-url image-id.

Note –
The FTP URL is the same URL as the NFS URL, except for the protocol part, which is modified to ftp.
# /opt/terraspring/sbin/image -uL   \
ftp://3001//images/master-images/rh-linux-i86pc-disk-img 1
Updated image: 1

To update the state of the FTP URL, type the command /opt/terraspring/sbin/imagesync --nosync image-id.
# /opt/terraspring/sbin/imagesync --nosync 1
Image 1 forcibly marked as synchronized

Type the following command to verify that the protocol in the URL has indeed been changed to ftp:

# /opt/terraspring/sbin/image -lv image-id

For example:

# /opt/terraspring/sbin/image -lv 1
IMAGE_ID IMAGE_NAME               CUSTOMER         SIZE       OS           TYPE            \
STATE     LOCATION
1        rh-linux-i86pc-disk-img  __grid__         30000000000 linux       disk_image      \
READY     ftp://3001//images/master-images/rh-linux-i86pc-disk-img

Description:   RedHat Linux 2.1 AS, disk image, with snet NIC
Architecture:  i86pc
Last Updated:  2004-02-12 23:19:01.0

Image Locations:
    ID    STATE     SIZE             LOCATION
    26    done      1532913330       ftp://3001//images/master-images/rh-linux-i86pc-disk-img

For each flash or JumpStart image, type the following command to delete the image:

# /opt/terraspring/sbin/image -d image-id

Note –

Before you delete the flash or JumpStart images, ensure that none of the images are in use as explained in image Command Does not Check Whether an Image is In Use (4892852 and 5002045). If an image is in use, deactivate and delete any farms that are using the image before you delete the image. If you decide not to do so, please note that future snapshots of the server disks on which these images have been deployed must be taken as disk_image even if the Control Center seems to allow a flash snapshot. See the caveats that precede this task.

# /opt/terraspring/sbin/image -d 6
Delete Image 6 (y/n)? y
Queueing request to delete image ...
Request (id: 74) submitted.
Waiting for request 74 to complete...
.
Deleting image content at: nfs://3001//images/master-images/solaris9u5-i86pc-flash   
size: 647191212   ip: 10.52.53.1   State: done

FTP is now enabled for both provisioning and taking snapshots of images in the datacenter.
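
The URL conversion used for each disk image in this task changes only the protocol part of the locator. A minimal sketch of that rewrite follows. The function name nfs_to_ftp_url is hypothetical, and the sketch only produces the string to pass to image -uL; it does not modify the database.

```shell
# Hypothetical helper: rewrite the protocol part of an image locator URL
# from nfs to ftp, leaving the rest of the URL untouched. URLs that do not
# start with nfs:// are printed unchanged.
nfs_to_ftp_url() {
    printf '%s\n' "$1" | sed 's|^nfs://|ftp://|'
}

# Possible usage:
#   nfs_to_ftp_url nfs://3001//images/master-images/rh-linux-i86pc-disk-img
```

For the locator in the example above, the result is ftp://3001//images/master-images/rh-linux-i86pc-disk-img.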
