N1 Provisioning Server 3.1, Blades Edition, Release Notes

Chapter 3 N1 Provisioning Server 3.1, Blades Edition Runtime Issues

This chapter describes known issues related to running the N1 Provisioning Server 3.1, Blades Edition product.

Control Center

This section describes known issues related to the N1 Provisioning Server Control Center.

Administration Screen Time-out Issue (4849661)

In certain cases, when you navigate from the Administration screen to the Editor screen and the session times out, you might not be able to log into the Control Center.

Workaround: Close the browser, open another browser, and log in again.

If you are in the Administration screen and the session times out, when you log in again, the window might display incorrectly. If this happens, close the browser and log in again.

To vary the session time-out, modify the time-out setting in the /opt/terraspring/gw/war/WEB-INF/web.xml file. Look for the session-timeout tag. The value is specified in minutes.
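
As a quick check before and after editing, the current value can be read back from the file. The following is a minimal sketch, not a product utility: the function name is hypothetical, and it assumes that the session-timeout element is written on a single line in the usual servlet deployment descriptor form, which these notes do not confirm.

```shell
# Print the value (in minutes) of the first session-timeout element in a web.xml file.
# Assumption: the opening tag, the value, and the closing tag are on one line.
session_timeout_minutes() {
    sed -n 's:.*<session-timeout>[ ]*\([0-9][0-9]*\)[ ]*</session-timeout>.*:\1:p' "$1" | head -n 1
}

# Example (path taken from the text above):
# session_timeout_minutes /opt/terraspring/gw/war/WEB-INF/web.xml
```

If the function prints nothing, the element is probably laid out differently, and the file should be inspected by hand.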

Find Farm Window Displays in Full After Time-out (4849673)

If you click the Find button in the Find Farm dialog box after the Find Farm session has timed out, the Find Farm window displays in full.

Workaround: Increase the time-out value. To do so, modify the time-out setting in the web.xml file in the /opt/terraspring/gw/war/WEB-INF directory. Look for the session-timeout tag. The value is specified in minutes.

No Rollback After Farm Update Failure (4849688)

When a farm update fails, changes do not always roll back to the last good state of the farm.

Workaround: Follow these steps:

  1. To clear the error state, contact the farm using the farm -pf farm-ID command.

  2. View the farm in the farm editor.

  3. From the farm editor, select the latest farm update request from the Request History panel on the left. The correct farm information will display.

  4. Choose Commit from the File menu to resubmit the farm update. A newly updated request is issued.

  5. Unblock the new request. The correct farm information will display in the farm editor after the request is completed.

Do Not Bookmark Login Page (4849693)

Do not bookmark any page except the welcome page.

Allow Dialog Pages to Finish Downloading Information (4849697)

In some dialog pages, for example, Select: Disk Image, the application needs to download information from the server. If you click any buttons on the page before that download is complete, a script error occurs.

Workaround: Wait until the page has finished downloading all information from the server before taking any actions.

Snapshot Button Not Disabled for New Disk Created (4849721)

When you create a new disk by flexing a server group, the snapshot button is not disabled as it should be. When you click that button, an error message appears.

Deleting Snapshots or Images From the Control Center (4856854)

When you delete snapshots and images from the Control Center, they are only marked as deleted. They are not yet deleted from the I-Fabric. Until you delete them from the I-Fabric, you will not be able to create snapshots and images with the same names as the ones marked as deleted.

To purge snapshots and images from the I-Fabric, issue the image -lR command from the control plane server to view a list of the images marked as deleted. Then issue the image -d command to delete them from the I-Fabric. See the image command man page for details.

BOM Does Not Show Unmanaged Devices (4856865)

The bill of materials (BOM) dialog box shows information about devices managed by the N1 Provisioning Server and about devices of a known type that are not managed. The BOM dialog box does not include information about unmanaged devices of an unknown type.

Graphics Not Displayed Correctly After Reload Operation (4857740)

Sometimes when you reload a current farm, the graphics in the navigation bar do not display correctly.

Workaround: Refresh the page.

Control Center Client Software Requirements (4857757)

The Control Center requires the Microsoft™ Internet Explorer version 6.0 web browser with 128-bit encryption.

Adding Port When Updating a Farm Does Not Switch Devices (4989531)

Initially, a device with only a single connection was requested and allocated. Later a second connection to the device was requested, but the request could not be granted because the allocated device only has one connection. No new device is allocated to fulfill this request.

Workaround 1: Always connect both physical interfaces to subnets even if you do not need the two interfaces. Connecting both interfaces guarantees that a device with two connected interfaces is allocated to the farm. The secondary interface can then be reconfigured at a later time once a purpose has been identified.

Workaround 2: When a farm update fails for lack of resources after you connect a secondary interface of a server, return the farm to the active state by changing the farm design back to its previous design and resubmitting the farm update request. When the farm is in the active state, take a snapshot of the device to which you want to add a second interface. After the snapshot is completed, update the farm: remove the original server, replace it with a new server, and initialize the new server's disk from the snapshot image. Connect both physical interfaces of the new device to a subnet and submit the farm update request. A new server with two connected interfaces is allocated to the farm.

Workaround 3: Add a second switch to the chassis that hosts the device requiring the second connection. Update the device using the shelfsync command. Then resubmit the farm update request.

Workaround 4: Instead of using the secondary interface of the server, update the farm using a virtual interface on the primary interface. This workaround will force the bandwidth of the primary interface to be shared across the eth0 and virtual interfaces.

Imported Farm Can Have Incorrect Number of Ports (4998397)

An imported farm can have appropriate device types; however, the implied number of connections might be incorrect.

Workaround: When importing a farm design from a .feml file, make sure that the device types match those in the current I-Fabric. In particular, make sure that the number of ports is correct for each device. If the number of ports for a device is not correct, change the device type within the device configuration dialog box before you submit the new design.

Confusing Error Message When Trying to Unblock a Farm Request With PostgreSQL (5002047)

The following message appears when you try to unblock a farm request from the Pending request page:


Operation failed... may have been caused by ifabric misconfiguration

Workaround: Run the request again. After a few tries, it will succeed.

Server Administration Issues

This section describes known issues that are associated with administering the N1 Provisioning Server 3.1, Blades Edition server.

Failures During Disk Copy Operations (4849694)

When a farm goes into standby mode, disk copy operations for all servers belonging to that farm start simultaneously. However, when a failure occurs during the first disk copy operation (for example, because of insufficient disk space), the other disk copy operations continue until all are completed, and only then is the failure reported.

Workaround: None.

“No More Resources” Exception When No Servers Available in Control Plane Database (4849699)

When no more provisionable servers are available in the Control Plane database, you will not be able to move a server from one server group to another in a single farm operation.

Workaround: Remove the server from the first server group and update the farm. Then add the server to the second server group and update the farm.


Note –

The server might be provisioned by another farm between the time you remove it from the first server group and the time you add it to the second. If that happens, a “no more resources” exception occurs.


mls Command Does not Show Accurate Status (4849719)

An mls -a command might inaccurately report the agent on a server as being marked DOWN.

Workaround: Wait 60 seconds and try again to confirm the status of the node. The normal monitoring of the node by the control plane server is not affected by this condition, and monitoring will accurately report a failed node.

Moving an Unmanaged Device Into Farm VLAN Fails (4856867)

Typically, after bench configuration, the ports on the shelf are in trunk mode. This configuration prevents you from moving an unmanaged device into a farm VLAN.

Workaround: Change the unmanaged device port from trunk mode to hybrid mode. Then, add the unmanaged device to the VLAN.

Deleting Active Requests Before Stopping the Control Plane Server (4856872)

Make sure that there are no active requests in the system before stopping the control plane server. If the control plane server fails, existing Farm Manager processes sometimes do not exit gracefully. Before restarting the control plane server, stop all remaining Farm Manager processes by performing the following steps:

  1. Check the existence of any Farm Manager processes by executing the following command:


    /usr/ucb/ps -auxwww | grep -i "com.terraspring.cs.fm"
    
  2. Use the UNIX™ kill command to stop any remaining Farm Manager processes.
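
The two steps above can be combined into a small helper that lists only the process IDs to stop. This is a sketch under stated assumptions rather than a supported tool: the function name is hypothetical, and it assumes that the process ID is the second column of the ps listing.

```shell
# Read a ps listing on standard input and print the PIDs of Farm Manager processes.
# Assumption: the PID is the second whitespace-separated column of each line.
farm_manager_pids() {
    grep -i 'com\.terraspring\.cs\.fm' | grep -v 'grep' | awk '{ print $2 }'
}

# Example: review the printed PIDs first, then stop each one with kill.
# /usr/ucb/ps -auxwww | farm_manager_pids
```

The second grep drops the search command itself from the listing, so that only real Farm Manager processes are reported.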

Power Command Issues (4857749)

The power command with the -off option is similar to the UNIX command poweroff, which powers off the device on which it is issued. When using the power command with the -off option, be sure to type a blank space between power and -off. Otherwise, you will power off the control plane server.

Changing Power State of Provisionable Servers Used in an Existing Farm (4919199)

Do not independently power on or off provisionable servers in an existing farm.

Gigabit Ethernet Card Instance Assignment (4924060)

The gigabit Ethernet card of the Provisioning Server machine must be assigned an instance of 0.

Load Balancer Configuration Does Not Get Updated When Only the Balancing Policy has Changed (4998087)

If you change the balancing policy of an active farm (for example, from round-robin to wt-round-robin), the farm goes through the update process. However, after the process is completed, the load balancer configuration still shows the initial policy (for example, round-robin).

Workaround: If you have already defined virtual IPs and you want to change their policies, follow these steps:

  1. Delete the virtual IPs whose policy you want to change, and submit the request.

  2. Change the policy to the newly desired policy.

  3. Re-create the virtual IPs, and submit the request.

Load Balancer Configuration Failed Trying to Configure Nonexistent Second Switch (eth1) (4998088)

If a load balancer is in a chassis that has only one switch (ssc0) installed, the software still expects to configure the second switch (eth1).

Workaround: For this release, load balancing is supported only in dual-switch shelves.

Error Message When Updating Locator URL (5002040)

If you try to update the locator URL of an existing snapshot image using the command image -u -l, you see an error message. The error message differs based on whether the database is Oracle or PostgreSQL.

For Oracle, you see the following message:


Locator URL 'nfs://3001//images/master-images/solaris9u5-i86pc-flash' already exists!

For PostgreSQL, you see the following message:


ERROR: duplicate key violates unique constraint "imglocator_unique"

Workaround: None. This is a cosmetic bug and will likely be fixed in a later release.

Error When Replacing a Device With a Different Type but the Same Name (5002041)

Suppose that you defined a device with a specific type in an active server farm (for example, an x86 server with the name “Server1”) and you want to change the type (for example, to SPARC) but keep the name. If you fail to commit the delete request before you add the new device, the add request fails with a unique constraint exception, which states that another device in the farm has the same name.

Workaround: You have to do the removal and addition in two separate updates. Follow these steps:

  1. Activate a one-server x86 farm from the Control Center (CC), for example, with a server named "Server1" that has eth0 connected to a subnet.

  2. After the farm is activated, log in to the CC and delete the x86 server.

  3. Submit the farm to commit this change.

  4. Add a SPARC Solaris server that has the same name ("Server1").

  5. Commit the change (send the update request) from the CC.

Using backupdb Creates Invalid Data for PostgreSQL Database restoredb (5002042)

When using the PostgreSQL database, backupdb generates invalid backup data. As a result, restoredb fails because of the invalid data.

Workaround: None.

image Command Does not Check Whether an Image is In Use (4892852 and 5002045)

The image command allows you to delete an image (image -d) or modify an image (image -u) even if the image is in use. However, synchronizing this change with the Control Center will fail if the image is in use.

Workaround 1: Use the Control Center to delete or modify an image.

Workaround 2: Verify that the image is not in use before you use the image command. To verify whether an image is in use, follow these steps:

  1. Type the following command to get a list of active farms: farm -l

  2. For each farm that is in a state other than CREATED, type the following command to determine what images it uses: lr -lv fmid | grep Image

    where fmid is the farm identifier provided in step 1.

  3. For each farm that is in the CREATED state, type the following command to obtain its FML and save it to a temporary file: farm -lv fmid > /tmp/fmlfmid

    Look through the temporary file and search for all occurrences of the string <diskimage. An image ID will be on the next line as shown below:

    <disk id="10" location="internal" name="Disk B" size="30000000000" type="local">
    <diskimage type="system">
     6
    </diskimage>
    <client-info id="11" object-id="10">

The unique image IDs collected in steps 2 and 3 identify the images that are currently in use.
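
For step 3, the search through each temporary file can be scripted. The following is a minimal sketch with a hypothetical function name. It assumes that every image ID sits alone on the line that follows a <diskimage tag, as in the FML excerpt above.

```shell
# Print the unique image IDs found in a saved FML file.
# Assumption: each ID is alone on the line after a <diskimage ...> tag.
fml_image_ids() {
    awk '
        grab { gsub(/[ \t\r]/, ""); if ($0 != "") print; grab = 0 }
        /<diskimage/ { grab = 1 }
    ' "$1" | sort -u
}

# Example (file name follows the pattern used in step 3):
# fml_image_ids /tmp/fmlfmid
```

For the excerpt shown in step 3, the function would print 6. Run it once for each farm in the CREATED state and merge the results with the IDs found in step 2.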

Using image -d to Delete a Nonexisting Image Causes Java Exception (5002046)

When you use the command image -d and the image ID that you provide does not exist, a Java exception occurs.

Workaround: Run the command again with the correct image ID.

Upgrade Causes incompatible class Exceptions (5002048)

After upgrading from N1 Provisioning Server 3.0 Blades Edition, Update 1, to N1 Provisioning Server 3.1, Blades Edition, running the command request -lv causes a runtime exception for some of the requests that were created in the previous version of the product.

Workaround: None. You will not be able to view request details of certain requests that were filed before the upgrade.

Using Image Wizard to Delete an Account Image Causes Java Exception (5002051)

When you use the image wizard to delete an account image, a Java exception occurs.

Workaround: Use the following command to delete the account image: image -d image-id

Image Wizard Stops Executing Due to Queued replacePhysicalDevice Request (5002052)

When you follow the instructions given by the image wizard to shut down your server, a replacePhysicalDevice request might be QUEUED. The image wizard does not tell you to delete that request. If you do not delete the QUEUED request, you cannot continue with the image process because the replacePhysicalDevice request will block the snapshot request from executing.

Workaround: Delete the replacePhysicalDevice request.

Image Command Fails to Create New Image with “Insufficient Disk Space” Error Although Space is Available (4989527)

The image server size is determined during installation by querying the file system. The size is maintained in the database as an attribute of the image server device. This value is static and does not reflect changes that are made to the file system outside of the scope of N1 Provisioning Server. The following changes are outside of the scope:

In both cases, the symptom might not reflect the cause. The error that you see might not be clear enough to determine that the problem is due to incorrect image server size in the database. The error might be buried in the /var/adm/tspr.debug log file.

Workaround: If you see an unexplained snapshot error, follow these steps to determine whether the cause of the problem is a size inconsistency between the database and the actual file system:

  1. Determine the device ID of the image server using the following command:


    # /opt/terraspring/sbin/device -Lr
    
  2. Determine the image repository size in the N1 Provisioning Server database using the following command:


    # /opt/terraspring/sbin/device -lv device-id | grep imsvsize
    

    where device-id is the ID that you determined in the previous step.

  3. Determine the total size of all the images that are known to the N1 Provisioning Server repository.

    1. To get a verbose listing of all the images, type the following command:


      # image -lv > tmpfile
      
    2. Look through the tmpfile and note all the size values in the “Image Locations” section for each image.

    3. Add all the values in the previous step to arrive at the total size of all the images that are known to the repository.

  4. Subtract the total image size (from step 3) from the image repository size (from step 2) to determine the total available space in the image server as perceived by the N1 Provisioning Server software.

  5. Determine the size of the actual filesystem using the following command:


    # df -k path-to-images-filesystem
    
  6. Determine the available space in bytes for the actual filesystem by multiplying the value under “avail” in the df output by 1024.

If the value from step 4 (the perceived space) differs from the value in step 6 (the actual space), a size inconsistency exists. To resolve this inconsistency, follow these steps:

  1. Add the actual available size (from step 6) and the total size of images (from step 3). This total provides the new value for the imsvsize attribute in the N1 Provisioning Server repository.

  2. Update the imsvsize attribute in the N1 Provisioning Server database with the new value from the previous step using the following command:


    # device -sA imsvsize new-imsvsize-value device-id
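
The arithmetic in the two procedures above can be sketched as a few shell functions. These helpers are hypothetical and are not part of the product. The total_image_bytes function assumes that the saved image -lv output follows the "Image Locations" layout shown in the example listings in these notes (ID, STATE, SIZE, LOCATION columns).

```shell
# total_image_bytes FILE
# Sum the SIZE column of every row under an "Image Locations:" heading.
# Assumption: rows look like "ID STATE SIZE LOCATION" with a numeric ID.
total_image_bytes() {
    awk '
        /Image Locations:/        { in_loc = 1; next }
        $1 == "IMAGE_ID"          { in_loc = 0 }
        in_loc && NF == 0         { in_loc = 0 }
        in_loc && $1 ~ /^[0-9]+$/ { sum += $3 }
        END { printf "%.0f\n", sum }
    ' "$1"
}

# perceived_free IMSVSIZE TOTAL_IMAGE_BYTES
# Free space as the database sees it (step 4 of the first procedure).
perceived_free() {
    echo $(( $1 - $2 ))
}

# actual_free AVAIL_KB
# Free space on the real file system in bytes (step 6 of the first procedure).
actual_free() {
    echo $(( $1 * 1024 ))
}

# new_imsvsize AVAIL_KB TOTAL_IMAGE_BYTES
# Corrected imsvsize value: actual free bytes plus total image bytes.
new_imsvsize() {
    echo $(( $1 * 1024 + $2 ))
}
```

As an illustration, take an imsvsize of 67372343296, images that total 2180104542 bytes, and a hypothetical df "avail" figure of 60000000 KB. In that case perceived_free prints 65192238754, actual_free prints 61440000000, and new_imsvsize prints 63620104542. The perceived and actual values differ, so 63620104542 is the value to pass to device -sA imsvsize.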
    

Activating Two Farms Simultaneously Fails (4989529)

When two farms are created simultaneously on a newly installed data center, one of the farms sometimes fails with the following error message:


[MSG8300 ] Sql Error::ORA-00955: name is already used by an existing object

Workaround: Resubmit the farm from the Control Center.

farm -Lt Does not Tail the Log (4997346)

In an N1 Provisioning Server 3.1, Blades Edition installation running the PostgreSQL database, the command farm -Lt farm-id sometimes stops printing the log messages.

Workaround: Kill the log tail process and rerun it.

Minor Faults Reported by SC Causes N1 Provisioning Server to Classify Blades as Failed/Unusable (4998378)

During installation when pestest is run or during runtime when farm activation is taking place, you might see the following message on your screen or in the debug log:


device-id: test FAILED: Reason was:  - Cannot save state information for device-id: Blade Sn seems to be faulty

Workaround: To prevent problems with later farm activation, you must do one of the following:

Snapshot Failure Leaves Information in dhcpd.conf Configuration File (4998415)

After a snapshot failure, farm activation or update fails.

Workaround: Remove the farm's configuration information from the image-copy-subnet section of the /etc/dhcpd.conf file. Then reboot the server and reactivate the farm to restore the state that existed before the snapshot.

shelfsync Command Wants to Add a B10n Blade That Already Exists (5006442)

A Sun Fire B10n blade can be part of a high-availability load-balancing pair. In other words, the device is the child of a logical device whose type is a subtype of device type halb. If you run the shelfsync command on the shelf that contains this blade, the shelfsync command reports the device as a newly discovered device. If you then choose to add this new device, the command reports a message while adding the device to the database. The message tells you that a device with the same MAC address is present in the database.

Workaround: Ignore the message.

Product Does Not Clearly Support FTP Images and Image Servers (5003423)

The N1 Provisioning Server 3.1, Blades Edition product has disabled support for FTP images and FTP image servers due to the need to support flash and JumpStart images.

Workaround: You can enable FTP support. However, be aware of the following caveats:

To Enable FTP in an N1 Datacenter

Steps
  1. Make sure that you accept the caveats listed above.

  2. To determine the device ID of the image server, type the following command:


    # /opt/terraspring/sbin/device -Lr is
    

    In the example shown below, the device id is 3001:


    # /opt/terraspring/sbin/device -Lr is
     DEVICE_ID  PARENT_ID STATUS   FARM_ID    TYPE
          3001          - USED     -          cpu:sun-svr-420R-idb (Sun 420R)
    1 devices found.
  3. To verify the current protocol being used by the image server, type the following command:


    # /opt/terraspring/sbin/device -lv image-server-device-id
    

    In the example shown below, the protocol is nfs:


    # /opt/terraspring/sbin/device -lv 3001
    Device ID: 3001, state: USED, owner: -, type: cpu:sun-svr-420R-idb (Sun 420R)
      Device Attributes:
        make:           Sun
        name:           ps1
        imsvsize:       67372343296
        halclass:       com.terraspring.drivers.sun.SunSysKonnect
        nicvips:        1000
        role:           ispdb
        model:          420R
        basepath:       /images
        compressionratio:8
        protocol:       nfs
    ...
  4. To change the protocol attribute to FTP, type the following command:


    # /opt/terraspring/sbin/device -sA protocol ftp image-server-device-id
    
  5. Determine the username and password that will be used to connect to the image server via FTP.

    You may create a new username and password for this purpose.

    In the following example, the username is set to n1psftpu and the password is set to n1psftpp.


    # useradd n1psftpu
    # passwd n1psftpu
    New Password:
    Re-enter new Password:
    passwd: password successfully changed for n1psftpu
  6. To encrypt the password, type the following command:


    # /opt/terraspring/sbin/encrypter password
    

    Note the output in the following example.


    # encrypter n1psftpp
    ptMSB/T9fNm8Borrjxl/gw==
  7. To add the ftp_user and ftp_password attributes to the image server device in the database, type the following command once for each attribute:


    # /opt/terraspring/sbin/device -sA attribute-name attribute-value image-server-device-id
    

    Note that the encrypted password must be used as the value for the ftp_password attribute, as shown in the following example.


    # /opt/terraspring/sbin/device -sA ftp_user n1psftpu 3001
    # /opt/terraspring/sbin/device -sA ftp_password 'ptMSB/T9fNm8Borrjxl/gw==' 3001
    

    Tip –

    To verify your changes, type the following command:


    # /opt/terraspring/sbin/device -lv image-server-device-id
    

  8. To determine the list of disk images and other images, type the following command:


    # /opt/terraspring/sbin/image -l
    

    The following example shows two images.


    # /opt/terraspring/sbin/image -l
    IMAGE_ID IMAGE_NAME               CUSTOMER         SIZE       OS           TYPE            \
    STATE     LOCATION
    1        rh-linux-i86pc-disk-img  __grid__         30000000000 linux       disk_image      \
    READY     nfs://3001//images/master-images/rh-linux-i86pc-disk-img
    6        solaris9u5-i86pc-flash   __grid__         1500000000 solaris      flash           \
    READY     nfs://3001//images/master-images/solaris9u5-i86pc-flash
  9. For each disk image, convert the protocol in the URLs to FTP.

    Follow these steps:

    1. To ensure that the image file is not deleted, rename the image file on the image server to a temporary name.


      # mv /images/master-images/rh-linux-i86pc-disk-img   \
      /images/master-images/rh-linux-i86pc-disk-img.bak
      
    2. To delete the NFS URL from the image information in the database, type the command /opt/terraspring/sbin/image -dL nfs-url image-id.


      # /opt/terraspring/sbin/image -dL   \
      nfs://3001//images/master-images/rh-linux-i86pc-disk-img 1
          Image id is: 1
          Delete URL nfs://3001//images/master-images/rh-linux-i86pc-disk-img for this image (y/n)? y
          Deleting image content at: nfs://3001//images/master-images/rh-linux-i86pc-disk-img   \
      size: 1532913330   ip: 10.52.53.1   State: done
          Deleted locator URL: nfs://3001//images/master-images/rh-linux-i86pc-disk-img
    3. Rename the image back to the original name on the image server.


      # mv /images/master-images/rh-linux-i86pc-disk-img.bak   \
      /images/master-images/rh-linux-i86pc-disk-img
      
    4. To add the FTP URL to the image database, type the command /opt/terraspring/sbin/image -uL ftp-url image-id.


      Note –

      The FTP URL is the same URL as the NFS URL, except for the protocol part, which is modified to ftp.



      # /opt/terraspring/sbin/image -uL \
      ftp://3001//images/master-images/rh-linux-i86pc-disk-img 1
          Updated image: 1
    5. To update the state of the FTP URL, type the command /opt/terraspring/sbin/imagesync --nosync image-id.


      # /opt/terraspring/sbin/imagesync --nosync 1
          Image 1  forcibly marked as synchronized
    6. Type the following command to verify that the protocol in the URL has indeed been changed to ftp:


      # /opt/terraspring/sbin/image -lv image-id
      

      For example:


      # /opt/terraspring/sbin/image -lv 1
      IMAGE_ID IMAGE_NAME               CUSTOMER         SIZE       OS           TYPE            \
      STATE     LOCATION
      1        rh-linux-i86pc-disk-img  __grid__         30000000000 linux       disk_image      \
      READY     ftp://3001//images/master-images/rh-linux-i86pc-disk-img
      
      Description:   RedHat Linux 2.1 AS, disk image, with snet NIC
      Architecture:  i86pc
      Last Updated:  2004-02-12 23:19:01.0
      
      Image Locations:
          ID    STATE     SIZE             LOCATION
          26    done      1532913330       ftp://3001//images/master-images/rh-linux-i86pc-disk-img
  10. For each flash or JumpStart image, type the following command to delete the image:


    # /opt/terraspring/sbin/image -d image-id
    

    Note –

    Before you delete the flash or JumpStart images, ensure that none of the images are in use as explained in image Command Does not Check Whether an Image is In Use (4892852 and 5002045). If an image is in use, deactivate and delete any farms that are using the image before you delete the image. If you decide not to do so, please note that future snapshots of the server disks on which these images have been deployed must be taken as disk_image even if the Control Center seems to allow a flash snapshot. See the caveats that precede this task.



    # /opt/terraspring/sbin/image -d 6
    Delete Image 6 (y/n)? y
    Queueing request to delete image ...
    Request (id: 74) submitted.
    Waiting for request 74 to complete...
    .
    Deleting image content at: nfs://3001//images/master-images/solaris9u5-i86pc-flash   
    size: 647191212   ip: 10.52.53.1   State: done

    The FTP protocol now is enabled for both provisioning and taking snapshots of images in the datacenter.
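
For step 9, the FTP URL that is passed to image -uL differs from the NFS URL only in its protocol prefix. The rewrite can be sketched as follows. The function name is hypothetical, and the sketch only builds the URL string. It does not contact the image server or the database.

```shell
# Rewrite an nfs:// image locator URL as the matching ftp:// URL.
# Only the protocol prefix changes; the rest of the URL is kept as-is.
nfs_to_ftp_url() {
    printf '%s\n' "$1" | sed 's|^nfs://|ftp://|'
}

# Example (URL taken from the image listing in step 8):
# nfs_to_ftp_url nfs://3001//images/master-images/rh-linux-i86pc-disk-img
```

A URL that does not start with nfs:// is printed unchanged, so the helper can be run safely over a list that has already been partly converted.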