N1 Provisioning Server 3.1, Blades Edition, Release Notes

Server Administration Issues

This section describes known issues that are associated with administering the N1 Provisioning Server 3.1, Blades Edition server.

Failures During Disk Copy Operations (4849694)

When a farm goes into standby mode, disk copy operations for all servers belonging to that farm start simultaneously. However, when a failure occurs during the first disk copy operation (for example, because of insufficient disk space), the other disk copy operations continue until all are completed, and only then is the failure reported.

Workaround: None.

“No More Resources” Exception When No Servers Available in Control Plane Database (4849699)

When no more provisionable servers are available in the Control Plane database, you will not be able to move a server from one server group to another in a single farm operation.

Workaround: Remove the server from the first server group and update the farm. Then add the server to the second server group and update the farm.


Note –

The server might be provisioned by another farm between removing it from the first server group and adding it to the second. If that happens, a “no more resources” exception occurs.


mls Command Does not Show Accurate Status (4849719)

An mls -a command might inaccurately report the agent on a server as being marked DOWN.

Workaround: Wait 60 seconds and try again to confirm the status of the node. The normal monitoring of the node by the control plane server is not affected by this condition, and monitoring will accurately report a failed node.

Moving an Unmanaged Device Into Farm VLAN Fails (4856867)

Typically, after bench configuration, the ports on the shelf are in trunk mode. This configuration prevents you from moving an unmanaged device into a farm VLAN.

Workaround: Change the unmanaged device port from trunk mode to hybrid mode. Then, add the unmanaged device to the VLAN.

Deleting Active Requests Before Stopping the Control Plane Server (4856872)

Make sure that there are no active requests in the system before stopping the control plane server. If the control plane server fails, existing Farm Manager processes sometimes do not exit gracefully. Before restarting the control plane server, stop all remaining Farm Manager processes by performing the following steps:

  1. Check the existence of any Farm Manager processes by executing the following command:


    /usr/ucb/ps -auxwww | grep -i "com.terraspring.cs.fm"
    
  2. Use the UNIX kill command to stop any remaining Farm Manager processes.
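
The two steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the product: it assumes a POSIX-compatible shell and that the process ID is the second column of the /usr/ucb/ps -auxwww output, as is usual for BSD-style ps. The function name fm_pids is hypothetical. It only prints the process IDs so that you can review them before passing them to kill.

```shell
#!/bin/sh
# fm_pids (hypothetical helper): read BSD-style ps output on stdin and
# print the PID (assumed to be column 2) of every line that mentions the
# Farm Manager class prefix. Lines from the grep command itself are skipped.
fm_pids() {
    grep -i 'com\.terraspring\.cs\.fm' | grep -v 'grep' | awk '{ print $2 }'
}

# Typical use (not run here): list the PIDs, review them, then stop each one.
#   /usr/ucb/ps -auxwww | fm_pids
#   kill <pid>
```

Printing the list first, instead of killing directly, leaves room to confirm that no unrelated process matched the pattern.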

Power Command Issues (4857749)

The power command with the -off option is similar to the UNIX command poweroff, which powers off the device on which it is issued. When using the power command with the -off option, be sure to have a blank space between power and -off. Otherwise, you will power off the control plane server.

Changing Power State of Provisionable Servers Used in an Existing Farm (4919199)

Do not independently power on or off provisionable servers in an existing farm.

Gigabit Ethernet Card Instance Assignment (4924060)

The gigabit Ethernet card of the Provisioning Server machine must be assigned an instance of 0.

Load Balancer Configuration Does Not Get Updated When Only the Balancing Policy has Changed (4998087)

If you change the balancing policy of an active farm (for example, from round-robin to wt-round-robin), the farm goes through the update process. However, after the process is completed, the load balancer configuration still shows the initial policy (for example, round-robin).

Workaround: If you have already defined virtual IPs and you want to change their policies, follow these steps:

  1. Delete the virtual IPs whose policy you want to change, and submit the request.

  2. Change the policy to the newly desired policy.

  3. Re-create the virtual IPs, and submit the request.

Load Balancer Configuration Failed Trying to Configure Nonexistent Second Switch (eth1) (4998088)

If a load balancer is in a chassis that has only one switch (ssc0) installed, the software still expects to configure the second switch (eth1).

Workaround: For this release, load balancing is supported only in dual-switch shelves.

Error Message When Updating Locator URL (5002040)

If you try to update the locator URL of an existing snapshot image using the command image -u -l, you see an error message. The error message differs based on whether the database is Oracle or PostgreSQL.

For Oracle, you see the following message:


Locator URL 'nfs://3001//images/master-images/solaris9u5-i86pc-flash' already exists!

For PostgreSQL, you see the following message:


ERROR: duplicate key violates unique constraint "imglocator_unique"

Workaround: None. This is a cosmetic bug and will likely be fixed in a later release.

Error When Replacing a Device With a Different Type but the Same Name (5002041)

You defined a device with a specific type in an active server farm (for example, an x86 farm with the name “Server1”). You want to change the type (for example, to SPARC), but keep the name. If you fail to commit the delete request before you add the new device, the add request fails with a unique constraint exception, stating that there is another device in the farm that has the same name.

Workaround: You have to do the removal and addition in two separate updates. Follow these steps:

  1. Activate a one-server x86 farm from the Control Center (CC), for example, with the name "Server1" and with eth0 connected to a subnet.

  2. After the farm is activated, log onto the CC and delete the x86 server.

  3. Submit the farm to commit this change.

  4. Add a SPARC Solaris server that has the same name ("Server1").

  5. Commit the change (send the update request) from the CC.

Using backupdb Creates Invalid Data for PostgreSQL Database restoredb (5002042)

When using the PostgreSQL database, backupdb generates invalid backup data. As a result, restoredb fails because of the invalid data.

Workaround: None.

image Command Does not Check Whether an Image is In Use (4892852 and 5002045)

The image command allows you to delete an image (image -d) or modify an image (image -u) even if the image is in use. However, synchronizing this change with the Control Center will fail if the image is in use.

Workaround 1: Use the Control Center to delete or modify an image.

Workaround 2: Verify that the image is not in use before you use the image command. To verify whether an image is in use, follow these steps:

  1. Type the following command to get a list of active farms: farm -l

  2. For each farm that is in a state other than CREATED, type the following command to determine what images it uses: lr -lv fmid | grep Image

    where fmid is the farm identifier provided in step 1.

  3. For each farm that is in the CREATED state, type the following command to obtain its FML and save it to a temporary file: farm -lv fmid > /tmp/fmlfmid

    Look through the temporary file and search for all occurrences of the string <diskimage. An image ID will be on the next line as shown below:

    <disk id="10" location="internal" name="Disk B" size="30000000000" type="local">
    <diskimage type="system">
     6
    </diskimage>
    <client-info id="11" object-id="10">

The unique image IDs collected in steps 2 and 3 identify the images that are currently in use.
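
The search described in step 3 can be sketched with a short awk filter. This is a minimal sketch under stated assumptions: the image ID is taken to be the first non-blank token on the line that follows a line containing <diskimage, exactly as in the excerpt above, and the function name fml_image_ids is hypothetical.

```shell
#!/bin/sh
# fml_image_ids (hypothetical helper): read farm FML on stdin and print
# each image ID once, in ascending numeric order.
# Assumption: the ID is on the first non-blank line after "<diskimage".
fml_image_ids() {
    awk '
        grab && NF   { print $1; grab = 0 }  # line after <diskimage holds the ID
        /<diskimage/ { grab = 1 }            # closing </diskimage> does not match
    ' | sort -un
}

# Typical use (not run here), with fmid as the farm identifier:
#   farm -lv fmid | fml_image_ids
```

Running the filter once per farm in the CREATED state and merging the results with the IDs from step 2 gives the full in-use list.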

Using image -d to Delete a Nonexisting Image Causes Java Exception (5002046)

When you use the command image -d and the image ID that you provide does not exist, a Java exception occurs.

Workaround: Run the command again with the correct image ID.

Upgrade Causes incompatible class Exceptions (5002048)

After upgrading from N1 Provisioning Server 3.0 Blades Edition, Update 1, to N1 Provisioning Server 3.1, Blades Edition, running the command request -lv causes a runtime exception for some of the requests that were created in the previous version of the product.

Workaround: None. You will not be able to view request details of certain requests that were filed before the upgrade.

Using Image Wizard to Delete an Account Image Causes Java Exception (5002051)

When you use the image wizard to delete an account image, a Java exception occurs.

Workaround: Use the following command to delete the account image: image -d image-id

Image Wizard Stops Executing Due to Queued replacePhysicalDevice Request (5002052)

When you follow the instructions given by the image wizard to shut down your server, a replacePhysicalDevice request might be QUEUED. The image wizard does not tell you to delete that request. If you do not delete the QUEUED request, you cannot continue with the image process because the replacePhysicalDevice request will block the snapshot request from executing.

Workaround: Delete the replacePhysicalDevice request.

Image Command Fails to Create New Image with “Insufficient Disk Space” Error Although Space is Available (4989527)

The image server size is determined during installation by querying the file system. The size is maintained in the database as an attribute of the image server device. This value is static and does not reflect changes that are made to the file system outside of the scope of N1 Provisioning Server. The following changes are outside of the scope:

In both cases, the symptom might not reflect the cause. The error that you see might not be clear enough to determine that the problem is due to incorrect image server size in the database. The error might be buried in the /var/adm/tspr.debug log file.

Workaround: If you see an unexplained snapshot error, follow these steps to determine whether the cause of the problem is a size inconsistency between the database and the actual file system:

  1. Determine the device ID of the image server using the following command:


    # /opt/terraspring/sbin/device -Lr
    
  2. Determine the image repository size in the N1 Provisioning Server database using the following command:


    # /opt/terraspring/sbin/device -lv device-id | grep imsvsize
    

    where device-id is the ID that you determined in the previous step.

  3. Determine the total size of all the images that are known to the N1 Provisioning Server repository.

    1. To get a verbose listing of all the images, type the following command:


      # image -lv > tmpfile
      
    2. Look through the tmpfile and note all the size values in the “Image Locations” section for each image.

    3. Add all the values in the previous step to arrive at the total size of all the images that are known to the repository.

  4. Subtract the total image size (step 3) from the repository size (step 2) to determine the total available space in the image server as perceived by the N1 Provisioning Server software.

  5. Determine the size of the actual filesystem using the following command:


    # df -k path-to-images-filesystem
    
  6. Determine the available space in bytes for the actual filesystem by multiplying the value under “avail” in the df output by 1024.

If the value from step 4 (the perceived space) differs from the value in step 6 (the actual space), a size inconsistency exists. To resolve this inconsistency, follow these steps:

  1. Add the actual available size (from step 6) and the total size of images (from step 3c). This total provides the new value for the imsvsize attribute in the N1 Provisioning Server repository.

  2. Update the imsvsize attribute in the N1 Provisioning Server database with the new value from the previous step using the following command:


    # device -sA imsvsize new-imsvsize-value device-id
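
The arithmetic in the steps above can be sketched as follows. This is a worked sketch, not a product tool: the function names are hypothetical, and it assumes a POSIX shell and an awk that supports the -v option (nawk on Solaris). The imsvsize value and the two image sizes are taken from example output elsewhere in these notes. The df value of 50000000 KB is invented for illustration.

```shell
#!/bin/sh
# All helpers are hypothetical. Sizes are in bytes unless noted.
# awk does the arithmetic so that values above 32 bits are handled.

# Step 4: space that the software believes is free.
perceived_free() {   # args: imsvsize total-image-bytes
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a - b }'
}

# Step 6: space that is really free, from the "avail" column of df -k.
actual_free() {      # args: avail-kb
    awk -v k="$1" 'BEGIN { printf "%.0f\n", k * 1024 }'
}

# Resolution step 1: new value for the imsvsize attribute.
new_imsvsize() {     # args: avail-kb total-image-bytes
    awk -v k="$1" -v t="$2" 'BEGIN { printf "%.0f\n", k * 1024 + t }'
}

imsvsize=67372343296        # from the device -lv example
total_images=2180104542     # 1532913330 + 647191212
avail_kb=50000000           # invented df -k "avail" value

perceived_free "$imsvsize" "$total_images"   # prints 65192238754
actual_free "$avail_kb"                      # prints 51200000000
new_imsvsize "$avail_kb" "$total_images"     # prints 53380104542
```

Because the perceived and actual values differ in this example, the last number is the one to pass to device -sA imsvsize.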
    

Activating Two Farms Simultaneously Fails (4989529)

When two farms are created simultaneously on a newly installed data center, one of the farms sometimes fails with the following error message:


[MSG8300 ] Sql Error::ORA-00955: name is already used by an existing object

Workaround: Resubmit the farm from the Control Center.

farm -Lt Does not Tail the Log (4997346)

In an N1 Provisioning Server 3.1, Blades Edition installation running the PostgreSQL database, the command farm -Lt farm-id sometimes stops printing the log messages.

Workaround: Kill the log tail process and rerun it.

Minor Faults Reported by SC Cause N1 Provisioning Server to Classify Blades as Failed/Unusable (4998378)

During installation when pestest is run or during runtime when farm activation is taking place, you might see the following message on your screen or in the debug log:


device-id: test FAILED: Reason was:  - Cannot save state information for device-id: Blade Sn seems to be faulty

Workaround: To prevent problems with later farm activation, you must do one of the following:

Snapshot Failure Leaves Information in dhcpd.conf Configuration File (4998415)

After a snapshot failure, farm activation or update fails.

Workaround: Remove the farm's configuration information from the image-copy-subnet section of the /etc/dhcpd.conf file. Then reboot the server and reactivate the farm to restore the state that existed before the snapshot.

shelfsync Command Wants to Add a B10n Blade That Already Exists (5006442)

A Sun Fire B10n blade can be part of a high-availability load-balancing pair. In other words, the device is the child of a logical device whose type is a subtype of the device type halb. If you run the shelfsync command on the shelf that contains this blade, the shelfsync command reports the device as a newly discovered device. If you then choose to add this new device, the command reports a message while adding the device to the database. The message tells you that a device with the same MAC address is present in the database.

Workaround: Ignore the message.

Product Does Not Clearly Support FTP Images and Image Servers (5003423)

The N1 Provisioning Server 3.1, Blades Edition product has disabled support for FTP images and FTP image servers due to the need to support flash and JumpStart images.

Workaround: You can enable FTP support. However, be aware of the following caveats:

Procedure: To Enable FTP in an N1 Datacenter

Steps
  1. Make sure that you accept the caveats listed above.

  2. To determine the device ID of the image server, type the following command:


    # /opt/terraspring/sbin/device -Lr is
    

    In the example shown below, the device id is 3001:


    # /opt/terraspring/sbin/device -Lr is
     DEVICE_ID  PARENT_ID STATUS   FARM_ID    TYPE
          3001          - USED     -          cpu:sun-svr-420R-idb (Sun 420R)
    1 devices found.
  3. To verify the current protocol being used by the image server, type the following command:


    # /opt/terraspring/sbin/device -lv image-server-device-id
    

    In the example shown below, the protocol is nfs:


    # /opt/terraspring/sbin/device -lv 3001
    Device ID: 3001, state: USED, owner: -, type: cpu:sun-svr-420R-idb (Sun 420R)
      Device Attributes:
        make:           Sun
        name:           ps1
        imsvsize:       67372343296
        halclass:       com.terraspring.drivers.sun.SunSysKonnect
        nicvips:        1000
        role:           ispdb
        model:          420R
        basepath:       /images
        compressionratio:8
        protocol:       nfs
    ...
  4. To change the protocol attribute to FTP, type the following command:


    # /opt/terraspring/sbin/device -sA protocol ftp image-server-device-id
    
  5. Determine the username and password that will be used to connect to the image server via FTP.

    You may create a new username and password for this purpose.

    In the following example, the username is set to n1psftpu and the password is set to n1psftpp.


    # useradd n1psftpu
    # passwd n1psftpu
    New Password:
    Re-enter new Password:
    passwd: password successfully changed for n1psftpu
  6. To encrypt the password, type the following command:


    # /opt/terraspring/sbin/encrypter password
    

    Note the output in the following example.


    # encrypter n1psftpp
    ptMSB/T9fNm8Borrjxl/gw==
  7. To add the ftp_user and ftp_password attributes to the image server device in the database, type the following command once for each attribute:


    # /opt/terraspring/sbin/device -sA attribute-name attribute-value image-server-device-id
    

    Note that the encrypted password must be used as the value for the ftp_password attribute, as shown in the following example.


    # /opt/terraspring/sbin/device -sA ftp_user n1psftpu 3001
    # /opt/terraspring/sbin/device -sA ftp_password 'ptMSB/T9fNm8Borrjxl/gw==' 3001
    

    Tip –

    To verify your changes, type the following command:


    # /opt/terraspring/sbin/device -lv image-server-device-id
    

  8. To determine the list of disk images and other images, type the following command:


    # /opt/terraspring/sbin/image -l
    

    The following example shows two images.


    # /opt/terraspring/sbin/image -l
    IMAGE_ID IMAGE_NAME               CUSTOMER         SIZE       OS           TYPE            \
    STATE     LOCATION
    1        rh-linux-i86pc-disk-img  __grid__         30000000000 linux       disk_image      \
    READY     nfs://3001//images/master-images/rh-linux-i86pc-disk-img
    6        solaris9u5-i86pc-flash   __grid__         1500000000 solaris      flash           \
    READY     nfs://3001//images/master-images/solaris9u5-i86pc-flash
  9. For each disk image, convert the protocol in the URLs to FTP.

    Follow these steps:

    1. To ensure that the image file is not deleted, rename the image file on the image server to a temporary name on the image server.


      # mv /images/master-images/rh-linux-i86pc-disk-img   \
      /images/master-images/rh-linux-i86pc-disk-img.bak
      
    2. To delete the NFS URL from the image information in the database, type the command /opt/terraspring/sbin/image -dL nfs-url image-id.


      # /opt/terraspring/sbin/image -dL   \
      nfs://3001//images/master-images/rh-linux-i86pc-disk-img 1
          Image id is: 1
          Delete URL nfs://3001//images/master-images/rh-linux-i86pc-disk-img for this image (y/n)? y
          Deleting image content at: nfs://3001//images/master-images/rh-linux-i86pc-disk-img   \
      size: 1532913330   ip: 10.52.53.1   State: done
          Deleted locator URL: nfs://3001//images/master-images/rh-linux-i86pc-disk-img
    3. Rename the image back to the original name on the image server.


      # mv /images/master-images/rh-linux-i86pc-disk-img.bak   \
      /images/master-images/rh-linux-i86pc-disk-img
      
    4. To add the FTP URL to the image database, type the command /opt/terraspring/sbin/image -uL ftp-url image-id.


      Note –

      The FTP URL is the same URL as the NFS URL, except for the protocol part, which is modified to ftp.



      # /opt/terraspring/sbin/image -uL \
      ftp://3001//images/master-images/rh-linux-i86pc-disk-img 1
          Updated image: 1
    5. To update the state of the FTP URL, type the command /opt/terraspring/sbin/imagesync --nosync image-id.


      # /opt/terraspring/sbin/imagesync --nosync 1
          Image 1  forcibly marked as synchronized
    6. Type the following command to verify that the protocol in the URL has indeed been changed to ftp:


      # /opt/terraspring/sbin/image -lv image-id
      

      For example:


      # /opt/terraspring/sbin/image -lv 1
      IMAGE_ID IMAGE_NAME               CUSTOMER         SIZE       OS           TYPE            \
      STATE     LOCATION
      1        rh-linux-i86pc-disk-img  __grid__         30000000000 linux       disk_image      \
      READY     ftp://3001//images/master-images/rh-linux-i86pc-disk-img
      
      Description:   RedHat Linux 2.1 AS, disk image, with snet NIC
      Architecture:  i86pc
      Last Updated:  2004-02-12 23:19:01.0
      
      Image Locations:
          ID    STATE     SIZE             LOCATION
          26    done      1532913330       ftp://3001//images/master-images/rh-linux-i86pc-disk-img
  10. For each flash or JumpStart image, type the following command to delete the image:


    # /opt/terraspring/sbin/image -d image-id
    

    Note –

    Before you delete the flash or JumpStart images, ensure that none of the images are in use, as explained in “image Command Does not Check Whether an Image is In Use (4892852 and 5002045)”. If an image is in use, deactivate and delete any farms that are using the image before you delete the image. If you decide not to do so, please note that future snapshots of the server disks on which these images have been deployed must be taken as disk_image even if the Control Center seems to allow a flash snapshot. See the caveats that precede this task.



    # /opt/terraspring/sbin/image -d 6
    Delete Image 6 (y/n)? y
    Queueing request to delete image ...
    Request (id: 74) submitted.
    Waiting for request 74 to complete...
    .
    Deleting image content at: nfs://3001//images/master-images/solaris9u5-i86pc-flash   
    size: 647191212   ip: 10.52.53.1   State: done

    The FTP protocol now is enabled for both provisioning and taking snapshots of images in the datacenter.
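
The per-image conversion in step 9 follows the same pattern for every disk image, so it can be sketched as a dry-run helper that prints the commands instead of running them. This is a minimal sketch with hypothetical function names. It assumes a POSIX shell, and it assumes that the file path on the image server is whatever follows protocol://device-id/ in the locator URL, which matches the example URLs in this procedure but is not otherwise confirmed.

```shell
#!/bin/sh
# nfs_to_ftp_url (hypothetical): change only the protocol part of the URL.
nfs_to_ftp_url() {
    printf '%s\n' "$1" | sed 's|^nfs://|ftp://|'
}

# url_to_path (hypothetical): derive the image file path from a locator URL.
# Assumption: nfs://3001//images/... maps to /images/... on the image server.
url_to_path() {
    printf '%s\n' "$1" | sed 's|^[a-z]*://[^/]*/||'
}

# print_conversion_plan (hypothetical): print, but do not run, the five
# commands of step 9 for one disk image. Arguments: nfs-url image-id
print_conversion_plan() {
    nfs_url=$1
    image_id=$2
    ftp_url=$(nfs_to_ftp_url "$nfs_url")
    img_path=$(url_to_path "$nfs_url")
    sbin=/opt/terraspring/sbin
    echo "mv ${img_path} ${img_path}.bak"
    echo "${sbin}/image -dL ${nfs_url} ${image_id}"
    echo "mv ${img_path}.bak ${img_path}"
    echo "${sbin}/image -uL ${ftp_url} ${image_id}"
    echo "${sbin}/imagesync --nosync ${image_id}"
}

print_conversion_plan nfs://3001//images/master-images/rh-linux-i86pc-disk-img 1
```

The plan is printed for review because image -dL asks for confirmation interactively, and because the rename steps must succeed before the URL is deleted.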