This chapter describes known issues in the Sun N1™ System Manager 1.1 release.
This section lists the features and software that are not supported in the Sun N1 System Manager 1.1 release.
The N1 System Manager product is localized only for Japanese. The browser interface and command line help are in English only.
The following OSs are not currently supported by N1 System Manager on provisionable servers, even though the create update command lists them as OS types:
Red Hat Enterprise Linux WS 3.0
Red Hat Enterprise Linux WS 3.0, 64-bit
Red Hat Enterprise Linux WS 4.0
Red Hat Enterprise Linux WS 4.0, 64-bit
Each type of job in the N1 System Manager has an associated weight, which reflects the load that the job places on system resources. There is also a global limit on the total load that can be placed on the system. The following table lists the weight for each type of user-level job. The maximum load permitted is 1000.
Table 1–1 Job Weight Values
Task | Weight
---|---
OS Deployment | 500
Package Deployment | 500
Package Uninstall | 500
Discovery | 200
Firmware Deployment | 500
Remote Command Execution | 200
Job Deletion | 400
Create OS | 1000
Reset Server | 200
Server Power Off | 200
Server Power On | 200
Server Refresh | 200
Set Server Feature | 200
Remove Server | 100
Add Server | 100
The total load is the sum of the loads of all the current running jobs. The system will compare the current total load with the maximum permitted load at the following points in time:
After enqueueing a new job
After completion or stopping a running job
If the difference between the maximum permitted load and the current total load is large enough to accommodate the job at the head of the job queue, that job is promoted to the running state; otherwise, it remains queued. The current total load therefore governs which mix of jobs can run concurrently within the system.
For example, there can only be two OS Deployment jobs running at one time (500 + 500 <= 1000) or one OS Deployment job and two Server Power Off jobs (500 + 200 + 200 <= 1000).
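The admission check described above can be sketched in a few lines of shell, using the weights from Table 1–1. The variable names and the sample load are illustrative only, not part of the product:

```shell
# Sketch of the job admission check, assuming the weights in Table 1-1.
MAX_LOAD=1000          # maximum total load permitted by the system
current_load=700       # e.g. one OS Deployment (500) + one Discovery (200)
next_job_weight=500    # an OS Deployment job at the head of the queue

# A queued job is promoted to running only if it fits under the maximum load.
if [ $((current_load + next_job_weight)) -le "$MAX_LOAD" ]; then
  decision=running
else
  decision=queued
fi
echo "$decision"       # 700 + 500 = 1200 > 1000, so the job stays queued
```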
The following procedure describes how to copy the SLES 9 SP1 OS distribution to the management server.
Substitute directory with the name of the directory in which you have stored the SLES9 SP1 ISO images.
Copy the SLES 9 distribution:
# n1sh create os sles9u1 file /directory/SLES-9-i386-RC5-CD1.iso,/directory/SLES-9-i386-RC5-CD2.iso,/directory/SLES-9-i386-RC5-CD3.iso,/directory/SLES-9-i386-RC5-CD4.iso,/directory/SLES-9-i386-RC5-CD5.iso,/directory/SLES-9-i386-RC5-CD6.iso
Wait for the Create OS command to complete before going to the next step.
Copy the SLES 9 Update 1 distribution:
# n1sh create os sles9u1 file directory/SLES-9-SP-1-i386-RC5-CD1.iso,directory/SLES-9-SP-1-i386-RC5-CD2.iso,directory/SLES-9-SP-1-i386-RC5-CD3.iso
This section describes known documentation errata.
The list of default distribution groups is incorrect. For a Solaris profile, it should be the Entire Distribution plus OEM Support instead of Core. For a Red Hat profile, it should be Everything instead of Base.
An OS distribution can be deleted even if it is currently deployed on a provisionable server. However, an OS distribution cannot be deleted until all of its associated OS profiles are deleted.
An OS profile can be deleted even if it is currently deployed on a provisionable server.
The type value for SUSE LINUX Enterprise Server 9 is invalid. It should be suse instead of sles9.
This section describes known documentation errata in the Sun N1 System Manager 1.1 Site Preparation Guide.
The section Management Server Connections in Sun N1 System Manager 1.1 Site Preparation Guide, including the note at the end of that section, is applicable only to the first reference configuration, Separate Management, Provisioning, and Data Networks in Sun N1 System Manager 1.1 Site Preparation Guide. Refer to the other reference configurations for connection information.
The caution statement in Combined Provisioning and Data Network, and a Separate Management Network in Sun N1 System Manager 1.1 Site Preparation Guide and Combined Provisioning, Data, and Management Network in Sun N1 System Manager 1.1 Site Preparation Guide should read:
The N1 System Manager DHCP service must be the only DHCP service on the data network.
This section describes known documentation errata in the Sun N1 System Manager 1.1 Administration Guide.
The procedure To Restore the N1 System Manager Database and Configuration Files in Sun N1 System Manager 1.1 Administration Guide does not make it clear that you must first install an operating system and the N1 System Manager software on the replacement management server before starting the procedure. See Chapter 3, Installing and Configuring an OS on the Management Server, in Sun N1 System Manager 1.1 Site Preparation Guide, and the Sun N1 System Manager 1.1 Installation and Configuration Guide for details.
Table 5–3, Factory-Configured Default Polling Intervals in section Changing Polling Intervals With the Monitoring Configuration File in Sun N1 System Manager 1.1 Administration Guide lists the factory-default polling intervals as follows:
Hardware health | 120 seconds
OS resources | 120 seconds
Network reachability | 60 seconds
The correct factory-default polling intervals are 600 seconds in each case:
Hardware health | 600 seconds
OS resources | 600 seconds
Network reachability | 600 seconds
This section describes known N1 System Manager installation issues.
If the N1 System Manager installation is interrupted and restarted, it can fail in step 5, “Install OS provisioning components”. If this issue occurs, a subsequent uninstall and reinstall of the N1 System Manager will also fail.
The installation log file /var/tmp/installer.log.latest shows the following after initial installation failure:
Installing Master Server ...
Error! Missing file (looked for /opt/SUNWn1sps/N1_Grid_Service_Provisioning_System_5.1/server/postgres/postgresql.conf.in)!
print() on closed filehandle GEN0 at /usr/perl5/5.8.4/lib/i86pc-solaris-64int/IO/Handle.pm line 399.
SPS install failed with exit status: 256
-----------------------------
2k. Which port should Postgres listen on? (default: 5434) [1024-65535]
spawn id(3) is not a tty. Not changing mode at /usr/perl5/site_perl/5.8.4/Expect.pm line 375.
admin
admin
admin
** Invalid Input. Enter a numeric value for the port number.
2k. Which port should Postgres listen on? (default: 5434) [1024-65535]
spawn id(3) is not a tty. Not changing mode at /usr/perl5/site_perl/5.8.4/Expect.pm line 375.
admin
admin
admin
** Invalid Input. Enter a numeric value for the port number.
2k. Which port should Postgres listen on? (default: 5434) [1024-65535
The installation log shows the following after uninstall and reinstall of the N1 System Manager software:
Error! Failed to initialize the database (exit value was 1). Exiting..
print() on closed filehandle GEN0 at /usr/lib/perl5/5.8.0/i386-linux-thread-multi/IO/Handle.pm line 395.
SPS install failed with exit status: 256
Workaround: Perform the workaround procedure below that is applicable to the operating system installed on your management server. Depending on how the installation error occurred, some of the workaround steps might not complete successfully. If a workaround step does not complete successfully, go to the next step.
Solaris based Sun Fire X4100 or X4200 management server:
Stop the server and agent.
# su - n1gsps -c "/opt/SUNWn1sps/N1_Service_Provisioning_System_5.1/server/bin/cr_server stop"
# su - n1gsps -c "/opt/SUNWn1sps/N1_Service_Provisioning_System/agent/bin/cr_agent stop"
Uninstall service provisioning manually.
# /opt/SUNWn1sps/N1_Service_Provisioning_System_5.1/cli/bin/cr_uninstall_cli.sh
# /opt/SUNWn1sps/N1_Service_Provisioning_System_5.1/server/bin/cr_uninstall_ms.sh
Remove the following packages.
SUNWspsc1
SUNWspsms
SUNWspscl
# pkgrm SUNWspsc1
# pkgrm SUNWspsms
# pkgrm SUNWspscl
Type y in response to prompts asking “Do you want to remove this package? [y,n,?,q]”. If the message pkgrm: ERROR: no package associated with SUNWspscl appears, that package was already removed by step 2. Continue removing the remaining packages.
Delete the service provisioning directory and files.
# cd /
# rm -rf /opt/SUNWn1sps/
# rm /n1gc-setup/sps/state
# rm /n1gc-setup/state/0installSPS.pl.state
Reboot the management server and then install the N1 System Manager software.
Linux based Sun Fire X4100 or X4200 management server:
Stop the server and agent.
# su - n1gsps -c "/opt/sun/N1_Service_Provisioning_System_5.1/server/bin/cr_server stop"
# su - n1gsps -c "/opt/sun/N1_Service_Provisioning_System/agent/bin/cr_agent stop"
Delete the service provisioning directory and files.
# cd /
# rm -rf /opt/sun/N1_Grid_Service_Provisioning_System_5.1
# rm -rf /opt/sun/N1_Grid_Service_Provisioning_System
# rm -rf /opt/sun/N1_Service_Provisioning_System
# rm -rf /opt/sun/N1_Service_Provisioning_System_5.1
# rm /n1gc-setup/sps/state
# rm /n1gc-setup/state/0installSPS.pl.state
Reboot the management server and then install the N1 System Manager software.
Using an external DVD-ROM drive to create an OS distribution can fail with a job timeout because of the additional network latency that the drive introduces.
Workaround: Use the n1smconfig command on the management server to increase the job timeout values to a worst-case value of three hours.
This section describes known security issues.
The event refresher frame reloads every 10 seconds, which updates the user's session timestamp. Therefore, the browser interface session will never time out.
Workaround: Explicitly log out when you are done using the browser interface.
This section describes known discovery issues.
The error message “Driver Not Found” is misleading when Sun Fire X4100 and Sun Fire X4200 servers fail discovery. The error occurs because the management server cannot create an SSH connection to the server.
Workaround: Check the SSH credentials by using ssh to access the system directly, and update the credentials accordingly.
Discovery of a Sun Fire V20z or V40z server will fail if the server's SSH credentials are configured but its IPMI credentials are not. This issue can occur as follows:
For all SP firmware revisions, the server was previously configured with SSH credentials, but the IPMI credentials were not configured.
For firmware revisions 2.3.0.11 and later, SSH and IPMI credentials were previously configured, but IPMI was manually disabled on the service processor using the ipmi disable channel lan command. This command unconfigures the IPMI credentials, which was not the case for previous firmware revisions.
The following error messages display when this issue occurs:
The Discovery job:
Errors
Results
Result 1:
Server: 10.0.3.12
Status: -3
Message: An exception occurred trying to access 10.0.3.12. Please refer to the log file for more information.
The syslog file:
Aug 25 17:43:26 lab126-rh-n1sm cacao[9720]: v20z.V20zAuthService.authenticate : IPMI channel enabling failed. IPMI user account needs to be initialized.
Workaround: Enable IPMI on the Sun Fire V20z or Sun Fire V40z server using the following procedure and rerun discovery.
Log in to the service processor.
Enable IPMI:
sp$ ipmi enable channel lan
Enter the password when prompted.
This section describes the known OS provisioning (deployment) issues.
OS deployments of Red Hat or SUSE Linux might time out or stop due to a known issue with the spanning tree setting on a network switch.
Workaround: Disable spanning tree on the switch or the switch ports used for the management server and the target servers provisioning network connections.
OS deployment of Red Hat Linux 3.0 Update 2 might stop and enter interactive mode due to a time out issue.
Workaround: Use Update 3 or later.
When using the Load OS wizard in the browser interface, the Exclude Server field does not work: servers entered in that field are not excluded from provisioning. This feature is not available from the browser interface.
Workaround: Use the load server command from the command line interface.
The Load OS Wizard does not make it clear why the user has to enter the hostname and configuration twice.
The user is prompted for the hostname in two of the wizard steps:
The first prompt is for the hostname to be used during the installation process.
The second prompt is for the hostname to be used when the installation is complete.
These values might or might not be the same.
Solaris 10 installs the dhcpd daemon in /usr/local/sbin/ rather than in /usr/sbin/ as earlier versions of Solaris did. As a result, if the machine is rebooted or you kill the dhcpd daemon, the dhcpd daemon cannot be restarted.
Workaround: Every time the Solaris management server is rebooted or shut down, you must enter the following command on the management server after it boots:
/opt/SUNWscs/sbin/s_dhcp_config.pl -e -I interface
This section describes the known browser interface and command line interface issues.
If the management IP addresses of two discovered servers are swapped, the detailed server information displayed for each of the servers with the swapped addresses will be the information for the other swapped server. For example, if server A and server B have their management IP addresses swapped, “show server A” will show server B's information and “show server B” will show server A's information.
Workaround: Delete both of the servers with swapped IP addresses and then rediscover them. Note that any user-supplied information about the servers will be lost.
The browser interface uses frames that are synchronized. If you click the browser's Back button in one of the frames, the frames can get out of sync.
Workaround: Press F5 or refresh the page to synchronize the frames.
Loading a Solaris OS distribution from a Solaris management server to a provisionable server fails with the error prom_panic: Could not mount filesystem immediately after the network boot process starts. After the error message is displayed, the provisionable server enters boot debugger mode.
Workaround: Stop and restart the NFS service on the management server as follows:
# /etc/init.d/nfs.server stop
# /etc/init.d/nfs.server start
The last line of the serial console launched from the N1 System Manager browser interface is not displayed in the serial console window.
Workaround: Press Enter or Return to display the last line.
The applet used for serial console access from the Web Browser interface uses SSHv1 only for communication back to the N1 System Manager management server. This feature requires enabling SSHv1 for the N1 System Manager management server.
Workaround: If you do not want to enable SSHv1, use the serial console feature from the n1sh command line interface instead of from the browser interface.
The show server command and the browser interface server list show the wrong OS version for a provisionable server that is installed with Red Hat Enterprise Linux AS/ES 4.0 Update 1. For example, Linux RedHat 4ES U4 is shown instead of Linux RedHat 4ES U1.
Workaround: There is no workaround.
The Web Console feature requires that the browser client have access to the management network of the target system where the Web Console is hosted.
Workaround: Run the browser interface on a host which has access to the management network of the target system.
The serial console feature fails on Sun Fire X4100 and Sun Fire X4200 servers when using either the command line (connect server command) or the browser interface.
Workaround: Retry the serial console.
After the management server reboots, it takes at least five minutes for all the servers to be displayed, whether you use the command line (show server command) or the browser interface. This issue happens only on the initial display of the servers.
Workaround: No workaround available.
To use the Serial Console feature from the Server Details page in the browser interface, Java Plugin 1.2 or later must be installed on the system where you are running the browser. Not all of the supported browsers for the N1 System Manager have this plugin installed.
This section describes known firmware update issues.
Dual-core Sun Fire V20z and Sun Fire V40z servers require firmware revision 2.3.x or greater. The N1 System Manager does not prevent you from deploying firmware revisions below 2.3.x, which may result in issues with the server's service processor.
Workaround: Double-check the firmware revision before updating.
If a Solaris OS update installation fails, the copy of the admin file used for the installation is not removed from the provisionable server. If the failure was due to a corrupt or invalid admin file, subsequent OS update installations will not replace the faulty admin file, which can cause continued failures.
Workaround: Delete the package-filename.admin file in the provisionable server's /tmp directory and retry the OS update installation. If you specified a customized admin file for the OS update, ensure that the admin file is valid.
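As a sketch, the cleanup step amounts to removing the stale file from /tmp on the provisionable server. Here package-filename is the document's placeholder, not a real file name, and the touch stands in for the file the failed install left behind:

```shell
# Remove a possibly corrupt admin file left behind by a failed OS update.
# "package-filename" is a placeholder; substitute the actual package file name.
ADMIN_FILE=/tmp/package-filename.admin
touch "$ADMIN_FILE"    # stand-in for the stale file in this sketch
rm -f "$ADMIN_FILE"    # delete it so the retried install writes a fresh copy
```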
If the name of a package differs from the name of the file that installs it, and the package requires an admin file for installation and removal, the package can be installed on a target host by using an OS update. However, the OS update (the package) cannot subsequently be uninstalled from the N1 System Manager.
Workaround: You can do one of the following to delete the improperly named package:
Rename the package file to match the name of the package before using N1 System Manager to install the package.
Rename the admin file in the provisionable server's /tmp directory to match the name of the package. For example, package-name.admin instead of package-filename.admin.
Manually remove the package from the provisionable server using pkgrm.
OS monitoring support cannot be initialized when redeploying to a provisionable server where the IP address is the same IP address assigned in a previous deployment. An IpUnreachableException is generated. This occurs because the /.ssh/known_hosts file contains the original deployment IP address.
Workaround: Log in to the management server as root, and either edit the /.ssh/known_hosts file and remove the ssh key entry for the server, or remove the /.ssh/known_hosts file.
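Removing the entry can be done with a simple filter. This sketch operates on a temporary copy; the real file is /.ssh/known_hosts, and 10.0.0.5 stands in for the redeployed server's IP address:

```shell
# Remove the stale host key entry for a redeployed server from a known_hosts file.
KH=/tmp/known_hosts.demo    # demo copy; the real file is /.ssh/known_hosts
printf '%s\n' \
  '10.0.0.5 ssh-rsa AAAAB3Nza...stale-key' \
  '10.0.0.6 ssh-rsa AAAAB3Nza...other-host' > "$KH"

# Keep every entry except the one for the redeployed server's IP address.
grep -v '^10\.0\.0\.5 ' "$KH" > "$KH.new" && mv "$KH.new" "$KH"
cat "$KH"    # only the 10.0.0.6 entry remains
```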
If you attempt to load a Red Hat Linux update onto a server installed with the Solaris OS, the Sun N1 System Manager will initiate the Update job. This Update job will fail.
Workaround: Make sure that the update is compatible with the installed OS. You can view a provisionable server's OS by using the show server command, and you can view the OS type for an OS update by using the show update command.
The unmonitored and unknown filter values for a server's utilization (OS Resource monitoring) do not work. Specifically, the following commands are unavailable:
N1-ok> show server utilization=unmonitored
N1-ok> show server utilization=unknown
Workaround: There is no workaround.
Issuing the show server health=unmonitored command returns no server list, even if servers are in the Unmonitored state.
Workaround: No workaround available.
The create notification command fails if an apostrophe is used for the description attribute.
Workaround: Escape the apostrophe with another apostrophe (for example, Paul''s Notification), or do not use an apostrophe in the description.
Even after all jobs have finished running, the clock icon next to the servers in the View Selector section may still be displayed. This is a problem with the refresh feature.
Workaround: Click the Refresh button or press F5 to refresh the browser interface.
When OS monitoring is stopped on a provisionable server and subsequently restarted, the OS status is not updated until 10 minutes later, even when the user-specified polling value is less than 10 minutes.
Workaround: No workaround available.
If the provisionable server's network interface is unavailable, the OS resource health status in the browser interface should change to unreachable on the next refresh interval. Currently, this status does not change.
By default, the network and refresh intervals are set to 10 minutes when the N1 System Manager software is installed. If the network and refresh intervals are different, then the OS resource health and network status will be updated at different intervals. This in turn causes different results to display when the provisionable server's data network interface is unreachable, and the network polling interval is lower than the OS resources refresh interval.
The network status is updated in the server's detail page in the browser before it is updated in the main server page. Because of these differences, the OS resource health status in the server main page changes to “unreachable” on the next OS resources refresh interval (default value: 10 minutes).
Workaround: Use the show server server name command to view the OS resource health status.
If a Create OS job is running and the system runs out of disk space, the job status shows “running”. When disk space is cleaned up, and the N1 System Manager is restarted, the job status changes to “complete” even though the Create OS job has failed.
The failed job's state will remain shown as “complete” and cannot be corrected.
Workaround:
Free up at least 3 Gbytes of disk space.
Stop and restart the N1 System Manager.
Resubmit the Create OS operation.
Filesystem monitoring does not work on a Linux provisionable server with a non-ext3 filesystem, even though the appropriate OS monitoring support has been added. Only ext3 filesystems can be monitored on Linux servers with the N1 System Manager.
Workaround: Reinstall the provisionable server with an OS profile that creates an ext3 filesystem.
Because the OS health monitoring agent caches the monitoring data every 5 minutes, setting the OS resource refresh interval to less than 5 minutes may retrieve existing cached data and have no apparent effect, which could lead to invalid conclusions of reported monitoring data.
Workaround: Set the OS resources monitoring interval to at least 5 minutes.
The OS Monitoring install job will time out and fail if the file system device name on the provisionable server is more than 20 characters in length. OS resource monitoring will not be available in this situation. This issue occurs most often on provisionable servers that use the operating system's logical volume management (LVM) feature.
The Base management feature can still be added to the provisionable server, but not the OS monitoring feature.
Workaround: Make sure that the file system device name is within the 20-character limit.
If the provisioning network uses an IP address of the form x0.0.0.y (for example, 10.0.0.34) and hostname resolution fails, the hostinstall.pl script will not be correctly generated, and the script will not be able to contact the management server to configure the provisionable server. This issue affects both the add server server feature osmonitor and add server server feature basemanagement commands.
Workaround: Manually add the x0.0.0.y form IP to the hostinstall.pl script on the management server. On a Red Hat management server, edit the /var/opt/sun/scs/web/pub/hostinstall.pl file. On a Solaris management server, edit the /var/opt/SUNWscs/web/pub/hostinstall.pl file.
Line 33 should look like:
my @CSHostAddrs = ( 'ns1m','172.20.48.120' );
Add the IP address to the list:
my @CSHostAddrs = ( 'ns1m','172.20.48.120','10.0.0.1' );
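The edit can also be scripted. This sketch works on a scratch copy of the file; the real path is /var/opt/sun/scs/web/pub/hostinstall.pl on Red Hat or /var/opt/SUNWscs/web/pub/hostinstall.pl on Solaris, and 10.0.0.1 is the example address from above:

```shell
# Append the provisioning-network IP address to the @CSHostAddrs list.
F=/tmp/hostinstall.pl.demo    # scratch copy for this sketch
echo "my @CSHostAddrs = ( 'ns1m','172.20.48.120' );" > "$F"

# Insert the address just before the closing parenthesis of the list.
sed -i "s/ );/,'10.0.0.1' );/" "$F"
cat "$F"    # the list now ends with ...,'10.0.0.1' );
```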
When the total job load is high enough to prevent the next job in the queue from running, the Job Details screen shows the running jobs' status as “running”, and the status for other jobs is shown as “Not Started”. The queued jobs will run after one or more of the running jobs have completed and the total job load is low enough to allow the next job in the queue to run.
See Job Queuing Behavior in Sun N1 System Manager 1.1 for further information.
Objects created with non-ASCII names by using the N1 System Manager display as random characters if you start the N1 System Manager in either of the following ways:
Running the /etc/init.d/n1sminit command in a non-UTF8 locale
Rebooting the management server in a non-UTF8 locale
Workaround: Use either of the following methods.
Temporary solution: Set the LANG environment variable to a UTF-8 locale and restart the N1 System Manager. For example:
# export LANG=en_US.UTF-8
# /etc/init.d/n1sminit stop
# /etc/init.d/n1sminit start
Permanent solution:
On a Solaris based management server:
Edit the file /etc/default/init and change the LANG value to en_US.UTF-8.
On a Linux based management server:
Edit the file /etc/sysconfig/i18n and change the LANG value to en_US.UTF-8.
The Create OS Profile wizard incorrectly shows ja_JP.EUC_JP as a selection. If ja_JP.EUC_JP is selected, then the load OS process will fail.
Workaround:
Specify ja_JP.UTF-8 when creating the OS profile. For example:
N1-ok> set osprofile osprofile_name language=ja_JP.UTF-8
Deploy the profile using the load server command or the Load OS wizard in the browser interface.
Log in to the provisionable server as root, and open the /etc/default/init file for edit.
Replace the text ja_JP.UTF-8 with ja_JP.eucJP, then save and close the /etc/default/init file.
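The replacement in the final step can be sketched with sed on a scratch copy. The real file is /etc/default/init on the provisionable server:

```shell
# Swap the deployed locale from UTF-8 back to EUC in a copy of the init file.
F=/tmp/init.demo    # scratch copy; the real file is /etc/default/init
echo 'LANG=ja_JP.UTF-8' > "$F"
sed -i 's/ja_JP\.UTF-8/ja_JP.eucJP/' "$F"
cat "$F"    # now reads LANG=ja_JP.eucJP
```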
Clicking the Help button in a localized environment causes all text in the browser interface Command Line pane to display in English, except for tab completion text.
Workaround: Refresh the browser interface by clicking the browser Reload button.
If you specify an OS distribution name with non-ASCII characters in the create os command, the OS distribution name will not display properly with the show os command.
The load server command fails to install ALOM firmware if the firmware name is non-ASCII.
Workaround: Change the firmware name to ASCII using the set firmware command.
The Python version (2.3) on a default Solaris management server does not provide adequate internationalization support for the n1sh command.
Workaround: Install Python 2.4 or later on the Solaris management server. The Python executable must be /usr/bin/python2.4.