This chapter describes the known issues in Sun N1™ System Manager 1.2.
Here is a list of updates and changes for the N1 System Manager 1.2 release:
New hardware support: Sun Fire X2100, Sun Fire T1000, Sun Fire T2000. Serial console and firmware update features are not available for the Sun Fire X2100 server.
New OS support on specific provisionable servers: Red Hat Enterprise Linux 3.0 WS, Red Hat Enterprise Linux 4.0 WS, and SUSE Linux Professional 9.2/9.3.
New server list filtering support in the browser interface and command line.
New advanced sorting on the server list in the browser interface.
Job details output provides the associated command run by user.
This section lists the features and software that are not supported in the Sun N1 System Manager 1.2 release.
The N1 System Manager product is localized only for Japanese. The browser interface help system and command line help pages are in English only. However, Japanese versions of the entire command line help pages and of important areas of the browser interface help system are available in the Sun N1 System Manager 1.2 Command Line Reference Guide (Ja, 819-4875) and the Sun N1 System Manager 1.2 Administration Guide (Ja, 819-4873), respectively.
The serial console and firmware update features are not available for the Sun Fire X2100 server.
This section describes known documentation updates, including documentation errata.
In the DHCP Service Conflict With N1 Grid Service Provisioning System in Sun N1 System Manager 1.2 Installation and Configuration Guide (page 37), "Sun N1 Grid Service Provisioning System" should read "Sun N1 Service Provisioning System", and "ISP" should read "OS Provisioning."
This section provides documentation errata in the command line help pages.
Some parts of the command line tab help and the help pages show incorrect information about what to enter for IPMI credentials. All IPMI credential values in the command line require a user name and password pair value instead of just a password as documented. All IPMI values must be in the following syntax: user-name/password
Sun Fire V20z and V40z servers require only a password for IPMI credentials.
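As a minimal sketch of the syntax rule above, the following shell function checks that a value has the user-name/password form before you type it on the command line. The function name is_ipmi_pair is hypothetical and is not part of the N1 System Manager, and the check does not apply to Sun Fire V20z and V40z servers, which take a password only. A POSIX shell such as bash or ksh is assumed.

```shell
# Hypothetical helper (not part of N1 System Manager): succeeds only if
# the argument has the form user-name/password with both parts non-empty.
is_ipmi_pair() {
    case "$1" in
        */*) ;;              # must contain a slash separator
        *)   return 1 ;;
    esac
    ipmi_user=${1%%/*}       # text before the first slash
    ipmi_pass=${1#*/}        # text after the first slash
    [ -n "$ipmi_user" ] && [ -n "$ipmi_pass" ]
}
```

For example, is_ipmi_pair "admin/secret" succeeds, while a bare password such as "secret" is rejected.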
You can enter a question mark (?) for any password attribute value if you do not want the password to display in the command line. Once you enter the command, you are prompted for the password. Examples include the rootpassword and agentssh attributes.
The force and netboot attributes are documented in the command line help pages without a corresponding value. You must specify true as their values to provide a valid command, such as force=true or force true.
The default role for the root user is automatically set to Admin after you reboot the management server or if you restart the N1 System Manager. It is still possible to set the root user's default role to a different role, but this is not a permanent assignment.
The following attributes have been added for the load group command and when specifying multiple servers with the load server command:
bootnetworkdevice bootnetworkdevice – The server's provisioning network interface is used to install the server. Valid Solaris values are bge0 (default), bge1, bge2, and bge3. Valid Linux values are eth0 (default), eth1, eth2, eth3, and eth4.
networkdevice networkdevice – (Linux only) The server's provisioning network interface after the server is installed. Default is eth0.
When installing the Red Hat 4 OS on Sun Fire X2100 servers, the bootnetworkdevice and networkdevice values must both be set to eth1. The default values do not work for this situation.
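The valid values listed above can be checked before a load command is submitted. The following is only a sketch; the function name valid_bootnetworkdevice and the OS keywords solaris and linux are hypothetical and are not part of the N1 System Manager command line.

```shell
# Hypothetical helper: check a bootnetworkdevice value against the valid
# values listed above. $1 is "solaris" or "linux", $2 is the device name.
valid_bootnetworkdevice() {
    case "$1:$2" in
        solaris:bge[0-3]) return 0 ;;   # bge0 (default) through bge3
        linux:eth[0-4])   return 0 ;;   # eth0 (default) through eth4
        *)                return 1 ;;
    esac
}
```

Note that a value can pass this check and still be wrong for a given server: for Red Hat 4 on a Sun Fire X2100 server, only eth1 works for both attributes.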
This section describes known N1 System Manager installation issues.
If the N1 System Manager installation process is interrupted and restarted, the N1 System Manager installation can fail in step 5, “Install OS provisioning components”. If this issue occurs, a subsequent uninstall and reinstall of the N1 System Manager will fail.
The installation log file /var/tmp/installer.log.latest shows the following after initial installation failure:
Installing Master Server ...
Error! Missing file (looked for /opt/SUNWn1sps/N1_Grid_Service_Provisioning_System_5.1/server/postgres/postgresql.conf.in)!
print() on closed filehandle GEN0 at /usr/perl5/5.8.4/lib/i86pc-solaris-64int/IO/Handle.pm line 399.
SPS install failed with exit status: 256
-----------------------------
2k. Which port should Postgres listen on? (default: 5434) [1024-65535]
spawn id(3) is not a tty. Not changing mode at /usr/perl5/site_perl/5.8.4/Expect.pm line 375.
admin admin admin
** Invalid Input. Enter a numeric value for the port number.
2k. Which port should Postgres listen on? (default: 5434) [1024-65535]
spawn id(3) is not a tty. Not changing mode at /usr/perl5/site_perl/5.8.4/Expect.pm line 375.
admin admin admin
** Invalid Input. Enter a numeric value for the port number.
2k. Which port should Postgres listen on? (default: 5434) [1024-65535
The installation log shows the following after uninstall and reinstall of the N1 System Manager software:
Error! Failed to initialize the database (exit value was 1). Exiting..
print() on closed filehandle GEN0 at /usr/lib/perl5/5.8.0/i386-linux-thread-multi/IO/Handle.pm line 395.
SPS install failed with exit status: 256
Workaround: Perform the workaround procedure below that is applicable to the operating system installed on your management server. Depending on how the installation error occurred, some of the workaround steps might not complete successfully. If a workaround step does not complete successfully, go to the next step.
Solaris based Sun Fire X4100 or X4200 management server:
Stop the server and agent.
# su - n1gsps -c "/opt/SUNWn1sps/N1_Service_Provisioning_System_5.1/server/bin/cr_server stop"
# su - n1gsps -c "/opt/SUNWn1sps/N1_Service_Provisioning_System/agent/bin/cr_agent stop"
Uninstall service provisioning manually.
# /opt/SUNWn1sps/N1_Service_Provisioning_System_5.1/cli/bin/cr_uninstall_cli.sh
# /opt/SUNWn1sps/N1_Service_Provisioning_System_5.1/server/bin/cr_uninstall_ms.sh
Remove the following packages.
SUNWspsc1
SUNWspsms
SUNWspsml
# pkgrm SUNWspsc1
# pkgrm SUNWspsms
# pkgrm SUNWspscl
Type y in response to prompts asking “Do you want to remove this package? [y,n,?,q]”. If the message pkgrm: ERROR: no package associated with SUNWspscl appears, that package has already been removed by step 2. Continue removing packages.
Delete the service provisioning directory and files.
# cd /
# rm -rf /opt/SUNWn1sps/
# rm /n1gc-setup/sps/state
# rm /n1gc-setup/state/0installSPS.pl.state
Reboot the management server and then install the N1 System Manager software.
Linux based Sun Fire X4100 or X4200 management server:
Stop the server and agent.
# su - n1gsps -c "/opt/sun/N1_Service_Provisioning_System_5.1/server/bin/cr_server stop"
# su - n1gsps -c "/opt/sun/N1_Service_Provisioning_System/agent/bin/cr_agent stop"
Delete the service provisioning directory and files.
# cd /
# rm -rf /opt/sun/N1_Grid_Service_Provisioning_System_5.1
# rm -rf /opt/sun/N1_Grid_Service_Provisioning_System
# rm -rf /opt/sun/N1_Service_Provisioning_System
# rm -rf /opt/sun/N1_Service_Provisioning_System_5.1
# rm /n1gc-setup/sps/state
# rm /n1gc-setup/state/0installSPS.pl.state
Reboot the management server and then install the N1 System Manager software.
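Both procedures above follow the same rule: if a step does not complete successfully, go to the next step. A script can follow that rule with a small wrapper such as the one sketched here. The name run_step is hypothetical, and the example calls are left commented out because they delete files on a real management server.

```shell
# Hypothetical wrapper: run one workaround step, report the result, and
# always return success so that a failed step does not stop the procedure.
run_step() {
    step_desc=$1
    shift
    if "$@"; then
        echo "ok: $step_desc"
    else
        echo "failed, continuing: $step_desc"
    fi
    return 0
}

# Example use with commands taken from the procedures above:
# run_step "remove SPS state file" rm /n1gc-setup/sps/state
# run_step "remove installer state file" rm /n1gc-setup/state/0installSPS.pl.state
```

The wrapper only reports failures. Review the output before you reboot the management server and reinstall.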
This section describes known security issues.
The event refresher frame reloads every 10 seconds, which updates the user's session timestamp. Therefore, the browser interface session will never time out.
Workaround: Explicitly log out when you are done using the browser interface.
This section describes known performance issues.
The delete server and set server monitored commands may take a long time to complete. These commands do not generate a job, so you have to wait until the command finishes before you can submit another command. This is especially important to note if you try to enable/disable monitoring on a group of servers.
This section describes known discovery issues.
When discovering a large group of servers (30 to 40 servers), the hardware model value does not display in the server list for some servers. For example, server 10.18.0.38 does not have the hardware information listed:
Name         Hardware   Hardware Health   Power   OS Usage   OS Resource Health
10.18.0.31   V20z       Good              On      -          -
10.18.0.32   V20z       Good              On      -          -
10.18.0.33   V20z       Good              On      -          -
10.18.0.35   V20z       Good              On      -          -
10.18.0.36   V20z       Good              On      -          -
10.18.0.37   V20z       Good              On      -          -
10.18.0.38   -          Good              On      -          -
10.18.0.39   V20z       Good              On      -          -
Workaround: Use the refresh attribute with the set server or set group command to update the servers in the list.
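When many servers are affected, the servers that need a refresh can be picked out of a saved server list. The sketch below assumes a list with one server per line and the columns in the order shown in the example above. The function names are hypothetical, and the exact command form "set server NAME refresh" is an assumption based on the workaround text, so the commands are printed rather than run.

```shell
# Print the name of every server whose Hardware column is "-".
# Reads the server list on standard input; the first line is treated as
# the header and skipped.
servers_missing_hardware() {
    awk 'NR > 1 && $2 == "-" { print $1 }'
}

# Turn each server name read on standard input into a refresh command.
# Assumption: the refresh attribute is given as "set server NAME refresh".
refresh_commands() {
    while read -r server_name; do
        echo "set server $server_name refresh"
    done
}
```

For the example list above, piping the list through servers_missing_hardware and then refresh_commands prints a single command for server 10.18.0.38.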
This section describes the known OS provisioning (deployment) issues.
OS deployment of Red Hat Linux 3.0 Update 2 might stop and enter interactive mode due to a time out issue. This is an intermittent problem.
Workaround: Stop the OS deployment job and retry the OS deployment. If OS deployment consistently fails, you will have to use a later version of the Red Hat OS.
When specifying the swap partition for an OS profile using either the browser interface or the command line, you are required to specify a mount point, even though a swap partition does not need one. If you specify a file system path as the mount point, a separate file system is actually created.
Workaround: Specify swap as the mount point for the swap partition, which serves as a placeholder and is ignored. The following is a command line example:
add osprofile myprofile partition swap type swap size 1024 device c0t0d0s1 sizeoption fixed
The baud rate for the BIOS console must be set to 9600 (default) or OS deployment to a Sun Fire V20z or V40z server will fail. This means that you cannot change the consolebaudrate value in the load server command or the Load OS wizard in the browser interface.
If the SP console baud rate is set to something other than 9600, the OS deployment will succeed but the console through the connect server command will display garbage characters.
Workaround: You must change the baud rate for the BIOS console manually after an OS deployment. To do this, reboot the target server and enter the BIOS setup screen during the boot sequence. Consult the server's user manual to see how to change its BIOS settings.
If you specify a period in an OS profile name when using the browser interface wizard to create an OS profile, an "invalid profile name" error occurs. A period should be an acceptable character for an OS profile name.
Workaround: If you want to include a period in the OS profile name, use the create osprofile command.
Once you load a SUSE OS profile on a Sun Fire X4000 series server, that same OS profile and associated OS distribution cannot be used on Sun Fire V20z and V40z servers. Loading a SUSE OS profile on a Sun Fire X4000 series server actually modifies the associated SUSE OS distribution, which makes that OS distribution unusable by Sun Fire V20z and V40z servers.
Workaround: You must create separate SUSE OS distributions and OS profiles for Sun Fire V20z/V40z servers and Sun Fire X4000 series servers.
This section describes the known browser interface and command line interface issues.
If the management IP addresses of two discovered servers are swapped, the detailed server information displayed for each of the servers with the swapped addresses will be the information for the other swapped server. For example, if server A and server B have their management IP addresses swapped, “show server A” will show server B's information and “show server B” will show server A's information.
Workaround: Delete both of the servers with swapped IP addresses and then rediscover them. This will result in any user supplied information about the server being lost.
The browser interface uses frames that are synchronized. If you click the browser's Back button in one of the frames, the frames can get out of sync.
Workaround: Press F5 or refresh the page to synchronize the frames.
The last line of the serial console launched from the N1 System Manager browser interface is not displayed in the serial console window.
Workaround: Press Enter or Return to display the last line.
The applet used for serial console access from the Web Browser interface uses SSHv1 only for communication back to the N1 System Manager management server. This feature requires enabling SSHv1 for the N1 System Manager management server.
Workaround: If you do not want to enable SSHv1 and the serial console Web Browser interface, you can use the serial console feature from the n1sh command line interface.
To use the Serial Console feature from the Server Details page in the browser interface, the Sun Java Plugin 1.4.2 or later must be installed on the system where you are running the browser. Not all of the supported browsers for the N1 System Manager have this installed.
The browser interface server details and the show server command display the wrong swap information for Sun Fire X4100 and Sun Fire X4200 servers with firmware level 6464 and the Red Hat operating system.
Workaround: Use the serial console to access the server and find out the correct swap information by using the top command.
This section describes known firmware update issues.
Dual-core Sun Fire V20z and Sun Fire V40z servers require firmware revision 2.3.x or later. N1 System Manager does not prevent you from deploying firmware revisions below 2.3.x, and deploying such a revision may result in issues with the server's service processor.
Workaround: Verify that the firmware revision is 2.3.x or later before updating a dual-core server.
If the ftp service is not enabled on the management server, firmware updates on ALOM-based servers fail with the following error message in the job output:
An exception occurred trying to update server-name. Please refer to the log file for more information.
Workaround: Enable the ftp service on the management server. See Enabling FTP on the Management Server in Sun N1 System Manager 1.2 Site Preparation Guide for details.
The create update command allows you to create a Linux OS update from a Solaris package, even though this is not a valid procedure. If you happen to do this and then try to install the OS update on a Linux system, the Update job is accepted but it eventually fails with error messages that do not help diagnose the underlying problem.
Workaround: Make sure that the update is compatible with the installed OS. You can view a provisionable server's OS by using the show server command, and you can view the OS type for an OS update by using the show update command.
If a Solaris OS update installation fails, a copy of the admin file used for the installation is not removed from the provisionable server. If the failure was due to a corrupt or invalid admin file, subsequent OS update installations will not replace the faulty admin file and it may cause continued failures.
Workaround: Delete the package-filename.admin file in the provisionable server's /tmp directory and retry the OS update installation. If you specified a customized admin file for the OS update, ensure that the admin file is valid.
The create update command does not work if you specify a valid Solaris package or patch through a URL (http://). An error similar to the following is displayed:
# ./n1sh create update sol file http://10.11.1.35/scs/SVR4/SCSFpoppl.pkg ostype solaris10x86
File "http://10.11.1.35/scs/SVR4/SCSFpoppl.pkg" exists but is not a valid update file.
Workaround: You must first download the package or patch to a location that is accessible from the management server and then specify a fully qualified path to the package or patch.
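The two steps of this workaround can be sketched as follows. The function names, the /var/tmp download directory, and the use of curl are assumptions; any download tool and any location that the management server can read would do. The create update syntax is taken from the failing example above, with the URL replaced by a local path. The steps are printed rather than run.

```shell
# Hypothetical helper: map a package URL ($1) to a local path in a download
# directory ($2, default /var/tmp) by keeping only the file name.
local_path_for_url() {
    echo "${2:-/var/tmp}/${1##*/}"
}

# Print (not run) the download step and the create update step.
# $1 = package URL, $2 = update name, $3 = OS type.
print_update_steps() {
    pkg_path=$(local_path_for_url "$1")
    echo "curl -o $pkg_path $1"
    echo "n1sh create update $2 file $pkg_path ostype $3"
}
```

For the example above, print_update_steps http://10.11.1.35/scs/SVR4/SCSFpoppl.pkg sol solaris10x86 prints a download command followed by a create update command that uses /var/tmp/SCSFpoppl.pkg.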
The create notification command fails if an apostrophe is used for the description attribute.
Workaround: Escape the apostrophe with another apostrophe (for example, Support''s Notification) or do not use an apostrophe in the description.
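If descriptions are generated by a script, the escaping can be automated. The following sketch doubles every apostrophe in a string; the function name escape_apostrophes is hypothetical.

```shell
# Hypothetical helper: double every apostrophe in a description so that it
# can be used as the description attribute of create notification.
escape_apostrophes() {
    printf '%s\n' "$1" | sed "s/'/''/g"
}
```

A description that contains no apostrophe is returned unchanged.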
Even after all jobs are finished running, the clock icon next to the servers in the View Selector section may still display, which is a problem with the refresh feature.
Workaround: Click the Refresh button or press F5 to refresh the browser interface.
If a Create OS job is running and the management server runs out of disk space, the job status shows “running”. When disk space is cleaned up, and the N1 System Manager is restarted, the job status changes to “complete” even though the Create OS job has failed.
The failed job's state remains “complete” and cannot be corrected.
Workaround:
Free up at least 3 Gbytes of disk space.
Stop and restart the N1 System Manager.
Resubmit the Create OS operation.
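Before resubmitting, you can confirm that enough space is free. The sketch below compares the available space of a file system with the 3 Gbytes given in the workaround. The function names are hypothetical, and this document does not say which file system the Create OS job fills, so the directory to check is left as a parameter. The -P option of df is the POSIX output format; on Solaris it may require /usr/xpg4/bin/df.

```shell
# 3 Gbytes expressed in kilobytes.
REQUIRED_KB=$((3 * 1024 * 1024))

# Print the kilobytes available on the file system that holds directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the available kilobytes ($1) meet the requirement ($2).
meets_requirement() {
    [ "$1" -ge "$2" ]
}

# Example (the directory /var is only a placeholder):
# meets_requirement "$(free_kb /var)" "$REQUIRED_KB" || echo "free more space first"
```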
When the total job load is high enough to prevent the next job in the queue from running, the Job Details screen shows the running jobs' status as “running”, and the status for other jobs is shown as “Not Started”. The queued jobs will run after one or more of the running jobs have completed and the total job load is low enough to allow the next job in the queue to run.
See Job Queueing in Sun N1 System Manager 1.2 Administration Guide for further information.
If you upgrade the OS monitoring agents to the 1.2 release by using the agentupgrade script, the following memory threshold values are not updated properly:
memusage.mbmemused (Memory in use)
memusage.mbmemfree (Memory free)
memusage.pctmemused (Percentage of memory in use)
memusage.pctmemfree (Percentage of memory free)
This issue may put the associated provisionable servers in a “Failed Critical” state.
Workaround: After you upgrade the OS monitoring agents, use the set server or set group commands to change the threshold values on the servers.
If an OS monitor threshold is exceeded, email notifications may stop being sent. This is an intermittent problem.
Workaround: Perform the following procedure on the management server.
Shut down the N1 System Manager.
# /etc/init.d/n1sminit stop
For a Red Hat management server, change directory to /etc/opt/sun/cacao/modules.
# cd /etc/opt/sun/cacao/modules
For a Solaris management server, change directory to /etc/opt/SUNWcacao/modules.
# cd /etc/opt/SUNWcacao/modules
Back up the XML files that need to be updated.
# cp coreservicemodule.xml coreservicemodule.xml.save
# cp com.sun.hss.domain.xml com.sun.hss.domain.xml.save
Remove the following lines from the coreservicemodule.xml file.
<path-element> file:/opt/sun/n1gc/lib/activation.jar </path-element> <path-element> file:/opt/sun/n1gc/lib/mail.jar </path-element>
Remove the following lines from the com.sun.hss.domain.xml file.
<path-element> file:/opt/sun/n1gc/lib/mailapi.jar </path-element> <path-element> file:/opt/sun/n1gc/lib/imap.jar </path-element> <path-element> file:/opt/sun/n1gc/lib/pop3.jar </path-element> <path-element> file:/opt/sun/n1gc/lib/smtp.jar </path-element>
Add the following line to the com.sun.hss.domain.xml file.
<path-element> file:/opt/sun/n1gc/lib/mail.jar </path-element>
Start the N1 System Manager.
# /etc/init.d/n1sminit start
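Because the edits to the XML files are made by hand, a quick check of the result can catch a missed line. The following sketch tests the edited com.sun.hss.domain.xml file against the steps above: it must reference mail.jar and must no longer reference the four removed jar files. The function name domain_xml_ok is hypothetical.

```shell
# Hypothetical check: succeed if the edited com.sun.hss.domain.xml ($1)
# references mail.jar and none of the four jar files that were removed.
domain_xml_ok() {
    grep -F 'n1gc/lib/mail.jar' "$1" >/dev/null || return 1
    for jar in mailapi.jar imap.jar pop3.jar smtp.jar; do
        if grep -F "n1gc/lib/$jar" "$1" >/dev/null; then
            return 1        # a removed jar file is still referenced
        fi
    done
    return 0
}
```

If the check fails, compare the file with the saved copy, com.sun.hss.domain.xml.save, and repeat the edit.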
Changing a server's agentssh value by using the set server command does not work.
Workaround: If you want to change a server's agentssh value, you must delete the server, discover it again, and use the add server command to set the agentssh value.
Setting the fsusage.pctused threshold using the set group command does not work. For example:
N1-ok> set group test-systems threshold fsusage.pctused warninghigh=70 criticalhigh=80
Invalid value "fsusage.pctused".
Workaround: You must set the fsusage.pctused threshold on a single server by using the set server command. You can create an n1sh customized script to help automate this procedure for a large number of servers. See To Run a Script of N1 System Manager Commands in Sun N1 System Manager 1.2 Administration Guide for more information.
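A command script for many servers can be generated from a list of server names. The sketch below prints one set server command per server. The function name threshold_commands is hypothetical, and it assumes that set server accepts the same threshold syntax that the set group example above uses.

```shell
# Read server names (one per line) on standard input and print one
# set server command per server. $1 = warninghigh, $2 = criticalhigh.
# Assumption: set server takes the same threshold syntax as the
# set group example above.
threshold_commands() {
    while read -r server_name; do
        [ -n "$server_name" ] || continue     # skip blank lines
        echo "set server $server_name threshold fsusage.pctused warninghigh=$1 criticalhigh=$2"
    done
}
```

The output can be saved to a file and run as a script of N1 System Manager commands, as described in the Administration Guide.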
After you reset (reboot) a provisionable server using the reset command or the Reset menu item in the browser interface, the server's OS resource health state changes to “Failed critical” until a refresh occurs after five to ten minutes. This happens even though the server's OS health state is good.
Workaround: After you reset a provisionable server, you can refresh the server using the set server refresh command, or wait five to ten minutes for the server's status to refresh automatically.
Non-ASCII objects created using the N1 System Manager display random characters if you start the N1 System Manager in either of the following ways:
Running the /etc/init.d/n1sminit command in a non-UTF8 locale
Rebooting the management server in a non-UTF8 locale
Workaround: Use either of the two following methods.
Temporary solution: set the LANG environment variable to the UTF8 locale on the management server and restart the N1 System Manager. For example:
# export LANG=en_US.UTF-8
# /etc/init.d/n1sminit stop
# /etc/init.d/n1sminit start
Permanent solution:
On a Solaris based management server:
Edit the file /etc/default/init and change the LANG value to en_US.UTF-8.
On a Linux based management server:
Edit the file /etc/sysconfig/i18n and change the LANG value to en_US.UTF-8.
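To avoid starting the N1 System Manager in the wrong locale by accident, the current LANG value can be checked first. This is a sketch only; the function name is_utf8_locale is hypothetical, and the spellings of the UTF-8 suffix that it accepts are an assumption.

```shell
# Hypothetical helper: succeed if the locale name in $1 uses UTF-8.
is_utf8_locale() {
    case "$1" in
        *.UTF-8|*.utf8|*.UTF8|*.utf-8) return 0 ;;
        *)                             return 1 ;;
    esac
}

# Warn before a restart if the current locale is not UTF-8.
if ! is_utf8_locale "${LANG:-}"; then
    echo "warning: LANG is '${LANG:-unset}', not a UTF-8 locale"
fi
```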
The load server command fails to install ALOM firmware if the firmware name is non-ASCII.
Workaround: Change the firmware name to ASCII using the set firmware command.
The Python version (2.3) on a default Solaris management server does not provide adequate internationalization support for the n1sh command.
Workaround: Install Python 2.4 or later on the Solaris management server. The Python executable must be /usr/bin/python2.4.
If you deploy Solaris 10 with an OS profile that has a particular installation language set, the installation is performed in interactive mode and you must select a language when prompted. The deploy OS job will eventually time out if you do not make the language selection. The following languages create this behavior:
ja_JP.eucJP
no_NO.ISO8859-1
th_TH.TIS620
ko_KR.UTF-8
sh_BA.ISO8859-2
zh_CN.EUC
zh_CN.UTF-8
Workaround: Because the installation is no longer automated, you must monitor the deployment through the server's serial console and make the language selection. You can choose Serial Console from the Actions menu in the browser interface or use the connect server command.
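If OS profiles are created by script, the installation language can be checked against the list above so that you know in advance which deployments need to be watched on the serial console. The function name forces_interactive_install is hypothetical.

```shell
# Succeed if the installation language in $1 is one of those listed above
# as causing an interactive Solaris 10 installation.
forces_interactive_install() {
    case "$1" in
        ja_JP.eucJP|no_NO.ISO8859-1|th_TH.TIS620|ko_KR.UTF-8|sh_BA.ISO8859-2|zh_CN.EUC|zh_CN.UTF-8)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```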
This problem occurs if an ALOM-based Solaris provisionable server is running a non-English locale and you power the server off and on by using the stop and start commands, respectively.
Workaround: Use the reset server command on the server.
If a Solaris SPARC provisionable server is running a non-English locale, the package information does not display in the server details output.
Workaround: Edit the file /etc/default/init on the provisionable server, change the LANG value to en_US.UTF-8, and reboot the server.
When adding OS monitoring agents to a provisionable server, there are two situations that fail because the server is running a non-English locale.
Issue 1: "Either invalid credentials were used or ssh key in known_hosts file has changed" error message.
Issue 2: "Driver not found for the specified device" error message for Solaris 10 HW2 provisionable servers.
Workaround:
Issue 1: Delete the provisionable server's SSH credentials in the management server's known_hosts file: /.ssh/known_hosts file for a Solaris management server and /root/.ssh/known_hosts for a Red Hat management server. Then, try to add the OS monitoring agents again.
Issue 2: There is no workaround for a non-English locale. You must run the management server in the en_US.UTF-8 locale and reinstall Solaris 10 HW2 with the en_US.UTF-8 locale on the provisionable server. Then, try to add the OS monitoring agents again.
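For Issue 1, the entry for one server can be removed from a known_hosts file with a filter such as the following sketch. The function name known_hosts_without is hypothetical. The sketch assumes plain (unhashed) entries, in which the first field is a comma-separated list of host names and addresses, and an awk that supports the -v option (on Solaris, nawk may be needed in place of awk).

```shell
# Hypothetical helper: print known_hosts file $1 without the entries for
# host $2. Assumes unhashed entries whose first field is a comma-separated
# list of host names and addresses.
known_hosts_without() {
    awk -v host="$2" '{
        n = split($1, names, ",")
        for (i = 1; i <= n; i++)
            if (names[i] == host) next    # drop this entry
        print
    }' "$1"
}
```

Write the output to a temporary file and replace the original only after you have saved a backup copy of the known_hosts file.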