The installation process is divided into phases so that most problems can be isolated to a specific phase. The primary goal is to simplify debugging of installation problems.
This chapter contains 4 sections:
Section 1 contains the most common problems encountered in the installation process.
Section 2 discusses problems that are encountered after building and installing the SNC kernel and booting it.
Section 3 discusses problems encountered after installing the Sun hardware and booting the kernel.
Section 4 discusses problems that are encountered after integration of the hardware and software and verification.
"Watchdog Reset" errors during boot.
Backplane jumpers are misconfigured, or switches or jumpers on the SNC are set incorrectly. Correct the jumper or switch problem and reboot.
"Device Not Found" errors during boot.
Backplane jumpers are misconfigured, switches or jumpers on the SNC are set
incorrectly, a board is incompatible, or the addresses in the config file are
incorrect. Correct the problem and reboot.
"Spurious Interrupt" error during boot.
Backplane jumpers or switches/jumpers on an SNC are misconfigured. Correct
the problem and reboot.
"Bus Timeout" errors during boot.
Backplane jumpers are misconfigured. Correct the jumper problem and reboot.
"No Such Device" error when attempting to neload.
The devices for the SNC were not made. Make the devices for the ne interface
using /dev/MAKEDEV.ne.
"Bad File" error during boot process.
The kernel cannot be found by the boot program. This can be caused by a failure
to move the kernel to the correct location after it was built, or by a failure to
create a link to it.
Extremely slow NFS traffic, frequent "NFS Server Not Responding" messages, and low CPU utilization.
Varying panics including "Bad Trap," "Bus Error," and "Data Fault" while running in single or multi-user mode.
This problem can be caused by a CPU board that is not at the most recent revision.
The SNC does not off-load any NFS processing.
Use the nfstat command to determine if the specified SNC board is preprocessing NFS requests.
Check the
Check the onboard NFS parameters with the sncnet(8) command. If the value returned is zero (0), set it to one (1) with the sncnet command.
Check the neX configuration flags with the ifconfig(8c) command. If the PROMISC flag is set, the SNC is operating in promiscuous mode, and all received packets are forwarded to the host for processing. Terminate the application that has enabled promiscuous mode. Examples of applications that set promiscuous mode include etherfind, traffic, and snoop.
Use the nestat neX command to verify that the NFS requests are being processed by the SNC. If the NFS request count returned is 0, then verify that NFS requests are coming from the client to the server on the network segment that the SNC is connected to. If the client is using a generic name (see Section F.5, "NFS Mounts"), then verify that
was used to associate an alternate IP address with the SNC board.
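The PROMISC check described above can be scripted against saved ifconfig output. The following is a minimal sketch, assuming the flags are printed in the usual `flags=<number><NAME,NAME,...>` form; the sample line, the numeric flag value in it, and the function names are illustrative assumptions, not output captured from an SNC system.

```python
import re


def interface_flags(ifconfig_line):
    """Return the set of flag names found in one ifconfig output line.

    Assumes a line of the form 'ne0: flags=163<UP,BROADCAST,PROMISC>'.
    Returns an empty set if no flags field is present.
    """
    match = re.search(r"flags=[0-9a-fA-F]+<([^>]*)>", ifconfig_line)
    if match is None:
        return set()
    return {name for name in match.group(1).split(",") if name}


def is_promiscuous(ifconfig_line):
    """True if the PROMISC flag appears in the interface's flag list."""
    return "PROMISC" in interface_flags(ifconfig_line)


# Hypothetical sample line; the numeric value is not meaningful here.
sample = "ne0: flags=163<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC>"
print(is_promiscuous(sample))
```

If the function reports that the flag is set, the corrective step is still the one given above: find and terminate the application that enabled promiscuous mode.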
The SNC is recognized at boot time, but does not receive the software download.
Check the device number assigned in
panic: out of kernelmap for devices
This panic can occur if all loaded devices compete for more than the available amount of kernelmap address space. The following full SPARCserver 690 configuration exceeds the total amount of available kernelmap space:
If the system has a cgsix card installed, remove the card and use tty for the console; the system will boot.
Two or more SNCs were installed, but only one is recognized.
Check the kernel configuration file. Make sure the
If the
If there is a problem with the kernel file, boot with your old kernel to get your system up and running. Most errors in this phase of the installation are caused by misplacing the SNC kernel or the links to it, so check this first. Another common problem in this phase is that the server does not have the complete SunOS distribution installed. In that case, the kernel built in this step contains erroneous code.
Varying panics including "Bad Trap," "Bus Error," and "Data Fault" can be caused by a CPU board that is not at the most recent Sun revision.
If the boot program can access the SNC kernel, then the most likely cause of
errors is mistyped entries in the kernel data and configuration files, made
during a manual installation. These mistakes usually show up as syntax errors
or other errors
during the
These types of errors should not occur during an automatic installation (i.e., one where the installation script performs the file editing for you).
Most hardware problems show up when you try to reboot your system. One common error in rebooting a system is failure to allow the disk drives to come up to full speed before you power up the system. Power up the disk drives first and wait for the ready indication before you power up the server. If you cannot see the drive's ready indication, wait 60 seconds before powering up the server.
The most common errors encountered in the hardware installation process are
related to backplane jumper configuration. These show up as bus timeout
errors, spurious interrupts, watchdog resets, and device not found errors.
Power down the system and double check the backplane jumpers. Se
Verify that all boards and cables are properly seated. We have found that the Ethernet cable can cause intermittent faults when spacers are left on the connector. The boards and cables should be firmly in place and not prone to wiggling. Power down the system, and press each of the boards and connectors firmly into place. Make sure that each board is held in place with 2 screws, then try to reboot.
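The boot-time symptoms that this chapter attributes to jumper and switch settings can be collected into a small lookup for scanning a saved console log. This is a minimal sketch: the causes are taken from Section 1, but the exact wording of the console messages, the case-insensitive substring matching, and the names `BOOT_ERROR_CAUSES` and `diagnose` are assumptions made for illustration.

```python
# Symptom -> likely causes, as listed in Section 1 of this chapter.
# Keys are lowercase; matching against the log is case-insensitive.
BOOT_ERROR_CAUSES = {
    "watchdog reset": [
        "backplane jumpers misconfigured",
        "switches or jumpers on the SNC set incorrectly",
    ],
    "device not found": [
        "backplane jumpers misconfigured",
        "switches or jumpers on the SNC set incorrectly",
        "incompatible board",
        "incorrect config file addresses",
    ],
    "spurious interrupt": [
        "backplane jumpers misconfigured",
        "switches or jumpers on the SNC set incorrectly",
    ],
    "bus timeout": [
        "backplane jumpers misconfigured",
    ],
}


def diagnose(console_text):
    """Return {symptom: causes} for every known symptom found in the text."""
    text = console_text.lower()
    return {
        symptom: causes
        for symptom, causes in BOOT_ERROR_CAUSES.items()
        if symptom in text
    }


# Hypothetical console excerpt, used only to exercise the lookup.
for symptom, causes in diagnose("Bus Timeout\nSpurious Interrupt").items():
    print(symptom + ": " + "; ".join(causes))
```

Every entry includes the backplane jumpers, which is why re-checking them with the system powered down is the first step.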
Most of the problems encountered during system integration are the result of mistakes made during the software and hardware phases of the installation, but are not seen until you attempt to integrate the system. To recover from these problems it is usually necessary to go back to the software or hardware installation section.
Although you may have to redo some of the steps in the software or hardware installation sections, it is not necessary to de-install the hardware before you deal with software installation problems.
Spurious interrupts and system panics during multi-user operation can be caused by conflicts with interrupt vectors used by the hardware. See Appendix C for a discussion of how to correct this problem.
If your server fails to come up in multi-user mode, but boots successfully in
single-user mode, this is usually a sign of a problem with the Ethernet
connection or with the
If, during the SNC software installation, you have configured out the Sun
native Ethernet interface (ie0 or le0), and you attempt to reboot your system
without any operational SNC boards installed, your system may hang during
the reboot if it depends on NIS. You must then reboot in single-user mode and
reconfigure your native interface (just create the required
Then connect the native interface to the Ethernet. Once you have verified that the SNC interfaces are properly installed and operating, you can safely disable the native interface if you so desire.
A "No such device" error is caused by a failure to invoke a
If you observe no I/O activity on the SNC after installation, along with a "no power" indication on the transceiver unit, it is possible that the thin gray ribbon cable connecting the SNC board to the Ethernet socket is loose. Remove the SNC board and verify that the socket is snugly attached.
If you are using more than one Ethernet interface, and you encounter anomalous behavior, such as lost or duplicated Ethernet packets, reexamine the system configuration.
If you have two or more Ethernet interfaces using the same Ethernet segment, or if you have two or more interfaces on Ethernet segments that are connected by a MAC-layer bridge, you must modify the /etc/rc.boot file for your server.
Note - By default, the Ethernet address of an SNC is initialized to the Ethernet address of the host system.
Each SNC has a unique, factory-programmed Ethernet address. The SNC software can use this address instead of the Sun system address.
To change the /etc/rc.boot file to use this address, find the line in the file that reads
and change it to
and then reboot the system.
Note - If other systems communicate directly with the server (on the same IP network) through this interface, they must also be updated to match this change. Any outdated entry in these other machines' ARP tables must be deleted.
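The condition that makes the /etc/rc.boot change necessary, two or more interfaces presenting the same Ethernet address on one segment or on bridged segments, can be checked mechanically from an inventory of the server's interfaces. This is a minimal sketch: the inventory format, the segment labels, the sample addresses, and the function name `shared_addresses` are all hypothetical, and segments joined by a MAC-layer bridge are assumed to be given the same label by whoever builds the inventory.

```python
from collections import defaultdict


def shared_addresses(interfaces):
    """Find interfaces that share an Ethernet address on the same segment.

    interfaces maps an interface name to (ethernet_address, segment_label).
    Returns {(address, segment): [names]} for every group of two or more.
    """
    groups = defaultdict(list)
    for name, (address, segment) in interfaces.items():
        # Compare addresses case-insensitively.
        groups[(address.lower(), segment)].append(name)
    return {
        key: sorted(names)
        for key, names in groups.items()
        if len(names) > 1
    }


# Hypothetical inventory: ne0 still uses the host address (the default),
# while ne1 uses its own address on a separate segment.
inventory = {
    "le0": ("8:0:20:1:2:3", "segment-A"),
    "ne0": ("8:0:20:1:2:3", "segment-A"),
    "ne1": ("0:0:0:aa:bb:cc", "segment-B"),
}
for (address, segment), names in shared_addresses(inventory).items():
    print(address, "on", segment, "is shared by", ", ".join(names))
```

An empty result means that no two listed interfaces share an address on the same segment label, so that condition does not apply to the inventory as given.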