This chapter describes how to troubleshoot a computer running SunLink Server software. It identifies the various tools that are available to you for use in the troubleshooting process and provides a high-level approach to use whenever troubleshooting is required.
Troubleshooting SunLink Server systems involves gathering data about the problem and analyzing that data to determine the specific cause of the problem. The SunLink Server program includes a number of data-gathering tools. Additionally, more complex data-gathering tools may be available from your support personnel.
This chapter introduces the various tools that are provided with SunLink Server software and describes situations in which using them may be appropriate.
Administrators often can reduce the amount of time required to solve problems by observing the following guidelines:
Be aware of and familiar with the tools and services that can be used for server troubleshooting.
Configure the available server utilities to gather the necessary data as a general practice.
Assess the status of the server at regular intervals.
Follow a logical and comprehensive procedure when attempting to isolate a server problem.
There will be times when a particular problem requires more complex data-gathering than can be provided using the standard SunLink Server product package. In these situations, special debugging versions of the software may be needed to gather more detailed data about the problem. This type of data-gathering may require the assistance of a technical support person to help with instructions on how to use the tools involved.
SunLink Server provides a variety of tools that you can use as troubleshooting aids. These tools can be arranged into the following three categories:
Tools used for assessing the status of the server
Tools used for automatic notification of the status of the server
Tools used for debugging specific server problems
The following sections summarize the tools found in each category and briefly describe the use of each in a troubleshooting context.
The SunLink Server program includes multiple tools that you can use to assess the operational status of the server at any given time. Frequent assessment of server status will improve your ability as a server administrator to notice a problem or trend quickly.
Periodic review of server status will provide a fairly stable basis for understanding how a normal problem-free server appears. Over time, information that deviates from the norm will be an indication that something has changed and warrants your attention.
Tools for assessing the status of the server are discussed in the following sections.
A number of events related to the daily operation of the server can be tracked using the SunLink Server Manager event logs (see Chapter 3, Configuring and Managing SunLink Server Software). These events are maintained in one of three event logs: system, security, and application. Administrators should develop and implement an event logging policy and include a review of event logs as a regular part of troubleshooting activities.
Administrators will find it particularly useful to characterize the typical use of the server by manipulating event log data using a spreadsheet or word processing program. You can use this approach to generate a standard operating profile of the server and to predict trends in server usage.
You can also view event logs by using the elfread command. For more information, type man elfread at the SunLink Server command prompt.
SunLink Server maintains detailed statistics about its current usage as well as cumulative usage over a particular period of time. It is always helpful to review these statistics on a regular basis as well as when a server problem is encountered.
To view data about current server use, use the SunLink Server Manager Information view (see "How to View SunLink Server Information"). This provides details about current client-server sessions and the resources being used by those sessions:
Solaris user name of the current SunLink Server Manager session
Solaris server name
Solaris hardware type
Solaris version
SunLink Server system name
SunLink Server system's domain name
SunLink Server system's role (if BDC, then the name of the PDC is also provided)
SunLink Server software version number
State of the server (stopped or running)
State of the Schedule Database wizard (scheduled or not scheduled)
To view cumulative server usage data, you can use the net statistics command at the SunLink Server command prompt. This command provides cumulative totals for a variety of server activities. Administrators who review the server statistics provided by using this command on a regular basis will find it easier to recognize and address changes in server operation.
The following statistics are maintained for the SunLink Server system, and are available by way of the net statistics command:
Table 6-1 Cumulative Statistics Descriptions
Statistic | Description |
---|---|
Refreshed at | Tells when this set of statistics began (either at the last server startup or the last time the statistics were cleared). |
Sessions accepted | Tells how many times users connected to the server. |
Sessions timed-out | Tells how many user sessions were closed because of inactivity. |
Sessions errored-out | Tells how many user sessions ended because of error. |
Kilobytes sent | Tells how many Kbytes of data the server transmitted. |
Kilobytes received | Tells how many Kbytes of data the server received. |
Mean response time (msec) | Tells the average response time for processing remote server requests. This always will be 0 for Solaris system servers. |
System errors | This does not apply to Solaris system servers. |
Permission violations | Tells when a user attempts to access resources without the required permissions. |
Password violations | The number of incorrect passwords that were tried. |
Files accessed | The number of files that were used. |
Comm devices accessed | Not supported in the SunLink Server program. |
Print jobs spooled | The number of print jobs that were spooled to printer queues on the server. |
Times buffers exhausted | The number of shortages of big and request buffers. Always set to 0 for Solaris system servers. |
Administrators can display and control sessions between clients and the server. You can use this information to gauge the workload on a particular server.
To display session information from a Windows NT Workstation computer or a Windows client computer using Server Manager:
Start Server Manager.
Select the SunLink Server system about which you want to view session information.
Click on the USERS button.
You also can display session information using the net session command at the SunLink Server command prompt.
You may see sessions displayed that do not show user names. These sessions are a result of administrative activity and should not be deleted.
An administrator can disconnect a user from the server at any time. Closing a user session does not prevent the user from reconnecting.
To disconnect a user session from a Windows NT computer or from a Windows client computer using Server Manager:
Start Server Manager.
Select the SunLink Server system about which you want to view session information.
Click on the USERS button.
Highlight the user and select the Disconnect button.
You also can disconnect a user session by using the net session command at the SunLink Server command prompt.
When a user uses a shared file, the file is open. Sometimes a file will be left open, perhaps even with a lock on it, because of an application program error or some other problem. Such files will remain open and unavailable to other users. Administrators can close these files.
To close an open resource from a Windows NT computer or a Windows client computer using Server Manager:
Start Server Manager.
Select the SunLink Server whose data you want to view.
Click on the IN USE button.
Highlight the open resource and select the Close Resource button.
You also can close an open resource by using the net file command at the SunLink Server command prompt.
SunLink Server maintains a separate print log for each printer share and each Solaris system printer it uses. These log files record any message generated because of a printer fault or print job error.
An administrator should check these log files periodically to determine whether any such errors are occurring. The logs can be accessed from a client computer by linking to the PRINTLOG shared resource.
The logs also can be accessed from the server. They are in the following directory: /opt/lanman/shares/printlog
Quick response time is critical when dealing with server problems. Being aware of a problem at the time it occurs can greatly decrease the effect that the problem may have on the server user community.
You can configure SunLink Server software to notify specified users when a problem occurs. You can also configure the Solaris system to generate notifications and send them to you when problems occur. The following sections discuss these features.
SunLink Server software includes an Alerter service that you can use to notify specified users of the occurrence of a particular event. An administrator should use this service in order to make server problems known immediately. Prompt action to resolve server problems often can minimize their effect. The following examples illustrate situations that could generate alerts:
The number of server errors exceeds a threshold set in the SunLink Server Registry.
The number of bad access attempts exceeds a threshold set in the SunLink Server Registry.
The number of bad password attempts exceeds a threshold set in the SunLink Server Registry.
Errors were encountered during start of the Net Logon service.
A printer is malfunctioning.
A print request has been deleted or completed.
One of the benefits of SunLink Server software is the availability of the inherent scripting features provided by the Solaris operating system. Combining these features with the data-gathering tools provided by SunLink Server software, an administrator can create a powerful tool that can be used to assess the health of a SunLink Server system at any given time.
For example, using the Solaris system job scheduling feature (cron), various data-gathering tools provided by SunLink Server, and some of the standard Solaris system commands for checking file system integrity and free space, administrators can write scripts that perform various system and server checks and then send the results to Solaris system administrators at regular intervals.
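As a minimal sketch of one such check, the following shell function reads df -k output and lists the file systems whose usage meets or exceeds a threshold. The function name report_full_filesystems, the 90 percent threshold, and the mail recipient in the usage comment are illustrative assumptions and are not part of SunLink Server.

```shell
# Hypothetical helper (not part of SunLink Server): list file systems whose
# usage is at or above a threshold percentage given as the first argument.
# Input: `df -k` output on stdin, assuming one header line followed by lines
# with the capacity percentage in column 5 and the mount point in column 6.
# Entries that df wraps onto two lines are not handled by this sketch.
report_full_filesystems() {
    awk 'NR > 1 && NF >= 6 && $5 + 0 >= limit + 0 { print $6, $5 }' limit="$1" -
}

# Possible use in a script started by cron (the recipient is a placeholder):
#   df -k | report_full_filesystems 90 | mailx -s "File system space warning" root
```

The function only parses text, so it can be combined with other checks in the same script before the results are mailed.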
SunLink Server software includes Solaris system commands that you can use to troubleshoot server problems. You execute these commands at the SunLink Server command prompt. This section summarizes these commands and describes the roles they can play in troubleshooting a server.
For more information about each command, type man command at the SunLink Server command prompt, replacing command with the name of the command.
The lmshell command is useful for emulating an MS-DOS client session when you do not have access to an actual MS-DOS client. This command is especially useful when troubleshooting a connectivity problem between a client and server. Using the lmshell command, you can mimic a client logon and resource linking by executing the net logon and net use commands in lmshell at the SunLink Server command prompt.
The lmstat command interrogates the server's shared memory image to gather a variety of data about the current state of the server. This command is especially useful when you want to determine which server process a client session is on.
SunLink Server software is composed of a set of cooperative processes. To list these processes while the server is running, enter the following command:
ps -ef | grep lmx
Executing this command generates a display similar to the following:
root 17726 1 0 12:03:36 0:00 lmx.alerter
root 17713 17461 0 12:03:32 0:00 lmx.srv -s 1
root 17722 17874 0 12:03:35 0:00 lmx.srv -s 2
root 17726 1 0 12:03:36 0:01 lmx.dmn
root 17728 1 0 12:03:36 0:01 lmx.browser
root 17744 1 0 12:03:28 0:00 lmx.ctrl
In this example, there are two lmx.srv server processes (17713 and 17722). The server may have nine clients with current sessions.
How does the administrator know to which lmx.srv process a client is connected? Executing the lmstat -c command at the server prompt usually provides the answer. The system displays output similar to the following:
Clients:
BANANA.SERVE~X (nwnum=0, vcnum=0) on 17713
ORANGE (nwnum=0, vcnum=0) on 17713
PEAR (nwnum=0, vcnum=0) on 17722
Notice that each client name has an associated process ID number. This is the process ID of the lmx.srv process that currently is serving that client. The vcnum value specifies whether this is the client computer's first VC or an additional one.
Being able to determine the process ID of the lmx.srv process that is serving a client is particularly useful when using lmstat -w or the Solaris system truss command. Both commands require a process ID as part of their startup arguments. (The -w option is not valid on all operating systems.)
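The lookup can be scripted. The sketch below assumes that lmstat -c output has the layout shown in the example above (client name, counters in parentheses, the word "on", then the process ID). The function name client_pid is hypothetical.

```shell
# Hypothetical helper: print the process ID of the lmx.srv process serving a
# given client. Reads `lmstat -c` style output on stdin, assuming lines like:
#   CLIENTNAME (nwnum=N, vcnum=N) on PID
# A client with more than one VC may produce more than one line of output.
client_pid() {
    awk 'NF >= 5 && $1 == name && $(NF - 1) == "on" { print $NF }' name="$1" -
}

# Example (requires a running server, so it is left as a comment):
#   lmstat -c | client_pid PEAR
# The resulting process ID can then be passed to a command such as truss -p.
```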
The regconfig command is used to query or change SunLink Server Registry key information. You can use this command to change any value in the Registry. (You also can use the Windows NT Registry Editor to change key values.)
You can also use the regconfig command to reinitialize the SunLink Server Registry with system defaults.
For more information about the Registry, see Appendix A, SunLink Server Registry.
The regcheck command is used to check and repair the SunLink Server Registry file. This command checks only the internal structure of the SunLink Server Registry file; it does not check the validity of any data that may be stored in it.
If the internal structure of the Registry file is found to be invalid, use the regcheck command to make the necessary repairs.
The samcheck command is used to check, dump, and fix the SAM database. You can use this command to determine whether the user accounts database has been corrupted and optionally, to fix it.
The samcheck command also can be used to output the contents of the user accounts database to stdout in human-readable format.
The srvconfig command is used to display the current default settings of all the server parameters in the lanman.ini file. (It also is a good way to check the location and spelling of any parameter you want to modify.)
The lanman.ini file contains several configuration parameters that you can modify. Default settings are used for most of these parameters. However, a certain number of them can be changed, overriding the default values set at server installation.
To display the default settings of the lanman.ini file, use the following command:
srvconfig -p | more
This command generates a listing of all of the parameters in the lanman.ini file and their default settings.
The acladm command is used to check and repair problems found in the Access Control List.
Be sure to examine the options that are available with this command before executing it. To do so, type man acladm at the SunLink Server command prompt.
SunLink Server troubleshooting involves using a systematic approach to isolate the problem and then gathering detailed data in order to identify the specific module causing the problem. The following sections provide simple procedures that you can use to isolate a server problem. They then offer some suggestions on how to gather additional information about the problem.
The SunLink Server program runs on a Solaris system computer. The server depends upon a fully functional NetBIOS network to perform its file- and print-serving functions.
A NetBIOS network typically includes the following components: an application that provides a NetBIOS protocol interface; an application that provides a network transport protocol interface, such as TCP/IP (although some transport implementations include NetBIOS within a common module); and an application that provides drivers for the network adapter interface (which also may be part of the transport module).
Every NetBIOS network component must be configured and operational in order for SunLink Server to function in a network environment. Additionally, similar modules must be functioning on the machine that is attempting to use the file and print services of the SunLink Server program, such as a Windows NT Workstation computer or Microsoft Windows client computer.
When a NetBIOS network is not available, the system typically displays the following message when you start the server:
unable to post servername on any network
Given the number of modules involved in the end-to-end connection between a client and SunLink Server, isolating the problem is the first step in solving it in a client-server networking environment.
Before assuming that the problem is with the server, you must ensure that other networking software is functioning properly. This is particularly true with new installations in which the opportunity for a transport or physical network problem is the greatest.
It is fruitless to perform an exhaustive check of every layer of software for a problem that affects only a single client or user. Experience will help you to determine when to use a comprehensive problem isolation procedure or a server-specific problem isolation procedure. The following sections offer guidelines on how to perform both procedures. Use the one that best fits your current problem description.
Before assuming that the server is the cause of all network problems, it is worthwhile to perform checks to verify the sanity of the network. This is particularly important when all or a very large portion of server users are reporting a problem at the same time.
Use the following steps to verify the sanity of the network.
The first item to check is the physical network. The majority of today's networking hardware provides status indicators that you can use to assess the state of the various network links (for example, 10BASE-T hubs use LEDs). Always check these links for any signs of problems with the physical network such as excessive re-transmissions, link integrity mismatches, and jabber conditions.
Even in cases in which only a single client is affected, never assume that it is not a bad cable connection. For a single client it is easy to check whether the problem occurs regardless of which server the client tries to use.
If a client cannot "see" anything on a network that is otherwise functioning without incident, then it is safe to assume that the problem is related to that client's network configuration. If, however, that same client can see other nodes on the network but cannot connect to a particular server, then the network path to that server, the server itself, or the account being used by that client is the likely source of trouble.
There are several third-party products available that you can use to monitor the health of the physical network. It is worthwhile to check network traffic periodically with one of these devices to see whether there are problems occurring with the physical network.
If the physical network appears to be functioning properly, the next step is to determine whether the various computers on the network can "see" each other from the perspective of a transport protocol. Most transport protocol applications include a connectivity test tool that can be used to verify connectivity at the transport level between a client and the server over the network.
If you cannot reach a server machine from a particular client with the ping command, then that client computer will not be able to connect to the server either. If you cannot ping a server from several client computers, then one of the following conditions may be present: the server is not running, the transport protocol is not running, or there is a configuration problem that is disrupting network connectivity.
Review the recommendations in your transport protocol software documentation. If appropriate, continue with the procedures described later in this section on assessing the status of the NetBIOS protocol and SunLink Server software.
Check the NetBIOS protocol layer. Most NetBIOS modules provide test tools that test the connectivity between NetBIOS names over the network.
Connectivity between nodes using TCP/IP may be available, but if connectivity between NetBIOS names is not working then SunLink Server software will not work. All SunLink Server communications are based on NetBIOS name sessions. Use the test tools provided with your protocol software to verify NetBIOS level connectivity. If you find a problem, isolate it according to the information provided with the NetBIOS protocol documentation.
If all of the network connectivity modules check out properly, the next item to verify is the Solaris operating environment on the computer hosting the SunLink Server program. The operating system provides a variety of log files and system checks that can be performed to verify proper operation. For information on these checks, see your Solaris system administrator documentation.
SunLink Server software is particularly sensitive to the following system problems:
Insufficient disk space in critical file systems such as root ( / ) or /var
Insufficient system memory causing excessive swapping
CPU bound conditions
Unbalanced disk loads
Improperly tuned kernel parameters such as maximum number of open files
Operating system problems usually will affect all or most client computers connected to the server. Do not spend much time on this step if you are troubleshooting an individual client problem.
If you determine that all of the underlying software is functioning properly, then you should check the SunLink Server system for problems. Problem isolation on the server often is dependent on the type of problem reported by the user community.
If only a single user is experiencing a problem, then you can narrow your focus quickly to the operations that this user is attempting to perform.
If a group of users is experiencing problems but many other users are not, then you should look for a common thread among the users with problems. For example:
Are they on the same hub?
Are they using the same applications or printers?
Are they on the same lmx.srv process?
Are they members of the same SunLink Server group?
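One way to check whether affected users share an lmx.srv process is to list clients grouped by the process that serves them. The following sketch assumes the lmstat -c output layout shown earlier in this chapter. The function name clients_by_process is hypothetical.

```shell
# Hypothetical helper: print "PID CLIENTNAME" pairs sorted so that clients
# served by the same lmx.srv process appear on adjacent lines.
# Reads `lmstat -c` style output on stdin, assuming lines like:
#   CLIENTNAME (nwnum=N, vcnum=N) on PID
clients_by_process() {
    awk 'NF >= 5 && $(NF - 1) == "on" { print $NF, $1 }' | sort
}

# Example (requires a running server, so it is left as a comment):
#   lmstat -c | clients_by_process
```

If every client that reports a problem appears under the same process ID, that process is a reasonable place to focus further data gathering.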
If all users of a server are experiencing a problem, then you should start with more basic assessments of the state of the server. These are described in the following sections.
It is worthwhile to verify that the server is actually running. You can do this easily by entering the following command at the system command prompt:
ps -ef | grep lmx
The system display should include the following (at a minimum):
root 3554 3452 Feb28 19:39 lmx.srv -s 1
root 3452 1 0 Feb28 5:03 lmx.ctrl
root 3568 1 0 Feb28 2:16 lmx.dmn
This display indicates that the three required server processes are in fact running: the daemon (lmx.dmn), the control process (lmx.ctrl), and at least one worker process (lmx.srv). You also may see other processes, such as lmx.browser and lmx.alerter.
Additional multiple worker processes, each with a unique number displayed at the end of the line, may be displayed. The server spawns new worker processes based on the number of clients supported by the server. As more client sessions are started, more lmx.srv processes may be started, each with a unique process ID and number. This is normal.
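The check for the three required processes can be automated. The sketch below assumes that the process names appear in ps -ef output exactly as in the example display (for instance, lmx.srv with no leading path). The function name missing_lmx_processes is hypothetical.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the name of
# each required server process (lmx.ctrl, lmx.dmn, lmx.srv) that is absent.
# Prints nothing when all three are present.
missing_lmx_processes() {
    awk '
        {
            # Record every field that is exactly one of the required names.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^lmx\.(ctrl|dmn|srv)$/) seen[$i] = 1
        }
        END {
            n = split("lmx.ctrl lmx.dmn lmx.srv", required, " ")
            for (i = 1; i <= n; i++)
                if (!seen[required[i]]) print required[i]
        }'
}

# Example: ps -ef | missing_lmx_processes
```

Because whole fields are matched, the line for the grep lmx command itself does not count as a server process.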
If the server is not running, use the net start server command at the command prompt.
If one of the required server processes is not running, determine whether all of the server services started properly. A situation can occur when several server processes are running but you still cannot use the server because a particular service did not start. This is especially true for the Net Logon service. To check which services are running, enter the following command at the command prompt:
net start
The system displays a list of the services that currently are active on the server.
It is critical that the Net Logon and Server services are displayed. If they are not shown, then the server has a problem. Often the Net Logon service will not start because of a problem with the server name, domain name, or domain configuration.
Check the error logs for problems as described in the next section.
Always check the error logs used by the server. You can view the system, security, and application logs from a client computer using Event Viewer, from the SunLink Server system using SunLink Server Manager, or at the system console using the elfread command. You also can view the logs in the PRINTLOG share area if there is a printing-related problem. For problems related to server startup, you can check the lmxstart.log located in the /var/opt/lanman/logs directory.
If there are entries in any of these logs, save them for future reference. Never discard or overwrite error messages since they may indicate the cause of the problem. These logs may have to be supplied to support personnel at a later date.
The following message is particularly indicative of a server problem:
A server process has unexpectedly terminated
This message indicates that a server process has encountered an unexpected error. Depending on how your server is configured, there may be a core file located on your system.
If the value of the CoreOk keyword is set to 1 (yes) in the SunLink Server Registry, then a core file is located somewhere on the system. The CoreOk value is in the following key:
SYSTEM\CurrentControlSet\Services\AdvancedServer\ProcessParameters
Go to the root directory, and execute the following command to search the file system for core files:
find . -name "core*" -print
Save any files that you may find. If the CoreOk value is set to 0 (no), then core files will not be created. You may want to set the CoreOk value to 1 (yes) in order to capture core files, which are useful for debugging purposes.
Some server resources are shared automatically every time the server is started. These resources are used in the background by clients while performing other server activities.
The list of resources shared by default includes:
ADMIN$
C$
D$
IPC$
LIB
NETLOGON
PRINTLOG
PRINT$
USERS
The resources followed by a dollar sign ($) are special resources required for server administration and communication. (An additional special resource -- REPL$ -- is available when the Directory Replicator service is running.)
Never attempt to delete or re-share these resources. If any of these resources is absent, the server will not function properly. If you detect that one of these resources is missing, stop and restart the server to determine whether it is shared at server startup. If it is still not displayed, contact your service representative.
The remaining resources are default resources typically used by clients during logon (NETLOGON), to connect to home directories (USERS), and to access utilities or error logs (DOSUTIL, OS2UTIL, PRINTLOG). These items may be deliberately absent from your server. However, if you did not unshare them, then a problem with the server caused them to be removed.
You can conduct a simple test to determine whether the server is communicating over the network. Issue the following command at the system console.
net view
The system displays the name of the server and other servers operating in the same domain. If your server name is displayed, execute the same command, adding the server name:
net view \\asutrial
The system displays a list of shared resources similar to the following:
Shared resources at \\asutrial
SunLink Server Systems
Sharename Type Used as Comment
----------------------------------------------------------------
DOSUTIL Disk DOS Utilities
LIB Disk Programming Aids
NETLOGON Disk Logon Scripts Directory
OS2UTIL Disk OS/2 Utilities
PRINTLOG Disk LP Printer Messages
USERS Disk User Directory
Other entries may be displayed if you added shared resources to your server.
If either of these commands fails consistently, then there is a problem with broadcast communications over the network. If these commands succeed, you can use the tests in the next section.
When a connectivity problem occurs, ensure that your server has not exceeded the maximum number of clients that it is configured to support. This number is indicated by the maxclients parameter in the server lanman.ini file. It can be displayed using the srvconfig -g maxclients command.
Execute the regcheck -C command to determine whether the internal format of the Registry file has been corrupted. If this command detects corruption, execute the regcheck -R command to repair the Registry file.
If invalid values have been entered in the SunLink Server Registry, then you can use the regload command to reinitialize all Registry values to their defaults.
Attempt to log on to the server from a client computer. If the logon is successful, link a virtual drive ID to a shared resource. Then, view the contents of the linked drive.
If you have problems with these steps, isolate each problem using the following procedure.
If you can communicate with the server but cannot access a shared resource, check the following items:
Verify that the shared resource exists by using the net view \\servername command. If the shared resource name is not displayed, then it does not exist. In that event, you must re-share the resource.
Link to the shared resource while logged in as administrator. If this fails and the resource exists, then the resource may be shared incorrectly. Delete and re-share the resource. If this succeeds, then proceed to the next step.
If the resource is a disk resource, check both levels of permissions associated with the shared resource. First check the share permissions using Server Manager. Then check the permissions on the shared directory using Windows Explorer at an administrative client.
Verify that the resource can be used by that particular user, either through group membership or on a per-account basis. Also, verify that the access permissions on the resource allow the desired action to be performed (for example, the user has read-only permission but is attempting to edit a file), and that the maximum user limit for the shared resource is not being exceeded.
On the shared resource, check the file attributes and the Solaris system access permissions.
If necessary, use the Properties menu in Windows Explorer.
Use the udir command to display Solaris system permissions (user, owner, group).
The SunLink Server program recognizes only the following types of file systems:
cdfs
nfs
s5
sfs
ufs
vxfs
File systems other than those listed above will be treated as an s5 file system. If you want all of your unknown file systems to be treated as a type other than s5, set the fsnosupport parameter in the [fsi] section of the lanman.ini file to the name of a recognized file system. Then, stop and restart the server.
If you want to set each unknown file system individually to a specific known file system, follow these steps:
At the Solaris system prompt, type the following command, replacing pathname with the actual name of the path to the unknown file system, and press Enter:
df -n pathname
The system displays the mount point and file system type as specified by the Solaris operating system.
Set the fsmap parameter in the [fsi] section of the lanman.ini file as follows:
unknown:s5,sfs:vxfs,unixfilesystem:filesystem, ...
Replace unixfilesystem with the name of the file system type returned in Step 1. Replace filesystem with the name of the SunLink Server file system type you want to use.
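The entry for Step 2 can be derived from the output of Step 1. The following sketch assumes that df -n prints the mount point, a colon, and the file system type on one line. The function name fsmap_entry and the example path are hypothetical.

```shell
# Hypothetical helper: turn `df -n pathname` output into one fsmap entry of
# the form unixfilesystem:filesystem.
# Assumes each input line has the form "mountpoint : fstype".
# $1 is the recognized SunLink Server file system type to map to.
fsmap_entry() {
    awk 'index($0, ":") > 0 {
        n = split($0, parts, ":")   # the type follows the last colon
        split(parts[n], t, " ")     # drop the blanks around the type name
        print t[1] ":" target
    }' target="$1" -
}

# Example (the path is a placeholder):
#   df -n /cdrom | fsmap_entry cdfs
```

The printed pair can then be appended, separated by a comma, to the existing fsmap value before you stop and restart the server.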