I Troubleshooting

This appendix summarizes tools, tips, and techniques you may employ to self-diagnose ACSLS issues at your site. The range of troubleshooting resources includes logs, key observation points, and diagnostic probes.

If the procedures in this appendix do not provide a solution to your problem and Oracle Tape Support assistance is required, see the Oracle Knowledge Management document 1013165.1 for instructions on how to generate an ACSLS support bundle.

ACSLS Event Log

The ACSLS Event Log is the first place to look for useful information when problems occur with your library operation. This log contains information about library events, status changes, and errors. All sub-components within ACSLS report events to the acsss_event.log by sending messages to a process called the Event Logger. The standard Event Log, which is created automatically when ACSLS is installed, is the file $ACS_HOME/log/acsss_event.log, where $ACS_HOME is usually /export/home/ACSSS/.

Logged events include:

Significant Events

Significant Events are normal events that can help you manage the library. For example, events are logged when an audit is initiated or terminated, a device changes state, or a CAP is opened or closed.
Library Errors

Library Errors are events where both fatal and nonfatal hardware and software errors are logged. Examples include: LSM failures; problems with cartridges; database errors; process failures; and library communications failures.

Each message in the Event Log includes a time stamp, the name of the component reporting the message, and a description of the event. For a complete explanation of each message, consult the ACSLS Messages manual.

A window on the ACSLS console displays a running tail of the Event Log. You can generate a similar display from any shell window.

As user acsss, run the following command:

acs_tail $ACS_HOME/log/acsss_event.log

To view the entire Event Log, use a text editor, such as vi, that enables you to navigate through the log, search for specific errors, or follow specific sequences of events.

Managing the Event log

ACSLS continues sending messages to the acsss_event.log.

When this file reaches a threshold size (500KB by default), the file is renamed to event0.log and saved in the log directory. The acsss_event.log then continues as a new file.
When acsss_event.log again reaches the threshold size, the event0.log is renamed to event1.log and the acsss_event.log is renamed event0.log.
This process continues for as many log files as are configured for retention.

By default, nine Event Log files are retained in the log directory. With each subsequent threshold, the oldest file is removed and all remaining files are sequentially renamed.
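The rotation scheme described above can be sketched as a small shell function. This is only an illustration of the renaming order, not the ACSLS implementation: the function name rotate_event_log is hypothetical, and it assumes the retained files are numbered from event0.log upward.

```shell
#!/bin/sh
# Illustrative sketch only; ACSLS performs this rotation internally.
# rotate_event_log DIR RETAIN   (hypothetical helper name)
#   DIR    - directory holding acsss_event.log
#   RETAIN - number of archived files to keep (assumed numbered 0..RETAIN-1)
rotate_event_log() {
    dir=$1
    retain=$2
    i=$((retain - 1))
    # Remove the oldest archive, then shift each remaining archive up by one.
    rm -f "$dir/event$i.log"
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -f "$dir/event$prev.log" ]; then
            mv "$dir/event$prev.log" "$dir/event$i.log"
        fi
        i=$prev
    done
    # The live log becomes event0.log and a new, empty live log is started.
    mv "$dir/acsss_event.log" "$dir/event0.log"
    : > "$dir/acsss_event.log"
}
```

Running the function in a scratch directory with a small RETAIN value makes the renaming order easy to observe.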

You can configure the maximum size of the acsss_event.log and the number of log files to retain using acsss_config, Option 2. Refer to "Setting Event Logging Variables".

Using greplog to Search Event Logs

The diagnostic tool, greplog, enables you to perform keyword searches through any or all Event Log files. Used very much like the UNIX grep utility, greplog returns the complete log message associated with a given keyword expression. This enables you to see the date and time stamp, message number, and function text of every message containing that expression.

Format

greplog [-iv] pattern file_1 file_2 ... file_n

Options

-i instructs greplog to ignore the case of the search pattern expression.
-v instructs greplog to invert the match: it displays every entry in the log file except those that match the pattern expression.

pattern: the pattern is the search criteria to be used.
```
file_1 file_2 ... file_n 
```

The greplog tool accepts multiple file parameters and wildcard expressions in the file list.

Examples

To display all occurrences within an event sequence, use the sequence number.
```
greplog 1392 acsss_event.log 
```
To search the Event Log for all messages about volume CART89:
```
greplog CART89 acsss_event.log 
```
To search all archived copies of the Event Log for messages about tape mounts:
```
greplog -i mount event*.log 
```
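Before running greplog across many archived logs, it can help to see which files mention a keyword at all. The sketch below uses only the standard grep utility; the function name logs_mentioning is hypothetical, and the file names follow the acsss_event.log and event*.log naming described in this appendix.

```shell
#!/bin/sh
# Illustrative sketch; uses standard grep, not greplog.
# logs_mentioning KEYWORD [DIR]   (hypothetical helper name)
# Prints the name of each Event Log file in DIR that contains KEYWORD,
# ignoring case. DIR defaults to $ACS_HOME/log.
logs_mentioning() {
    keyword=$1
    dir=${2:-$ACS_HOME/log}
    # -i ignores case, -l lists only the names of matching files.
    grep -il -- "$keyword" "$dir"/acsss_event.log "$dir"/event*.log 2>/dev/null
}

# Example (as user acsss), then pass the listed files to greplog:
#   logs_mentioning mount
```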

Additional Logs

The acsss_event.log contains all of the messages pertaining to any aspect of the ACSLS running processes. However, there are additional files in the log directory that contain status information about external utilities, such as backup and restore and installation utilities.

acsss.pid - Stores the process id of the currently running acsss_daemon.
acsss_config.log - Contains a summary of each library configuration.
acsss_config_event.log - Contains event messages that were posted by the acsss_config routine.
bdb_event.log - Contains event messages that were posted by the database backup utility, bdb.acsss.
cron_event.log - Contains messages that were posted by cron utilities. To see the cron schedule, run the command crontab -l.
acsls_start.log - Contains startup or shutdown messages involving the acsls service.
di_trace.log - Contains trace information related to the database interface.
ejectingLogs - A directory containing summary information from ejecting.sh operations for the past ten days.
install.log - Contains event messages posted while running the installation script, install.sh.
ipc_trace.log - Contains trace information pertaining to ACSLS inter-process communications.
rdb_event.log - Contains event messages that were posted by the database restore utility, rdb.acsss.
timed_bkup.sh.log - Contains event messages related to the automatic database backup utility.

Additional trace logs may be found in the log directory depending upon the specific tracing that you have enabled on your system. The logs include the following:

acsss_stats.log - Volume statistics tracing is enabled by acsss_config.
acsss_trace.log - Client-server tracing is enabled at the request of Software Support personnel.
acslh.log - Host-LMU tracing is enabled at the request of Software Support personnel.
scsilh.log, mchangerX.log, scsipkt.log - All of these contain traces of SCSI communications to a SCSI-attached library, and are enabled at the request of Software Support.

Trace Log Management

Trace logs that are enabled at the request of Software Support can grow quite rapidly. These logs need to be monitored and managed in order to mitigate problems of a full disk.

The utility monitor.sh is provided to perform automatic log management and archiving services. The syntax is:

monitor.sh <name of log>

When this utility is enabled to monitor a specific log, it allows the log to grow to a size of 1MB (default) and then compresses the log using gzip, placing the compressed log file with a time-stamped name in the ACSSS/log/log_archives subdirectory. This operation continues for as long as tracing stays enabled.
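The size-threshold archiving described above can be pictured with the following sketch. It is not the monitor.sh source: the function name archive_if_large, the archive file naming, and the choice to empty the live log after compression are all assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; not the actual monitor.sh logic.
# archive_if_large LOG ARCHIVE_DIR [LIMIT_BYTES]   (hypothetical helper name)
archive_if_large() {
    log=$1
    archive_dir=$2
    limit=${3:-1048576}               # 1MB default, as described in the text
    size=$(($(wc -c < "$log")))       # arithmetic expansion strips any padding
    if [ "$size" -ge "$limit" ]; then
        mkdir -p "$archive_dir"
        stamp=$(date +%Y%m%d%H%M%S)   # assumed time-stamp format
        gzip -c "$log" > "$archive_dir/$(basename "$log").$stamp.gz"
        : > "$log"                    # assumption: start the live log afresh
    fi
}
```

A periodic call such as archive_if_large "$ACS_HOME/log/acsss_trace.log" "$ACS_HOME/log/log_archives" would approximate the behavior for one log.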

Java Component Logs

A number of logs are maintained by the Java components in ACSLS including ACSLS GUI and Logical Library software components. These logs are found in the $ACS_HOME/log/sslm directory.

WebLogic installation procedures are logged in the weblogic.log. WebLogic and ACSLS GUI operations are logged in the AcslsDomain.log and the AdminServer.log.

An audit trail of user activity in the Web-based GUI is found in the guiAccess.log.

Transactions between Java components and legacy ACSLS components are logged in the surrogate_trace.0.log.

IPC packets between Java client components and the ACSLS server are traced in the acslm_ipc_trace.0.log.

Errors encountered by the ACSLS GUI are logged in the gui_trace.0.log.

Low-level communication between the SMCE and the SCSI (fibre) client is logged in the smce_trace.0.log.


Key Observation Points

There are numerous utilities that enable you to verify the status of various aspects of ACSLS.

psacs - shows a summary of all of the ACSLS running processes. It is the best indication whether ACSLS is running or not. A typical output should display no fewer than twelve different processes, all children of a common parent process.
acsss status - checks if the acsdb database service is running
To display your ACSLS release and maintenance level:
- On Solaris:
  
  pkginfo -l STKacsls
- On Linux:
  
  rpm -q ACSLS
- On Solaris or Linux:
  
  in_get_version

Diagnosing ACSLS Startup Problems

Look at the acsss_event.log.
Look at the acsls_start.log.
Look at the end of the acsss_event.log for messages explaining the problem.
Refer to the ACSLS Messages Guide for an explanation of the messages and what you can do to resolve them.
Display the status of ACSLS services with acsss l-status.

Use acsss l-status to show a status summary of the ACSLS services. For each service, the logfile entry points to log data which may contain detailed messages explaining the condition that prevented ACSLS from starting up.
ACSLS time outs during startup
On Solaris, to display the calculated ACSLS start-up timeout period based on your configuration use acsss timeout.

Testing Library Connections

ACSLS provides utilities for verifying a good physical connection to the library. The tool you select is best determined by the context of your activity.

testports

This utility tests the connection to each library that has been configured to StorageTek ACSLS. It is the easiest of these tools to use and the most comprehensive. The test is unobtrusive and does not impact normal library operations. Since testports uses the StorageTek ACSLS database to determine the library port name and library type, the library must already have been configured to StorageTek ACSLS in order for testports to function.

For TCP/IP libraries, testports verifies the connection and whether the library is online and in use by StorageTek ACSLS.
For SCSI and serial-attached libraries, the 'acs' and 'port' must be offline in order for testports to open the test connection.

To run this utility, the command syntax is:

testports

The library compat level or microcode level displays.

testlmutcp

This utility submits a TCP/IP packet to a network-attached library.

To test the library connection, include the library hostname or ip address in the command line:

testlmutcp <ip_address> or

testlmutcp <hostname>

To test the connection while the library is online to ACSLS, specify an unused socket number between 50002 and 50016. For example:

testlmutcp <ip_address>:50002

A successful response will include the compatibility level of the attached library.

testlmu

This utility can be used to test connectivity between ACSLS and legacy StorageTek serial-attached libraries. To run this utility, submit the devlink path to the serial port device node:

testlmu /dev/term/0

The library must be offline to ACSLS in order for testlmu to open the serial connection.

pinglmu.sh

This utility enables you to verify communication between ACSLS and a serial-attached library while the library is online to ACSLS. A successful response includes the library compatibility level.

probescsi.sh

This utility exercises the connection between the ACSLS server and a SCSI or fibre-attached library. To run this utility, specify the devlink path to the mchanger device. The syntax is:

probescsi.sh /dev/mchangerX

where X is the specific mchanger instance of the library being tested.

The library must be offline to ACSLS in order for probescsi to open the SCSI connection. A successful response includes the microcode level of the attached library.

probeFibre.sh

This utility discovers all fibre-attached libraries that are reachable from the ACSLS server. The syntax is:

probeFibre.sh

A successful response displays the model number of each fibre-attached library along with its target, LUN IDs, and the World Wide Port Name (WWPN).

Using the -v option, you can also display the model number of the host bus adapter.

probeFibre.sh -v

showDevs.sh

This utility reveals details about every mchanger device for which a mchanger link has been created.

showDevs.sh

Displays library model, revision level and the capacity of each attached mchanger library.
showDevs.sh -w

This option also includes the WWPN of each library.
showDevs.sh -s

This option also includes the serial number of each library.

Troubleshooting an Offline SL4000 ACS

When network disruptions occur, the SL4000 library may disable the OSCI channel for ACSLS, placing an ACS offline. Use the following procedure to re-enable the ACS:

Verify that the SL4000 library is online and in an operational (or degraded) state.
Verify that the SL4000 OSCI connection to ACSLS is enabled. In the SL4000 BUI, select Notifications and then SCI screen. Here you can enable the OSCI connection. Refer to the SL4000 User's Guide.
Use the testports utility to verify the network connection. See "testports".
Verify that port, lsm, and acs are online. Enter the following commands in order, replacing acs_num with the acs number:
```
vary port acs_num,0 online
vary lsm acs_num,0 online
vary acs acs_num online
```

Testing a Client Connection

Client applications communicate with ACSLS over TCP/IP using the RPC (remote procedure call) protocol. If a client system cannot communicate with ACSLS, you can use rpcinfo to test whether ACSLS is reachable from the client machine.

From the ACSLS server, verify that ACSLS is running.

psacs
From the ACSLS server, verify that the RPC daemon is running.

ps -ef | grep rpc
From the ACSLS server, verify that program number 300031 is registered for TCP and UDP.

rpcinfo | grep 300031

This program number confirms that ACSLS is running and that ACSLS has registered with RPC.
From the client machine, or any UNIX machine on the network, use rpcinfo to exchange a packet with program number 300031 on the ACSLS server.

Specify the IP address of the ACSLS server along with the program number.

rpcinfo -t <ip address> 300031

If the communication exchange was successful, the rpcinfo utility will display the following message:

program 300031 version 1 ready and waiting

program 300031 version 2 ready and waiting

This confirms that ACSLS is available for client connections across the network.
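The final check in this procedure can be scripted. The sketch below only inspects the text that rpcinfo prints, looking for the "ready and waiting" reply shown above; the function name rpc_reply_ok is hypothetical and the address in the usage comment is a placeholder.

```shell
#!/bin/sh
# Illustrative sketch; rpc_reply_ok is a hypothetical helper name.
# Reads rpcinfo output on standard input and succeeds only if it contains
# a "ready and waiting" reply for program number 300031.
rpc_reply_ok() {
    grep -q '^program 300031 version [0-9][0-9]* ready and waiting'
}

# Example from a client machine (replace the placeholder address):
#   rpcinfo -t 192.0.2.10 300031 | rpc_reply_ok && echo "ACSLS is reachable"
```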

CAP in Fibre Library Attached Through a Bridged Drive is Locked

A CAP in a Fibre-attached library that is connected through a bridged drive may become locked when a different ACSLS instance takes over management of the library. For details about and solutions to this problem, see "CAP (Mailslot) Not Opening During an Eject" in the SL150 Appendix.

Gathering Diagnostic Information for Oracle Support

As part of the service call, Oracle Support may ask you to send the entire set of diagnostic logs and other diagnostic information for analysis. All of this data can be collected with a single command:

get_diags

When this utility has collected all of the information, it prompts you to either email the data or to make it available for manual transfer.

If you elect to e-mail the data directly from the ACSLS machine, make sure that email communication is possible between your ACSLS machine and the Internet. Your enterprise may have a firewall to prevent e-mail going directly from the target machine. In this case, you can e-mail the information to yourself within the enterprise and then forward the diagnostic data to Oracle Support.

Alternatively, you can elect to transfer the information manually. The get_diags utility advises you where to find the waiting tar packages for transfer. Typically, the staging area for diagnostic data is /export/backup/diag/acsss.

For additional details on how to generate an ACSLS support bundle, see the Oracle Knowledge Management document 1013165.1.

ACSLS and Security-Enhanced Linux (SE Linux)

SE Linux is enabled by default in Oracle Linux. Beyond standard Unix level access control, SE Linux enforces access to system resources according to user role and the immediate context domain. When SE Linux enforcement is enabled, the ability of ACSLS to access its own PostgreSQL database would be inhibited without a special policy that establishes the role and context for such access. You may need to disable SE Linux, or to run in permissive mode, if problems occur.

Un-installing SE Linux Policy Modules for ACSLS

Three SE Linux policy modules are loaded into the kernel when you install ACSLS: allowPostgr, acsdb, and acsdb1. These modules provide the definitions and enforcement exceptions that are necessary in order for ACSLS to access its own database and other system resources while SE Linux enforcement is active. With these modules installed, you should be able to run normal ACSLS operations, including database operations such as bdb.acsss, rdb.acsss, db_export.sh and db_import.sh without the need to disable SE Linux enforcement.

For purposes of quicker software upgrades, the SE Linux policy modules that were loaded by ACSLS are not removed automatically when you uninstall the ACSLS package. To remove them manually, get a listing of the ACSLS modules:

# semodule -l | grep acsdb

# semodule -l | grep allowPostgr

For each module, remove the module in the following manner:

# semodule -r <module name>

Managing SE Linux Enforcement

After installing ACSLS, if you encounter access related issues where the system responds with 'permission denied' while traditional file permission settings appear to be valid, the source of the access denial may be SE Linux.

To verify whether SE Linux enforcement is enabled, run the command: sestatus

# sestatus
SELinux status:   enabled
Current mode:     enforcing

You can disable SE Linux enforcement temporarily using the setenforce command:

# setenforce Permissive

With SE Linux enforcement in the permissive mode, you can now check whether access to the failed resource can be restored. If the necessary resource is available to the authorized user in the permissive mode but not in the enforcing mode, this suggests the need for an updated SE Linux policy.

To disable SE Linux security permanently (across boots):

Edit the file: /etc/selinux/config
Change: SELINUX=enforcing to SELINUX=permissive

To re-enable SE Linux enforcement, root needs to have the sysadm_r role.

# newrole -r sysadm_r
# setenforce enforcing

After you have verified that SE Linux was the cause of the apparent restriction, you can view the actual rules that disallowed access to the needed resource by looking at the SE Linux audit log.

# vi /var/log/audit/audit.log

The audit.log provides a summary for each access attempt that succeeded or failed SE enforcement. You should look for the events that failed. For ACSLS, look specifically at events related to the users acsss and acsdb.
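To narrow the audit log to the failures of interest, a simple text filter is often enough. The sketch below assumes that failed attempts are recorded with the word "denied" and that ACSLS-related entries mention acsss or acsdb somewhere on the line (for example in a path or name field); both are assumptions about your log content, and the function name acsls_denials is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch; acsls_denials is a hypothetical helper name.
# Prints audit entries that record a denial and mention acsss or acsdb.
acsls_denials() {
    audit_log=${1:-/var/log/audit/audit.log}
    grep 'denied' "$audit_log" | grep -E 'acsss|acsdb'
}

# Example (as root):
#   acsls_denials /var/log/audit/audit.log
```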

You can view the SE Linux context attributes associated with any given file or directory:

# ls -Z <file name>

You can view the context attributes of a given process or those of your current shell using the command: secon. It is possible to change the context attributes of a file or directory using the command chcon. Consult the man pages for these operations.

It is possible to create a policy module in response to the failed operations found in the audit.log.

# cd /var/log/audit
# audit2allow -a -M <ModuleName>

This evaluates the failures that were logged by SE Linux and it creates a policy module file <ModuleName>.pp. This file can now be loaded into the Linux kernel in order to allow the operations that had been blocked.

# semodule -i <ModuleName>.pp

Since audit2allow creates a policy that permits every operation recorded as denied in the audit.log, make sure that the audit.log contains only those operations that you specifically want to allow. To do this, you can save the original audit.log and create a new one.

# mv audit.log audit1.log
# touch audit.log

Proceed with the operations that you want to capture before you create a policy module for them.

For more information regarding SE Linux, consult the man page:

# man selinux

Verifying the GUI is Operational

The checkGui.sh utility checks common factors to assess whether the GUI is operational. In the event that the GUI is not working, this utility may lead users to the likely cause of the problem.

This utility checks the following:

Is weblogic enabled on the system?
Are there phantom or stale processes that may inhibit weblogic operation?
Is the SlimGUI application successfully deployed?
Can weblogic and the GUI respond to an http request sent to localhost?
Can weblogic respond to an http request sent to the internet address of the host?
Is the firewall service enabled on the server? If so, is there a policy to accept incoming requests to weblogic ports 7001 and 7002?

On Linux systems, you may find that the firewall called iptables is enabled by default. You can disable iptables completely, or you can add a policy to accept incoming traffic to ports 7001 and 7002.

To enable these ports (as root), edit the file /etc/sysconfig/iptables. Add the following two lines:
```
-A INPUT -p tcp --dport 7001 -j ACCEPT
-A INPUT -p tcp --dport 7002 -j ACCEPT
```
Make sure that you do not insert these rules after another rule that will match incoming packets before they are examined. For example, do not append them to the end of an iptables chain after a rule to REJECT all.

If you are using the iptables command to add these rules:
- List (iptables -L) or Print (iptables -S) the table.
- Add the rules.
  
  Simply appending (iptables -A) the rules to the end of a chain may not produce the desired outcome because prior rules may prevent the new rules from matching any input.
  
  Instead, consider inserting (iptables -I) each rule at a specific rule number.
- List (iptables -L) or Print (iptables -S) the table after the change and make sure that existing rules do not prevent the new rules for ports 7001 and 7002 from ever being examined.
  
  This ensures that the new rules can match an incoming packet.
  
  The checkGui.sh utility checks for the existence of rules to ACCEPT input on ports 7001 and 7002. It does not verify that these rules are in the right iptables chain or that the new rules will actually be processed. In other words checkGui.sh does not verify that there are no prior rules which would prevent the examination of the new rules.
Restart iptables:
```
service iptables restart
```
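Because checkGui.sh does not verify rule order, a rough ordering check can be made on the output of iptables -S INPUT. The sketch below is a heuristic, not a full rule evaluator: the function name accept_before_reject is hypothetical, and it conservatively treats any REJECT or DROP rule without a port match as one that would block later rules.

```shell
#!/bin/sh
# Illustrative heuristic; accept_before_reject is a hypothetical helper name.
# Reads "iptables -S INPUT" output on standard input and succeeds only if an
# ACCEPT rule for the given destination port appears before any REJECT or
# DROP rule that has no port match.
accept_before_reject() {
    port=$1
    awk -v port="$port" '
        # An ACCEPT rule for the requested destination port: stop, success.
        $0 ~ ("--dport " port "( |$)") && /-j ACCEPT/ { found = 1; exit }
        # A port-less REJECT or DROP seen first: later rules may never match.
        /^-A INPUT/ && /-j (REJECT|DROP)/ && !/--dport/ { exit }
        END { exit(found ? 0 : 1) }
    '
}

# Example (as root):
#   iptables -S INPUT | accept_before_reject 7001 || echo "check rule order for 7001"
#   iptables -S INPUT | accept_before_reject 7002 || echo "check rule order for 7002"
```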

The comparable service on Solaris is ipfilter, which is not normally enabled by default.

GUI Troubleshooting Tips

The following table discusses some GUI troubleshooting tips.

Table I-1 GUI Troubleshooting Tips

Problem	Solution
I have placed https://<hostname> in the browser but the response page declares "Unable to connect".	The correct URL is: https://hostname.domain:7002/SlimGUI/faces/Slim.jsp
The ACSLS GUI page is incomplete. Some frames are incomplete or entire sections missing.	Click the refresh button on your browser.
Java WebLogic rejects what I know to be a valid user ID and password. I can't log in.	Consult with your local ACSLS administrator. If you are the administrator, use the userAdmin.sh utility to list users, add a user, or change a user password. If the users still have trouble logging in, check your system for sufficient memory, and then restart the ACSLS GUI with option 5 of userAdmin.sh. Alternatively, you can restart WebLogic using svcadm disable weblogic and svcadm enable weblogic.
A Java error stack trace is displayed in one or more windows of the GUI.	Press the refresh button on your browser. If the problem persists, use acsss status to verify that ACSLS services are running. If services are not running, bring them up with acsss enable. If ACSLS services are running, restart the GUI using userAdmin.sh. Alternatively, you can restart WebLogic with svcadm disable weblogic and svcadm enable weblogic. If you do not have root access to the system, you can shut down all services with acsss shutdown, then restart them with acsss enable. The GUI is restarted by this process.
The selection for "Logical Libraries" is missing in the index tree frame.	You must first create a logical library. Select: Configuration and Administration -> Logical Library Configuration -> Create Logical Library.
No volumes are listed on the Volumes page under Tape Library Operations or Tape Libraries & Drives.	This is an indication that an initial audit has not been taken for the library. Select: Tape Library Operations -> Audit.
No volumes are listed on the Volumes page under Logical Libraries.	This is a sign that volumes have not yet been assigned to the logical library. Select: Logical Library Configuration ->Assign Volumes.
GUI response time is slow	Increase the Alert Update Interval under the Preferences button in the GUI masthead.
I need to add GUI users, change the passwords of GUI users, or set the acsls_admin password.	See "userAdmin.sh". This utility lets you add users and change user passwords, and tells you how to reset the acsls_admin password.
Browser requires a security certificate.	See "Configuring a Self-Assigned Digital Certificate for HTTPS"

Linux 7.3 and 7.6 Performance Issues

Under Linux 7.3 and 7.6, significant ACSLS performance slowdown conditions may occur. These conditions are traced to missing host entries and the use of the SSSD service.

Slow resolution of UIDs, groups, automounts, permissions, and related details can have wide-ranging impacts. Many applications, including ACSLS, use the su command, which requires resolution of system user details prior to proceeding.

Missing Host Entry

Make sure that you do not have a missing host entry in your /etc/hosts file. Include both the fully qualified host name and shortened version. For example:

[root@mycomputer ~]# cat /etc/hosts
10.0.0.0      mycomputer.us.mydomain.com     mycomputer
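The host entry check can be automated with a short script. The sketch below looks for a single non-comment line in the hosts file that carries both the fully qualified name and its shortened form; the function name hosts_has_both_names is hypothetical, and the short name is assumed to be everything before the first dot.

```shell
#!/bin/sh
# Illustrative sketch; hosts_has_both_names is a hypothetical helper name.
# hosts_has_both_names FQDN [HOSTS_FILE]
# Succeeds if one non-comment line lists both FQDN and its short form.
hosts_has_both_names() {
    fqdn=$1
    hosts_file=${2:-/etc/hosts}
    short=${fqdn%%.*}                  # text before the first dot
    awk -v fqdn="$fqdn" -v short="$short" '
        /^[ \t]*#/ { next }            # skip comment lines
        {
            f = 0; s = 0
            # Field 1 is the address; names start at field 2.
            for (i = 2; i <= NF; i++) {
                if ($i == fqdn)  f = 1
                if ($i == short) s = 1
            }
            if (f && s) ok = 1
        }
        END { exit(ok ? 0 : 1) }
    ' "$hosts_file"
}

# Example:
#   hosts_has_both_names mycomputer.us.mydomain.com || echo "host entry incomplete"
```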

SSSD

Use of the Linux System Security Services Daemon (SSSD) service can impact ACSLS performance, causing numerous types of slowdown conditions. These conditions revolve around refreshing of a cache and re-walking trees of user names, groups, permissions, and so on.

If you encounter a slowdown condition, use the following guidelines to determine whether the condition is related to your use of SSSD:

Diagnosis 1

Verify that the machine is responding slowly to any su command.
Check the /etc/nsswitch.conf file for occurrences of "sssd". If you do not find occurrences of "sssd" then proceed with Diagnosis 2.
For diagnostic purposes, remove any occurrences of "sssd". No restart required.
Test the su command again several times, to varying users. If response time is improved, then the SSSD service is the root cause of your slowdown condition.
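The file check in Diagnosis 1 can be sketched as follows. The function name nsswitch_sss_lines is hypothetical. It matches the string "sss", which also covers "sssd", on the assumption that your name service entries may use either spelling, and it ignores comment lines.

```shell
#!/bin/sh
# Illustrative sketch; nsswitch_sss_lines is a hypothetical helper name.
# Prints each non-comment line that references sss/sssd, prefixed with its
# line number, and succeeds only if at least one such line exists.
nsswitch_sss_lines() {
    awk '
        /^[ \t]*#/ { next }                       # skip comment lines
        /sss/ { print FNR ": " $0; found = 1 }
        END { exit(found ? 0 : 1) }
    ' "${1:-/etc/nsswitch.conf}"
}

# Example, followed by a timing test of su (run as root; user is a placeholder):
#   nsswitch_sss_lines || echo "no sss entries; proceed with Diagnosis 2"
#   time su - acsss -c true
```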

Diagnosis 2

If there are no occurrences of "sssd" in the /etc/nsswitch.conf file, SSSD can still cause slow performance.

As root, enter the following command to display the status of the SSSD service:
```
systemctl status sssd.service
```
If the service is active, enter the following command to turn it off:
```
systemctl stop sssd.service
```
Enter the following command to disable the SSSD service:
```
systemctl disable sssd.service
```
Test the su command again several times, to varying users. If response time is improved, then the SSSD service is the root cause of your slowdown condition.

SL4000 OSCI Retention Time

The OSCI retention time determines how long the SL4000 library retains events on the OSCI channel if ACSLS and the SL4000 become disconnected due to network outages or other network related issues.

The OSCI retention time is set to two hours and is configured by ACSLS when creating the OSCI destination channel on the library. This is an internally configured time and is not configured by the installer or other ACSLS UI.

If ACSLS and SL4000 connectivity is lost for longer than the two hour retention time, OSCI events may be lost. In most cases, no further action is required once connectivity is restored. However, if CAP enter or eject operations were occurring at the time of connectivity loss and the OSCI retention time has elapsed, it may be necessary to stop and restart ACSLS to re-sync ACSLS with the library.