G Troubleshooting

This appendix highlights the most common problems encountered when installing RUEI, and offers solutions to locate and correct them. The information in this appendix should be reviewed before contacting Customer Support.

More Information


Contacting Customer Support

If you experience problems with the installation or configuration of RUEI, you can contact Customer Support. However, before doing so, it is strongly recommended that you create a Helpdesk report file of your installation. To do so, select System, then Configuration, and then Helpdesk report. This file contains extended system information that is extremely useful to Customer Support when handling any issues that you report. Note that this file contains information in a proprietary format. Do not try to modify its contents.

In addition, extended information about internal errors is available by enabling Session debugging. To do so, select the Session debug option from the Help menu. For further information, see the Oracle Real User Experience Insight User's Guide.

G.1 Running the ruei-check.sh Script

It is recommended that you use the ruei-check.sh script to troubleshoot installation issues. When first run, the script requires you to specify an installation type (reporter, processor, collector, or database). Be aware that this selection is saved to a file. Therefore, if you later want to run the script with a different installation type, you need to delete the file /tmp/ruei-system-type using the following command:

rm /tmp/ruei-system-type

You can specify the parameters shown in Table G-1.

Table G-1 ruei-check.sh Parameters

Parameter     Description
-----------   ------------------------------------------------------------
system        Performs basic system checks, as well as a number of
              prerequisite checks. These include checking for network
              interfaces that can be monitored, that the Oracle database
              starts correctly, and that the Apache web server, PHP, and
              Zend optimizer are correctly configured.

preinstall    Checks whether the Oracle database is correctly configured.

postinstall   Checks whether the RUEI RPMs have been installed correctly.

all           Performs all of the above checks, in the order listed.


For example:

cd /root/RUEI/132
./ruei-check.sh all

The use of this script is fully described in Appendix E, "The ruei-check.sh Script".
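Because the selected installation type is cached in /tmp/ruei-system-type, it can be convenient to clear it before each run. The following is a minimal sketch in POSIX shell; the function name ruei_reset_system_type is hypothetical and not part of RUEI, and the optional argument exists only so that the function can be tried against a different file.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with RUEI): forget the installation type
# saved by ruei-check.sh so that the next run prompts for it again.
# $1 optionally overrides the default file location named in this section.
ruei_reset_system_type() {
    type_file=${1:-/tmp/ruei-system-type}
    if [ -e "$type_file" ]; then
        rm -f "$type_file" && echo "removed $type_file"
    else
        echo "no saved installation type at $type_file"
    fi
}
```

For example, as root: ruei_reset_system_type && cd /root/RUEI/132 && ./ruei-check.sh all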

G.2 ORA-12805: Parallel Query Server Died Unexpectedly

When executing a parallel statement using a partial partition-wise join, where the set of partitions accessed is pruned at runtime to no partitions, or to partitions without any segments, you may run into an error similar to the following:

ORA-12805: PARALLEL QUERY SERVER DIED UNEXPECTEDLY 

This is caused by a known bug in Oracle Database 11.2. As a workaround, you can prevent GUI queries from running in parallel by setting db_gui_dop to 1:

$ execsql config_set_value wi_core db_gui_dop 1

However, Oracle recommends that you upgrade to database version 11.2.0.3 and download and install patch 13582702 available at the following location:

https://support.oracle.com/epmos/faces/ui/patch/PatchDetail.jspx?_afrLoop=33337295036267&patchId=13582702
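Whether you are still on a release older than 11.2.0.3 determines whether the upgrade recommended above is outstanding. The release can be read in SQL*Plus with select version from v$instance; and compared as a minimal sketch using the hypothetical helper below, which assumes GNU sort with its -V (version sort) option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the dotted version $1
# is greater than or equal to the dotted version $2. Relies on GNU "sort -V",
# which compares each numeric component numerically (so 11.2.0.10 > 11.2.0.3).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example: version_at_least "$DB_VERSION" 11.2.0.3 || echo "older than 11.2.0.3: use the db_gui_dop workaround and plan the upgrade" (DB_VERSION is a hypothetical variable holding the queried release).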

G.3 The ruei-prepare-db.sh Script Fails

If the ruei-prepare-db.sh script fails, this can be because the database listener has not been started correctly due to a failing DNS lookup. To resolve this problem, do the following:

  • Ensure the /etc/hosts file includes your host.

  • Ensure the hosts entry in the /etc/nsswitch.conf file specifies the required sequence (hosts: files dns).

Note:

The ruei-prepare-db.sh script can be run with the delete option to remove the current database and install a new one.
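Both checks above can be scripted. The sketch below defines two hypothetical helpers: nsswitch_hosts_order reports whether the hosts: line lists files before dns, and host_in_hosts_file reports whether a given host name appears in /etc/hosts. The file paths are parameters (defaulting to the system files) so that the helpers can be tried against copies.

```shell
#!/bin/sh
# Hypothetical helper: prints "ok" if the hosts: line of nsswitch.conf lists
# "files" before "dns" (or lists "files" without "dns"), otherwise "not ok".
nsswitch_hosts_order() {
    conf=${1:-/etc/nsswitch.conf}
    awk '
        $1 == "hosts:" {
            found = 1; f = 0; d = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "files" && !f) f = i
                if ($i == "dns" && !d) d = i
            }
        }
        END {
            if (found && f && (!d || f < d)) print "ok"
            else print "not ok"
        }' "$conf"
}

# Hypothetical helper: prints "present" if host name $1 appears in the hosts
# file (any name field after the IP address), otherwise "missing".
host_in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    # Skip comment lines; compare the name against every field after the IP.
    if awk -v n="$name" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        echo "present"
    else
        echo "missing"
    fi
}
```

For example: nsswitch_hosts_order; host_in_hosts_file "$(hostname)"; host_in_hosts_file "$(hostname -s)"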

G.4 Starting Problems

If the system does not seem to start, or does not listen to the correct ports, do the following:

  • Restart each Collector service. To do so, select System, then Maintenance, then Network data collectors, select each attached Collector, and select the Restart option from the menu. This is described in more detail in the Oracle Real User Experience Insight User's Guide.

  • Review your network filter definitions. This is described in the Oracle Real User Experience Insight User's Guide. In particular, ensure that no unusual network filters have been applied. This is particularly important in the case of VLANs.

  • Ensure that RUEI is listening to the correct protocols and ports. This is described in the Oracle Real User Experience Insight User's Guide.

  • Verify that the Collector interfaces are up.

Resources and Log Files

If the system returns an error during, or directly after, running the Initial setup wizard (described in Section 6.2, "Performing Initial RUEI Configuration"), the following resources and log files are available to help you debug the problem:

  • RUEI_DATA/processor/log/gui_debug.log: a proprietary debug and log file that shows low-level system information. Although its contents may be difficult to read, you can find standard system error messages listed here.

  • /var/log/httpd/access_log and /var/log/httpd/error_log: the Apache daemon access and error log files. If any part of the HTTP or PHP execution of the RUEI user interface is in error, it will show up in these log files. (Note that these are not the log files used by RUEI for HTTP data analysis.)
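When the Apache error log is long, filtering it for error-level entries can help. The sketch below is a hypothetical helper, recent_errors, that assumes the usual Apache 2.2 and 2.4 markers ([error] and [module:error]) and PHP's "PHP Fatal error", "PHP Parse error", and "PHP Warning" prefixes; it prints the last N matching lines.

```shell
#!/bin/sh
# Hypothetical helper: print the last N error-level lines of an Apache
# error log. $1 = log file (default /var/log/httpd/error_log), $2 = N (default 20).
recent_errors() {
    log=${1:-/var/log/httpd/error_log}
    n=${2:-20}
    # Match "[error]" (Apache 2.2), "[php:error]" style markers (Apache 2.4),
    # or PHP's own error prefixes.
    grep -E '\[([a-z_]+:)?error\]|PHP (Fatal error|Parse error|Warning)' "$log" |
        tail -n "$n"
}
```

For example: recent_errors /var/log/httpd/error_log 50. The gui_debug.log file can be followed while re-running the wizard with tail -f "$RUEI_DATA/processor/log/gui_debug.log", assuming RUEI_DATA is set in your environment.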

Root-Cause Analysis

Before starting to address specific issues, it is important to understand the basic operation of data collection, data processing, and data reporting. Any root-cause analysis of RUEI problems should include the following steps:

  • Verify data collection. Select System, then Status, and then Collector status. Select a Collector from the displayed list, and verify that the system interfaces are showing traffic activity at the TCP, Ethernet, and HTTP levels.

  • In addition, verify that there are no problems with SSL data decryption. Some errors are normal (especially shortly after startup). However, if SSL traffic is to be decrypted, the error rate should never be 100%.

  • Verify data processing. Select System, then Status, then Data processing. A screen similar to the one shown in Figure 6-6 appears. It should indicate some activity.

G.5 Data Collection Problems

If the data collection service is not running, or will not start, do the following:

  • Use the TCP diagnostics facility to verify that RUEI "sees" all required network traffic. The use of this tool is described in Appendix F, "Verifying Monitored Network Traffic".

  • Ensure the network cards used for data collection are running in promiscuous mode. This can be verified by issuing the command ifconfig ethN (where N is the number of the network interface being used for data collection). It should return an output similar to the following:

     ethN      Link encap:Ethernet  HWaddr 00:15:17:3E:26:AF
               UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
               TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
               collisions:0 txqueuelen:1000
               RX bytes:0 (0.0 GiB)  TX bytes:0 (0.0 GiB)
               Memory:b9120000-b9140000

    Note that you may want to repeat the above command to view changes in network traffic while diagnosing network issues.

  • If the network interface is not available, make sure the ONBOOT parameter is set to YES in the interface's configuration file (/etc/sysconfig/network-scripts/ifcfg-ethN).

  • Verify there is no IP address assigned to the network interface being used for data collection. If there is a configured IP address, remove it.

    Note:

    Do not set it to 0.0.0.0 or 127.0.0.1. Remove the configured IP address completely.
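The interface checks in the list above can be combined into one pass over the ifconfig output. The following is a minimal sketch of a hypothetical helper, check_capture_flags, that reads the output of ifconfig ethN on standard input, reports any of the UP, RUNNING, and PROMISC flags that are missing, and flags an assigned IPv4 address. It assumes the classic net-tools ifconfig layout shown above (or the newer "inet 192.0.2.5" form).

```shell
#!/bin/sh
# Hypothetical helper: read "ifconfig ethN" output on stdin and print "ok",
# or "problems:" followed by each issue found.
check_capture_flags() {
    out=$(cat)
    problems=""
    for flag in UP RUNNING PROMISC; do
        # -w: match the flag as a whole word only.
        if ! printf '%s\n' "$out" | grep -qw "$flag"; then
            problems="$problems missing-$flag"
        fi
    done
    # An IPv4 address line ("inet addr:..." or "inet ...") means an IP is assigned.
    if printf '%s\n' "$out" | grep -Eq '(^|[[:space:]])inet (addr:)?[0-9]'; then
        problems="$problems ip-assigned"
    fi
    if [ -z "$problems" ]; then
        echo "ok"
    else
        echo "problems:$problems"
    fi
}
```

For example: ifconfig eth1 | check_capture_flags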

G.6 Data Processing Problems

If, for any reason, data processing does not start, try to restart it by selecting System, then Maintenance, and then System reset. The System reset wizard appears. Select the Restart system processing option. Note that restarting system processing can take between 5 and 30 minutes.

In general, if no data is being processed, verify your system's configuration as described in Section 6.5, "Verifying and Evaluating Your Configuration". If you do not apply any configuration to the system, no data processing will take place.

If you are using an environment with multiple Collectors, ensure all Collectors are up and running normally. To do so, select System, then Status, and then Collector status. A failing Collector can block further processing of the system's data.

G.7 E-Mail Problems

Sending E-mail is handled by RUEI at the system level, in combination with your Mail Transfer Agent (MTA), such as Sendmail or Postfix. If problems occur when sending E-mail, do the following:

  • If mail is sent correctly by RUEI to your MTA, the user interface will report "Message sent successfully" when you attempt to send a daily, weekly, or monthly report manually.

  • If mail could not be sent correctly by RUEI to your MTA, verify that the MTA is up and running. In addition, review the mail settings by selecting System, then Maintenance, and then E-mail configuration.

  • If the mail was sent successfully but not delivered to the recipient, analyze the operation of your MTA to identify why the mail was not delivered.

  • Refer to the /var/log/maillog file for reported mailing issues.

Common E-mail delivery issues involve an incorrectly configured MTA, or an MTA that is not allowed to send E-mail within the data center or corporate network.
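To find failed deliveries in /var/log/maillog, you can filter the delivery lines on their status field. The sketch below is a hypothetical helper, mail_delivery_problems, that assumes the standard Postfix notation (status=sent, status=deferred, status=bounced) and Sendmail notation (stat=Sent, stat=Deferred: ...). It prints every delivery line whose status is anything other than sent.

```shell
#!/bin/sh
# Hypothetical helper: print maillog delivery lines whose status is not "sent".
# Handles both the Postfix "status=" and the Sendmail "stat=" notation.
mail_delivery_problems() {
    log=${1:-/var/log/maillog}
    # Keep lines that carry a delivery status, then drop the successful ones.
    # "|| true" keeps the exit status 0 when nothing is found.
    grep -Ei 'stat(us)?=' "$log" | grep -viE 'stat(us)?=sent' || true
}
```

For example: mail_delivery_problems | tail -n 20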

G.8 SSL Decryption Problems

In order to decrypt SSL traffic, the Collector needs to have the SSL key and certificate available. To enable SSL decryption, you should do the following:

  • Upload the SSL key through the appropriate Collector.

  • Enable the SSL key by entering the required decryption passphrase (when applicable).

The certificate needs to be uploaded to the Collector(s) by selecting Configuration, then Security, and then SSL keys. To check the status of the SSL decryption, select System, then Status, and then Collector status, and select the Collector for which you want SSL decryption analysis. Within the SSL encryption page, note the following:

  • Decryption errors will occur if there is no SSL key uploaded.

  • The percentage of successful decryption will be a low number shortly after uploading and activating the appropriate SSL keys.

  • This percentage should rise in the first minutes and hours after uploading the SSL keys.

RUEI accepts SSL keys and certificates in PKCS#12 and PEM/DER encoding. In either case, the certificate and key must be combined into a single file. If you have separate key and certificate files, you can create a PKCS#12-compliant file by issuing the following command:

openssl pkcs12 -export -in certificate.cer -inkey key.key -out pkcs12file.p12 -passout pass:yourpassphrase

Where:

  • certificate.cer is the server's SSL certificate file (the certificate that corresponds to key.key).

  • key.key is the server's SSL key file.

  • pkcs12file.p12 is the output file name for the PKCS#12-encoded file.

  • yourpassphrase is the passphrase you want to use to protect the file from unwanted decryption.

For example, consider the situation where the server's certificate filename is appsrv12.cer, the server's SSL key is appsrv12.key, you want the output file to be called uxssl.p12, and you want to protect this file with the passphrase thisismysecretpassphrase. The following command is required:

openssl pkcs12 -export -in appsrv12.cer -inkey appsrv12.key -out uxssl.p12 -passout pass:thisismysecretpassphrase
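If you use PEM encoding instead, the key and certificate are concatenated into one file (for example, cat appsrv12.key appsrv12.cer > uxssl.pem, where the file names are hypothetical). A quick sanity check before uploading is that the file contains both a private key block and a certificate block. The following hypothetical helper, pem_bundle_check, only inspects the PEM header lines; it does not verify that the key and certificate belong together.

```shell
#!/bin/sh
# Hypothetical helper: print "ok" if the PEM file contains both a certificate
# and a private key block, otherwise "missing:" followed by what is absent.
pem_bundle_check() {
    f=$1
    missing=""
    grep -Eq '^-----BEGIN CERTIFICATE-----' "$f" || missing="$missing certificate"
    # Accepts "PRIVATE KEY", "RSA PRIVATE KEY", "ENCRYPTED PRIVATE KEY", and so on.
    grep -Eq '^-----BEGIN ([A-Z]+ )*PRIVATE KEY-----' "$f" || missing="$missing private-key"
    if [ -z "$missing" ]; then
        echo "ok"
    else
        echo "missing:$missing"
    fi
}
```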

Check the Collector statistics page of RUEI for issues, specifically searching for sessions labeled:

  • Ephemeral - These sessions use ephemeral key exchange, which provides forward secrecy, and therefore cannot be monitored by RUEI.

  • Anonymous - These sessions do not use a long-lived server key, and therefore cannot be monitored by RUEI.

G.9 Missing Packages and Fonts Error Messages

It is strongly recommended that you do not perform a "minimal" installation of Oracle Linux. If you do so, it can lead to a wide range of reported problems, depending on the components that are not included in the installation but are required by RUEI.

The most common of these are fontconfig error messages reported in the /var/log/httpd/error_log file. These can be fixed by installing the following fonts:

  • urw-fonts-noarch v2.3

  • ghostscript-fonts-noarch v5

  • dejavu-lgc-fonts-noarch v2

  • liberation-fonts v0.2

  • bitmap-fonts v0.3

Depending on your language settings, install any other required fonts.

Other possible error messages report missing packages (such as librsvg2).

When a Yum repository is available, all dependencies available on the RedHat Enterprise/Oracle Linux 5.x DVD can be installed by issuing the following command:

yum -y install gcc gcc-c++ compat-libstdc++-33 glibc-devel libstdc++-devel \
elfutils-libelf-devel libaio-devel sysstat perl-URI net-snmp libpcap \
sendmail-cf httpd php php-pear php-mbstring php-ldap librsvg2 xorg-x11-xinit \
net-snmp-utils perl-XML-Twig

For RedHat Enterprise/Oracle Linux 6.x, issue the following command:

yum -y install gcc gcc-c++ compat-libstdc++-33 glibc-devel libstdc++-devel \
elfutils-libelf-devel libaio-devel sysstat perl-URI net-snmp \
libpcap sendmail-cf httpd php php-pear php-mbstring php-ldap librsvg2 \
xorg-x11-xinit net-snmp-utils perl-XML-Twig rsync ksh openssl098e wget bc \
bind-utils

However, be aware that additional RPMs shipped with the RUEI installation zip file still need to be installed.
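To see which packages are still missing before running yum, you can compare the required list with the installed package names. The sketch below is a hypothetical helper, missing_packages, that reads the installed names from a file (for example, one produced with rpm -qa --qf '%{NAME}\n' > /tmp/installed.txt, where the file name is an assumption) and prints each required package that is not found.

```shell
#!/bin/sh
# Hypothetical helper: $1 = file with one installed package name per line;
# remaining arguments = required package names. Prints each missing package.
missing_packages() {
    installed=$1
    shift
    for pkg in "$@"; do
        # -x: whole-line match, -F: fixed string (package names contain "+").
        grep -qxF "$pkg" "$installed" || echo "$pkg"
    done
}
```

For example: missing_packages /tmp/installed.txt gcc gcc-c++ librsvg2 php-ldap perl-XML-Twig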

G.10 ORA-xxxxx Errors

If you receive any Oracle database errors, do the following:

  • Ensure that the /etc/sysconfig/httpd file contains the following line:

    source /etc/ruei.conf


    If you have to add this line, restart the Apache web server using the following command:

    service httpd restart
    
  • Ensure that the ewallet.p12 file is readable by the user specified by RUEI_USER, and that the cwallet.sso file is also readable by the group specified by RUEI_GROUP. On Linux/UNIX, this can be accomplished by issuing the following commands:

    chmod 600 ewallet.p12
    chmod 640 cwallet.sso
    
  • Ensure the same host name is specified in the /var/opt/ruei/tnsnames.ora, /etc/sysconfig/network, and /etc/hosts files.

    Note that if you make changes to any of these files, you may need to reboot the server.
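The wallet and host name checks above can be scripted. In the sketch below, check_wallet_modes (hypothetical) compares the modes of the two wallet files with the values used in the chmod commands; it takes the wallet directory as an argument because its location depends on your installation, and relies on GNU stat -c %a. tns_hosts (also hypothetical) lists the distinct HOST values in a tnsnames.ora file so you can compare them with /etc/sysconfig/network and /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: print "ok" if ewallet.p12 has mode 600 and cwallet.sso
# has mode 640 in directory $1; otherwise list the files whose mode differs.
check_wallet_modes() {
    dir=$1
    bad=""
    [ "$(stat -c %a "$dir/ewallet.p12")" = 600 ] || bad="$bad ewallet.p12"
    [ "$(stat -c %a "$dir/cwallet.sso")" = 640 ] || bad="$bad cwallet.sso"
    if [ -z "$bad" ]; then
        echo "ok"
    else
        echo "wrong mode:$bad"
    fi
}

# Hypothetical helper: print the distinct HOST values in a tnsnames.ora file.
tns_hosts() {
    grep -oiE 'HOST[[:space:]]*=[[:space:]]*[^)[:space:]]+' "$1" |
        sed -E 's/^[^=]*=[[:space:]]*//' | sort -u
}
```

For example: tns_hosts /var/opt/ruei/tnsnames.ora; hostname; grep -i HOSTNAME /etc/sysconfig/network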

G.11 Oracle Database Not Running

Verify the Oracle database is up and running by changing to the moniforce user and obtaining an SQL*Plus prompt with the following commands:

su - moniforce
sqlplus /@connect-string

where connect-string is either RUEI_DB_TNSNAME or RUEI_DB_TNSNAME_CFG.

You should receive the SQL*Plus command line without being prompted for a password. This indicates that the Oracle wallet authentication was successful.

If necessary, restart the Oracle database using the following command:

/etc/init.d/oracledb restart
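The SQL*Plus check above can also be run non-interactively, for example from a monitoring script. A minimal sketch, assuming RUEI_DB_TNSNAME is set in the moniforce user's environment: pipe a trivial query into sqlplus -s -L (silent mode, single logon attempt) and scan the output for ORA- or SP2- error codes with the hypothetical helper below.

```shell
#!/bin/sh
# Hypothetical helper: read SQL*Plus output on stdin and print "ok" when no
# ORA- or SP2- error code appears; otherwise print the first error line.
sqlplus_output_ok() {
    out=$(cat)
    err=$(printf '%s\n' "$out" | grep -E '(ORA|SP2)-[0-9]{4,5}' | head -n 1 || true)
    if [ -z "$err" ]; then
        echo "ok"
    else
        echo "$err"
    fi
}
```

For example, as the moniforce user:

echo 'select 1 from dual;' | sqlplus -s -L /@"$RUEI_DB_TNSNAME" | sqlplus_output_ok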

G.12 General (Non-Specific) Problems

If you are experiencing problems with the reporting module, or find its interface unstable, it is recommended that you do the following:

  • Clear all content caching within your browser, and restart your browser.

  • Examine the error log. This is described in the Oracle Real User Experience Insight User's Guide.

  • Select System, then Status, and verify correct operation of the core components by then selecting Data Collection, Logfile processing, and Data processing. If any of these components are in error, try to resolve them using the advice provided in this appendix.

G.13 Network Interface Not Up

If the network interface you intend to use for data collection is not up (that is, the ONBOOT=YES parameter was not set), you can bring it up immediately using the following command:

ifconfig ethN up

where N is the number of the network interface.
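To make the change persistent across reboots, ONBOOT needs to be set in the interface's configuration file. The hypothetical helper below assumes Red Hat style ifcfg files (for example, /etc/sysconfig/network-scripts/ifcfg-eth1) and GNU sed; it rewrites an existing ONBOOT line or appends one if none exists.

```shell
#!/bin/sh
# Hypothetical helper: set ONBOOT=yes in the given ifcfg file, adding the
# line if it is absent, so that the interface is brought up at boot.
ensure_onboot() {
    cfg=$1
    if grep -qE '^ONBOOT=' "$cfg"; then
        sed -i -E 's/^ONBOOT=.*/ONBOOT=yes/' "$cfg"
    else
        echo 'ONBOOT=yes' >> "$cfg"
    fi
}
```

For example, as root: ensure_onboot /etc/sysconfig/network-scripts/ifcfg-eth1 && ifconfig eth1 up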

G.14 OAM-Related Problems

In order to start isolating OAM-related problems, you should do the following:

  1. Log on to the Reporter system as the moniforce user.

  2. To obtain a sample value of the cookie, issue the following command:

    EXAMPLE_VALUE=$(zgrep ObSSOCookie \
        "$WEBSENSOR_HOME"/data/wg_localhost/http/"$(date +%Y%m%d)"/*/http-* |
        tail -n 1 | sed 's,^.*ObSSOCookie=\([^;[:space:]]*\)[;[:space:]].*$,\1,g')
    
  3. To view the obtained sample value, issue the following command:

    echo $EXAMPLE_VALUE
    

    You should check that the returned output is not empty and does not contain errors. The following is an example of the possible output:

    2bTxIrJxIGg%2FMrntHeRuhI1bADtml%2FNPXMho%2FuXK1S3PmiqdsQy4QAgcq0JiQbLfabIs1FBQc%2Bq1Nadjw7naVCqAyT7ir883GoGkSTX8ODtW7S1HQlbATMahOSYsTn8wshgg%2Fg5vi0d18%2F3Zw6tOdPevrhE0wTCk069p%2FkeIS8ftPBUSe6p9rEKiWBqyptQpUzW4SwfTz89iNxOoNULPkG4I5B%2BVa2ac4pgA4rc%2Bre%2BdFk3Gcm7dyu5XC%2BiQKRznERRE1t7wQb7RF5zjFL8hD6Jl0yquJytYPV3x7ufa%2BWatYE5uIHq3NdUKzuLq0214
    
  4. To specify the obtained value as the OAM cookie, issue the following commands:

    cp $WEBSENSOR_INI/../evt/OAM2* $WEBSENSOR_INI
    mklookup --match "$EXAMPLE_VALUE|GET|/some/url.html" '%' '%1[$OAM2UserName]' %0
    

    Note:

    The URL should be a URL protected by OAM.
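The sed expression in step 2 only extracts the cookie value when it is followed by a semicolon or whitespace. A slightly more tolerant variant, sketched below as the hypothetical function last_obsso_cookie, also accepts a value at the end of a line and prints nothing when no cookie is found. It assumes, as the original command does, that the Collector log lines contain the raw Cookie header text.

```shell
#!/bin/sh
# Hypothetical helper: print the ObSSOCookie value from the last line on
# stdin that mentions it. The value ends at the first ';' or whitespace.
last_obsso_cookie() {
    grep ObSSOCookie | tail -n 1 |
        sed -n 's,^.*ObSSOCookie=\([^;[:space:]]*\).*$,\1,p'
}
```

For example (zcat -f also passes uncompressed files through unchanged):

EXAMPLE_VALUE=$(zcat -f "$WEBSENSOR_HOME"/data/wg_localhost/http/"$(date +%Y%m%d)"/*/http-* | last_obsso_cookie)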

Reported Errors

If the following error is received:

*ERROR* - obssocookie: could not dlopen()
/opt/netpoint/AccessServerSDK//oblix/lib/libobaccess.so:
/opt/netpoint/AccessServerSDK//oblix/lib/libobaccess.so: cannot open shared
object file: Permission denied

This indicates that the moniforce user does not have the necessary permissions. You should log on to the Reporter system as the moniforce user, and issue the following commands:

find /opt/netpoint/AccessServerSDK -type d -exec chmod o+rx {} \;
find /opt/netpoint/AccessServerSDK -type f -exec chmod o+r {} \; 

If the following error is received:

*ERROR* - obssocookie: could not dlopen()
/opt/netpoint/AccessServerSDK//oblix/lib/libobaccess.so:
/opt/netpoint/AccessServerSDK//oblix/lib/libobaccess.so: wrong ELF class:
ELFCLASS32 

This indicates that the 32-bit version of the Access Gate SDK was installed instead of the required 64-bit version. The procedure to download and install the required Access Gate SDK is described in Section 7.1.1, "Downloading and Installing the Access Gate Software".

Note that the Access Gate SDK installation package includes a utility to uninstall the 32-bit version (_uninstAccessSDK/uninstaller.bin).
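An ELF shared library records its word size in byte 4 of its header (EI_CLASS: 1 for 32-bit, 2 for 64-bit). The hypothetical helper below reads that byte with od, so you can confirm which Access Gate SDK build is installed before retrying. The standard file command reports the same information (for example, file /opt/netpoint/AccessServerSDK/oblix/lib/libobaccess.so).

```shell
#!/bin/sh
# Hypothetical helper: print ELFCLASS32 or ELFCLASS64 for an ELF file.
elf_class() {
    f=$1
    # Bytes 0-3 must be the ELF magic number 0x7f 'E' 'L' 'F'.
    magic=$(od -An -t x1 -N 4 "$f" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file"
        return 1
    fi
    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    case $(od -An -t u1 -j 4 -N 1 "$f" | tr -d ' \n') in
        1) echo "ELFCLASS32" ;;
        2) echo "ELFCLASS64" ;;
        *) echo "unknown" ;;
    esac
}
```

For example: elf_class /opt/netpoint/AccessServerSDK/oblix/lib/libobaccess.so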

If the following error is received:

Server is not authenticated to access the the OAM environment

This indicates that the creation of a trust between RUEI and the access server (described in Section 7.1.2, "Configuring the Access Gate Software on the RUEI Server") was not successfully performed, and should be repeated.

If the following error is received:

*ERROR* - obssocookie: environment variable OBACCESS_INSTALL_DIR not set

This indicates that the procedure described in Chapter 7, "Configuring the Oracle Access Manager" was not followed.

G.15 ruei-check.sh Script Reports PHP Timezone Error

The following error is reported by the ruei-check.sh script:

Checking if the PHP timezone has been set correctly:   [FAIL]
PHP and OS timezones do not match (os: winter +0000, summer +0100. php:
winter +0100, summer +0200)

This can be fixed by setting the TZ environment variable at the bottom of the /etc/ruei.conf file on the Reporter system to the timezone of the operating system. For example:

export TZ=Europe/Lisbon
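If the file is edited more than once, duplicate TZ lines can accumulate. The hypothetical helper below keeps the edit idempotent: it removes any earlier export TZ= line and appends the new one at the bottom of the file. It assumes GNU sed (-i). To compare the offsets before and after, you can run date +%z and php -r 'echo date("O"), PHP_EOL;' in the same environment. Because /etc/sysconfig/httpd sources /etc/ruei.conf (see Section G.10), Apache presumably needs to be restarted (service httpd restart) before PHP picks up the change; this is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: $1 = configuration file (for example /etc/ruei.conf),
# $2 = timezone name. Leaves exactly one "export TZ=" line, at the end.
set_ruei_tz() {
    conf=$1
    tz=$2
    # Remove any earlier "export TZ=" line, then append the new one.
    sed -i '/^export TZ=/d' "$conf"
    printf 'export TZ=%s\n' "$tz" >> "$conf"
}
```

For example, as root: set_ruei_tz /etc/ruei.conf Europe/Lisbon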

G.16 ORA-00020: maximum number of processes (%s) exceeded

If this error is reported, you will need to increase the maximum number of processes available to the databases within your environment. To increase the maximum number of processes from the default (150) to 300, do the following:

  1. Log on as the oracle user to each database server within your RUEI deployment.

  2. Obtain an SQL*Plus prompt by issuing the following command:

    sqlplus / as sysdba
    
  3. Issue the following commands:

    SQL> alter system set processes=300 scope=spfile;
    System altered.
    
    SQL> shutdown immediate
    Database closed.
    Database dismounted.
    ORACLE instance shut down.
    SQL> startup
    ORACLE instance started.
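The SQL*Plus steps above can also be scripted. The sketch below is a hypothetical function, processes_sql, that prints the statements for a given limit so they can be piped into sqlplus -s / as sysdba. It first queries v$resource_limit so that the current and peak process usage are recorded alongside the change. Note that the script restarts the instance, so run it only during a maintenance window.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL*Plus statements that raise the
# "processes" limit to $1 and restart the instance. Rejects non-numbers.
processes_sql() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo "processes_sql: not a number: $n" >&2; return 1 ;;
    esac
    cat <<EOF
select resource_name, current_utilization, max_utilization, limit_value
  from v\$resource_limit where resource_name = 'processes';
alter system set processes=$n scope=spfile;
shutdown immediate
startup
exit
EOF
}
```

For example, as the oracle user: processes_sql 300 | sqlplus -s / as sysdba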
    

G.17 rsync Fails When user@ Argument not Specified

Version 3.0.6-4 of the rsync utility distributed as part of RedHat Linux 5.7 is known to contain the bug BZ# 726060. This leads to a failure and error when the source or destination argument of the rsync command is specified without the optional user@ argument. If you encounter this issue, it is recommended that you download and install the RedHat update 2011:1112-1. It is available from the following location:

http://rhn.redhat.com/errata/RHBA-2011-1112.html

G.18 ORA-00600 Error Reported

The following error is reported when restoring a RUEI backup or deleting certain configuration items (for example, application, user id source, framework exception):

ORA-00600: internal error code, arguments: [kkmmctbf:bad intcoln]

This is caused by a known bug in Oracle database version 11.2.0.3.0. It can be fixed by downloading and installing the patch 13582702 available at the following location:

https://support.oracle.com/epmos/faces/ui/patch/PatchDetail.jspx?_afrLoop=33337295036267&patchId=13582702

G.19 Dropped Segments and Bad Checksums

If the collector is reporting a large number of dropped segments or segments with bad checksums, this could indicate a problem with the network card settings. This is because large segments, also referred to as Jumbo frames, are typically created by the network interface card driver due to an optimization feature called "receive offload". Typically, such segments created by the driver do not have a checksum (blank checksum) or have a random (junk) value as the checksum.

The dropped segments and bad checksums counters can be inspected by selecting System, then Status, and then Collector Status. Then select the collector host, and then TCP.

To see if large frames are present on the network card, select Interfaces on the collector status screen. Look at the value of the Largest Encountered Frame field for each interface and compare it with the value of the Configured Max Frame Size field. If the configured size is less than the largest encountered frame, then the collector is dropping these frames as they are too large for its internal capture buffers. In addition, the collector will issue a warning event in the Event Log when it encounters a frame larger than the maximum configured size.

Note that in RUEI 12.1.0.6, the maximum configured frame size has been set to 64KB by default. In addition, checksum validation rules have been relaxed to accept both frames with blank checksums, and large frames with a junk checksum. Therefore, the rest of this section is only applicable if you have reduced the frame size for any reason and are running into drops due to bad checksums.

Background to Receive Offload and Checksum Offload Settings

Some network drivers have provisions to combine multiple physical frames into a single, large frame (of up to 64 KB) that is then passed on to the kernel network stack in a single operation. Network card vendors may refer to this as frame coalescing, large receive offload, or generic receive offload. The goal is to improve efficiency by reducing the number of interrupts and copy operations from driver to kernel.

In addition to frame coalescing, some network drivers also perform TCP checksum offloading, that is, they perform TCP checksum validations for incoming packets and compute and set the checksums for outgoing packets. The goal is to improve efficiency by offloading these tasks from the kernel software to the network card hardware.

To view the current offload settings of a network interface, use the following command (you may need to do this as the root user):

# ethtool -k eth1
Offload parameters for eth1:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: on 

The actual interpretation of these fields may differ per driver, but typically, generic-receive-offload indicates that the network driver is coalescing frames. In addition, the driver may or may not fill in checksums for the resulting large frames. This can depend on other offload settings, such as rx-checksumming or tcp-segmentation-offload, or simply differ per driver implementation.

Frames for which the checksum was not filled in correctly are dropped because they fail the checksum validations performed by the collector.

Note that checksum validation is only attempted on frames that are not coalesced, and that have a checksum field that is not blank.

Disable Offloading Settings

To avoid large frame issues, disable the offload settings of the network card in order to stop it from coalescing frames altogether. Disable the generic-receive, tcp-segmentation and generic-segmentation offloads using the following command for each interface that the Collector is monitoring:

# ethtool -K eth1  tso off gso off gro off
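To see at a glance which of the relevant offloads are still enabled, the ethtool -k output can be filtered. The sketch below is a hypothetical helper, offloads_on, that reads ethtool -k output on standard input and prints the names of the coalescing and checksum offload features that are on. Older ethtool versions print some names with spaces (for example, "tcp segmentation offload"), so the helper normalizes spaces to dashes; exact feature names still differ per ethtool version and driver.

```shell
#!/bin/sh
# Hypothetical helper: print the enabled coalescing/checksum offload features
# found in "ethtool -k" output read from stdin.
offloads_on() {
    awk -F': *' '
        NF >= 2 {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding blanks
            gsub(/ /, "-", name)                # "tcp segmentation offload" -> dashed form
            split($2, v, " ")                   # value may carry a "[fixed]" suffix
            if (v[1] == "on" && name ~ /^(generic-receive-offload|large-receive-offload|tcp-segmentation-offload|generic-segmentation-offload|rx-checksumming|tx-checksumming)$/)
                print name
        }'
}
```

For example: ethtool -k eth1 | offloads_on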

Enable Jumbo Frames

If you are still observing Jumbo frames after disabling offloading for all capture interfaces, you need to increase the maximum frame size of the collector. Go to the Configuration tab, and click the Security button on the left. Next, click Jumbo frames in the Security panel in the lower left of the screen. Follow the on-screen instructions to set the maximum frame size.

Disable TCP Checksum Validation in Collector

To disable TCP checksum validation, enter the following command on the Reporter system as the RUEI user:

execsql config_set_profile_value System_name config TcpDoChecksum add no

where System_name is the collector profile you want to configure. Replace this with the actual collector profile name.

Disable TCP Checksums Offloading

If the driver is not filling in TCP checksums properly due to checksum offloading, and the collector statistics show checksum errors, disable it via the command:

# ethtool -K eth1 rx off tx off

Driver implementations can differ between vendors, to the point that some drivers do not allow you to change any of these settings. Contact Oracle Support if you are unable to change your driver settings and are still observing packet loss in the collector.

G.20 Errors During Installation on RedHat Enterprise/Oracle Linux 6.x

Section 2.1.6.2, "Installing RedHat Enterprise/Oracle Linux 6.x Prerequisites" of this guide instructs you to issue the following command to install all optional fonts.

rpm -Uhv fonts-*

This command may fail with a message similar to the following:

Transaction Check Error: 
file /usr/share/fonts/opensymbol/opens___.ttf conflicts between attempted 
installs of openoffice.org-opensymbol-fonts-1:3.2.1-19.6.0.1.el6_2.7.noarch 
and libreoffice-opensymbol-fonts-1:4.0.4.2-9.0.1.el6.noarch 

To work around this issue, use the following command to install the fonts (the quotes prevent the shell from expanding the wildcards):

yum install -y '*-fonts' --exclude='libreoffice*'

G.21 SSL Error on RedHat Enterprise/Oracle Linux 6.x

An error similar to the following may be displayed:

appSensor, version ux-collector-12.1.0.6.1-20140818-collector (Aug 18
2014(11:41:41), adc4150376) RUEI_12.1.0.6.0_LINUX.X64_140818 , 64-bit
Copyright (c) 2003, 2014, Oracle, All rights reserved.
Running as instance wg
Reading configuration in "wg/config".
Finished loading configuration.
Device "eth0" initialized for capture                                     OK
#####   Cannot open /u01/ruei/opt/collector/lib64/libssl.so:
/u01/ruei/opt/collector/lib64/libssl.so: symbol EVP_aes_128_gcm, version
libcrypto.so.10 not defined in file libcrypto.so.10 with link time reference
#####   Plugin "libssl" failed to load
Loading plugin "libssl"                                               FAILED
#####   Error reading plugin configuration
 Cannot open /u01/ruei/opt/collector/lib64/libssl.so:
/u01/ruei/opt/collector/lib64/libssl.so: symbol EVP_aes_128_gcm, version
libcrypto.so.10 not defined in file libcrypto.so.10 with link time reference
 Plugin "libssl" failed to load
 wg/config/plugins.cfg, 15: User give up
 Error reading plugin configuration
 Collector exited, initialization failed 

This indicates that an incompatible (older) version of OpenSSL is installed: the system's libcrypto.so.10 does not provide the EVP_aes_128_gcm symbol required by the Collector's SSL plugin. Make sure that you have applied the latest OpenSSL patches for your operating system using the appropriate commands (for example, yum update or up2date). Applying the latest OpenSSL patches also helps improve the security of the system.