A Troubleshooting Enterprise Manager

This appendix describes solutions to common problems and scenarios that you might encounter when installing or upgrading Enterprise Manager.

Installation Issues

This section lists some of the most commonly encountered installation issues, and their resolutions.

Installation Fails with an Abnormal Termination

If a daily cron job that cleans up the /tmp/ directory runs on the system where you are installing Grid Control, the installation might terminate abnormally, and the installActions.err file logs the following error: java.lang.UnsatisfiedLinkError: no nio in java.library.path.

The workaround is to set the TMP and TEMP environment variables to a directory other than the default /tmp, and then run ./runInstaller.

PERL Environment Variable is Forced on the environment During an Enterprise Manager 10g R2 (10.2.0.2) Installation

In a Microsoft Windows environment, if you have an existing PERL5LIB environment variable, the Enterprise Manager Grid Control installation forcibly overwrites it. As a result, other applications on the host are forced to use the new Perl version that gets installed during the Management Service installation.

To work around this issue, rename the existing PERL5LIB variable to PERL5LIB_TMP before the OMS installation starts. After the installation is complete, you can rename PERL5LIB_TMP back to PERL5LIB.

Note:

If the Perl environment variable is not set, remove this variable from the Environment Variables. To do this, from the Control Panel, go to Environment Variable under Systems.

Installation Fails Due to Network Configuration Issues

The installation of Enterprise Manager Grid Control may fail if there are network configuration and connectivity problems.

To avoid this issue, do the following:

Ensure that the HOSTS file mentions the IP address and full canonical name followed by a short name or alias of the computer where OMS or Management Agent is to be installed.
Ensure that there are no connectivity issues between the server processes.

Obtaining Full Canonical Name

The fully qualified domain name can be obtained with a type of double-lookup or reverse-lookup. Check the /etc/nsswitch.conf file to see how lookups are to be performed by the operating system.

To get the fully qualified domain name, on the computer where you want to install OMS or Management Agent, run the following command:

hostname

To get the IP address, on the computer where you want to install OMS or Management Agent, run the following command:

ifconfig

(On a Microsoft Windows computer, run ipconfig.)

Checking HOSTS File for Correct Format

The HOSTS file should have the following format:

10.10.10.10 omsmachine.domain omsmachinealias 
20.20.20.20 agentmachine.domain agentmachine

Note:

On a Microsoft Windows computer, the HOSTS file is under $WINDOWS/system32/drivers/etc. On Unix machines, it is under /etc.

If OMS and Management Agent are on the same computer, then it is acceptable to have one IP address and host name listed.

IMPORTANT:

Do not run OMS on a computer that is DHCP-enabled. Oracle strongly recommends that you use a static host name and IP address assigned on the network for Enterprise Manager Grid Control components to function properly.
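
The HOSTS file format described above can be checked mechanically. The following Python sketch is not part of Enterprise Manager; the function name check_hosts_line is hypothetical, and the sketch assumes that a name containing a dot is fully qualified, which is only a heuristic.

```python
import ipaddress


def check_hosts_line(line):
    """Return a list of problems found in one HOSTS file entry."""
    # Drop any trailing comment, then split the entry into fields.
    fields = line.split("#", 1)[0].split()
    if not fields:
        return []  # blank or comment-only line: nothing to check
    problems = []
    try:
        ipaddress.ip_address(fields[0])
    except ValueError:
        problems.append("first field is not an IP address")
    names = fields[1:]
    if not names:
        problems.append("no host name after the IP address")
    elif "." not in names[0]:
        # Heuristic: the full canonical name should come before any alias.
        problems.append("first name is not fully qualified")
    return problems


if __name__ == "__main__":
    print(check_hosts_line("10.10.10.10 omsmachine.domain omsmachinealias"))
    print(check_hosts_line("10.10.10.10 omsmachine omsmachine.domain"))
```

An empty list means the entry follows the expected order of IP address, full canonical name, and then short name or alias.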

Verifying Network Connectivity

Run the following to test the network configuration and ensure that there are no connectivity issues:

On the computer where you want to install OMS:

ping <ip_address>
ping  <omsmachine.domain>
ping <omsmachine>
nslookup <ip_address>
nslookup <omsmachine.domain>
nslookup <omsmachine>

On the computer where you want to install Management Agent:

ping <ip_address>
ping  <agentmachine.domain>
ping <agentmachine>
nslookup <ip_address>
nslookup <agentmachine.domain>
nslookup <agentmachine>
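
The ping and nslookup checks above amount to a double lookup: resolve the name to an IP address, resolve that address back to a name, and compare. The following Python sketch shows that comparison under stated assumptions. The function name double_lookup is hypothetical, and the resolver functions are passed in as parameters so that the logic can be exercised without a live network.

```python
import socket


def double_lookup(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> canonical name and report whether they agree."""
    ip = forward(name)
    # gethostbyaddr returns (hostname, alias_list, ip_address_list).
    canonical = reverse(ip)[0]
    return {
        "ip": ip,
        "canonical": canonical,
        "consistent": canonical.lower() == name.lower(),
    }


if __name__ == "__main__":
    # Uses the operating system resolver; may raise OSError if lookup fails.
    print(double_lookup(socket.getfqdn()))
```

If consistent is false for the fully qualified name, the HOSTS file or DNS configuration probably lists the names in the wrong order or maps them to different addresses.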

Management Agent Installation Fails

If the Management Agent installation fails, look into the emctl status log to diagnose the reason for installation failure. You can view this log by executing the following command:

<AGENT_HOME>/bin/emctl status agent

A sample log file follows and shows some of the typical problem areas shown in bold.

Oracle Enterprise Manager 10g Release 10.2.0.0.0.
Copyright (c) 1996, 2005 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version     : 10.2.0.2.0
OMS Version       : 10.2.0.2.0
Protocol Version  : 10.2.0.2.0
Agent Home        : /scratch/OracleHomes2/agent10g
Agent binaries    : /scratch/OracleHomes2/agent10g
Agent Process ID  : 9985
Parent Process ID : 29893
Agent URL         : https://foo.abc.com:1831/emd/main/
Repository URL    : https://foo.abc.com:1159/em/upload
Started at        : 2005-09-25 21:31:00
Started by user   : pjohn
Last Reload       : 2005-09-25 21:31:00
Last successful upload                       : (none)
Last attempted upload                        : (none)
Total Megabytes of XML files uploaded so far :     0.00
Number of XML files pending upload           :     2434
Size of XML files pending upload(MB)         :    21.31
Available disk space on upload filesystem    :    17.78%
Last attempted heartbeat to OMS              : 2005-09-26 02:40:40
Last successful heartbeat to OMS             : unknown
---------------------------------------------------------------
Agent is Running and Ready
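
The problem areas in this output can also be picked out by a script. The following Python sketch is an illustration only: parse_status and find_problems are hypothetical helper names, and the sketch assumes the output has one "Label : value" pair per line, as in the sample.

```python
def parse_status(text):
    """Turn 'Label : value' lines into a dict; lines without a colon are ignored."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, because URLs and times contain colons.
        label, sep, value = line.partition(":")
        if sep and label.strip():
            fields[label.strip()] = value.strip()
    return fields


def find_problems(fields):
    """Flag the problem areas discussed in this appendix."""
    problems = []
    if not fields.get("OMS Version"):
        problems.append("OMS version is not displayed")
    if fields.get("Agent URL", "").startswith("http://"):
        problems.append("agent URL uses http, so the agent is not secure")
    if fields.get("Last successful upload", "(none)") in ("(none)", "", "Null"):
        problems.append("no successful upload yet")
    return problems


if __name__ == "__main__":
    sample = (
        "OMS Version       : 10.2.0.2.0\n"
        "Agent URL         : https://foo.abc.com:1831/emd/main/\n"
        "Last successful upload                       : (none)\n"
    )
    print(find_problems(parse_status(sample)))
```

The text to parse would typically be captured from the emctl status agent command shown earlier.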

Prerequisite Check Fails with Directories Not Empty Error During Retry

During an agent installation using Agent Deploy, the installation fails abruptly, displaying the Failure page. On clicking Retry, the installation fails again at the Prerequisite Check phase with an error stating the directories are not empty.

This can happen when Oracle Universal Installer (OUI) is still running on the remote host even though the SSH connection has been closed.

To resolve this issue, on the remote host, check if OUI is still running. Execute the following command to verify this:

ps -aef | grep -i ora

If OUI is still running, wait till OUI processes are complete and restart the SSH daemon. Now, you can click Retry to perform the installation.

Note:

For more information on running the prerequisite checks in standalone mode, see Running the Prerequisite Check in Standalone Mode.

Agent Deployment on Linux Oracle RAC 10.2 Cluster Fails

Agent deployment on a 10.2 release of an Oracle RAC cluster may fail due to a lost SSH connection during the installation process.

This can happen if the LoginGraceTime value in the sshd_config file is 0 (zero). The zero value gives an indefinite time for SSH authentication.

To resolve this issue, modify the LoginGraceTime value in the /etc/ssh/sshd_config file to be a higher value. The default value is 120 seconds, which means that the server disconnects if you have not successfully logged in within this time. If the value is set to 0 (zero), there is no definite time limit for authentication.
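
To see which LoginGraceTime value is in effect, you can scan the sshd_config file for an uncommented entry. The following Python sketch shows one way to do this; the function names are hypothetical. It assumes the time format described in the sshd_config(5) manual page, where a number may carry an s, m, h, d, or w suffix, and it falls back to the 120-second default when the option is commented out or absent.

```python
import re

_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_time(value):
    """Convert an sshd_config time value such as '120', '2m' or '1h30m' to seconds."""
    text = value.lower()
    parts = re.findall(r"(\d+)([smhdw]?)", text)
    # Reject input that contains anything besides number/unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized time value: {value!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def login_grace_time(config_text, default=120):
    """Return the effective LoginGraceTime in seconds (0 means no limit)."""
    for line in config_text.splitlines():
        words = line.split()
        # Commented-out lines start with '#' and therefore never match here.
        if len(words) >= 2 and words[0].lower() == "logingracetime":
            return parse_time(words[1])
    return default


if __name__ == "__main__":
    print(login_grace_time("#LoginGraceTime 120\nLoginGraceTime 5m\n"))
```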

SSH Verification Fails During Agent Installation

The most common reasons for SSH Verification to fail are the following:

The server settings in the /etc/ssh/sshd_config file do not allow ssh for user $USER.
The server may have disabled the public key-based authentication.
The client public key on the server may be outdated.
You may not have passed the -shared option for shared remote users, or may have passed this option for non-shared remote users.

Verify the server setting and rerun the script to set up SSH successfully.

Note:

For more information on how to set up SSH, see SSH (Secure Shell) Setup.

Sample sshd_config File

The following sshd_config file sample is a server-wide configuration file with all the variables.

#  $OpenBSD: sshd_config,v 1.59 2002/09/25 11:17:16 markus Exp $

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/local/bin:/bin:/usr/bin

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default values where
# possible, but leave them commented out.  Uncommented options change a
# default value.

#Port 22
#Protocol 2,1
#ListenAddress 0.0.0.0
#ListenAddress ::

# HostKey for protocol version 1
#HostKey /etc/ssh/ssh_host_key
# HostKeys for protocol version 2
#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_dsa_key

# Lifetime and size of ephemeral version 1 server key
#KeyRegenerationInterval 3600
#ServerKeyBits 768

# Logging
#obsoletes QuietMode
#SyslogFacility AUTH
SyslogFacility AUTHPRIV
#LogLevel INFO

# Authentication:
#LoginGraceTime 120
#PermitRootLogin yes
#StrictModes yes

#RSAAuthentication yes
#PubkeyAuthentication yes
#AuthorizedKeysFile .ssh/authorized_keys

# rhosts authentication should not be used
#RhostsAuthentication no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes
# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#RhostsRSAAuthentication no
# similar for protocol version 2
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# RhostsRSAAuthentication and HostbasedAuthentication
#IgnoreUserKnownHosts no

# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no

# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes

# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#AFSTokenPassing no

# Kerberos TGT Passing only works with the AFS kaserver
#KerberosTgtPassing no

# Set this to 'yes' to enable PAM keyboard-interactive authentication
# Warning: enabling this may bypass the setting of 'PasswordAuthentication'
#PAMAuthenticationViaKbdInt no

#X11Forwarding no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PrintMotd yes
#PrintLastLog yes
#KeepAlive yes
#UseLogin no
#UsePrivilegeSeparation yes
#PermitUserEnvironment no
#Compression yes

#MaxStartups 10
# no default banner path
#Banner /some/path
#VerifyReverseMapping no
#ShowPatchLevel no

# override default of no subsystems
Subsystem   sftp    /usr/libexec/openssh/sftp-server

SSH Setup Fails with "Invalid Port Number" Error

When executed, the SSH setup script automatically verifies the setup at the end by running the following command:

ssh -l <user> <remotemachine> 'date'

At the time of verification, you may encounter an "Invalid Port Error" indicating that the SSH setup was not successful.

This can happen if the ssh.exe (sshUserSetupNT.sh script) is not being invoked from the cygwin home directory.

To resolve this issue, ensure the sshUserSetupNT.sh script on the local OMS machine is being executed from within the cygwin (BASH) shell only. The script will fail to execute if done from outside this location.

If there are multiple Cygwin installations, and you want to find out which ssh.exe is being invoked, execute the following command:

C:\Cygwin\bin\which ssh

For example, the command might return a result similar to the following:

\cygdrive\c\WINDOWS\ssh

This indicates that the ssh.exe file from Cygwin is not being invoked, because C:\windows appears before C:\Cygwin\bin in the PATH environment variable.

To resolve this issue, rename this ssh.exe as follows:

C:\cygwin>move c:\WINDOWS\ssh.exe c:\WINDOWS\ssh.exe1
          1 file(s) moved.

Now, execute the C:\Cygwin\bin\which ssh command again.

The result should be similar to "\usr\bin\ssh".

This verifies that ssh.exe file is being invoked from the correct location (that is, from your C:\Cygwin\bin folder).
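
The PATH ordering problem described above comes down to which directory containing ssh.exe is listed first. The following Python sketch illustrates that search. The function name first_match is hypothetical, and the has_command parameter is a caller-supplied test so that the logic can be tried without touching the file system. On a live system, the standard library function shutil.which performs a comparable search against the real PATH.

```python
def first_match(path_value, has_command, sep=";"):
    """Return the first PATH directory for which has_command(directory) is true."""
    for directory in path_value.split(sep):
        if directory and has_command(directory):
            return directory
    return None


if __name__ == "__main__":
    # Illustrative directories only; both are assumed to contain ssh.exe.
    with_ssh = {r"C:\WINDOWS", r"C:\Cygwin\bin"}
    path = r"C:\WINDOWS;C:\Cygwin\bin"
    print(first_match(path, lambda d: d in with_ssh))
```

If the directory returned is not the Cygwin bin directory, a different ssh.exe is being picked up first.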

Note:

You must ensure C:\cygwin is the default installation directory for the Cygwin binaries.

If you install Cygwin at a location other than c:\cygwin (default location), it can cause the SSH setup to fail, and in turn, the agent installation will fail too.

To work around this issue, you must either install Cygwin in the default directory (c:\cygwin), or update the ssPaths_msplats.properties file with the correct path to the Cygwin binaries.

You can look into the following remote registry key to find out the correct Cygwin path:

HKEY_LOCAL_MACHINE\SOFTWARE\Cygnus Solutions\Cygwin\mounts v2\

Note:

For more information on how to set up SSH, see SSH (Secure Shell) Setup.

sshConnectivity.sh Script Fails

If you are executing the sshConnectivity.sh script on Cygwin version 5.2, the script may fail and result in the following error:

"JAVA.LANG.NOCLASSDEFFOUNDERROR"

To work around this issue, ensure the Oracle home in the Cygwin style path is defined as follows:

ORACLE_HOME="c:/oraclehomes/oms10g/oracle"

You can find out the currently installed Cygwin version by executing the uname command on the Cygwin window.

Note:

For more information on using the sshConnectivity.sh script, see Setting Up SSH Using sshConnectivity.sh (Only For 10.2.0.2 Enterprise Manager Grid Control).

Troubleshooting the "command cygrunsrv not found" Error

During the SSH daemon setup, you may encounter a "command cygrunsrv not found" error. This can occur due to one of the following two reasons:

The sshd service is not running.
The Cygwin installation was not successful.

If SSHD Service Is Not Running

Create the sshd service, and then start a new sshd service from the cygwin directory.

To create the SSHD service, you must execute the following command:

ssh-host-config

The Cygwin script that runs when this command is executed will prompt you to answer several questions. Specify yes for the following questions:

privilege separation
install sshd as a service

Specify no when the script prompts you to answer whether or not to "create local user sshd".

When the script prompts you to specify a value for Cygwin, type ntsec (CYGWIN="binmode tty ntsec").

Now that the SSHD service is created, you can start the service by executing the following command:

cygrunsrv -start sshd

If Your Cygwin Installation Was Unsuccessful

If restarting the SSHD service does not resolve the error, then you must reinstall Cygwin. To do this:

Remove the Keys and Subkeys under Cygnus Solutions using regedit.
Remove the Cygwin directory (C:\cygwin), and all Cygwin icons.
Remove the .ssh directory from the Documents and Settings folder of the domain user.
Reinstall Cygwin.

For detailed instructions on Cygwin installation, see Setting Up SSH Server (SSHD) on Microsoft Windows (Only For 10.2.0.1 and 10.2.0.3 Enterprise Manager Grid Control).
Execute the following command to start the SSH daemon:
```
cygrunsrv -start sshd
```

SSH Setup Verification Fails with "Read from socket failed: Connection reset by peer." Error

After the SSH setup is complete, the script automatically executes the following verification command:

ssh -l <user> <remotemachine> 'date'

If this command returns an error stating "Read from socket failed: Connection reset by peer", this means SSH was incorrectly set up. To resolve this issue, go to the remote machine where you attempted to set up SSH and do the following:

Stop the SSHD service (cygrunsrv -stop sshd).
Go to the etc directory (cd /etc).
Change the SSH file owner to the appropriate system (chown <SYSTEM> ssh*).

Go to the Cygwin command prompt and execute the following:

chmod 644 /etc/ssh*
chmod 755 /var/empty
chmod 644 /var/log/sshd.log

Finally, start the SSHD service, either by running /usr/sbin/sshd or by executing cygrunsrv -start sshd.
Now, execute the verification command again from the OMS machine (ssh -l <user> <remote machine> 'date'). This should display the date correctly, indicating that the SSH setup was successful.

SSHD Service Fails to Start

During SSHD configuration, the SSHD service is created for the local account by default. When you log in as a domain user, the service does not recognize this account and does not start up.

To resolve this issue, you must change the SSHD service "Log On As" value from LocalSystem to the domain user. To do this, complete the following steps:

Right-click on My Computer and select Manage.
In the Computer Management dialog box that appears, click Services under Services and Applications.
In the right pane, select the Cygwin SSHD service, right-click and go to Properties.
In the Cygwin SSHD Properties window that appears, select This Account.
Now, specify the appropriate domain name and user (in the form of domain\user, for example, FOO-US\pjohn).
Specify the password for this user, and click Apply.

Now, go to the Cygwin command prompt and execute the following:

chmod 644 /etc/ssh*
chmod 755 /var/empty
chmod 644 /var/log/sshd.log

Start SSHD by executing the following command:
```
/usr/sbin/sshd
```

Timezone Prerequisite Check Fails

The timezone prerequisite check (timezone_check) will fail if the TZ environment variable is not set on the SSH daemon of the remote host.

To resolve this issue, you must set the TZ environment variable on the SSH daemon of the remote host. See Setting Up the Timezone Variable on Remote Hosts for more information.

Alternatively, do one of the following:

If you are installing or upgrading the agent from the default software location, set the timezone environment variable by specifying the following in the Additional Parameters section of the Agent Deploy application:
```
-z <timezone>
For example, -z PST8PDT
```
If you are installing the agent from a nondefault software location, you must specify the timezone environment variable using the following command:
```
s_timeZone=<timezone>
For example, s_timezone=PST8PDT
```

OMS Version Is Not Displayed

If the OMS version is not displayed in the log file, it could mean that the installed agent is not registered with a secure and locked Management Service (OMS).

You can verify this by executing the following commands:

emctl status oms
emctl status agent

To resolve this issue, you must manually secure the Management Agent by executing the following command. However, note that even after securing the Management Agent, some data might still be transferred over the network without being encrypted.

<AGENT_HOME>/bin/emctl secure agent -reg_passwd <password>

Discrepancy Between Agent and Repository URL Protocols

If the agent installation is successful, the protocol for both the agent URL and the repository URL is the same. That is, both URLs start with the https protocol (meaning both are secure).

If the protocol for the agent URL is displayed as http instead of https, this means that the agent is not secure.

To resolve this issue, you must secure the agent manually by executing the following command. However, note that even after securing the Management Agent, some data might still be transferred over the network without being encrypted.

<AGENT_HOME>/bin/emctl secure agent -reg_passwd <password>

Last Successful Upload Does Not Have a Time Stamp

If there is no time stamp against this parameter in the log (displays Null), it means that the agent is unable to upload any data.

To resolve this issue, you must perform a manual upload of the data by executing the following command, and then check the log again:

<AGENT_HOME>/bin/emctl upload

emctl status Log File is Empty

If the agent is not ready and running, the emctl status log displays only the copyright information. None of the parameters listed in the sample log is displayed.

The issue can occur due to any of the following reasons:

Agent is not secure: To manually secure the agent, execute the following command. However, note that even after securing the Management Agent, some data might still be transferred over the network without being encrypted.
```
<AGENT_HOME>/bin/emctl secure agent -reg_passwd <password>
```
Agent is not running: Check if the agent is running. If not, you can start the agent manually by executing the following command:
```
<AGENT_HOME>/bin/emctl start agent
```
Agent port is not correct: Verify whether the agent is connecting to the correct port. To verify the port, look into the sysman/config/emd.properties file.

You must also ensure the following are correctly set in the emd.properties file:
1. REPOSITORY_URL: Verify this URL (http://<hostname>:port/em/upload). Here, ensure the host name and port are correct.
2. emdWalletSrcURL: Verify if the host name and port are correct in this URL (http://<hostname>:port/em/wallets/emd).
3. agentTZRegion: Ensure the time zone that is configured is correct.
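
The three emd.properties settings listed above can be extracted for review with a short script. The following Python sketch is an illustration only. The helper names are hypothetical, and it assumes the file uses simple key=value lines with # comments. It reports the host name and port of each URL so that they can be compared against the intended Management Service.

```python
from urllib.parse import urlsplit

REQUIRED = ("REPOSITORY_URL", "emdWalletSrcURL", "agentTZRegion")


def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def summarize(props):
    """Return (host, port) for the two URLs and the raw time zone; None if missing."""
    summary = {}
    for key in REQUIRED:
        value = props.get(key)
        if value and key != "agentTZRegion":
            url = urlsplit(value)
            summary[key] = (url.hostname, url.port)
        else:
            summary[key] = value
    return summary


if __name__ == "__main__":
    text = "REPOSITORY_URL=https://foo.abc.com:1159/em/upload\n"
    print(summarize(read_properties(text)))
```

A value of None in the summary means the setting is missing from the file.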

Installation Fails When No Group ID or Group Name Is Created

Before you begin the installation of a Management Agent, ensure that the target host where you want to install the Management Agent has the appropriate users and operating system groups created. For information about creating required users and operating system groups, see Operating System Groups and Users Requirements.

Also ensure that the target host has the group name as well as the group id created. Otherwise, the installation will fail.

To check whether the group id, group name, user id, and user name are created, run the id command.

For example, the id command might return output like the following. Here, group 621 does not have a group name.

uid=31000(rparasur) gid=621 groups=502(g502),621,8500(dba),42424(svrtech)

To create the group name, run the following command (specific to this example):

groupadd g621 -g 621

Now if you run the id command, you will see this:

uid=31000(rparasur) gid=621(g621) groups=502(g502),621(g621),8500(dba),42424(svrtech)

This way, ensure that your target host has the user id, user name, group id, and group name created. For information about creating required users and operating system groups, see Operating System Groups and Users Requirements.
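
Group IDs without names can be spotted automatically in the id output. The following Python sketch is an illustration only; the function name unnamed_group_ids is hypothetical. It assumes the output format shown in the example above and that group names contain no spaces.

```python
import re


def unnamed_group_ids(id_output):
    """Return the numeric group IDs in `id` output that have no (name) attached."""
    missing = []
    for field in id_output.split():
        key, sep, value = field.partition("=")
        if key not in ("gid", "groups"):
            continue
        for entry in value.split(","):
            # An entry is either '621' or '621(g621)'.
            match = re.fullmatch(r"(\d+)(\(([^)]*)\))?", entry)
            if match and not match.group(3) and match.group(1) not in missing:
                missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    before = "uid=31000(rparasur) gid=621 groups=502(g502),621,8500(dba),42424(svrtech)"
    print(unnamed_group_ids(before))
```

Each ID returned needs a group name created for it, for example with the groupadd command shown above.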

Installation Fails with Host Name Error on the Product-Specific Prerequisite Check Page

While installing a Management Agent, the installation can fail on the product-specific prerequisite page with a host name error. This happens when the host name specified in the hosts file does not map to the correct IP address of that host.

To resolve this issue, first run the following command on the host to list all the IP addresses configured and then note the IP address for that host:

ipconfig /all

After noting the IP address of that host, open the hosts file and check the entries for this host name. Ensure that the host name maps to the correct IP address of that host.

Installation Fails While Securing Agent Communication with OMS

While installing a Management Agent, an Agent Registration password is requested if the Management Service that is specified is found to be running in a secure mode. However, despite providing the password, if the installation fails while securing the communication between the Management Agent and the Management Service, then check the password you provided.

If the password you have provided is incorrect, first, obtain the correct password from the user who configured the Management Service for SSL.

If you have forgotten the password, then reset it using the emctl command or using the Grid Control console.

If you want to reset the password using emctl command, then run the following command from the Oracle home directory of the Agent:

emctl setpasswd <old password> <new password>

If you want to reset the password using the Grid Control console, then log in to Grid Control as SYSMAN, select Setup from the top-right corner of the page, select Registration Passwords from the left menu panel, and on the Registration Password page, click Add Registration Password.

Oracle Configuration Manager Fails While Installing Management Agent

While installing the product, on the My Oracle Support page, you are required to provide your email address and My Oracle Support (formerly Metalink) password. After you provide this information and try to proceed to the next screen, you may see errors stating that the installation of Oracle Configuration Manager failed. This error occurs when you (the user installing the Management Agent) do not have write permission on crontab.

To resolve this issue, check whether you (the user installing the Management Agent) have the necessary write permission on crontab. If you do not, then create an entry for your user account in the /usr/lib/cron/cron.allow file.

Configuration Issues

This section lists some of the most commonly encountered configuration issues, and their resolutions.

Configuration Assistants Fail During Enterprise Manager Installation

During the installation, if any of the configuration assistants fail to run successfully, then you can choose to run them in a standalone mode.

One way to run the configuration assistants in standalone mode is to use the runConfig tool. Its syntax is as follows, where OPTION1, OPTION2, and so on are the options described in Appendix A, "Options You Can Specify to Run runConfig Tool".

./runConfig.sh OPTION1=value1 OPTION2=value2 ...

Another way to run the configuration assistants in standalone mode is to use the configToolFailedCommands script that is created in the respective Oracle home directories. Its syntax is:

./configToolFailedCommands

Note:

The individual log files for each configuration tool are available at the following directory:

ORACLE_HOME/cfgtoollogs/cfgfw

Besides the individual configuration logs, this directory also contains cfmLogger_timestamp.log, which contains all the configuration tool logs. The timestamp depends on the local time and has a format such as cfmLogger_2005_08_19_01-27-05-AM.log.

For more information about the installation logs that are created and their locations, see Appendix B, "Installation and Configuration Log Files".
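
When several cfmLogger files are present, the timestamp in the file name can be used to find the most recent one. The following Python sketch is an illustration only; the function name cfm_log_time is hypothetical, and the timestamp layout is inferred solely from the example name cfmLogger_2005_08_19_01-27-05-AM.log, so other layouts would need a different pattern.

```python
from datetime import datetime

PREFIX, SUFFIX = "cfmLogger_", ".log"


def cfm_log_time(filename):
    """Extract the timestamp from a name like cfmLogger_2005_08_19_01-27-05-AM.log."""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        raise ValueError(f"not a cfmLogger file name: {filename!r}")
    stamp = filename[len(PREFIX):-len(SUFFIX)]
    # Assumed layout: year_month_day_hour-minute-second-AM/PM (12-hour clock).
    return datetime.strptime(stamp, "%Y_%m_%d_%I-%M-%S-%p")


if __name__ == "__main__":
    names = [
        "cfmLogger_2005_08_19_01-27-05-AM.log",
        "cfmLogger_2005_08_19_01-27-05-PM.log",
    ]
    print(max(names, key=cfm_log_time))
```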

The following sections describe how the runConfig tool can be used to run the configuration assistants in a standalone mode, particularly when they fail during the installation.

Invoking the One-Off Patches Configuration Assistant in Standalone Mode

During the installation process, this configuration assistant is executed before the Management Service Configuration Assistant is run.

This configuration assistant applies the one-off patches that are required for a successful Enterprise Manager 10g Release 2 installation.

To run this configuration assistant in standalone mode, you must execute the following command from the Management Service Oracle home:

<OMS_HOME>/perl/bin/perl <OMS_HOME>/install/oneoffs/applyOneoffs.pl

Invoking the Database Configuration Assistant in Standalone Mode

To run the Database Configuration Assistant, you must invoke the runConfig.sh script as:

<DB_Home>/oui/bin/runConfig.sh ORACLE_HOME=<DB_HOME> ACTION=Configure MODE=Perform

On Microsoft Windows, replace runConfig.sh with runConfig.bat in the previously mentioned command.

Invoking the OMS Configuration Assistant in Standalone Mode

To run the OMSConfig Assistant, invoke runConfig.sh as follows:

<OMS_Home>/oui/bin/runConfig.sh ORACLE_HOME=<OMS_HOME> ACTION=Configure MODE=Perform

On Microsoft Windows, replace runConfig.sh with runConfig.bat in the previously mentioned command.

Invoking the Agent Configuration Assistant in Standalone Mode

To run the AgentConfig Assistant, invoke runConfig.sh as follows:

<Agent_Home>/oui/bin/runConfig.sh ORACLE_HOME=<AGENT_HOME> ACTION=Configure  MODE=Perform

On Microsoft Windows, replace runConfig.sh with runConfig.bat in the above-mentioned command.

Note:

While the preceding command can be used to execute the agentca script, Oracle recommends you execute the following command to invoke the configuration assistant:

Agent_Home/bin/agentca -f

If you want to run the agentca script on an Oracle RAC, you must execute the following command on each of the cluster nodes:

Agent_Home/bin/agentca -f -c "node1,node2,node3,...."

See Reconfiguring and Rediscovering Agent for more information.

Invoking the OC4J Configuration Assistant in Standalone Mode

If you want to deploy only the Rules Manager, execute the following commands:

/scratch/OracleHomes/oms10g/jdk/bin/java -Xmx512M -DemLocOverride=/scratch/OracleHomes/oms10g -classpath /scratch/OracleHomes/oms10g/dcm/lib/dcm.jar:/scratch/OracleHomes/oms10g/jlib/emConfigInstall.jar:/scratch/OracleHomes/oms10g/lib/classes12.zip:/scratch/OracleHomes/oms10g/lib/dms.jar:/scratch/OracleHomes/oms10g/j2ee/home/oc4j.jar:/scratch/OracleHomes/oms10g/lib/xschema.jar:/scratch/OracleHomes/oms10g/lib/xmlparserv2.jar:/scratch/OracleHomes/oms10g/opmn/lib/ons.jar:/scratch/OracleHomes/oms10g/dcm/lib/oc4j_deploy_tools.jar oracle.j2ee.tools.deploy.Oc4jDeploy -oraclehome /scratch/OracleHomes/oms10g -verbose -inifile /scratch/OracleHomes/oms10g/j2ee/deploy.master -redeploy

On Microsoft Windows, replace runConfig.sh with runConfig.bat in the previously mentioned command.

Options You Can Specify to Run runConfig Tool

You can specify the following options to execute the runConfig tool.

ORACLE_HOME

This is the absolute location of the Oracle home. All products and top-level components under this Oracle home that were installed using Oracle Universal Installer (OUI) 10g R2 (10.2) are eligible for the ACTION. Products installed using an OUI release earlier than 10.2 are not eligible for this ACTION.

ACTION

This is a mandatory option. This option can have values such as configure/clone/addnode/addlanguage/deconfigure/patchsetConfigure.

MODE

This is optional, and can have values such as perform/showStatus/listTools. For example, if the value is perform, then that ACTION is performed.

If MODE is absent, the MODE option will assume a default value of listTools.

If the value is showStatus, the status of the last-performed ACTION is displayed to the user.

Examples

Tool1 - Optional    - Failed
Tool2 - Recommended - Succeeded
Tool3 - Optional    - Succeeded

If the value of the MODE option is listTools, a list of recommended/optional/other tools for the specified ACTION is displayed.

Example

Recommended Tools(1): Tool2
Optional Tools (2): Tool1, Tool3
Other Tools(0):
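
The showStatus output can be scanned for tools that need attention. The following Python sketch is an illustration only. The function name failed_tools is hypothetical, and it assumes each status line has the three-part "name - category - result" layout shown in the showStatus example above; actual output may differ.

```python
import re


def failed_tools(status_text):
    """Return the names of tools whose last recorded result is 'Failed'."""
    failed = []
    for line in status_text.splitlines():
        # Fields are separated by a dash surrounded by whitespace.
        parts = re.split(r"\s+-\s+", line.strip())
        if len(parts) == 3 and parts[2] == "Failed":
            failed.append(parts[0])
    return failed


if __name__ == "__main__":
    text = (
        "Tool1 - Optional    - Failed\n"
        "Tool2 - Recommended - Succeeded\n"
        "Tool3 - Optional    - Succeeded\n"
    )
    print(failed_tools(text))
```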

COMPONENT_XML

This is optional. You can specify a comma-separated list of Aggregate XML names from the OH/inventory/ContentsXML/ConfigXML/ and only these XMLs and the items dependent on them will be configured. If there are two components with the same name in the ORACLE_HOME, the one that is of a later version is considered for the ACTION option.

RESPONSE_FILE

This is optional. This is the absolute location of the response file that is used to overwrite some existing parameters. Pairs such as ComponentID|variable = value are to be specified in this file, per line, per variable as:

oracle.assistants.server|var1=true
oracle.network.client|var2=orcl

Example

RESPONSE_FILE=/scratch/rspfile.properties

Note:

Secure variables are not stored in the instance aggregate XML files and hence while running runConfig, if any of the configuration tools that you want to run use secure variables, such as passwords, you must supply the value of these secure variables using the RESPONSE_FILE option of runConfig. Otherwise, the tools with secure variables as arguments fail.
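
A response file in this format can be generated and read back with a few lines of code. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes one ComponentID|variable=value pair per line with no quoting or escaping rules.

```python
def format_response_file(overrides):
    """Render {(component_id, variable): value} as 'ComponentID|variable=value' lines."""
    lines = []
    for (component, variable), value in overrides.items():
        if "|" in component or "=" in variable:
            raise ValueError(f"invalid name: {component!r} / {variable!r}")
        lines.append(f"{component}|{variable}={value}")
    return "\n".join(lines) + "\n"


def parse_response_file(text):
    """Parse 'ComponentID|variable=value' lines back into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        left, eq, value = line.partition("=")
        component, bar, variable = left.partition("|")
        if not eq or not bar:
            raise ValueError(f"malformed line: {line!r}")
        entries[(component.strip(), variable.strip())] = value.strip()
    return entries


if __name__ == "__main__":
    print(format_response_file({("oracle.assistants.server", "var1"): "true"}), end="")
```

Because a response file may carry passwords, restrict its file permissions and remove it after use.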

INV_PTR_LOC

This is optional. This is the full path of the oraInst.loc file.

The oraInst.loc file contains the following entries:

inventory_loc=<location of central inventory>
inst_group=<>

Example

INV_PTR_LOC=<absolute path of oraInst.loc>

RERUN

This is optional. Possible values are true and false. RERUN has a default value of false. This means that only failed tools or those tools that were skipped are executed. All those tools that were successfully executed are skipped during the rerun.

A RERUN=true value will execute all the tools anew, including the tools that completed successful runs.
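
The effect of the RERUN option can be summarized as a selection rule over the last recorded result of each tool. The following Python sketch only models the behavior described above and is not how runConfig is implemented. The function name tools_to_run and the status labels are hypothetical.

```python
def tools_to_run(last_status, rerun=False):
    """Select tools for the next run from a {tool_name: last_result} mapping."""
    if rerun:
        # RERUN=true: every tool runs again, including successful ones.
        return list(last_status)
    # RERUN=false (default): only failed or skipped tools run.
    return [tool for tool, result in last_status.items() if result != "Succeeded"]


if __name__ == "__main__":
    status = {"Tool1": "Failed", "Tool2": "Succeeded", "Tool3": "Skipped"}
    print(tools_to_run(status))
    print(tools_to_run(status, rerun=True))
```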

Typical Usage of the runConfig.sh

A typical usage of the runConfig.sh script is as follows:

./runConfig.sh ORACLE_HOME=<path of database home> ACTION=configure MODE=perform COMPONENT_XML={encap_emseed.1_0_0_0_0.xml}

Note:

On Microsoft Windows, replace runConfig.sh with runConfig.bat or just runConfig (without the file extension).
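Each NAME=value option must be a separate argument on the command line. If you generate the command from a script, building it as a list makes a missing space between two options hard to introduce. The following Python sketch only assembles the arguments; the helper name and the Oracle home path are hypothetical, and the COMPONENT_XML value is passed through exactly as you supply it.

```python
def build_runconfig_args(oracle_home, action, mode, component_xml=None,
                         script="./runConfig.sh"):
    """Assemble runConfig arguments, one NAME=value pair per list element."""
    args = [script, f"ORACLE_HOME={oracle_home}", f"ACTION={action}", f"MODE={mode}"]
    if component_xml:
        args.append(f"COMPONENT_XML={component_xml}")
    return args


# Hypothetical Oracle home, for illustration only.
args = build_runconfig_args("/u01/app/oracle/db10g", "configure", "perform",
                            "{encap_emseed.1_0_0_0_0.xml}")
print(" ".join(args))
```

On Microsoft Windows, pass script="runConfig.bat" instead of the default.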

runConfig Log Files

The runConfig log files, configActions<timestamp>.log and configActions<timestamp>.err, are generated under ORACLE_HOME/cfgtoollogs/oui/.

Enterprise Manager Deployment Fails

Enterprise Manager deployment may fail due to the Rules Manager deployment failure.

To resolve this issue, redeploy Enterprise Manager by following these steps:

Move OH/j2ee/deploy.master to OH/j2ee/deploy.master.bak.
Execute the OH/bin/EMDeploy script.
Restore the OH/j2ee/deploy.master. That is, execute mv OH/j2ee/deploy.master.bak OH/j2ee/deploy.master

Oracle Management Service Configuration Fails

Oracle Management Service configuration may fail due to the following reasons.

Oracle Management Service Fails While Deploying Enterprise Manager AgentPush Application

The cfgfw logs display the following error:

Redeploying application 'EMAgentPush' to OC4J instance 'OC4J_EMPROV'. FAILED! ERROR: Caught exception while deploying 'EMAgentPush' to 'OC4J_EMPROV':java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

This error is caused by IPv6 entries in the /etc/hosts file. When prompted to execute root.sh or when configuration fails, do the following:

In <OMS Home>/sysman/install/EMDeployTool.pm, include "-Djava.net.preferIPv4Stack=true" in the command executed in deployEmEar ().
In <OMS Home>/opmn/conf/opmn.xml, include "-Djava.net.preferIPv4Stack=true" in java-options of all OC4J processes.
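To confirm that IPv6 entries are present before you edit these files, you can scan the contents of the hosts file. The following Python sketch uses the standard ipaddress module. The function name and the sample host names are hypothetical, and the sketch only reports entries; it does not change anything.

```python
import ipaddress


def ipv6_host_entries(hosts_text):
    """Return (address, names) for every IPv6 entry in hosts-file text."""
    found = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if not fields:
            continue
        try:
            # Strip an optional zone suffix such as %eth0 before parsing.
            address = ipaddress.ip_address(fields[0].split("%", 1)[0])
        except ValueError:
            continue  # not an IP address; ignore the line
        if address.version == 6:
            found.append((fields[0], fields[1:]))
    return found


sample = """127.0.0.1 localhost
::1 localhost6 ip6-localhost  # loopback
fe80::1%eth0 linklocal
10.0.0.5 oms.example.com oms
"""
for address, names in ipv6_host_entries(sample):
    print(address, names)
```

To check a real host, pass the text of /etc/hosts, for example open("/etc/hosts").read().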

In 'Enterprise Manager with new Database' Install, Oracle Management Service Configuration Fails While Unlocking Passwords

The cfgfw logs display the following error:

Failed to initialize JDBC Connection

This error occurs when the listener does not start during NetCA execution. In that case, the installActions log contains the following error:

Listener start failed. Listener may already be running.

To rectify this error, add the following line in <DB Home>/network/admin/listener.ora:

SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF

Then, restart the listener.

Dropping of Repository Hangs If SYSMAN Sessions are Active

While installing Enterprise Manager using an existing database, Oracle Management Service configuration hangs while dropping the repository. This is due to active SYSMAN sessions connected to the database.

To resolve this issue, shut down any existing Enterprise Manager sessions (both Grid Control and Database Control) and any other SQL*Plus SYSMAN sessions.

If Oracle Management Service Configuration is Retried, oracle.sysman.emSDK.svlt.ConsoleServerHost and oracle.sysman.emSDK.svlt.ConsoleServerName in emoms.properties are Swapped and There is an Extra Underscore in ConsoleServerHost

This problem only occurs with 10.2.0.1.0 Additional Oracle Management Server installation.

To resolve this issue, swap the values and remove the extra underscore in ConsoleServerName in emoms.properties present in <OMS_ORACLE_HOME>/sysman/config directory.

Enterprise Manager Upgrade and Recovery Issues

The Enterprise Manager 10g Release 2 upgrade is an out-of-place upgrade, meaning that Enterprise Manager 10g Release 2 Oracle homes are separate from the old homes. If you decide to abort the upgrade process during the copying phase (copying of the binaries), you can simply revert to your old 10g Release 1 installation.

The upgrade process creates a new OMS home and a new database home. The Upgrade assistants upgrade the datafiles and SYSMAN schema, and then configure the new Oracle homes.

Caution:

Do not abort the upgrade process during the configuration phase, as this will corrupt the installation. You will not be able to revert to the old 10g Release 1 installation either.

Agent Upgrade Issues

This section lists some of the issues that you may encounter during an agent upgrade.

Agent Does Not Start Up After Upgrade

During an agent upgrade from 10.1.0.n to 10.2.0.2, the agent may fail to start up after upgrade if the time zone that is configured for the upgraded agent is different from the originally configured agent.

You can correct this issue by changing the time zone. To do this, execute the following command from the upgraded agent home:

emctl resetTZ agent

This command will correct the agent-side time zone, and specify an additional command to be run against the repository to correct the value there.

Caution:

Before you change the time zone, check if there are any blackouts that are currently running or scheduled to run on any of the targets that are monitored by the upgraded agent. Do the following to check this:

In the Grid Control console, go to the All Targets page under Targets and locate the Agent in the list of targets. Click the agent name link. The Agent home page appears.
The list of targets monitored by the agent will be listed in the Monitored Targets section.
For each target in the list, click the target name to view the target home page.
Here, in the Related Links section, click Blackouts to check any blackouts that are currently running or may be scheduled to run in the future.
If such blackouts exist, you must stop all the blackouts that are running on all the targets monitored by this agent.
From the console, stop all the blackouts that are scheduled to run on any of these monitored targets.
Now, run the following command from the agent home to reset the time zone:
```
emctl resetTZ agent
```
After the time zone is reset, you can create new blackouts on the targets.

Missing Directories When Upgrading Agent from 10.1.0.5 to 10.2.0.1

In a Windows NT RAC agent upgrade scenario, after the AgentOnly shiphome installer has completed installation, the utility <Upgrade AOH>/oui/bin/upgrade has to be executed on every single node in the RAC to complete the agent upgrade.

Agent Deploy Application Keeps Running Even After Upgrade on an NT Shared Location Is Successful

To upgrade the Management Agent on a shared cluster, follow these steps:

On the target machine, make sure that the .backup extension is associated with cmd.exe.
If this is not the case, then during the upgrade when the Agent Deploy application displays installing on the progress page, check the target machines.
With the exception of the first host in the hosts list, a message may appear for all other hosts displaying the following text:
```
Windows cannot open this file emctl.bat.upgrade.backup
```
Select cmd.exe to open this file and then you will see the upgrade process proceed.

To check the extension association, open Microsoft Windows File Explorer. From the menu, select Tools, then select Folder Options, and then select the File Types tab. Search for the .backup extension.

Enterprise Manager Recovery

This section provides the instructions to be followed to perform an Enterprise Manager recovery.

Steps to Follow for Agent Recovery

Use the following instructions to perform an agent recovery:

After exiting the installer, you must open a new window and change the directory to the <New_AgentHome>/bin.
Execute the script ./upgrade_recover.
You can then start the old agent and continue using it. If you want to remove the installed binaries of the new agent home, use the Remove Products function of the installer.

Steps to Follow for OMS Recovery

If the schema has been upgraded or the upgrade was incomplete, you must manually restore the database to the backup that was taken prior to executing the OMS upgrade.

You can determine the status of the repository upgrade by looking into the log file at <New_OMSHome>/sysman/log/emrepmgr.log.<proc_id>. The last line of the log file provides the status of the upgrade. If the upgrade was completed without errors, it reads Repository Upgrade Successful. If not, the message Repository Upgrade has errors… is displayed.
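Because the status is reported on the last line of the log, the check can be scripted. The following Python sketch classifies log text by the two messages quoted above. The function name and the sample log lines are hypothetical, and anything other than the two known messages is reported as unknown rather than guessed.

```python
def repository_upgrade_status(log_text):
    """Classify the repository upgrade from the last non-blank log line."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    last = lines[-1]
    if "Repository Upgrade Successful" in last:
        return "successful"
    if "Repository Upgrade has errors" in last:
        return "errors"
    return "unknown"


# Hypothetical log contents, for illustration only.
print(repository_upgrade_status("step 1 done\nRepository Upgrade Successful\n"))
```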

Follow these instructions to perform an OMS recovery:

Note:

Before you attempt to restore the database, you must exit the Upgrade wizard. You must also ensure there are no OMS processes that are running. See Shut Down Grid Control Before Upgrade for more information on shutting down the Enterprise Manager processes.

Caution:

Ensure all OMS processes are completely shut down. If not, the system may become unstable after the upgrade.

Restore the database to the backup. See Oracle Database Administrator's Guide for more information.
After the database is restored, start the database and listener to ensure successful restoration.
Open a new window and change the directory to the <New_OMSHome>/bin.
Execute the ./upgrade_recover script.

Start the old OMS and continue to use it. If you want to remove the binaries of the newly installed OMS home, use the Remove Products function in the installer.

Steps to Re-create the Repository

If the Management Service configuration plugin fails due to the repository creation failure, rerunning the configuration tool from Oracle Universal Installer drops the repository and re-creates it. However, if you want to manually drop the repository, complete the following steps:

Dropping the Repository

Stop the OPMN processes (<OMS_HOME>/opmn/bin/opmnctl stopall), Management Service (<OMS_HOME>/bin/emctl stop oms), and Agent (<AGENT_HOME>/bin/emctl stop agent) before dropping the repository.
Set ORACLE_HOME to OMS_OracleHome.
Execute OMS_Home/sysman/admin/emdrep/bin/RepManager <hostname> <port> <SID> -action drop -output_file <log_file>.

Creating the Repository

Set ORACLE_HOME to OMS_OracleHome.
Execute OMS_Home/sysman/admin/emdrep/bin/RepManager <hostname> <port> <SID> -action create -output_file <log_file>.

Note:

After recreating the repository, you must run the following command on all the Management Service Oracle homes to reconfigure the emkey:

emctl config emkey -repos -force

This command overwrites the emkey.ora file with the newly generated emkey.

Caution:

While re-creating the repository using the RepManager -action create command, you may encounter the following error message:

java.sql.SQLException: ORA-28000: the account is locked during recreation of repository.

Workaround

This error may occur if there are processes or multiple Management Services that are trying to connect to the database with incorrect SYSMAN credentials. If there are multiple login failures, the database becomes locked up and shuts down the monitoring agent.

You can resolve this issue by shutting down all the Management Services connected to the database, along with the monitoring agent.

Repository Creation Fails

When installing Enterprise Manager using an existing database, the repository creation fails.

This may happen if the profile of the Password Verification resource name in the database has a value that is other than Default. To resolve this issue:

Change the Password Verification profile value to Default.
Create the repository using the RepManager command.

You may also see the following error at the end of the installation when repository creation fails:

Enter user-name: Error accessing PRODUCT_USER_PROFILE 
Warning:  Product user profile information not loaded! 
You may need to run PUPBLD.SQL as SYSTEM

In this case, ensure that PRODUCT_USER_PROFILE exists in the repository. Otherwise, run the PUPBLD.SQL as SYSTEM.

Collection Errors After Upgrade

If you upgrade only the Management Service to 10g Release 2 without upgrading the monitoring agent, you may encounter the following collection errors:

Target Management Services and Repository
Type OMS and Repository
Metric Response
Collection Timestamp <session_time_stamp>
Error Type Collection Failure
Message Target is in Broken State. Reason - Target deleted from agent

To resolve this issue, upgrade the monitoring agent along with the Management Service to 10g Release 2.

Oracle Management Service Upgrade Issues

You may encounter problems during Management Service upgrade where the upgrade process aborts due to the following reasons.

OMS Upgrade Stops at OracleAS Upgrade Assistant Failure

The installation dialog box and the configuration framework log file (located at <New_OracleHome>/cfgtoollogs/cfgfw/oracle.sysman.top.oms_#date.log) list SEVERE messages indicating the reason the Oracle Application Server Upgrade Assistant fails.

If the message displays permission denied on certain files, it means that the user running the installer may not have the correct permissions to run certain iAS configurations.

To resolve this issue, comment out the OracleAS configuration that contains these files and then retry the upgrade. You can reapply the configurations after the upgrade is successfully completed.

OMS Configuration Stops at EMDeploy Failure

The most common reasons for EMDeploy to fail are if:

Not all Enterprise Manager processes are shut down completely.

To shut down Enterprise Manager, execute the following commands:
```
<Oracle_Home>/opmn/bin/opmnctl stopall
<Oracle_Home>/bin/emctl stop em
```
See Shut Down Grid Control Before Upgrade for more information.
Symbolic links have been used instead of hard links

The <Oracle_Home>/Apache/<component> configuration files must be examined to ensure only hard links (and no symbolic links) were referenced. See Check for Symbolic Links for more information.

After you have successfully resolved these issues, perform the redeploy steps manually and click Retry on the Upgrade wizard.
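A quick way to look for symbolic links is to walk a directory tree and report every entry that is a link. The following Python sketch does only that. The function name is hypothetical, and it inspects the files and directories under the given root; it does not parse paths referenced inside the configuration files, so treat it as a first pass.

```python
import os


def find_symlinks(root):
    """Return the sorted paths of all symbolic links found under root."""
    links = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so they are reported here.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append(path)
    return sorted(links)
```

For example, find_symlinks("<Oracle_Home>/Apache") with the real Oracle home substituted returns an empty list when no symbolic links are present under that directory.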

OMS Configuration Stops at Repository Schema Failure (RepManager)

The most common reason the repository schema configuration fails is when it is not able to connect to the listener. The configuration framework log file (<New_OracleHome>/cfgtoollogs/cfgfw/oracle.sysman.top.oms_#date.log) indicates the reason for the repository schema upgrade failure.

To resolve this issue, you must verify whether or not the listener connecting to the OMS is valid and active.

Also, if you have installed the OMS using the Install Enterprise Manager Using New Database installation type, ensure there are no symbolic links being referenced. After you have successfully established the listener connections, click Retry on the Upgrade wizard.

Monitoring Agent Does Not Discover Upgraded Targets

If you have upgraded an Enterprise Manager Grid Control target (for example, a database) independently (that is, using a regular upgrade mechanism other than the Oracle Universal Installer), the monitoring agent may fail to discover this upgraded target.

This can happen if you have specified a different Oracle home value for the upgraded target other than the one that already existed.

To resolve this issue, you must manually configure the targets.xml file of the monitoring agent to update the configuration details of the upgraded Oracle home information, or log in to the Enterprise Manager console, select the appropriate target, and modify its configuration parameters to reflect the upgraded target parameters.

CSA Collector Is Not Discovered During Agent Upgrade

When a 10g Release 1 Management Service and its associated (monitoring) agent are upgraded at the same time, the agent upgrade does not discover the CSA Collector target.

To discover this target, you must run the agent configuration assistant (the agentca script) using the rediscovery option. See Rediscover and Reconfigure Targets on Standalone Agents for more information.

ias_admin Password Is Set To welcome1 After Upgrade

To resolve this issue, run the following command:

<New OMS Home>/bin/emctl set password welcome1 <New Password>

Oracle Management Service Upgrade Fails If Older Listener Is Running On A Port Other Than 1521

To resolve this issue, do the following:

Stop the older listener when prompted to execute allroot.sh. The Oracle Management Service upgrade will fail.
Set the listener from the new database to run from the same non-1521 port.
Run the upgrade again.

If TAF String Is Used in Grid Control, then Upgrade Fails

If a TAF (Transparent Application Failover) string is used in 10.2.0.x.x Grid Control, then the patching process will fail while patching an existing release of Enterprise Manager Grid Control to 10.2.0.2.0, 10.2.0.3.0, or 10.2.0.4.0 or higher.

To resolve this issue, use one of the following workarounds.

Workaround 1:

Change the ConnectDescriptor in the emoms.properties to the sqlnet ConnectDescriptor. Then try running the config tool again. After the installation completes, change the connect string back to TAF.

For example, the sqlnet connect descriptor is:

oracle.sysman.eml.mntr.emdRepConnectDescriptor=(DESCRIPTION\=(ADDRESS_LIST\=(ADDRESS\=(PROTOCOL\=TCP)(HOST\=stada37.us.oracle.com)(PORT\=1521)))(CONNECT_DATA\=(SERVICE_NAME\=emrep11.us.oracle.com)))
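In the example above, every equal sign inside the descriptor is written as \= because the value is stored in a properties file. The following Python sketch shows that escaping step for a simple, single-address descriptor. Both function names are hypothetical, and the sketch covers only the shape of the example shown here, not TAF or multi-address descriptors.

```python
def simple_connect_descriptor(host, port, service_name):
    """Build a single-address connect descriptor in the shape of the example above."""
    return (
        "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)"
        f"(HOST={host})(PORT={port})))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


def escape_for_properties(descriptor):
    """Write each '=' as '\\=' as required inside the properties value."""
    return descriptor.replace("=", "\\=")


descriptor = simple_connect_descriptor(
    "stada37.us.oracle.com", 1521, "emrep11.us.oracle.com")
print("oracle.sysman.eml.mntr.emdRepConnectDescriptor="
      + escape_for_properties(descriptor))
```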

Alternatively, you can use this approach for Workaround 1:

Exit the OUI installer. Change the ConnectDescriptor in the emoms.properties to the sqlnet Connect Descriptor. Then run the following runConfig command to continue the install, and then after the installation completes, change the connect string back to TAF.

Navigate to the bin directory:
```
cd <OMS_ORACLE_HOME>/oui/bin
```

Run the following command:

For Linux:

./runConfig.sh ORACLE_HOME=<OMS_Oracle_Home> ACTION=patchsetConfigure MODE=perform COMPONENT_XML=oracle.sysman.top.oms.<version>.xml

Here, the version can be '10_2_0_3_0'.

For Microsoft Windows:

runConfig.bat ORACLE_HOME=<OMS_Oracle_Home> ACTION=patchsetConfigure MODE=perform COMPONENT_XML=oracle.sysman.top.oms.<version>.xml

Here, the version can be '10_2_0_3_0'.

Workaround 2:

If you upgraded the repository manually after the installation fails, then in order to run the remaining config tools by skipping the Repository Upgrade Config Tool, follow these steps:

Exit the OUI installer.
Edit the response file by adding the following parameter:
```
b_reposPatchUpgrade =false
```
Navigate to the bin directory:
```
cd <OMS_ORACLE_HOME>/oui/bin
```

Run the following command:

For Linux:

./runConfig.sh ORACLE_HOME=<OMS_Oracle_Home> ACTION=patchsetConfigure MODE=perform COMPONENT_XML=oracle.sysman.top.oms.<version>.xml

Here, the version can be '10_2_0_3_0'.

For Microsoft Windows:

runConfig.bat ORACLE_HOME=<OMS_Oracle_Home> ACTION=patchsetConfigure MODE=perform COMPONENT_XML=oracle.sysman.top.oms.<version>.xml

Here, the version can be '10_2_0_3_0'.
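The <version> part of the COMPONENT_XML name uses underscores where the release number uses dots. If you derive the file name in a script, the conversion is a simple substitution, as in this Python sketch. The function name is hypothetical; verify the resulting file name against the files that actually exist in your Oracle home before using it.

```python
def oms_component_xml(release):
    """Turn a dotted release such as '10.2.0.3.0' into the COMPONENT_XML file name."""
    return "oracle.sysman.top.oms." + release.replace(".", "_") + ".xml"


print(oms_component_xml("10.2.0.3.0"))
```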

Troubleshooting Management Agent Issues

The Oracle Management Agent (Management Agent) monitors and manages different entities in your enterprise. If the system crashes or hangs due to an error, you can use the following options to diagnose, investigate and troubleshoot the problem:

Generating a Diagnosable Dump
Reference Counting
Tracking CPU Thread Times

Generating a Diagnosable Dump

A memory dump is written to the file system when any of the following events occurs:

System crash or a memory exception
struct_id errors
Assert failure

A dump file can also be generated by a user. The dump file provides a snapshot of the state of the agent when the crash occurred. It contains the stack and memory details, the thread that is being used, and the heap memory. The dump file is useful in determining the cause of the issue and supports better diagnosis. You can examine the stack thoroughly and view the actual thread from which the event occurred.

The dump file can be viewed on a per-thread basis or a component-by-component basis. Depending on the error situation, a fatal or a non-fatal dump is generated. An agent crash, for example, generates a fatal dump. The dump file, including its location, can be customized by specifying the following properties in the emd.properties file.

DUMPDIR: This is the directory in which the dump file is generated. By default, the file is stored in the ORACLE_HOME/sysman/dump directory.
SIZETRACELIMIT: The maximum size of a trace file. The default limit is 20 MB.
MAXTRACES: The maximum number of trace files that can be retained at any given point in time. By default, a maximum of 10 files can be retained. The retention policy ensures that fatal dumps are retained and non-fatal dumps are deleted.
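The retention policy can be pictured with a small model. The following Python sketch is an assumption-laden illustration, not the agent's implementation: it assumes that, when the number of files exceeds MAXTRACES, the oldest non-fatal dumps are removed first and fatal dumps are never removed. The function name and the file names are hypothetical.

```python
def traces_to_delete(traces, max_traces=10):
    """traces is a list of (name, is_fatal) tuples ordered oldest first.

    Returns the names that the assumed policy would delete.
    """
    excess = len(traces) - max_traces
    if excess <= 0:
        return []
    non_fatal = [name for name, is_fatal in traces if not is_fatal]
    return non_fatal[:excess]  # oldest non-fatal dumps go first; fatal dumps stay


# Hypothetical dump files, oldest first.
traces = [("dump_a", False), ("dump_b", True), ("dump_c", False), ("dump_d", False)]
print(traces_to_delete(traces, max_traces=2))
```

Under this model, the number of files can stay above the limit when the remaining files are all fatal dumps.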

Example 18.1 Non-Fatal Dump
EMAGENT Ver:10.2.0.5.0
NON-FATAL DUMP
User-initiated dumpstate


----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
nmegsm_beginDump()+  call     B6F21870             B3F70E80 ? 2 ? B3F70E80 ?
855                                                818A248 ?
nmemdisp_DumpStateR  call     B6F277E0             804F328 ? B70D86AC ? 0 ? 1 ?
eq()+280                                           B3F70F10 ? 0 ?
nmemdisp_Dispatcher  call     nmemdisp_DumpStateR  804F328 ? 80D3930 ? 0 ?
_main()+1693                  eq()                 FFFFFFFF ? 77AFF4 ? 77C820 ?
nmehl_processIncomi  call     00000000             80B2D08 ? 0 ? 0 ? 0 ? 0 ? 0 ?
ngRequest()+1028
nmtw_runWork()+53    call     00000000             80B2D08 ? B71CC0EC ?
                                                   B3F713A4 ? B6F727D7 ?
                                                   8188CF8 ? 8085400 ?
nmttp_run()+129      call     B6F282E0             8188CF8 ? 8085400 ?
                                                   B70EFAF0 ? B71CC0EC ?
                                                   804F328 ? 8070810 ?
nmttp_runSystemThre  call     nmttp_run()          8188CF8 ? 8ADFF4 ? 0 ?
ad()+33                                            B3F714C8 ? 8A43CC ? 8188CF8 ?
start_thread()+172   call     00000000             8188CF8 ? 2 ? 2 ? 2 ?
__clone()+94         call     00000000             B3F71BA0 ? 0 ? 0 ? 0 ? 0 ?
                                                   0 ?
--------------------- Binary Stack Dump ---------------------
========== FRAME [1] (nmegsm_beginDump()+855 -> B6F21870) ==========
defined by frame pointers 0xb3f70ef4  and 0xb3f70a68
CALL TYPE: call   ERROR SIGNALED: no   CALLER: nmegsm_beginDump
Dump of memory from 0xb3f70af4 to 0xb3f70ef4
B3F70AF0          00000000 00000000 00000000      [............]
B3F70B00 00000000 00000000 00000000 00000000  [................]
        Repeat 55 times
B3F70E80 B6F75934 B6F75922 B6F75934 B6F75922  [4Y.."Y..4Y.."Y..]
B3F70E90 0818A248 0818A248 00000000 00000000  [H...H...........]
B3F70EA0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
B3F70EC0 00000000 00000000 00000000 B71CC0EC  [................]
B3F70ED0 080D3930 080D38F8 B3F70F10 00006480  [09...8.......d..]
B3F70EE0 080B3560 0804F328 00000001 0818A254  [`5..(.......T...]
B3F70EF0 00000000                             [....]           
========== FRAME [2] (nmemdisp_DumpStateReq()+280 -> B6F277E0) ==========
defined by frame pointers 0xb3f71138  and 0xb3f70ef4
CALL TYPE: call   ERROR SIGNALED: no   CALLER: nmemdisp_DumpStateReq
Dump of memory from 0xb3f70ef4 to 0xb3f71138     
B3F70EF0          B3F71138 B6F33C04 0804F328      [8....<..(...]
B3F70F00 B70D86AC 00000000 00000001 B3F70F10  [................]
B3F70F10 00000000 00000209 08053980 00000209  [.........9......]
B3F70F20 B4F98320 006B29B4 006B29B4 B7D546D7  [ ....)k..)k..F..]
B3F70F30 00000000 08189410 00000209 000001E9  [................]
B3F70F40 00000185 B4F9956C 00000209 00000209  [....l...........]
...
----- End of Call Stack Trace -----
------------------------------------------------------------------
Dumping component: targets
------------------------------------------------------------------
nmeetm_TargetManager = 0xb518a510

{


HEX dump
B518A510 00003072 0804F328 080850D8 B518A578  [r0..(....P..x...]
B518A520 B518A6B0 B518A700 080AC248 00000000  [........H.......]
B518A530 0000000A 00000000 0806D250 00000000  [........P.......]
B518A540 49767EC9 000000F2 00000000 080960F8  [.~vI.........`..]
B518A550 00000000 B5147780 00000000 00000000  [.....w..........]
B518A560 00000078 00000004 00000008 00000000  [x...............]
B518A570 00000000                             [....]
  ub4 struct_id = 12402
  nmectx* gctx = 0x804f328
  nmeulctx* lctx = 0x80850d8
  nmeumx_Mutex* mutex = 0xb518a578
  nmeumx_Mutex* tgtChgMutex = 0xb518a6b0
  nmeumx_Mutex* MXProgDisc_nmeetm = 0xb518a700
  nmedts_Targets* targets = 0x80ac248
  nmedts_Targets* oldTgts = (nil)
  ub4 propCompThrds = 10
  boolean bDynPropChanged = FALSE
  nmedt_Target* hostTarget = 0x806d250
  OraText* agentTgtGuid = (nil)
  time_t load_timestamp = 0xb518a540
  ubig_ora fileSize_nmeetm = 242
  OraText* parseError = (nil)
  OraText* emdTargetName_nmeetm = 0x80960f8 "stadm48.us.oracle.com:6321"
  boolean disableSelfMon_nmeetm = FALSE
  nmeuv_Vector* tgtChgEvents_nmeetm = 0xb5147780
=>   0x80b3878
=>   0x80b3668
=>   0x80de290
=>   0xb5a0f3e0
  boolean bFromCLI = FALSE
  boolean bNotAllowSave = FALSE
  sword dynamicPropReComputeInterval = 120
  sword dynamicPropReComputeMaxTries = 4
  ub4 maxDataRowsetFiles = 8
  ub4 maxDataRowsetFilesProp = 0
  ub4 DynPropRecomputeSeed = 0
}nmeetm_TargetManager
...
------------------------------------------------------------------
Dumping component: refcnt
------------------------------------------------------------------
nmeuca_CircularArray = 0xb5197720

{
      {
                thread ID = 0xb3f70cc0 "3019316128"
                struct ID = 10302
                address = 0x806d250
                count = 4
                action = 0xb70f01a4 "acquire"
                activity ID = 589827
                time (centi-seconds ago) = 134524
               call stack =
                        nmedt_Target_addReferenceCount()+87
                        nmeetm_getTargetInstanceCheckDeleted()+175
                        nmeetm_getTargetInstance()+77
                        nmeetm_getTarget()+80
                        nmecci_run()+1043
        }
   }

Command Line Options to Generate the Dump File

The dump file can also be generated using command line options. The command line options available are:

emctl dumpstate agent: Generates a dump file of the current state.

emctl dumpstate agent <subsystem>: Provides the status of a specific subsystem.

emctl dumpstate agent list: Provides the list of components in the dump file.

emctl dumpstate agent <comp1> <comp2> <comp3>: Dumps the specified components. Separate the component names with spaces.

Platforms Supported

Diagnosable dumps can be generated on the following platforms:

Linux (x86 and x86_64)
Solaris_Sparc
HP-IA64.C32
AIX.PPC32
HP-PA32 (only object dumps - no stack traces)

Reference Counting

Many of the Agent's data structures have a reference counter. This facilitates sharing of data between subsystems in Enterprise Manager, such as targets, metrics, and jobs, and avoids the need for duplicate objects. Reference counting is very useful in detecting memory leaks and memory corruption. To support this, the Agent maintains a list of all reference counted objects accessed by a specific thread.

When a data structure is accessed, the reference counter associated with it is incremented. When the data structure is no longer being accessed, the reference counter is decremented. If the reference counter value is zero, the memory can be safely freed.

If the free action finds the object is not in the list maintained, it indicates that there is a memory corruption. At thread exit, objects left behind in the list indicate that there is a memory leak. When these errors occur, a dump file is generated and the errors are written into the emagent.trc file.
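The bookkeeping described above can be modeled in a few lines. The following Python sketch is a simplified illustration and not the agent's code: the class and method names are hypothetical, objects are represented by strings, and the release step stands in for the free action.

```python
class RefTracker:
    """Toy model of per-thread reference tracking."""

    def __init__(self):
        self.held = {}    # thread id -> objects acquired by that thread
        self.counts = {}  # object -> current reference count
        self.errors = []

    def acquire(self, thread_id, obj):
        self.counts[obj] = self.counts.get(obj, 0) + 1
        self.held.setdefault(thread_id, []).append(obj)

    def release(self, thread_id, obj):
        owned = self.held.get(thread_id, [])
        if obj not in owned:
            # The object is not in the thread's list: report corruption.
            self.errors.append(f"memory corruption: {obj} not held by thread {thread_id}")
            return
        owned.remove(obj)
        self.counts[obj] -= 1
        if self.counts[obj] == 0:
            del self.counts[obj]  # count reached zero: memory can be freed

    def thread_exit(self, thread_id):
        # Anything still in the list at thread exit is a leak.
        for obj in self.held.pop(thread_id, []):
            self.errors.append(f"memory leak: unreleased reference counted object: {obj}")


tracker = RefTracker()
tracker.acquire(1, "target")
tracker.acquire(1, "metric")
tracker.release(1, "target")
tracker.thread_exit(1)
print(tracker.errors)
```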

The reference counter dump contains the following details:

Thread ID
Struct ID
Object Address
Current Count
Action (acquire or release)
Timestamp
Call Stack

Caller stack dumps are disabled by default. To enable them, you must specify a non-zero value for the refCntCallStackDepth property in the emd.properties file. For example, refCntCallStackDepth=5.

Example A-1 Sample emagent.trc file

2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory leak: unreleased reference counted object: 84b25e0, structID: 11001
2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory leak: unreleased reference counted object: 8642a08, structID: 11002
2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory leak: unreleased reference counted object: 8585f18, structID: 11002
2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory leak: unreleased reference counted object: 8557de8, structID: 11002
2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory leak: unreleased reference counted object: 856d4d8, structID: 11002
2008-12-16 11:37:48,615 Thread-2718141344 WARN  diagnostics.statemgr: /ade/aime1_dadvfb0451_ag/oracle/sysman/dump/emagent_2938_20081216113748.diagtrc dumped
2008-12-16 11:37:48,693 Thread-2718141344 WARN  diagnostics.statemgr: size: 24615,/ade/aime1_dadvfb0451_ag/oracle/sysman/dump/emagent_2938_20081216113748.diagtrc
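When a trace file contains many such messages, counting them per structID shows which kind of object is leaking. The following Python sketch assumes only the message format shown in this sample. The function name is hypothetical, and the sketch tolerates messages that wrap across lines or contain repeated spaces.

```python
import re
from collections import Counter

LEAK = re.compile(
    r"Memory\s+leak: unreleased reference counted object: (\w+), structID: (\d+)")


def leaks_by_struct(trace_text):
    """Count leak messages per structID in emagent.trc-style text."""
    normalized = " ".join(trace_text.split())  # rejoin wrapped lines, collapse spaces
    return Counter(int(struct_id) for _, struct_id in LEAK.findall(normalized))


sample = (
    "2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory  leak: "
    "unreleased reference counted object: 84b25e0, structID: 11001\n"
    "2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory\n"
    "leak: unreleased reference counted object: 856d4d8, structID: 11002\n"
    "2008-12-16 11:37:48,614 Thread-2718141344 ERROR ThreadPool: nmttp_run: Memory leak: "
    "unreleased reference counted object: 8557de8, structID: 11002\n"
)
print(leaks_by_struct(sample).most_common())
```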

Example A-2 Reference Count Dump

EMAGENT Ver:10.2.0.5.0
NON-FATAL DUMP
Memory leak on reference counted objects

-------------------- Binary Stack Dump ---------------------

========== FRAME [6] (__clone()+94 -> 00000000) ==========
defined by frame pointers 0x0  and 0xa20384c8
CALL TYPE: call   ERROR SIGNALED: no   CALLER: __clone

----- Argument/Register Address Dump -----
Argument/Register addr=0xb70ebcb0.
Dump of memory from 0xb70ebc70 to 0xb70ebdb0
B70EBC70 74746D6E 75725F70 6E65206E 00726574  [nmttp_run enter.]
B70EBC80 74746D6E 75725F70 7865206E 203A7469  [nmttp_run exit: ]
B70EBC90 74737953 253D6D65 57202C64 656B726F  [System=%d, Worke]
B70EBCA0 64253D72 6F54202C 3D6C6174 00006425  [r=%d, Total=%d..]
B70EBCB0 6F6D654D 6C207972 206B6165 72206E6F  [Memory leak on r]
B70EBCC0 72656665 65636E65 756F6320 6465746E  [eference counted]
B70EBCD0 6A626F20 73746365 00000000 74746D6E  [ objects....nmtt]
B70EBCE0 3A632E70 746E4520 6E697265 6D6E2067  [p.c: Entering nm]
B70EBCF0 5F707474 546E7572 526B7361 79727465  [ttp_runTaskRetry]
B70EBD00 00000000 74746D6E 75725F70 7361546E  [....nmttp_runTas]
B70EBD20 64253D74 00000000 74746D6E 3A632E70  [t=%d....nmttp.c:]
B70EBD30 746E4520 6E697265 6D6E2067 5F707474  [ Entering nmttp_]
B70EBD40 546E7572 006B7361 546E7572 206B7361  [runTask.runTask ]
B70EBD50 61657263 61206574 72687420 3A646165  [create a thread:]
B70EBD60 73795320 3D6D6574 202C6425 6B726F57  [ System=%d, Work]
B70EBD70 253D7265 54202C64 6C61746F 0064253D  [er=%d, Total=%d.]
B70EBD80 68676948 74615720 614D7265 69206B72  [High WaterMark i]
B70EBD90 203A2073 00006425 00000000 6C756F43  [s : %d......Coul]
B70EBDA0 6F6E2064 72632074 65746165 72687420  [d not create thr]

Argument/Register addr=0xa2038ba0.
Dump of memory from 0xa2038b60 to 0xa2038ca0
A2038B60 00000000 00000000 00000000 00000000  [................]
A2038B70 00000000 00000000 003E6380 A2038E00  [.........c>....¡é]
A2038B80 00000000 B62A6F78 B62A7378 B62A7978  [....xo*?xs*?xy*?]
A2038B90 003E7820 00000000 00000000 00000000  [ x>.............]
A2038BA0 A2038BA0 08346750 A2038BA0 00000001  [?..¡éPg4.?..¡é....]
A2038BB0 002997A0 00000000 00000000 00000000  [?.).............]
A2038BC0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
 
A2038BE0 B5493BE0 00519110 000018C9 00000B7A  [¨¤;I¦Ì..Q.¨¦...z...]
A2038BF0 00000000 A2038400 00000000 00000001  [.......¡é........]
A2038C00 085598F0 00000001 08516510 00000001  [e.U......eQ.....]
A2038C10 08559C78 00000000 00000000 00000001  [x.U.............]
A2038C20 00000000 00000000 00000000 00000000  [................]
A2038C30 00000000 00000001 00000000 00000001  [................]
A2038C40 00000000 00000000 00000000 00000000  [................]
  Repeat 5 times
{
thread ID = 0xa2038108 "2718141344"
struct ID = 11002
address = 0x8642a08
count = 3
activity ID = 16
time (centi-seconds ago) = 22
nmecci_addReferenceCount()+74
nmectc_getCollectionItem()+145
nmemdisp_GetActiveCollection()+317
nmemdisp_Dispatcher_main()+3276
nmehl_processIncomingRequest()+1028
}nmeuca_CircularArray

Note:

You can also use the following emctl command to view the current reference counter history:

emctl dumpstate agent refcnt

Tracking CPU Thread Times

Monitoring and managing a target involves collecting metric data and executing jobs that perform operations on the target. The CPU usage can be high under the following conditions:

A large number of targets are being monitored
A CPU intensive metric is being collected too often
A job or a remote operation is not working correctly

By tracking the CPU usage per thread, you can identify the threads with high CPU usage and also identify the activities that are contributing to the high usage.

Any user operation consists of one or more activities (metric collection, jobs) whose CPU usage is tracked by the Agent. The Agent tracks CPU usage per thread at specified intervals, and you can generate a dump file containing this information. The tracked threads include both persistent and transient threads.

Persistent threads are available throughout the lifetime of the Agent. Scheduling and HTTP Listener activity are examples of activities that run on persistent threads. Transient threads are started to perform a specific activity and are terminated when that activity completes. Metric collection and user-initiated operations, such as reload, are examples of activities that run on transient threads.

You can configure the tracking and history intervals by adding the following properties to the emd.properties file.

CPUHistoryTrackInterval: The period for which the CPU usage history should be maintained. The default minimum period is 24 hours.
DisableCPUUsageTracking: Specifies whether the Agent tracks the CPU usage of activities. This flag is set to False by default, which means tracking is enabled. When tracking is enabled, the CPU usage history is stored in the $AGENTSTATE/sysman/emd/cputrack/emagent_<pid>_<timestamp>_cpudiag.trc file. The dump file contains the following details:
- The time interval for this report.
- The CPU time the Agent process consumed during this interval.
- The distribution of this Agent Process CPU time among agent threads.
- Top targets with high CPU usage and the top 10 metric collections for each of these targets.
- Top metric collections that have the highest CPU usage across all targets and a breakdown by target for each of these metrics.
- Top 10 target wide activities.
Note:
You can also use the following emctl command to generate a dump file:
emctl status agent cpu

Example A-3 Sample cpudiag.trc File

Summary
Interval=2009-01-14 03:07:49 - 2009-01-14 04:08:11
Process=59.872 seconds
HttpListener=0.41%
Scheduler=1.22%
HealthMonitor=0.14%
CollectionThreads=66.35%
DispatcherThreads=3.54%
JobThreads=0.00%
RecvletThreads=1.09%

                              <------Current------>  <-------Lifetime------->
Identifier    CTime  ETime   Num   CTime  ETime    Num  LastStartTime  LastEndTime        

Top 10 Persistent Threads
-Scheduler       0.733 3583.050  59  8.842  43773.370  721   2009-01-14  04:07:23
-HttpListener    0.243 3567.130  57  2.818  43787.850  700   2009-01-14  04:07:15
-HealthMonitor   0.082 3624.190  59  1.005  43837.380  707   2009-01-14  04:08:04

Top 10 Targets
oracle_database:database3 4.853    68.250
-health_check   0.614  2.030    241  7.263     66.890  2898  2009-01-14  04:08:03 2009-01-14 04:08:03
-sga_start      0.488  7.050    13   5.339     77.410  145   2009-01-14  04:08:02 2009-01-14 04:08:03
-Response       0.455  8.380    12   5.375    108.580  145   2009-01-14  04:03:33 2009-01-14  04:03:34
-propagation_msgstate_stats 0.426 8.460 12  5.011  102.230 145 2009-01-14 04:06:02  2009-01-14  04:06:03
-apply_queue_persq          0.422 8.440 12  4.811  102.980 145 2009-01-14 04:06:03 2009-01-14  04:06:03
-textIndexStats             0.403 1.820 61  4.349   22.850 725 2009-01-14 04:08:02 2009-01-14  04:08:02
-dumpFull                   0.193 8.710  5  1.819   87.720  49 2009-01-14 04:07:55 2009-01-14  04:07:57
-archFull                   0.172 6.480  5  1.704   63.460  49 2009-01-14 04:07:57 2009-01-14  04:07:58
-alertLog                   0.153 1.390  4  1.840   17.340  49 2009-01-14 04:05:28 2009-01-14  04:05:28
-wait_sess_cls              0.151 0.260  4  1.848    3.100  48 2009-01-14 03:59:19 2009-01-14  03:59:19

Top 10 Metrics
oracle_database:health_check  2.474  8.650
-oracle_database:database2       0.633  2.180  241  7.137  43.570  2900 2009-01-14 04:07:56  2009-01-14  04:07:56
-oracle_database:database        0.621  2.360  242  7.276  56.560  2902 2009-01-14 04:08:08  2009-01-14  04:08:08
-oracle_database:database3       0.614  2.030  241  7.263  66.890  2898 2009-01-14 04:08:03  2009-01-14  04:08:03
-oracle_database:Oemrep_Database 0.607  2.080  241  7.193  49.010  2921 2009-01-14 04:07:57  2009-01-14  04:07:57

Top 10 Target wide activities
-Upload                    2.102 4.020 292 33.501 208.740 3629 2009-01-14 04:08:01 2009-01-14  04:08:01
-AQRecvlet                 0.655 14526.670 242  12.330 174670.980 2916 2009-01-13 16:02:04
-GetCompositeMembersReq    0.016  0.010 12  0.193  0.190  147 2009-01-14 04:05:06 2009-01-14 04:05:06
-RefreshReq                0.000  0.000  0  1.809 59.280   8   
-RemoveTargetReq           0.000  0.000  0  0.694  6.030   12
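
The Summary block of the sample file can be read programmatically to see which thread group accounts for most of the Agent CPU time. The following is a minimal sketch in Python, assuming the Summary block always uses the key=value layout shown in Example A-3. The function name parse_cpudiag_summary and the SAMPLE constant are hypothetical and are not part of Enterprise Manager.

```python
# Hypothetical helper: parses only the "Summary" block of a cpudiag.trc file,
# assuming the key=value layout shown in Example A-3.
SAMPLE = """\
Summary
Interval=2009-01-14 03:07:49 - 2009-01-14 04:08:11
Process=59.872 seconds
HttpListener=0.41%
Scheduler=1.22%
HealthMonitor=0.14%
CollectionThreads=66.35%
DispatcherThreads=3.54%
JobThreads=0.00%
RecvletThreads=1.09%

Top 10 Persistent Threads
"""


def parse_cpudiag_summary(text):
    """Return (process_seconds, {thread_group: percent}) for the Summary block."""
    process_seconds = None
    shares = {}
    in_summary = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_summary:
            # Skip everything up to and including the "Summary" heading.
            in_summary = (line == "Summary")
            continue
        if "=" not in line:
            # The first line that is not key=value ends the Summary block.
            break
        key, _, value = line.partition("=")
        if key == "Process":
            # For example: "Process=59.872 seconds"
            process_seconds = float(value.split()[0])
        elif value.endswith("%"):
            # For example: "CollectionThreads=66.35%"
            shares[key] = float(value[:-1])
    return process_seconds, shares


process_seconds, shares = parse_cpudiag_summary(SAMPLE)
busiest = max(shares, key=shares.get)
print(f"Agent CPU time in interval: {process_seconds} seconds")
print(f"Largest share: {busiest} ({shares[busiest]}%)")
```

For the sample file, the largest share belongs to CollectionThreads, which suggests looking next at the Top 10 Targets and Top 10 Metrics sections to find the metric collections responsible.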

Platforms Supported

This feature is available on platforms that support thread CPU usage, which include AIX, Linux (with newer kernels), and Solaris 10. On platforms that do not support thread CPU usage, only elapsed times are reported.

Management Agent Crashes When Target Type is "WMQ" in targets.xml File

If the Management Agent crashes when the target type is WMQ in the targets.xml file, then follow the workaround described in My Oracle Support note 419933.1.

Network Issues

This section lists network issues you may encounter during Enterprise Manager installation and configuration.

Incorrect Format For Entries In /etc/hosts File

Incorrectly formatted entries in the /etc/hosts file cause the installation to hang and OUI-25031 or OUI-10104 errors to be recorded in the log files.

Entries in the /etc/hosts file should be in the following format:

IP_Address Canonical_Hostname Aliases

For example:

11.22.33.44 abc.xyz.com abc1 xyz2

When creating the /etc/hosts file, follow these rules:

A host name may contain only alphanumeric characters, hyphens, and periods. It must begin with an alphabetic character and end with an alphanumeric character.
Lines cannot start with a blank or tab character.
Fields can have any number of blanks or tab characters separating them.
Comments are allowed and designated by a pound sign (#) preceding the comment text.
Trailing blank and tab characters are allowed.
Blank line entries are allowed.
Only one host entry per line is allowed.
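
The rules above can be checked mechanically before you start the installer. The following is a minimal sketch in Python, assuming one line of the hosts file is passed in at a time. The function name check_hosts_line and the sample addresses are hypothetical. The sketch does not attempt to detect more than one host entry on a line.

```python
import ipaddress
import re

# A host name starts with a letter, ends with a letter or digit, and contains
# only letters, digits, hyphens, and periods in between.
HOSTNAME_RE = re.compile(r"^[A-Za-z]([A-Za-z0-9.-]*[A-Za-z0-9])?$")


def check_hosts_line(line):
    """Return a list of problems found in one hosts-file line (empty if none)."""
    if line.strip() == "":
        return []  # Blank lines are allowed.
    if line[0] in " \t":
        return ["line starts with a blank or tab"]
    # Drop any comment, then trailing blanks and tabs, which are allowed.
    entry = line.split("#", 1)[0].strip()
    if not entry:
        return []  # Comment-only line.
    fields = entry.split()  # Any number of blanks or tabs may separate fields.
    problems = []
    try:
        ipaddress.ip_address(fields[0])
    except ValueError:
        problems.append(f"invalid IP address: {fields[0]}")
    if len(fields) < 2:
        problems.append("missing canonical host name")
    for name in fields[1:]:
        if not HOSTNAME_RE.match(name):
            problems.append(f"invalid host name: {name}")
    return problems


# Illustrative input only; the addresses and names are made up.
for sample in ["192.0.2.10 abc.xyz.com abc1 xyz2   # app server",
               "\t192.0.2.10 abc.xyz.com",
               "192.0.2.10 abc_1.xyz.com"]:
    print(repr(sample), "->", check_hosts_line(sample) or "OK")
```

An empty result means only that the line is well formed. It does not confirm that the address actually belongs to the host.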

A forward lookup finds the IP address for a given host name. A reverse lookup finds the host name for a given IP address. The results of forward and reverse lookups should be consistent with each other. When they differ, the cause is usually a difference in case (upper or lower) between host names and aliases.
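
The comparison of forward and reverse lookups can be sketched in Python as follows. This is an illustration only, assuming the standard socket.gethostbyname and socket.gethostbyaddr calls are acceptable resolvers for your environment. The function name lookup_mismatch and the sample host name and address are hypothetical. The demonstration uses stub resolvers so that it runs without network access.

```python
import socket


def lookup_mismatch(hostname,
                    forward=socket.gethostbyname,
                    reverse=socket.gethostbyaddr):
    """Return None if forward and reverse lookups agree, else a description."""
    ip = forward(hostname)           # Forward lookup: host name -> IP address.
    reverse_name = reverse(ip)[0]    # Reverse lookup: IP address -> host name.
    if reverse_name == hostname:
        return None
    if reverse_name.lower() == hostname.lower():
        return f"case difference only: {hostname!r} vs {reverse_name!r}"
    return f"different names: {hostname!r} vs {reverse_name!r}"


# Stub resolvers with made-up data, so the example does not depend on DNS.
def fake_forward(name):
    return "192.0.2.10"


def fake_reverse(ip):
    return ("abc.xyz.com", [], [ip])


print(lookup_mismatch("abc.xyz.com", fake_forward, fake_reverse))
print(lookup_mismatch("ABC.xyz.com", fake_forward, fake_reverse))
```

Calling lookup_mismatch with only a host name uses the real resolver of the machine on which it runs.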

If a DNS server is configured in your environment, then ensure that the OMS host name can be resolved through DNS. For more information, contact your network administrator.

For 10.2.0.1 Enterprise Manager installations, if a host name contains an uppercase letter, securing the Agent fails.

Enterprise Manager Installation on Computers With Multiple Addresses

When you install Enterprise Manager or related components on a multi-homed (multi-IP) machine, that is, a machine with multiple IP addresses, the host name is derived from the ORACLE_HOSTNAME variable that is passed along with -local when you invoke runInstaller.

For example, runInstaller ORACLE_HOSTNAME=foo.us.oracle.com -local

Agent Configuration Fails on A Non-Network Computer

To resolve this error, ensure that both the Oracle Management Service host and the target host where the Agent is to be installed respond to ping.

Loopback Adapter On Windows and Related Known Issues

If you install Enterprise Manager or related components on a DHCP host, you must install a loopback adapter to assign a local IP address to that computer.

Note:

Refer to section 2.4.5 Installing a Loopback Adapter of the Oracle® Database Installation Guide 10g Release 2 (10.2) for Microsoft Windows (32-Bit) Part Number B14316-02 for more information.

Ensure that the following conditions are met:

The /etc/hosts file should contain the following entry:

<loopback IP Address> <hostname.domainname> <hostname>

For example:

127.0.0.1 localhost.localdomain localhost

Ensure that the IP address specified in /etc/hosts is correct; otherwise, the allocation of ports will fail.

Installation Fails with Host Name Error on the Product-Specific Prerequisite Check Page

To resolve this issue, first run the following command on the host to list all the IP addresses configured and then note the IP address for that host:

ipconfig /all

After noting the IP address of that host, open the hosts file and check the entries for this host name. Ensure that the host name maps to the correct IP address of that host.

Other Installation and Configuration Issues

This section lists some of the generic errors that you may encounter during Enterprise Manager installation and configuration.

Storage Data Has Metric Collection Errors

The following Enterprise Manager collection error message may appear from agents installed through silent or agentdownload install mechanisms:

snmhsutl.c:executable nmhs should have root suid enabled.

Perform the required root install actions (using the root.sh script, on UNIX platforms only) to resolve this issue. It may take up to 24 hours before the resolution is reflected.

Cannot Add Systems to Grid Environment from the Grid Control Console

You cannot add new targets to your grid environment if you do not have an agent already installed.

To install the agent from your Grid Control console:

Log in to the Grid Control console and go to the Deployments page.
Click Install Agent under the Agent Installation section.
In the Agent Deploy home page that appears, select the installation option that you want to perform. See Chapter 10, "Deploying Management Agent" for more information.

Error During Deinstallation of Grid Control Targets

After deinstalling certain Grid Control targets, when you try to remove the same targets from the Grid Control console, you may encounter an exception with a message similar to the following:

java.sql.SQLException: ORA-20242: Target <target name> is monitoring other targets. It cannot be deleted.

To resolve this issue, deinstall the Grid Control targets and wait for at least 15 minutes before you attempt to remove the targets from the Grid Control console using the Hosts page. This time is required for the deinstallation information to propagate to the Management Repository.

Discovery Issues

This section lists the discovery-related issues:

Unable to Discover Targets Deployed on Hosts

For discovering any target on a host, you must have a Management Agent running on that host. However, despite having a Management Agent, you may not be able to discover the targets in Enterprise Manager Grid Control.

A possible cause is that the entries in the listener.ora file for that host are incorrect. To resolve this issue, ensure that the entries in the listener.ora file use canonical names.

For example, the host name may be sjohn-sun1. However, its fully qualified name may be sjohn-sun1.server.com. For the discovery to happen successfully, you must ensure that the listener.ora file for that host contains the fully qualified name entry, that is, sjohn-sun1.server.com, and not sjohn-sun1.
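
A quick way to spot short host names in a listener.ora file is to scan its (HOST = ...) entries. The following is a minimal sketch in Python, assuming that any host value without a period is not fully qualified. That assumption is a heuristic, not an Oracle rule. The function name short_host_entries and the sample text are hypothetical.

```python
import re

# Matches entries such as "(HOST = sjohn-sun1)" and captures the host value.
HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def short_host_entries(listener_ora_text):
    """Return HOST values that contain no period (assumed not fully qualified)."""
    return [host for host in HOST_RE.findall(listener_ora_text)
            if "." not in host]


# Illustrative listener.ora fragment; the port numbers are made up.
sample = """
(ADDRESS = (PROTOCOL = TCP)(HOST = sjohn-sun1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = sjohn-sun1.server.com)(PORT = 1522))
"""
print(short_host_entries(sample))
```

Each value that the function returns is a candidate for replacement with the fully qualified name of the host.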

Unable to View Metric Details of Targets

After you discover a target and add it to Enterprise Manager Grid Control, you may be able to view the status of that target, but you may not be able to view metric details. This is because the target is not fully configured and therefore, the Agent monitoring that target is unable to compute its dynamic properties or evaluate its metrics.

To circumvent this issue, after you discover the target, provide the password in the Monitoring Configuration page of the target. To access the Monitoring Configuration page, go to the Home page of the target and from the Related Links section, click Monitoring Configuration.

Need More Help

If this appendix does not solve the problem you encountered, try these other sources:

Oracle Enterprise Manager Release Notes, available on the Oracle Technology Network Web site (http://www.oracle.com/technology/documentation).
OracleMetaLink (http://metalink.oracle.com/)

If you do not find a solution for your problem, log a service request.