10 Troubleshoot Oracle Management Cloud Agents

This topic covers the typical issues and their resolutions related to installing and working with Oracle Management Cloud agents.

Oracle Management Cloud Agent Connectivity Issues

If you encounter connectivity issues between the Oracle Management Cloud agents and Oracle Management Cloud, then you can run the following command after installation to check for them:

<AGENT_BASE_DIRECTORY>/agent_inst/bin/omcli status agent connectivity

The command displays the list of connectivity issues, if any, at the agent.

Running the connectivity command without the -verbose flag

If you run this command without the -verbose flag, then the first line of output identifies the type of agent and whether or not it’s running. It also reports whether there are any connectivity issues. If there are, the subsequent lines list all known communication issues identified for this type of agent given its current availability status. The issues are organized in three columns: Symptom, Cause, and Observed.

Symptom: The symptom of the issue. For example, one symptom is "The agent is unable to start."

Cause: The most likely cause of the issue. For example, one cause is "The agent is not registered."

Observed: The earliest time the agent made the observation.
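
The three-column layout described above can be read programmatically. The following is a minimal sketch, not part of the product: it assumes the output looks like Example 4 later in this topic, where a ruler line of dashes sits under the column headers and every cell fits within its ruler width. The function name parse_issues and the SAMPLE text are illustrative.

```python
import re

# Sample modeled on Example 4 in this topic; built with explicit widths so
# the header, ruler, and row line up exactly.
SAMPLE = "\n".join([
    "Connectivity Issues:",
    "Symptom".ljust(28) + "Cause".ljust(34) + "Observed",
    "-" * 27 + " " + "-" * 33 + " " + "-" * 19,
    "Agent unable to communicate Agent unable to connect to server 2019-01-29 04:33:25",
    "---[OMC Ping Hop]---------------------[Time]-[Details]------------------------",
])


def parse_issues(output):
    """Return one dict per issue row, keyed by the column headers."""
    lines = output.splitlines()
    issues = []
    for i, line in enumerate(lines):
        # The ruler is several runs of dashes separated by single spaces.
        if i > 0 and re.fullmatch(r"-+( -+)+\s*", line):
            spans = [m.span() for m in re.finditer(r"-+", line)]
            headers = [lines[i - 1][a:b].strip() for a, b in spans]
            for row in lines[i + 1:]:
                # Stop at a blank line or at the ping hop section.
                if not row.strip() or row.startswith("---["):
                    break
                cells = [row[a:b].strip() for a, b in spans[:-1]]
                # The last column runs to the end of the line.
                cells.append(row[spans[-1][0]:].strip())
                issues.append(dict(zip(headers, cells)))
            break
    return issues


if __name__ == "__main__":
    for issue in parse_issues(SAMPLE):
        print(issue)
```

Output in which a cell is wider than its ruler (as in Example 5) would need a more tolerant parser than this sketch.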

Example 1: Cloud Agent communicates to Oracle Management Cloud successfully. Cloud Agent has no connectivity issues.

$ ./omcli status agent connectivity 
Oracle Management Cloud Agent
Copyright (c) 1996, 2019 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Cloud Agent is Running

No identifiable connectivity issues found between Cloud Agent and Management Cloud at 
https://d123.us2.oraclecloud.com/. 
---[OMC Ping Hop]--------------------[Time]-[Details]---------------------------
OMC d123.us2.oraclecloud.com   71ms   HTTP 200 OK 
--------------------------------------------------------------------------------

Example 2: Gateway communicates to Oracle Management Cloud successfully. Gateway has no connectivity issues.


$ ./omcli status agent connectivity
Oracle Management Cloud Gateway
Copyright (c) 1996, 2019 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Gateway is Running 

No identifiable connectivity issues found between Gateway and Management Cloud at 
https://d123.us2.oraclecloud.com/. 
---[OMC Ping Hop]--------------------[Time]-[Details]---------------------------    
OMC d123.us2.oraclecloud.com   66ms   HTTP 200 OK
--------------------------------------------------------------------------------

Example 3: Cloud Agent communicates to Oracle Management Cloud through a Gateway. Cloud Agent has no connectivity issues.


$ ./omcli status agent connectivity
Oracle Management Cloud Agent
Copyright (c) 1996, 2019 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Cloud Agent is Running 

No identifiable connectivity issues found between Cloud Agent and Gateway.

---[OMC Ping Hop]------------------------[Time]-[Details]-----------------------    
Gateway emcc.example.com:4459   13ms   HTTP 200 OK        
OMC d123.us2.oraclecloud.com   90ms   HTTP 200 OK
--------------------------------------------------------------------------------
Check the connectivity status of the Gateway: emcc.example.com 

Example 4: Cloud Agent communicates to Oracle Management Cloud through a Gateway. Cloud Agent has connectivity issues: It is up and running, but it can't connect to the Gateway.


$ ./omcli status agent connectivity
Oracle Management Cloud Agent
Copyright (c) 1996, 2019 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Cloud Agent is Running

Connectivity Issues:
Symptom                     Cause                             Observed          
--------------------------- --------------------------------- -------------------
Agent unable to communicate Agent unable to connect to server 2019-01-29 04:33:25 
---[OMC Ping Hop]---------------------[Time]-[Details]------------------------    
Gateway emcc.example.com:4459         1ms    java.net.ConnectException:Connection refused 
(Connection refused)
--------------------------------------------------------------------------------
The Gateway is unavailable: emcc.example.com

Example 5: Cloud Agent communicates to Oracle Management Cloud through a Proxy Server. Cloud Agent has connectivity issues: It is up and running, but it can't connect to the Proxy Server.


$ ./omcli status agent connectivity
Oracle Management Cloud Agent
Copyright (c) 1996, 2018 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Cloud Agent is Running 

Connectivity Issues:
Symptom                     Cause                          Observed
--------------------------- ------------------------------ -------------------
Agent unable to communicate The proxy host is not reachable 2019-01-29 16:09:39 
---[OMC Ping Hop]-------------------------------[Time]-[Details]-----   
OMC 6e8c5404a19b42cd82d.us2.oraclecloud.com:-1  1ms java.net.UnknownHostException:
NOproxy.example.com
--------------------------------------------------------------------------------
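
The ping hop section in the examples above lists one hop per line: the hop type, the target, the round-trip time, and a detail string. As a rough sketch (the regular expression and the names parse_hops and HOPS_SAMPLE are assumptions based only on the sample output shown here), the hops can be extracted like this:

```python
import re

# Sample modeled on Example 3 in this topic.
HOPS_SAMPLE = """\
Cloud Agent is Running
---[OMC Ping Hop]------------------------[Time]-[Details]-----------------------
Gateway emcc.example.com:4459   13ms   HTTP 200 OK
OMC d123.us2.oraclecloud.com   90ms   HTTP 200 OK
--------------------------------------------------------------------------------
"""

# hop type, target, time in milliseconds, then free-form details
HOP_RE = re.compile(
    r"^(?P<hop>\S+)\s+(?P<target>\S+)\s+(?P<ms>\d+)ms\s+(?P<details>.*\S)\s*$"
)


def parse_hops(output):
    """Return a list of hop dicts found between the ping hop header and the closing ruler."""
    hops = []
    in_section = False
    for line in output.splitlines():
        if line.startswith("---[OMC Ping Hop]"):
            in_section = True
            continue
        if in_section and line.startswith("---"):
            break  # closing ruler ends the section
        if in_section:
            match = HOP_RE.match(line)
            if match:  # wrapped continuation lines do not match and are skipped
                hops.append({
                    "hop": match.group("hop"),
                    "target": match.group("target"),
                    "ms": int(match.group("ms")),
                    "ok": match.group("details").startswith("HTTP 200"),
                })
    return hops


if __name__ == "__main__":
    for hop in parse_hops(HOPS_SAMPLE):
        print(hop)
```

A hop whose details are an exception message, as in Examples 4 and 5, is reported with ok set to False.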

Running the connectivity command with the -verbose flag

To see more information, add the -verbose flag to the command. This adds the following columns:

Category: The type of issue.

Confidence: An internal measure of confidence. This can be thought of as a percentage, where 100.0 is certainty and 0.0 is no idea. The confidence measure is based on empirical data and code knowledge.

Detail: A message that can give more insight into the occurrence. This may be an exception message or something else.

Understanding the Symptom and the Cause

Cloud Agent Issues

Symptom and Cause / Description

AGENT_UNABLE_TO_START due to AGENT_NOT_REGISTERED

The Registration Key and Tenant ID are provided to the agent installer during installation. Oracle Management Cloud identifies an agent using an entity ID/MEID; this is based on the host and port on which the agent is installed.

The agent can’t start without being registered with Oracle Management Cloud. The entity ID given during registration is what identifies this agent to the cloud. That entity ID is missing.

The Agent Config phase runs the agent registration. If this fails (which can happen, for example, if there are required proxy settings for the agent that aren’t supplied during the installation), then the agent won’t be registered and won’t be able to start. Review the installation logs and check the proxy settings.

AGENT_UNABLE_TO_COMMUNICATE due to SERVER_UNAVAILABLE

In this case, the agent can’t connect to the server. If the cloud agent is configured for a gateway and it's unable to connect to that gateway, then the cause could be a network problem or a gateway that isn't running. You can run the connectivity command at the gateway to find out more.

AGENT_UNABLE_TO_COMMUNICATE due to AGENT_CERTIFICATE_MISMATCH

The agent can’t communicate with its gateways or Oracle Management Cloud because of certificate mismatch failures.

Possible causes are that a third-party certificate at the proxy changed, that the certificates failed to download, or that the certificates changed at Oracle Management Cloud.

Gateway Issues

Symptom and Cause / Description

AGENT_UNABLE_TO_START due to AGENT_NOT_REGISTERED

The Registration Key and Tenant ID are provided to the agent installer during installation. Oracle Management Cloud identifies an agent using an entity ID/MEID; this is based on the host and port on which the agent is installed.

The agent can’t start without being registered with Oracle Management Cloud. The entity ID given during registration is what identifies this agent to the cloud. That entity ID is missing.

The Agent Config phase runs the agent registration. If this fails (which can happen, for example, if there are required proxy settings for the agent that aren’t supplied during the installation), then the agent won’t be registered and won’t be able to start. Review the installation logs and check the proxy settings.

AGENT_UNABLE_TO_COMMUNICATE due to AGENT_CERTIFICATE_MISMATCH

The certificate the gateway uses to trust Oracle Management Cloud is no longer correct.

  • AGENT_UNABLE_TO_UPLOAD due to DATARECEIVER_SERVICE_MISSING

  • AGENT_UNABLE_TO_DISPATCH due to DATARECEIVER_SERVICE_MISSING

  • AGENT_UNABLE_TO_DISPATCH due to WORKDEPOT_SERVICE_MISSING

There is an issue with Oracle Management Cloud. File a service request.
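
The symptom and cause pairs above lend themselves to a simple lookup. The sketch below is illustrative only: the identifiers come from this topic, the advice strings paraphrase the descriptions above, and the function name advise and the fallback message are assumptions.

```python
# Advice keyed by (symptom, cause), paraphrased from the descriptions in this topic.
ADVICE = {
    ("AGENT_UNABLE_TO_START", "AGENT_NOT_REGISTERED"):
        "Review the installation logs and check the proxy settings.",
    ("AGENT_UNABLE_TO_COMMUNICATE", "SERVER_UNAVAILABLE"):
        "Check the network and run the connectivity command at the gateway.",
    ("AGENT_UNABLE_TO_COMMUNICATE", "AGENT_CERTIFICATE_MISMATCH"):
        "Check whether certificates changed at the proxy or at Oracle "
        "Management Cloud, or failed to download.",
    ("AGENT_UNABLE_TO_UPLOAD", "DATARECEIVER_SERVICE_MISSING"):
        "File a service request.",
    ("AGENT_UNABLE_TO_DISPATCH", "DATARECEIVER_SERVICE_MISSING"):
        "File a service request.",
    ("AGENT_UNABLE_TO_DISPATCH", "WORKDEPOT_SERVICE_MISSING"):
        "File a service request.",
}


def advise(issue):
    """Map a '<SYMPTOM> due to <CAUSE>' string to the matching advice."""
    symptom, _, cause = issue.partition(" due to ")
    return ADVICE.get(
        (symptom.strip(), cause.strip()),
        "Not covered in this topic.",
    )


if __name__ == "__main__":
    print(advise("AGENT_UNABLE_TO_START due to AGENT_NOT_REGISTERED"))
```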

Cloud Agent Installation Fails Due to Insufficient ulimit

Sometimes, cloud agent installation fails with an OMCAGNT-2101 error.

This error is caused by an insufficient ulimit value. A cloud agent installation requires a ulimit value of at least 4000. However, a value of 100000 is recommended for uninterrupted service of the agent.

To set the ulimit value, run the following command:

ulimit -u 100000
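
Before installing, you may want to check the current limit against the two thresholds above. This is a minimal sketch under the assumption that the relevant limit is the one shown in the command (ulimit -u, the maximum number of user processes, which Python exposes on Unix as resource.RLIMIT_NPROC). The function names are illustrative.

```python
MINIMUM = 4000        # minimum stated in this topic
RECOMMENDED = 100000  # recommended value stated in this topic


def classify_limit(soft_limit, unlimited=-1):
    """Classify a soft limit as 'ok', 'below recommended', or 'insufficient'."""
    if soft_limit == unlimited or soft_limit >= RECOMMENDED:
        return "ok"
    if soft_limit >= MINIMUM:
        return "below recommended"
    return "insufficient"


def current_process_limit():
    """Read the soft limit that 'ulimit -u' reports (Unix only)."""
    import resource  # not available on Windows
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft, resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, infinity = current_process_limit()
    print(soft, classify_limit(soft, unlimited=infinity))
```

Note that a limit set with the ulimit command applies only to the current shell session and the processes started from it.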

Data Collector Installation Fails Due to Inaccessible Stage Directory

When you install a data collector, the installation may fail with the error “DataCollector Validation failed with status [1]”. This message makes it difficult to understand the real cause of the failure. In this case, you can set the following parameter in the response file:

IGNORE_DATA_COLLECTOR_VALIDATIONS=true

This helps you to identify the actual, underlying issues that are causing the installation to fail.

Data Collector Stopped Working After Upgrading to Enterprise Manager 13.x

If you deployed the data collector on Enterprise Manager 12c and then upgraded to Enterprise Manager 13.1 or a later release, then the data collector must be updated.

To update the data collector, follow these steps:

  1. Obtain the patch_harvester_after_em_upgrade.sql script, which is also available at $ORACLE_HOME/sysman/admin/scripts/emaas/harvester/patch_harvester_after_em_upgrade.sql

  2. Connect to the Oracle Management Repository database as a SYS user.

  3. Run the downloaded script using SYS credentials, and provide the data collector schema name as input.

    You can find the data collector schema name by using the following statement:

    SELECT owner
    FROM all_objects
    WHERE object_name = 'EMAAS_PERF_LOG'
    and object_type = 'TABLE';

    Note:

    If multiple rows are displayed in the output, then contact Oracle Support to find out the correct active schema.

  4. After the script is executed, verify that the data collector schema was edition enabled, as follows:

    SELECT EDITION_ENABLED
    FROM DBA_USERS
    WHERE username = upper('&DATA_COLLECTOR_SCHEMA_NAME');

    Y indicates that the edition was enabled. If the edition wasn’t enabled, then try running the script again. If you still see an issue, then contact Oracle Support.

  5. Verify that all the objects are valid in the data collector schema, as follows:

    SELECT object_name
    FROM all_objects
    WHERE owner = upper('&DATA_COLLECTOR_SCHEMA_NAME')
    AND status = 'INVALID';

    If any invalid objects are found, try recompiling them. If you still see an issue, then contact Oracle Support.

Host Name Issues

You may see the following error while deploying Oracle Management Cloud agents:

"Error: Unable to resolve the ORACLE_HOSTNAME/Computed hostname : <hostname>."

When you install the agent, if the system host name doesn’t resolve to a fully qualified domain name (FQDN) because you aren’t using DNS, then add the fully qualified domain name to the /etc/hosts file, and ensure that it maps to the correct host name and IP address of the host. Ensure that the local host is reachable and resolves to 127.0.0.1. The recommended format is as follows:

<ip> <fully_qualified_host_name> <short_host_name>

For example:

If your host name is myhost and your domain is example.com (IPv4):

172.16.0.0 myhost.example.com myhost

If your host name is myhost and your domain is example.com (IPv6):

aaaa::111:2222:3333:4444 myhost.example.com myhost

You can run the following commands to verify this. You should see the same host name and IP address displayed.

$ getent hosts `hostname`
$ host `hostname -f`

In the output, the fully qualified domain name must appear in the second field as specified in the /etc/hosts file.
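
The recommended /etc/hosts format can also be checked line by line. The following sketch is an illustration, not an Oracle tool: the function name check_hosts_entry is hypothetical, and it only tests that the fully qualified name is the second field and the short name follows it.

```python
def check_hosts_entry(line, short_name):
    """Return True if a hosts line follows '<ip> <fqdn> <short_host_name>'."""
    # Drop trailing comments, then split on whitespace.
    fields = line.split("#", 1)[0].split()
    if len(fields) < 3:
        return False
    _ip, fqdn, *aliases = fields
    # The second field must be the FQDN, and the short name must follow it.
    return fqdn.startswith(short_name + ".") and short_name in aliases


if __name__ == "__main__":
    print(check_hosts_entry("172.16.0.0 myhost.example.com myhost", "myhost"))
    print(check_hosts_entry("172.16.0.0 myhost myhost.example.com", "myhost"))
```

The second call shows the common mistake of listing the short name before the fully qualified name, which this check rejects.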

If you can ensure that short host names don’t have the same value on different hosts in your environment, then you can ignore the FQDN requirement by passing the following argument to the AgentInstall script: IGNORE_VALIDATIONS=true along with ORACLE_HOSTNAME=<host_name>

$ ./AgentInstall.sh IGNORE_VALIDATIONS=true ORACLE_HOSTNAME=<host_name>

Frequent Password Changes Affect Data Collection

If the Cloud Control Management Repository passwords are changed frequently, then this may affect data collection by the data collector. If the operating system password for the user that was used to install the Cloud Control Management Repository is changed, for example because of the security policies at your data center, then the data collector stops collecting performance and event data.

To update the data collector with the new password, follow these steps:

  1. Stop the data collector agent.

    <AGENT_BASE_DIRECTORY>/agent_inst/bin/omcli stop agent

  2. Run the following command on the host that’s running the data collector:

    <AGENT_BASE_DIRECTORY>/agent_inst/bin/omcli change_datacollector_host_pwd agent <new password>

  3. Restart the data collector agent.

    <AGENT_BASE_DIRECTORY>/agent_inst/bin/omcli start agent
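
If the password changes often, the three steps above could be scripted. The sketch below only builds the command lines from this topic; the function name and the /opt/omc base directory in the usage are hypothetical, and actually running the commands (for example with subprocess.run) is left to the caller.

```python
def password_change_commands(agent_base_dir, new_password):
    """Build the stop / change password / start sequence as argument lists."""
    omcli = f"{agent_base_dir}/agent_inst/bin/omcli"
    return [
        [omcli, "stop", "agent"],
        [omcli, "change_datacollector_host_pwd", "agent", new_password],
        [omcli, "start", "agent"],
    ]


if __name__ == "__main__":
    # /opt/omc is a placeholder for <AGENT_BASE_DIRECTORY>.
    for command in password_change_commands("/opt/omc", "<new password>"):
        print(" ".join(command))
        # To execute for real: subprocess.run(command, check=True)
```

Using argument lists rather than a single shell string avoids quoting problems when the password contains spaces or shell metacharacters.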

Debugging a Cloud Agent Installed Through a Gateway

When you install the cloud agent through a gateway, ensure that the gateway host name you provide is the same as the host name you specified when the gateway was installed. Any mismatch results in a failure, and a connectivity error is displayed.

For example: If you specified the host name as ORACLE_HOSTNAME=abc.xyz.com when you deployed the gateway, but you specify GATEWAY_HOST=abc when you install the cloud agent, then the cloud agent registration fails.

Error Example:

2017-02-10 11:29:39,308 [1:EE2D8594] DEBUG - Establishing connection to agent at https://abc.xyz.com:1846/emd/lifecycle/main/...
2017-02-10 11:29:39,321 [1:EE2D8594] DEBUG - setting user-interaction allowed to false
2017-02-10 11:29:39,329 [1:EE2D8594] INFO - Unable to connect to the agent at https://abc.xyz.com:1846/emd/lifecycle/main/ [Connection refused]
2017-02-10 11:29:39,694 [1:EE2D8594] DEBUG - Establishing connection to agent at https://abc.xyz.com:1846/emd/lifecycle/main/...
2017-02-10 11:29:39,694 [1:EE2D8594] DEBUG - setting user-interaction allowed to false
2017-02-10 11:29:39,696 [1:EE2D8594] INFO - Disconnecting: client terminus
2017-02-10 11:29:39,696 [1:EE2D8594] INFO - stderr: Status agent Failure:Unable to connect to the agent at https://abc.xyz.com:1846/emd/lifecycle/main/ [Connection refused]
2017-02-10 11:29:39,696 [1:EE2D8594] INFO - Exit Code: 1
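
When debugging this kind of failure, it can help to compare the two host names and to pull the failing URLs out of the installation log. The sketch below is a hedged illustration: the function names are hypothetical, the exact string comparison is an assumption about how strictly the names must match, and the patterns are based only on the log excerpt above.

```python
import re

# Two lines taken from the error example in this topic.
LOG_EXCERPT = (
    "2017-02-10 11:29:39,329 [1:EE2D8594] INFO - Unable to connect to the agent at "
    "https://abc.xyz.com:1846/emd/lifecycle/main/ [Connection refused]\n"
    "2017-02-10 11:29:39,696 [1:EE2D8594] INFO - Exit Code: 1\n"
)


def gateway_host_matches(oracle_hostname, gateway_host):
    """Compare the gateway's ORACLE_HOSTNAME with the agent's GATEWAY_HOST."""
    return oracle_hostname.strip() == gateway_host.strip()


def summarize_install_log(text):
    """Collect the URLs the installer could not reach and the exit code, if any."""
    urls = re.findall(r"Unable to connect to the agent at (https?://\S+)", text)
    exit_match = re.search(r"Exit Code: (\d+)", text)
    return {
        "unreachable": sorted(set(urls)),
        "exit_code": int(exit_match.group(1)) if exit_match else None,
    }


if __name__ == "__main__":
    print(gateway_host_matches("abc.xyz.com", "abc"))
    print(summarize_install_log(LOG_EXCERPT))
```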

Remove an Incompletely Installed Data Collector

You may need to remove the data collector if the installation is incomplete, or you may need to delete the data collector schema if the data collector home was manually deleted or if the host was decommissioned. In these cases, you must remove the data collector and clean up the leftover schema in the Enterprise Manager repository. Follow these steps to remove the data collector and the schema:

  1. On the Oracle Management Cloud home page, click the OMC Navigation icon on the top-left corner to view the Management Cloud navigation pane, if it isn’t already displayed.

  2. Select Agents under Administration.

  3. Click the Data Collectors menu option to go to the Data Collectors page. If the data collector to be removed still appears on this page, then select the data collector, right click the Actions menu, and then click Remove to remove the data collector.

    Note:

    It’s recommended that you save a copy of the data collector schema before you delete the data collector.

  4. Log in as the SYS user to the Oracle Management Repository host, and then run the script to drop the data collector schema.

    @<script_path>

    You are prompted for the data collector schema name. To find the data collector schema name, run the following query:

    SELECT owner
    FROM all_tables
    WHERE table_name = 'EMAAS_PERF_HV_PROPS';

    If multiple rows are displayed, then this indicates that there is more than one data_collector_schema. You can either drop all the schemas, one after the other, or contact Oracle Support.

    This script checks whether the schema being dropped is a data collector schema. If this validation fails, then the schema won’t be dropped.

    In this case, using SYS user credentials, run the following command to drop the data collector schema, passing the data collector schema name as input:

    DROP USER <data_collector_schema_name> CASCADE;

    Don’t use this script if your data collector home is intact, and it can communicate with Oracle Management Cloud. In this case, follow the steps in Uninstalling Oracle Management Cloud Agents to remove the data collector.

Agent or Gateway Installation Fails Due to Connectivity Issues from OCI-C to a tenant in OCI

If the installation of Oracle Management Cloud Agent or Gateway is failing, you may have connectivity issues to the Oracle Management Cloud backend services.

You may see the following error message:

[OMCAGNT-3018]: Can not connect to Oracle Identity Cloud Service. Please ensure that the URL [ https://idcs-xxxxx.oraclecloud.com:443/.well-known/idcs-configuration ] is accessible from the installation host and retry.

Root Cause:

The root cause is an incorrect MTU size on the network interface that reaches out to the internet. By default, all VMs in OCI have an MTU size of 9000 bytes, whereas a value of 1500 is required for communicating with OMC and IDCS endpoints. The following OCI document describes the general issue in full detail: https://docs.cloud.oracle.com/iaas/Content/Network/Troubleshoot/connectionhang.htm.

Solution:

Set the MTU size of the outgoing network interface on the gateway (or on the proxy, if used) to 1500.

If you want to keep the default value, you can instead create static host routes that use the smaller MTU value only for certain IP addresses. For example:

[root@proxy] # ip route add 10.10.10.10/32 via 10.0.0.1 mtu 1500

Because load balancers sit in front of the OMC or IDCS endpoint, you usually have to do this for multiple IP addresses. Use getent hosts <OMC-Endpoint | IDCS-Endpoint> to get all required IP addresses.
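
Generating one route per resolved address can be scripted. The sketch below is illustrative and handles IPv4 only: it resolves an endpoint with the standard socket.getaddrinfo call (roughly what getent hosts does) and prints ip route commands in the form shown above. The function names and the next-hop address are assumptions, and the commands are only printed, not executed.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for a host name."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def route_commands(ips, next_hop, mtu=1500):
    """Build one 'ip route add' command per IPv4 address."""
    return [f"ip route add {ip}/32 via {next_hop} mtu {mtu}" for ip in ips]


if __name__ == "__main__":
    # 10.10.10.10 and 10.0.0.1 are the example values used in this topic.
    for command in route_commands(["10.10.10.10"], "10.0.0.1"):
        print(command)
    # For a real endpoint: route_commands(resolve_ipv4("<OMC-Endpoint>"), "<next hop>")
```

Review the generated commands before running them as root, because the set of addresses behind a load balancer may differ between lookups.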

Update Authenticated Proxy Server Parameters

If you are using an authenticated proxy server, you can edit the emd.properties file to update the following proxy parameters: OMC_PROXYUSER, OMC_PROXYPWD and OMC_PROXYREALM.

You can specify the proxy parameter values by using the prefix CLEAR: before the parameter value. CLEAR: indicates that the value provided is in clear text. For example:

OMC_PROXYUSER=CLEAR:johndoe

OMC_PROXYPWD=CLEAR:password

OMC_PROXYREALM=CLEAR:McAfee Web Gateway

The value of the OMC_PROXYREALM parameter is specific to the authenticated proxy server in use. In the above example, the realm value for McAfee Web Gateway is McAfee Web Gateway. Contact your proxy vendor for instructions about how to get the realm value from the proxy settings.

After updating the emd.properties file, stop and restart the agent.

The above proxy parameters will be encrypted by the agent and stored in the emd.properties file.

Do not use the omcli setproperty command to set any of the above proxy parameters.
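
Editing the file by hand is straightforward, but the update can also be sketched as a small text transformation. The helper below is hypothetical and operates on the file contents as a string: it replaces or appends the three proxy parameters named above with the CLEAR: prefix and leaves every other line untouched. Reading and writing the actual emd.properties file, and stopping and restarting the agent afterward, are left to the caller.

```python
PROXY_KEYS = ("OMC_PROXYUSER", "OMC_PROXYPWD", "OMC_PROXYREALM")


def set_proxy_properties(text, values):
    """Return the properties text with the given proxy values set as CLEAR: entries."""
    unknown = set(values) - set(PROXY_KEYS)
    if unknown:
        raise ValueError(f"not a proxy parameter: {sorted(unknown)}")
    remaining = dict(values)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            # Replace the existing (possibly encrypted) value with clear text.
            out.append(f"{key}=CLEAR:{remaining.pop(key)}")
        else:
            out.append(line)
    # Append parameters that were not already present in the file.
    out.extend(f"{key}=CLEAR:{value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # 'some.other.property' is a made-up line standing in for unrelated settings.
    original = "some.other.property=value\nOMC_PROXYUSER=old-value\n"
    print(set_proxy_properties(original, {
        "OMC_PROXYUSER": "johndoe",
        "OMC_PROXYREALM": "McAfee Web Gateway",
    }))
```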