Oracle® Enterprise Manager Cloud Control Administrator's Guide 12c Release 1 (12.1.0.1) Part Number E24473-01
This chapter introduces diagnostic capabilities in Enterprise Manager that extend to Oracle Management Service (OMS) and Management Agents.
Enterprise Manager includes a fault diagnostics framework for collecting and managing diagnostic data. Diagnostic data includes trace files, dumps, and core files as well as other information that enables customers and Oracle Support to identify, investigate, track, and resolve problems quickly and effectively.
The diagnostics framework offers the following benefits:
Automatic capture of diagnostic data upon first failure contributes to quick problem resolution thereby reducing downtime
Integration with Incident Manager and Support Workbench, and access to My Oracle Support (MOS) from Enterprise Manager ensure simplified customer interaction with Oracle Support
Ability to initiate proactive health checks and availability of the Enterprise Manager Diagnostics Kit enhance problem prevention and resolution
The general fault diagnostic workflow is as follows:
A critical error occurs in OMS or an Agent
The diagnostic framework automatically creates an incident and organizes the diagnostic information in the Automatic Diagnostic Repository (ADR)
Administrators use the Incident Manager to manage the complete life cycle of the incident
Administrators use the Support Workbench to view and process the contents of the ADR
Administrators perform health checks and run the Enterprise Manager Diagnostics Kit for finer grained analysis
Administrators use the Incident Packaging System (IPS) to package incident data and diagnostic results in a zip file for upload to My Oracle Support (MOS)
Administrators receive confirmation that an SR was created with which to track problem resolution
For critical errors, the ability to capture error information at first failure greatly increases the chance of a quick problem resolution and reduced downtime. An always-on, memory-based tracing system proactively collects diagnostic data from many Enterprise Manager components and can help isolate the root causes of problems. This data collection works much like an airplane's "black box" flight recorder. When a problem is detected, alerts are generated and the fault diagnosability infrastructure is activated to capture and store diagnostic data.
The fault diagnosability infrastructure aids in preventing, detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors such as those caused by code bugs, metadata corruption, and customer data corruption.
When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as trace files) is immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR), where it can later be retrieved by incident number and analyzed.
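The tagging and retrieval described above can be illustrated with a small in-memory sketch. This is not how the ADR is implemented (the ADR is file-based). The class `IncidentStore` and its methods are hypothetical names used only to show how assigning an incident number at capture time makes later lookup trivial.

```python
from collections import defaultdict
from itertools import count


class IncidentStore:
    """Toy in-memory stand-in for an incident-indexed diagnostic store.

    Hypothetical illustration only; the real ADR is a file-based repository.
    """

    def __init__(self):
        self._next_number = count(1)      # source of new incident numbers
        self._errors = {}                 # incident number -> error text
        self._files = defaultdict(list)   # incident number -> diagnostic files

    def record_incident(self, error, diagnostic_files):
        """Assign an incident number and tag the captured files with it."""
        incident_number = next(self._next_number)
        self._errors[incident_number] = error
        self._files[incident_number].extend(diagnostic_files)
        return incident_number

    def files_for(self, incident_number):
        """Return the files tagged with the given incident number."""
        return list(self._files.get(incident_number, []))


store = IncidentStore()
first = store.record_incident("critical error in OMS", ["oms_1.trc", "oms_1.dmp"])
second = store.record_incident("critical error in Agent", ["agent_1.trc"])
print(first, store.files_for(first))
print(second, store.files_for(second))
```

Because every file is tagged when it is captured, retrieval needs only the incident number and no search across unrelated diagnostic data.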
The ADR is a file-based hierarchical data store for depositing diagnostic information produced by diagnostic framework clients. The repository contains data describing incidents, traces, dumps, alert messages, data repair records, health check records, SQL Trace information, core dumps, and other information essential for problem diagnosis.
You can view and process ADR contents through the Support Workbench. There is also a command-line interface, the ADR Command Interpreter (ADRCI), with which you can manipulate the contents.
The default ADR home for OMS is:
<MiddlewareHome>/gc_inst/user_projects/domains/<DOMAIN_NAME>/servers/<SERVER_NAME>/adr
The default ADR home for Agents is:
<MiddlewareHome>/agent/agent_inst
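As a minimal sketch, the two default locations above can be assembled from their placeholders. The function names and the sample values (`/u01/app/Middleware`, `GCDomain`, `EMGC_OMS1`) are illustrative assumptions and are not taken from this guide; substitute the values for your own installation.

```python
from pathlib import PurePosixPath


def oms_adr_home(middleware_home, domain_name, server_name):
    """Build the default OMS ADR home from the template in this section."""
    return (
        PurePosixPath(middleware_home)
        / "gc_inst" / "user_projects" / "domains" / domain_name
        / "servers" / server_name / "adr"
    )


def agent_adr_home(middleware_home):
    """Build the default Agent ADR home from the template in this section."""
    return PurePosixPath(middleware_home) / "agent" / "agent_inst"


# Sample values below are assumptions for illustration only.
print(oms_adr_home("/u01/app/Middleware", "GCDomain", "EMGC_OMS1"))
print(agent_adr_home("/u01/app/Middleware"))
```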
The Incident Manager provides a central point of control for managing events, incidents and problems detected within Enterprise Manager.
The Incident Manager gives you in-context access to diagnostic and resolution capabilities. You also have in-context access to My Oracle Support, where you can research knowledge base articles and create service requests.
The Guided Resolution region offers recommendations and provides links to diagnostics and resolutions.
The Enterprise Manager Support Workbench (Support Workbench) is a facility that enables you to investigate, report, and in some cases, repair problems (critical errors), all with an easy-to-use graphical interface. The Support Workbench provides a self-service means for you to gather first-failure diagnostic data, obtain a support request number, and upload diagnostic data to Oracle Support quickly and with a minimum of effort, thereby reducing time-to-resolution for problems.
The Support Workbench allows you to view and process the contents of ADRs. From the Home and Problem Details pages you can do the following:
View recent and historical problems
View and create diagnostic packages
Create user-reported problems
Review checker findings
Search MOS knowledge base
Perform Health Checks and Run Diagnostics Kit
Health checks test the viability of various system components. Health checks run automatically in response to an incident. You can also perform targeted checks proactively. The diagnostic framework includes a comprehensive set of 26 out-of-the-box health checks to test components such as Jobs, Credential, Event, Loader, Plugin, and ASLM. Health check results are stored in the ADR.
The Enterprise Manager Diagnostics Kit is a set of Oracle-supplied scripts specifically designed to identify inconsistencies in Enterprise Manager that are known to contribute to errors. In some cases, a script may be able to resolve the issue.
The scripts run repository diagnostics against system modules. You can run diagnostics against all or selected modules. The kit is accessible via a link in the Support Workbench. Diagnostic output is stored in the ADR with other dump files.
The IPS enables you to automatically and easily gather the diagnostic data (traces, dumps, health check reports, and so forth) pertaining to a critical error and package the data into a zip file for transmission to Oracle Support.
Because all diagnostic data and files related to a critical error are tagged with that error's incident number, you do not have to search through all the stored information to determine the files required for analysis. The IPS identifies the required files automatically and adds them to the zip file.
Before creating the zip file, the IPS first collects diagnostic data into an intermediate logical structure called an incident package (package) and stores it in the ADR, where you can view the package and modify its contents. For example, you may want to add additional diagnostic data or remove existing data before uploading the zip file to Oracle Support.
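The two-stage flow described above (collect into an editable package, then write the zip file) can be sketched with standard library tools. This is not the IPS implementation. The directory layout (`incident_42`), the file names, and the helpers `collect_package`, `write_zip`, and `demo` are all hypothetical and exist only to show why an intermediate package is useful.

```python
import tempfile
import zipfile
from pathlib import Path


def collect_package(incident_dir):
    """Gather every file under one incident's directory into an editable list."""
    return sorted(p for p in Path(incident_dir).rglob("*") if p.is_file())


def write_zip(package, base_dir, zip_path):
    """Write the packaged files to a zip, keeping paths relative to base_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in package:
            zf.write(path, arcname=Path(path).relative_to(base_dir).as_posix())


def demo():
    """Create sample files, edit the package, zip it, and list the zip contents."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        incident_dir = base / "incident_42"   # hypothetical layout
        incident_dir.mkdir()
        (incident_dir / "oms.trc").write_text("trace data")
        (incident_dir / "heap.dmp").write_text("dump data")

        package = collect_package(incident_dir)
        # The package can be modified before the zip is created,
        # for example by removing a file that should not be uploaded.
        package = [p for p in package if p.suffix != ".dmp"]

        zip_path = base / "package_42.zip"
        write_zip(package, base, zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            return zf.namelist()


print(demo())
```

Keeping the package as a separate step before the zip is written is what allows files to be added or removed without rebuilding the archive.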
When Enterprise Manager encounters a critical error that prevents you from completing a task, it logs the error and generates an incident for it; the incident in turn generates an alert. Enterprise Manager stores incident details, including dump and trace files where applicable, in the Automatic Diagnostic Repository so that Support Workbench can access and display this information.
After receiving an alert notifying you of a problem or incident, take the following steps:
From the Enterprise menu, select Monitoring, then select Support Workbench.
In the list of targets that support ADR, locate the target about which you were notified and click the target link.
On the Support Workbench page, perform any of the following actions as appropriate:
View problem or incident details
View, create, or modify incident packages
View health checker findings
Close resolved problems
For details on performing these actions, see the Cloud Control online help.