Incident management in Oracle Enterprise Manager Ops Center consists of several components that are designed to work together to simplify managing incidents for a large number of assets. The components include monitoring rules, suggested actions, and methods for automating incident identification and resolution.
Monitoring uses a standard set of monitoring rules, each consisting of an asset attribute and a threshold value for that attribute. During monitoring, Oracle Enterprise Manager Ops Center generates alerts, which feed both the incident management and notification features.
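The relationship between a monitoring rule and an alert can be sketched as follows. This is only an illustration of the attribute-plus-threshold idea, not the Ops Center implementation or API: the names MonitoringRule and evaluate, the attribute name cpu_usage_percent, the threshold of 80, and the idea that a rule carries a severity are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model for illustration only; not an Ops Center class.
@dataclass(frozen=True)
class MonitoringRule:
    attribute: str    # monitored asset attribute (name is an assumption)
    threshold: float  # value above which an alert is raised
    severity: str     # "Info", "Warning", or "Critical" (assumed to be set per rule)


def evaluate(rule: MonitoringRule, observed: float) -> Optional[dict]:
    """Return an alert record when the observed value exceeds the threshold."""
    if observed > rule.threshold:
        return {
            "attribute": rule.attribute,
            "observed": observed,
            "threshold": rule.threshold,
            "severity": rule.severity,
        }
    return None  # value is within the rule's parameters, so no alert


rule = MonitoringRule("cpu_usage_percent", 80.0, "Warning")
alert = evaluate(rule, 93.5)     # exceeds the threshold, so an alert is produced
no_alert = evaluate(rule, 42.0)  # within the threshold, so nothing is produced
```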
When an asset is not operating within the parameters defined in the monitoring rules and policies, Oracle Enterprise Manager Ops Center generates an incident. Incidents have a severity of Informational (Info), Warning, or Critical. All incidents appear in the Message Center. When an incident is first detected, it appears in the Unassigned Incidents category. When you assign an incident to yourself, it moves from Unassigned Incidents to My Incidents. When an incident is assigned to someone else, it appears in Incidents Assigned to Others.
For example, the CPU usage threshold on a Sun Fire x4150 host is exceeded and an incident is generated. You assign the incident to Bob. Bob is concerned because these systems are often used to host Oracle Solaris Zones.
Bob reviews the incident and adds the following comment to it: "This asset is not powerful enough and cannot cope with the load." Bob also wants to associate a recommended action annotation with the Global Zone asset type, so that users check for processes that are consuming excessive CPU on the global zone. He adds the following annotation to the asset type: "Run the prstat 1 1 command to check which processes are taking CPU."
The annotation is saved in the Incidents Knowledge Base and is displayed the next time the CPU usage threshold is exceeded on an asset of the Global Zone type.
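The way an incident's assignee determines the Message Center category in which it appears can be expressed as a small function. This is a sketch of the rule only: the function name message_center_category and the use of plain user names are assumptions for illustration, and nothing here calls the product.

```python
from typing import Optional


def message_center_category(assignee: Optional[str], current_user: str) -> str:
    """Return the Message Center category for an incident, as seen by current_user.

    Hypothetical helper for illustration; not part of Ops Center.
    """
    if assignee is None:
        return "Unassigned Incidents"       # newly detected, nobody assigned yet
    if assignee == current_user:
        return "My Incidents"               # assigned to the user who is viewing
    return "Incidents Assigned to Others"   # assigned to a different user


# The same incident lands in a different category depending on who is looking.
as_seen_by_you = message_center_category("bob", current_user="you")
as_seen_by_bob = message_center_category("bob", current_user="bob")
new_incident = message_center_category(None, current_user="you")
```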
Oracle Enterprise Manager Ops Center uses a help desk approach to manage incidents.
The following are the key tools available for taking action on an incident:
Message Center: View the status of incidents and assign incidents.
Annotations: Add notes and change status. Use annotation options to provide recommended actions or fixes, or add custom scripts to provide an automated response to an incident.
Operational Plans: Deploy a shell script against a specific asset or asset sub-type to automate incident resolution.
Incidents Knowledge Base: Collect comments and suggested actions for known issues for future use.
To receive e-mail or pager notification each time an incident is reported in the Message Center, create notification rules that send a message advising you of each new Critical or Warning incident.
The Message Center contains a detailed list of unassigned incidents, incidents assigned to you, and incidents assigned to other users.
You can manage incidents from either the Message Center or from the Asset view. You can view and add comments and annotations, take action on an incident, and close incidents.
The Message Center provides a list of all incidents. Select an incident to see its details and activity.
From the Asset tree in the Navigation pane, select the asset and then click the Incidents tab to see a list of incidents for that asset.
When you have the Manage or Admin role for the asset, you can take action on the incident. The person assigned to the incident must also have the Manage or Admin role. When the icon is not active, you do not have the appropriate role.
You can view unresolved incidents for a specific asset or, across assets, from the Message Center.
To view unresolved incidents for a specific asset, click the asset in the Navigation pane, then click the Incidents tab in the center pane.
To view unresolved incidents from the Message Center, click one of the following:
Unassigned Incidents
My Incidents
Incidents Assigned to Others
The number of unresolved incidents for an asset appears in a bar chart and in a summary by severity. All unresolved incidents appear in a table.
View high-level details by hovering your mouse over the incident or clicking the incident in the Unresolved Incidents table. You can drill down to view the alerts that make up the incident by clicking the incident, then clicking the Alerts icon in the center pane.
Procedure to view incident details.
You can assign an incident to a user who has Manage or Admin role for the asset.
Assigning an incident might affect the asset's Incident severity badge. When an incident was previously acknowledged or marked as being repaired, its severity was not propagated up to antecedent assets in the navigation pane. After assigning an incident (to a user or to no one), the severity is propagated up again to antecedent assets in the navigation pane.
Acknowledging an incident indicates that you are investigating the issue. You can acknowledge an incident when you have the Admin or Manage role for the asset on which the incident is identified.
Acknowledging an incident might affect the asset's Incident severity badge. When the incident was in an Unassigned state or was assigned to someone else, the severity was taken into account in the computation of the highest severity to propagate up to antecedent assets in the navigation pane. When you acknowledge an incident, it is moved into your queue in the Message Center and the severity is no longer propagated up to antecedent assets in the navigation pane.
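For example, an asset that has both an unacknowledged Critical incident and an unacknowledged Warning incident displays the Critical badge. When you acknowledge the Critical incident, the badge is replaced with the Warning badge because that is now the highest-level unacknowledged incident.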
Figure 3-11 Effect of Acknowledging a Critical Incident
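The badge behavior described in this section amounts to taking the highest severity among the incidents that are neither acknowledged nor marked as repaired. The following sketch illustrates that computation under stated assumptions: the Incident class, the badge_severity function, and the numeric ranking of severities are hypothetical, and whether an Info incident produces a badge at all is not confirmed by this document.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed ordering, lowest to highest; used only to compare severities.
SEVERITY_RANK = {"Info": 1, "Warning": 2, "Critical": 3}


# Hypothetical incident record for illustration; not an Ops Center class.
@dataclass
class Incident:
    severity: str
    acknowledged: bool = False
    marked_repaired: bool = False


def badge_severity(incidents: Iterable[Incident]) -> Optional[str]:
    """Return the highest severity that is still propagated, or None if there is none."""
    propagated = [
        i.severity
        for i in incidents
        if not (i.acknowledged or i.marked_repaired)
    ]
    if not propagated:
        return None
    return max(propagated, key=SEVERITY_RANK.__getitem__)


incidents = [Incident("Critical"), Incident("Warning")]
before = badge_severity(incidents)   # the Critical incident drives the badge
incidents[0].acknowledged = True     # acknowledge the Critical incident
after = badge_severity(incidents)    # only the Warning incident is still propagated
```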
Open the incident from the Message Center or Assets view.
Message Center View: Click Message Center, then click an Incident category: Unassigned Incidents, My Incidents, or Incidents Assigned to Others
Assets View: Click the asset in the Assets section of the Navigation pane, then click the Incidents tab in the center pane.
Select one or more incidents, then click the Acknowledge Incidents icon in the center pane.
Annotations are defined by the asset type. An annotation is a comment, a suggested action, or a reference to an operational profile.
Any user can add an annotation to an incident. Adding an entry to the Incidents Knowledge Base requires Oracle Enterprise Manager Ops Center Admin permissions.
You can display annotations for an asset type in the Incidents Knowledge Base.
Procedure to view comments.
A comment is a type of annotation. To add a comment, see Adding an Annotation.
When you have the Manage or Admin role for an asset that has an open incident, you can correct some incidents by using an automated annotation, if one has been associated with the same issue. For other incidents, review the issue before deciding on the appropriate action.
The software cannot determine when an incident is repaired. However, you can open a known incident and add a note with the repair details and mark the incident as repaired. You must have the Manage or Admin role for the asset to perform this task.
After you mark an incident as repaired, its severity badge no longer appears in the assets list in the navigation pane.
The incident stays open until you close it, even if the alerting condition is cleared. To remove an incident from the Message Center and the asset view, you must close the incident or take no action on it for seven (7) days.
When any action is taken on an incident, such as adding an annotation, the seven-day inactivity counter is reset.
Note:
Incidents with no activity for seven (7) days are closed automatically by Oracle Enterprise Manager Ops Center and no longer appear in the UI. You can change this value by using the public API.
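The inactivity rule can be pictured as a timer that restarts whenever someone acts on the incident. The sketch below models only that logic: the IncidentActivity class and its method names are hypothetical, the seven-day value is the default stated above, and treating exactly seven days of inactivity as due for closing is an assumption.

```python
from datetime import datetime, timedelta

# Default stated in this section; the product lets you change it through the public API.
INACTIVITY_LIMIT = timedelta(days=7)


class IncidentActivity:
    """Hypothetical tracker for the last action taken on an incident."""

    def __init__(self, opened_at: datetime) -> None:
        self.last_activity = opened_at

    def record_action(self, when: datetime) -> None:
        # Any action, such as adding an annotation, restarts the counter.
        self.last_activity = when

    def is_due_for_auto_close(self, now: datetime) -> bool:
        # Assumption: the incident is due once a full seven days have passed.
        return now - self.last_activity >= INACTIVITY_LIMIT


opened = datetime(2024, 1, 1)
incident = IncidentActivity(opened)
incident.record_action(opened + timedelta(days=5))  # annotation added on day 5

# Day 8: only 3 days since the last action, so the incident stays open.
due_day_8 = incident.is_due_for_auto_close(opened + timedelta(days=8))
# Day 12: 7 days since the last action, so the incident is due to be closed.
due_day_12 = incident.is_due_for_auto_close(opened + timedelta(days=12))
```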
When an incident is closed, its status changes to Closed, the incident is removed from the list of active incidents, and the incident is no longer displayed in the UI. You can retrieve information about a closed incident for 60 days by using the public API. After 60 days, closed incidents are permanently deleted. To change the time limit, edit the value by using the public API. Setting the number of days to 0 disables the time limit.
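The retention rule for closed incidents, including the special meaning of 0, can be sketched as follows. The function name is_purged and its parameters are hypothetical and do not correspond to the public API; whether an incident is deleted at exactly 60 days or just after is not specified here, so the comparison used below is an assumption.

```python
from datetime import datetime, timedelta


def is_purged(closed_at: datetime, now: datetime, retention_days: int = 60) -> bool:
    """Return True when a closed incident is past its retention period.

    Hypothetical helper for illustration; not part of the Ops Center public API.
    """
    if retention_days == 0:
        return False  # a value of 0 disables the time limit
    # Assumption: the incident is deleted once more than retention_days have elapsed.
    return now - closed_at > timedelta(days=retention_days)


closed = datetime(2024, 1, 1)
still_retrievable = not is_purged(closed, closed + timedelta(days=59))
deleted = is_purged(closed, closed + timedelta(days=61))
kept_when_disabled = not is_purged(closed, closed + timedelta(days=1000), retention_days=0)
```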
Note:
When the monitoring condition is still true after the incident is closed, a new alert is raised and a new incident is created.