Managing Incidents

Incident management in Oracle Enterprise Manager Ops Center consists of several components that are designed to work together to simplify managing incidents for a large number of assets. The components include monitoring rules, suggested actions, and methods for automating incident identification and resolution.

Monitoring includes a standard set of monitoring rules, consisting of an asset's attribute and the threshold value for that attribute. When Oracle Enterprise Manager Ops Center performs monitoring, it generates alerts, which connect to both the incident management and notification features.
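
The relationship between a monitoring rule and an alert can be sketched in a few lines of Python. This is only an illustrative model, not the product's implementation: the `MonitoringRule` class, the `evaluate` function, and the attribute name `cpu_usage_percent` are all hypothetical, and the thresholds are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringRule:
    attribute: str    # hypothetical attribute name, e.g. "cpu_usage_percent"
    threshold: float  # illustrative value above which an alert is raised
    severity: str     # "Info", "Warning", or "Critical"


def evaluate(rules, readings):
    """Return (attribute, severity) pairs for every reading above its threshold."""
    alerts = []
    for rule in rules:
        value = readings.get(rule.attribute)
        if value is not None and value > rule.threshold:
            alerts.append((rule.attribute, rule.severity))
    return alerts


# Two rules on the same attribute with different thresholds (assumed values).
rules = [
    MonitoringRule("cpu_usage_percent", 80.0, "Warning"),
    MonitoringRule("cpu_usage_percent", 95.0, "Critical"),
]
alerts = evaluate(rules, {"cpu_usage_percent": 90.0})
print(alerts)
```

With a reading of 90, only the Warning rule in this sketch produces an alert, because the reading is below the assumed Critical threshold.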

When an asset is not operating within the parameters defined in the monitoring rules and policies, Oracle Enterprise Manager Ops Center generates an incident and displays the information in the Unassigned Incidents section. Incidents have a severity of Informational (Info), Warning, or Critical. All incidents appear in the Message Center. When an incident is first detected, it appears in the Unassigned Incidents category. When you assign an incident to yourself, it moves from Unassigned Incidents to My Incidents. When an incident is assigned to someone else, it appears in Incidents Assigned to Others.
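
As a rough illustration of how an incident's assignee determines the Message Center category in which you see it, consider the following sketch. The function name and its parameters are hypothetical and exist only for this example.

```python
def message_center_queue(assignee, current_user):
    """Return the Message Center category for an incident, as seen by current_user.

    assignee is None when the incident has not been assigned to anyone.
    """
    if assignee is None:
        return "Unassigned Incidents"
    if assignee == current_user:
        return "My Incidents"
    return "Incidents Assigned to Others"


# The same incident appears in different categories for different viewers.
print(message_center_queue(None, "alice"))
print(message_center_queue("alice", "alice"))
print(message_center_queue("bob", "alice"))
```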

For example, the CPU usage threshold on a Sun Fire x4150 host is exceeded and an incident is generated. You assign the incident to Bob. Bob is concerned because these systems are often used to host Oracle Solaris Zones.

Bob reviews the incident and adds the following comment: "This asset is not powerful enough and cannot cope with the load." Bob also wants to associate a recommended action annotation with the Global Zone asset type so that users check for processes that are consuming excessive CPU on the global zone. He adds the following annotation to the asset type: "Run the prstat 1 1 command to check which processes are taking CPU." The annotation is saved in the Incidents Knowledge Base and is displayed the next time the CPU usage threshold is exceeded on an asset of the Global Zone type.

Methods of Incident Management

Oracle Enterprise Manager Ops Center uses a help desk approach to manage incidents.

The following are the key tools available for taking action on an incident:

  • Message Center: View the status of incidents and assign incidents.

  • Annotations: Add notes and change status. Use annotation options to provide recommended actions or fixes, or add custom scripts to provide an automated response to an incident.

  • Operational Plans: Deploy a shell script against a specific asset or asset sub-type to automate incident resolution.

  • Incidents Knowledge Base: Collect comments and suggested actions for known issues for future use.

When you want to receive e-mail or pager notification each time an incident is reported in the Message Center, create notification rules to send a message advising you of a new critical or warning incident.

The Message Center contains a detailed list of unassigned incidents, incidents assigned to you, and incidents assigned to other users.

You can manage incidents from either the Message Center or from the Asset view. You can view and add comments and annotations, take action on an incident, and close incidents.

  • The Message Center provides a list of all incidents. Select an incident to see its details and activity.

  • From the Asset tree in the Navigation pane, select the asset and then click the Incidents tab to see a list of incidents for that asset.

When you have the Manage or Admin role for the asset, you can take action on the incident. The person assigned to the incident must also have the Manage or Admin role. When an action icon is not active, you do not have the appropriate role.

Viewing Unresolved Incidents

You can view unresolved incidents for a specific asset or by incident.

  • To view unresolved incidents for a specific asset, click the asset in the Navigation pane, then click the Incidents tab in the center pane.

  • To view unresolved incidents from the Message Center, click one of the following:

    • Unassigned Incidents

    • My Incidents

    • Incidents Assigned to Others

      The number of unresolved incidents for an asset appears in a bar chart and in a summary by severity. All unresolved incidents appear in a table.

View high-level details by hovering your mouse over the incident or clicking the incident in the Unresolved Incidents table. You can drill down to view the alerts that make up the incident by clicking the incident, then clicking the Alerts icon in the center pane.

Viewing Incident Details

Use the following procedure to view incident details.

  1. Select Assets in the Navigation pane.
  2. Select an asset that has an incident badge next to the icon. The Dashboard page appears and shows the status of the asset.
  3. Click the Incidents tab.
  4. Hover over the incident to display the incident details.
  5. To display the alerts that are associated with the incident, click the Alerts sub-tab or click the Alerts icon in the center pane. The alerts that make up the incident are displayed, including the current and highest alert status, and the alert history.

Assigning an Incident

You can assign an incident to a user who has Manage or Admin role for the asset.

Assigning an incident might affect the asset's Incident severity badge. While an incident is acknowledged or marked as being repaired, its severity is not propagated up to antecedent assets in the Navigation pane. After you assign the incident (to a user or to No One), its severity is propagated up again to antecedent assets in the Navigation pane.

  1. To display an incident from the Message Center, click Message Center, then click Unassigned Incidents in the navigation pane.

    To display an incident from the asset view, click the asset in the Navigation pane, then click the Incidents tab.

  2. Select one or more incidents in the center pane, then click the Assign Incidents icon.
  3. Select a user name from the Assign To list, which contains the users who have either the Manage or Admin role for the asset. To return an assigned incident to the Unassigned Incidents queue, select No One from the list.
  4. (Optional) Add a note in the text field.
  5. Click Assign Incidents.
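
The role requirement behind the Assign To list in the procedure above can be pictured as a simple filter. This is a hypothetical model: the `assignable_users` function, the mapping of users to roles, and the role name `Read` are invented for the example.

```python
ELIGIBLE_ROLES = {"Manage", "Admin"}


def assignable_users(asset_roles):
    """Return, in sorted order, the users who hold Manage or Admin for the asset.

    asset_roles maps a user name to the set of roles that user has on the asset.
    """
    return sorted(
        user for user, roles in asset_roles.items() if roles & ELIGIBLE_ROLES
    )


# "Read" is a made-up role used only to show a user who is filtered out.
roles_on_asset = {
    "bob": {"Manage"},
    "alice": {"Read"},
    "carol": {"Admin", "Read"},
}
print(assignable_users(roles_on_asset))
```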

Acknowledging Incidents

Acknowledging an incident indicates that you are investigating the issue. You can acknowledge an incident when you have the Admin or Manage role for the asset on which the incident is identified.

Acknowledging an incident might affect the asset's Incident severity badge. When the incident was in an Unassigned state or was assigned to someone else, the severity was taken into account in the computation of the highest severity to propagate up to antecedent assets in the navigation pane. When you acknowledge an incident, it is moved into your queue in the Message Center and the severity is no longer propagated up to antecedent assets in the navigation pane.

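For example, when an asset has both a Critical incident and a Warning incident and you acknowledge the Critical incident, the Critical badge is replaced with the Warning badge because Warning is now the highest severity among the unacknowledged incidents.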

Figure 3-11 Effect of Acknowledging a Critical Incident
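
The badge behavior can be modeled as taking the highest severity among the incidents that are neither acknowledged nor marked as being repaired. The following sketch rests on that assumption; the state names and the `badge_severity` function are hypothetical and do not reflect the product's internal data model.

```python
SEVERITY_RANK = {"Info": 1, "Warning": 2, "Critical": 3}

# Assumed state names: incidents in these states do not contribute to the badge.
HIDDEN_STATES = {"acknowledged", "repaired"}


def badge_severity(incidents):
    """Return the highest severity that still propagates, or None if there is none.

    incidents is a list of (severity, state) tuples.
    """
    candidates = [sev for sev, state in incidents if state not in HIDDEN_STATES]
    if not candidates:
        return None
    return max(candidates, key=lambda sev: SEVERITY_RANK[sev])


before = badge_severity([("Critical", "unassigned"), ("Warning", "unassigned")])
after = badge_severity([("Critical", "acknowledged"), ("Warning", "unassigned")])
print(before, after)
```

In this model, assigning the acknowledged incident again would change its state back to one that is not hidden, so its severity would count toward the badge once more.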

  1. Open the incident from the Message Center or Assets view.

    • Message Center View: Click Message Center, then click an Incident category: Unassigned Incidents, My Incidents, or Incidents Assigned to Others

    • Assets View: Click the asset in the Assets section of the Navigation pane, then click the Incidents tab in the center pane.

  2. Select one or more incidents, then click the Acknowledge Incidents icon in the center pane.

Adding an Annotation

Annotations are defined by the asset type. An annotation is a comment, a suggested action, or a reference to an operational profile.

Any user can add an annotation to an incident. Adding an entry to the Incidents Knowledge Base requires Oracle Enterprise Manager Ops Center Admin permissions.

  1. Open the incident from the Message Center or Assets view.
    • Message Center View: Click Message Center, then click an Incident category: Unassigned Incidents, My Incidents, or Incidents Assigned to Others

    • Assets View: Click the asset in the Assets section of the Navigation pane, then click the Incidents tab in the center pane.

  2. Select the incident, then click the Add Annotations icon in the center pane.
  3. Select one of the following types from the Annotation Type drop-down list:
    • Comment: Text only option that is designed to be used to add a note or editorial comment.

    • Suggested Action: Text is required and a script is optional.

  4. Select an operational plan from the drop-down list of operational profiles defined for the type of asset on which this incident is open.
  5. The Synopsis field is completed based on the annotation type. Edit the synopsis, as needed. The UI does not enforce a character limit, but the API allows only 80 characters.

    Note:

    When you enter more than 80 characters, the synopsis is truncated to the first 80 characters when viewed in the annotation.

  6. Type a description or instructions in the Note field. There is no character limit.
  7. To add the annotation to the Incidents Knowledge Base and include the annotation for every incident of this type and severity, click the check box.

    Note:

    You must have the Oracle Enterprise Manager Ops Center Admin role to complete this operation.

  8. Click Save and Execute or click Save.
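
The synopsis limit noted in step 5 amounts to keeping the first 80 characters. A minimal sketch, using a hypothetical helper name:

```python
SYNOPSIS_LIMIT = 80  # character limit stated for the synopsis


def displayed_synopsis(synopsis: str) -> str:
    """Return the synopsis as it would be shown: the first 80 characters."""
    return synopsis[:SYNOPSIS_LIMIT]


long_text = "Check for runaway processes on the global zone. " * 3
shown = displayed_synopsis(long_text)
print(len(long_text), len(shown))
```

Because anything past the limit is dropped, it is safer to put the essential instruction at the start of the synopsis and the detail in the Note field.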

Displaying Annotations

You can display annotations for an asset type in the Incidents Knowledge Base.

  1. Click Plan Management.
  2. Expand Incidents Knowledge Base in the Navigation pane, then select the asset type. The annotations associated with the asset type appear in the center pane.

Viewing Comments

Use the following procedure to view comments.

A comment is a type of annotation. To add a comment, see Adding an Annotation.

  1. Expand the Message Center, then click one of the following:
    • Unassigned Incidents

    • My Incidents

    • Incidents Assigned to Others

  2. Click the incident in the center pane.
  3. Click the View Comments icon.

Taking Action on an Incident

When you have the Manage or Admin role for an asset that has an open incident, you can correct some incidents by using an automated annotation, if one has been associated with the same issue. For other incidents, review the issue before deciding on the appropriate action.

  1. Open the incident from the Message Center or Assets view.
    • Message Center View: Click Message Center, then click an Incident category: Unassigned Incidents, My Incidents, or Incidents Assigned to Others

    • Assets View: Click the asset in the Assets section of the Navigation pane, then click the Incidents tab in the center pane.

  2. Select the incident.
  3. Click the Take Actions on a Incident icon in the center pane.
  4. Select the action to perform:
    • When the Incidents Knowledge Base has provided a suggested action for the incident, select Execute the Selected Suggested Action option and then select the action from the table.

    • When an operational plan has a suggested action, select the Execute an Operational Plan option, then select the plan from the drop-down list.

    • To run a script or command that is not part of a suggested action or operational plan, select the Execute a Command or Script File option.

      • To execute a command, enter the command in the field.

      • To browse for a script, click Browse and then select the script from the File Chooser popup.

  5. Select where to run the script: on the managed asset where the incident is open or on the Enterprise Controller.
  6. Define the time out period for the action, in minutes, hours, or days.
  7. (Optional) Add a note describing the action taken.
  8. Click Execute Selected Action.
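
To show what a time-out expressed in minutes, hours, or days means for a command, the following sketch converts the value to seconds and runs a command with that limit by using the Python standard library. It is not how Oracle Enterprise Manager Ops Center runs actions; the function names are hypothetical, and the command is a harmless placeholder.

```python
import subprocess
import sys

SECONDS_PER_UNIT = {"minutes": 60, "hours": 3600, "days": 86400}


def timeout_seconds(value, unit):
    """Convert a time-out given in minutes, hours, or days to seconds."""
    if value <= 0:
        raise ValueError("time-out must be positive")
    return value * SECONDS_PER_UNIT[unit]


def run_action(command, value, unit):
    """Run command (a list of arguments) and return (exit code, output).

    subprocess.run raises subprocess.TimeoutExpired if the limit is exceeded.
    """
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_seconds(value, unit),
    )
    return result.returncode, result.stdout.strip()


# Placeholder command: start the current Python interpreter and print a word.
exit_code, output = run_action([sys.executable, "-c", "print('ok')"], 1, "minutes")
print(exit_code, output)
```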

Marking an Incident Repaired

The software cannot determine when an incident is repaired. However, you can open a known incident and add a note with the repair details and mark the incident as repaired. You must have the Manage or Admin role for the asset to perform this task.

After you mark an incident as repaired, its severity badge no longer appears in the assets list in the Navigation pane.

  1. To open the incident from the Message Center, click Message Center, then click one of the following:
    • Unassigned Incidents

    • My Incidents

    • Incidents Assigned to Others

    or

    From the asset view, click the asset in the Assets section of the Navigation pane, then click the Incidents tab.

  2. Select one or more incidents, then click the Mark Incidents as Repaired icon in the center pane.
  3. (Optional) Select the incident, then add a Note.
  4. Click Tag Incidents as Being Repaired.

About Closing an Incident

The incident stays open until you close it, even if the alerting condition is cleared. To remove an incident from the Message Center and the asset view, you must close the incident or take no action on it for seven (7) days.

When any action is taken on an incident, such as adding an annotation, the counter is reset.

Note:

Incidents with no activity for seven (7) days are closed automatically by Oracle Enterprise Manager Ops Center and no longer appear in the UI. You can edit this value by using the public API.

When an incident is closed, its status changes to Closed, the incident is deleted from the list of active incidents, and the incident is no longer displayed in the UI. You can retrieve information about a closed incident for 60 days by using the public API. After 60 days, closed incidents are permanently deleted. To edit the time limit, you must edit the value in the public API. You can disable the time limit by setting the value for the number of days to 0.

Note:

When the monitoring condition is still true after the incident is closed, a new alert is raised and a new incident is created.
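
The two time limits described in this section, seven days of inactivity before automatic closure and 60 days of retention after closure, can be sketched as follows. The function names are hypothetical, the comparison at exactly seven or 60 days is an assumption, and the sketch assumes that the value 0 disables the retention limit.

```python
from datetime import datetime, timedelta

AUTO_CLOSE_DAYS = 7   # days without activity before an incident is closed
RETENTION_DAYS = 60   # days a closed incident is kept; 0 is assumed to disable the limit


def should_auto_close(last_activity, now, auto_close_days=AUTO_CLOSE_DAYS):
    """True when no action has been taken on the incident for the full period."""
    return now - last_activity >= timedelta(days=auto_close_days)


def should_purge(closed_at, now, retention_days=RETENTION_DAYS):
    """True when a closed incident has passed its retention period."""
    if retention_days == 0:
        return False  # assumption: 0 means closed incidents are never deleted
    return now - closed_at >= timedelta(days=retention_days)


now = datetime(2024, 3, 1)
print(should_auto_close(datetime(2024, 2, 20), now))  # 10 days without activity
print(should_auto_close(datetime(2024, 2, 27), now))  # counter reset 3 days ago
print(should_purge(datetime(2023, 12, 1), now))       # closed 91 days ago
```

Any action on the incident, such as adding an annotation, corresponds to moving `last_activity` forward, which restarts the seven-day count.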

Closing an Incident

You can close an incident from the asset view or from the following categories in the Message Center:

  • Unassigned Incidents

  • My Incidents

  • Incidents Assigned to Others

Perform the following steps to close an incident from the asset view:

  1. Click the asset in the Assets section of the Navigation pane, then click the Incidents tab.
  2. Select one or more incidents, then click the Close Incidents icon in the center pane.
  3. (Optional) Select the incident, then add a Note.
  4. (Optional) To temporarily disable the monitoring rule that identified the incident, click the Action check box, then define when to enable the monitors.

    This action does not disable the monitoring rule for all assets. The action disables the monitoring rule for only the assets that were related to the incident to avoid raising a similar incident on the same assets.

  5. Click Close Incidents.