Oracle® Enterprise Manager Cloud Control Administrator's Guide
12c Release 2 (12.1.0.2)

Part Number E24473-20

3 Using Incident Management

Incident management allows you to monitor and resolve service disruptions quickly and efficiently.

This chapter covers the following:

3.1 Events, Incidents, and Problems

Enterprise Manager Cloud Control 12c greatly expands target monitoring and management capability beyond previous releases by focusing on what is important from a broader management perspective rather than on discrete events that may point to the same underlying issue. Enterprise Manager exposes three levels of management granularity that, when combined, provide complete monitoring and management coverage of your environment.

3.1.1 Event Management

Intuitively, you monitor for specific events in your monitored environment. An event is a significant occurrence on a managed target that typically indicates something has occurred outside normal operating conditions. Events provide a uniform way to indicate that something of interest has occurred in an environment managed by Enterprise Manager.

3.1.2 Incident Management

You monitor and manage your Enterprise Manager environment via incidents rather than discrete events (although an incident can consist of a single event). Of all the events raised within your managed environment, you likely need to act on only a subset because they impact your business applications (a target down event, for example). Managing by incident also lets you address more complex situations in which the events you are interested in are related and point to a higher-level issue that should be handled as a single issue rather than as individual events. Taken individually, the events in a cluster may each indicate a minor administrative issue, but viewed together they may signify a larger problem that spans multiple domains or layers of your monitored infrastructure.

For example, you are monitoring a host. If you want to monitor 'load' being placed on one or more hosts you might be interested in events such as CPU utilization, memory utilization, and swap utilization exceeding acceptable metric thresholds. Individually, these events may or may not indicate an issue with the host, but together, these events form an incident indicating extreme load is being placed on a monitored host.

Incidents represent the larger service disruptions that may impact your business, rather than discrete events. Managing by incidents, therefore, allows you to monitor for complex operational issues that may span multiple domains and impact your business. These incidents typically need to be tracked, assigned to appropriate personnel, and resolved as quickly as possible. You can implement centralized monitoring that consolidates monitoring information, and allocate resources more effectively across your ecosystem to resolve issues or prevent them from occurring. The end result is better implementation of your business processes, which in turn leads to better performance of your IT resources.

3.1.3 Problem Management

Problem management involves the functionality that helps track the underlying root causes of incidents. Once the immediate service disruptions represented by incidents are resolved, you can then progress to understanding and resolving the underlying root cause of the issue.

For the current release, problems pertain to the diagnostic incidents and problems stored in the Automatic Diagnostic Repository (ADR), which are automatically raised by Oracle software when it encounters critical errors in the software. When problems are raised for Oracle software, Oracle has determined that the recommended recourse is to open a Service Request (SR) and send the diagnostic logs to Oracle Support so that Oracle can eventually provide a solution. A problem represents the underlying root cause of a set of incidents. Enterprise Manager provides features to track and manage the lifecycle of a problem.

3.1.4 Summing Up

  • Event: A significant occurrence of interest on a target that has been detected by Enterprise Manager.

    Goal: Ensure that your environment is monitored.

  • Incident: A set of significant events or combination of related events that pertain to the same issue.

    Goal: Ensure that service disruptions are either avoided or resolved quickly.

  • Problem: The underlying root cause of incidents. Currently, this represents critical errors in Oracle software that are the underlying root cause of diagnostic incidents.

    Goal: Ensure underlying root causes of issues are resolved to avoid future occurrence of issues.

Events, incidents, and problems work in concert to allow you to manage your complete IT ecosystem both effectively and efficiently. The following illustration summarizes how they work within your managed environment.

Figure 3-1 Event/Incident/Problem Flow

Incident workflow

The following sections delve into events, incidents, and problems in more detail.

3.1.5 Working with Events, Incidents, and Problems

Effectively managing your environment via events, incidents, and problems involves planning, setup, implementation, and finally usage. Ultimately, the goal is to manage your environment through a series of incidents that address the critical IT components of your organization that affect corporate operations and ultimately profitability.

This chapter covers operational specifics about Enterprise Manager's incident management feature. However, as part of the larger Enterprise Manager core framework, other areas of the monitoring infrastructure are involved. The following topics link off to appropriate areas of this chapter as well as other areas of Enterprise Manager documentation.

3.2 Key Concepts: Events, Incidents, and Problems

As discussed previously, Enterprise Manager Cloud Control 12c lets you focus on what is important from a broader monitoring/management perspective via events, incidents, and problems. This section covers these concepts in greater detail.

3.2.1 Events

An event is a significant occurrence on a managed target that typically indicates something has occurred outside normal operating conditions. Examples of events include: database target down, performance threshold violation, unapproved change in application configuration files, or job failure. An event can also be raised to signal successful operations or a job successfully completed.

Existing Enterprise Manager customers may be familiar with metric alerts and metric collection errors. For Enterprise Manager 12c, metric alerts are a type of event, one of many different event types. The notion of an event unifies the different exception conditions that are detected by Enterprise Manager, such as monitoring issues or compliance issues, into a common concept. It is backed by a consistent and uniform set of event management capabilities that can indicate something of interest has occurred in a datacenter managed by Enterprise Manager.

All events have the following attributes:

Table 3-1 Event Attributes

Attribute Description

Type

Type of event that is being reported. All events of a specific type share the same set of attributes that describe the exact nature of the problem. For example, Metric Alert, Compliance Standard Score Violation, or Job Status Change.

Severity

Event severity. For example, Fatal, Warning, or Critical.

Internal Name

An internal name that describes the nature of the event and can be used to search for events. For example, you can search for all tablespacePctUsed events.

Entity on which the event is raised.

An event can be raised on a target, a non-target source object (such as a job) or be related to a target and a non-target source object. Note: This attribute is important when determining what privileges are required to manage the event.

Message

Informational text associated with the event.

Reported Date

Time the event was reported.

Category

Functional or operational classification for an event.

Available Categories:

  • Availability

  • Business

  • Capacity

  • Configuration

  • Diagnostics

  • Error

  • Fault

  • Jobs

  • Load

  • Performance

  • Security
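The attributes in Table 3-1 can be pictured as a simple record. The following Python sketch is illustrative only and is not an Enterprise Manager API: the class name `Event`, its field names, and the helper `find_by_internal_name` are hypothetical. Only the attribute list and the category names come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Category names as listed under "Available Categories" in Table 3-1.
CATEGORIES = frozenset({
    "Availability", "Business", "Capacity", "Configuration", "Diagnostics",
    "Error", "Fault", "Jobs", "Load", "Performance", "Security",
})


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory record mirroring the attributes in Table 3-1."""
    event_type: str          # for example, "Metric Alert"
    severity: str            # for example, "Warning"
    internal_name: str       # for example, "tablespacePctUsed"
    entity: str              # target or non-target source object
    message: str             # informational text
    reported_date: datetime  # time the event was reported
    category: str            # one of CATEGORIES

    def __post_init__(self) -> None:
        # Reject categories that are not listed in the table.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def find_by_internal_name(events: List[Event], name: str) -> List[Event]:
    """Return the events whose internal name matches exactly."""
    return [e for e in events if e.internal_name == name]
```

With such a record, searching for all tablespacePctUsed events reduces to calling `find_by_internal_name(events, "tablespacePctUsed")`.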


Event Types

The type of an event defines the structure and payload of an event and provides the details of the condition it is describing. For example, a metric alert raised by threshold violation has a specific payload whereas a job state change has a different structure. As shown in the following table, the range of events types greatly expands Enterprise Manager's monitoring flexibility.

Event Type Description
Target Availability The Target Availability Event represents a target's availability status (Example: Up, Down, Agent Unreachable, or Blackout).
Metric Alert A metric alert event is generated when an alert occurs for a metric on a specific target (Example: CPU utilization for a host target) or for a metric on a target and object combination (Example: Space usage on a specific tablespace of a database target).
Metric Evaluation Error A metric evaluation error is generated when the collection for a specific metric group fails for a target.
Job Status Change All changes to the status of an Enterprise Manager job are treated as events, and these events are made available via the Job Status Change event class.

Note: A prerequisite to creating incident rules is to enable the relevant job status and add the required targets to the job event generation criteria. To change these criteria, from the Setup menu, select Incidents, and then Job Events.

Compliance Standard Rule Violation Events are generated for compliance standard rule violations. Each event corresponds to a violation of a compliance rule on a specific target.
Compliance Standard Score Violation Events are generated for compliance standard score violations. An event is generated when the compliance score for a compliance standard on a specific target falls below predefined thresholds.
High Availability High Availability events are generated for database availability operations (shutdown and startup), database backups and Data Guard operations (switchover, failover, and other state changes).
Service Level Agreement Alert These events are generated when a service level agreement or service level objective is violated for a service.
User-reported These events are created by end-users.
Application Dependency and Performance Alert Alerts are raised by the Application Dependency and Performance (ADP) monitoring when metrics related to a J2EE application or component have crossed some thresholds.
Application Performance Management KPI Alert An Application Performance Management (APM) Key Performance Indicator (KPI) alert event is generated when a KPI violation alert occurs for a metric on an APM managed entity associated with a Business Application target.
JVM Diagnostics Threshold Violation A JVM Diagnostics (JVMD) event is raised when a JVMD metric exceeds its threshold value on a Java Virtual Machine target.

Event Severity

The severity of an event indicates the criticality of a specific issue. The following table shows the various event severity levels along with the associated icon.

Icon | Severity | Description
fatal_error_16x16.png | Fatal | Corresponding service is no longer available. For example, a monitored target is down (target down event). A Fatal severity is the highest level severity and only applies to the Target Availability event type.
error.png | Critical | Immediate action is required in a particular area. The area is either not functional or indicative of imminent problems.
warning.png | Warning | Attention is required in a particular area, but the area is still functional.
minor_warning_16x16.png | Advisory | While the particular area does not require immediate attention, caution is recommended regarding the area's current state. This severity can be used, for example, to report Oracle best practice violations.
Clear icon | Clear | Conditions that raised the event have been resolved.
info.png | Informational | A specific condition has just occurred but does not require any remedial action.

Events with an informational severity:

  • do not appear in the incident management UI.

  • cannot create incidents.

  • are not stored within Enterprise Manager.
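The severity constraints described above can be sketched as two small checks. This Python sketch is not an Enterprise Manager API. The enum `Severity` and both function names are hypothetical, and the numeric ranking below Fatal is an assumption taken from the order of the severity table (the text only states that Fatal is the highest severity).

```python
from enum import IntEnum


class Severity(IntEnum):
    # Ranking is an assumption based on the order of the severity table;
    # the source only states that Fatal is the highest severity.
    INFORMATIONAL = 0
    CLEAR = 1
    ADVISORY = 2
    WARNING = 3
    CRITICAL = 4
    FATAL = 5


def is_valid_severity(event_type: str, severity: Severity) -> bool:
    """Fatal applies only to the Target Availability event type."""
    return severity is not Severity.FATAL or event_type == "Target Availability"


def can_create_incident(severity: Severity) -> bool:
    """Events with an informational severity cannot create incidents."""
    return severity is not Severity.INFORMATIONAL
```

For example, `is_valid_severity("Metric Alert", Severity.FATAL)` is false under these rules, while the same severity on a Target Availability event is accepted.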


3.2.2 Incidents

While events indicate issues requiring attention in your managed environment, it is more efficient to work on a collective subset of related events as a single unit of work. For example, suppose multiple space events from various targets indicate that you are running low on space. You can work on each of those events separately even though they represent the same issue, or you can work on one incident that contains all of the space-related events. Instead of managing numerous discrete events, you can more efficiently manage a smaller set of incidents.

An incident is a significant event or set of related significant events that need to be managed because it can potentially impact your business applications. These incidents typically need to be tracked, assigned to appropriate personnel, and resolved as quickly as possible. You perform these incident management operations through Incident Manager, an intuitive UI within Enterprise Manager.

Incident Manager provides you with a central location from which to view, manage, diagnose and resolve incidents as well as identify, resolve and eliminate the root cause of disruptions. See Section 3.2.5, "Incident Manager" for more information about this UI.

3.2.2.1 Working with Incidents

When an incident is created, Enterprise Manager makes available a rich set of incident management workflow features that let you manage and track the incident through its complete lifecycle.

  • Assign incident ownership.

  • Track the incident resolution status.

  • Set incident priority.

  • Set incident escalation level.

  • Provide a manual summary.

  • Add user comments.

  • Suppress or unsuppress the incident.

  • Manually clear the incident.

  • Create a ticket manually.

All incident management/tracking operations are carried out from Incident Manager. Creation of incidents for events, assignment of incidents to administrators, setting priority, sending notifications and other actions can be automated using (incident) rules.

Incident Status

The lifecycle of an incident within an organization is typically determined by two pieces of information: the current resolution state of the incident (Incident Status) and how important it is to resolve the incident relative to other incidents (Priority). The following status options are available:

  • New

  • Work in Progress

  • Closed

  • Resolved

You can define additional statuses if the default options are not adequate. In addition, you can change labels using the Enterprise Manager Command Line Interface (EM CLI). See Advanced Topics for more information.

Priority

By changing the priority, you can escalate the incident and perform operations such as assigning it to a specific IT operator or notifying upper-management. The following priority options are available:

  • None

  • Low

  • Medium

  • High

  • Very High

  • Urgent

Priority is often based on simple business rules determined by the business impact and the urgency of resolution.

Incident Attributes

Every incident possesses attributes that provide information such as identification, status for tracking, and ownership. The following table lists available incident attributes.

Incident Attribute Definition
Escalated An escalation level that raises the level of attention on the incident within your organization's IT or management hierarchy.

Available escalation levels:

  • None (Not escalated)

  • Level 1 through Level 5

Category Operational or organizational classification for an incident. Incidents (and events) can have multiple categories.

Categories for all events within an incident are aggregated.

Available Categories:

  • Availability

  • Business

  • Capacity

  • Configuration

  • Diagnostics

  • Error

  • Fault

  • Jobs

  • Load

  • Performance

  • Security

Summary An intuitive message indicating what the incident is about. By default, the incident summary is pulled from the message of the last event of the incident. However, any administrator working on the incident can change this message to a fixed summary.
Incident Created Date and time the incident was created.
Last Updated Date and time the incident was last updated or when the incident was closed.
Severity Severity is based on the worst severity of the events in the incident. For example, Fatal, Warning, or Critical.
Source Source entities of the incident.
Priority Priority Values
  • None (Default)

  • Low

  • Medium

  • High

  • Very High

  • Urgent

Status Incident Status.
  • New (Default)

  • Work in Progress

  • Closed (Terminal state when the incident is closed.)

  • Resolved

You can define additional statuses if the default options are not adequate. In addition, you can change labels using the Enterprise Manager Command Line Interface (EM CLI).

Comment Annotations added by an administrator to communicate analysis information or actions taken to resolve the incident.
Owner Administrator/user currently working on the incident.
Acknowledged Indicates that a user has accepted ownership of an incident or problem. Available options: Yes or No.

When an incident is acknowledged, it will be implicitly assigned to the user who acknowledged it. When a user assigns an incident to himself, it is considered 'acknowledged'. Once acknowledged, an incident cannot be unacknowledged, but can be assigned to another user. Acknowledging an incident stops any repeat notifications for that incident.
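The acknowledgement behavior described for the Acknowledged attribute can be sketched as a small state model. This is a hedged Python illustration, not product code: the class `Incident` and all of its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """Hypothetical model of the Owner and Acknowledged attributes."""
    owner: Optional[str] = None
    acknowledged: bool = False

    def acknowledge(self, user: str) -> None:
        # Acknowledging implicitly assigns the incident to that user.
        self.acknowledged = True
        self.owner = user

    def assign(self, assigned_by: str, assignee: str) -> None:
        # Self-assignment counts as acknowledgement; reassigning to another
        # user never resets an existing acknowledgement.
        self.owner = assignee
        if assigned_by == assignee:
            self.acknowledged = True

    def repeat_notifications_enabled(self) -> bool:
        # Acknowledging an incident stops repeat notifications for it.
        return not self.acknowledged
```

Note that the model deliberately offers no way to unacknowledge an incident, which matches the rule that an acknowledged incident can only be reassigned.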


3.2.2.2 Incident Composed of a Single Event

The simplest incident is composed of a single event. In the following example, you are concerned whenever any production target is down. You can create an incident for the target down event which is raised by Enterprise Manager if it detects the monitored target is down. Once the incident is created, you will have available all incident management functionality required to track and manage its resolution.

Figure 3-2 Incident with a Single Event

graphic illustrates an incident with a single event.

The figure shows how both the incident and event attributes are used to help you manage the incident. From the figure, we see that the database DB1 has gone down and an event of Fatal severity has been raised. When the event is newly generated, there is no ownership or status. An incident is opened that can be updated manually or by automated rules to set owners, status, as well as other attributes. In the example, the owner/administrator Scott is currently working to resolve the issue.

The incident severity is currently Fatal because the incident inherits the worst severity of all the events within the incident. In this case, only one event is associated with the incident, so the severity is Fatal.

3.2.2.3 Incident Composed of Multiple Events

Situations of interest may involve more than a single event. It is an incident's ability to contain multiple events that allows you to monitor and manage complex and more meaningful issues.

Note:

Multi-event incidents are not automatically generated. An administrator must manually create them.

For example, if a monitored system is running out of space, multiple separate events such as tablespace full and filesystem full may be raised. Both, however, are related to running out of space. Another machine resource monitoring example might be the simultaneous raising of CPU utilization, memory utilization, and swap utilization events. Together, these events form an incident indicating extreme load is being placed on a monitored host. The following figure illustrates this example.

Figure 3-3 Incident with Multiple Events

graphic shows an incident with multiple events

Incidents inherit the worst severity of all the events within the incident. The incident summary indicates why this incident should be of interest, in this case, "Machine Load is high". This message is an intuitive indicator for all administrators looking at this incident. By default, the incident summary is pulled from the message of the last event of the incident. However, any administrator working on the incident can change this message.

Because administrators are interested in overall machine load, administrator Sam has manually created an incident for these two metric events because they are related—together these events represent a host overload situation. An administrator needs to take action because memory is filling up and consumed CPU resource is too high. In its current state, this condition will impact any applications running on the host.

3.2.2.4 How are Incidents Created?

Incidents are most commonly created automatically through rules and rule sets, which are user-defined instructions that tell Incident Manager how to handle specific events when they occur. As shown in the preceding examples, incidents can also be created manually. Once an incident is raised, its severity is inherited from the worst severity of all events within the incident. The latest event Message, by default, becomes the Incident Summary.
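The two inheritance rules above (worst event severity, latest event message as the default summary) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Enterprise Manager code: the classes `Event` and `Incident`, the field `fixed_summary`, and the numeric ranks in `SEVERITY_RANK` are all hypothetical. Only the relative order Fatal > Critical > Warning > Advisory follows the severity descriptions in Section 3.2.1.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed ranking: a higher number means a worse severity.
SEVERITY_RANK = {"Advisory": 1, "Warning": 2, "Critical": 3, "Fatal": 4}


@dataclass
class Event:
    message: str
    severity: str


@dataclass
class Incident:
    events: List[Event] = field(default_factory=list)  # oldest first
    fixed_summary: Optional[str] = None                # set manually by an administrator

    @property
    def severity(self) -> str:
        # The incident inherits the worst severity of all its events.
        worst = max(self.events, key=lambda e: SEVERITY_RANK[e.severity])
        return worst.severity

    @property
    def summary(self) -> str:
        # Default: message of the last event, unless a fixed summary was set.
        if self.fixed_summary is not None:
            return self.fixed_summary
        return self.events[-1].message
```

For the host load example, an incident holding a Warning CPU event followed by a Critical memory event reports Critical severity and, until an administrator sets a fixed summary such as "Machine Load is high", shows the memory event's message.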

3.2.3 Problems

For Enterprise Manager 12c, problems focus on the diagnostic incidents and problems stored in the Automatic Diagnostic Repository (ADR), which are automatically raised by Oracle software when it encounters critical errors in the software. To address the root cause of these diagnostic incidents, a problem is created that represents that root cause. A problem is identified by a problem key, which uniquely identifies the particular error in the software. Each occurrence of this error results in a diagnostic incident, which is then associated with the problem object.

When a problem is raised for Oracle software, Oracle has determined that the recommended recourse is to open a service request (SR) and send the diagnostic logs to Oracle Support so that Oracle can eventually provide a solution. As with incidents, Enterprise Manager makes available all tracking, diagnostic, and reporting functions for problem management. Whenever you view all open incidents and problems, whether in Incident Manager or in the context of a target or group home page, you can easily determine what issues are actually affecting your monitored target.

To manage problems, you can use Support Workbench to package the diagnostic details gathered in ADR and open an SR. You should then manage the problems in Incident Manager. Access to Support Workbench functionality is available through Incident Manager (Guided Resolution area) in the context of the problem.

3.2.4 Rule Sets

Incident rules and rule sets automate actions related to events, incidents and problems. They can automate the creation of incidents based on important events, perform notification actions such as sending e-mail or opening helpdesk tickets, or perform operations to manage the incident workflow lifecycle such as changing incident ownership, priority, or escalation level.

With previous versions of Enterprise Manager, you used notification rules to choose the individual targets and conditions for which you want to perform actions or receive notifications (send e-mail, page, open a helpdesk ticket) from Enterprise Manager. For Enterprise Manager 12c, the concept and function of notification rules has been replaced with incident rules and rule sets.

  • Rules: A rule instructs Enterprise Manager to take specific actions when incidents, events, or problems occur, such as performing notifications. Beyond notifications, rules can also instruct Enterprise Manager to perform specific actions, such as creating incidents or updating incidents and problems. The actions can also be conditional in nature. For example, a rule action can be defined to page a user when an incident severity is critical or just send e-mail if it is warning.

  • Rule Set: An incident rule set is a collection of rules that apply to a common set of objects such as targets (hosts, databases, groups), jobs, metric extensions, or self updates, and that take appropriate actions to automate the business processes underlying event, incident, and problem management.

Operationally, individual rules within a rule set are executed in a specified order, as are the rule sets themselves. By default, the execution order for both rules and rule sets is the order in which they are created, but they can be reordered from the Incident Rules UI.

The following figure shows typical rule set structure and how the individual rules are applied to a heterogeneous group of targets.

Figure 3-4 Rule Set Application

Graphic shows the applications of an incident rule set.

The graphic illustrates a situation where all rules pertaining to a group of targets can be put into a single rule set (this is also a best practice). In the above example, a group named PROD-GROUP, consisting of hosts, databases, and WebLogic servers, exists as part of a company's managed environment. A single rule set is created to manage the group.

In addition to the actual rules contained within a rule set, a rule set possesses the following attributes:

  • Name: A descriptive name for the rule set.

  • Description: Brief description stating the purpose of the rule set.

  • Applies To: Object to which all rules in the rule set apply. Valid rule set objects are targets, jobs, metric extensions, and self update.

  • Owner: The Enterprise Manager user who created the rule set. Rule set owners have the ability to update or delete the rule set and the rules in the rule set.

  • Enabled: Whether or not the rule set is actively being applied.

  • Type: Enterprise or Private. See "Rule Set Types".

3.2.4.1 Out-of-Box Rule Sets

Enterprise Manager provides out-of-box rule sets for incident creation and event clearing based on typical scenarios. Out-of-box rule sets cannot be edited or deleted. As a best practice, you should create your own copies of out-of-box rule sets and then subscribe to the rule set copies rather than subscribing directly to the out-of-box rule sets. Changes to out-of-box rule set definitions and the actions they perform can be made by Oracle at any time and will be applied during patching or software upgrade.

Regular Enterprise Manager administrators are allowed to perform the following operations on rule sets:

  • Subscribe

  • Unsubscribe

Note:

Even though administrators can subscribe to a rule set, they will only receive notification from the targets for which they have at least the View Target privilege.

Enterprise Manager Super Administrators have the added ability to reorder the rule sets. Enterprise rule sets are evaluated sequentially. For example, when a new event, incident, or problem arises, the first rule set in the list is checked to see if any of its member rules apply, and the appropriate actions specified in those rules are taken. The second rule set is then checked to see if its rules apply, and so on. Private rule sets are evaluated only after all enterprise rule set evaluations are complete, and in no particular order.

Important:

Use caution when reordering rule sets as their order defines the event, incident, and problem handling workflow. Reordering rule sets without fully understanding the impact on your system can result in unintended actions being taken on incoming events, incidents, and problems.
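The evaluation order described above can be sketched as follows. This Python example is illustrative only: the class `RuleSet`, its `position` field, and the function `evaluation_order` are hypothetical names. Because private rule sets are evaluated in no particular order, the sketch simply keeps them in input order, which is one arbitrary choice among many.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleSet:
    name: str
    kind: str      # "Enterprise" or "Private"
    position: int  # hypothetical field: place in the enterprise ordering


def evaluation_order(rule_sets: List[RuleSet]) -> List[str]:
    """Enterprise rule sets first, in their configured order; private ones last."""
    enterprise = sorted(
        (rs for rs in rule_sets if rs.kind == "Enterprise"),
        key=lambda rs: rs.position,
    )
    # Private rule sets have no defined order; input order is kept here.
    private = [rs for rs in rule_sets if rs.kind == "Private"]
    return [rs.name for rs in enterprise + private]
```

Swapping the `position` values of two enterprise rule sets changes which rules see an incoming event first, which is why reordering deserves the caution noted above.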

3.2.4.2 Rule Set Types

There are two types of Rule Sets:

  • Enterprise: Used to implement all operational practices within your IT organization. All supported actions are available for this type of rule set. However, because this type of rule set can perform all actions, there are restrictions as to who can create an enterprise rule set.

    In order to create or edit an enterprise rule set, an administrator must have been granted the Create Enterprise Rule Set privilege on the Enterprise Rule Set resource. However, if the rule set owner loses the Create Enterprise Rule Set system privilege at some future time, he can still edit or delete the rule set. Super Administrators can edit or delete any rule set. If the originator of the rule set wants other administrators to edit the rule set, he will need to share access in order to work collaboratively by adding co-authors. Enterprise rule sets are visible to all administrators.

  • Private: Used when an administrator wants to be notified about something he is monitoring but not as a standard business practice. The only action a private rule set can perform is to send e-mail to the rule set owner. Any administrator can create a private rule set regardless of whether they have been granted the Create Enterprise Rule Set resource privilege. Oracle recommends that private rule sets be used only in rare or exceptional situations.
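The difference between the two rule set types can be summarized as two checks: who may create a rule set, and which actions it may contain. The Python sketch below is a simplification under stated assumptions. The function names and the action label `email_owner` are hypothetical, and the sketch ignores Super Administrators and co-authors.

```python
from typing import Iterable, Set

# Privilege name as given in the text above.
ENTERPRISE_PRIVILEGE = "Create Enterprise Rule Set"
# Hypothetical label for the only action a private rule set can perform.
PRIVATE_ALLOWED_ACTIONS = frozenset({"email_owner"})


def can_create(kind: str, privileges: Set[str]) -> bool:
    """Enterprise rule sets need the privilege; any administrator can create private ones."""
    if kind == "Enterprise":
        return ENTERPRISE_PRIVILEGE in privileges
    return kind == "Private"


def actions_allowed(kind: str, actions: Iterable[str]) -> bool:
    """Private rule sets may only send e-mail to the rule set owner."""
    if kind == "Private":
        return set(actions) <= PRIVATE_ALLOWED_ACTIONS
    return True
```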

When a rule set performs actions, the privileges of the rule set creator are used. For example, a rule set owner/creator must have at least View Target privilege in order to receive notifications and at least Manage Target Events privilege in order to update the incident. The exception is when a rule set sends a notification. In this case, the privileges of the user it is sent to are used.

3.2.4.3 Rules

Rules are instructions within a rule set that automate actions on incoming events or incidents or problems. Because rules operate on incoming incidents/events/problems, if you create a new rule, it will not act retroactively on incidents/events/problems that have already occurred.

Every rule is composed of two parts:

  • Criteria: The events/incidents/problems on which the rule applies.

  • Action(s): The ordered set of one or more operations on the specified events, incidents, or problems. Each action can be executed based on additional conditions.

The following table shows how rule criteria and actions determine rule application. In this rule operation example there are three rules which take actions on selected events and incidents. Within a rule set, rules are executed in a specified order. The rule execution order can be changed at any time. By default, rules are executed in the order they are created.

Table 3-2 Rule Operation

Rule Name | Execution Order | Criteria | Action: Condition | Action: Actions
Rule 1 | First | CPU Util(%), Tablespace Used(%) metric alert events of warning or critical severity | (none) | Create incident.
Rule 2 | Second | Incidents of warning or critical severity | If severity = critical | Notify by page
Rule 2 | Second | Incidents of warning or critical severity | If severity = warning | Notify by e-mail
Rule 3 | Third | Incidents are unacknowledged for more than six hours | (none) | Set escalation level to 1


In the rule operation example, Rule 1 applies to two metric alert events: CPU Utilization and Tablespace Used. Whenever these events reach either Warning or Critical severity threshold levels, an incident is created.

When the incident severity level (the incident severity is inherited from the worst event severity) reaches Warning, Rule 2 is applied according to its warning condition and Enterprise Manager sends an e-mail to the administrator. If the incident severity level reaches Critical, Rule 2's critical condition is applied and Enterprise Manager sends a page to the administrator.

If the incident remains unacknowledged for more than six hours, Rule 3 applies and the incident escalation level is increased from None to Level 1.
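The three rules in the rule operation example can be expressed as plain functions to make their criteria and conditional actions concrete. This Python sketch is not how Enterprise Manager implements rules. All function and parameter names are hypothetical, and measuring the unacknowledged time from the incident creation time is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

RULE1_METRICS = {"CPU Util(%)", "Tablespace Used(%)"}
RULE_SEVERITIES = {"Warning", "Critical"}


def rule1_creates_incident(event_type: str, metric: str, severity: str) -> bool:
    """Rule 1: create an incident for the two metric alerts at Warning or Critical."""
    return (
        event_type == "Metric Alert"
        and metric in RULE1_METRICS
        and severity in RULE_SEVERITIES
    )


def rule2_notification(incident_severity: str) -> Optional[str]:
    """Rule 2: page on Critical, e-mail on Warning, otherwise no notification."""
    if incident_severity == "Critical":
        return "page"
    if incident_severity == "Warning":
        return "e-mail"
    return None


def rule3_escalation(acknowledged: bool, created: datetime,
                     now: datetime, current_level: str) -> str:
    """Rule 3: escalate to Level 1 when unacknowledged for more than six hours.

    Assumption: the unacknowledged period starts at incident creation.
    """
    if not acknowledged and now - created > timedelta(hours=6):
        return "Level 1"
    return current_level
```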

3.2.4.3.1 Rule Application

Each rule within a rule set applies to an event, incident, or problem. For each of these, you can choose rule application criteria such as:

  • Apply the rule to incoming events or updated events only.

  • Apply the rule to critical events only.

Rules are applied to events, incidents, and problems according to criteria selected at the time of rule creation (or update). The following situations illustrate the methodology used to apply rules.

  • If one of the rules creates a new incident in response to an incoming event, Enterprise Manager finishes matching the event to any further rules/rule sets. Once completed, Enterprise Manager then matches the newly created incident to all the rule sets from the beginning to see if any incident-specific rules match.

  • If an incoming event is already associated with an incident (for example, a Warning event creates an incident and then a Critical event is generated for the same issue), Enterprise Manager applies all the matching rules to the event and then matches all rules to the incident.

  • If, while applying a rule to an incident, changes are made to the incident (a priority change, for example), Enterprise Manager stops rule application at that point and then re-applies the rules to the incident from the beginning. The conditional action that updated the incident will not be matched again in the same rule application cycle.
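
The restart behavior described in the last situation can be sketched as a loop. This is a simplified Python model under stated assumptions, not the Enterprise Manager implementation: rules are represented as hypothetical (name, matches, action) tuples, an action returns True when it updates the incident, and the priority values are illustrative. The sketch does not suppress repeated notifications when rules are re-applied.

```python
def apply_incident_rules(incident, rules):
    """Apply rules in order, restarting whenever a rule updates the incident.

    rules: list of (name, matches, action) tuples. matches(incident) returns
    a bool; action(incident) returns True if it changed the incident.
    Returns the names of the rules that ran, in the order they ran.
    """
    fired = set()   # actions that already updated the incident in this cycle
    trace = []
    restart = True
    while restart:
        restart = False
        for name, matches, action in rules:
            # An action that updated the incident is not matched again.
            if name in fired or not matches(incident):
                continue
            trace.append(name)
            if action(incident):
                fired.add(name)
                restart = True   # stop here and re-apply from the beginning
                break
    return trace


def raise_priority(incident):
    incident["priority"] = "High"
    return True     # the incident was updated


def notify(incident):
    incident["notified"] = True
    return False    # a notification does not update the incident


# Hypothetical rules: the notification rule is ordered first.
RULES = [
    ("notify-high-priority", lambda i: i["priority"] == "High", notify),
    ("prioritize-critical", lambda i: i["severity"] == "Critical", raise_priority),
]
```

Because rule application restarts after the priority change, the notification rule in this sketch still runs even though it is ordered before the rule that sets the priority. The loop always terminates, because each restart permanently retires one action for the current cycle.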

3.2.4.3.2 Rule Criteria

The following tables list selectable criteria for each type.

Table 3-3 Rule Criteria: Events

Criteria Description

Type

Rule applies to a specific event type.

Severity

Rule applies to a specific event severity.

Category

Rule applies to a specific event category.

Target type

Rule applies to a specific target type.

Target Lifecycle Status

Rule applies to a specific lifecycle status for a target. Lifecycle status is a target property that specifies a target's operational status.

Associated with incident

Typically, events are associated with incidents through rules. Specify Yes or No.

Event name

Rule applies to events with a specific name. The specified name can either be an exact match or a pattern match.

Root cause analysis result

Upon completion of Root Cause Analysis (RCA), the rule applies to an event that is marked as either a root cause or a symptom. Alternatively, the rule can act on an RCA event when it is no longer a symptom.

Associated incident acknowledged

Rule applies to an event that is associated with a specific incident when that incident is acknowledged by an administrator. Specify Yes or No.

Total occurrence count

For duplicated events, the rule applies when the total number of event occurrences reaches a specified number.

Comment added

Rule applies to events where an administrator adds a comment.


For incidents, a rule can apply to all new and/or updated incidents, or newly created incidents that match specific criteria shown in the following table.

Table 3-4 Rule Criteria: Incidents

Criteria Description

Rules that created the incident

Rule applies to incidents raised by a specific rule.

Category

Rule applies to a specific incident category.

Target Type

Rule applies to a specific target type.

Target Lifecycle Status

Rule applies to a specific lifecycle status for a target. Lifecycle status is a target property that specifies a target's operational status.

Severity

Rule applies to a specific incident severity.

Acknowledged

Rule applies if the incident has been acknowledged by an administrator. Specify Yes or No.

Owner

Rule applies for a specified incident owner.

Priority

Rule applies when incident priority matches a selected priority.

Status

Rule applies when the incident status matches a selected incident status.

Escalation Level

Rule applies when the incident escalation level matches the selected level. Available escalation levels: None, Level 1, Level 2, Level 3, Level 4, Level 5

Associated with Ticket

Rule applies when the incident is associated with a helpdesk ticket. Specify Yes or No.

Associated with Service Request

Rule applies when the incident is associated with a service request. Specify Yes or No.

Diagnostic Incident

Rule applies when the incident is a diagnostic incident. Specify Yes or No.

Unassigned

Rule applies if the newly raised incident does not have an owner.

Comment Added

Rule applies if an administrator adds a comment to the incident.


For problems, a rule can apply to all new and/or updated problems, or newly created problems that match specific criteria shown in the following table.

Table 3-5 Rule Criteria: Problems

Criteria Description

Problem key

Each problem has a problem key, which is a text string that describes the problem. It includes an error code (such as ORA 600) and in some cases, one or more error parameters.

Rule can apply to a specific problem key or a key matching a specific pattern (using a wildcard character).

Category

Rule applies to a specific problem category.

Target Type

Rule applies to a specific target type.

Target Lifecycle Status

Rule applies to a specific lifecycle status for a target. Lifecycle status is a target property that specifies a target's operational status.

Acknowledged

Rule applies when the problem is acknowledged.

Owner

Rule applies for a specified problem owner.

Priority

Rule applies when problem priority matches a selected priority.

Status

Rule applies when the problem matches a specific status.

Escalation Level

Rule applies when the problem escalation level matches the selected level. Available escalation levels: None, Level 1, Level 2, Level 3, Level 4, Level 5

Incident Count

Rule applies when the number of incidents related to the problem reaches the specified count limit. The problem owner and the Operations manager are notified via e-mail.

Associated with Service Request

Rule applies if the incoming problem has an associated Service Request. Specify Yes or No.

Associated with Bug

Rule applies if the incoming problem has an associated bug. Specify Yes or No.

Unassigned

Rule applies if the newly raised problem does not have an owner.

Comment Added

Rule applies if an administrator adds a comment to the problem.
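
The pattern matching mentioned for problem keys (and for event names) can be illustrated with a short sketch. The wildcard syntax accepted by Enterprise Manager is not specified here, so this Python example assumes a shell-style * wildcard and uses the standard fnmatch module; the function name and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase


def key_matches(problem_key, pattern):
    """Return True if the key equals the pattern or matches its wildcard.

    Assumption: '*' matches any run of characters (shell-style). The
    wildcard character used by Enterprise Manager may differ.
    """
    return fnmatchcase(problem_key, pattern)


# Illustrative problem keys only; real keys come from the managed targets.
sample_keys = ["ORA 600 [4194]", "ORA 600 [kcbz_check]", "ORA 7445 [memcpy]"]
ora_600_keys = [k for k in sample_keys if key_matches(k, "ORA 600*")]
```

An exact match is simply a pattern without a wildcard character.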


3.2.4.3.3 Rule Actions

For each rule, Enterprise Manager allows you to define specific actions.

Some examples of the types of actions that a rule set can perform are:

  • Create an incident based on an event.

  • Perform notification actions such as sending an e-mail or generating a helpdesk ticket.

  • Perform actions to manage incident workflow, with notification via e-mail, PL/SQL methods, or SNMP traps. For example, if a target down event occurs, create an incident and e-mail administrator Joe about the incident. If the incident is still open after two days, set the escalation level to 1 and e-mail Joe's manager.

The following table summarizes available actions for each rule application.

Table 3-6 Available Rule Actions

Action | Event | Incident | Problem
E-mail | Yes | Yes | Yes
Page | Yes | Yes | Yes
Advanced Notifications: | | |
  Send SNMP Trap | Yes | No | No
  Run OS Command | Yes | Yes | Yes
  Run PL/SQL Procedure | Yes | Yes | Yes
Create an Incident | Yes | No | No
Set Workflow Attributes | Yes (see Note 1) | Yes | Yes
Create a Helpdesk Ticket | Yes (see Note 2) | Yes | No

Note 1: Within an event rule, the workflow attributes of the associated incident can also be updated.

Note 2: Action performed indirectly by first creating an incident and then creating a ticket for the incident.
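
When planning rules, it can help to treat Table 3-6 as a lookup. The following Python sketch encodes the table as a dictionary; the names AVAILABLE_ACTIONS and is_available are hypothetical and are not part of any Enterprise Manager interface.

```python
# Availability of each action by rule type, transcribed from Table 3-6.
AVAILABLE_ACTIONS = {
    "E-mail":                   {"event": True, "incident": True,  "problem": True},
    "Page":                     {"event": True, "incident": True,  "problem": True},
    "Send SNMP Trap":           {"event": True, "incident": False, "problem": False},
    "Run OS Command":           {"event": True, "incident": True,  "problem": True},
    "Run PL/SQL Procedure":     {"event": True, "incident": True,  "problem": True},
    "Create an Incident":       {"event": True, "incident": False, "problem": False},
    "Set Workflow Attributes":  {"event": True, "incident": True,  "problem": True},
    "Create a Helpdesk Ticket": {"event": True, "incident": True,  "problem": False},
}


def is_available(action, rule_type):
    """Return True if the action can be used in a rule of the given type."""
    return AVAILABLE_ACTIONS[action][rule_type]


def actions_for(rule_type):
    """List every action available to rules of the given type."""
    return [name for name, by_type in AVAILABLE_ACTIONS.items() if by_type[rule_type]]
```

For example, actions_for("problem") omits Send SNMP Trap, Create an Incident, and Create a Helpdesk Ticket, which matches the No entries in the Problem column.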


3.2.5 Incident Manager

Incident Manager provides, in one location, the ability to search, view, manage, and resolve incidents and problems impacting your environment. Use Incident Manager to perform the following tasks:

  • Filter incidents, problems, and events by using custom views

  • Search for specific incidents by properties such as target name, summary, status, or target lifecycle status

  • Respond and work on an incident

  • Manage incident lifecycle including assigning, acknowledging, tracking its status, prioritization, and escalation

  • Access (in context) My Oracle Support knowledge base articles and other Oracle documentation to help resolve the incident.

  • Access direct in-context diagnostic/action links to relevant Enterprise Manager functionality allowing you to quickly diagnose or resolve the incident.

Figure 3-5 Incident Manager

graphic shows the incident manager console.

For example, you have an open incident. You can use Incident Manager to track its ownership, its resolution status, set the priority and, if necessary, add annotations to the incident to share information with others when working in a collaborative environment. In addition, you have direct access to pertinent information from MOS and links to other areas of Enterprise Manager that will help you resolve issues quickly. By drilling down on an open incident, you can access this information and modify it accordingly.

Displaying Target Information in the Context of an Incident

You can directly view information about a target for which an incident or event has been raised. The type of information shown varies depending on the target type.

To display in-context target information:

  1. From the Enterprise menu, select Monitoring and then Incident Manager.

  2. From the Incident Manager UI, choose an incident. Information pertaining to the incident displays.

  3. From the Incident Details area of the General tab, click the information icon "i" next to the target. Target information as it pertains to the incident displays. See Figure 3-6.

Figure 3-6 Target Information in Context of an Incident

target properties in context of an incident

Being able to display target information in this way provides you with more operational context about the targets on which the events and incidents are raised. This in turn helps you manage the lifecycle of the incident more efficiently.

Cloud Control Mobile

Also available is the mobile application Cloud Control Mobile, which lets you manage incidents and problems on the go using any iDevice to remotely connect to Enterprise Manager.

Figure 3-7 Cloud Control Mobile

Cloud Control Mobile

For more information about this mobile application, see Chapter 15, "Remote Access To Enterprise Manager".

3.2.5.1 Views

Views let you work efficiently with incidents by allowing you to categorize and focus on only those incidents of interest. A view is a set of search criteria for filtering incidents and problems in the system. Incident Manager provides a set of predefined standard views that cover the most common event, incident, and problem search scenarios. In addition, Incident Manager also allows you to create your own custom views. Custom views cannot be shared with other users. For instructions on creating custom views, see Setting Up Custom Views.

3.3 Setting Up Your Incident Management Environment

Before you can monitor and manage your environment using incidents, you must ensure that your monitoring environment is properly configured. Proper configuration consists of the following:

3.3.1 Setting Up Your Monitoring Infrastructure

The first step in setting up your monitoring infrastructure is to determine which conditions need to be monitored and hence are the source of events. To prevent an inordinate number of extraneous events from being generated, thus reducing system and administrator overhead, you need to determine what is of interest to you and enable monitoring based on your requirements. You can leverage Enterprise Manager features such as Administration Groups to automatically apply management settings such as monitoring settings or compliance standards when new targets are added to your monitored environment. This greatly simplifies the task of ensuring that events are raised only for those conditions in which you are interested. For more information, see Chapter 6, "Using Administration Groups".

Example: You want to ensure that the database containing your human resource information is available round the clock. One condition you are monitoring for is whether that database target is up or down. If it goes down, you want the appropriate person to be notified and have them resolve the problem as quickly as possible. Other conditions that you may want to monitor include performance threshold violations, any changes in application configuration files, or job failures. Working with events, you are monitoring and managing individual targets and issues directly related to those targets. For example, you monitor for individual database availability, individual host threshold violations such as CPU and I/O load, or perhaps the performance of a Web service.

In general, if you are primarily interested in availability and some key performance-related metrics, you should use default monitoring templates and other template features to ensure that only those specific metrics are collected and that events are raised only for those metrics.

Job Events: The status of a job can change throughout its lifecycle - from the time it is submitted to the time it has executed. For each of these job statuses, events can be raised to notify administrators of the status of the job.

As a general rule, events should be generated only for job status values that require administrator attention. These job status values include Action Required and Problem status values such as Failed or Stopped. However, in order to avoid overloading the system with unnecessary events, job events are not enabled for any target by default. Hence, if you would like to generate events for jobs, you must:

  1. Set the appropriate job status. You can use the default settings or modify them as required.

  2. Specify the set of targets for which you would like job-related events to be generated.

    You can perform these operations from the Job Event Generation Criteria page. From the Setup menu, choose Incidents and then Job Events.
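
The two settings above act together as a filter: an event is raised only when the job status is one of the selected status values and the job runs against one of the selected targets. A minimal Python sketch of that logic follows; the function and parameter names are hypothetical, and the target name HRDB is an illustrative placeholder.

```python
def should_raise_job_event(job_status, target, enabled_statuses, enabled_targets):
    """Return True only if both the status and the target are enabled."""
    return job_status in enabled_statuses and target in enabled_targets


# Status values that require administrator attention, as named in the text.
statuses = {"Action Required", "Failed", "Stopped"}

# By default no targets are enabled, so no job events are generated.
default_targets = set()
before = should_raise_job_event("Failed", "HRDB", statuses, default_targets)

# After the administrator adds a target, failures on it raise events.
after = should_raise_job_event("Failed", "HRDB", statuses, {"HRDB"})
```

The sketch reflects why both steps are required: selecting job status values alone has no effect until at least one target is specified.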

3.3.1.1 Rule Set Development

Before creating incident rules/rule sets, the first step is to strategically determine when incidents should be created based on the business requirements of your organization. Important questions to consider are:

  1. What events should create incidents? Which service disruptions need to be tracked and resolved by IT administrators?

  2. Which administrators should be notified for incoming events or incidents?

  3. Are any of the events or incidents being forwarded to external systems (such as a helpdesk ticketing system)?

Once the exact business requirements are understood, you translate those into enterprise rule sets. Adhering to the following guidelines will result in efficient use of system resources as well as operational efficiency.

  • For rule sets that operate on targets (for example, hosts and databases), use groups to consolidate targets into a smaller number of monitoring entities for the rule set. Groups should be composed of targets that have similar monitoring requirements including incident management and response.

  • All the rules that apply to the same groups of targets should be consolidated into one rule set. You can create multiple rules that apply to the targets in the rule set. You can create rules for events specific to an event class, rules that apply to events of a specific event class and target type, or rules that apply to incidents on these targets.

  • Leverage the execution order of rules within the rule set. Rule sets and rules within a rule set are executed in sequential order. Therefore, ensure that rules and rule sets are sequenced with that in mind.

When creating a new rule, you choose the type of object to which the rule applies: events, incidents, or problems. Use the following rule usage guidelines to help guide your selection.

Table 3-7 Rule Usage Guidelines

Rule Usage Application

Rules on Event

To create incidents for the events managed in Enterprise Manager.

To send notifications on events.

To create tickets for incidents managed by helpdesk analysts. To do this, create an incident for an event, then create a ticket for the incident.

To send events to third-party management systems.

Rules on Incidents

Automate management of incident workflow operations (assign owner, set priority, set escalation levels, and so on) and send notifications.

Create tickets based on incident conditions. For example, create a ticket if the incident is escalated to level 2.

Rules on Problems

Automate management of problem workflow operations (assign owner, set priority, set escalation levels, and so on) and send notifications.


Rule Set Example

The following example illustrates many of the implementation guidelines just discussed. All targets have been consolidated into a single group, all rules that apply to group members are part of the same rule set, and the execution order of the rules has been set. In this example, the rule set applies to a group (Production Group G) that consists of the following targets:

  • DB1 (database)

  • Host1 (host)

  • WLS1 (WebLogic Server)

Together, the rules in the rule set perform three types of actions: incident creation, notification, and escalation.

graphic shows an example rule set containing 3 rules.

In a more detailed view of the rule set, we can see how the guidelines have been followed.

graphic shows a detailed view of the rule set where actual rules have been added.

In this detailed view, there are five rules that apply to all group members. The execution sequence of the rules (rule 1 - rule 5) has been leveraged to correspond to the three types of rule actions in the rule set:

  • Rules 1-3: Incident Creation

  • Rule 4: Notification

  • Rule 5: Escalation

By synchronizing rule execution order with the progression of rule action categories, execution efficiency is achieved. As shown in this example, by using conditional actions that take different actions for the same set of events based on severity, it is easier to change the event selection criteria in the future without having to change multiple rules. Note: This assumes that the action requirements for all incidents (from rules 1 - 3) are the same.

The following table illustrates explicit rule set operation for this example.

Table 3-8 Example Rule Set for Production Group G

Rule Set: Targets within Production Group G

Rule Name | Execution Order | Criteria | Condition | Actions
Rule 1 | First | DB1 goes down. Host1 goes down. WLS1 goes down. | (none) | Create incident.
Rule 2 | Second | DB1: Tablespace Full (%); Host1: CPU Utilization (%); WLS1: Heap Usage (%) | If severity=Warning; If severity=Critical | Create incident.
Rule 3 | Third | Event generated for problem job status changes for DB1, Host1, and WLS1. | (none) | Create incident.
Rule 4 | Fourth | All incidents for Production Group G | Severity=Warning | Send e-mail
       |        |                                      | Severity=Critical | Send page
Rule 5 | Fifth | Incident remains open for more than 12 days. | Status=Fatal | Increase escalation level to 1.

Note (Rule 2): The warning and critical thresholds are defined in Metric and Policy settings, not from the rules UI.


3.3.1.1.1 Before Using Rules

Before you use rules, ensure the following prerequisites have been set up:

  • The user's Enterprise Manager account has notification preferences set up (e-mail address and schedule). This is required not just for the administrator who is creating or editing a rule, but also for any user who is notified as a result of the rule action.

  • If you decide to use connectors, tickets, or advanced notifications, you need to configure them before using them in the actions page.

  • Ensure that the SMTP gateway has been properly configured to send e-mail notifications.

  • The user's Enterprise Manager account has been granted the appropriate privileges to manage incidents on the managed system.

3.3.2 Setting Up Notifications

After determining which events should be raised for your monitoring environment, you need to establish a comprehensive notification infrastructure for your enterprise by configuring Enterprise Manager to send out e-mail and/or pages, setting up e-mail addresses for administrators, and tagging each address for e-mail or paging. In addition, depending on the needs of your organization, notification setup may involve configuring advanced notification methods such as OS scripts, PL/SQL procedures, or SNMP traps. For detailed information and setup instructions for Enterprise Manager notifications, see Chapter 4, "Using Notifications".

3.3.3 Setting Up Administrators and Privileges

This step involves defining the appropriate administrators (which includes assigning the proper privileges for security) and then setting up notification assignments based on their defined roles and domain ownership within your organization.

To perform user account administration, click Setup on the Enterprise Manager home page, select Security, then select Administrators to access the Administrators page.

Graphic displays the administrators page.

There are two types of administrators typically involved in incident management.

  • Business Rules Architect/Analyst: Administrator who has a deep understanding of how the business works and translates this knowledge to operational rules. Once these rules have been deployed, the business architect uses their knowledge of the dynamic organization to keep these rules up-to-date.

    In order to create or edit an enterprise rule set, the business architect/analyst must have been granted the Create Enterprise Rule Set privilege on the Enterprise Rule Set resource. The architect/analyst can share ownership of the rule sets with other administrators who may or may not have the Create Enterprise Rule Set privilege but are responsible for managing a specific rule set.

  • IT Operator/Manager: The IT manager is responsible for day-to-day management of incident assignment. The IT operator is assigned the incidents and is responsible for their resolution.

Privileges Required for Enterprise Rule Sets

As the owner of the rule set, an administrator can perform the following:

  • Update or delete the rule set, and add, modify, or delete the rules in the rule set.

  • Assign co-authors of the rule set. Co-authors can edit the rule set the same as the author. However, they cannot delete rule sets nor can they add additional co-authors.

  • When a rule action is to update an event, incident, or problem (for example, change priority or clear an event), the action succeeds only if the owner has the privilege to take that action on the respective event, incident, or problem.

  • Additionally, the user must be granted the privilege to create an enterprise rule set.

If an incident or problem rule has an update action (for example, change priority), it will take the action only if the owner of the respective rule set has manage privilege on the matching incident or problem.

To grant privileges, from the Setup menu on the Enterprise Manager home page, select Security, then select Administrators to access the Administrators page. Select an administrator from the list, then click Edit to access the Administrator properties wizard as shown in the following graphic.

graphic shows the administrator edit wizard.

Granting User Privileges for Events, Incidents and Problems

In order to work with incidents, all relevant Enterprise Manager administrator accounts must be granted the appropriate privileges to manage incidents. Privileges for events, incidents, and problems are determined according to the following rules:

  • Privileges on events are calculated based on the privilege on the underlying source objects. For example, a user has VIEW privilege on an event if that user can view the target for the event.

  • Privileges on an incident are calculated based on the privileges on the events in the incident.

  • Similarly, problem privileges are calculated based on privileges on underlying incidents.

Users are granted privileges for events, incidents, and problems in the following situations.

For events, two privileges are defined in the system:

  • The View Event privilege allows you to view an event and add comments to the event.

  • The Manage Event privilege allows you to take update actions on an event such as closing an event, creating an incident for an event, and creating a ticket for an event. You can also associate an event with an incident.

Important:

Incident privilege is inherited from the underlying events.

If an event is raised on a target alone (most event types, such as metric alerts, availability events, and service level agreement events, are raised on targets), you will need the following privileges:

  • View on target to view the event.

  • Manage Target Events to manage the event.

    Note: This is a sub-privilege of Operator.

If an event is raised on both a target and a job, you will need the following privileges:

  • View on target and View on the job to view the event.

  • View on target and Full on the job to manage the event.

If the event is raised on a job alone, you will need the following privileges:

  • View on the job to view the event.

  • Full on the job to manage the event.
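
The three cases above (target alone, target and job, job alone) can be summarized as a small decision function. This Python sketch is illustrative only: the function name and the use of plain sets of privilege names are assumptions, and the sketch does not model privilege hierarchies, so each privilege must be listed explicitly.

```python
def event_privileges(target_privs=None, job_privs=None):
    """Return (can_view, can_manage) for an event.

    Pass a set of privilege names for each source object the event is
    raised on, or None if the event is not raised on that kind of object.
    """
    if target_privs is not None and job_privs is not None:
        # Event raised on both a target and a job.
        can_view = "View" in target_privs and "View" in job_privs
        can_manage = "View" in target_privs and "Full" in job_privs
    elif target_privs is not None:
        # Event raised on a target alone.
        can_view = "View" in target_privs
        can_manage = "Manage Target Events" in target_privs
    elif job_privs is not None:
        # Event raised on a job alone.
        can_view = "View" in job_privs
        can_manage = "Full" in job_privs
    else:
        raise ValueError("event must be raised on a target, a job, or both")
    return can_view, can_manage
```

For instance, an administrator with only View on the target can view a target-only event but cannot manage it.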

If an event is raised on a metric extension, you will need View privilege on the metric extension to view the event. Because events raised on metric extensions are informational (and do not appear in Incident Manager), event management privileges do not apply in this situation.

If an event is raised on a Self-update, only system privilege is required. Self-update events are strictly informational.

For incidents, two privileges are defined in the system:

  • The View Incident privilege allows you to view an incident, and add comments to the incident.

  • The Manage Incident privilege allows you to take update actions on an incident. The update actions supported for an incident include incident assignment and prioritization, resolution management, manually closing events, and creating tickets for incidents.

If an incident consists of a single event, you can view the incident if you can view the event and manage the incident if you can manage the event.

If an incident consists of more than one event, you can view the incident if you can view at least one of the events, and you can manage the incident if you can manage at least one of the events.
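
Because incident privileges are derived from the member events, the calculation reduces to an "at least one" test. The following Python sketch shows this under the assumption that each event's privileges are already known as a (can_view, can_manage) pair; the function name is hypothetical.

```python
def incident_privileges(event_privs):
    """Derive (can_view, can_manage) for an incident from its events.

    event_privs: one (can_view, can_manage) pair per event in the incident.
    """
    can_view = any(view for view, _ in event_privs)
    can_manage = any(manage for _, manage in event_privs)
    return can_view, can_manage
```

The single-event case follows the same rule: with one pair in the list, the incident privileges equal the privileges on that event.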

For problems, two privileges are defined:

  • The View Problem privilege allows you to view a problem and add comments to the problem.

  • The Manage Problem privilege allows you to take update actions on the problem. The update actions supported for a problem include problem assignment and prioritization, resolution management, and manually closing the problem.

In Enterprise Manager 12c, problems are always related to a single target. Therefore, an administrator has the View Problem privilege if the administrator has View privilege on the target, and the Manage Problem privilege if the administrator has the manage_target_events privilege on the target. The manage_target_events privilege implicitly grants management privileges on the associated events, which in turn grants management privileges on the incidents within the problem.

3.3.4 Setting Up Rule Sets

Rule sets automate actions in response to incoming events, incidents and problems or updates to them. This section covers the most common tasks and examples.

3.3.4.1 Creating a Rule Set

In general, to create a rule set, perform the following steps:

  1. From the Setup menu, select Incidents then select Incident Rules.

  2. On the Incident Rules - All Enterprise Rules page, edit the existing rule set or create a new rule set. For new rule sets, you will need to first select the targets to which the rules apply. Rules are created in the context of a rule set.

    Note:

    In the case where there is no existing rule set, create a rule set by clicking Create Rule Set... You then create the rule as part of creating the rule set.

    Narrowing Rule Set Scope Based on Target Lifecycle Status

    When creating a new rule set, you can choose to have the rule set apply to a narrower set of targets based on the target's Lifecycle Status value. For example, you can create one rule set that applies only to targets that have a Lifecycle Status of either Staging or Production. As shown in the following graphic, you determine rule set scope by setting the Lifecycle Status filter.

    lifecycle status filter

    Using this filter allows you to create rules for targets based on their Lifecycle Status without having to first create a group containing only such targets.

  3. In the Rules tab of the Edit Rule Set page, click Create... and select the type of rule to create (Event, Incident, Problem) on the Select Type of Rule to Create pop-up dialog. Click Continue.

  4. In the Create New Rule wizard, provide the required information.

  5. Once you have finished defining the rule, click Continue to add the rule to the rule set. Click Save to save the changes made to the rule set.
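
The effect of the Lifecycle Status filter described in step 2 can be pictured as a filter over the targets selected for the rule set. The Python sketch below is illustrative only: the target names, the dictionary layout, and the Test status value are assumptions, not data from an Enterprise Manager repository.

```python
def targets_in_scope(targets, allowed_statuses):
    """Return the names of targets whose lifecycle status is allowed."""
    return [t["name"] for t in targets if t["lifecycle_status"] in allowed_statuses]


# Hypothetical targets and status assignments.
targets = [
    {"name": "DB1", "lifecycle_status": "Production"},
    {"name": "Host1", "lifecycle_status": "Staging"},
    {"name": "WLS1", "lifecycle_status": "Test"},
]

# The rule set applies only to Staging and Production targets.
scope = targets_in_scope(targets, {"Staging", "Production"})
```

In this sketch the filter removes WLS1 from the rule set scope without any need for a separate group that contains only Staging and Production targets.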

3.3.4.2 Creating a Rule to Create an Incident

To create a rule that creates an incident, perform the following steps:

  1. From the Setup menu, select Incidents, then select Incident Rules.

  2. Determine whether there is an existing rule set that contains a rule that manages the event. On the Incident Rules page, use the Search option to find rule sets by rule or rule set name, description, target name, or target type. To locate the rule sets that manage a target, search by the target name or by the name of the group to which the target belongs.

    Note: In the case where there is no existing rule set, create a rule set by clicking Create Rule Set... You then create the rule as part of creating the rule set.

  3. Select the rule set that will contain the new rule. Click Edit... In the Rules tab of the Edit Rule Set page:

    1. Click Create ...

    2. Select "Incoming events and updates to events"

    3. Click Continue.

    Provide the rule details using the Create New Rule wizard.

    1. Select the Event Type the rule will apply to, for example, Metric Alert. (Metric Alert is available for rule sets of the type Targets.) Note: Only one event type can be selected in a single rule and, once selected, it cannot be changed when editing a rule.

      You can then specify metric alerts by selecting Specific Metrics. The table for selecting metric alerts displays. Click the +Add button to launch the metric selector. On the Select Specific Metric Alert page, select the target type, for example, Database Instance. A list of relevant metrics displays. Select the ones in which you are interested. Click OK.

      You also have the option to select the severity and corrective action status.

    2. Once you have provided the initial information, click Next. Click +Add to add the actions to occur when the event is triggered. One of the actions is to Create Incident.

      As part of creating an incident, you can assign the incident to a particular user, set the priority, and create a ticket. Once you have added all the conditional actions, click Continue.

    3. After you have provided all the information on the Add Actions page, click Next to specify the name and description for the rule. Once on the Review page, verify that all the information is correct. Click Back to make corrections; click Continue to return to the Edit (Create) Rule Set page.

    4. Click Save to ensure that the changes to the rule set and rules are saved to the database.

  4. Test the rule by generating a metric alert event on the metrics chosen in the previous steps.

3.3.4.3 Creating a Rule to Manage Escalation of Incidents

To create a rule to manage incident escalation, perform the following steps:

  1. From the Setup menu, select Incidents, then select Incident Rules.

  2. Determine whether there is an existing rule set that contains a rule that manages the incident. You can add it to any of your existing rule sets on incidents.

    Note: In the case where there is no existing rule set, create a rule set by clicking Create Rule Set... You then create the rule as part of creating the rule set.

  3. Select the rule set that will contain the new rule. Click Edit... in the Rules tab of the Edit Rule Set page, and then:

    1. Click Create ...

    2. Select "Newly created incidents or updates to incidents"

    3. Click Continue.

  4. For demonstration purposes, the escalation concerns a production database.

    As per the organization's policy, the DBA manager is notified for escalation level 1 incidents where a fatal incident is open for 48 hours. Similarly, the DBA director is paged if the incident has been escalated to level 2, the severity is fatal and it has been open for 72 hours. If the fatal incident is still open after 96 hours, then it is escalated to level 3 and the operations VP is notified.

    Provide the rule details using the Create New Rule wizard.

    1. To set up the rule to apply to all newly created incidents or when the incident is updated with fatal severity, select the Specific Incidents option and add the condition Severity is Fatal.

    2. In the Conditions for Actions region located on the Add Actions page, select Only execute the actions if specified conditions match.

      Select Incident has been open for some time and is in a particular state (select time and optional expressions).

      Select the time to be 48 hours and Status is not resolved or closed.

    3. In the Notification region, type the name of the administrator to be notified by e-mail or page. Click Continue to save the current set of conditions and actions.

    4. Repeat steps 2 and 3 to page the DBA director (Time in this state is 72 hours, Status is Not Resolved or Closed). Repeat them again for incidents open for more than 96 hours: set the escalation level to 3 and page the operations VP.

    5. After reviewing the added action sets, click Next. Click Next again to go to the Summary screen. Review the summary information and click Continue to save the rule.

  5. Review the sequence of existing enterprise rules and position the newly created rule in the sequence.

    On the Edit Rule Set page, click the desired rule in the Rules table and select Reorder Rules from the Actions menu to reorder rules within the rule set, then click Save to save the rule sequence changes.

Example Scenario

To facilitate the incident escalation process, the administration manager creates a rule to escalate unresolved incidents based on their age:

  • To level 1 if the incident is open for 30 minutes

  • To level 2 if the incident is open for 1 hour

  • To level 3 if the incident is open for 90 minutes

As per the organization's policy, the DBA manager is notified for escalation level 1. Similarly, the DBA director and operations VP are paged for incidents escalated to levels "2" and "3" respectively.

Accordingly, the administration manager enters the above logic and the respective Enterprise Manager administrator IDs in a separate rule to achieve the above notification requirement. Enterprise Manager administrator IDs represent the respective users with the required target privileges and notification preferences (that is, e-mail addresses and schedule).
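
The age-based escalation in this scenario can be sketched as a small lookup. The following Python is only an illustration of the rule logic, not Enterprise Manager code; the function name and the data structures are assumptions, and the thresholds are the ones given in the scenario above.

```python
# Illustrative model of the example scenario's age-based escalation.
# Thresholds (minutes open -> escalation level) come from the scenario;
# all names below are hypothetical and not part of Enterprise Manager.
ESCALATION_THRESHOLDS = [(90, 3), (60, 2), (30, 1)]  # checked highest first

NOTIFY = {1: "DBA manager", 2: "DBA director", 3: "operations VP"}


def escalation_level(minutes_open, resolved=False):
    """Return the escalation level (0 = none) for an incident of this age."""
    if resolved:
        return 0  # the rule only escalates unresolved incidents
    for threshold, level in ESCALATION_THRESHOLDS:
        if minutes_open >= threshold:
            return level
    return 0


if __name__ == "__main__":
    for age in (10, 30, 75, 120):
        level = escalation_level(age)
        print(age, level, NOTIFY.get(level, "nobody"))
```

Checking the highest threshold first means an incident that has been open for two hours is reported at level 3 rather than level 1, which mirrors how the later action sets in the rule supersede the earlier ones.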

3.3.4.4 Creating a Rule to Escalate a Problem

In an organization, whenever an unresolved problem has more than 20 occurrences of associated incidents, the problem should be auto-assigned to the appropriate administrator based on target type of the target on which the problem has been raised.

Accordingly, a problem rule is created to observe the count of incidents attached to the problem and notify the appropriate administrator handling that specific target type.

The problem owner and the Operations manager are notified by e-mail.

To create a rule to escalate a problem, perform the following steps:

  1. Navigate to the Incident Rules page.

    From the Setup menu, select Incidents, then select Incident Rules.

  2. On the Incident Rules - All Enterprise Rules page, either create a new rule set (click Create Rule Set...) or edit an existing rule set (highlight the rule set and click Edit...). Rules are created in the context of a rule set.

    Note: In the case where there is no existing rule set, create a rule set by clicking Create Rule Set... You then create the rule as part of creating the rule set.

  3. In the Rules section of the Edit Rule Set page, select Create...

  4. From the Select Type of Rule to Create dialog, select Newly created problems or updates to problems and click Continue.

  5. On the Create New Rule page, select Specific problems and add the following criteria:

    The Attribute Name is Incident Count, the Operator is Greater than or equals and the Values is 20.

    Click Next.

  6. In the Conditions for Actions region on the Add Actions page, select Always execute the action. Specify the following actions to take when the rule matches the condition:

    • In the Notifications region, send e-mail to the owner of the problem and to the Operations Manager.

    • In the Update Problem region, enter the e-mail address of the appropriate administrator in the Assign to field.

    Click Continue.

  7. Review the rules summary. Make corrections as needed. Click Continue to return to Edit Rule Set page and then click Save to save the rule set.
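
As a rough illustration of what this problem rule evaluates, the following Python sketch models the criterion and the two actions. It is not Enterprise Manager code: the dictionary layout, the administrator names, and the mapping from target type to assignee are all assumptions made for the example. The sketch follows the rule criterion from step 5 (Incident Count greater than or equal to 20).

```python
# Illustrative model of the problem escalation rule described above.
# ASSIGNEE_BY_TARGET_TYPE and the administrator names are hypothetical.
INCIDENT_COUNT_THRESHOLD = 20

ASSIGNEE_BY_TARGET_TYPE = {
    "Database Instance": "dba_admin",
    "Host": "host_admin",
}


def evaluate_problem(problem):
    """Return the actions the rule would take, or None if it does not match."""
    if problem["resolved"]:
        return None  # the scenario only concerns unresolved problems
    # Rule criterion: Incident Count "Greater than or equals" 20.
    if problem["incident_count"] < INCIDENT_COUNT_THRESHOLD:
        return None
    return {
        # Update Problem region: assign by target type (None if unmapped).
        "assign_to": ASSIGNEE_BY_TARGET_TYPE.get(problem["target_type"]),
        # Notifications region: problem owner and the Operations Manager.
        "email": [problem["owner"], "operations_manager"],
    }
```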

3.3.4.5 Subscribing to a Rule

A DBA is aware that incidents he owns will be escalated if they are not resolved within 48 hours. The DBA wants to be notified when the rule escalates an incident. To do so, the DBA can subscribe to the rule that escalates the incident and will then be notified whenever the rule escalates it.

Before you set up a notification subscription, ensure there exists a rule that escalates High Priority Incidents for databases that have not been resolved in 48 hours.

Perform the following steps:

  1. From the Setup menu, select Incidents, and then select Incident Rules.

  2. On the Incident Rules - All Enterprise Rules page, click the rule set containing the incident escalation rule in question and click Edit... Rules are created in the context of a rule set.

    Note: In the case where there is no existing rule set, create a rule set by clicking Create Rule Set... You then create the rule as part of creating the rule set.

  3. In the Rules section of the Edit Rule Set page, highlight the escalation rule and click Edit...

  4. Navigate to the Add Actions page.

  5. Select the action that escalates the incident and click Edit...

  6. In the Notifications section, add the DBA to the E-mail cc list.

  7. Click Continue and then navigate back to the Edit Rule Set page and click Save.

As a result of the edit to the enterprise rule, when an incident stays unresolved for 48 hours, the rule sets it to escalation level 1. An e-mail is sent to the DBA notifying him about the escalation of the incident.

Alternate Rule Set Subscription Method: From the Incident Rules - All Enterprise Rules page, select the rule in the incident rules table. From the Actions menu, select E-mail and then Subscribe me (or Subscribe administrator...).

3.3.4.6 Receiving E-mail for Private Rules

A DBA has set up a backup job on the database that he is administering. As part of the job, the DBA has subscribed to e-mail notification for "completed" job status. Before you create the rule, ensure that the DBA has the requisite privileges to create jobs. See Chapter 9, "Utilizing the Job System and Corrective Actions" for job privilege requirements.

Perform the following steps:

  1. Navigate to the Rules page.

    From the Setup menu, select Incidents, then select Incident Rules.

  2. On the Incident Rules - All Enterprise Rules page, either edit an existing rule set (highlight the rule set and click Edit...) or create a new rule set.

    Note: The rule set must be defined as a Private rule set.

  3. In the Rules tab of the Edit Rule Set page, select Create... and select Incoming events and updates to events. Click Continue.

  4. On the Select Events page, select Job Status Change as the Event Type. Select the job in which you are interested either by selecting a specific job or selecting a job by providing a pattern, for example, Backup Management.

    Add additional criteria by adding an attribute: Target Type as Database Instance.

  5. Add conditional actions: Event matches the following criteria (Severity is Informational) and E-mail Me for notifications.

  6. Review the rules summary. Make corrections as needed. Click Save.

  7. Create a database backup job and subscribe for e-mail notification when the job completes.

When the job completes, Enterprise Manager publishes the informational event for "Job Complete" state of the job. The newly created rule is considered 'matching' against the incoming job events and e-mail will be sent to the DBA.

The DBA receives the e-mail and clicks the link to access the details section in Enterprise Manager console for the event.
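
The matching step can be pictured as a simple predicate over the incoming event. The Python below is a sketch only: the event dictionary, the field names, and the glob-style job name pattern are assumptions made for illustration, and Enterprise Manager's own pattern syntax may differ.

```python
from fnmatch import fnmatchcase


def rule_matches(event, job_pattern="Backup*"):
    """Return True if the event satisfies the private rule's criteria.

    The criteria mirror the steps above: event type Job Status Change,
    a job selected by pattern, target type Database Instance, and the
    conditional-action criterion Severity is Informational.
    """
    return (
        event["event_type"] == "Job Status Change"
        and fnmatchcase(event["job_name"], job_pattern)  # hypothetical glob pattern
        and event["target_type"] == "Database Instance"
        and event["severity"] == "Informational"
    )


if __name__ == "__main__":
    completed = {
        "event_type": "Job Status Change",
        "job_name": "Backup Management",
        "target_type": "Database Instance",
        "severity": "Informational",
    }
    print(rule_matches(completed))
```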

3.4 Working with Incidents

Data centers follow operational practices that enable them to manage events and incidents by business priority and in a collaborative manner. Enterprise Manager provides the following features to enable this management and automation:

You can update resolution information for an incident by performing the following:

  1. In the All Open Incidents view, select the incident.

  2. In the resulting Details page, click the General tab, then click Manage. The Manage dialog displays.

    You can then adjust the priority, escalate the incident, and assign it to a specific IT operator.

Working with incidents involves the following stages:

  1. Finding What Needs to be Worked On

  2. Searching for Incidents

  3. Setting Up Custom Views

  4. Responding and Working on a Simple Incident

  5. Responding to and Managing Multiple Incidents, Events and Problems in Bulk

  6. Managing Workload Distribution of Incidents

  7. Creating an Incident Manually

3.4.1 Finding What Needs to be Worked On

Enterprise Manager provides multiple access points that allow you to find out what needs to be worked on. The primary focal point for incident management is the Incident Manager console; however, Enterprise Manager also provides other methods of notification. The most common way to be notified that you have an issue that needs to be addressed is by e-mail. However, incident information can also be found in the following areas:

  • Custom Views (see "Setting Up Custom Views")

  • Group or System Homepages (see Chapter 5, "Managing Groups")

  • Target Homepages

  • Incident Manager (in context of a system or target)

  • Enterprise Manager Console

3.4.2 Searching for Incidents

You can search for incidents based on a variety of incident attributes such as the time incidents were last updated, target name, target type, or incident status.

  1. Navigate to the Incident Manager page.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. In the Views region located on the left, click Search.

    1. In the Search region, select Incidents from the Type list.

    2. In the Criteria region, choose all the criteria that are appropriate. To add fields to the criteria, click Add Fields... and select the appropriate fields.

    3. After you have provided the appropriate criteria, click Get Results.

      Validate that the list of incidents matches what you are looking for. If not, change the search criteria as needed.

    4. To view all the columns associated with this table, in the View menu, select Columns, then select Show All.

Searching for Incidents by Target Lifecycle Status

In addition to searching for incidents using high-level incident attributes, you can also perform more granular searches based on individual target lifecycle status. Briefly, lifecycle status is a target property that specifies a target's operational status. Status options for which you can search are:

  • All

  • Mission Critical

  • Production

  • Staging

  • Test

  • Development

For more discussion on lifecycle status, see Section 3.5.5, "Event Prioritization."

To search for incidents by target lifecycle status:

  1. Navigate to the Incident Manager page.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. In the Views region located on the left, click Search.

  3. In the Search region, click Add Fields. A pop-up menu appears showing the available lifecycle statuses.

  4. Choose one or more of the lifecycle status options.

  5. Enter any additional search criteria.

  6. Click Get Results.

3.4.3 Setting Up Custom Views

Incident Manager also allows you to define custom views to help you gain quick access to the incidents and problems on which you need to focus. For example, you may define a view to display all critical database incidents that you own. When you specify and save view preferences for only those incident attributes that you are interested in, Enterprise Manager shows only the list of matching incidents.

You can then search the incidents for only the ones with specific attributes, such as priority 1. The view allows easy access to pertinent incidents for daily triage. Accordingly, you can save the search criteria as a filter named "All priority 1 incidents for my targets". The view becomes available in the UI for immediate use and will be available anytime you log in to access the specific incidents. The last view you used will be the default view used on your next login.

Note:

The view you create is specific to your Enterprise Manager account and cannot currently be shared with other administrators.

Perform the following steps:

  1. Navigate to the Incident Manager page.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. In the Views region located on the left, click Search.

    1. In the Search region, select Incidents from the Type list.

    2. In the Criteria region, choose all the criteria that are appropriate. To add fields to the criteria, click Add Fields... and select the appropriate fields.

    3. After you have provided the appropriate criteria, click Get Results.

      Validate that the list of incidents matches what you are looking for. If not, change the search criteria as needed.

    4. To view all the columns associated with this table, in the View menu, select Columns, then select Show All.

      To select a subset of columns to display and also the order in which to display them, from the View menu, select Columns, then Manage Columns. A dialog displays showing a list of columns available to be added in the table.

    5. Click the Create View... button.

    6. Enter the view name.

    7. Click OK to save the view.

3.4.4 Responding and Working on a Simple Incident

The following steps take you through one possible incident management scenario.

  1. Navigate to Incident Manager.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. Use a view to filter the list of incidents. For example, you can use the My Open Incidents and Problems view to see incidents and problems assigned to you. You can then sort the list by priority.

  3. To work on an incident, select the incident. In the General tab, click Acknowledge to indicate that you are working on this incident, and to stop receiving repeat notifications for the incident.

    In addition to acknowledging the incident, you can perform other incident management operations, such as assigning it, setting its priority, escalating it, or adding a comment.

    Be aware that as you are working on an individual incident, new incidents might be coming in. Update the list of incidents by clicking the Refresh icon.

  4. If the solution for the incident is unknown, use one or all of the following methods made available in the Incident page:

    • Use the Guided Resolution region and access any recommendations, diagnostic and resolution links available.

    • Check My Oracle Support Knowledge base for known solutions for the incident.

    • Study related incidents available through the Related Events and Incidents tab.

  5. Once the solution is known and can be resolved right away, resolve the incident by using tools provided by the system, if possible.

  6. In most cases, once the underlying cause has been fixed, the incident is cleared in the next evaluation cycle. However, in cases such as log-based incidents, clear the incident manually.

Alternatively, you can work with incidents for a specific target from that target's home page. From the target menu, select Monitoring and then select Incident Manager to access incidents for that target (or group).

3.4.5 Responding to and Managing Multiple Incidents, Events and Problems in Bulk

There may be situations where you want to respond to multiple incidents in the same way. For example, you find that a cluster of incidents that are assigned to you are due to insufficient tablespace issues on several production databases. Your manager suggests that these tablespaces be transferred to a storage system being procured by another administrator. In this situation, you want to set all of the tablespace incidents to a customized resolution state "Waiting for Hardware." You also want to assign the incidents to the other administrator and add a comment to explain the scenario. In this situation, you want to update all of these incidents in bulk rather than individually.

To respond to incidents in bulk:

  1. Navigate to Incident Manager.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. Use a view to filter the list of incidents to the subset of incidents you want to work on. For example, you can use the My Open Incidents and Problems view to see incidents and problems assigned to you. You can then sort the list by priority.

  3. Select the incidents to which you want to respond. You can select multiple incidents by holding down the Control key and selecting individual incidents or you can hold down the Shift key and select the first and last incidents to select a contiguous block of incidents.

  4. From the Action menu, choose the desired response action.

    • Acknowledge: Indicate that you have viewed the incidents. This option also stops any repeat notifications sent out for the incidents. This sets the Acknowledged flag to Yes and also makes you the owner of the incident.

    • Manage: Allows you to perform a multi-action response to the incidents.

      • Acknowledge: If an incident is acknowledged, it will be implicitly assigned to the user who acknowledged it. When a user assigns an incident to himself, it is considered acknowledged. Once acknowledged, an incident cannot be unacknowledged. Acknowledgement also stops any repeat notifications for that incident.

      • Assign to: Assign the incident(s) to the administrator who will take ownership of the incident.

      • Prioritization: The priority level of an incident can be set by selecting one of the out-of-the-box priority values: None, Urgent, Very High, High, Medium, Low.

      • Incident Status: The resolution state for the incident can be set by selecting either Work in Progress or Resolved or to any custom status defined.

      • Escalation Level: Administrators can update incidents to set an escalation level: Level 1 through 5, in addition to the default value of None. An escalated issue can be de-escalated by setting the escalation to None. The appropriate Escalation Level depends on the IT procedures you have in place.

      • Comment: You can enter comments such as those you want to pass to the owner of the incident.

    • Suppress: Suppressing an incident stops corresponding notifications and removes it from out-of-the-box views and default totals (such as those presented in the summary region). Suppression is typically performed when you want to defer action on the incident until a future time and, in the meantime, want to hide it from the console. Administrators can see suppressed incidents by explicitly searching for them, for example by performing a search on incidents where the search criteria include the Suppressed search field.

      Incidents can be suppressed until any of the following conditions are met:

      • Until the suppression is manually removed

      • Until specified date in the future

      • Until the severity state changes (incidents only)

      • Until it is closed

    • Clear: Administrators can clear incidents or problems manually. For incidents, this applies only to incidents containing events that can be manually cleared.

    • Add Comment: Users can add comments on incidents and events. Comments may be used for sharing information with other users or to provide tracking information on any actions being taken. Comments can be added even on closed issues.

      Note:

      The single action Acknowledge and Clear buttons are enabled for open incidents and can be used for multiple incident selection.

    If any of the above actions applies only to a subset of selected incidents (for example, if an administrator tries to acknowledge multiple incidents, of which some are already acknowledged), the action will be performed only where applicable. The administrator will be informed of the success or failure of the action. When an administrator selects any of these actions, a corresponding annotation is added to the incident for future reference.

  5. Click OK. Enterprise Manager displays a process summary and confirmation dialogs.

  6. Continue working with the incidents as required.
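
The "performed only where applicable" behavior of a bulk action can be sketched as follows for Acknowledge. This Python is an illustration of the behavior described above, not Enterprise Manager code; the incident dictionaries and the function name are assumptions.

```python
def bulk_acknowledge(incidents, user):
    """Acknowledge each incident where applicable.

    Returns a (updated, skipped) tuple so the caller can report the
    outcome, much like the process summary shown after a bulk action.
    """
    updated = skipped = 0
    for inc in incidents:
        if inc["acknowledged"]:
            skipped += 1  # already acknowledged: the action does not apply
            continue
        inc["acknowledged"] = True
        inc["owner"] = user  # acknowledging implicitly assigns the incident
        # A corresponding annotation is recorded for future reference.
        inc.setdefault("annotations", []).append(f"Acknowledged by {user}")
        updated += 1
    return updated, skipped


if __name__ == "__main__":
    selected = [
        {"id": 1, "acknowledged": False, "owner": None},
        {"id": 2, "acknowledged": True, "owner": "bob"},
    ]
    print(bulk_acknowledge(selected, "alice"))
```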

3.4.6 Searching My Oracle Support Knowledge

To access My Oracle Support Knowledge base entries from within Incident Manager, perform the following steps:

  1. Navigate to Incident Manager.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. Select one of the standard views. Choose the appropriate incident or problem in the View table.

  3. In the resulting details region, click My Oracle Support Knowledge.

    If your My Oracle Support (MOS) login credentials have been saved as MOS Preferred Credentials, you do not need to log in manually. If not, you will need to sign in to My Oracle Support.

    Setting MOS Preferred Credentials: To save your MOS login information as Preferred Credentials, from the Setup menu, select Security and then Preferred Credentials. From the My Oracle Support Preferred Credentials region, click Set MOS Credentials.

  4. On the My Oracle Support page, click the Knowledge tab to browse the knowledge base.

    From this page, in addition to accessing formal Oracle documentation, you can also change the search string to look for additional knowledge base entries.

3.4.7 Open Service Request (Problems-only)

There are times when you may need assistance from Oracle Support to resolve a problem. This procedure is not relevant for incidents or events.

To submit a service request (SR), perform the following steps:

  1. Navigate to Incident Manager.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. Find the problem by using one of the standard views, one of your custom views, or a search. Select the appropriate problem from the table.

  3. Click on the Support Workbench: Package Diagnostic link.

  4. Complete the workflow for opening an SR. Upon completing the workflow, a draft SR will have been created.

  5. Sign in to My Oracle Support if you are not already signed in.

  6. On the My Oracle Support page, click the Service Requests tab.

  7. Click the Create SR button.

3.4.8 Suppressing Incidents and Problems

There are times when it is convenient to hide an incident or problem from the list in the All Open Incidents page or the All Open Problems page. For example, you need to defer work on the incident until a future date (for example, until a maintenance window). To avoid having it appear in the UI, you can temporarily hide, or suppress, the incident until that date. To find a suppressed incident, you must explicitly search for the incident using either the Show all or the Only show suppressed search option. To unhide a suppressed incident or problem, it must be manually unsuppressed.

To suppress an incident or problem:

  1. Navigate to Incident Manager.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. Select either the All Open Incidents view or the All Open Problems view. Choose the appropriate incident or problem. Click the General tab.

  3. In the resulting details region, click More, then select Suppress.

  4. On the resulting Suppress pop-up, choose the appropriate suppression type. Add a comment if desired.

  5. Click OK.
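
The suppression types listed under Suppress in "Responding to and Managing Multiple Incidents, Events and Problems in Bulk" can be modeled as conditions under which an incident stays hidden. The following Python is a sketch under stated assumptions: the dictionary layout and the type names are invented for the example and do not correspond to Enterprise Manager internals.

```python
from datetime import datetime


def is_suppressed(incident, now):
    """Return True while the incident should stay hidden from default views."""
    sup = incident.get("suppression")
    if sup is None:
        return False
    if incident["closed"]:
        return False  # a closed incident is no longer an open, suppressed one
    kind = sup["type"]  # hypothetical type names, one per documented condition
    if kind in ("until_manually_removed", "until_closed"):
        return True
    if kind == "until_date":
        return now < sup["until"]
    if kind == "until_severity_changes":  # incidents only
        return incident["severity"] == sup["severity_at_suppression"]
    raise ValueError(f"unknown suppression type: {kind}")


if __name__ == "__main__":
    inc = {
        "closed": False,
        "severity": "Warning",
        "suppression": {"type": "until_date", "until": datetime(2024, 6, 1)},
    }
    print(is_suppressed(inc, datetime(2024, 5, 1)))
```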

3.4.9 Managing Workload Distribution of Incidents

Incident Manager enables you to manage incidents and problems to be addressed by your team.

Perform the following tasks:

  1. Navigate to Incident Manager.

    From the Enterprise menu on the Enterprise Manager home page, select Monitoring, then select Incident Manager.

  2. Use the standard or custom views to identify the incidents for which your team is responsible. You may want to focus on unassigned and unacknowledged incidents and problems.

  3. Review the list of incidents. This includes determining the person assigned to the incident and checking its status, the progress made, and the actions taken by the incident owner.

  4. Add comments, change the priority, or reassign the incident as needed by clicking the Manage button in the Incident Details region.

Example Scenario

The DBA manager uses Incident Manager to view all the incidents owned by his team. He ensures all of them are correctly assigned; if not, he reassigns and prioritizes them appropriately. He monitors the escalated incidents for their status and progress, and adds comments as needed for the owner of the incident. In the console, he can view how long each of the incidents has been open. He also reviews the list of unassigned incidents and assigns them appropriately.

3.4.10 Reviewing Events on a Periodic Basis

Oracle recommends managing via incidents in order to focus on important events or groups of related events. Due to the variety and sheer number of events that can be generated, it is possible that not all important events will be covered by incidents. To help you find these important yet untreated events, Enterprise Manager provides the Events without incidents standard view.

Perform the following steps:

  1. From the Enterprise menu, select Monitoring, then select Incident Manager.

  2. In the Views region, click Events without incidents.

  3. Select the desired event in the table. The event details display.

  4. In the details area, choose More and then either Create Incident or Add Event to Incident.

Example Scenario

During the initial phase of Enterprise Manager uptake, every day the DBA manager reviews the events for the databases his team is responsible for and filters them to view only the ones which are not tracked by ticket or incident. He browses such events to ensure that none of them requires incidents to track the issue. If he feels that one such event requires an incident to track the issue, he creates an incident directly for this event.

3.4.10.1 Creating an Incident Manually

If an event of interest occurs that is not covered by any rule and you want to convert that event to an incident, perform the following:

  1. Using an available view, find the event of interest.

  2. Select the event in the table.

  3. From the More... drop-down menu, choose Create Incident...

  4. Enter the incident details and click OK.

  5. Should you decide to work on the incident, set yourself as owner of the incident and update status to Work in Progress.

Example Scenario

As per the operations policy, the DBA manager has set up rules to create incidents for all critical issues for his databases. The remainder of the issues are triaged at the event level by one of the DBAs.

One of the DBAs receives an e-mail for an "SQL Response" event (not associated with an incident) on the production database. He accesses the details of the event by clicking the link in the e-mail and reviews them. This is an issue that needs to be tracked and resolved, so he opens an incident to track the resolution of the issue. He marks the status of the incident as "Work in progress".

3.5 Advanced Topics

The following sections discuss incident/event management features relating to advanced applications or operational areas.

3.5.1 Defining Custom Incident Statuses

As discussed in "Working with Incidents", one of the primary incident workflow attributes is status. For most conditions, these predefined status attributes will suffice. However, the uniqueness of your monitoring and management environment may require an incident workflow requiring specialized incident states. To address this need, you can define custom states using the modify_resolution_state EM CLI verb.

emcli modify_resolution_state
        -label="old label of the state to be changed"
        -new_label="New label for display"
        -position="New display position"
        [-applies_to=BOTH]

[ ] indicates that the parameter is optional

This verb modifies an existing resolution state that describes the state of incidents or problems. Only Super Administrators can execute this command. You need to specify the updated label as well as the updated position. The position can be between 2 and 98, and cannot be in use by another resolution state.

You can optionally indicate that the state should apply to both incidents and problems. A success message is reported if the command is successful. An error message is reported if the change fails.
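
When scripting this verb, it can help to validate the position before invoking EM CLI. The Python sketch below only assembles the argument list from the options shown in the format above; it does not run emcli. The helper name is an assumption, the labels in the usage example are hypothetical, and the 2 to 98 range is treated as inclusive, which is an interpretation of the text above.

```python
def modify_resolution_state_cmd(label, new_label, position, applies_to_both=False):
    """Build the argument list for 'emcli modify_resolution_state'.

    Only the options documented above are used. The position range
    (2 to 98) is assumed to be inclusive.
    """
    if not 2 <= position <= 98:
        raise ValueError("position must be between 2 and 98")
    cmd = [
        "emcli",
        "modify_resolution_state",
        f"-label={label}",
        f"-new_label={new_label}",
        f"-position={position}",
    ]
    if applies_to_both:
        cmd.append("-applies_to=BOTH")
    return cmd


if __name__ == "__main__":
    # Hypothetical labels; pass the list to subprocess.run() to execute it.
    print(modify_resolution_state_cmd("Waiting", "Waiting for Hardware", 25))
```

Building an argument list rather than a single shell string avoids quoting problems with labels that contain spaces. The check cannot detect a position already used by another resolution state; that error would still come from EM CLI itself.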

3.5.2 Clearing Stateless Alerts for Metric Alert Event Types

For metric alert event types, an event (metric alert) is raised based on the metric threshold values. These metric alert events are called stateful alerts. Metric alert events that are not tied to the state of a monitored system (for example, snapshot too old, or resumable session suspended) are called stateless alerts. Because stateless alerts are not cleared automatically, they need to be cleared manually. You can perform a bulk purge of stateless alerts using the clear_stateless_alerts EM CLI verb.

Note:

For large numbers of incidents, you can manually clear incidents in bulk. See "Responding to and Managing Multiple Incidents, Events and Problems in Bulk".

clear_stateless_alerts clears the stateless alerts associated with the specified target. The clearing must be manually performed as the Management Agent does not automatically clear stateless alerts. To find the metric internal name associated with a stateless alert, use the EM CLI get_metrics_for_stateless_alerts verb.

Format

emcli clear_stateless_alerts
        -older_than=number_in_days
        -target_type=target_type
        -target_name=target_name
        [-include_members]
        [-metric_internal_name=target_type_metric:metric_name:metric_column]
        [-unacknowledged_only]
        [-ignore_notifications]
        [-preview]

[ ] indicates that the parameter is optional

Options

  • older_than

    Specify the age of the alert in days. (Specify 0 for currently open stateless alerts.)

  • target_type

    Internal target type identifier, such as host, oracle_database, and emrep.

  • target_name

    Name of the target.

  • include_members

    Applicable for composite targets to examine alerts belonging to members as well.

  • metric_internal_name

    Metric to be cleaned up. Use the get_metrics_for_stateless_alerts verb to see a complete list of supported metrics for a given target type.

  • unacknowledged_only

    Only clear alerts if they are not acknowledged.

  • ignore_notifications

    Use this option if you do not want to send notifications for the cleared alerts. This may reduce the notification sub-system load.

  • preview

    Shows the number of alerts to be cleared on the target(s).

Example

The following example clears alerts generated from the database alert log over a week old. In this example, no notifications are sent when the alerts are cleared.

emcli clear_stateless_alerts -older_than=7 -target_type=oracle_database -target_name=database -metric_internal_name=oracle_database:alertLog:genericErrStack -ignore_notifications
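
When scripting a bulk purge, it can be safer to run the verb with -preview first and clear only after checking the reported count. The following Python sketch assembles the argument list for both runs using only the options listed above. The helper name build_clear_args and its keyword arguments are hypothetical and are not part of EM CLI; the sketch only prints the commands and assumes that emcli is on the PATH and logged in when the commands are eventually run.

```python
import shlex


def build_clear_args(older_than, target_type, target_name,
                     metric_internal_name=None, include_members=False,
                     unacknowledged_only=False, ignore_notifications=False,
                     preview=False):
    """Return an argument list for the clear_stateless_alerts verb.

    Hypothetical helper; it emits only the options documented above.
    """
    if older_than < 0:
        raise ValueError("older_than must be 0 or a positive number of days")
    args = [
        "emcli", "clear_stateless_alerts",
        f"-older_than={older_than}",
        f"-target_type={target_type}",
        f"-target_name={target_name}",
    ]
    if include_members:
        args.append("-include_members")
    if metric_internal_name:
        args.append(f"-metric_internal_name={metric_internal_name}")
    if unacknowledged_only:
        args.append("-unacknowledged_only")
    if ignore_notifications:
        args.append("-ignore_notifications")
    if preview:
        args.append("-preview")
    return args


common = dict(
    older_than=7,
    target_type="oracle_database",
    target_name="database",
    metric_internal_name="oracle_database:alertLog:genericErrStack",
    ignore_notifications=True,
)
# Dry run: only reports how many alerts would be cleared.
print(shlex.join(build_clear_args(preview=True, **common)))
# Actual purge, to be run once the preview count looks right.
print(shlex.join(build_clear_args(**common)))
```

The printed command lines can be pasted into a shell, or the argument lists can be handed to a process launcher directly.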

3.5.3 User-reported Events

Users may create (publish) events manually using the EM CLI verb publish_event. A user-reported event is published as an event of the "User-reported event" class. Only users with Manage Target privilege can publish these events for a target. An error message is reported if the publish fails.

After an event is published with a severity other than CLEAR (see below), end-users with appropriate privileges can manually clear the event from the UI, or they can publish a new event using a severity level of CLEAR and the same details to report clearing of the underlying situation.

3.5.3.1 Format

emcli publish_event
        -target_name="Target name"
        -target_type="Target type internal name"
        -message="Message for the event"
        -severity="Severity level"
        -name="event name"
        [-key="sub component name"
         -context="name1=value1;name2=value2;.."
         -separator=context="alt. pair separator"
         -subseparator=context="alt. name-value separator"]

[ ] indicates that the parameter is optional

3.5.3.2 Options

  • target_name

    Target name.

  • target_type

    Target type name.

  • message

    Message to associate with the event. The message cannot exceed 4000 characters.

  • severity

    Severity level to associate with the event. The supported values for severity level are as follows:

    "CLEAR"
    "MINOR_WARNING"
    "WARNING"
    "CRITICAL"
    "FATAL"

  • name

    Name of the event to publish. The event name cannot exceed 128 characters.

    This is indicative of the nature of the event. Examples include "Disk Used Percentage," "Process Down," "Number of Queues," and so on. The name must be repeated and identical when reporting different severities for the same sequence of events. This should not have any identifying information about a specific event; for example, "Process xyz is down." To identify any specific components within a target that the event is about, see the key option below.

  • key

    Name of the sub-component within a target this event is related to. Examples include a disk name on a host, name of a tablespace, and so forth. The key cannot exceed 256 characters.

  • context

    Additional context that can be published for a given event. This is a series of strings of the format name:value separated by semicolons. For example, it might be useful to report the percentage size of a disk when reporting space issues on the disk. You can override the default name-value separator ":" by using the subseparator option, and the pair separator ";" by using the separator option.

    The context names cannot exceed 256 characters, and the values cannot exceed 4000 characters.

  • separator

    Set to override the default ";" separator. You typically use this option when the name or the value contains ";". Using "=" is not supported for this option.

  • subseparator

    Set to override the default ":" separator between the name and the value in each pair. You typically use this option when the name or value contains ":". Using "=" is not supported for this option.
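
The length limits and severity values listed above can be checked before the verb is called. The following Python sketch validates the documented limits and assembles a publish_event argument list; it also shows how an event can later be cleared by publishing it again with the same name and key and a severity of CLEAR. The helper name build_publish_args is hypothetical and is not part of EM CLI, and the sketch prints the commands rather than running them.

```python
import shlex

# Severity values listed in the options above.
SEVERITIES = ("CLEAR", "MINOR_WARNING", "WARNING", "CRITICAL", "FATAL")


def build_publish_args(target_name, target_type, message, severity, name,
                       key=None):
    """Check the documented limits and return a publish_event argument list."""
    if severity not in SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    if len(message) > 4000:
        raise ValueError("message cannot exceed 4000 characters")
    if len(name) > 128:
        raise ValueError("event name cannot exceed 128 characters")
    if key is not None and len(key) > 256:
        raise ValueError("key cannot exceed 256 characters")
    args = [
        "emcli", "publish_event",
        f"-target_name={target_name}",
        f"-target_type={target_type}",
        f"-message={message}",
        f"-severity={severity}",
        f"-name={name}",
    ]
    if key is not None:
        args.append(f"-key={key}")
    return args


# Raise the event, then report that the underlying situation has cleared.
# The name and key are identical in both calls.
raise_args = build_publish_args(
    "my acme target", "oracle_acme",
    "HDD restoration failed due to corrupt disk",
    "WARNING", "HDD restore failed", key="Finance DB machine")
clear_args = build_publish_args(
    "my acme target", "oracle_acme",
    "HDD restoration completed",
    "CLEAR", "HDD restore failed", key="Finance DB machine")
print(shlex.join(raise_args))
print(shlex.join(clear_args))
```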

3.5.3.3 Examples

Example 1

The following example publishes a warning event for "my acme target" indicating that an HDD restore failed, and that the failure relates to a component called the "Finance DB machine" on this target.

emcli publish_event -target_name="my acme target" -target_type="oracle_acme" \
        -name="HDD restore failed" -key="Finance DB machine" \
        -message="HDD restoration failed due to corrupt disk" -severity=WARNING

Example 2

The following example publishes a minor warning event for "my acme target" indicating that an HDD restore failed, and that the failure relates to a component called the "Finance DB machine" on this target. It specifies additional context indicating the related disk size and name using the default separators. Note the escaping of the \ in the disk name using an additional "\".

emcli publish_event -target_name="my acme target" -target_type="oracle_acme" \
        -name="HDD restore failed" -key="Finance DB machine" \
        -message="HDD restoration failed due to corrupt disk" -severity=MINOR_WARNING \
        -context="disk size":800GB\;"disk name":\\uddo0111245

Example 3

The following example publishes a critical event for "my acme target" indicating that an HDD restore failed, and that the failure relates to a component called the "Finance DB machine" on this target. It specifies additional context indicating the related disk size and name. It uses an alternate name-value separator, because the name of the disk includes the ":" default separator.

emcli publish_event -target_name="my acme target" -target_type="oracle_acme" \
        -name="HDD restore failed" -key="Finance DB machine" \
        -message="HDD restoration failed due to corrupt disk" -severity=CRITICAL \
        -context="disk size"^800GB\;"disk name"^\\sdd1245:2 -subseparator=context=^
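
Choosing separators by hand is error-prone when context values are generated by a script. The following Python sketch builds the -context option from a dictionary and adds -separator or -subseparator only when a name or value contains the corresponding default. The function name context_options is hypothetical, and the fallback characters "|" and "^" are assumptions chosen for illustration ("^" is the alternate used in Example 3); any character other than "=" that does not occur in the data could be substituted.

```python
def context_options(pairs, alt_separator="|", alt_subseparator="^"):
    """Return the -context option plus any separator overrides it needs.

    Hypothetical helper; the alternate separator defaults are assumptions.
    """
    texts = [str(x) for item in pairs.items() for x in item]
    # Keep the documented defaults unless the data itself contains them.
    separator = alt_separator if any(";" in t for t in texts) else ";"
    subseparator = alt_subseparator if any(":" in t for t in texts) else ":"
    if separator == subseparator:
        raise ValueError("separator and subseparator must differ")
    for sep in (separator, subseparator):
        if sep == "=":
            raise ValueError('"=" is not supported as a separator')
        if any(sep in t for t in texts):
            raise ValueError(f"separator {sep!r} also occurs in the context data")
    for name, value in pairs.items():
        if len(str(name)) > 256:
            raise ValueError("context names cannot exceed 256 characters")
        if len(str(value)) > 4000:
            raise ValueError("context values cannot exceed 4000 characters")
    body = separator.join(f"{n}{subseparator}{v}" for n, v in pairs.items())
    options = [f"-context={body}"]
    if separator != ";":
        options.append(f"-separator=context={separator}")
    if subseparator != ":":
        options.append(f"-subseparator=context={subseparator}")
    return options


# The disk name contains ":", so the name-value separator is overridden.
print(context_options({"disk size": "800GB", "disk name": "sdd1245:2"}))
```

The returned options are meant to be appended to a publish_event argument list. If the arguments are typed into a shell instead, characters such as ";" still need shell escaping, as the examples above show.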

3.5.4 Additional Rule Applications

Rules can be set up to perform more complicated tasks beyond straightforward notifications. The following tasks illustrate additional rule capabilities.

3.5.4.1 Setting Up a Rule to Send Different Notifications for Different Severity States of an Event

Before you perform this task, ensure the DBA has set appropriate thresholds for the metric so that a critical metric alert is generated as expected.

Consider the following example:

The Administration Manager sets up a rule to page the specific DBA when a critical metric alert event occurs for a database in a production database group and to e-mail the DBA when a warning metric alert event occurs for the same targets. This task occurs when a new group of databases is deployed and the DBAs request appropriate rules to manage those databases.

Perform the following steps to create the rules:

  1. From the Setup menu, select Incidents, then select Incident Rules.

  2. On the Incident Rules - All Enterprise Rules page, highlight a rule set and click Edit.... (Rules are created in the context of a rule set. If there is no existing rule set to manage the newly added target, create a rule set.)

  3. In the Edit Rule Set page, locate the Rules section. Click Create...

  4. From the Select Type of Rule to Create dialog, choose Incoming events and updates to events. Click Continue.

  5. Provide the rule details as follows:

    1. For Type, select Metric Alerts.

    2. In the criteria section, select Severity. From the drop-down list, check Critical and Warning as the selected values. Click Next.

    3. On the Add Actions page, click +Add.

      In the Create Incident section, check the Create Incident option. Click Continue. The Add Actions page displays with the new rule. Click Next.

    4. Specify a name for the rule and a description. Click Next.

    5. On the Review page, ensure your settings are correct and click Continue. A message appears informing you that the rule has been successfully created. Click OK to dismiss the message.

      Next, you need to create a rule to perform the notification actions.

  6. From the Rules section on the Edit Rule Set page, click Create.

  7. Select Newly created incidents or updates to incidents as the rule type and click Continue.

  8. Check Specific Incidents.

  9. Check Severity and from the drop-down option selector, check Critical and Warning. Click Next.

  10. On the Add Actions page, click Add. The Conditional Actions page displays.

  11. In the Conditions for actions section, choose Only execute the actions if specified conditions match.

  12. From the Incident matches the following criteria list, choose Severity and then Critical from the drop-down option selector.

  13. In the Notifications section, enter the DBA in the Page field. Click Continue. The Add Actions page displays.

  14. Click Add to create a new action for the Warning severity.

  15. In the Conditions for actions section, choose Only execute the actions if specified conditions match.

  16. From the Incident matches the following criteria list, choose Severity and then Warning from the drop-down option selector.

  17. In the Notifications section, enter the DBA in the E-mail to field. Click Continue. The Add Actions page displays with the two conditional actions. Click Next.

  18. Specify a rule name and description. Click Next.

  19. On the Review page, ensure your rules have been defined correctly and click Continue. The Edit Rule Set page displays.

  20. Click Save to save your newly defined rules.
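
The two conditional actions configured in steps 10 through 17 amount to a mapping from incident severity to notification channel. The following Python sketch models that logic to make the intended behavior explicit. It is a conceptual illustration only and not an Enterprise Manager API: the Incident class, the actions_for function, and the recipient name dba_oncall are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    target: str
    severity: str  # "Critical" or "Warning" in this example


# Each conditional action pairs a severity condition with a notification channel.
CONDITIONAL_ACTIONS = [
    ("Critical", "page"),
    ("Warning", "email"),
]


def actions_for(incident, recipient="dba_oncall"):
    """Return the (channel, recipient) notifications whose condition matches."""
    return [(channel, recipient)
            for severity, channel in CONDITIONAL_ACTIONS
            if incident.severity == severity]


print(actions_for(Incident("proddb1", "Critical")))
print(actions_for(Incident("proddb1", "Warning")))
```

An incident whose severity matches neither condition produces no notification, which mirrors choosing Only execute the actions if specified conditions match for each action.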

3.5.4.2 Creating a Rule to Notify Different Administrators Based on the Event Type

As per operations policy for production databases, the incidents that relate to application issues should go to the application DBAs and the incidents that relate to system parameters should go to the system DBAs. Accordingly, the respective incidents will be assigned to the appropriate DBAs and they should be notified by way of e-mail.

Before you set up rules, ensure the following prerequisites are met:

  • The DBA has set up appropriate thresholds for the metric so that a critical metric alert is generated as expected.

  • A rule has been set up to create an incident for all such events.

  • Respective notification setup is complete, for example, global SMTP gateway, e-mail address, and schedule for individual DBAs.

Perform the following steps:

  1. Navigate to the Incident Rules page.

    From the Setup menu, select Incidents, then select Incident Rules.

  2. Search the list of enterprise rules matching the events from the production database.

  3. On the Incident Rules - All Enterprise Rules page, highlight a rule set and click Edit....

    Rules are created in the context of a rule set. If there is no existing rule set, create a rule set.

  4. From the Edit Rule Set page (Rules tab), select the rule that creates the incidents for the metric alert events for the database. Click Edit...

  5. From the Select Events page, click Next.

  6. From the Add Actions page, click +Add. The Add Conditional Actions page displays.

  7. In the Notifications area, enter the e-mail address of the DBA you want to be notified for this specific event type and click Continue to add the action. Enterprise Manager returns you to the Add Actions page. Click Next.

  8. On the Specify Name and Description page, enter an intuitive rule name and a brief description.

  9. Click Next.

  10. On the Review page, review the Applies to, Actions and General information for correctness.

  11. Click Continue to create the rule.

  12. Create or edit additional rules to notify other administrators according to event type.

  13. Review the rules summary and make corrections as needed. Click Save to save your rule set changes.

3.5.4.3 Creating a Rule to Create a Ticket for Incidents

If your IT process requires a helpdesk ticket be created to resolve incidents, then you can use the helpdesk connector to associate the incident with a helpdesk ticket and have Enterprise Manager automatically open a ticket when the incident is created. Communication between Incident Manager and your helpdesk system is bidirectional, thus allowing you to check the changing status of the ticket from within Incident Manager. Enterprise Manager also allows you to link out to a Web-based third-party console directly from the ticket so that you can launch the console in context.

For example, according to the operations policy of an organization, all critical incidents from a production database should be tracked by way of Remedy tickets. A rule is set up to create a Remedy ticket when a critical incident occurs for the database. When such an incident occurs, the ticket is generated by the rule, the incident is associated with the ticket, and the operation is logged for future reference in the updates of the incident. While viewing the details of the incident, the DBA can view the ticket ID and, using the attached URL link, access Remedy to get the details about the ticket.

Before you perform this task, ensure the following prerequisites are met:

  • Monitoring support has been set up.

  • Remedy ticketing connector has been configured.

Perform the following steps:

  1. From the Setup menu, select Incidents, then select Incident Rules.

  2. On the Incident Rules - All Enterprise Rules page, select the appropriate rule set and click Edit.... (Rules are created in the context of a rule set. If there is no applicable rule set, create a new rule set.)

  3. Select the appropriate rule that covers the incident conditions for which tickets should be generated and click Edit...

  4. Click Next to proceed to the Add Actions page.

  5. Click +Add to access the Add Conditional Actions page.

    1. Specify that a ticket should be generated for incidents covered by the rule.

    2. Specify the ticket template to be used.

  6. Click Continue to return to the Add Actions page.

  7. On the Add Actions page, click Next.

  8. On the Specify Name and Description page, click Next.

  9. On the Review page, click Continue. A message displays indicating that the rule has been successfully modified. Click OK to close the message.

  10. Repeat steps 3 through 9 until all appropriate rules have been edited.

  11. Click Save to save your changes to the rule set.

3.5.4.4 Creating a Rule to Send SNMP Traps to Third Party Systems

As mentioned in Chapter 4, "Using Notifications," Enterprise Manager supports integration with third-party management tools through SNMP. Sending SNMP traps to third-party systems is a two-step process:

Step 1: Create an advanced notification method based on an SNMP trap.

Step 2: Create an incident rule that invokes the SNMP trap notification method.

The following procedure assumes you have already created the SNMP trap notification method. For instructions on creating a notification method based on an SNMP trap, see "Sending SNMP Traps to Third Party Systems".

  1. From the Setup menu, select Incidents, then select Incident Rules.

  2. On the Incident Rules - All Enterprise Rules page, click Create Rule Set...

  3. Enter the rule set Name, a brief Description, and select the type of source object the rule Applies to (Targets).

  4. Click on the Rules tab and then click Create...

  5. On the Select Type of Rule to Create dialog, select Incoming events and updates to events and then click Continue.

  6. On the Create New Rule : Select Events page, specify the criteria for the events for which you want to send SNMP traps and then click Next.

    Note:

    You must create one rule per event type. For example, if you want to send SNMP traps for Target Availability events and Metric Alert events, you must specify two rules.

  7. On the Create New Rule : Add Actions page, click Add. The Add Conditional Actions page displays.

  8. In the Notifications section, under Advanced Notifications, select an existing SNMP trap notification method as shown in the following graphic.

    (Graphic: SNMP incident rule)

    For information on creating SNMP trap notification methods, see "Sending SNMP Traps to Third Party Systems".

  9. Click Continue to return to the Create New Rule : Add Actions page.

  10. Click Next to go to the Create New Rule : Specify Name and Description page.

  11. Specify a rule name and a concise description and then click Next.

  12. Review the rule definition and then click Continue to add the rule to the rule set. A message displays indicating the rule has been added to the rule set but has not yet been saved. Click OK to close the message.

  13. Click Save to save the rule set. A confirmation is displayed. Click OK to close the message.

3.5.5 Event Prioritization

In a large enterprise, systems under heavy load can generate a large number of incidents and events. All of these need to be processed in a timely and efficient manner in accordance with your business priorities. An effective prioritization scheme is needed to determine which events/incidents should be resolved first.

In order to determine which events/incidents are high priority, Enterprise Manager uses a prioritization protocol based on two incident/event attributes: Lifecycle Status of the target and the Incident/Event Type. Lifecycle Status is a target property that specifies a target's operational status. You can set/view a target's Lifecycle Status from the UI (from a target's Target Setup menu, select Properties). You can set target Lifecycle Status properties across multiple targets simultaneously by using the Enterprise Manager Command Line Interface (EM CLI) set_target_property_value verb.

A target's Lifecycle Status is set when it is added to Enterprise Manager for monitoring. At that time, you determine where in the prioritization hierarchy that target belongs—the highest level being "mission critical" and the lowest being "development."

Target Lifecycle Status

  • Mission Critical (highest priority)

  • Production

  • Stage

  • Test

  • Development (lowest priority)

Incident/Event Type

  • Availability events (highest priority)

  • Non-informational events

  • Informational events
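
The two orderings above can be pictured as a sort key. The following Python sketch ranks a list of events by target Lifecycle Status and then by event type. It is a conceptual model only: the text does not state how Enterprise Manager combines the two attributes, so comparing Lifecycle Status first is an assumption, and the dictionary field names lifecycle_status and event_type are hypothetical.

```python
# Lower rank means higher priority, following the lists above.
LIFECYCLE_RANK = {
    "Mission Critical": 0,
    "Production": 1,
    "Stage": 2,
    "Test": 3,
    "Development": 4,
}
EVENT_TYPE_RANK = {
    "Availability": 0,
    "Non-informational": 1,
    "Informational": 2,
}


def priority_key(event):
    """Sort key: Lifecycle Status first (an assumption), then event type."""
    return (LIFECYCLE_RANK[event["lifecycle_status"]],
            EVENT_TYPE_RANK[event["event_type"]])


def prioritize(events):
    """Return the events ordered from highest to lowest priority."""
    return sorted(events, key=priority_key)


events = [
    {"name": "a", "lifecycle_status": "Development", "event_type": "Availability"},
    {"name": "b", "lifecycle_status": "Mission Critical", "event_type": "Informational"},
    {"name": "c", "lifecycle_status": "Mission Critical", "event_type": "Availability"},
    {"name": "d", "lifecycle_status": "Production", "event_type": "Non-informational"},
]
print([e["name"] for e in prioritize(events)])
```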

3.6 Moving from Enterprise Manager 10/11g to 12c

Enterprise Manager 12c incident management functionality leverages your existing pre-12c monitoring setup out-of-box. Migration is seamless and transparent. For example, if your Enterprise Manager 10/11g monitoring system sends you e-mails based on specific monitoring conditions, you will continue to receive those e-mails without interruption. To take advantage of 12c features, however, you may need to perform additional migration tasks.

Important:

Alerts that were generated pre-12c will still be available. For example, critical metric alerts will be available as critical incidents.

Rules

When you migrate to Enterprise Manager 12c, all of your existing notification rules are automatically converted to rules. Technically, they are converted to event rules first with incidents automatically being created for each event rule.

In general, event rules allow you to define which events should become incidents. However, they also allow you to take advantage of Enterprise Manager's increased monitoring flexibility.

For more information on rule migration, see the following documents:

Privilege Requirements

The Create Enterprise Rule Set resource privilege is now required in order to edit/create enterprise rule sets and rules contained within. The exception to this is migrated notification rules. When pre-12c notification rules are migrated to event rules, the original notification rule owners will still be able to edit their own rules without having been granted the Create Enterprise Rule Set resource privilege. However, they must be granted the Create Enterprise Rule Set resource privilege if they wish to create new rules. Enterprise Manager Super Administrators, by default, can edit and create rule sets.