6 Fault Manager

The Fault Manager allows you to manage all events, notifications, and alerts generated by either Network Functions (NFs) or Oracle® Session Delivery Management Cloud (Oracle SDM Cloud) components. For the Session Border Controller (SBC), Enterprise Session Border Controller (E-SBC), and Oracle Communications Session Monitor (OCSM), the event and alarm information is based on Oracle® standard and proprietary Management Information Bases (MIBs). All SNMP traps generated by these NFs are managed by Oracle SDM Cloud. The NFs send their traps to the Management Cloud Engine (MCE), which is located on the customer premises and acts as the trap receiver. The MCE then converts each SNMP trap into a REST payload, which it sends to the Oracle SDM Cloud SaaS offering. For more information on configuring traps, refer to the configuration guide for the relevant product.
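
As a rough illustration of the trap-to-REST conversion described above, the following Python sketch serializes a decoded trap as a JSON body. The function name, the payload field names (sourceIp, trapName, varbinds), and the fanSpeed variable binding are assumptions made for this illustration. They are not the actual MCE payload schema, which this guide does not document.

```python
import json


def trap_to_rest_payload(trap: dict) -> str:
    """Serialize a decoded SNMP trap as a JSON string.

    The schema used here is hypothetical and only illustrates the idea
    of wrapping trap data in a REST-friendly body.
    """
    payload = {
        "sourceIp": trap["source_ip"],
        "trapName": trap["trap_name"],
        "varbinds": trap.get("varbinds", {}),
    }
    return json.dumps(payload, sort_keys=True)


# Example trap from a hypothetical device at a documentation-range address.
example = trap_to_rest_payload(
    {
        "source_ip": "192.0.2.10",
        "trap_name": "apSysMgmtFanTrap",
        "varbinds": {"fanSpeed": 95},  # hypothetical variable binding
    }
)
```
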
The Fault Manager provides views for events and alarms.
  • Events view: Provides a historical view of all events generated by either managed NFs or by Oracle SDM Cloud components. This allows you to track when NF traps or Oracle SDM Cloud alerts entered the system and how, for a specific failed resource, the associated events transition between states on a row-by-row basis.
  • Alarms view: Provides a summary view of the latest state of an alarm for a specific failed resource. This table provides only one row for each unique failed resource and updates this row with the latest information as new events for the same failed resource are identified.
For example, consider that "SBC-1" sends an apSysMgmtFanTrap trap that passes through the following states:
  • Fan speed Trap Minor alert: fan speed is more than the minor alarm threshold, but less than the major alarm threshold.
  • Fan speed Trap Major alert: fan speed is more than the major alarm threshold, but less than the critical alarm threshold.
  • Fan speed Trap Critical alert: fan speed is more than the critical alarm threshold.
The Events view displays three events in the table for the failed resource Fan for device "SBC-1":
  • Event 1: Fan speed Minor alert event at Time.0.
  • Event 2: Fan speed Major alert event at Time.1.
  • Event 3: Fan speed Critical alert event at Time.2.
The Alarms view displays only one row for the failed resource, showing only the most current state:
  • Alarm 1: Fan speed Critical alarm at Time.2.
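
The difference between the two views can be sketched in a few lines of Python. This is a minimal model, not the product's implementation: the Event class, the record function, and the choice of (source, failed resource) as the alarm key are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str
    failed_resource: str
    severity: str
    time: int  # stand-in for Time.0, Time.1, Time.2


def record(event, events, alarms):
    """Append to the event history and overwrite the matching alarm row."""
    events.append(event)  # Events view: one row per event
    # Alarms view: one row per failed resource, holding the latest state only
    alarms[(event.source, event.failed_resource)] = event


events, alarms = [], {}
for t, severity in enumerate(["Minor", "Major", "Critical"]):
    record(Event("SBC-1", "Fan", severity, t), events, alarms)

print(len(events))                        # 3
print(alarms[("SBC-1", "Fan")].severity)  # Critical
```
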
The following prerequisites must be met to receive fault notifications:
  • You must use the sudo password (the password of the NNCentral user account on the server operating system) for the port on which TrapRelay listens. This port is configured during Management Cloud Engine (MCE) installation. For more information, see the Getting Started guide.

    Note:

    If you use port 1024 for the TrapRelay function, root permission is not required.
  • Ensure that the SNMP communities and the MIB administrator contact name are configured on your southbound systems.

  • A trap receiver for each MCE node in a cluster must be configured on each southbound device. Also, the SNMP community defined in the trap receiver must be the same for all MCE cluster nodes.
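
The trap receiver prerequisite above can be checked mechanically. The following Python sketch assumes you have already collected, for one southbound device, a mapping of trap receiver IP address to SNMP community. The function name and data layout are hypothetical and do not correspond to any Oracle tool.

```python
def check_trap_receivers(mce_nodes, receivers):
    """Return a list of problems found for one southbound device.

    mce_nodes: IP addresses of every MCE node in the cluster.
    receivers: {receiver_ip: snmp_community} as configured on the device.
    """
    problems = [
        f"no trap receiver for MCE node {ip}"
        for ip in mce_nodes
        if ip not in receivers
    ]
    # Every receiver that points at an MCE node must use the same community.
    communities = {receivers[ip] for ip in mce_nodes if ip in receivers}
    if len(communities) > 1:
        problems.append("SNMP community differs across MCE cluster nodes")
    return problems


nodes = ["10.0.0.1", "10.0.0.2"]
ok = check_trap_receivers(nodes, {"10.0.0.1": "public", "10.0.0.2": "public"})
missing = check_trap_receivers(nodes, {"10.0.0.1": "public"})
mismatch = check_trap_receivers(nodes, {"10.0.0.1": "public", "10.0.0.2": "other"})
```

An empty list means the device satisfies both conditions.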

Alarm and Event Configuration Tasks

The following sections describe the Events table and Alarms table, with their accompanying features. The Events table has a one-to-one correspondence with all device traps and generated server events, and maintains the precise history of all events created and recorded. The Alarms table summarizes the Events table by showing, in each row, the most recent update for a specific category, failed resource, state, and device.

Note:

Users can view only the alarms and events for the devices to which they have access. However, events and alarms generated by Oracle® Session Delivery Management Cloud (Oracle SDM Cloud) itself are accessible to all users.
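
The visibility rule in this note can be expressed as a simple filter. The sketch below is illustrative only: the SDM_SOURCE marker, the row layout, and the function name are assumptions, and the product's actual access-control mechanism is not described in this guide.

```python
SDM_SOURCE = "Oracle SDM Cloud"  # hypothetical marker for system-generated rows


def visible_rows(rows, accessible_devices):
    """Keep rows from devices the user can access, plus all SDM Cloud rows."""
    return [
        row
        for row in rows
        if row["source"] == SDM_SOURCE or row["source"] in accessible_devices
    ]


rows = [
    {"source": "SBC-1", "description": "fan speed"},
    {"source": "SBC-2", "description": "fan speed"},
    {"source": SDM_SOURCE, "description": "service started"},
]
shown = visible_rows(rows, accessible_devices={"SBC-1"})
```
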

Manage How Events are Displayed

  1. Expand the Fault Manager slider and select Events.
  2. Click the More Actions icon and select Set Columns to choose which columns are displayed in the Events table. The following table describes all of the columns available to view.
  3. In the events pane, select an event that you want to view, click the More Actions icon and select View.
  4. In the Event detail dialog box, view the following fields for this specific event:
    • Time Created
    • Description
    • Severity
    • Default Severity
    • Source
    • Source IP
    • Failed Resource
    • Category
    • Trap Name
    • System Up Time
    • Type

Manage How Alarms are Displayed

  1. Expand the Fault Manager slider and select Alarms.
  2. Click the More Actions icon and select Set Columns to choose which columns are displayed in the Alarms table. The following table describes all of the columns available to view.
  3. In the alarms pane, select an alarm that you want to view, click the More Actions icon and select View.
  4. In the Alarm detail dialog box, view the following fields for this specific alarm:
    • Annotation
    • Acknowledged by
    • Time
    • Description
    • Severity
    • Source
    • Source IP
    • Failed Resource
    • Category
    • System Up Time
    • Trap Name
    • Type

Manage the Page View for Events

  1. Expand the Fault Manager slider and select Events.
  2. In the Events pane, you can select from the following actions:

Oracle SDM Cloud Alarm Auto Refresh

  1. Expand the Fault Manager slider and select Alarms.
  2. In the alarms pane, you can select from the following actions:

Search for Alarms or Events by Specifying Criteria

You can search for events and alarms by specifying one, some, or all of the search selection criteria. For example, you can select alarms for a specific IP address during a specified date-time range.
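
Conceptually, each criterion you leave unspecified is ignored and the remaining criteria are combined. The following Python sketch shows that behavior for the IP address and date-time range example. The row keys and the function signature are assumptions for illustration and do not reflect the fields of the Search dialog box.

```python
from datetime import datetime


def search(rows, source_ip=None, start=None, end=None):
    """Return rows matching every criterion that was specified (not None)."""
    result = []
    for row in rows:
        if source_ip is not None and row["source_ip"] != source_ip:
            continue
        if start is not None and row["time"] < start:
            continue
        if end is not None and row["time"] > end:
            continue
        result.append(row)
    return result


rows = [
    {"source_ip": "192.0.2.10", "time": datetime(2024, 1, 1, 9, 0)},
    {"source_ip": "192.0.2.10", "time": datetime(2024, 1, 3, 9, 0)},
    {"source_ip": "192.0.2.20", "time": datetime(2024, 1, 1, 10, 0)},
]
hits = search(
    rows,
    source_ip="192.0.2.10",
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 2),
)
```
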

  1. Expand the Fault Manager slider and select from the following options:
    • Events
    • Alarms
  2. In the alarms or events pane, click Search.
  3. In the Search dialog box, complete the following fields:

Save Alarm or Event Data to a File

You can save event or alarm data in the content area to a comma-separated values (CSV) file that stores table data (numbers and text) in plain-text form.

  1. Expand the Fault Manager slider and select from the following options:
    • Events
    • Alarms
  2. Click Save to file.

    Note:

    The files are saved to your browser's default download location. Only the first 1000 entries can be saved to file.
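
If you post-process exported data, it can help to see what a CSV export with a 1000-row cap looks like. The sketch below uses the Python standard library csv module. The column names and the rows_to_csv function are hypothetical, and the exact column layout of the file produced by Oracle SDM Cloud may differ.

```python
import csv
import io

MAX_ROWS = 1000  # only the first 1000 entries are saved, per the note above


def rows_to_csv(rows, columns):
    """Render up to MAX_ROWS rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows[:MAX_ROWS])
    return buffer.getvalue()


rows = [{"severity": "Minor", "source": f"SBC-{i}"} for i in range(1500)]
text = rows_to_csv(rows, columns=["source", "severity"])
```
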

Delete Alarms or Events

The appropriate administrator privileges must be assigned to delete alarms or events.

Note:

Deleting an alarm in Oracle® Session Delivery Management Cloud (Oracle SDM Cloud) has no effect on the node because the node is unaware that Oracle SDM Cloud displayed the alarm or deleted it from the alarms table.
  1. Expand the Fault Manager slider and select from the following options:
    • Events
    • Alarms
  2. In the alarms or events table, click the alarm or event that you want to remove, click the More Actions icon, and select Delete.
  3. In the Delete dialog box, click Yes to confirm the deletion of the alarm or event.

Specify Criteria to Delete Alarms and Events

The appropriate administrator privileges must be assigned to delete alarms or events.

Use this task to specify one or more criteria for deleting alarms or events from Oracle® Session Delivery Management Cloud (Oracle SDM Cloud).
  1. Expand the Fault Manager slider and select from the following options:
    • Events
    • Alarms
  2. In the events or alarms pane, click the More Actions icon, and select Delete by criteria.
  3. In the Search dialog box, complete the following fields:

    Note:

    When devices are sending a high number of faults, a purge interval of 2 days for events and 7 days for alarms is suggested.
  4. Click OK.
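
The suggested purge intervals can be read as an age-based deletion criterion. The following Python sketch models that interpretation, which is an assumption: the guide does not state exactly how Oracle SDM Cloud applies the interval. The RETENTION table and the purge function are hypothetical names.

```python
from datetime import datetime, timedelta

# Suggested intervals from the note above.
RETENTION = {"event": timedelta(days=2), "alarm": timedelta(days=7)}


def purge(rows, kind, now):
    """Return the rows that survive the purge for the given kind."""
    cutoff = now - RETENTION[kind]
    return [row for row in rows if row["time"] >= cutoff]


now = datetime(2024, 1, 10, 12, 0)
rows = [
    {"id": 1, "time": now - timedelta(days=1)},
    {"id": 2, "time": now - timedelta(days=3)},
    {"id": 3, "time": now - timedelta(days=8)},
]
kept_events = purge(rows, "event", now)
kept_alarms = purge(rows, "alarm", now)
```
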

Alarm Specific Configuration Tasks

Alarms play a significant role in determining the overall health of the system. An alarm is triggered when a condition or event happens within the hardware or software of a system (node). Alarms contain an alarm code, a severity level, a textual description of the event, and the time the event occurred. The following sections describe how to configure the way alarms display in Oracle® Session Delivery Management Cloud.

Add an Annotation to an Alarm

  1. Expand the Fault Manager slider and select Alarms.
  2. In the alarms table, click the alarm to which you want to add an explanatory note, click the More Actions icon, and select Edit.
  3. In the Edit annotation dialog box, add your explanatory note about this alarm in the Annotation field.
  4. Click OK.

Enable Alarm Acknowledgment

The appropriate administrator privileges must be assigned to acknowledge alarms.

  1. Expand the Fault Manager slider and select Alarms.
  2. In the alarms table, select the alarm that you want to acknowledge and click Acknowledge.
  3. In the Acknowledge dialog box, click Yes.
  4. In the Info dialog box, click OK.
  5. Click the alarm to view an updated Alarm detail dialog box with the Acknowledged by and Last modified fields updated.
  6. Click OK.

Disable Alarm Acknowledgment

The appropriate administrator privileges must be assigned to unacknowledge alarms.

  1. Expand the Fault Manager slider and select Alarms.
  2. In the alarms table, select the alarm that you want to unacknowledge and click Unacknowledge. The Unacknowledge dialog box appears.
  3. In the Unacknowledge dialog box, click Yes.
  4. In the Info dialog box, click OK.

Clear an Alarm

The appropriate administrator privileges must be assigned to clear alarms.

Note:

Clearing an alarm in Oracle® Session Delivery Management Cloud (Oracle SDM Cloud) has no effect on the node because the node is unaware that Oracle SDM Cloud displayed the alarm or changed its severity to clear.
  1. Expand the Fault Manager slider and select Alarms.
  2. In the alarms table, select the alarm that you want to clear, click the More Actions icon, and select Clear.
  3. In the Clear dialog box, click Yes.
  4. In the Info dialog box, click OK.
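
The acknowledge, unacknowledge, and clear tasks all change only the alarm row held by Oracle SDM Cloud, never the node. The minimal Python model below illustrates that idea. The Alarm class and its field names are assumptions loosely based on the fields shown in the Alarm detail dialog box, not the product's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    source: str
    failed_resource: str
    severity: str
    acknowledged_by: Optional[str] = None

    def acknowledge(self, user: str) -> None:
        self.acknowledged_by = user

    def unacknowledge(self) -> None:
        self.acknowledged_by = None

    def clear(self) -> None:
        # Only the local row changes; nothing is sent to the node.
        self.severity = "Clear"


alarm = Alarm("SBC-1", "Fan", "Critical")
alarm.acknowledge("admin")
alarm.clear()
```
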

Customize Trap Severity Levels

  1. Expand the Fault Manager slider and select Trap event setting.
  2. In the Trap Event Setting page, select the alarm trap groups you are customizing from the Trap Groups table:

    Note:

    Oracle SDM Cloud determines the trap groups that you can access.
  3. Select a trap from the Trap OIDs table.
  4. In the Severity Mapping table, select a severity cell from the Current severity column for a trap condition row that you want to modify.
  5. In the drop-down list of severity levels that appears, click the severity level that you want to apply.

    Note:

    The Default severity column serves as a reference point and continues to show the default severity setting for the trap condition.
    The new level appears in the Current Severity column for the trap condition.
  6. Click Apply.
  7. In the success dialog box, click OK.
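
The relationship between the Default severity and Current severity columns can be modeled as a table of defaults plus a table of overrides. The Python sketch below is an illustration under that assumption. The SeverityMapping class and the condition names are hypothetical and do not come from any MIB.

```python
class SeverityMapping:
    """Default severities stay fixed; overrides supply the current value."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._overrides = {}

    def set_current(self, condition, severity):
        if condition not in self._defaults:
            raise KeyError(f"unknown trap condition: {condition}")
        self._overrides[condition] = severity

    def default(self, condition):
        return self._defaults[condition]

    def current(self, condition):
        # Fall back to the default when no override has been applied.
        return self._overrides.get(condition, self._defaults[condition])


# Hypothetical trap conditions for a fan trap.
mapping = SeverityMapping({"fan-minor": "Minor", "fan-major": "Major"})
mapping.set_current("fan-minor", "Major")
```

After the override, mapping.default("fan-minor") still returns "Minor", which mirrors how the Default severity column continues to show the original setting.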

Customize Product Plugin Event Traps

The trap event setting allows you to override the default trap severities and customize them. Trap groups are provided for each product plugin that is installed in Oracle SDM Cloud. When you select a trap group, the SNMP trap (OID) list for that product plugin is provided. For more information on product-specific traps, refer to the appropriate MIB Reference Guide.

See your element manager product plugin documentation for the list of SNMP event traps and their definitions.

  1. Expand the Fault Manager slider and select Trap event setting.
  2. Select a trap group row from the Trap groups table and click OK.

Customize Session Delivery Manager Event Traps

The trap event setting allows you to override the core Oracle SDM Cloud default event trap severities and customize them.

  1. Expand the Fault Manager slider and select Trap event setting.
  2. In the Select dialog box, select the SDM trap group row from the Trap groups table and click OK.
  3. The following table lists the Oracle SDM Cloud event traps and describes when each trap is generated.

    Trap | Description
    apEMSNodeUnreachable | Generated when the status of a node changes from reachable to unreachable. The trap contains the node ID of the device and the time of the event.
    apEMSNodeUnreachableClear | Generated when the status of a node changes from unreachable to reachable. The trap contains the node ID of the device and the time of the event.
    MCERegistration | Generated when, upon startup, the MCE successfully registers with Oracle SDM Cloud automatically.
    MCERegistrationClear | Generated when, after an unsuccessful registration attempt, the MCE is able to register successfully.
    UMSServiceStarted | Generated when Oracle SDM Cloud starts and initializes each of its sub-services. The trap reports the status of each sub-service.
    ONSThrottling | Generated when email notifications are temporarily suspended because the OCI Notification Service is unable to deliver messages.
    ONSThrottlingClear | Generated when the OCI Notification Service is able to deliver messages again.
    spUmsDBusage | Generated when the DB usage crosses a threshold value:
      • WARNING: total DB utilization (default + optional) is > 80% and < 85%
      • MAJOR: total DB utilization (default + optional) is >= 85% and < 90%
      • CRITICAL: total DB utilization (default + optional) is >= 90%
    apUmsDBUsageClear | Generated when DB usage returns to < 80%.
    InvalidMCERegistration | Generated when an MCE tries to connect to a site using an invalid registration ID, or when an MCE tries to connect to a site that already has an MCE connection.
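
The DB usage thresholds listed above map directly to a small function. The following Python sketch encodes them as written. The function name is hypothetical, and because the thresholds do not say what happens at exactly 80%, the sketch reports no severity for that value.

```python
def db_usage_severity(percent):
    """Map total DB utilization (percent) to the documented severity.

    Returns None when usage is at or below 80%, where no threshold applies.
    """
    if percent >= 90:
        return "CRITICAL"
    if percent >= 85:
        return "MAJOR"
    if percent > 80:
        return "WARNING"
    return None


for usage in (79.9, 82, 85, 93):
    print(usage, db_usage_severity(usage))
```
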