Support and Incident Management

Incident management is the end-to-end business process of identifying, analyzing, and resolving an outage or service disruption. The goal of incident management is to keep services running or restore them as quickly as possible, while minimizing the impact on the business.

Why Incident Management Is Important

Service interruption incidents can be extremely costly to your business and its teams. Incidents can disrupt operations, lead to temporary downtime, and contribute to the loss of data and productivity. Incident management provides teams with a reliable method to prioritize incidents, get to resolution faster, and offer better service for users.

Benefits of Incident Management

Some of the benefits of incident management include the following:

  • Increased productivity and efficiency.
  • Increased visibility and transparency.
  • Improved (lower) mean time to resolution (MTTR). MTTR is the average total time taken to detect, diagnose, and mitigate an incident.
  • Improved customer and employee experience.
  • Prevention of incidents.
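Because MTTR is the sum of the average time spent in each phase (detect, diagnose, mitigate), tracking each phase separately shows where resolution time is actually going. The following Python sketch is illustrative only. The `Incident` record and its four timestamps are a hypothetical data model, not an Oracle Cloud Infrastructure API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record with one timestamp per lifecycle phase."""
    started: datetime    # service disruption begins
    detected: datetime   # disruption is noticed (alert fires, user reports it)
    diagnosed: datetime  # cause is identified
    mitigated: datetime  # service is restored


def _average(durations: list[timedelta]) -> timedelta:
    # timedelta supports sum() with a timedelta start value and division by an int
    return sum(durations, timedelta()) / len(durations)


def mttr_breakdown(incidents: list[Incident]) -> dict[str, timedelta]:
    """Average time spent detecting, diagnosing, and mitigating, plus their total (MTTR)."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    detect = _average([i.detected - i.started for i in incidents])
    diagnose = _average([i.diagnosed - i.detected for i in incidents])
    mitigate = _average([i.mitigated - i.diagnosed for i in incidents])
    return {
        "detect": detect,
        "diagnose": diagnose,
        "mitigate": mitigate,
        "mttr": detect + diagnose + mitigate,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    incidents = [
        # 10 min to detect, 30 to diagnose, 30 to mitigate (70 min total)
        Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=40), t0 + timedelta(minutes=70)),
        # 20 min to detect, 10 to diagnose, 20 to mitigate (50 min total)
        Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=30), t0 + timedelta(minutes=50)),
    ]
    print(mttr_breakdown(incidents)["mttr"])  # 1:00:00
```

In this example, detection accounts for 15 of the 60 minutes on average. That points to where monitoring improvements would shorten MTTR.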

Oracle Cloud Infrastructure Support

When you use Oracle Cloud Infrastructure, you might sometimes need help from the community or from Oracle Support. For information about support options, see Getting Help and Contacting Support.

Recommendations

Design a support and incident management strategy that fits your environment and minimizes service disruptions.

Proactively define your support and incident management strategy wherever possible, but learn from experience and adjust your practices as needed.

Put controls in place to prepare for and respond to incidents. Recommendations include:

  • Use a system to determine risks, threats, vulnerabilities, and impacts related to security
  • Use a security information and event management (SIEM) system
  • Set up a security operations center (SOC)
  • Set up an incident response team
  • Implement incident detection, response, and reporting
  • Define escalation paths
  • Build a standard post-mortem mechanism
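Escalation paths are easier to review, test, and improve after a post-mortem when they are written down as data rather than kept as tribal knowledge. The following Python sketch is illustrative only. The severity names, role names, and time thresholds are hypothetical placeholders, not an Oracle-defined scheme. Given a severity and how long an incident has been open, the function returns everyone who should have been engaged by that point.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class EscalationStep:
    after: timedelta  # elapsed time since the incident opened before this step applies
    contact: str      # hypothetical role to engage at this step


# Hypothetical escalation paths keyed by severity; adjust names and timings to your organization.
ESCALATION_PATHS: dict[str, list[EscalationStep]] = {
    "sev1": [
        EscalationStep(timedelta(0), "on-call engineer"),
        EscalationStep(timedelta(minutes=15), "incident response team lead"),
        EscalationStep(timedelta(minutes=30), "engineering manager"),
    ],
    "sev3": [
        EscalationStep(timedelta(0), "on-call engineer"),
        EscalationStep(timedelta(hours=4), "incident response team lead"),
    ],
}


def contacts_to_engage(severity: str, elapsed: timedelta) -> list[str]:
    """Return every contact whose escalation threshold has been reached, in path order."""
    path = ESCALATION_PATHS[severity]  # raises KeyError for an undefined severity
    return [step.contact for step in path if elapsed >= step.after]


if __name__ == "__main__":
    print(contacts_to_engage("sev1", timedelta(minutes=20)))
    # ['on-call engineer', 'incident response team lead']
```

Keeping the paths in one reviewable structure makes it straightforward to adjust thresholds when a post-mortem shows that escalation happened too late or too early.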

Develop an operations strategy to detect, prevent, respond to, and recover from events. Recommendations include:

  • Monitor system performance metrics
  • Document and test a disaster recovery plan
  • Understand key roles needed for disaster recovery coordination
  • Plan for interactions with Oracle Cloud Infrastructure support
  • Respond to incidents
  • Simulate attacks based on real incidents
  • Prepare for application failure
  • Recover from data corruption
  • Recover from a network outage
  • Recover from a dependent service failure
  • Recover from a region-wide service disruption
  • Learn from disaster recovery tests and improve processes
  • Expect failure and learn from mistakes

We recommend that you formalize a support contract with Oracle or an approved partner to help keep your organization's systems running at peak performance. Leverage these partnerships when critical events are scheduled, such as migrations or expected increases in demand. Doing so helps ensure that you benefit from the right support, best practices, and expertise. It can also give you a direct feedback channel to Oracle engineering for continuous improvement of the platform.