Detecting Anomalies in Large Datasets

Create a job that detects anomalies using asynchronous detection.

You can use asynchronous detection to detect anomalies in both univariate and multivariate datasets. Typical use cases suited to asynchronous detection are:

Detecting anomalies in very large datasets

The detectAnomalies synchronous REST API supports a maximum of 30,000 data points per request. This limit can be restrictive in anomaly detection scenarios where a large number of data points (typically in the millions) must be analyzed. With asynchronous detection, you can analyze and detect anomalies in very large datasets of 10 million data points or more.
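For example, a minimal check like the following can decide whether a dataset fits the synchronous API or calls for an asynchronous job. The file name and the rows-times-signals counting are illustrative assumptions; verify how the service counts data points for your dataset.

    import csv

    # Illustrative threshold taken from the synchronous detectAnomalies limit.
    SYNC_MAX_POINTS = 30000

    def count_data_points(csv_path):
        # Count rows times signal columns; the first column is assumed
        # to be the timestamp. The service may count points differently.
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            signals = len(header) - 1
            rows = sum(1 for _ in reader)
        return rows * signals

    points = count_data_points("detection-data.csv")  # hypothetical file name
    if points <= SYNC_MAX_POINTS:
        print("Within the synchronous detectAnomalies limit.")
    else:
        print("Use an asynchronous detection job.")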

Automating detection workflows

In IoT use cases, time-series data is usually collected from a large number of sensors and devices, and stored in a persistent data store such as a database or a file system. Often, this raw data must be preprocessed (enriched) by using PaaS services such as Data Flow before inference can be performed. You can easily integrate the asynchronous detection APIs into data processing pipelines and automate detection workflows.
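As an illustration, a preprocessing step might fill gaps in raw sensor data and stage the result in Object Storage as input for an asynchronous job. This is a minimal sketch using pandas and the OCI Python SDK; the file, bucket, and object names are hypothetical.

    import io
    import oci
    import pandas as pd

    # Enrich raw sensor data: sort by time and interpolate missing readings.
    df = pd.read_csv("raw-sensor-data.csv", parse_dates=["timestamp"])
    df = df.set_index("timestamp").sort_index().interpolate().reset_index()

    buf = io.StringIO()
    df.to_csv(buf, index=False)

    # Stage the preprocessed CSV in Object Storage as the job's input.
    config = oci.config.from_file()  # reads ~/.oci/config
    os_client = oci.object_storage.ObjectStorageClient(config)
    namespace = os_client.get_namespace().data
    os_client.put_object(namespace, "input-bucket",
                         "preprocessed/detection-data.csv",
                         buf.getvalue().encode("utf-8"))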

Postprocessing anomalous events

In certain anomaly detection scenarios, the detection output (the detected anomalies) might need to be transformed or enriched before it can be consumed by downstream applications. With asynchronous detection, detected anomalies are saved in an Object Storage bucket. You can use PaaS services such as Data Flow to analyze, process, and enrich the anomalous events. You can also consume and render the anomalies as visualizations in Oracle Analytics Cloud to monitor target systems and take corrective action.
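For instance, a small postprocessing script can pull a job's results file from the output bucket and keep only high-score anomalies for downstream consumers. This sketch assumes the object name and the result fields (detectionResults, score); inspect an actual results file for the exact structure.

    import json
    import oci

    config = oci.config.from_file()
    os_client = oci.object_storage.ObjectStorageClient(config)
    namespace = os_client.get_namespace().data

    # Object name is hypothetical; results files carry the -results suffix.
    resp = os_client.get_object(namespace, "output-bucket",
                                "test-prefix/detection-data-results.json")
    results = json.loads(resp.data.content)

    # Field names are assumptions; adjust them to the real schema.
    high_score = [r for r in results.get("detectionResults", [])
                  if r.get("score", 0) > 0.8]
    print(len(high_score), "anomalies with score > 0.8")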

  • Prerequisites:

    You must have a project that contains a trained model for use in an asynchronous Anomaly Detection job.

    1. Open the navigation menu and click Analytics & AI. Under AI Services, click Anomaly Detection.
    2. In the left-side navigation menu, click Projects.
    3. Select the compartment that contains the project that you want to use.
    4. Click the name of the project.

      The project details page opens.

    5. Click Jobs.
    6. Click Create job.
    7. (Optional) Enter a unique name (255 character limit) for the resource. If you don't provide a name, one is automatically generated.

      For example:

      aianomalydetection<resource>20230825155844

    8. (Optional) Enter a description (400 character limit) for the resource.
    9. Select the model that you want to use to run this job.
    10. (Optional) Select a sensitivity value, from 0 to 1, for the anomaly detection to use.
    11. Select the input request type that you want to use for the job.
      • Inline:

        Drag a JSON or CSV file into the File box, or use Select file to locate and select it from a local drive.

      • Object store:

        Select the Object Storage bucket that contains the detection data file, and then select the file that you want to use for this job. Only CSV files are supported.

        You can use multiple input buckets and detection data files by clicking Additional input bucket and making further selections.

    12. Select an Object Storage output bucket to store the output files in.

      The namespace shows the tenancy in which the job is created.

    13. (Optional) Enter a prefix to help you identify the results.

      For example, if myModel is the prefix, then the result file is myModel/results-file.json.

    14. Click Create job.

      The asynchronous job status is initially Accepted until the job starts running; then it's In Progress. When the job finishes, the status changes to Succeeded. The time it takes to run the job depends on the size of the detection datasets.

    15. Click the completed asynchronous job to view its details and review the job results.

      The anomaly detection results file is saved in a separate folder in the selected Object Storage output bucket. The file name uses the <model-OCID>/<output_bucket_name> naming convention.

      • <model-OCID> is the OCID of the Anomaly Detection model.

      • <output_bucket_name> is the Object Storage bucket name.

      • The anomaly detection results file name is the same as the detection dataset file name suffixed with -results.
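    To locate these results programmatically rather than in the Console, a sketch like the following can list the objects that a completed job wrote to the output bucket. The bucket name and prefix are the illustrative values from steps 12 and 13.

      import oci

      config = oci.config.from_file()
      os_client = oci.object_storage.ObjectStorageClient(config)
      namespace = os_client.get_namespace().data

      # List result files under the prefix chosen in step 13.
      listing = os_client.list_objects(namespace, "output-bucket",
                                       prefix="myModel/")
      for obj in listing.data.objects:
          if "-results" in obj.name:  # results files use the -results suffix
              print(obj.name)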

  • We recommend this approach when you want to detect anomalies and obtain results from large datasets.

    1. Download and configure the OCI CLI or an SDK as described in the OCI documentation.
    2. Create an HTTP POST request that references the trained model, and send the request to the Anomaly Detection service endpoint.
    3. Parse the HTTP response to get the results for use in applications.

    Example API Calls

    Use the following Anomaly Detection requests and required parameters to detect anomalies and obtain results; a Python SDK sketch of the same flow follows the examples:

    1. Get the model:

      Endpoint: https://anomalydetection.aiservice.us-phoenix-1.oci.oraclecloud.com/20210101/models/{ModelId}
      Method: GET
      Body: (none)
    2. Detect asynchronously with data:

      Endpoint: https://anomalydetection.aiservice.us-phoenix-1.oci.oraclecloud.com/20210101/detectAnomalyJobs
      {
        "compartmentId": "ocid1.compartment.oc1..aaaaaaaaaqf4b7xq6kxrrb…..rcmjpbdfmcjmzdufz6sy52pra",
        "description": "Ashburn data center",
        "displayName": "Ashburn data center",
        "modelId": "ocid1.aianomalydetectionmodel.oc1.iad.amaaaaaaor7l3jia2q565gumqsmurg3anj6a6xad4e5talry7ynqivboyh5a",
        "inputDetails": {
          "inputType": "INLINE",
          "signalNames": ["sensor1", "sensor2", "sensor3", "sensor4", "sensor5", "sensor6", "sensor7", "sensor8", "sensor9", "sensor10"],
          "data": [
            {
              "timestamp": "2020-07-13T18:54:46.000Z",
              "values": [ 0.2282, -0.7092, -1.2002, -0.7971, 2.0967, -0.7369, -0.5242, -0.3949, -0.6563, -0.9429 ]
            },
            {
              "timestamp": "2020-07-13T18:55:46.000Z",
              "values": [ -0.4359, -0.153, -1.3603, -1.4552, 1.3512, -0.3683, -0.7328, -0.5223, -2.1182, -0.6212 ]
            },
            {
              "timestamp": "2020-07-13T18:56:46.000Z",
              "values": [ -0.7482, -0.7112, -2.0408, -0.8236, 1.9157, -0.9435, -1.1136, 0.1365, -0.8872, -0.7323 ]
            },
            {
              "timestamp": "2020-07-13T18:57:46.000Z",
              "values": [ 0.2655, -1.23, -0.6551, -0.6294, 1.4812, -1.1023, -1.3472, -1.18, -1.4353, -1.1863 ]
            },
            {
              "timestamp": "2020-07-13T18:58:46.000Z",
              "values": [ -0.6848, -1.6165, -1.4954, -1.2594, 2.5512, -0.6693, -0.5837, -1.2494, -0.2837, -0.7751 ]
            }
          ]
        },
        "outputDetails": {
          "outputType": "OBJECT_STORAGE",
          "namespaceName": "ax3dvjxgkemg",
          "bucketName": "output-bucket",
          "prefix": "test-prefix"
        }
      }
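    The same request can be expressed with the OCI Python SDK. This is a sketch: the class and method names below mirror the REST payload above and might differ by SDK version, so verify them against your installed SDK before use.

      import time
      from datetime import datetime

      import oci
      # Model class names are assumptions based on the REST shapes above.
      from oci.ai_anomaly_detection.models import (
          CreateDetectAnomalyJobDetails, InlineInputDetails, DataItem,
          ObjectStoreOutputDetails,
      )

      config = oci.config.from_file()
      client = oci.ai_anomaly_detection.AnomalyDetectionClient(config)

      details = CreateDetectAnomalyJobDetails(
          compartment_id="ocid1.compartment.oc1..example",  # replace with yours
          model_id="ocid1.aianomalydetectionmodel.oc1..example",
          description="Ashburn data center",
          display_name="Ashburn data center",
          input_details=InlineInputDetails(
              input_type="INLINE",
              signal_names=["sensor1", "sensor2"],
              data=[DataItem(timestamp=datetime(2020, 7, 13, 18, 54, 46),
                             values=[0.2282, -0.7092])],
          ),
          output_details=ObjectStoreOutputDetails(
              output_type="OBJECT_STORAGE",
              namespace_name="ax3dvjxgkemg",
              bucket_name="output-bucket",
              prefix="test-prefix",
          ),
      )

      job = client.create_detect_anomaly_job(details).data

      # Poll until the job leaves the Accepted / In Progress states (step 14).
      while job.lifecycle_state in ("ACCEPTED", "IN_PROGRESS"):
          time.sleep(30)
          job = client.get_detect_anomaly_job(job.id).data
      print(job.lifecycle_state)  # SUCCEEDED when the job completes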