Train and Deploy the Model

After the data is cleansed, prepared, and moved into Object Storage, you are ready to train and deploy the model.

Create and Train the Model

When you create the model, you specify the training data asset and set some parameters. The model is trained automatically when you create it.

The following diagram shows the process.

(Illustration: training-flow.png)

Here is the process for creating and training a model:

  1. Create a project. You create a project in an OCI compartment and give the project a name. The compartment can be one that is dedicated to holding one or more anomaly detection projects.
  2. Specify the training data asset, which is a file in Object Storage. The file should be clean and ready for training; if it is not, you can use OCI services such as Data Science to do the cleaning and pre-processing. The file can be in CSV or JSON format.
  3. Create the model. When you create the model, you select the training data asset and set the false alarm probability and the train fraction ratio (see the sketch after this list). The model is trained as part of the creation process.
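Steps 1 and 2 might look like the following minimal sketch, assuming the OCI Python SDK (the oci package). The compartment OCID, namespace, bucket, and object names are placeholders; the model-creation call for step 3 is sketched after the parameter guidance below.

import oci

config = oci.config.from_file()  # reads credentials from ~/.oci/config
client = oci.ai_anomaly_detection.AnomalyDetectionClient(config)

COMPARTMENT_ID = "ocid1.compartment.oc1..example"  # placeholder OCID

# Step 1: create a project in the compartment
project = client.create_project(
    oci.ai_anomaly_detection.models.CreateProjectDetails(
        compartment_id=COMPARTMENT_ID,
        display_name="anomaly-detection-project",
    )
).data

# Step 2: register the training file in Object Storage as a data asset
data_asset = client.create_data_asset(
    oci.ai_anomaly_detection.models.CreateDataAssetDetails(
        compartment_id=COMPARTMENT_ID,
        project_id=project.id,
        display_name="training-data",
        data_source_details=oci.ai_anomaly_detection.models.DataSourceDetailsObjectStorage(
            namespace="my-namespace",        # placeholder
            bucket_name="training-bucket",   # placeholder
            object_name="sensor-data.csv",   # placeholder; CSV or JSON
        ),
    )
).data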

The Anomaly Detection service documentation has detailed instructions for these steps. You can use the Console UI or the REST API.

Here is some guidance on setting the false alarm probability and the train fraction ratio:

False Alarm Probability
This is the probability that a detected anomaly is not actually an anomaly. Set this value close to the percentage of anomalies that you find in real business scenarios. A value of 0.01 (1%) is appropriate for many scenarios. The lower the value, the longer it takes to train the model. Also, if you set the target false alarm probability too low, the model might not be able to achieve the target.
Train Fraction Ratio
This is the fraction of the data that is used for training. For example, a value of 0.7 specifies that 70% of the data is used for training, and the remaining 30% is used for testing and evaluating the model's performance.
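With those two values chosen, a minimal model-creation sketch might look like the following, continuing the Python example above. Training runs as part of the call, so the sketch waits until the model reaches the ACTIVE state; the parameter names target_fap and training_fraction follow the OCI Python SDK.

# Step 3: create the model; training starts automatically
model = client.create_model(
    oci.ai_anomaly_detection.models.CreateModelDetails(
        compartment_id=COMPARTMENT_ID,
        project_id=project.id,
        model_training_details=oci.ai_anomaly_detection.models.ModelTrainingDetails(
            target_fap=0.01,                 # false alarm probability of 1%
            training_fraction=0.7,           # 70% train, 30% evaluate
            data_asset_ids=[data_asset.id],
        ),
    )
).data

# Training is asynchronous; wait until the model is ready to use
oci.wait_until(client, client.get_model(model.id), "lifecycle_state", "ACTIVE")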

Deploy and Test the Model

After you create the model, you must deploy it before you can use it.

When the model is deployed, it's ready to receive data that you want to test for anomalies.

You can deploy the model by using the Console UI or the REST API. When you deploy the model, you give it a name and, optionally, a description. A model can have more than one deployment.

The following screenshot shows an example of a model in the Console UI. To add a deployment, click the Add Deployment button.

(Illustration: add-deployment.png)

Detect Anomalies

You can submit data for anomaly detection in batches, or you can detect anomalies in streaming data.

The following diagram illustrates a batch processing architecture.

(Illustration: predictions-batch-flow.png)

Batches are processed as follows:

  1. Data is collected into an Object Storage bucket from the Streaming service or from databases via Oracle Data Integration.
  2. Object Storage is the landing zone for batch data to be processed by the Anomaly Detection service.
  3. Data pre-processing can be done on hosted applications, in containers, or through serverless functions. The processed data is sent to the Anomaly Detection service.
  4. The Anomaly Detection service makes predictions using the model that was trained and deployed during the training phase (a minimal detection call is sketched after this list).
  5. Inferences produced by the Anomaly Detection service trigger immediate actions, which are sent to applications or notification platforms via REST.
  6. Inference results from the Anomaly Detection service can be stored in Object Storage for later use in analytics, logging, and notification services.
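As a minimal sketch of step 4, assuming the OCI Python SDK, a small pre-processed CSV batch, and the model trained earlier (the file name and model OCID are placeholders), the batch detection call might look like this:

import base64

import oci

config = oci.config.from_file()
client = oci.ai_anomaly_detection.AnomalyDetectionClient(config)

MODEL_ID = "ocid1.aianomalydetectionmodel.oc1..example"  # placeholder

# Read the pre-processed batch file and base64-encode it for the request
with open("batch.csv", "rb") as f:  # placeholder file
    content = base64.b64encode(f.read()).decode("utf-8")

# The embedded request type carries the base64-encoded CSV payload inline
response = client.detect_anomalies(
    oci.ai_anomaly_detection.models.EmbeddedDetectAnomaliesRequest(
        model_id=MODEL_ID,
        content_type="CSV",
        content=content,
    )
)

# The response lists the anomalous timestamps, signals, and scores;
# forward it via REST or store it in Object Storage as needed
print(response.data)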

The streaming architecture is more complex than the batch architecture, but it is needed when you want real-time or near real-time anomaly detection.

The following diagram illustrates a streaming architecture.

(Illustration: predictions-streaming-flow.png)

  1. The Streaming service ingests data from different streaming data sources.
  2. Data pre-processing, if needed, is done on hosted applications, in containers, or through serverless functions. The processed data is sent to the Anomaly Detection service's stream interface. If the data is well understood and needs no additional processing, the stream can connect directly to the Anomaly Detection service.
  3. The Anomaly Detection service makes predictions using the model that was trained and deployed during the training phase (a polling sketch follows at the end of this section).
  4. The Anomaly Detection service posts inferences to an output stream for downstream actions and logging.
  5. Inferences produced by the Anomaly Detection service trigger immediate actions, which are sent to applications or notification platforms through applications on VMs or containers, or through serverless functions.
  6. The output stream from the Anomaly Detection service can populate a pipeline for downstream operations and analysis.

Inference results from the Anomaly Detection service can also be stored in Object Storage for later use in analytics, logging, and notification services.
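For the streaming path, the following is a hedged sketch that polls an OCI stream with the Streaming SDK, decodes each message, and sends the data points inline to the Anomaly Detection service. The stream OCID, service endpoint, message payload format, and signal names are all assumptions for illustration:

import base64
import json
import time
from datetime import datetime

import oci

config = oci.config.from_file()
stream_client = oci.streaming.StreamClient(
    config,
    service_endpoint="https://cell-1.streaming.us-ashburn-1.oci.oraclecloud.com",  # placeholder
)
ad_client = oci.ai_anomaly_detection.AnomalyDetectionClient(config)

STREAM_ID = "ocid1.stream.oc1..example"                  # placeholder
MODEL_ID = "ocid1.aianomalydetectionmodel.oc1..example"  # placeholder

# Start reading at the newest offset on partition 0
cursor = stream_client.create_cursor(
    STREAM_ID,
    oci.streaming.models.CreateCursorDetails(partition="0", type="LATEST"),
).data.value

while True:
    batch = stream_client.get_messages(STREAM_ID, cursor)
    items = []
    for msg in batch.data:
        # Assumed payload: {"timestamp": "2021-01-01T00:00:00", "values": [1.0, 2.0]}
        payload = json.loads(base64.b64decode(msg.value))
        items.append(
            oci.ai_anomaly_detection.models.DataItem(
                timestamp=datetime.fromisoformat(payload["timestamp"]),
                values=payload["values"],
            )
        )
    if items:
        # The inline request type carries the decoded data points directly
        result = ad_client.detect_anomalies(
            oci.ai_anomaly_detection.models.InlineDetectAnomaliesRequest(
                model_id=MODEL_ID,
                signal_names=["temperature", "pressure"],  # placeholder signal names
                data=items,
            )
        )
        print(result.data)  # post to an output stream or a notification topic here
    cursor = batch.headers["opc-next-cursor"]
    time.sleep(1)  # simple polling interval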