Troubleshooting guide

Troubleshoot ML Monitoring Managed containers

Jobs UI shows "Service managed container name and/or version are not supported"

This error appears when the CONTAINER_CUSTOM_IMAGE variable contains an invalid managed container URL.

  • Ensure that the container exists at the URL dsmc://odsc-ml-monitoring-application:<version>. Supported versions are listed here.

  • Ensure that the ML Monitoring Application is being used in the OC1 realm. The application is not available in other realms.
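A quick local check of the CONTAINER_CUSTOM_IMAGE value can catch typos before a job is submitted. The following is a minimal sketch, not part of the product: the function name parse_managed_image is hypothetical, and the character set allowed in the version is an assumption, since this guide only gives the URL prefix. It checks the shape of the value only; it cannot tell whether the version is actually supported.

```python
import re

# The prefix comes from this guide. The version pattern is an ASSUMPTION:
# any non-empty run of letters, digits, dots, underscores, or hyphens.
MANAGED_IMAGE_PATTERN = re.compile(
    r"dsmc://odsc-ml-monitoring-application:(?P<version>[A-Za-z0-9._-]+)"
)


def parse_managed_image(url: str) -> str:
    """Return the version part of a managed container URL, or raise ValueError."""
    match = MANAGED_IMAGE_PATTERN.fullmatch(url.strip())
    if match is None:
        raise ValueError(f"Unexpected CONTAINER_CUSTOM_IMAGE value: {url!r}")
    return match.group("version")
```

The returned version string still has to be compared by hand against the list of supported versions.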

Unable to read application config specified in CONFIG_LOCATION variable

The ML job logs show:

FileNotFoundError: Either the bucket named <bucket_name> does not exist in the namespace <namespace> or you are not authorized to access it

  • Ensure the application config is set using the environment variable: CONFIG_LOCATION

  • Ensure the application config is available in the customer-provided Object Storage bucket

  • Ensure the URL provided is valid and that the object it points to exists

  • Ensure that the Dynamic Group and Policies that grant the ML job access to Object Storage are configured appropriately. Refer to the setup section.

Job run fails because it cannot read the baseline or prediction input dataset

The data reader specified in the application config does not have permission to read from the input data location. This is evident from the exception in the ML job logs:

"Data reader baseline_reader read permissions": "('Invalid application configuration passed for : OciObjectStorageResourceValidation, Error: Read permission for file path <file_path> is unauthorized

  • Ensure the baseline_reader or prediction_reader section is set in the monitor config. Please see here for details.

  • Ensure the input dataset is available in the customer-provided Object Storage bucket

  • Ensure the URL provided is valid and that the object it points to exists

  • Ensure that the Dynamic Group and Policies that grant the ML job read access to the Object Storage data location specified in the data reader of the application config are configured appropriately. Refer to the setup section.

  • Ensure the correct reader type is set. Supported reader types are listed here.

  • Run the ML job with the action type RUN_CONFIG_VALIDATION to verify that all the configs are correct. If they are, the job run succeeds.
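The two log excerpts quoted so far in this guide can be searched for automatically when the ML job logs are long. The sketch below is an illustration under stated assumptions, not a product feature: the names CONFIG_HINT, DATASET_HINT, KNOWN_SIGNATURES, and diagnose are hypothetical, the patterns are derived from the excerpts above, and the hint texts paraphrase the checklists in this guide.

```python
import re

# Hint texts paraphrase the checklists in this guide.
CONFIG_HINT = (
    "Application config not readable: check CONFIG_LOCATION, the bucket, "
    "and the Dynamic Group and Policies."
)
DATASET_HINT = (
    "Input dataset not readable: check the data reader location and its "
    "read permissions."
)

# Patterns are derived from the log excerpts quoted in this guide;
# placeholders such as <bucket_name> are matched with ".*".
KNOWN_SIGNATURES = [
    (
        re.compile(
            r"FileNotFoundError: Either the bucket named .* does not exist "
            r"in the namespace .* or you are not authorized to access it"
        ),
        CONFIG_HINT,
    ),
    (
        re.compile(r"Read permission for file path .* is unauthorized"),
        DATASET_HINT,
    ),
]


def diagnose(log_text: str) -> list[str]:
    """Return one hint per known signature found in the log, in first-seen order."""
    hints: list[str] = []
    for line in log_text.splitlines():
        for pattern, hint in KNOWN_SIGNATURES:
            if pattern.search(line) and hint not in hints:
                hints.append(hint)
    return hints
```

An empty result only means that neither of these two signatures was found; it does not mean the job run was healthy.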

Output JSON is not generated even though the job run is successful

This happens when the post processor configuration is missing from the application config.

  • Ensure that a valid post processor is configured in the application config

  • Supported post processor: SaveMetricOutputAsJsonPostProcessor

  • The namespace and bucket_name parameters are mandatory

  • Run the ML job with the action type RUN_CONFIG_VALIDATION to verify that all the configs are correct. If they are, the job run succeeds.

Miscellaneous run failures

Several issues can cause runtime failures:

  • ApplicationActionType is not valid

  • The value for the required parameter ACTION_TYPE is empty

  • The configured DATE_RANGE is not valid. Refer to the Input Contracts.

  • A valid monitor ID is missing from the application config

  • Invalid storage details are provided

  • A valid metric, transformer, input schema, or post processor configuration is missing

Possible solutions:

  • Run the ML job with the action type RUN_CONFIG_VALIDATION to verify that all the configs are correct. If they are, the job run succeeds.

  • Refer to the Input Contracts.
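Some of the failures listed above, such as an empty ACTION_TYPE, can be caught before the job is submitted. The following is a minimal sketch under stated assumptions: the parameters are held in a plain dictionary, the helper names missing_parameters and validation_run_parameters are hypothetical, and treating CONTAINER_CUSTOM_IMAGE and CONFIG_LOCATION as required alongside ACTION_TYPE is an assumption based on how this guide uses them. The Input Contracts remain the authoritative list of parameters.

```python
# Parameter names come from this guide. Treating all three as required
# is an ASSUMPTION; only ACTION_TYPE is explicitly called required.
REQUIRED_KEYS = ("CONTAINER_CUSTOM_IMAGE", "CONFIG_LOCATION", "ACTION_TYPE")


def missing_parameters(params: dict[str, str]) -> list[str]:
    """Return the required parameter names that are absent or blank."""
    return [key for key in REQUIRED_KEYS if not params.get(key, "").strip()]


def validation_run_parameters(params: dict[str, str]) -> dict[str, str]:
    """Return a copy of params with ACTION_TYPE set to RUN_CONFIG_VALIDATION."""
    updated = dict(params)  # copy so the caller's dictionary is not modified
    updated["ACTION_TYPE"] = "RUN_CONFIG_VALIDATION"
    return updated
```

A typical use is to call missing_parameters first, fix anything it reports, then submit one run with the output of validation_run_parameters before switching back to the intended action type.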