Publish GPU Instance Metrics to Oracle Cloud Infrastructure Monitoring
Note:
Instructions in this article are specific to the operating system you are using. Ensure you are using the instructions related to your operating system.
Test the Linux Shell Script
If you created a Linux shell script, manually run publishGPUMetrics.sh, saved to the directory oci-gpu-monitoring/scripts, to verify that there are no errors.
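A minimal sketch of the manual test run, assuming the script signals failure with a non-zero exit status (the article does not state this). The helper name run_once and the default log path /tmp/publishGPUMetrics-test.log are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a script once, capture its output in a log file,
# and report whether it exited successfully.
run_once() {
  local script="$1"
  local log="${2:-/tmp/publishGPUMetrics-test.log}"  # assumed default log path

  if [[ ! -f "$script" ]]; then
    echo "Script not found: $script" >&2
    return 2
  fi

  # Invoke through bash so the test does not depend on the execute bit.
  bash "$script" >"$log" 2>&1
  local status=$?

  if (( status == 0 )); then
    echo "OK: $script exited with status 0 (output in $log)"
  else
    echo "FAILED: $script exited with status $status (see $log)" >&2
  fi
  return "$status"
}
```

For example, run_once "$HOME/oci-gpu-monitoring/scripts/publishGPUMetrics.sh" assumes the directory sits in your home directory; adjust the path to match your setup, then read the log for any error messages.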
Create a Cron Job to Run the Linux Script Automatically
Assuming you don't see any errors in the logs from the previous exercise, create a cron job so that the script runs automatically.
The job in this example runs the script every minute, but you can change the frequency of the cron job to suit your needs. Custom metrics can be posted as often as once per second, but the minimum aggregation interval is one minute.
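The helpers below are a sketch, not the article's own job definition. cron_line prints a crontab entry whose schedule, * * * * *, fires once every minute (cron cannot schedule anything more frequent), and install_cron adds it to the current user's crontab. Because cron runs jobs with a minimal environment, the entry uses absolute paths. The function names and the log path are hypothetical, and the paths are assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scheduling publishGPUMetrics.sh with cron.

# Print a crontab line that runs the given script every minute and appends
# its output to a log file. Schedule fields: minute hour day-of-month month day-of-week.
# Assumes neither path contains spaces.
cron_line() {
  local script="$1"
  local log="$2"
  printf '* * * * * /bin/bash %s >> %s 2>&1\n' "$script" "$log"
}

# Add the line to the current user's crontab, dropping any existing entry
# for the same script so repeated runs do not create duplicates.
install_cron() {
  local script="$1"
  local log="$2"
  {
    crontab -l 2>/dev/null | grep -vF -- "$script"
    cron_line "$script" "$log"
  } | crontab -
}
```

For example, install_cron "$HOME/oci-gpu-monitoring/scripts/publishGPUMetrics.sh" "$HOME/oci-gpu-monitoring/cron.log" installs the job, and crontab -l confirms it. You can also paste the line printed by cron_line into crontab -e by hand.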
Note:
Instead of selecting the values from the fields in the Console, you can also use the Query Code Editor by selecting Advanced Mode. For example, to get the same chart as above, use the following query (be sure to change the value of resourceId to your GPU instance ID):
gpuTemperature[1m]{resourceId = "your_gpu_instance_id"}.mean()
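If you prefer the command line, the same MQL query can be passed to the OCI CLI command oci monitoring metric-data summarize-metrics-data. This sketch assumes the OCI CLI is installed and configured and that GNU date is available. The compartment OCID, the custom metric namespace your script publishes to, and the helper names gpu_temp_query and fetch_gpu_temp are placeholders, not values from this article.

```shell
#!/usr/bin/env bash
# Build the MQL query from the note above for a given GPU instance ID.
gpu_temp_query() {
  local instance_id="$1"
  printf 'gpuTemperature[1m]{resourceId = "%s"}.mean()\n' "$instance_id"
}

# Hypothetical wrapper: fetch the last hour of one-minute mean GPU temperatures
# with the OCI CLI. Requires a configured CLI; all arguments are placeholders.
fetch_gpu_temp() {
  local compartment_id="$1" namespace="$2" instance_id="$3"
  oci monitoring metric-data summarize-metrics-data \
    --compartment-id "$compartment_id" \
    --namespace "$namespace" \
    --query-text "$(gpu_temp_query "$instance_id")" \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
```

Usage: fetch_gpu_temp "<compartment_ocid>" "<your_metric_namespace>" "<your_gpu_instance_id>", substituting your own values for each placeholder.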
Test the PowerShell Script
If you created a Windows PowerShell script, manually run publishGPUMetrics.ps1, saved to the directory oci-gpu-monitoring, from a PowerShell session to verify that there are no errors.
Create and Execute a Scheduled Task to Run the Windows Script Automatically
Assuming you don't see any errors in the logs from the previous exercise, create a scheduled task so that the script runs automatically.
This example runs the script every minute, but you can change the frequency of the task to suit your needs. Custom metrics can be posted as often as once per second, but the minimum aggregation interval is one minute.
In your code editor, create a new scheduled task script:
Note:
Instead of selecting the values from the fields in the Console, you can also use the Query Code Editor by selecting Advanced Mode. For example, to get the same chart as above, use the following query (be sure to change the value of resourceId to your GPU instance ID):
gpuTemperature[1m]{resourceId = "your_gpu_instance_id"}.mean()