AMD GPU Plugin

When you enable the AMD GPU Plugin cluster add-on, you can pass the following key/value pairs as arguments.

Note that to ensure that workloads running on AMD GPU worker nodes are not interrupted unexpectedly, we recommend that you choose the version of the AMD GPU Plugin add-on to deploy, rather than specifying that you want Oracle to update the add-on automatically.

Configuration Arguments Common to all Cluster Add-ons
Key (API and CLI) Key's Display Name (Console) Description Required/Optional Default Value Example Value
affinity affinity

A group of affinity scheduling rules.

JSON format in plain text or Base64 encoded.

Optional null null
nodeSelectors node selectors

You can use node selectors and node labels to control the worker nodes on which add-on pods run.

For a pod to run on a node, the pod's node selector must have the same key/value as the node's label.

Set nodeSelectors to a key/value pair that matches both the pod's node selector, and the worker node's label.

JSON format in plain text or Base64 encoded.

Optional null {"foo":"bar", "foo2": "bar2"}

The pod will only run on nodes that have the foo=bar or foo2=bar2 label.

numOfReplicas numOfReplicas The number of replicas of the add-on deployment.

For AMD GPU Plugin, not used.

For CoreDNS, use nodesPerReplica instead.

Required 1

Creates one replica of the add-on deployment per cluster.

2

Creates two replicas of the add-on deployment per cluster.

rollingUpdate rollingUpdate

Controls the desired behavior of rolling update by maxSurge and maxUnavailable.

JSON format in plain text or Base64 encoded.

Optional null null
tolerations tolerations

You can use taints and tolerations to control the worker nodes on which add-on pods run.

For a pod to run on a node that has a taint, the pod must have a corresponding toleration.

Set tolerations to a key/value pair that matches both the pod's toleration, and the worker node's taint.

JSON format in plain text or Base64 encoded.

Optional null [{"key":"tolerationKeyFoo", "value":"tolerationValBar", "effect":"noSchedule", "operator":"exists"}]

Only pods that have this toleration can run on worker nodes that have the tolerationKeyFoo=tolerationValBar:noSchedule taint.

topologySpreadConstraints topologySpreadConstraints

How to spread matching pods among the given topology.

JSON format in plain text or Base64 encoded.

Optional null null
Configuration Arguments Specific to this Cluster Add-on
Key (API and CLI) Key's Display Name (Console) Description Required/Optional Default Value Example Value
amd-gpu-device-plugin.ContainerResources amd-gpu-device-plugin container resources

You can specify the resource quantities that the add-on containers request, and set resource usage limits that the add-on containers cannot exceed.

JSON format in plain text or Base64 encoded.

Optional null {"limits": {"cpu": "500m", "memory": "200Mi" }, "requests": {"cpu": "100m", "memory": "100Mi"}}

Create add-on containers that request 100 milllicores of CPU, and 100 mebibytes of memory. Limit add-on containers to 500 milllicores of CPU, and 200 mebibytes of memory.

pulse Enable health checks

Time interval in seconds for the plugin to update the kubelet with device health status.

Set to 0 to disable the health check.

Optional 0