AMD GPU Plugin

When you enable the AMD GPU Plugin cluster add-on, you can pass the following key/value pairs as arguments.

To ensure that workloads running on AMD GPU worker nodes are not interrupted unexpectedly, we recommend that you choose the version of the AMD GPU Plugin add-on to deploy yourself, rather than having Oracle update the add-on automatically.

Configuration Arguments Common to all Cluster Add-ons


Key (API and CLI)	Key's Display Name (Console)	Description	Required/Optional	Default Value	Example Value
`affinity`	affinity	A group of affinity scheduling rules. JSON format in plain text or Base64 encoded. Not used by: NVIDIA GPU Operator. Possible equivalents: Node Feature Discovery, use `master.affinity`; NVIDIA Network Operator, use `operator.affinity`; CSI Driver SMB, use `controller.affinity`.	Optional	null	null
`nodeSelectors`	node selectors	You can use node selectors and node labels to control the worker nodes on which add-on pods run. For a pod to run on a node, the pod's node selector must have the same key/value as the node's label. Set `nodeSelectors` to a key/value pair that matches both the pod's node selector and the worker node's label. JSON format in plain text or Base64 encoded. Not used by: NVIDIA GPU Operator, CSI Driver SMB. Possible equivalents: Node Feature Discovery, use `worker.nodeSelector`; NVIDIA Network Operator, use `operator.nodeSelectors`.	Optional	null	`{"foo":"bar", "foo2": "bar2"}` The pod will only run on nodes that have both the `foo=bar` and `foo2=bar2` labels.
`numOfReplicas`	numOfReplicas	The number of replicas of the add-on deployment. Not used by: AMD GPU Plugin, NVIDIA GPU Operator, NVIDIA Network Operator, CSI Driver SMB. Possible equivalents: CoreDNS, use `nodesPerReplica`; Node Feature Discovery, use `master.replicaCount`.	Required	`1` Creates one replica of the add-on deployment per cluster.	`2` Creates two replicas of the add-on deployment per cluster.
`rollingUpdate`	rollingUpdate	Controls the rolling update behavior through the maxSurge and maxUnavailable settings. JSON format in plain text or Base64 encoded. Not used by: Node Feature Discovery, NVIDIA Network Operator, CSI Driver SMB. Possible equivalents: NVIDIA GPU Operator, use `daemonsets.rollingUpdate.maxUnavailable`.	Optional	null	null
`tolerations`	tolerations	You can use taints and tolerations to control the worker nodes on which add-on pods run. For a pod to run on a node that has a taint, the pod must have a corresponding toleration. Set `tolerations` to a key/value pair that matches both the pod's toleration and the worker node's taint. JSON format in plain text or Base64 encoded. Possible equivalents: Node Feature Discovery, use `master.tolerations` and/or `worker.tolerations`; NVIDIA GPU Operator, use `daemonsets.tolerations`; NVIDIA Network Operator, use `operator.tolerations`; CSI Driver SMB, use `controller.tolerations`.	Optional	null	`[{"key":"tolerationKeyFoo", "value":"tolerationValBar", "effect":"noSchedule", "operator":"exists"}]` Only pods that have this toleration can run on worker nodes that have the `tolerationKeyFoo=tolerationValBar:noSchedule` taint.
`topologySpreadConstraints`	topologySpreadConstraints	How to spread matching pods among the given topology. JSON format in plain text or Base64 encoded. Not used by: Node Feature Discovery, NVIDIA GPU Operator, NVIDIA Network Operator, CSI Driver SMB.	Optional	null	null
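
Several arguments in the table above accept "JSON format in plain text or Base64 encoded". The following Python sketch shows one way to produce both forms of a value, using the `nodeSelectors` example from the table. The helper names `encode_argument` and `decode_argument` are hypothetical and are not part of any Oracle SDK or CLI. The sketch also assumes that the Base64 form is the standard Base64 encoding of the UTF-8 JSON text, which this page does not state explicitly.

```python
import base64
import json


def encode_argument(value, use_base64=False):
    """Serialize an argument value as JSON text, optionally Base64 encoded.

    Hypothetical helper: assumes the Base64 form is the standard Base64
    encoding of the UTF-8 JSON text.
    """
    text = json.dumps(value, separators=(",", ":"))
    if use_base64:
        return base64.b64encode(text.encode("utf-8")).decode("ascii")
    return text


def decode_argument(encoded, is_base64=False):
    """Reverse encode_argument, to check a value before passing it on."""
    if is_base64:
        encoded = base64.b64decode(encoded).decode("utf-8")
    return json.loads(encoded)


# Example value taken from the nodeSelectors row of the table.
node_selectors = {"foo": "bar", "foo2": "bar2"}

plain = encode_argument(node_selectors)
encoded = encode_argument(node_selectors, use_base64=True)

print(plain)    # plain-text JSON form
print(encoded)  # Base64 form of the same JSON text
```

Decoding the value again before you submit it is a cheap way to catch malformed JSON early.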

Configuration Arguments Specific to this Cluster Add-on


Key (API and CLI)	Key's Display Name (Console)	Description	Required/Optional	Default Value	Example Value
`amd-gpu-device-plugin.ContainerResources`	amd-gpu-device-plugin container resources	You can specify the resource quantities that the add-on containers request, and set resource usage limits that the add-on containers cannot exceed. JSON format in plain text or Base64 encoded.	Optional	null	`{"limits": {"cpu": "500m", "memory": "200Mi" }, "requests": {"cpu": "100m", "memory": "100Mi"}}` Create add-on containers that request 100 millicores of CPU and 100 mebibytes of memory. Limit add-on containers to 500 millicores of CPU and 200 mebibytes of memory.
`pulse`	Enable health checks	Time interval in seconds for the plugin to update the kubelet with device health status. Set to `0` to disable the health check.	Optional	`0`
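
As an illustration of the two add-on-specific arguments above, the following Python sketch assembles them into a list of key/value entries. It rests on several assumptions that this page does not confirm: that arguments are passed as a list of `{"key": ..., "value": ...}` entries, that every value is sent as a string, and that `pulse` is a whole number of seconds. The function name `build_amd_gpu_plugin_configurations` and the 30-second interval are hypothetical.

```python
import json


def build_amd_gpu_plugin_configurations(container_resources=None, pulse_seconds=0):
    """Build key/value entries for the AMD GPU Plugin add-on arguments.

    Assumptions (not confirmed by the documentation): the list-of-entries
    shape, string-typed values, and an integer pulse interval.
    """
    is_valid_interval = (
        isinstance(pulse_seconds, int)
        and not isinstance(pulse_seconds, bool)
        and pulse_seconds >= 0
    )
    if not is_valid_interval:
        raise ValueError("pulse must be a non-negative integer number of seconds")

    # A pulse of 0 disables the health check, per the table above.
    configurations = [{"key": "pulse", "value": str(pulse_seconds)}]

    if container_resources is not None:
        configurations.append(
            {
                "key": "amd-gpu-device-plugin.ContainerResources",
                "value": json.dumps(container_resources),
            }
        )
    return configurations


# Resource values taken from the example in the table above.
resources = {
    "limits": {"cpu": "500m", "memory": "200Mi"},
    "requests": {"cpu": "100m", "memory": "100Mi"},
}

# Hypothetical choice: report device health to the kubelet every 30 seconds.
configurations = build_amd_gpu_plugin_configurations(resources, pulse_seconds=30)
print(json.dumps(configurations, indent=2))
```

Keeping the resources as a Python dictionary until the last step avoids hand-written JSON strings, which are easy to get wrong when quoting nested values.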
