Hyperparameters for Fine-Tuning a Model in Generative AI

OCI Generative AI uses hyperparameters to fine-tune a base model with your provided training dataset. The fine-tuning hyperparameters are described below; you can set them when you create a model and view them on the custom model's detail page.

Tip

Start training a model with the default hyperparameter values. After the model is created, in the model's detail page, under Model Performance, check the values for accuracy and loss. If you're not happy with the results, create another model with either a larger dataset or different hyperparameters until the performance improves.
Each hyperparameter is listed with its description and its valid range.
Total training epochs
The number of times the training iterates through the entire training dataset. For example, 1 epoch means that the model is trained by using the entire training dataset one time.
Valid range: Enter 1 or a higher integer. The default is 3.

Learning rate
The speed at which the model weights are updated against the error gradient.
Valid range: Enter a number between 0 and 1.0.
For the LoRA training method, the default is 0.0002.
For the T-Few training method, the default is 0.01.
For the Vanilla training method, the default is 0.01 for the cohere.command model and 0.0000006 (6e-7) for the cohere.command-light model.

Training batch size
The number of samples in a mini-batch to process before updating the model's parameters.
Valid range: Enter 8 for the cohere.command model, and an integer between 8 and 16 for the cohere.command-light and meta.llama-3-70b-instruct models. The default is 8 for the cohere.command and meta.llama-3-70b-instruct models.
For the LoRA training method, the default is 8.
For the T-Few and Vanilla training methods, the default is 8 for the cohere.command model and 16 for the cohere.command-light model.

Early stopping patience
The number of evaluation cycles (grace periods) that training continues after the early stopping threshold is triggered. Training stops if the loss metric doesn't improve by more than the early stopping threshold for this many consecutive evaluations (see the early stopping sketch after this list).
Valid range: Enter 1 or a higher integer to add grace periods, or 0 to disable early stopping. For the T-Few and Vanilla training methods, the default is 6. For the LoRA training method, the default is 15.

Early stopping threshold
The minimum improvement in evaluation loss that resets the early stopping counter. Loss improves when it decreases from one evaluation cycle to the next. If loss doesn't improve by at least this amount during the patience period, training stops; otherwise, training continues and the counter resets.
Valid range: Enter 0 or a higher number. For the T-Few and Vanilla training methods, the default is 0.01. For the LoRA training method, the default is 0.0001.

Log model metrics interval in steps
The number of training steps between logging events. Model metrics such as training loss and learning rate are logged at this interval. If the training loss isn't decreasing as expected, review the training data or the learning rate.
Valid range: Enter an integer between 1 and the total number of training steps, or 0 to disable logging. The default is 10.

Number of last layers (for the Vanilla training method only)
The number of last layers to fine-tune with the Vanilla method.
Valid range: For the cohere.command model, enter an integer between 1 and 15; the default is 15. For the cohere.command-light model, enter an integer between 1 and 14; the default is 14.

LoRA r (for the LoRA training method only)
The attention dimension (rank) of the update matrices. A lower rank results in smaller update matrices with fewer trainable parameters.
Valid range: Enter an integer between 1 and 64. The default is 8.

LoRA alpha (for the LoRA training method only)
The alpha parameter for LoRA scaling. The LoRA weight matrices are scaled by dividing LoRA alpha by LoRA r. The LoRA weights are a small set of new weights and are the only weights trained in the model (see the example after this list).
Valid range: Enter an integer between 1 and 128. The default is 8.

LoRA dropout (for the LoRA training method only)
The dropout probability for neurons in the LoRA layers. Dropout prevents overfitting by randomly ignoring (dropping out) neurons within a layer during training. For example, a dropout of 0.1 means that each neuron has a 10% chance of being dropped.
Valid range: Enter a decimal number between 0 and 1. The default is 0.1 (10%).
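
To make the LoRA scaling concrete, here is a minimal sketch of how a low-rank update, scaled by LoRA alpha divided by LoRA r and with dropout applied to the LoRA path, could combine with a frozen weight matrix. The matrix sizes, variable names, and use of NumPy are illustrative assumptions, not OCI service code.

# Illustration only: assumed sizes and NumPy, not OCI service code.
import numpy as np

d = 16                # model dimension (assumed for illustration)
lora_r = 8            # default LoRA r from above
lora_alpha = 8        # default LoRA alpha from above
lora_dropout = 0.1    # default LoRA dropout from above

W = np.zeros((d, d))              # frozen base weight (stand-in values)
A = np.random.randn(lora_r, d)    # trainable LoRA matrix A
B = np.random.randn(d, lora_r)    # trainable LoRA matrix B

scaling = lora_alpha / lora_r     # 8 / 8 = 1.0 with the defaults
x = np.random.randn(d)            # an input vector
keep = np.random.rand(d) >= lora_dropout       # each unit has a 10% chance of being dropped
h = W @ x + scaling * (B @ (A @ (x * keep)))   # base output plus the scaled LoRA update
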
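
The following sketch shows how early stopping patience and early stopping threshold work together, as described above. It is a simplified stand-in for the service's internal logic, with assumed function and variable names.

def should_stop_early(eval_losses, patience, threshold):
    # eval_losses: evaluation loss recorded at each evaluation cycle, oldest first.
    # Returns True once the loss has failed to improve by more than `threshold`
    # for `patience` consecutive evaluation cycles.
    if patience == 0:            # a patience of 0 disables early stopping
        return False
    best_loss = eval_losses[0]
    counter = 0
    for loss in eval_losses[1:]:
        if best_loss - loss > threshold:   # improved enough: reset the counter
            best_loss = loss
            counter = 0
        else:                              # not enough improvement: use up a grace period
            counter += 1
            if counter >= patience:
                return True
    return False

# Example with the T-Few/Vanilla defaults (patience=6, threshold=0.01):
# six consecutive evaluations that fail to lower the best loss by more
# than 0.01 stop the training.
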
The following equation shows how the model calculates the totalTrainingSteps parameter.
totalTrainingSteps = (totalTrainingEpochs * size(trainingDataset)) / trainingBatchSize
This equation omits the rounding that the actual calculation applies.
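
As a quick check of the equation, the snippet below computes the step count for assumed example values; the dataset size of 1,000 is hypothetical, and the floor division is only one possible way to handle the rounding that the equation omits.

total_training_epochs = 3       # default number of epochs
training_dataset_size = 1000    # hypothetical number of examples in the training dataset
training_batch_size = 8         # default batch size for the cohere.command model

total_training_steps = (total_training_epochs * training_dataset_size) // training_batch_size
print(total_training_steps)     # 375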