Fine-Tune & Deploy an Open Source LLM on GPU using Data Science and AI Quick Actions

Introduction

This tutorial walks you through using the Oracle Cloud Infrastructure (OCI) Data Science service to fine-tune an open-source LLM using the AI Quick Actions functionality that is provided by Data Science. With point-and-click ease, you will use AI Quick Actions to fine-tuned a Mistral LLM provided by Hugging Face, with that LLM fine-tuned on a FAQ published by NVIDIA. AI Quick Actions is then used to deploy that tuned model in OCI on an A10 GPU shape. Python code running in a Jupyter notebook is then used to show that the tuned model’s output has the desired style and tone that is similar to the NVIDIA training data.

Objectives

Prerequisites

Task 1: Provision a Data Science Notebook Session

  1. Use the OCI console to create a Data Science Project.

  2. Navigate to that Project and create a Data Science notebook session having two or more ECPUs.

  3. Open that notebook session and click Extend.

  4. Start a terminal session in Data Science.

  5. Use that terminal to clone a github repo containing the Jupyter notebooks that will be used by this tutorial:

    git clone https://github.com/oracle-nace-dsai/quick-actions-demo-archive.git
    
  6. Clone NVIDIA FAQ:

    git clone https://huggingface.co/datasets/ajsbsd/nvidia-qa
    
    
  7. Copy the NVIDIA FAQ to the first repo’s data directory

    cp nvidia-qa/NvidiaDocumentationQandApairs.csv quick-actions-demo-archive/data/.
    
    
  8. Install and then activate the General Machine Learning for CPUs on Python 3.11 conda:

    odsc conda install -s generalml_p311_cpu_x86_64_v1 
    conda activate /home/datascience/conda/generalml_p311_cpu_x86_64_v1
    
  9. Install LangChain per https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md#using-python-sdk-without-streaming

    pip install langgraph "langchain>=0.3" "langchain-community>=0.3" "langchain-openai>=0.2.3" "oracle-ads>2.12"
    

Task 2: Set up a Hugging Face Account

  1. Create a Hugging Face account at https://huggingface.co.

  2. Navigate to your Hugging Face account > Access Tokens and create a new User Access Token having these permissions checked:

    • Read access to contents of all repos under your personal namespace
    • Read access to contents of all public gated repos you can access
  3. Use a Data Science terminal session to log your User Access Token with Hugging Face:

    git config --global credential.helper store
    huggingface-cli login
    
    

Task 3: Create an Object Storage Bucket

Create an Object Storage bucket in the same region and compartment as the Data Science notebook.

Task 4: Set up Logging

Create a Log Group and then Create Custom Log

Task 5: Use Data Science’s AI Quick Actions to Deploy LLM on A10 GPU Shape, with No Fine-Tuning

  1. Navigate to Data Science notebook > Launcher > AI Quick Actions

    a. Search for the Mistral models
    b. Click on the mistralai/Mistral-7B-Instruct-v0.3 tile
    c. Click Deploy with the above Log selected

  2. Model deployment takes about 15 minutes. You can monitor the deployment log by selecting Open logs in terminal.

  3. After model deployment completes, navigate to Deployments > <your just-deployed model> > Test your model and playtest that model with simple questions, such as:

    Who wrote the Harry Potter book series?
    
    
  4. Some simple LLMs can fail to answer the following test questions correctly, but Mistral-7B-Instruct-v0.3 does a fairly good job answering of answering these:

    A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
    Every cat has four legs. My pet has four legs. Is my pet a cat?
    Who is President of the United States? 
    

Task 6: Interact with Deployed Model’s Endpoint

  1. Navigate to Deployments > <your just-deployed model> Invoke your model to see your deployed model’s endpoint. Then use the Data Science terminal to store that endpoint as a shell variable. For example:

    endpoint=https://modeldeployment.<region>.oci.customer-oci.com/<model_ocid>/predict
    
  2. Send a prompt into the deployed model’s endpoint:

    prompt="Who is President of the United States?"
    request_body='{"model":"odsc-llm","prompt":"'$prompt'","max_tokens":100,"temperature":0.1,"top_k":50,"top_p":0.99,"stop":[],"frequency_penalty":0,"presence_penalty":0}'
    oci raw-request --http-method POST --target-uri $endpoint --request-body "$request_body" --auth resource_principal
    
  3. Use this bash loop to call the model’s endpoint 100 times in ten seconds:

    for i in $(seq 1 100); do
         oci raw-request --http-method POST --target-uri $endpoint --request-body "$request_body" --auth resource_principal &
         echo $i
         sleep 0.1
      done
    
    
  4. Navigate to Deployments > <your just-deployed model> > Log to view that traffic just sent into the model’s endpoint.

img.png

Description of the illustration log_traffic.png

Task 7: Use AI Quick Actions to Fine-Tune an LLM

The previous task deployed a Mistral LLM with no fine-tuning. This Task will fine-tune and then deploy the same LLM. Task 9 will compare both models’ outputs, tuned and untuned. This Task and the next also make use of two Jupyter notebooks that were downloaded from this code archive.

  1. Use the Data Science notebook’s file browser to navigate to the quick-actions-demo-archive folder and open the prep_data.ipynb Jupyter notebook.

  2. Select the generalml_p311_cpu_x86_64_v1 kernel.

  3. Revise notebook’s second-to-last paragraph so that it refers to your tenancy/namespace and your Object Storage bucket.

  4. Execute the prep_data.ipynb Jupyter notebook, which will:

    • read the NVIDIA FAQ from file data/NvidiaDocumentationQandApairs.csv
    • recast the CSV FAQ as JSON records having the prompt and completion fields that are expected by AI Quick Actions.
    • perform a 90:10 split of that data into train:test samples.
    • push the training sample into the file quick_actions/tuning_data/tune_sample.jsonl in Object Storage.
  5. Navigate to AI Quick Actions > Models > mistralai/Mistral-7B-Instruct-v0.3. Then click Fine-Tune with these settings:

    • Object Storage path = quick_actions/tuning_data/tune_sample.jsonl
    • validation split = 20%
    • results Object Storage path = quick_actions/tuning_results
    • shape = BM.GPU.A10.4 if availability exists. Otherwise use 10.2 or 10.1 shapes
    • select your Log Group and Log
  6. Enable Show advanced configurations with these settings:

    • batch_size = 64
    • sequence_len = 256
    • learning_rate = 0.000025
    • epochs = 12
  7. Fine-tuning takes about 60 minutes on an A10.2, so click Open logs in terminal to monitor the fine-tuning job’s logs.

  8. View the fine-tuned model’s learning curve in the Metrics section. A well-tuned model will have a Validation Loss curve that descends and then plateaus with increasing epoch. img.png

Description of the illustration learning_curve.png

Task 8: Deploy the Fine-Tuned LLM

  1. Navigate to AI Quick Actions > Fine-tuned models > <your just-tuned model> > Deploy with these settings:

    • Compute shape = VM.GPU.A10.1
    • Select your Log Group and Log
  2. Click Open logs in terminal to monitor deployment log

Task 9: Test the Fine-Tuned LLM’s Deployment

  1. After model deployment completes, navigate to Deployments > <your just-tuned model> > Test your model and playtest it using questions from the testing sample questions that are displayed in the prep_data.ipynb notebook, such as:

    What benefits does Unified Memory bring to complex data structures and classes?
    
    
  2. Copy/paste the model’s endpoint into your terminal session’s shell variable:

    endpoint=https://modeldeployment.<region>.oci.customer-oci.com/<model_ocid>/predict
    
  3. Send a prompt into the deployed model’s endpoint:

    prompt="What benefits does Unified Memory bring to complex data structures and classes?"
    request_body='{"model":"odsc-llm","prompt":"'$prompt'","max_tokens":100,"temperature":0.1,"top_k":50,"top_p":0.99,"stop":[],"frequency_penalty":0,"presence_penalty":0}'
    oci raw-request --http-method POST --target-uri $endpoint --request-body "$request_body" --auth resource_principal
    
  4. Use this bash loop to call the model endpoint 100 times:

    for i in $(seq 1 100); do
       oci raw-request --http-method POST --target-uri $endpoint --request-body "$request_body" --auth resource_principal &
       echo $i
       sleep 0.1
    done
    
  5. Click the fine-tuned/deployed model’s Log to view the recent traffic into that model’s endpoint

  6. Open the compare_models.ipynb Jupyter notebook and update paragraph [8] to refer to the endpoints for your two models, tuned and untuned.

  7. Execute that notebook, which will:

    • Read the testing sample of FAQ records.
    • Use python to feed five test questions into the endpoints of the fine-tuned and untuned model, and compare their responses.
    • Note paragraph [10] which illustrates how to call a deployed model’s endpoint using python, which is quite simple: img.png

      Description of the illustration call_endpoint.png

  8. Review the main findings from this test:

    • The fine-tuned LLM’s responses have a tone, style, and length that is fairly similar to the actual NVIDIA-composed FAQ answers.
    • The untuned LLM’s responses are much more verbose and include many extraneous statements that are likely incorrect.
    • Responses from the fine-tuned versus untuned LLM are incorrect more often than not, and about equally so.
    • Fine-tuning on a much larger dataset would likely increase the accuracy of its responses.

Task 10: Delete the Resources

  1. Navigate to AI Quick Actions > Deployments and delete your model deployments.

  2. Navigate to AI Quick Actions > Models > Fine-tuned models and delete.

  3. Use the OCI console page to navigate to your Data Science notebook session and Terminate.

  4. Click on Jobs and delete your fine-tuning jobs.

  5. Delete your Data Science Project.

  6. Use the OCI console page to navigate to your Object Storage bucket and delete it.

  7. Use the OCI console to delete your Log and Log Group.

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.