Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Build Llama Optical Character Recognition Web Application using OCI Generative AI
Introduction
If you are a developer, cloud architect, or AI enthusiast who likes Llama Optical Character Recognition (OCR), this tutorial is for you. In this tutorial, you will build a simple Llama OCR web application that:
- Uses Meta vision large language models (LLMs) available in Oracle Cloud Infrastructure (OCI) Generative AI.
- Extracts structured text from images (such as receipts and scanned forms).
- Runs locally on your machine with Streamlit.
- Does not require any frontend coding.
Objectives
We will build a web user interface (UI) that allows you to:
- Upload an image (receipt, invoice, screenshot) in the application.
- Get the text extracted from the image as Markdown output, using the LLM.
- View and copy the structured text.
Prerequisites
- Configure Oracle Cloud Infrastructure Command Line Interface (OCI CLI) (~/.oci/config).
- Access to the OCI Generative AI service in one of the following regions.

  Regions with OCI Generative AI

  | Region Name                 | Location  | Region Identifier | Region Key |
  |-----------------------------|-----------|-------------------|------------|
  | Brazil East (Sao Paulo)     | Sao Paulo | sa-saopaulo-1     | GRU        |
  | Germany Central (Frankfurt) | Frankfurt | eu-frankfurt-1    | FRA        |
  | Japan Central (Osaka)       | Osaka     | ap-osaka-1        | KIX        |
  | UAE East (Dubai)            | Dubai     | me-dubai-1        | DXB        |
  | UK South (London)           | London    | uk-london-1       | LHR        |
  | US Midwest (Chicago)        | Chicago   | us-chicago-1      | ORD        |

- Deploy a vision-capable model (such as meta.llama-3.2-90b-vision-instruct or Llama 4).
- Install Python version 3.8 or later and the required Python packages.
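The region and Python version prerequisites can be checked before going further. The following is a minimal sketch, not part of the downloaded application: the region identifiers are copied from the table above, while the function names and the endpoint URL pattern are assumptions made for illustration (the pattern follows OCI's public endpoint naming and should be confirmed for your tenancy).

```python
import sys

# Region identifiers copied from the "Regions with OCI Generative AI" table.
GENAI_REGIONS = {
    "sa-saopaulo-1": "Brazil East (Sao Paulo)",
    "eu-frankfurt-1": "Germany Central (Frankfurt)",
    "ap-osaka-1": "Japan Central (Osaka)",
    "me-dubai-1": "UAE East (Dubai)",
    "uk-london-1": "UK South (London)",
    "us-chicago-1": "US Midwest (Chicago)",
}


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.8 or later."""
    return tuple(version_info[:2]) >= (3, 8)


def inference_endpoint(region):
    """Build the assumed inference endpoint URL for a region in the table.

    Hypothetical helper: raises ValueError for a region the tutorial
    does not list.
    """
    region = region.strip().lower()
    if region not in GENAI_REGIONS:
        raise ValueError(f"{region!r} is not in the tutorial's region table")
    # Assumption: endpoint pattern based on OCI's public endpoint naming.
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    print(inference_endpoint("us-chicago-1"))
```

Keeping the region list in one dictionary means a typo in a region identifier fails early with a clear message instead of surfacing later as a connection error.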
Task 1: Download Python Code and Set up Config File
- Download the code from here: llama-ocr-oci.py
- Make sure you have the correct config profile configured in the file ~/.oci/config with a name for it. For example, OCI_PROFILE.
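To confirm which profile names your config file defines, you can list its sections. This is a minimal sketch under stated assumptions: it relies only on the fact that ~/.oci/config is an INI-style file, the helper name list_profiles and the sample values are hypothetical, and it is not taken from the downloaded script.

```python
import configparser

# Hypothetical sample content; real files hold your own OCIDs and key paths.
SAMPLE_CONFIG = """
[DEFAULT]
user=ocid1.user.oc1..exampleuser
region=us-chicago-1

[OCI_PROFILE]
user=ocid1.user.oc1..exampleuser
region=eu-frankfurt-1
"""


def list_profiles(config_text):
    """Return the profile names found in OCI config file text."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    profiles = []
    # configparser does not report DEFAULT in sections(), so add it by hand
    # when the file actually defines keys under it.
    if parser.defaults():
        profiles.append("DEFAULT")
    profiles.extend(parser.sections())
    return profiles


if __name__ == "__main__":
    print(list_profiles(SAMPLE_CONFIG))
```

To inspect your real file, pass in the text of ~/.oci/config, for example with pathlib.Path("~/.oci/config").expanduser().read_text(). To load a single profile, the OCI Python SDK provides oci.config.from_file, which accepts a profile_name argument.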
Task 2: Set up a Virtual Environment
Creating a virtual environment helps isolate dependencies and ensures your Streamlit OCR app does not interfere with other Python projects on your system.
- Windows: Run the following commands.
  - Open the Command Prompt (cmd) or PowerShell and navigate to your project folder.

      cd path\to\your\project

  - Create a virtual environment.

      python -m venv venv

  - Activate the virtual environment.

      venv\Scripts\activate

  - Install dependencies.

      pip install streamlit oci

- macOS/Linux: Run the following commands.
  - Open Terminal and navigate to your project directory.

      cd ~/path/to/your/project

  - Create a virtual environment.

      python3 -m venv venv

  - Activate the virtual environment.

      source venv/bin/activate

  - Install dependencies.

      pip install streamlit oci
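After installing the dependencies, a quick check can confirm that the virtual environment is active and that both packages are importable. The sketch below uses only the Python standard library; the helper names are hypothetical and it is not part of the tutorial's application.

```python
import importlib.util
import sys


def in_virtual_environment():
    """Return True when Python is running inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base installation.
    """
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    missing = missing_packages(["streamlit", "oci"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("streamlit and oci are installed.")
```

If a package is reported missing while the virtual environment is active, rerun the pip install command from the same terminal session.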
Task 3: Launch the Application
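Run the following command to launch the application. The command assumes the script is saved as ocr_vision_app.py; if you kept the file name from Task 1 (llama-ocr-oci.py), use that name instead.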
streamlit run ocr_vision_app.py
You should see the application launch in your browser.
Task 4: Upload an Image and Extract the Text
- In Select OCI Config Profile, select your config profile from the drop-down menu.
- In Enter Compartment OCID, enter the Oracle Cloud Identifier (OCID) of the compartment where you have access to the OCI Generative AI service.
- In Select Vision Model, select a model.
- Click Upload and select an image (receipt, invoice, screenshot).

The application will process the image and display the extracted text.
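The tutorial does not show how the application prepares these inputs. The following is a hedged sketch of two plausible preparation steps: a format check on the compartment OCID and encoding the uploaded image as a base64 data URL, which is a common way to pass images to vision models. Both helper names are hypothetical, and whether the downloaded script works this way is an assumption.

```python
import base64
import os

# Assumption: only these common image types are handled in this sketch.
_MIME_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def looks_like_compartment_ocid(value):
    """Loosely check that a string has the shape of a compartment OCID.

    A tenancy OCID is accepted too, because the tenancy is the root
    compartment. This checks the format only, not that the OCID exists.
    """
    parts = value.strip().split(".")
    return (
        len(parts) >= 5
        and parts[0] == "ocid1"
        and parts[1] in ("compartment", "tenancy")
        and bool(parts[-1])
    )


def image_to_data_url(image_bytes, filename):
    """Encode raw image bytes as a base64 data URL based on the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in _MIME_BY_EXTENSION:
        raise ValueError(f"Unsupported image type: {extension or filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{_MIME_BY_EXTENSION[extension]};base64,{encoded}"


if __name__ == "__main__":
    print(looks_like_compartment_ocid("ocid1.compartment.oc1..exampleuniqueid"))
    print(image_to_data_url(b"abc", "receipt.png"))
```

Checking the OCID shape before calling the service gives a clearer error than a failed request, although only the service can confirm that the compartment exists and that you have access to it.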
Acknowledgments
- Authors - Mukund Murali (Principal Cloud Architect)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
G35920-01
Copyright ©2025, Oracle and/or its affiliates.