Impact of AI Enrichment: Practical Examples

4.2 Impact of AI Enrichment: Practical Examples

See how enriching your schema directly impacts LLM effectiveness through these examples.

Example 1: Healthcare Events-Tracking Schema

This example uses a schema designed to track healthcare events, focusing on doctor activity and the specific procedures performed during patient events.

The schema consists of the following three tables:

The Practitioner table: Stores information about doctors, including their name, specialization, and status.
The Event table: Records high-level patient appointments, including when and where they occurred and the responsible doctor.
The EventDetail table: Logs individual procedures (like surgery) performed during an event, capturing the procedure description, exact time stamp, and duration.

CREATE TABLE Practitioner (
    practitioner_id NUMBER PRIMARY KEY,
    nm VARCHAR2(80),
    speciality VARCHAR2(40),    -- This represents 'department'
    status VARCHAR2(20)         -- e.g., 'Active', 'Inactive'
);

CREATE TABLE Event (
    event_id NUMBER PRIMARY KEY,
    dt DATE,                    -- Date or time of the overall event
    practitioner_id NUMBER,
    location VARCHAR2(40)       -- e.g., 'OR1', 'ER', 'ICU'
);

CREATE TABLE EventDetail (
    event_id NUMBER,
    act_desc VARCHAR2(200),
    act_time TIMESTAMP,
    duration_minutes NUMBER,    -- Duration of this action or procedure
    CONSTRAINT FK_EVENT_ID FOREIGN KEY (event_id) REFERENCES Event(event_id)
);

Schema interpretation notes

The term practitioner is synonymous with doctor.
The speciality column represents the doctor's department.
The act_desc column in the EventDetail table records the procedure performed, such as surgery.
The act_time column in the EventDetail table provides the time stamp for time-based filtering of procedures.

Business query

Consider a hospital administrator asking the following question to an LLM:

"For each doctor, display their name, department, and the count of surgeries performed after 8 PM within the last six months."

Without AI enrichment

Without any annotations, the LLM must interpret the raw schema. This could lead to a flawed query like the one below:

SELECT
    PRAC.nm,
    PRAC.speciality,
    COUNT(EVT.event_id) AS surgery_count
FROM
    Practitioner PRAC
JOIN
    Event EVT 
    ON PRAC.practitioner_id = EVT.practitioner_id
WHERE
    EVT.dt >= ADD_MONTHS(SYSDATE, -6)
    AND EXTRACT(HOUR FROM EVT.dt) >= 20
GROUP BY
    PRAC.nm,
    PRAC.speciality;

This query has several critical errors:

It counts EVT.event_id, incorrectly tallying all appointments after 8 PM instead of only the surgeries.
It filters on EVT.dt, which is the start time of an appointment and not of a specific procedure. A 1 PM appointment could include a 9 PM surgery, which this query would miss.
It misses a required join with the EventDetail table.

With AI enrichment

By applying the following annotations, you can provide the LLM with the necessary semantic context.

Add an ALIAS annotation to the Practitioner table, identifying it as doctor.
Add an ALIAS annotation to the Practitioner.speciality column, identifying it as department.
Add DESCRIPTION and SAMPLE VALUES annotations to the EventDetail.act_desc column to clarify that it contains specific procedures performed during an event, such as surgery or consultation.

With this enriched context, the LLM can generate a far more accurate and reliable query:

SELECT
    PRAC.nm,
    PRAC.speciality,
    COUNT(*) AS surgery_count
FROM
    Practitioner PRAC
JOIN
    Event EVT 
    ON PRAC.practitioner_id = EVT.practitioner_id
JOIN
    EventDetail ED 
    ON EVT.event_id = ED.event_id
WHERE
    ED.act_desc = 'Surgery'
    AND ED.act_time >= ADD_MONTHS(SYSDATE, -6)
    AND EXTRACT(HOUR FROM ED.act_time) >= 20
GROUP BY
    PRAC.nm,
    PRAC.speciality;

Annotations markedly improve the query:

Now, it performs the necessary join to the EventDetail table, which is essential to find specific procedures.
It filters on ED.act_desc = 'Surgery', correctly understanding from the annotation that surgery is a type of procedure.
It uses the correct time stamp, ED.act_time, to find surgeries that actually occurred after 8 PM.

As this example illustrates, annotations bridge the gap between business terminology and the physical schema, guiding the LLM to understand data relationships and generate correct SQL.

Example 2: Human Resources Schema

For this example, we extend the HR sample schema by adding two new columns to the EMPLOYEES table: salary_conv and hire_dt.

Schema interpretation notes

In the EMPLOYEES table:

The salary column represents employee compensation in local currency.
The salary_conv column represents employee compensation converted to United States Dollars (USD).
The hire_date column represents the official start date of the employees.
The hire_dt column represents the system entry date (the timestamp when the record was originally created).

Business query

Consider a business analyst asking the following question to an LLM:

List each department and the average salary in USD of employees hired after 2020. Order the result by average salary, highest first.

Without AI enrichment

Without any annotations, the LLM can only read the raw table and column names in the schema. It may guess at names and joins, returning a flawed query:

SELECT 
    d.department_name, 
    AVG(e.salary) AS avg_salary 
FROM 
    EMPLOYEES e 
JOIN 
    DEPARTMENTS d 
    ON e.manager_id = d.manager_id
WHERE 
    e.hire_dt > DATE '2020-01-01' 
GROUP BY 
    d.department_name
ORDER BY 
    avg_salary DESC;

This query has several issues:

It selects salary and hire_dt, incorrectly retrieving ambiguous data (such as local currency and system timestamps) instead of the required columns.
It joins on manager_id, which is not the correct key for this specific relationship. This creates a logical mismatch.

With AI enrichment

By applying the following annotations, you can provide the LLM with the necessary semantic context.

Add a UNITS annotation to the salary_conv column to specify the currency type as USD.
Add a JOIN COLUMN annotation to the d.department_id column, identifying its preferred join partner as the e.department_id column.
Add a DESCRIPTION annotation to the hire_date column to explicitly identify it as the official hire date field.

With these annotations, the generated query is more accurate:

SELECT 
    d.department_name, 
    AVG(e.salary_conv) AS avg_salary_conv
FROM 
    EMPLOYEES e
JOIN 
    DEPARTMENTS d 
    ON e.department_id = d.department_id
WHERE 
    e.hire_date > DATE '2020-01-01'
GROUP BY 
    d.department_name
ORDER BY 
    avg_salary_conv DESC;

The improvements here are significant:

The model uses the salary_conv column instead of the salary field, guided by the UNITS annotation.
It chooses the correct join column (department_id) as specified in the JOIN COLUMN annotation.
It selects the correct column (hire_date) for the date predicate, avoiding system timestamps.

Key Takeaways

As illustrated in these examples, annotations enhance query accuracy by guiding the LLM to select correct join columns, apply appropriate units, use valid predicates, decipher object names, and more.

When you apply annotations consistently, you see a faster turnaround for data questions, fewer faulty joins, and a reduction in production defects from incorrect SQL. For an LLM, each annotation reduces the search space of possible query plans, allowing it to focus on the exact pattern you intended.

For organizations with strict compliance rules, annotations also allow administrators to control the exact contextual information shared with an LLM, which helps reduce the unintentional disclosure of sensitive data that may be present in free-form column comments.