4.2 Impact of AI Enrichment: Practical Examples
See how enriching your schema directly impacts LLM effectiveness through these examples.
Example 1: Healthcare Events-Tracking Schema
This example uses a schema designed to track healthcare events, focusing on doctor activity and the specific procedures performed during patient events.
- The
Practitionertable: Stores information about doctors, including their name, specialization, and status. - The
Eventtable: Records high-level patient appointments, including when and where they occurred and the responsible doctor. - The
EventDetailtable: Logs individual procedures (like surgery) performed during an event, capturing the procedure description, exact time stamp, and duration.
CREATE TABLE Practitioner (
practitioner_id NUMBER PRIMARY KEY,
nm VARCHAR2(80),
speciality VARCHAR2(40), -- This represents 'department'
status VARCHAR2(20) -- e.g., 'Active', 'Inactive'
);
CREATE TABLE Event (
event_id NUMBER PRIMARY KEY,
dt DATE, -- Date or time of the overall event
practitioner_id NUMBER,
location VARCHAR2(40) -- e.g., 'OR1', 'ER', 'ICU'
);
CREATE TABLE EventDetail (
event_id NUMBER,
act_desc VARCHAR2(200),
act_time TIMESTAMP,
duration_minutes NUMBER, -- Duration of this action or procedure
CONSTRAINT FK_EVENT_ID FOREIGN KEY (event_id) REFERENCES Event(event_id)
);
- The term
practitioneris synonymous with doctor. - The
specialitycolumn represents the doctor's department. - The
act_desccolumn in theEventDetailtable records the procedure performed, such as surgery. - The
act_timecolumn in theEventDetailtable provides the time stamp for time-based filtering of procedures.
Business query
Consider a hospital administrator asking the following question to an LLM:
"For each doctor, display their name, department, and
the count of surgeries performed after 8 PM within the last
six months."
Without AI enrichment
Without any annotations, the LLM must interpret the raw schema. This could lead to a flawed query like the one below:
SELECT
PRAC.nm,
PRAC.speciality,
COUNT(EVT.event_id) AS surgery_count
FROM
Practitioner PRAC
JOIN
Event EVT
ON PRAC.practitioner_id = EVT.practitioner_id
WHERE
EVT.dt >= ADD_MONTHS(SYSDATE, -6)
AND EXTRACT(HOUR FROM EVT.dt) >= 20
GROUP BY
PRAC.nm,
PRAC.speciality;
- It counts
EVT.event_id, incorrectly tallying all appointments after 8 PM instead of only the surgeries. - It filters on
EVT.dt, which is the start time of an appointment and not of a specific procedure. A 1 PM appointment could include a 9 PM surgery, which this query would miss. - It misses a required join with the
EventDetailtable.
With AI enrichment
- Add an
ALIASannotation to thePractitionertable, identifying it asdoctor. - Add an
ALIASannotation to thePractitioner.specialitycolumn, identifying it asdepartment. - Add
DESCRIPTIONandSAMPLE VALUESannotations to theEventDetail.act_desccolumn to clarify that it contains specific procedures performed during an event, such as surgery or consultation.
With this enriched context, the LLM can generate a far more accurate and reliable query:
SELECT
PRAC.nm,
PRAC.speciality,
COUNT(*) AS surgery_count
FROM
Practitioner PRAC
JOIN
Event EVT
ON PRAC.practitioner_id = EVT.practitioner_id
JOIN
EventDetail ED
ON EVT.event_id = ED.event_id
WHERE
ED.act_desc = 'Surgery'
AND ED.act_time >= ADD_MONTHS(SYSDATE, -6)
AND EXTRACT(HOUR FROM ED.act_time) >= 20
GROUP BY
PRAC.nm,
PRAC.speciality;
- Now, it performs the necessary join to the
EventDetailtable, which is essential to find specific procedures. - It filters on
ED.act_desc = 'Surgery', correctly understanding from the annotation that surgery is a type of procedure. - It uses the correct time stamp,
ED.act_time, to find surgeries that actually occurred after 8 PM.
As this example illustrates, annotations bridge the gap between business terminology and the physical schema, guiding the LLM to understand data relationships and generate correct SQL.
Example 2: Human Resources Schema
For this example, we extend the HR sample schema by adding
two new columns to the EMPLOYEES table:
salary_conv and
hire_dt.
Schema interpretation notes
In the EMPLOYEES table:
- The
salarycolumn represents employee compensation in local currency. - The
salary_convcolumn represents employee compensation converted to United States Dollars (USD). - The
hire_datecolumn represents the official start date of the employees. - The
hire_dtcolumn represents the system entry date (the timestamp when the record was originally created).
Business query
Consider a business analyst asking the following question to an LLM:
List each department and the average salary in USD
of employees hired after 2020. Order the result by average
salary, highest first.
Without AI enrichment
Without any annotations, the LLM can only read the raw table and column names in the schema. It may guess at names and joins, returning a flawed query:
SELECT
d.department_name,
AVG(e.salary) AS avg_salary
FROM
EMPLOYEES e
JOIN
DEPARTMENTS d
ON e.manager_id = d.manager_id
WHERE
e.hire_dt > DATE '2020-01-01'
GROUP BY
d.department_name
ORDER BY
avg_salary DESC;
- It selects
salaryandhire_dt, incorrectly retrieving ambiguous data (such as local currency and system timestamps) instead of the required columns. - It joins on
manager_id, which is not the correct key for this specific relationship. This creates a logical mismatch.
With AI enrichment
- Add a
UNITSannotation to thesalary_convcolumn to specify the currency type as USD. - Add a
JOIN COLUMNannotation to thed.department_idcolumn, identifying its preferred join partner as thee.department_idcolumn. - Add a
DESCRIPTIONannotation to thehire_datecolumn to explicitly identify it as the official hire date field.
With these annotations, the generated query is more accurate:
SELECT
d.department_name,
AVG(e.salary_conv) AS avg_salary_conv
FROM
EMPLOYEES e
JOIN
DEPARTMENTS d
ON e.department_id = d.department_id
WHERE
e.hire_date > DATE '2020-01-01'
GROUP BY
d.department_name
ORDER BY
avg_salary_conv DESC;
The improvements here are significant:
- The model uses the
salary_convcolumn instead of thesalaryfield, guided by theUNITSannotation. - It chooses the correct join column
(
department_id) as specified in theJOIN COLUMNannotation. - It selects the correct column
(
hire_date) for the date predicate, avoiding system timestamps.
Key Takeaways
As illustrated in these examples, annotations enhance query accuracy by guiding the LLM to select correct join columns, apply appropriate units, use valid predicates, decipher object names, and more.
When you apply annotations consistently, you see a faster turnaround for data questions, fewer faulty joins, and a reduction in production defects from incorrect SQL. For an LLM, each annotation reduces the search space of possible query plans, allowing it to focus on the exact pattern you intended.
For organizations with strict compliance rules, annotations also allow administrators to control the exact contextual information shared with an LLM, which helps reduce the unintentional disclosure of sensitive data that may be present in free-form column comments.