Create an Evaluation Data Set
Here’s an example of an evaluation data set:
| Number | Query | Expected Answer |
|---|---|---|
| ACME_UTD_SPD_001 | Is aromatherapy covered | No, aromatherapy is not covered. According to the provided context, aromatherapy is listed under "Alternative Treatments" which are not covered by UnitedHealthcare Medical Plans. |
| ACME_UTD_SPD_002 | do you pay for thermometers | Based on the provided context, thermometers are not covered under the UnitedHealthcare Medical Plans. Specifically, the document titled "MEDICAL SUPPLIES AND APPLIANCES" lists thermometers as excluded supplies. |
| ACME_UTD_SPD_003 | Is laser surgery for eyes covered by United | Based on the provided context, laser surgery for eyes is not covered by UnitedHealthcare Medical Plans. The relevant document text under the "VISION" section states that surgery and other related treatments intended to correct nearsightedness, farsightedness, presbyopia, and astigmatism, including procedures such as radial keratotomy and laser surgery, are listed under plan exclusions. Therefore, these procedures are not covered by UnitedHealthcare Medical Plans. |
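A data set like the one above is easier to maintain if it is kept in a file and checked before use. The following is a minimal sketch, assuming the rows are stored as CSV with the same three column headers as the table. The `EvalCase` class, the `load_eval_cases` function, and the CSV layout are illustrative assumptions, not part of any Oracle API, and the expected answers in the sample are shortened to their first sentence.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One row of the evaluation data set (hypothetical structure)."""
    number: str            # unique identifier, e.g. "ACME_UTD_SPD_001"
    query: str             # question posed to the agent
    expected_answer: str   # reference answer to compare against


def load_eval_cases(csv_text: str) -> list[EvalCase]:
    """Parse CSV text and reject empty fields or duplicate identifiers."""
    cases: list[EvalCase] = []
    seen: set[str] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        case = EvalCase(
            number=(row.get("Number") or "").strip(),
            query=(row.get("Query") or "").strip(),
            expected_answer=(row.get("Expected Answer") or "").strip(),
        )
        if not (case.number and case.query and case.expected_answer):
            raise ValueError(f"Row has an empty field: {row}")
        if case.number in seen:
            raise ValueError(f"Duplicate identifier: {case.number}")
        seen.add(case.number)
        cases.append(case)
    return cases


# Sample rows taken from the table above; answers shortened for brevity.
SAMPLE_CSV = '''Number,Query,Expected Answer
ACME_UTD_SPD_001,Is aromatherapy covered,"No, aromatherapy is not covered."
ACME_UTD_SPD_002,do you pay for thermometers,"Based on the provided context, thermometers are not covered under the UnitedHealthcare Medical Plans."
'''

if __name__ == "__main__":
    for case in load_eval_cases(SAMPLE_CSV):
        print(case.number, "|", case.query)
```

Checking for unique identifiers and non-empty fields up front keeps a malformed row from silently skewing the evaluation results later.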
Questions must be designed to test the agent’s ability to deal with various complexities in analyzing the source documents. Use the following guidelines to develop these questions:
- Long Range Context - Some questions require information that’s scattered across distant sections of the document. Check whether the Oracle Fusion AI Agent can resolve such long-range dependencies, which may extend across several pages.
- Distributed Context - Check whether the Oracle Fusion AI Agent can gather information from multiple non-contiguous parts of the document to answer a question comprehensively. This tests the agent’s ability to aggregate and synthesize information from diverse sections.
- Concealed Context - Test whether the Oracle Fusion AI Agent can find and extract specific, obscure, or hard-to-spot details from deep within the text.
- Reasoning - Check whether the Oracle Fusion AI Agent can go beyond retrieving information and apply reasoning to arrive at a correct answer.
- Table-Sourced - Test the Oracle Fusion AI Agent’s ability to interpret and pull accurate data from tables within the document.
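One way to confirm that a data set follows the guidelines above is to tag each question with the complexity it targets and then look for categories with no questions. The sketch below assumes such tags are kept in a simple mapping from question identifier to category. The identifiers `Q1` to `Q4`, their tags, and the function names are hypothetical and used only for illustration.

```python
from collections import Counter

# Category labels as listed in the guidelines above.
CATEGORIES = (
    "Long Range Context",
    "Distributed Context",
    "Concealed Context",
    "Reasoning",
    "Table-Sourced",
)


def coverage_report(tagged_cases: dict[str, str]) -> dict[str, int]:
    """Count how many questions target each category."""
    unknown = set(tagged_cases.values()) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    counts = Counter(tagged_cases.values())
    return {category: counts.get(category, 0) for category in CATEGORIES}


def missing_categories(tagged_cases: dict[str, str]) -> list[str]:
    """Return the categories that no question covers yet."""
    report = coverage_report(tagged_cases)
    return [category for category, count in report.items() if count == 0]


# Hypothetical tags; a real data set would use its own identifiers.
EXAMPLE_TAGS = {
    "Q1": "Long Range Context",
    "Q2": "Reasoning",
    "Q3": "Reasoning",
    "Q4": "Distributed Context",
}

if __name__ == "__main__":
    print(coverage_report(EXAMPLE_TAGS))
    print("Missing:", missing_categories(EXAMPLE_TAGS))
```

A question may reasonably exercise more than one category. This sketch assigns one tag per question to keep the check simple, which is a design simplification rather than a requirement of the guidelines.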