Create an Evaluation Data Set

Here’s an example of an evaluation data set:

Number | Query | Expected Answer
ACME_UTD_SPD_001 | Is aromatherapy covered | No, aromatherapy is not covered. According to the provided context, aromatherapy is listed under "Alternative Treatments" which are not covered by UnitedHealthcare Medical Plans.
ACME_UTD_SPD_002 | do you pay for thermometers | Based on the provided context, thermometers are not covered under the UnitedHealthcare Medical Plans. Specifically, the document titled "MEDICAL SUPPLIES AND APPLIANCES" lists thermometers as excluded supplies.
ACME_UTD_SPD_003 | Is laser surgery for eyes covered by United | Based on the provided context, laser surgery for eyes is not covered by UnitedHealthcare Medical Plans. The relevant document text under the "VISION" section states that surgery and other related treatments intended to correct nearsightedness, farsightedness, presbyopia, and astigmatism, including procedures such as radial keratotomy and laser surgery, are listed under plan exclusions. Therefore, these procedures are not covered by UnitedHealthcare Medical Plans.
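
If you maintain the data set in a file, a small script can help keep it well formed. The following is a minimal sketch in Python, not a required format: it stores the first two example rows above as records, checks that each test number is unique and no field is empty, and writes the rows to CSV text. The names EvalCase, validate, and to_csv are hypothetical, and CSV is only an assumed storage format; use whatever format your evaluation process expects.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One row of the evaluation data set (hypothetical structure)."""
    number: str
    query: str
    expected_answer: str


# The first two rows of the example data set, copied verbatim.
CASES = [
    EvalCase(
        "ACME_UTD_SPD_001",
        "Is aromatherapy covered",
        'No, aromatherapy is not covered. According to the provided context, '
        'aromatherapy is listed under "Alternative Treatments" which are not '
        'covered by UnitedHealthcare Medical Plans.',
    ),
    EvalCase(
        "ACME_UTD_SPD_002",
        "do you pay for thermometers",
        'Based on the provided context, thermometers are not covered under the '
        'UnitedHealthcare Medical Plans. Specifically, the document titled '
        '"MEDICAL SUPPLIES AND APPLIANCES" lists thermometers as excluded supplies.',
    ),
]


def validate(cases):
    """Return a list of problems: duplicate numbers or empty fields."""
    problems = []
    seen = set()
    for case in cases:
        if case.number in seen:
            problems.append(f"duplicate number: {case.number}")
        seen.add(case.number)
        if not (case.number.strip() and case.query.strip()
                and case.expected_answer.strip()):
            problems.append(f"empty field in: {case.number!r}")
    return problems


def to_csv(cases):
    """Serialize the cases to CSV text using the table's column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Number", "Query", "Expected Answer"]
    )
    writer.writeheader()
    for case in cases:
        writer.writerow({
            "Number": case.number,
            "Query": case.query,
            "Expected Answer": case.expected_answer,
        })
    return buffer.getvalue()


if __name__ == "__main__":
    for problem in validate(CASES):
        print(problem)
    print(to_csv(CASES))
```

The csv module quotes fields that contain commas or quotation marks, so long expected answers survive a round trip unchanged.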

Questions must be designed to test the agent’s ability to deal with various complexities in analyzing the source documents. Use the following guidelines to develop these questions:

  • Long-Range Context - Some questions require information that’s scattered across distant sections of the document. Check whether the agent can resolve such long-range dependencies, which may extend across several pages.
  • Distributed Context - Check whether the Oracle Fusion AI Agent can gather information from multiple non-contiguous parts of the document to answer a question comprehensively. This tests the agent’s ability to aggregate and synthesize information from diverse sections.
  • Concealed Context - Test whether the Oracle Fusion AI Agent can find and extract specific, obscure, or hard-to-spot details from deep within the text.
  • Reasoning - Check whether the AI agent can not only retrieve information but also apply reasoning to it to arrive at a correct answer.
  • Table-Sourced - Test the AI agent’s ability to interpret and pull accurate data from tables within the document.
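
One way to confirm that a data set exercises every guideline above is to tag each question with the categories it targets and then list the categories that have no questions. The following Python sketch illustrates the idea. The tagging scheme, the EXAMPLE_ numbers, and the function names are all hypothetical; they aren't part of the product or of the example data set.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """The five question guidelines, used here as hypothetical tags."""
    LONG_RANGE = "Long-Range Context"
    DISTRIBUTED = "Distributed Context"
    CONCEALED = "Concealed Context"
    REASONING = "Reasoning"
    TABLE_SOURCED = "Table-Sourced"


def coverage(tags):
    """Count how many questions target each category (zero if none)."""
    counts = Counter({category: 0 for category in Category})
    for categories in tags.values():
        counts.update(categories)
    return counts


def missing(tags):
    """Return the categories that no question targets, in declaration order."""
    return [category for category, n in coverage(tags).items() if n == 0]


# Illustrative tags only: the numbers and assignments are made up.
TAGS = {
    "EXAMPLE_001": {Category.CONCEALED},
    "EXAMPLE_002": {Category.TABLE_SOURCED, Category.REASONING},
}

if __name__ == "__main__":
    for category in missing(TAGS):
        print(f"No questions yet for: {category.value}")
```

With the illustrative tags shown, the check reports that Long-Range Context and Distributed Context still need questions.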