Run Evaluations

You need to evaluate each answer and assign it a score.

  • Record the answer generated by the AI Agent for each question in the data set.
  • Assign a correctness score to each generated answer, along with an explanation of why that score was assigned.
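The two steps above can be sketched in Python as a loop over the data set that stores each question, its expected answer, the AI Agent's answer, and the score with its explanation. This is a minimal sketch under stated assumptions: `EvaluationRecord`, `run_evaluation`, `ask_agent`, and `grade` are hypothetical names, and how you actually query the AI Agent and produce a score (by hand or with a separate grader) is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvaluationRecord:
    number: str           # question identifier from the data set
    query: str            # the question asked
    expected_answer: str  # reference answer
    agent_answer: str     # answer generated by the AI Agent
    score: int            # correctness score, 1 to 5
    comments: str         # explanation of why the score was assigned


def run_evaluation(
    dataset: Iterable[tuple[str, str, str]],
    ask_agent: Callable[[str], str],
    grade: Callable[[str, str, str], tuple[int, str]],
) -> list[EvaluationRecord]:
    """Record the agent's answer for each question and attach a score and explanation.

    ask_agent and grade are hypothetical callables supplied by the caller.
    """
    records = []
    for number, query, expected in dataset:
        # Step 1: record the answer generated by the AI Agent.
        answer = ask_agent(query)
        # Step 2: assign a correctness score plus an explanation.
        score, comments = grade(query, expected, answer)
        records.append(EvaluationRecord(number, query, expected, answer, score, comments))
    return records
```

Passing `ask_agent` and `grade` in as functions keeps the loop independent of how the agent is hosted and of whether scoring is done manually or automatically.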

Scoring Guidelines

Follow these guidelines for scoring:

  • Your score must be between 1 and 5, where 1 is the lowest and 5 is the highest score.
  • If the generated answer is completely incorrect or not relevant to the user query, give a score of 1.
  • If the generated answer is relevant but contains mistakes, or is partially correct, give a score between 2 and 3.
  • If the generated answer is relevant and fully correct, give a score between 4 and 5.
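These guidelines amount to a mapping from a judgment about the answer to an allowed score range. A minimal Python sketch of that check follows; the three `Verdict` labels and the `validate_score` function are hypothetical names introduced for illustration, not part of the original guidelines.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels for the three cases in the scoring guidelines.
    IRRELEVANT_OR_INCORRECT = "irrelevant_or_incorrect"
    PARTIALLY_CORRECT = "partially_correct"
    FULLY_CORRECT = "fully_correct"


# Allowed scores for each verdict (range end is exclusive).
ALLOWED_SCORES = {
    Verdict.IRRELEVANT_OR_INCORRECT: range(1, 2),  # score 1
    Verdict.PARTIALLY_CORRECT: range(2, 4),        # scores 2 to 3
    Verdict.FULLY_CORRECT: range(4, 6),            # scores 4 to 5
}


def validate_score(verdict: Verdict, score: int) -> None:
    """Raise ValueError if the score is outside 1-5 or outside the band for the verdict."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    allowed = ALLOWED_SCORES[verdict]
    if score not in allowed:
        raise ValueError(
            f"score {score} is not allowed for {verdict.name}; "
            f"expected {allowed.start} to {allowed.stop - 1}"
        )
```

Running this check on every scored answer catches inconsistencies such as a "fully correct" answer that was given a 3.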

Evaluation Example

Here's an example of an evaluation:

ACME_UPD_SPD_001
  • Query: Is aromatherapy covered
  • Expected Answer: No, aromatherapy is not covered. According to the provided context, aromatherapy is listed under 'Alternative Treatments' which are not covered by UnitedHealthcare Medical Plans.
  • AI Agent Answer: No, aromatherapy is not covered. According to the given context, aromatherapy is listed under the 'Alternative Treatments' section of the plan exclusions, which means it is not covered by the plan.
  • Correctness Score: 5
  • Comments: The generated answer is relevant and fully correct, closely matching the reference answer in both content and structure, effectively conveying that aromatherapy is not covered under the plan due to its classification as an 'Alternative Treatment.'

ACME_UPD_SPD_002
  • Query: Do you pay for thermometers
  • Expected Answer: Based on the provided context, thermometers are not covered under the UnitedHealthcare Medical Plans. Specifically, the document titled 'MEDICAL SUPPLIES AND APPLIANCES' lists thermometers as excluded supplies.
  • AI Agent Answer: There is no mention of thermometers in the provided context. Therefore, the answer is no, thermometers are not paid for.
  • Correctness Score: 4
  • Comments: The generated answer is relevant and correct, aligning with the reference answer's conclusion that thermometers are not covered. However, it lacks the specific detail provided in the reference answer about the source of this information, which could be useful for understanding the context and reliability of the statement.
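Evaluation results like the example above are often kept in a spreadsheet. A minimal Python sketch that writes one CSV row per evaluated question is shown below; the column names follow the example, while `write_evaluation_csv` and the sample row in the usage lines are hypothetical.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column headers taken from the example evaluation.
COLUMNS = [
    "Number",
    "Query",
    "Expected Answer",
    "AI Agent Answer",
    "Correctness Score",
    "Comments",
]


def write_evaluation_csv(rows: Iterable[Mapping[str, object]], out: TextIO) -> None:
    """Write a header plus one row per evaluated question.

    Missing columns are written as empty strings; unknown keys raise ValueError
    (csv.DictWriter's default extrasaction="raise").
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage with a hypothetical, partially filled row.
buffer = io.StringIO()
write_evaluation_csv(
    [{"Number": "ACME_UPD_SPD_001", "Query": "Is aromatherapy covered", "Correctness Score": 5}],
    buffer,
)
print(buffer.getvalue())
```

The `csv` module quotes fields as needed, so answers and comments that contain commas stay in a single column.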