EvaluateRagAnswerCorrectness 2025.3.28.13-SNAPSHOT
BUNDLE
com.snowflake.openflow.runtime | runtime-rag-evaluation-processors-nar
DESCRIPTION
Evaluates the correctness of generated answers in a Retrieval-Augmented Generation (RAG) context by computing metrics such as F1 score, cosine similarity, and answer correctness. The processor uses an LLM (e.g., OpenAI’s GPT) to assess the generated answer against the ground truth.
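The metrics named above can be pictured with a minimal sketch. This is illustrative only: the processor's actual LLM-prompted correctness judgment is not reproduced here, and the weighted-sum combination is an assumption inferred from the two weight properties below.

```python
import math

def token_f1(answer: str, ground_truth: str) -> float:
    """Token-overlap F1 between the generated answer and the ground truth."""
    a, g = answer.lower().split(), ground_truth.lower().split()
    common = sum(min(a.count(t), g.count(t)) for t in set(a))
    if common == 0:
        return 0.0
    precision = common / len(a)
    recall = common / len(g)
    return 2 * precision * recall / (precision + recall)

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def answer_correctness(f1: float, cos: float,
                       f1_weight: float, cos_weight: float) -> float:
    """Weighted combination of the two metrics (assumed formula)."""
    return f1_weight * f1 + cos_weight * cos
```

For example, with both weights set to 0.5, a perfect F1 of 1.0 and a cosine similarity of 0.5 yield an answer correctness of 0.75.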
INPUT REQUIREMENT
REQUIRED
Supports Sensitive Dynamic Properties
false
PROPERTIES

| Property | Description |
|---|---|
| Cosine Similarity Weight | The weight to apply to the cosine similarity when calculating answer correctness (between 0.0 and 1.0). |
| Evaluation Results Record Path | The RecordPath to write the results of the evaluation to. |
| F1 Score Weight | The weight to apply to the F1 score when calculating answer correctness (between 0.0 and 1.0). |
| Generated Answer Record Path | The RecordPath to the generated answer field in the record. |
| Generated Answer Vector Record Path | The RecordPath to the generated answer vector field in the record. |
| Ground Truth Record Path | The RecordPath to the ground truth field in the record. |
| Ground Truth Vector Record Path | The RecordPath to the ground truth vector field in the record. |
| LLM Provider Service | The provider service used to send evaluation prompts to the LLM. |
| Question Record Path | The RecordPath to the question field in the record. |
| Record Reader | The Record Reader to use for reading the FlowFile. |
| Record Writer | The Record Writer to use for writing the results. |
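The RecordPath properties above address fields within each incoming record. A hypothetical record layout and a minimal lookup for simple top-level paths might look like the following; every field name here is illustrative and must match whatever paths are actually configured:

```python
# Hypothetical record; field names are assumptions, not the processor's defaults.
record = {
    "question": "What is RAG?",
    "answer": "Retrieval-Augmented Generation combines retrieval with generation.",
    "answerVector": [0.12, 0.50, 0.33],
    "groundTruth": "RAG augments an LLM with retrieved context.",
    "groundTruthVector": [0.10, 0.48, 0.35],
}

def record_path_get(rec: dict, path: str):
    """Minimal RecordPath-style lookup for flat /field paths only."""
    return rec.get(path.lstrip("/"))
```

With this layout, Question Record Path would be `/question`, Ground Truth Vector Record Path would be `/groundTruthVector`, and so on.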
RELATIONSHIPS

| NAME | DESCRIPTION |
|---|---|
| failure | FlowFiles that cannot be processed are routed to this relationship. |
| success | FlowFiles that are successfully processed are routed to this relationship. |
WRITES ATTRIBUTES

| NAME | DESCRIPTION |
|---|---|
| average.f1Score | The average F1 score computed over all records. |
| average.cosineSim | The average cosine similarity between the ground truth and answer embeddings. |
| average.answerCorrectness | The average answer correctness score computed over all records. |
| json.parse.failures | The number of JSON parse failures encountered. |
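The `average.*` attributes can be pictured as simple means over the per-record metric values; a sketch, assuming that aggregation (the attribute names mirror the ones listed above, the averaging logic itself is an assumption):

```python
def average_attributes(per_record: list[dict]) -> dict:
    """Average per-record metric values into FlowFile-level attributes."""
    n = len(per_record)
    keys = ("f1Score", "cosineSim", "answerCorrectness")
    return {f"average.{k}": sum(r[k] for r in per_record) / n for k in keys}
```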
USE CASES

Use this processor to assess the quality of answers generated by an LLM against ground truth answers, producing metrics that can be used to monitor and improve the performance of RAG systems.