EvaluateRagRetrieval 2025.10.9.21

Bundle

com.snowflake.openflow.runtime | runtime-rag-evaluation-processors-nar

Description

Calculates retrieval metrics (Precision@N, Recall@N, F-Score@N, MAP@N, MRR) for a RAG system using an LLM as a judge. For each record, it uses both Precision and Recall prompts to evaluate the response and adds the resulting metrics as attributes to the FlowFile.
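For orientation, the sketch below shows how the listed rank-based metrics are conventionally computed for a single query from a ranked list of retrieved context IDs and a set of ground-truth relevant IDs. This is an illustration of the standard formulas, not the processor's implementation; in this processor the relevance judgments come from the LLM judge, and the variable names here are assumptions.

```python
def precision_at_n(retrieved, relevant, n):
    """Fraction of the top-N retrieved items that are relevant."""
    top = retrieved[:n]
    return sum(1 for doc in top if doc in relevant) / n if n else 0.0

def recall_at_n(retrieved, relevant, n):
    """Fraction of all relevant items that appear in the top-N retrieved."""
    top = retrieved[:n]
    return sum(1 for doc in top if doc in relevant) / len(relevant) if relevant else 0.0

def fscore_at_n(retrieved, relevant, n):
    """Harmonic mean of precision@N and recall@N."""
    p = precision_at_n(retrieved, relevant, n)
    r = recall_at_n(retrieved, relevant, n)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item; 0.0 if none is retrieved.
    MRR is the mean of this value over all queries."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical example: ranked retriever output vs. ground truth.
retrieved = ["d3", "d1", "d7"]
relevant = {"d1", "d2"}

print(precision_at_n(retrieved, relevant, 3))  # 1 of top 3 relevant
print(recall_at_n(retrieved, relevant, 3))     # 1 of 2 relevant found
print(reciprocal_rank(retrieved, relevant))    # first hit at rank 2
```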

Tags

evaluation, fscore, llm, metrics, mrr, openai, openflow, precision, rag, recall, retrieval

Input Requirement

REQUIRED

Supports Sensitive Dynamic Properties

false

Properties

Context Identifier Record Path: The RecordPath to the array of context IDs in the record.
Context Record Path: The RecordPath to the array of contexts in the record.
Evaluation Results Record Path: The RecordPath to write the results of the evaluation to.
Ground Truth Record Path: The RecordPath to the ground truth field in the record.
LLM Provider Service: The provider service for sending evaluation prompts to the LLM.
Question Record Path: The RecordPath to the question field in the record.
Record Reader: The Record Reader to use for reading the FlowFile.
Record Writer: The Record Writer to use for writing the results.
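For illustration, a record shaped like the following could be evaluated by pointing the RecordPath properties at its fields, e.g. /question, /contexts, /contextIds, and /groundTruth. All field names here are hypothetical; only the RecordPaths you configure matter.

```json
{
  "question": "What is the capital of France?",
  "contexts": [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe."
  ],
  "contextIds": ["doc-12", "doc-07"],
  "groundTruth": "Paris"
}
```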

Relationships

failure: FlowFiles that cannot be processed are routed to this relationship.
success: FlowFiles that are successfully processed are routed to this relationship.

Writes attributes

n: The average number of retrieved documents per query.
precision.at.n: The average precision at N over all queries.
recall.at.n: The average recall at N over all queries.
fscore.at.n: The average F-Score at N over all queries.
mrr: The Mean Reciprocal Rank over all queries.
retrieval.eval.failures: The number of records for which the evaluation could not be calculated.
json.parse.failures: The number of JSON parse failures encountered.