PerformSnowflakeCortexOCR 2025.10.9.21

Bundle

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

Description

Performs Optical Character Recognition (OCR) on PDF documents using Snowflake Cortex ML functions. Documents must be staged in a Snowflake internal stage with server-side encryption enabled. The processor extracts text content from PDFs and can output the results either as FlowFile content or as an attribute.
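The stage requirements above can be illustrated with a small helper that produces the setup SQL for staging one PDF before the processor runs. This is a sketch under stated assumptions, not the processor's own setup logic. The function name `stage_setup_statements` and the example database, schema, stage and file names are hypothetical. The sketch assumes an internal stage created with `ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')` and an upload with `PUT ... AUTO_COMPRESS = FALSE`, so the staged file stays a plain `.pdf` rather than a gzipped copy.

```python
from typing import List


def stage_setup_statements(database: str, schema: str, stage: str, local_pdf: str) -> List[str]:
    """Return hypothetical setup SQL for staging one PDF before running OCR."""
    fq_stage = f"{database}.{schema}.{stage}"
    return [
        # Assumption: the stage uses Snowflake server-side encryption (SNOWFLAKE_SSE).
        f"CREATE STAGE IF NOT EXISTS {fq_stage} ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')",
        # PUT gzips files by default; AUTO_COMPRESS = FALSE keeps the staged file a plain PDF.
        f"PUT file://{local_pdf} @{fq_stage} AUTO_COMPRESS = FALSE",
    ]
```

`PUT` must run from a Snowflake client such as SnowSQL or a connector. The uploaded file name is what the Filename property would then reference.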

Tags

ai, cortex, document, ml, ocr, openflow, pdf, snowflake

Input Requirement

REQUIRED

Supports Sensitive Dynamic Properties

false

Properties

Property | Description
Database | The Snowflake database containing the stage.
Filename | The name of the file to perform OCR on. The file must be uploaded to the stage before OCR is performed. FlowFile attributes may be referenced via Expression Language.
Max Attribute Size | The maximum size of the OCR results that can be written to an attribute. If the OCR results exceed this size, the FlowFile is routed to failure.
OCR Mode | Specifies how document text and structure are extracted. In ‘OCR’ mode, only raw text content is extracted; formatting and table structures are ignored. In ‘LAYOUT’ mode, the output preserves table structures as Markdown.
Output Strategy | Determines where the OCR response is written: to the FlowFile content (FLOW_FILE) or to a FlowFile attribute.
Results Attribute | The name of the attribute to write the OCR response to.
Schema | The Snowflake schema containing the stage.
Snowflake Connection Service | The database connection service used to access Snowflake.
Stage | The Snowflake stage where PDFs will be temporarily stored. The stage must have server-side encryption enabled. FlowFile attributes may be referenced via Expression Language.
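The Database, Schema, Stage, Filename and OCR Mode properties together identify one staged document and one extraction mode. This page does not name the Cortex function the processor calls. The sketch below assumes `SNOWFLAKE.CORTEX.PARSE_DOCUMENT`, which accepts a stage reference, a file path and a `mode` option of `OCR` or `LAYOUT`. The helper `build_parse_document_sql` is hypothetical and is not the processor's implementation.

```python
def build_parse_document_sql(database: str, schema: str, stage: str,
                             filename: str, ocr_mode: str) -> str:
    """Hypothetical mapping of the processor properties onto a PARSE_DOCUMENT query."""
    if ocr_mode not in ("OCR", "LAYOUT"):
        raise ValueError(f"unsupported OCR mode: {ocr_mode}")
    # Double any single quotes so the filename stays a valid SQL string literal.
    safe_name = filename.replace("'", "''")
    stage_ref = f"@{database}.{schema}.{stage}"
    return (
        "SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT("
        f"'{stage_ref}', '{safe_name}', {{'mode': '{ocr_mode}'}})"
    )
```

`PARSE_DOCUMENT` returns an object whose `content` field holds the extracted text. In `LAYOUT` mode, tables in that text are rendered as Markdown.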

Relationships

Name | Description
empty | FlowFiles for which the OCR results are empty
failure | FlowFiles that cannot be processed are routed to this relationship
success | FlowFiles that are successfully processed (with non-empty OCR results) are routed to this relationship
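The three relationships, combined with the Max Attribute Size property, suggest a routing decision like the one below. This is a hypothetical sketch, not the processor's code. It assumes a response shaped like Cortex `PARSE_DOCUMENT` output, with a `content` text field. It also assumes that Max Attribute Size counts UTF-8 bytes of the extracted text and applies only when the Output Strategy is not FLOW_FILE. The attribute strategy value `"ATTRIBUTE"` used in the usage example is a placeholder name.

```python
import json


def route_ocr_response(response_json: str, output_strategy: str, max_attribute_size: int) -> str:
    """Hypothetical routing of one OCR response to success, empty, or failure."""
    try:
        parsed = json.loads(response_json)
    except json.JSONDecodeError:
        return "failure"  # unparseable response: the FlowFile cannot be processed
    content = parsed.get("content") if isinstance(parsed, dict) else None
    if not isinstance(content, str):
        return "failure"  # no text field in the response
    if not content.strip():
        return "empty"  # OCR ran but produced no text
    # Assumption: the size limit applies only to attribute output and counts UTF-8 bytes.
    if output_strategy != "FLOW_FILE" and len(content.encode("utf-8")) > max_attribute_size:
        return "failure"
    return "success"
```

For example, `route_ocr_response('{"content": "Invoice 42"}', "ATTRIBUTE", 4)` routes to failure because the 10-byte result exceeds the limit. The same response with `"FLOW_FILE"` routes to success.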

Writes attributes

Name | Description
mime.type | The MIME type of the output content (text/plain when the output strategy is FLOW_FILE)
snowflake.error.information | Contains error information if the Snowflake Cortex OCR operation returns an error
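The attributes above interact with the Output Strategy and Results Attribute properties. The hypothetical helper below sketches which attributes an outgoing FlowFile might carry. Its name, its parameters and the `"ATTRIBUTE"` strategy value in the examples are illustrative assumptions. The sketch also assumes that `mime.type` is left unchanged when the OCR text is written to an attribute instead of the content.

```python
from typing import Dict, Optional


def written_attributes(output_strategy: str, results_attribute: str,
                       ocr_text: Optional[str] = None,
                       error_info: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical sketch of the attributes added to an outgoing FlowFile."""
    attrs: Dict[str, str] = {}
    if error_info is not None:
        # Cortex reported an error: surface it for downstream handling.
        attrs["snowflake.error.information"] = error_info
    elif output_strategy == "FLOW_FILE":
        # OCR text replaces the FlowFile content, so the content type changes.
        attrs["mime.type"] = "text/plain"
    elif ocr_text is not None:
        # Attribute output: the OCR text goes into the configured Results Attribute.
        attrs[results_attribute] = ocr_text
    return attrs
```

With this design, a FlowFile written in FLOW_FILE mode gains only `mime.type`. In attribute mode, the text appears under whatever name Results Attribute specifies.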

See also