PerformSnowflakeCortexOCR 2025.3.28.13-SNAPSHOT

BUNDLE

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

DESCRIPTION

Performs Optical Character Recognition (OCR) on PDF documents using Snowflake Cortex ML functions. Documents must be staged in a Snowflake internal stage with server-side encryption enabled. The processor extracts text content from PDFs and can output the results either as FlowFile content or as an attribute.
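The staging requirement and the OCR call can be illustrated with a small sketch that only builds SQL text. It assumes the processor's work maps onto Snowflake's `SNOWFLAKE.CORTEX.PARSE_DOCUMENT` function, which this page does not confirm. The names `DOCS_DB`, `PUBLIC`, `PDF_STAGE`, and `invoice.pdf` are hypothetical, and identifiers are assumed to be simple unquoted names.

```python
def quote_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_stage_sql(database: str, schema: str, stage: str) -> str:
    """Build a statement creating an internal stage with server-side encryption."""
    return (
        f"CREATE STAGE IF NOT EXISTS {database}.{schema}.{stage} "
        "ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')"
    )


def parse_document_sql(database: str, schema: str, stage: str,
                       filename: str, mode: str = "OCR") -> str:
    """Build a query that runs PARSE_DOCUMENT on one staged file.

    Assumption: the two modes described on this page ('OCR' and 'LAYOUT')
    are passed straight through as the function's 'mode' option.
    """
    if mode not in ("OCR", "LAYOUT"):
        raise ValueError(f"unsupported OCR mode: {mode}")
    stage_ref = quote_literal(f"@{database}.{schema}.{stage}")
    return (
        "SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT("
        f"{stage_ref}, {quote_literal(filename)}, {{'mode': '{mode}'}}) AS result"
    )


if __name__ == "__main__":
    print(create_stage_sql("DOCS_DB", "PUBLIC", "PDF_STAGE"))
    print(parse_document_sql("DOCS_DB", "PUBLIC", "PDF_STAGE", "invoice.pdf", "LAYOUT"))
```

Running the generated statements by hand in a worksheet is one way to check that a stage and file are usable before wiring them into a flow.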

TAGS

ai, cortex, document, ml, ocr, openflow, pdf, snowflake

INPUT REQUIREMENT

REQUIRED

Supports Sensitive Dynamic Properties

false

PROPERTIES

| Property | Description |
| --- | --- |
| Database | The Snowflake database containing the stage. |
| Filename | The name of the file to perform OCR on. The file must be uploaded to the stage before OCR is performed. FlowFile attributes may be referenced via Expression Language. |
| Max Attribute Size | The maximum size of the OCR results that can be written to an attribute. If the OCR results are larger than this, the FlowFile is routed to ‘failure’. |
| OCR Mode | Specifies how document text and structure should be extracted. In ‘OCR’ mode, only raw text content is extracted, ignoring formatting and table structures. In ‘LAYOUT’ mode, the output preserves table structures as markdown. |
| Output Strategy | Determines where the OCR response is written: to the FlowFile content or to a FlowFile attribute. |
| Results Attribute | The name of the attribute to write the response to. |
| Schema | The Snowflake schema containing the stage. |
| Snowflake Connection Service | Database Connection Service for accessing Snowflake. |
| Stage | The Snowflake stage where PDFs will be temporarily stored. The stage must have server-side encryption enabled. FlowFile attributes may be referenced via Expression Language. |
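As an illustration of how these properties might be filled in, the sketch below holds a sample configuration in a dictionary and resolves `${attribute}` references for the two properties that accept Expression Language (Filename and Stage). All property values and the `resolve` helper are hypothetical. The helper handles only plain `${name}` references and is a simplified stand-in for the real Expression Language evaluator, not a reimplementation of it.

```python
import re

# Hypothetical property values for one processor instance.
properties = {
    "Database": "DOCS_DB",
    "Schema": "PUBLIC",
    "Stage": "PDF_STAGE",
    "Filename": "${filename}",
    "OCR Mode": "LAYOUT",
    "Output Strategy": "FLOW_FILE",
}

# Per this page, only these two properties may reference FlowFile attributes.
EL_PROPERTIES = ("Filename", "Stage")

_ATTRIBUTE_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, attributes: dict) -> str:
    """Replace each ${name} with the matching attribute value.

    In this sketch an unknown attribute becomes an empty string.
    """
    return _ATTRIBUTE_REF.sub(lambda m: attributes.get(m.group(1), ""), value)


def resolve_properties(props: dict, attributes: dict) -> dict:
    """Resolve attribute references only in properties that support them."""
    return {
        name: resolve(value, attributes) if name in EL_PROPERTIES else value
        for name, value in props.items()
    }


if __name__ == "__main__":
    flowfile_attributes = {"filename": "invoice-0042.pdf"}
    print(resolve_properties(properties, flowfile_attributes))
```

Referencing `${filename}` lets one processor instance handle many staged files, since each incoming FlowFile supplies its own value.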

RELATIONSHIPS

| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be processed are routed to this relationship. |
| success | FlowFiles that are successfully processed are routed to this relationship. |

WRITES ATTRIBUTES

| Name | Description |
| --- | --- |
| mime.type | The MIME type of the output content (text/plain when output strategy is FLOW_FILE). |
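The interaction between Output Strategy, Max Attribute Size, the two relationships, and `mime.type` can be summarized in a rough sketch. It is a reading of the behavior described on this page, not the processor's source. The strategy name `ATTRIBUTE`, the `route_result` function, and measuring size in UTF-8 bytes are assumptions. Only `FLOW_FILE` appears in the documentation.

```python
def route_result(ocr_text: str, output_strategy: str, results_attribute: str,
                 max_attribute_size: int, content: bytes = b"",
                 attributes: dict = None) -> dict:
    """Return the relationship, content, and attributes after OCR.

    Assumptions: 'ATTRIBUTE' is a hypothetical name for the attribute
    strategy, and size is compared in UTF-8 bytes.
    """
    attrs = dict(attributes or {})
    encoded = ocr_text.encode("utf-8")

    if output_strategy == "FLOW_FILE":
        # OCR text replaces the FlowFile content and is marked as plain text.
        attrs["mime.type"] = "text/plain"
        return {"relationship": "success", "content": encoded, "attributes": attrs}

    if output_strategy == "ATTRIBUTE":
        if len(encoded) > max_attribute_size:
            # Oversized results are not written; the FlowFile goes to failure.
            return {"relationship": "failure", "content": content, "attributes": attrs}
        attrs[results_attribute] = ocr_text
        return {"relationship": "success", "content": content, "attributes": attrs}

    raise ValueError(f"unknown output strategy: {output_strategy}")


if __name__ == "__main__":
    print(route_result("Total due: 42.00", "FLOW_FILE", "ocr.result", 1024))
    print(route_result("x" * 2048, "ATTRIBUTE", "ocr.result", 1024, content=b"%PDF"))
```

Writing to an attribute suits short documents that downstream processors route on, while FlowFile content avoids the size limit for long documents.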

SEE ALSO