ParseTableImage 2025.3.28.13-SNAPSHOT

BUNDLE

com.snowflake.openflow.runtime | runtime-document-layout-nar

DESCRIPTION

Extracts the text from an image of a table and writes it to the FlowFile content in CSV format.
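As a rough illustration of the CSV output described above, the following sketch serializes a grid of recognized cell text with Python's standard `csv` module. It is not the processor's implementation: the function name `rows_to_csv`, the sample cells, and the choice of `\n` as the line terminator are assumptions made for the example.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows (each a list of cell strings) to CSV text.

    Hypothetical helper for illustration only; the processor's actual
    quoting and line-ending rules are not documented here.
    """
    buffer = io.StringIO()
    # Assumption: "\n" line endings; csv.writer defaults to "\r\n".
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical cell text recognized from a table image.
example_rows = [
    ["Item", "Quantity", "Note"],
    ["Widget", "4", "fragile, handle with care"],
]
csv_text = rows_to_csv(example_rows)
print(csv_text)
```

Cells that contain a comma are quoted by the writer, so downstream CSV readers still see three columns in each row.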

TAGS

document, element, image, openflow, rag, retrieval augmented generation, table, unstructured

INPUT REQUIREMENT

REQUIRED

Supports Sensitive Dynamic Properties

false

PROPERTIES

Property

Description

Communication Timeout

The amount of time to wait for a response from the microservices before timing out.

Custom Table Structure Recognition Service URL

The custom URL of the Openflow Table Structure Recognition Service.

MIME Type

The MIME type of the image file.

OCR Confidence Threshold

The minimum confidence level required for a text block to be included in the output. Text blocks with a confidence level below this value will be excluded.

OCR Service

The OCR service used to read the input file and output its text.

Service Location Strategy

Determines how Service Locations are configured within this processor for the Table Structure Recognition Service.
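The effect of the OCR Confidence Threshold property listed above can be sketched as a simple filter. This is a minimal illustration, not the processor's code: the `TextBlock` type, the `filter_blocks` function, and the 0.0 to 1.0 confidence scale are all assumptions, since the source does not specify how confidence values are represented.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical OCR result: recognized text plus a confidence score."""
    text: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def filter_blocks(blocks, threshold):
    """Keep blocks at or above the threshold; drop those below it."""
    return [block for block in blocks if block.confidence >= threshold]


# Hypothetical OCR output for three text blocks.
blocks = [
    TextBlock("Total", 0.98),
    TextBlock("T0ta1", 0.41),
    TextBlock("42", 0.80),
]
kept = filter_blocks(blocks, threshold=0.80)
print([block.text for block in kept])
```

A block whose confidence equals the threshold is kept, which matches the property description: only blocks below the value are excluded.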

RELATIONSHIPS

NAME

DESCRIPTION

table.not.found

If the processor determines that an input FlowFile does not contain a table, the original FlowFile will be routed to this relationship.

failure

If a FlowFile cannot be converted into CSV, the input FlowFile will be routed to this relationship.

success

When the table text has been successfully extracted, the CSV representation of the text will be routed to this relationship.

comms.failure

If the processor is unable to communicate with one of the necessary services, the input FlowFile will be routed to this relationship.
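The four relationships above can be read as a decision sequence. The sketch below is one plausible ordering, not the documented behavior: the function `choose_relationship`, its boolean parameters, and the order in which the conditions are checked are assumptions. Only the relationship names come from the table.

```python
from enum import Enum


class Relationship(Enum):
    """Relationship names as listed in the table above."""
    SUCCESS = "success"
    FAILURE = "failure"
    TABLE_NOT_FOUND = "table.not.found"
    COMMS_FAILURE = "comms.failure"


def choose_relationship(service_reachable, table_found, csv_converted):
    """Pick a relationship for one FlowFile (hypothetical check order)."""
    if not service_reachable:
        # A required service could not be contacted.
        return Relationship.COMMS_FAILURE
    if not table_found:
        # The input was judged not to contain a table.
        return Relationship.TABLE_NOT_FOUND
    if not csv_converted:
        # A table was found but could not be converted into CSV.
        return Relationship.FAILURE
    return Relationship.SUCCESS


print(choose_relationship(True, False, False).value)
```

In a flow, `comms.failure` is a natural candidate for a retry loop, while `failure` and `table.not.found` usually call for separate handling because retrying the same input is unlikely to change the result.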

WRITES ATTRIBUTES

NAME

DESCRIPTION

filename

The filename of the FlowFile.

mime.type

The MIME type of the FlowFile.

table.text.json

If the processor successfully extracts the table text, or if it is determined that the FlowFile does not contain a table, this attribute will be removed.

SEE ALSO