March 06, 2025 — Cortex AI PARSE_DOCUMENT function for OCR – General Availability
Snowflake is pleased to announce the General Availability of Snowflake Cortex AI PARSE_DOCUMENT’s OCR mode, which enables customers to accurately extract text and data from millions of document pages. This SQL function is fully managed and combines OCR quality on par with other cloud providers with the scalability, performance, and ease of use of Snowflake. In OCR mode, PARSE_DOCUMENT extracts text content from PDF, DOCX, and PPTX files stored in a Snowflake internal stage or an external stage. Extraction runs as a SQL call and does not require a complex cloud architecture.
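As a rough illustration of such a call, the following is a minimal sketch rather than a definitive usage pattern. The stage name `docs_stage` and the file path `reports/sample_invoice.pdf` are hypothetical placeholders. The sketch assumes the function is invoked as `SNOWFLAKE.CORTEX.PARSE_DOCUMENT` with a `mode` option, and that the result is an object whose `content` field holds the extracted text. Check the function reference for the exact signature and output shape.

```sql
-- Hypothetical stage and file path; replace them with your own.
-- Assumes the result object exposes the extracted text under "content".
SELECT
    SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
        @docs_stage,                    -- stage that holds the document
        'reports/sample_invoice.pdf',   -- path of the file relative to the stage
        {'mode': 'OCR'}                 -- request OCR text extraction
    ):content::STRING AS extracted_text;
```

Casting the `content` field to a string keeps the result easy to pass on to downstream steps such as summarization or search indexing.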
The Cortex AI PARSE_DOCUMENT OCR mode enables:
Text extraction from both digital-born and scanned documents.
High-quality extraction for documents in English, German, French, Italian, Norwegian, Polish, Portuguese, Spanish, and Swedish.
Seamless integration with RAG pipelines powering Cortex Search, and with Cortex AI Functions for document summarization, translation, and entity extraction.
Automatic page orientation detection.
For details, see Cortex PARSE_DOCUMENT.