March 06, 2025 — Cortex AI PARSE_DOCUMENT function for OCR – General Availability

Snowflake is pleased to announce the General Availability of Snowflake Cortex AI PARSE_DOCUMENT’s OCR mode, enabling customers to accurately extract text and data from millions of document pages. This fully managed SQL function offers OCR quality on par with other cloud providers, combined with the scalability, performance, and ease of use of Snowflake. PARSE_DOCUMENT OCR extracts text content from PDF, DOCX, and PPTX files stored in a Snowflake or external stage using SQL, without requiring a complex cloud architecture.
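
As a minimal sketch of what calling the function from SQL can look like, the query below runs OCR on a single staged file and pulls out the extracted text. The stage name `my_docs_stage` and the file path `reports/sample.pdf` are hypothetical placeholders, and the `content` and `metadata` fields reflect the function's documented output shape at the time of writing; check the reference page for the current signature.

```sql
-- Hypothetical stage and file path; replace with your own.
WITH parsed_doc AS (
    SELECT
        SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
            @my_docs_stage,          -- assumed stage holding the documents
            'reports/sample.pdf',    -- assumed path relative to the stage
            {'mode': 'OCR'}          -- select OCR mode
        ) AS parsed
)
SELECT
    parsed:content::STRING AS extracted_text,  -- OCR text of the document
    parsed:metadata        AS doc_metadata     -- e.g. page count
FROM parsed_doc;
```

Wrapping the call in a CTE keeps the raw result available as one column, so the text and metadata can be read from a single invocation.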

The Cortex AI PARSE_DOCUMENT OCR mode enables:

  • Text extraction from both digital-born and scanned documents.

  • High-quality extraction for documents in English, German, French, Italian, Norwegian, Polish, Portuguese, Spanish, and Swedish.

  • Seamless integration with RAG pipelines powering Cortex Search, and with Cortex AI Functions for document summarization, translation, and entity extraction.

  • Automatic page orientation detection.
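
The integration with Cortex AI Functions mentioned above might be sketched as follows, assuming the same hypothetical stage `my_docs_stage` and file `reports/sample.pdf`. The OCR text is passed to `SNOWFLAKE.CORTEX.SUMMARIZE`; this is one illustrative pairing, not a prescribed pipeline.

```sql
-- Hypothetical stage and file path; replace with your own.
WITH parsed_doc AS (
    SELECT
        SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
            @my_docs_stage,
            'reports/sample.pdf',
            {'mode': 'OCR'}
        ) AS parsed
)
SELECT
    -- Summarize the text extracted by OCR.
    SNOWFLAKE.CORTEX.SUMMARIZE(parsed:content::STRING) AS summary
FROM parsed_doc;
```

Long documents may exceed the input limits of the downstream function, so chunking the extracted text first may be necessary in practice.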

For details, see Cortex PARSE_DOCUMENT.