Cortex AI Functions: Image extraction with AI_PARSE_DOCUMENT¶
AI_PARSE_DOCUMENT is a Cortex AI function that extracts text, data, layout elements, and images from PDFs, Word documents, and images. Use this high-fidelity image extraction capability to power advanced, multimodal document processing workflows, such as:
Enrich data: Extract images from documents to add visual context for deeper insights.
Multimodal RAG: Combine images and text for retrieval-augmented generation (RAG) to improve model responses.
Image classification: Use extracted images with AI_EXTRACT or AI_COMPLETE for automatic tagging and analysis.
Knowledge bases: Build richer repositories by including both text and images for better search and reasoning.
Compliance: Extract and analyze images (e.g., charts, signatures) for regulatory and audit workflows.
For an introduction to AI_PARSE_DOCUMENT, see Parsing documents with AI_PARSE_DOCUMENT.
Using AI_PARSE_DOCUMENT to extract images¶
To extract images from a document using AI_PARSE_DOCUMENT:
Set the 'mode' option to 'LAYOUT'. Image extraction requires LAYOUT mode.
Set the 'extract_images' option to TRUE.
With image extraction enabled, AI_PARSE_DOCUMENT returns an array, images, in the JSON output. Each element of images contains a field, image_base64, with the extracted image data encoded as a base64 string. Each image object also contains fields for a unique ID and the image's bounding box.
You can decode the images using BASE64_DECODE_BINARY, then pass them directly to AI_EXTRACT to process or describe the image contents. Alternatively, you can store them in a stage for processing using multimodal AI_COMPLETE. (AI_COMPLETE does not currently support direct image input.)
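To get a feel for the output shape described above, the following minimal Python sketch summarizes the images array of a response that has already been fetched as a JSON string. Only the images and image_base64 names come from this page. The helper name summarize_images and the example_field key are hypothetical, the latter standing in for the ID and bounding-box fields, whose exact names are not given here.

```python
import json


def summarize_images(response_json: str) -> list[dict]:
    """Return one summary dict per entry in the response's `images` array."""
    response = json.loads(response_json)
    summaries = []
    for index, image in enumerate(response.get("images", [])):
        payload = image.get("image_base64", "")
        summaries.append({
            "index": index,
            # Every key except the payload itself, such as ID or bounding-box fields.
            "other_fields": sorted(k for k in image if k != "image_base64"),
            # Length of the base64 string, not of the decoded image.
            "payload_chars": len(payload),
        })
    return summaries


# Hypothetical response used only for illustration; "example_field" is a
# placeholder, not a real AI_PARSE_DOCUMENT field name.
sample = json.dumps({"images": [{"image_base64": "AAAA", "example_field": 1}]})
print(summarize_images(sample))
```

Listing the non-payload keys this way is a quick check of which metadata fields a given response actually carries before writing code that depends on them.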
Examples¶
Extract and describe images¶
After extracting image data, you can use AI_EXTRACT to process or describe the image content. The following example generates a description for the first extracted image after converting it to binary from base64. (AI_EXTRACT requires binary input.) The query uses a regular expression to strip the metadata (schema and format) from the base64 string.
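The prefix-stripping and decoding step can also be sketched outside SQL. The Python below assumes the metadata takes the form of a data-URI style prefix such as data:image/png;base64, which is an assumption about the output format rather than something this page confirms. The helper name image_bytes is likewise hypothetical.

```python
import base64
import re

# Assumed prefix shape, for example "data:image/png;base64,".
# Adjust the pattern if the real output differs.
_PREFIX = re.compile(r"^data:[^;,]+;base64,")


def image_bytes(image_base64: str) -> bytes:
    """Strip an optional metadata prefix and decode the remaining base64 text."""
    payload = _PREFIX.sub("", image_base64)
    # validate=True raises an error on characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(payload, validate=True)


# Round-trip check with made-up bytes standing in for real image data.
raw = b"\x89PNG\r\n"
encoded = "data:image/png;base64," + base64.b64encode(raw).decode("ascii")
print(image_bytes(encoded) == raw)
```

Strings without a prefix pass through the substitution unchanged, so the same helper works whether or not the metadata is present.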
Store extracted images in a stage¶
You can store extracted images from documents in a Snowflake stage for reuse, auditing, or additional processing with other Cortex AI functions. This example creates and uses a Python stored procedure to decode base64 image data from AI_PARSE_DOCUMENT and upload the resulting image files to a specified stage.
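The decode-and-name logic that such a procedure needs can be sketched independently of Snowflake. This is not the SAVE_EXTRACTED_IMAGES procedure itself: the function save_images, the write_file callback, and the file-naming scheme are all hypothetical, and the data:image/... prefix format is an assumption. Inside a stored procedure, write_file could wrap a stage upload, for instance through Snowpark's session.file.put_stream.

```python
import base64
from typing import Callable, Iterable


def save_images(
    images: Iterable[dict],
    write_file: Callable[[str, bytes], None],
    name_prefix: str = "image",
) -> list[str]:
    """Decode each image's base64 payload and hand it to `write_file`."""
    paths = []
    for index, image in enumerate(images):
        # Split off an optional metadata prefix; base64 text contains no commas,
        # so everything after the last comma is the payload.
        header, _, payload = image["image_base64"].rpartition(",")
        extension = "bin"  # fallback when no image format can be inferred
        if header.startswith("data:image/"):
            extension = header[len("data:image/"):].split(";")[0] or "bin"
        path = f"{name_prefix}_{index}.{extension}"
        write_file(path, base64.b64decode(payload))
        paths.append(path)
    return paths


# In-memory stand-in for a stage: a dict mapping path to file contents.
store: dict[str, bytes] = {}
demo_images = [
    {"image_base64": "data:image/png;base64," + base64.b64encode(b"abc").decode("ascii")},
    {"image_base64": base64.b64encode(b"xyz").decode("ascii")},
]
saved = save_images(demo_images, store.__setitem__, name_prefix="doc")
print(saved)
```

Passing the writer in as a callback keeps the decoding logic testable without a Snowflake session.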
After creating the SAVE_EXTRACTED_IMAGES procedure, you can call it to extract images from a document and store them in a stage, as shown in the following code snippet:
The output of this query is a list of file paths for the images stored in the specified stage, such as:
Now you can process the stored images using other Cortex AI functions, such as AI_COMPLETE for multimodal analysis or generation.
Response:
Cost considerations¶
AI_PARSE_DOCUMENT is billed by the number of pages processed. A single image file counts as one page for billing purposes. Extracting images does not incur additional costs.
Current limitations¶
No more than fifty images can be extracted from a single document. Additional images are ignored.
Images smaller than 4x4 pixels are not extracted.
If the size of a response exceeds the account parameter EXTERNAL_FUNCTION_MAX_RESPONSE_SIZE, the function returns an error. Increase the value of this parameter if necessary.
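Because images are returned as base64 text, they can dominate the response size. The rough Python sketch below estimates how many characters the image payloads alone contribute, using the standard base64 expansion of 4 output characters per 3 input bytes and the fifty-image cap stated above. The function names are hypothetical, and the estimate is a lower bound that ignores extracted text, JSON structure, and any metadata prefix. Compare the result against your account's parameter value in whatever unit that parameter uses.

```python
MAX_IMAGES = 50  # per-document extraction cap stated in the limitations above


def base64_chars(raw_bytes: int) -> int:
    """Length of padded base64 text for `raw_bytes` bytes of input."""
    return (raw_bytes + 2) // 3 * 4


def estimated_image_payload_chars(image_sizes: list[int]) -> int:
    """Lower-bound estimate of base64 characters contributed by the images."""
    # Only the first MAX_IMAGES images would be extracted; the rest are ignored.
    return sum(base64_chars(size) for size in image_sizes[:MAX_IMAGES])


# Example: sixty images of 300,000 bytes each; only fifty are counted.
print(estimated_image_payload_chars([300_000] * 60))
```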