Troubleshooting Document AI

The following scenarios can help you troubleshoot issues that might occur when working with Document AI.

Extracting query is not working

For the extracting query to work, you must store the documents for extraction in either an internal or external stage. Ensure that you specify the SNOWFLAKE_SSE encryption type when you create an internal stage.

Error

Depending on the document format, you might get an error such as one of the following:

{   "__processingErrors": [     "File extension does not match actual mime type. Mime-Type: application/octet-stream"   ] }
{   "__processingErrors": [     "cannot identify image file <_io.BytesIO object at 0x7f8a800ba020>"   ] }

Cause

You didn’t specify the SNOWFLAKE_SSE encryption type when you created the internal stage that stores the documents.

Solution

To create an internal stage, run the CREATE STAGE command as shown in the following example:

CREATE STAGE doc_ai_stage
  DIRECTORY = (ENABLE = TRUE)
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
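
With the stage in place, an extracting query might look like the following sketch. The model build name invoice_model and the version number 1 are hypothetical placeholders, not names from this guide; substitute your own.

```sql
-- Refresh the directory table so that newly uploaded files are listed.
ALTER STAGE doc_ai_stage REFRESH;

-- Run the extracting query against every file on the stage.
-- invoice_model and version 1 are assumed placeholder values.
SELECT
  relative_path,
  invoice_model!PREDICT(
    GET_PRESIGNED_URL(@doc_ai_stage, relative_path), 1
  ) AS extracted
FROM DIRECTORY(@doc_ai_stage);
```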

Presigned URL has expired

The presigned URL of the staged documents is a required argument to <model_build_name>!PREDICT. To get the presigned URL, call the GET_PRESIGNED_URL function. Unless you pass an expiration time, the URL expires after the function's default interval (3600 seconds).

For more information, see GET_PRESIGNED_URL.

Error

{ "__processingErrors": [ "Received HTTP 403 response for presigned URL. URL may be expired." ] }

Cause

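The presigned URL expired before the document was processed. This can happen when one query covers so many documents that some URLs expire while other documents are still being processed.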

Solution

Either reduce the number of documents in one query, or extend the expiration time. For more information about extending the expiration time, see GET_PRESIGNED_URL.
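
As a sketch of the second option, GET_PRESIGNED_URL accepts an optional third argument that sets the expiration time in seconds. The value 7200, the model build name invoice_model, the version 1, and the stage name doc_ai_stage are illustrative assumptions.

```sql
-- Request presigned URLs that stay valid for 2 hours (7200 seconds).
-- invoice_model, version 1, and the 7200-second value are assumed placeholders.
SELECT
  relative_path,
  invoice_model!PREDICT(
    GET_PRESIGNED_URL(@doc_ai_stage, relative_path, 7200), 1
  ) AS extracted
FROM DIRECTORY(@doc_ai_stage);
```

Choose a value that comfortably covers how long the whole query is expected to run.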

Too many documents in one query

Document AI limits the number of documents that one extracting query can process. For more information, see Known limitations to Document AI.

Error

{ "__processingErrors": [ "Query limit reached: too many documents in a single query." ] }

Cause

You tried to process too many documents in one query.

Solution

Split the documents across several queries so that each query stays within the limit.
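
One possible way to split the work is to page through the stage's directory table in fixed-size batches. In this sketch the batch size of 100, the model build name invoice_model, the version 1, and the stage name doc_ai_stage are all assumptions; pick a batch size that fits the documented limit.

```sql
-- Process the first batch of 100 files, ordered by path so batches don't overlap.
-- invoice_model, version 1, and the batch size are assumed placeholders.
SELECT
  relative_path,
  invoice_model!PREDICT(
    GET_PRESIGNED_URL(@doc_ai_stage, relative_path), 1
  ) AS extracted
FROM (
  SELECT relative_path
  FROM DIRECTORY(@doc_ai_stage)
  ORDER BY relative_path
  LIMIT 100 OFFSET 0
);
```

For the next batch, rerun the query with OFFSET 100, then OFFSET 200, and so on, until the inner query returns no rows.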

Documents don’t meet specific requirements

The documents you process with Document AI must meet specific requirements. For more information, see Prepare your documents for Document AI.

Error

You might get one of the following errors:

{ "__processingErrors": [ "Page 0 size is larger than the limit. Actual: 1083 mm x 1384 mm. Maximum: 1200 mm x 1200 mm." ] }
{ "__processingErrors": [ "Document has too many pages. Actual: 150. Maximum: 125." ] }
{ "__processingErrors": [ "Image size is too small. Actual: 20x20 px. Minimum: 50x50 px." ] }
{ "__processingErrors": [ "Unsupported file format. Actual: csv. Supported: docx, eml, htm, html, jpeg, jpg, pdf, png, text, tif, tiff, txt." ] }
{ "__processingErrors": [ "File exceeds maximum size. Actual: 54096026 bytes. Maximum: 50000000 bytes." ] }

Cause

The documents that you tried to process don’t meet the requirements of Document AI. For more information about the requirements, see Prepare your documents for Document AI.

Solution

Prepare your documents to meet the requirements.
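
Before you run an extracting query, a directory table query along these lines might help find files that would fail the size or format checks. The thresholds are copied from the sample error messages above and might not match the current limits, and the stage name doc_ai_stage is an assumption. Page count and page dimensions can't be checked this way.

```sql
-- List staged files that are too large or have an unsupported extension.
-- The 50000000-byte limit and the extension list come from the sample errors above.
SELECT
  relative_path,
  size
FROM DIRECTORY(@doc_ai_stage)
WHERE size > 50000000
   OR LOWER(SPLIT_PART(relative_path, '.', -1)) NOT IN (
        'docx', 'eml', 'htm', 'html', 'jpeg', 'jpg',
        'pdf', 'png', 'text', 'tif', 'tiff', 'txt'
      );
```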

The Document AI model build was not published

To extract information with Document AI, you must publish the Document AI model build. The exception is a model build that you trained without adding new data values (asking new questions) after the training; that model build doesn’t need to be published.

Error

The error message starts with the following:

Request failed for external function DOCUMENT_EXTRACT_FEATURES$V1 with remote service error: 422

Cause

The Document AI model build was not published.

Solution

Publish the Document AI model build. For more information, see Publish a Document AI model build.

Required privileges are not granted or the model build name is duplicated

To create a Document AI model build, your role must have the required privileges, and the model build name must be unique within the database and schema.

For more information about the required privileges, see Document AI access control.

Error

Unable to create a build on the specified database and schema. Please check the documentation to learn more.

Cause

Possible causes are:

  • The CREATE SNOWFLAKE.ML.DOCUMENT_INTELLIGENCE privilege is not granted to your role.

  • The model build name already exists in the database and schema.

Solution