<model_build_name>!PREDICT

Extracts information from staged documents and returns the answers as a JSON object. If you specify a single document, the method returns results for that document only. Otherwise, the method returns results for each document in the stage.

Syntax

<model_build_name>!PREDICT(<presigned_url>,
                           [ <model_build_version> ]
                          )

Arguments

Required:

presigned_url

Pre-signed URL of the staged documents.

To get the pre-signed URL, call the GET_PRESIGNED_URL function and pass its result as this argument. See GET_PRESIGNED_URL.

For usage, see Example.

Note

Pre-signed URLs generated by the GET_PRESIGNED_URL function expire after 60 minutes by default. For more information about extending the expiration time, see GET_PRESIGNED_URL.

Optional:

model_build_version

Version of the Document AI model build.

If not specified, the latest available model build version is used by default.

Returns

Returns a JSON object with the following fields:

ocrScore

Specifies the confidence score for the optical character recognition (OCR) process. This field is returned inside the __documentMetadata object.

score

Specifies the model's confidence score for the corresponding extracted value.

value

Specifies the extracted answer to the question.

{
  "__documentMetadata": {
    "ocrScore": 0.918
  },
  "invoice_number": [
    {
      "score": 0.925,
      "value": "123/20"
    }
  ],
  "invoice_items": [
    {
      "score": 0.839,
      "value": "NEW CRUSHED VELVET DIVAN BED"
    },
    {
      "score": 0.839,
      "value": "Vintage Radiator"
    },
    {
      "score": 0.839,
      "value": "Solid Wooden Worktop"
    },
    {
      "score": 0.839,
      "value": "Sienna Crushed Velvet Curtains"
    }
  ],
  "tax_amount": [
    {
      "score": 0.879,
      "value": "77.57"
    }
  ],
  "total_amount": [
    {
      "score": 0.809,
      "value": "465.43 GBP"
    }
  ],
  "buyer_name": [
    {
      "score": 0.925
    }
  ],
  "vendor_name": [
    {
      "score": 0.9,
      "value": "UK Exports & Imports Ltd"
    }
  ]
}
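
Once the JSON object reaches client code, the fields described above can be read with any JSON library. The following is a minimal Python sketch, not part of the Document AI API. The raw string is a shortened copy of the sample response, and first_answer is a hypothetical helper name introduced only for illustration.

```python
import json

# Shortened copy of the sample response, as it might arrive in client code.
raw = """
{
  "__documentMetadata": {"ocrScore": 0.918},
  "invoice_number": [{"score": 0.925, "value": "123/20"}],
  "total_amount": [{"score": 0.809, "value": "465.43 GBP"}]
}
"""


def first_answer(result, field):
    """Return (value, score) for the first answer of a field (hypothetical helper)."""
    answers = result.get(field, [])
    if not answers:
        return None, None
    first = answers[0]
    # "value" can be absent when the model finds no answer, so use .get().
    return first.get("value"), first.get("score")


result = json.loads(raw)

# The OCR confidence is nested under the __documentMetadata object.
ocr_score = result["__documentMetadata"]["ocrScore"]
invoice_number, invoice_score = first_answer(result, "invoice_number")

print(ocr_score)       # 0.918
print(invoice_number)  # 123/20
```

Each extracted field is a list of answer objects, so the helper indexes the first element instead of assuming a single object.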

Access control requirements

To extract information with Document AI, you must use an account role that is granted the SNOWFLAKE.DOCUMENT_INTELLIGENCE_CREATOR database role. For more information, see Document AI access control.

Usage notes

  • Ensure you meet the prerequisites for using this method. For more information, see Prerequisites.

  • Document AI limits the number of documents that can be processed in one query. For more information, see Known limitations to Document AI.

  • All documents must be in the same directory of the stage.

  • Document AI uses directory tables. For more information, see Querying directory tables.

  • If the Document AI model does not find an answer in the document, the model does not return a value key. However, it does return the score key, which indicates how confident the model is that the document does not contain the answer. See the buyer_name field as an example.

  • The Document AI model can return lists. See the invoice_items field as an example.
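
The last two notes affect how client code should read a result: a field can hold several answers, and an answer can lack the value key. The following Python sketch shows one way to handle both cases. It assumes the result has already been parsed into a dictionary, and summarize is a hypothetical helper, not a Document AI function.

```python
def summarize(result):
    """Map each field to its list of extracted values.

    An empty list means the model returned no value for that field.
    """
    summary = {}
    for field, answers in result.items():
        if field == "__documentMetadata":
            continue  # document-level metadata, not an extracted field
        # Keep only answers that carry a "value" key.
        summary[field] = [a["value"] for a in answers if "value" in a]
    return summary


# Hand-written sample shaped like the response in the Returns section.
sample = {
    "__documentMetadata": {"ocrScore": 0.918},
    "invoice_items": [
        {"score": 0.839, "value": "Vintage Radiator"},
        {"score": 0.839, "value": "Solid Wooden Worktop"},
    ],
    "buyer_name": [{"score": 0.925}],
}

summary = summarize(sample)
print(summary["invoice_items"])  # ['Vintage Radiator', 'Solid Wooden Worktop']
print(summary["buyer_name"])     # []
```

Checking for the value key, instead of treating a missing key as an error, keeps a document without an answer from breaking the processing of the remaining fields.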

Example

The following example extracts information from all of the documents on the pdf_inspections_stage stage for version 1 of the inspections model build:

SELECT inspections!PREDICT(
  GET_PRESIGNED_URL(@pdf_inspections_stage, RELATIVE_PATH), 1)
  FROM DIRECTORY(@pdf_inspections_stage);

The following example extracts information from the 'paystubs/paystub01.pdf' document on the pdf_paystubs_stage stage for version 1 of the paystubs model build:

SELECT paystubs!PREDICT(
  GET_PRESIGNED_URL(@pdf_paystubs_stage, 'paystubs/paystub01.pdf'), 1);