Categories:

String & binary functions (AI Functions)

AI_EXTRACT

Extracts information from an input string or file.

Syntax

Extract information from an input string:

AI_EXTRACT( <text>, <responseFormat> )

AI_EXTRACT( text => <text>,
            responseFormat => <responseFormat> )

Extract information from a file:

AI_EXTRACT( <file>, <responseFormat> )

AI_EXTRACT( file => <file>,
            responseFormat => <responseFormat> )

Arguments

text

An input string for extraction.

file

A FILE for extraction.

Supported file formats:

  • PDF

  • PNG

  • PPTX

  • EML

  • DOC, DOCX

  • JPEG, JPG

  • HTM, HTML

  • TEXT, TXT

  • TIF, TIFF

The files must be less than 100 MB in size.
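
The format and size limits above can be checked before calling AI_EXTRACT. The following is a minimal sketch, assuming the stage @db.schema.files used in the examples on this page has a directory table enabled, and assuming that 100 MB means 100 * 1024 * 1024 bytes (this page does not say whether the limit is binary or decimal). It lists the staged files whose extension and size fall within the documented limits:

```sql
-- Sketch: list staged files that satisfy the documented format and size limits.
-- Assumptions: @db.schema.files has a directory table enabled; 100 MB is
-- taken to mean 100 * 1024 * 1024 bytes.
SELECT relative_path, size
FROM DIRECTORY(@db.schema.files)
WHERE size < 100 * 1024 * 1024
  -- SPLIT_PART with a negative part number counts from the end of the string,
  -- so this takes the text after the last dot as the file extension.
  AND LOWER(SPLIT_PART(relative_path, '.', -1)) IN (
    'pdf', 'png', 'pptx', 'eml', 'doc', 'docx', 'jpeg', 'jpg',
    'htm', 'html', 'text', 'txt', 'tif', 'tiff'
  );
```

This check looks only at the file name and size. It does not confirm that the file content matches its extension.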

responseFormat

Information to be extracted in one of the following response formats:

  • A simple object schema that maps each feature name to the information to be extracted, for example:

    {'name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'}
    
  • An array of strings that contain the information to be extracted, for example:

    ['What is the last name of the employee?', 'What is the address of the employee?']
    
  • An array of arrays that contain two strings (feature name and the information to be extracted), for example:

    [['name', 'What is the last name of the employee?'], ['address', 'What is the address of the employee?']]
    
  • An array of strings that contain the feature name and the information to be extracted, separated by a colon (“:”), for example:

    ['name: What is the last name of the employee?', 'address: What is the address of the employee?']
    

Note

You can either ask questions in natural language or describe information to be extracted (such as city, street, ZIP code), for example:

{'address': 'City, street, ZIP', 'name': 'First and last name'}
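
As an illustration of this note, the following sketch reuses the sample sentence from the Examples section and describes the fields to extract instead of asking questions. The feature names and descriptions are illustrative only, and no output is shown because it depends on the model:

```sql
-- Sketch: describe the information to extract instead of asking questions.
SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => {'name': 'First and last name', 'city': 'City'}
);
```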

Returns

A JSON object containing the extracted information.

Access control requirements

Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. For information about granting this privilege, see Required privileges.
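
A minimal sketch of such a grant, assuming a hypothetical account role named analyst and a session role that is allowed to grant the database role:

```sql
-- Sketch: allow a role to call Cortex AI functions such as AI_EXTRACT.
-- "analyst" is a hypothetical role name; substitute your own.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst;
```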

Usage notes

  • The following languages are supported:

    • Arabic

    • Bengali

    • Burmese

    • Cebuano

    • Chinese

    • Czech

    • Dutch

    • English

    • French

    • German

    • Hebrew

    • Hindi

    • Indonesian

    • Italian

    • Japanese

    • Khmer

    • Korean

    • Lao

    • Malay

    • Persian

    • Polish

    • Portuguese

    • Russian

    • Spanish

    • Tagalog

    • Thai

    • Turkish

    • Urdu

    • Vietnamese

  • The documents must be no more than 125 pages long.

  • The maximum number of features that can be extracted is 100.

  • The maximum output length is 512 tokens per question.

  • To extract a list, add List: at the beginning of each question. For example:

    [['languages', 'List: What languages are supported for AI_EXTRACT?']]
    
  • You can’t use a colon (“:”) inside a feature name when using the response format that separates the feature name from the information to be extracted with a colon. For example, the second entry below is invalid because its feature name (name:employee) contains a colon:

    ['location: Where does the employee live?', 'name:employee: What is the first name of the employee?']
    
  • Confidence scores are not supported.
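
The List: prefix from the notes above can be tried with the array-of-arrays response format, which keeps each feature name and its question in separate strings, so no colon separator is involved. The following is a sketch only; the input text and feature names are illustrative:

```sql
-- Sketch: extract a list and a single value in one call.
-- The array-of-arrays format keeps each feature name separate from its question.
SELECT AI_EXTRACT(
  text => 'AI_EXTRACT supports English, French, and German. The maximum number of features is 100.',
  responseFormat => [
    ['languages', 'List: What languages are supported for AI_EXTRACT?'],
    ['max_features', 'What is the maximum number of features?']
  ]
);
```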

Examples

The following example extracts information from the input text:

SELECT AI_EXTRACT(
 text => 'John Smith lives in San Francisco and works for Snowflake',
 responseFormat => {'name': 'What is the first name of the employee?', 'city': 'In which city does the employee live?'}
);

The following example extracts information from the input text, passing the response format as a JSON string that is parsed with PARSE_JSON:

SELECT AI_EXTRACT(
 text => 'John Smith lives in San Francisco and works for Snowflake',
 responseFormat => PARSE_JSON('{"name": "What is the first name of the employee?", "address": "What is the address of the employee?"}')
);

The following example extracts information from the document.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files','document.pdf'),
  responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']]
);

The following example extracts information from all files on a stage:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', relative_path),
  responseFormat => [
    'What is this document?',
    'How would you classify this document?'
  ]
) FROM DIRECTORY (@db.schema.files);
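
When extracting from many files, it usually helps to keep the file path next to each result. The following is a minimal variation of the previous query, using the same stage and the RELATIVE_PATH column of its directory table. The feature name doc_type and the alias extracted are arbitrary names chosen for illustration:

```sql
-- Sketch: return each file path alongside its extraction result.
SELECT
  relative_path,
  AI_EXTRACT(
    file => TO_FILE('@db.schema.files', relative_path),
    responseFormat => [['doc_type', 'What type of document is this?']]
  ) AS extracted
FROM DIRECTORY(@db.schema.files);
```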