Categories: String & binary functions (AI Functions)
AI_EXTRACT
Extracts information from an input string or file.
Syntax
Extract information from an input string:
AI_EXTRACT( <text>, <responseFormat> )
AI_EXTRACT( text => <text>,
            responseFormat => <responseFormat> )
Extract information from a file:
AI_EXTRACT( <file>, <responseFormat> )
AI_EXTRACT( file => <file>,
            responseFormat => <responseFormat> )
Arguments
text
An input string for extraction.
file
A FILE for extraction.
Supported file formats:
PDF
PNG
PPTX
EML
DOC, DOCX
JPEG, JPG
HTM, HTML
TEXT, TXT
TIF, TIFF
The files must be less than 100 MB in size.
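As a rough illustration of the constraints above, the following Python sketch checks a file's extension and size before it is uploaded to a stage. The helper name is_supported_file is hypothetical, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption; neither is part of the AI_EXTRACT API.

```python
import os
from typing import Optional

# Extensions taken from the supported-format list above (lowercased).
SUPPORTED_EXTENSIONS = {
    "pdf", "png", "pptx", "eml", "doc", "docx", "jpeg", "jpg",
    "htm", "html", "text", "txt", "tif", "tiff",
}

# Assumption: "100 MB" is interpreted here as 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def is_supported_file(path: str, size_bytes: Optional[int] = None) -> bool:
    """Return True if the extension is supported and the size is under the limit.

    Hypothetical client-side helper. If size_bytes is not given, the size
    is read from the local filesystem.
    """
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return False
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    return size_bytes < MAX_FILE_BYTES
```

A check like this only catches obvious problems early; the service remains the authority on what it accepts.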
responseFormat
Information to be extracted in one of the following response formats:
A simple object schema that maps each feature name to the information to be extracted, for example:
{'name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'}
An array of strings that contain the information to be extracted, for example:
['What is the last name of the employee?', 'What is the address of the employee?']
An array of arrays that contain two strings (feature name and the information to be extracted), for example:
[['name', 'What is the last name of the employee?'], ['address', 'What is the address of the employee?']]
An array of strings that contain the feature name and the information to be extracted, separated with a colon (":"), for example:
['name: What is the last name of the employee?', 'address: What is the address of the employee?']
Note
You can either ask questions in natural language or describe information to be extracted (such as city, street, ZIP code), for example:
{'address': 'City, street, ZIP', 'name': 'First and last name'}
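To make the four response formats easier to compare, the following Python sketch builds each one from the same list of (feature name, information to extract) pairs. The function names and the FEATURES list are hypothetical and exist only for illustration; the sketch shows the data shapes and does not call AI_EXTRACT.

```python
import json

# Hypothetical feature list: (feature name, information to extract) pairs.
FEATURES = [
    ("name", "What is the last name of the employee?"),
    ("address", "What is the address of the employee?"),
]


def as_object(features):
    """Simple object schema: feature name -> information to extract."""
    return {name: prompt for name, prompt in features}


def as_prompts(features):
    """Array of strings: only the information to extract, no feature names."""
    return [prompt for _, prompt in features]


def as_pairs(features):
    """Array of two-element arrays: [feature name, information to extract]."""
    return [[name, prompt] for name, prompt in features]


def as_colon_strings(features):
    """Array of strings of the form 'feature name: information to extract'."""
    return [f"{name}: {prompt}" for name, prompt in features]


# JSON text for the object schema, in the form that PARSE_JSON accepts.
object_json = json.dumps(as_object(FEATURES))
```

Keeping the features in one list and deriving the format from it makes it straightforward to switch formats without retyping the questions.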
Returns
A JSON object containing the extracted information.
Access control requirements
Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. For information about granting this privilege, see Required privileges.
Usage notes
The following languages are supported:
Arabic
Bengali
Burmese
Cebuano
Chinese
Czech
Dutch
English
French
German
Hebrew
Hindi
Indonesian
Italian
Japanese
Khmer
Korean
Lao
Malay
Persian
Polish
Portuguese
Russian
Spanish
Tagalog
Thai
Turkish
Urdu
Vietnamese
The documents must be no more than 125 pages long.
The maximum number of features that can be extracted is 100.
The maximum output length is 512 tokens per question.
To extract a list, add List: at the beginning of each question, for example:
[['languages', 'List: What languages are supported for AI_EXTRACT?']]
You can't use a colon (":") inside a feature name when using the response format that separates the feature name and the information to be extracted with a colon. For example, the feature name name:employee in the following response format is invalid:
['location: Where does the employee live?', 'name:employee: What is the first name of the employee?']
Confidence scores are not supported.
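Some of the limits above can be checked before a query is sent. The following Python sketch validates the feature count and the colon restriction, and adds the List: prefix to a question. The names validate_features and list_prompt are hypothetical helpers, not Snowflake functions, and the checks cover only the rules stated in these usage notes.

```python
MAX_FEATURES = 100  # maximum number of features, from the usage notes


def list_prompt(question):
    """Prefix a question with 'List: ' to request a list of values."""
    return f"List: {question}"


def validate_features(features, colon_separated=False):
    """Raise ValueError if the (feature name, question) pairs break a stated limit.

    Set colon_separated=True when the features will be sent in the
    'feature name: question' string format, where names cannot contain a colon.
    """
    if len(features) > MAX_FEATURES:
        raise ValueError(
            f"{len(features)} features exceed the limit of {MAX_FEATURES}"
        )
    if colon_separated:
        for name, _ in features:
            if ":" in name:
                raise ValueError(
                    f"feature name {name!r} must not contain a colon"
                )


# Array-of-arrays response format asking for a list, as in the note above.
languages_format = [
    ["languages", list_prompt("What languages are supported for AI_EXTRACT?")]
]
```

The page-count and output-token limits are not covered here because they depend on the document and the model output rather than on the response format.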
Examples
The following example extracts information from the input text:
SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => {'name': 'What is the first name of the employee?', 'city': 'Where does the employee live?'}
);
The following example extracts information from the input text, passing the response format as a JSON string parsed with PARSE_JSON:
SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => PARSE_JSON('{"name": "What is the first name of the employee?", "address": "What is the address of the employee?"}')
);
The following example extracts information from the document.pdf file:
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'document.pdf'),
  responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']]
);
The following example extracts information from all files on a stage:
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', relative_path),
  responseFormat => [
    'What is this document?',
    'How would you classify this document?'
  ]
) FROM DIRECTORY(@db.schema.files);
Legal notices
Refer to Snowflake AI and ML for legal notices.