- Categories:
- String & binary functions (AI Functions) 
AI_EXTRACT¶
Extracts information from an input string or file.
Syntax¶
Extract information from an input string:
AI_EXTRACT( <text>, <responseFormat> )
AI_EXTRACT( text => <text>,
            responseFormat => <responseFormat> )
Extract information from a file:
AI_EXTRACT( <file>, <responseFormat> )
AI_EXTRACT( file => <file>,
            responseFormat => <responseFormat> )
Arguments¶
- text
- An input string for extraction. 
- file
- A FILE for extraction.
  - Supported file formats:
    - PDF
    - PNG
    - PPTX, PPT
    - EML
    - DOC, DOCX
    - JPEG, JPG
    - HTM, HTML
    - TEXT, TXT
    - TIF, TIFF
    - BMP, GIF, WEBP
    - MD
  - The files must be less than 100 MB in size.
- responseFormat
- Information to be extracted in one of the following response formats:
  - Simple object schema that maps the label to the information to be extracted; for example:
    {'name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'}
  - An array of strings that contain the information to be extracted; for example:
    ['What is the last name of the employee?', 'What is the address of the employee?']
  - An array of arrays that contain two strings (label and the information to be extracted); for example:
    [['name', 'What is the last name of the employee?'], ['address', 'What is the address of the employee?']]
  - A JSON schema that defines the structure of the extracted information. Supports entity and table extraction. For example:
    {
      'schema': {
        'type': 'object',
        'properties': {
          'income_table': {
            'description': 'Income for FY2026Q2',
            'type': 'object',
            'properties': {
              'month': { 'description': 'Month', 'type': 'array' },
              'income': { 'description': 'Income', 'type': 'array' }
            }
          },
          'title': { 'description': 'What is the title of the document?', 'type': 'string' },
          'employees': { 'description': 'What are the names of employees?', 'type': 'array' }
        }
      }
    }
  Note
  - You can’t combine the JSON schema format with other response formats. If responseFormat contains the schema key, you must define all questions within the JSON schema. Additional keys are not supported.
  - The model only accepts certain shapes of JSON schema. The top-level type must always be an object, which contains independently extracted sub-objects. A sub-object may be a table (an object of lists of strings representing columns), a list of strings, or a string.
    - String is currently the only supported scalar type.
  - The description field is optional. Use the description field to provide context to the model; for example, to help the model locate the right table in a document.
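The file constraints listed under the file argument can be checked before calling AI_EXTRACT. The following is a minimal sketch, not an official recipe: it assumes the stage @db.schema.files used elsewhere on this page has a directory table enabled, that "100 MB" means 100 * 1024 * 1024 bytes, and that the file type can be inferred from the file name extension.

```sql
-- Sketch: list staged files that appear to satisfy the documented file constraints.
-- Assumptions: directory table enabled on @db.schema.files; 1 MB = 1024 * 1024 bytes;
-- the extension after the last '.' reflects the actual file format.
SELECT
  relative_path,
  size
FROM DIRECTORY(@db.schema.files)
WHERE size < 100 * 1024 * 1024
  AND LOWER(SPLIT_PART(relative_path, '.', -1)) IN (
    'pdf', 'png', 'pptx', 'ppt', 'eml', 'doc', 'docx', 'jpeg', 'jpg',
    'htm', 'html', 'text', 'txt', 'tif', 'tiff', 'bmp', 'gif', 'webp', 'md'
  );
```

Files filtered out by this query would need to be converted or split before extraction.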
 
 
Returns¶
A JSON object containing the extracted information.
Example of an output that includes array, table, and single value extraction:
{
  "error": null,
  "response": {
    "employees": [
      "Smith",
      "Johnson",
      "Doe"
    ],
    "income_table": {
      "income": ["$120 678","$130 123","$150 998"],
      "month": ["February", "March", "April"]
    },
    "title": "Financial report"
  }
}
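Individual values can be pulled out of a result shaped like the output above with Snowflake's semi-structured path notation. The sketch below does not call AI_EXTRACT; it parses a copy of the sample output with PARSE_JSON so that the path expressions can be tried in isolation. It assumes the real return value supports the same path traversal.

```sql
-- Sketch: navigate a result shaped like the sample output.
-- The literal below is a copy of the documented sample, not live model output.
WITH sample AS (
  SELECT PARSE_JSON('{
    "error": null,
    "response": {
      "employees": ["Smith", "Johnson", "Doe"],
      "income_table": {
        "income": ["$120 678", "$130 123", "$150 998"],
        "month": ["February", "March", "April"]
      },
      "title": "Financial report"
    }
  }') AS result
)
SELECT
  result:response:title::STRING        AS title,           -- single value
  result:response:employees[0]::STRING AS first_employee,  -- first array element
  ARRAY_SIZE(result:response:employees) AS employee_count  -- number of array elements
FROM sample;
```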
Access control requirements¶
Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. For information about granting this privilege, see Cortex LLM privileges.
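A grant along the following lines is one way to satisfy this requirement. The role name analyst_role and the user name some_user are hypothetical placeholders, and the sketch assumes the statements are run by a role that is allowed to grant the SNOWFLAKE.CORTEX_USER database role (ACCOUNTADMIN is used here for illustration).

```sql
-- Sketch: give a custom role access to Cortex AI functions.
-- analyst_role and some_user are hypothetical names; replace them with your own.
USE ROLE ACCOUNTADMIN;

GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst_role;
GRANT ROLE analyst_role TO USER some_user;
```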
Usage notes¶
- You can’t use both the text and file parameters in the same function call.
- You can either ask questions in natural language or describe the information to be extracted (such as city, street, ZIP code); for example:
  {'address': 'City, street, ZIP', 'name': 'First and last name'}
- The following languages are supported:
- Arabic
- Bengali 
- Burmese 
- Cebuano 
- Chinese 
- Czech 
- Dutch 
- English 
- French 
- German 
- Hebrew 
- Hindi 
- Indonesian 
- Italian 
- Japanese 
- Khmer 
- Korean 
- Lao 
- Malay 
- Persian 
- Polish 
- Portuguese 
- Russian 
- Spanish 
- Tagalog 
- Thai 
- Turkish 
- Urdu 
- Vietnamese 
 
- The documents must be no more than 125 pages long. 
- In a single AI_EXTRACT call, you can ask a maximum of 100 questions for entity extraction, and a maximum of 10 questions for table extraction.
  - A table extraction question counts as 10 entity extraction questions. For example, you can ask 4 table extraction questions and 60 entity extraction questions in a single AI_EXTRACT call.
- The maximum output length for entity extraction is 512 tokens per question. For table extraction, the model returns answers that are a maximum of 4096 tokens. 
- Client-side encrypted stages are not supported. 
- Confidence scores are not supported. 
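The question limits in the usage notes can be read as a shared budget of 100 entity-question equivalents, with each table question costing 10. That reading is an interpretation of the 4-table, 60-entity example above, not a documented formula. Under that assumption, the arithmetic can be sketched as follows, with the two counts as hypothetical inputs:

```sql
-- Sketch: check a planned mix of questions against the documented limits.
-- Assumption: table and entity questions share one budget of 100 entity equivalents.
WITH request AS (
  SELECT 4 AS table_questions, 60 AS entity_questions  -- hypothetical request mix
)
SELECT
  table_questions * 10 + entity_questions AS entity_equivalents,
  table_questions <= 10
    AND table_questions * 10 + entity_questions <= 100 AS within_limits
FROM request;
```

For the mix shown, entity_equivalents is 100, so the request sits exactly at the limit.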
Examples¶
Extraction from an input string¶
- The following example extracts information from the input text:
  SELECT AI_EXTRACT(
    text => 'John Smith lives in San Francisco and works for Snowflake',
    responseFormat => {'name': 'What is the first name of the employee?', 'city': 'What is the address of the employee?'}
  );
- The following example extracts and parses information from the input text:
  SELECT AI_EXTRACT(
    text => 'John Smith lives in San Francisco and works for Snowflake',
    responseFormat => PARSE_JSON('{"name": "What is the first name of the employee?", "address": "What is the address of the employee?"}')
  );
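The same call pattern can plausibly be applied to a column of strings rather than a literal. The sketch below is illustrative only: the employee_notes table and its second row are made-up sample data, and it assumes that AI_EXTRACT accepts a column expression for text and that its result supports path notation.

```sql
-- Sketch: run AI_EXTRACT over each row of a (hypothetical) table of text notes.
CREATE OR REPLACE TEMPORARY TABLE employee_notes AS
SELECT column1 AS note
FROM VALUES
  ('John Smith lives in San Francisco and works for Snowflake'),
  ('Jane Doe lives in Berlin and works for Snowflake');  -- made-up sample row

WITH extracted AS (
  SELECT
    note,
    AI_EXTRACT(
      text => note,
      responseFormat => {'name': 'What is the first name of the employee?',
                         'city': 'Where does the employee live?'}
    ) AS result
  FROM employee_notes
)
SELECT
  note,
  result:response:name::STRING AS name,
  result:response:city::STRING AS city,
  result:error                 AS error  -- null in the documented sample output
FROM extracted;
```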
Extraction from a file¶
- The following example extracts information from the document.pdf file:
  SELECT AI_EXTRACT(
    file => TO_FILE('@db.schema.files', 'document.pdf'),
    responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']]
  );
- The following example extracts information from all files in a directory on a stage:
  Note: Ensure that the directory table is enabled. For more information, see Managing directory tables.
  SELECT AI_EXTRACT(
    file => TO_FILE('@db.schema.files', relative_path),
    responseFormat => [
      'What is this document?',
      'How would you classify this document?'
    ]
  ) FROM DIRECTORY (@db.schema.files);
- The following example extracts the title value from the report.pdf file:
  SELECT AI_EXTRACT(
    file => TO_FILE('@db.schema.files', 'report.pdf'),
    responseFormat => {
      'schema': {
        'type': 'object',
        'properties': {
          'title': {
            'description': 'What is the title of document?',
            'type': 'string'
          }
        }
      }
    }
  );
- The following example extracts the employees array from the report.pdf file:
  SELECT AI_EXTRACT(
    file => TO_FILE('@db.schema.files', 'report.pdf'),
    responseFormat => {
      'schema': {
        'type': 'object',
        'properties': {
          'employees': {
            'description': 'What are the surnames of employees?',
            'type': 'array'
          }
        }
      }
    }
  );
- The following example extracts the income_table table from the report.pdf file:
  SELECT AI_EXTRACT(
    file => TO_FILE('@db.schema.files', 'report.pdf'),
    responseFormat => {
      'schema': {
        'type': 'object',
        'properties': {
          'income_table': {
            'description': 'Income for FY2026Q2',
            'type': 'object',
            'properties': {
              'month': { 'type': 'array' },
              'income': { 'type': 'array' }
            }
          }
        }
      }
    }
  );
- The following example extracts a table (income_table), a single value (title), and an array (employees) from the report.pdf file:
  SELECT AI_EXTRACT(
    file => TO_FILE('@db.schema.files', 'report.pdf'),
    responseFormat => {
      'schema': {
        'type': 'object',
        'properties': {
          'income_table': {
            'description': 'Income for FY2026Q2',
            'type': 'object',
            'properties': {
              'month': { 'type': 'array' },
              'income': { 'type': 'array' }
            }
          },
          'title': {
            'description': 'What is the title of document?',
            'type': 'string'
          },
          'employees': {
            'description': 'What are the surnames of employees?',
            'type': 'array'
          }
        }
      }
    }
  );
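Table extraction returns one array per column (see the sample output under Returns), so a common follow-up is to turn those parallel arrays into rows. The following sketch does this with LATERAL FLATTEN on a parsed copy of the sample income_table. It assumes the column arrays have equal length and aligned positions, which the documentation does not explicitly guarantee.

```sql
-- Sketch: convert column-oriented table output into one row per table line.
-- The literal is a copy of the documented sample output, not live model output.
WITH sample AS (
  SELECT PARSE_JSON('{
    "response": {
      "income_table": {
        "income": ["$120 678", "$130 123", "$150 998"],
        "month": ["February", "March", "April"]
      }
    }
  }') AS result
)
SELECT
  m.index                                                     AS row_idx,
  m.value::STRING                                             AS month_name,
  GET(result:response:income_table:income, m.index)::STRING   AS income_amount
FROM sample,
  LATERAL FLATTEN(input => result:response:income_table:month) m
ORDER BY m.index;
```

Flattening one column and looking up the others by position keeps the query to a single FLATTEN, at the cost of relying on the positional alignment assumed above.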
Regional availability¶
Legal notices¶
Refer to Snowflake AI and ML for legal notices.