Fine-tuning arctic-extract models¶
You can now fine-tune arctic-extract models using the Snowflake Cortex Fine-tuning
function and Snowflake Datasets. The fine-tuned model can then be used for inference with the
AI_EXTRACT function.
Syntax¶
For specific syntax, usage notes, and examples, see:
FINETUNE ('CREATE') (SNOWFLAKE.CORTEX)¶
Creates a fine-tuning job.
Syntax¶
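A sketch of the call shape, inferred from the parameters listed below and the generic FINETUNE signature. The model-name and base-model arguments are assumptions not documented on this page; confirm them in the FINETUNE reference:

```sql
SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',
    '<model_name>',            -- assumed: name for the fine-tuned model
    'arctic-extract',          -- assumed: base model
    '<training_dataset>'
    [ , '<validation_dataset>' ]
)
```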
Required parameters¶
- `'CREATE'`: Specifies that you want to create a fine-tuning job.
- `'training_dataset'`: Dataset object to use for training. For more information, see Dataset requirements.
Optional parameters¶
- `'validation_dataset'`: Dataset object to use for validation. For more information, see Dataset requirements.
Note
The options parameter is not supported for fine-tuning arctic-extract models. The number of epochs is automatically
determined by the system.
Access control requirements¶
| Privilege | Object | Notes |
|---|---|---|
| USAGE or OWNERSHIP | DATABASE | The database that the Dataset object is stored in. |
| USAGE or OWNERSHIP | SCHEMA | The schema that the Dataset object is stored in. |
| READ or OWNERSHIP | STAGE | The stage that the document files are stored in. |
| USAGE or OWNERSHIP | SCHEMA | The schema that the fine-tuned model is stored in. |
| CREATE MODEL | SCHEMA | The schema that the fine-tuned model is stored in. |
Additionally, to use the FINETUNE function, the ACCOUNTADMIN role must grant the SNOWFLAKE.CORTEX_USER database role to the user who will call the function. For details, see the LLM Functions required privileges topic.
Example¶
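A hedged example, assuming the call follows the generic FINETUNE signature with Dataset names as the data arguments; all object names are illustrative:

```sql
SELECT SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',
    'mydb.myschema.my_extract_model',   -- assumed: name for the fine-tuned model
    'arctic-extract',                   -- assumed: base model
    'mydb.myschema.training_dataset',   -- Dataset object with training data
    'mydb.myschema.validation_dataset'  -- optional Dataset object for validation
);
```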
FINETUNE ('DESCRIBE') (SNOWFLAKE.CORTEX)¶
Describes the properties of a fine-tuning job.
For syntax and parameters, see FINETUNE ('DESCRIBE') (SNOWFLAKE.CORTEX).
For a successful job that fine-tunes an arctic-extract model, the output reports the job's properties and a success status.
Dataset requirements¶
The Dataset used for training and validation must contain the following columns:
- File: A string containing the file path to the document for extraction. For example: `@db.schema.stage/file.pdf`
- Prompt: A JSON value that specifies key and question pairs for extraction, in one of the formats supported by the `responseFormat` argument of the AI_EXTRACT function. For more information, see AI_EXTRACT.
- Response: A JSON object containing key and response pairs.
Note
Column names are case-insensitive and can be in any order in the Dataset; however, all required columns
(File, Prompt, and Response) must be present for the Dataset to be valid. Additional columns in the Dataset are ignored.
When preparing the Dataset, note the following:
- The schema of the fine-tuned model is the unique set of all questions in the Dataset.
- The answers in the `Response` column should match the questions in the `Prompt` column by matching keys between the `Prompt` and `Response` columns.
- You don't have to specify the same set of questions for every document.
- To improve model accuracy, add a prompt and response row for each question, even if the model's default response is correct. This action confirms that the default answer is accurate.
For more information about Datasets, see Snowflake Datasets.
Example Dataset¶
| File | Prompt | Response |
|---|---|---|
Note
When you create the Dataset, set the response to None if the document does not contain an answer to the question.
Usage notes¶
Snowflake recommends using at least 20 documents for fine-tuning.
Supported file formats for documents are:
- PDF
- PNG
- JPG, JPEG
- TIFF, TIF

The maximum number of pages per document is:
- 64 pages for AWS US West 2 (Oregon) and AWS Europe Central 1 (Frankfurt)
- 125 pages for AWS US East 1 (N. Virginia) and Azure East US 2 (Virginia)
The maximum number of unique document files in the Dataset is 1,000. You can reference the same document file multiple times.
A limit exists on how many questions and documents can be in a fine-tuning job: the number of questions multiplied by the total number of pages across all document files in the Dataset must be less than or equal to 50,000.
For example, some valid combinations are:
| Number of questions | Number of pages | Number of document file references [1] |
|---|---|---|
| 10 | 1 | 5,000 |
| 100 | 1 | 500 |
| 10 | 10 | 500 |
| 25 | 10 | 200 |
Create a fine-tuning job¶
To create a fine-tuning job, you must create a Dataset object that contains the training data. The following example shows how to
create a Dataset object and use the Dataset to create a fine-tuning job for an arctic-extract model.
Create the table which will contain the training data:
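A sketch of this step; the table and column names are illustrative, and the `FILE` column type is an assumption chosen so the stage path can later be derived with FL_GET_STAGE and FL_GET_RELATIVE_PATH:

```sql
CREATE OR REPLACE TABLE extraction_training_data (
    file FILE,        -- document to extract from
    prompt VARIANT,   -- key and question pairs
    response VARIANT  -- key and answer pairs
);
```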
Populate the table with the training data:
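An illustrative insert; the stage, file name, keys, and questions are examples only. Per the note above, a response value can be null when the document does not answer a question:

```sql
INSERT INTO extraction_training_data
SELECT
    TO_FILE('@mydb.myschema.mystage', 'invoice_001.pdf'),
    PARSE_JSON('{"invoice_number": "What is the invoice number?",
                 "po_number": "What is the purchase order number?"}'),
    PARSE_JSON('{"invoice_number": "INV-001", "po_number": null}');
```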
Create the Dataset object:
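A minimal sketch, assuming the SQL `CREATE DATASET` command for Snowflake Datasets; the name is illustrative:

```sql
CREATE DATASET mydb.myschema.extraction_dataset;
```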
Create a new version of the Dataset that adds the training data, using the FL_GET_STAGE and the FL_GET_RELATIVE_PATH functions to get the file paths:
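A sketch of this step, assuming `ALTER DATASET ... ADD VERSION` and that concatenating the FL_GET_STAGE and FL_GET_RELATIVE_PATH results yields the `@db.schema.stage/file.pdf` form required by the File column:

```sql
ALTER DATASET mydb.myschema.extraction_dataset ADD VERSION 'v1'
FROM (
    SELECT
        FL_GET_STAGE(file) || '/' || FL_GET_RELATIVE_PATH(file) AS file,
        prompt,
        response
    FROM extraction_training_data
);
```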
Create a fine-tuning job:
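A hedged sketch: only `'CREATE'` and the training Dataset argument are described on this page, so the model-name and base-model arguments are assumptions based on the generic FINETUNE signature:

```sql
SELECT SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',
    'mydb.myschema.my_extract_model',   -- assumed: name for the fine-tuned model
    'arctic-extract',                   -- assumed: base model
    'mydb.myschema.extraction_dataset'  -- training Dataset created above
);
```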
Use your fine-tuned arctic-extract model for inference¶
To use the fine-tuned arctic-extract model for inference, ensure you have the following privileges on the model object:
- OWNERSHIP
- USAGE
- READ
To use the fine-tuned arctic-extract model for inference with the AI_EXTRACT function,
specify the model using the model parameter as shown in the following example:
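An illustrative call; the stage, file, and model names are examples, and the exact parameter names should be confirmed in the AI_EXTRACT reference. Without a `responseFormat` argument, the fine-tuned model's own question schema applies:

```sql
SELECT AI_EXTRACT(
    file => TO_FILE('@mydb.myschema.mystage', 'invoice_002.pdf'),
    model => 'mydb.myschema.my_extract_model'
);
```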
You can override the questions used for fine-tuning by using the responseFormat parameter, as shown in the following example:
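An illustrative call that supplies its own questions in place of the fine-tuned schema; names and questions are examples, and the object-literal form is one of the formats the `responseFormat` argument accepts:

```sql
SELECT AI_EXTRACT(
    file => TO_FILE('@mydb.myschema.mystage', 'invoice_002.pdf'),
    model => 'mydb.myschema.my_extract_model',
    responseFormat => {'invoice_number': 'What is the invoice number?'}
);
```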
For more information, see AI_EXTRACT.
Tip
You can copy your fine-tuned arctic-extract model between databases and/or schemas within an account or between accounts.
For more information, see Copy arctic-extract models between databases, schemas, and accounts.