Cortex COMPLETE Structured Outputs

COMPLETE can extract data from a text document and respond with a JSON object that conforms to a JSON schema you supply. This reduces the need for post-processing in your AI data pipelines and enables seamless integration with systems that require deterministic responses. COMPLETE verifies each generated token against your JSON schema to ensure that the response conforms to the schema.

Every model supported by COMPLETE supports structured output, but the most powerful models typically generate higher quality responses.

Using COMPLETE Structured Outputs

To obtain a response in a structured format, specify a JSON schema as the response_format argument. The supplied JSON schema object defines the structure, data types, and constraints that the generated text must conform to, including required fields. You don’t need to specify any details of the output format, or even instruct the model to “respond in JSON.”

Specify the schema as a sub-object within the options argument, as shown here, not as a string. Because the argument is a SQL object, strings use single quotes rather than the double quotation marks used in JSON. Responses are string representations of a JSON object.

options: {
    ...
    response_format: {
        'type': 'json',
        'schema': {
            'type': 'object',
            'properties': {
                'property_name': {
                    'type': 'string'
                },
                ...
            },
            'required': ['property_name', ...]
        }
    }
}
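If the schema is maintained in application code as a dictionary, it has to be rendered with single-quoted strings before it can be pasted into the options argument. The following is a minimal sketch of that conversion in Python. The helper name to_sql_literal is hypothetical, and the escaping rules (doubling single quotes, doubling backslashes) are an assumption about SQL string constants rather than something this page specifies.

```python
def to_sql_literal(value):
    """Render a Python value as a single-quoted SQL object or array literal."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: escape backslashes first, then double any single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "''")
        return "'" + escaped + "'"
    if isinstance(value, list):
        return "[" + ", ".join(to_sql_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = (
            to_sql_literal(str(k)) + ": " + to_sql_literal(v)
            for k, v in value.items()
        )
        return "{" + ", ".join(pairs) + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


schema = {
    "type": "object",
    "properties": {"property_name": {"type": "string"}},
    "required": ["property_name"],
}
response_format_sql = to_sql_literal({"type": "json", "schema": schema})
print(response_format_sql)
```

The printed text can be placed as the value of response_format inside the options object.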

Example

The following example demonstrates how to use the response_format argument to specify a JSON schema for the response.

SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large2', [
        {
            'role': 'user',
            'content': 'Return the customer sentiment for the following review: New kid on the block, this pizza joint! The pie arrived neither in a flash nor a snail\'s pace, but the taste? Divine! Like a symphony of Italian flavors, it was a party in my mouth. But alas, the party was a tad pricey for my humble abode\'s standards. A mixed bag, I\'d say!'
        }
    ],
    {
        'temperature': 0,
        'max_tokens': 1000,
        'response_format': {
            'type': 'json',
            'schema': {
                'type': 'object',
                'properties': {
                    'sentiment_categories': {
                        'type': 'array',
                        'items': {
                            'type': 'object',
                            'properties': {
                                'food_quality': {'type': 'string', 'maxLength': 10},
                                'food_taste': {'type': 'string', 'maxLength': 10},
                                'wait_time': {'type': 'string', 'maxLength': 10},
                                'food_cost': {'type': 'string', 'maxLength': 10}
                            },
                            'required': ['food_quality', 'food_taste', 'wait_time', 'food_cost']
                        }
                    }
                }
            }
        }
    }
);

Response:

{
    "created": 1738683744,
    "model": "mistral-large2",
    "structured_output": [
        {
        "raw_message": {
            "sentiment_categories": [
            {
                "food_cost": "negative",
                "food_quality": "positive",
                "food_taste": "positive",
                "wait_time": "neutral"
            }
            ]
        },
        "type": "json"
        }
    ],
    "usage": {
        "completion_tokens": 46,
        "prompt_tokens": 97,
        "total_tokens": 143
    }
}
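
Because the response is a string representation of a JSON object, a client has to parse it before it can use the structured fields. The following Python sketch parses the response shown above and reads the object under structured_output. The function name extract_structured_output is hypothetical, and the sketch assumes the response always has the shape shown in this example.

```python
import json

# The response text from the example above, as returned by the query.
response_text = """
{
    "created": 1738683744,
    "model": "mistral-large2",
    "structured_output": [
        {
            "raw_message": {
                "sentiment_categories": [
                    {
                        "food_cost": "negative",
                        "food_quality": "positive",
                        "food_taste": "positive",
                        "wait_time": "neutral"
                    }
                ]
            },
            "type": "json"
        }
    ],
    "usage": {"completion_tokens": 46, "prompt_tokens": 97, "total_tokens": 143}
}
"""


def extract_structured_output(text):
    """Parse the response string and return the first raw_message object."""
    payload = json.loads(text)
    return payload["structured_output"][0]["raw_message"]


result = extract_structured_output(response_text)
sentiments = result["sentiment_categories"][0]
for category, sentiment in sorted(sentiments.items()):
    print(category, sentiment)
```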

See also this example that uses the Cortex LLM REST API.

Create a JSON schema definition

To get the best performance from COMPLETE Structured Outputs, follow these guidelines:

  • Use the “required” field in the schema to specify required fields. COMPLETE will raise an error if a required field cannot be extracted.

    In the following example, the schema directs COMPLETE to find people mentioned in the document. The people field is marked as required to make sure people are identified.

    {
        'type': 'object',
        'properties': {
            'dataset_name': {
                'type': 'string'
            },
            'created_at': {
                'type': 'string'
            },
            'people': {
                'type': 'array',
                'items': {
                    'type': 'object',
                    'properties': {
                        'name': {
                            'type': 'string'
                        },
                        'age': {
                            'type': 'number'
                        },
                        'isAdult': {
                            'type': 'boolean'
                        }
                    }
                }
            }
        },
        'required': [
            'dataset_name',
            'created_at',
            'people'
        ]
    }
    

    Response:

    {
        "dataset_name": "name",
        "created_at": "date",
        "people": [
            {
                "name": "Andrew",
                "isAdult": true
            }
        ]
    }
    
  • Provide detailed descriptions of the fields to be extracted so that the model can more accurately identify them. For example, the following schema includes a description of each of the fields of people: name, age, and isAdult.

    {
        'type': 'object',
        'properties': {
            'dataset_name': {
                'type': 'string'
            },
            'created_at': {
                'type': 'string'
            },
            'people': {
                'type': 'array',
                'items': {
                    'type': 'object',
                    'properties': {
                        'name': {
                            'type': 'string',
                            'description': 'Name should be between 9 and 10 characters'
                        },
                        'age': {
                            'type': 'number',
                            'description': 'Should be a value between 0 and 200'
                        },
                        'isAdult': {
                            'type': 'boolean',
                            'description': 'Person is older than 18'
                        }
                    }
                }
            }
        }
    }
    
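
To illustrate what the "required" list in the first guideline means, the following Python sketch reports which required top-level fields are absent from a parsed response. It is a client-side illustration only and does not describe how COMPLETE itself validates output. The helper name missing_required is hypothetical.

```python
def missing_required(schema, data):
    """Return the names in schema['required'] that are absent from data."""
    return [name for name in schema.get("required", []) if name not in data]


schema = {
    "type": "object",
    "properties": {
        "dataset_name": {"type": "string"},
        "created_at": {"type": "string"},
        "people": {"type": "array"},
    },
    "required": ["dataset_name", "created_at", "people"],
}

# The response from the first guideline: every required field is present.
complete = {
    "dataset_name": "name",
    "created_at": "date",
    "people": [{"name": "Andrew", "isAdult": True}],
}
# A hypothetical response that omits two required fields.
partial = {"dataset_name": "name"}

print(missing_required(schema, complete))
print(missing_required(schema, partial))
```

In the first guideline's response, age is absent from the person entry. That is consistent with the schema, because only the three top-level fields are listed as required.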

Tip

For the most consistent results, set the temperature option to 0.

Cost considerations

Cortex COMPLETE Structured Outputs incurs compute cost based on the number of tokens processed, but does not incur additional compute cost for the overhead of verifying each token against the supplied JSON schema. However, the number of tokens processed (and billed) increases with schema complexity. In general, the larger and more complex the supplied schema is, the more input and output tokens are consumed. Highly structured responses with deep nesting (for example, hierarchical data) consume more tokens than responses to simpler schemas.
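
One way to compare candidate schemas before running them is to measure their size and nesting. The Python sketch below computes nesting depth, property count, and serialized length. These figures are only a rough proxy for complexity, not a token count: the actual number of billed tokens depends on the model's tokenizer and on the generated output. The function name schema_stats is hypothetical.

```python
import json


def schema_stats(schema):
    """Return (max nesting depth, property count, serialized length)."""

    def walk(node, depth):
        max_depth, props = depth, 0
        children = list(node.get("properties", {}).values())
        props += len(children)
        items = node.get("items")
        if isinstance(items, dict):
            children.append(items)
        for child in children:
            child_depth, child_props = walk(child, depth + 1)
            max_depth = max(max_depth, child_depth)
            props += child_props
        return max_depth, props

    depth, props = walk(schema, 1)
    return depth, props, len(json.dumps(schema))


flat = {"type": "object", "properties": {"name": {"type": "string"}}}
nested = {
    "type": "object",
    "properties": {
        "dataset_name": {"type": "string"},
        "created_at": {"type": "string"},
        "people": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                    "isAdult": {"type": "boolean"},
                },
            },
        },
    },
}

flat_stats = schema_stats(flat)
nested_stats = schema_stats(nested)
print(flat_stats)
print(nested_stats)
```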

Limitations

You cannot address external schemas using $ref or $dynamicRef.

The following constraint keywords are not supported.

Type      Keywords
integer   multipleOf
number    multipleOf, minimum, maximum, exclusiveMinimum, exclusiveMaximum
string    minLength, maxLength, format
array     uniqueItems, contains, minContains, maxContains, minItems, maxItems
object    patternProperties, minProperties, maxProperties, propertyNames

These limitations might be addressed in future releases.
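
A schema can be checked against this list before it is submitted. The following Python sketch walks a schema and reports any keyword that the list above marks as unsupported for the declared type. The function name find_unsupported is hypothetical. Treating any $ref or $dynamicRef value that does not start with "#" as an external reference is an assumption, and the walk covers only properties and items.

```python
# Unsupported constraint keywords per type, copied from the list above.
UNSUPPORTED = {
    "integer": {"multipleOf"},
    "number": {"multipleOf", "minimum", "maximum",
               "exclusiveMinimum", "exclusiveMaximum"},
    "string": {"minLength", "maxLength", "format"},
    "array": {"uniqueItems", "contains", "minContains",
              "maxContains", "minItems", "maxItems"},
    "object": {"patternProperties", "minProperties",
               "maxProperties", "propertyNames"},
}


def find_unsupported(schema, path="$"):
    """Return (path, keyword) pairs for constructs listed as unsupported."""
    problems = []
    schema_type = schema.get("type")
    blocked = UNSUPPORTED.get(schema_type, set()) if isinstance(schema_type, str) else set()
    for key in schema:
        if key in blocked:
            problems.append((path, key))
    for ref_key in ("$ref", "$dynamicRef"):
        ref = schema.get(ref_key)
        # Assumption: a reference not starting with '#' points outside this schema.
        if isinstance(ref, str) and not ref.startswith("#"):
            problems.append((path, ref_key))
    # Property names are data, not keywords, so only their sub-schemas are checked.
    for name, sub in schema.get("properties", {}).items():
        if isinstance(sub, dict):
            problems.extend(find_unsupported(sub, f"{path}.properties.{name}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(find_unsupported(items, f"{path}.items"))
    return problems


candidate = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 10},
        "age": {"type": "number", "minimum": 0, "maximum": 200},
        "tags": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
}
issues = find_unsupported(candidate)
for location, keyword in issues:
    print(location, keyword)
```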

Error conditions

Situation: Model output validation failed. The model could not generate a response that matched the schema. This can be caused by required fields that do not appear in the document.
Example message: An error occurred while validating the model output%!(EXTRA string=unexpected end of JSON input)

Situation: Model output validation failed. The given property does not follow the schema. The error message provides details about the specific property that failed validation.
Example message: [{"evaluationPath":"/properties/dataset_name","errors":{"maxLength":"Value should be at most 5 characters"}}])”

Situation: Model output validation failed. The given array items do not match the schema. The error message provides details about the specific array items that failed validation.
Example message: [{"evaluationPath":"/properties/people","errors":{"items":"Items at index 0, 2, 3, 4, 5, 6, 7, 8, 9 do not match the schema"}}])”

Situation: Additional property ‘dataset_name’ does not follow the schema. The error message provides details about the specific property that failed validation.
Example message: {"$ref":"Value does not match the reference schema"}},{"errors":{"properties":"Property ‘type’ does not match the schema"}}, {"evaluationPath":"/properties/type","errors":{"enum":"Value should match one of the values specified by the enum"}}]”