SnowConvert AI: Data validation [Preview]¶
Data Validation uses the `data-validation` skill from the migration plugin. For more details, contact snowconvert-support@snowflake.com.
The Data Validation feature of SnowConvert AI provides a fault-tolerant, scalable way to verify that data migrated into Snowflake matches the data in the original source system. It runs as a Cloud Data Validation workflow on the same infrastructure used by Data Migration, so you can migrate and validate with the same Orchestrator and Workers.
Cloud Data Validation is designed for migration scenarios where you are moving data from a system that you plan to decommission and need confidence that the migrated data is correct before cutting over. Supported source platforms are SQL Server, Amazon Redshift, and Teradata.
Architecture overview¶
Cloud Data Validation uses the same two components as Data Migration: an Orchestrator and one or more Workers.
- The Orchestrator connects to the Snowflake account. It requires privileges to create and operate the `SNOWCONVERT_AI` database, where workflow, task, and validation metadata is stored.
- One or more Workers connect to both the source system and the Snowflake account. Workers run validation queries on both sides, export intermediate results, upload them to Snowflake, and write their outcomes to shared results tables. Workers pick up tasks created by the Orchestrator and process them in parallel.
- Validation results (schema, metrics, and row-level outcomes) are ingested into the shared results tables via Snowpipe by default and then evaluated by the Orchestrator.
Each Worker that executes validation tasks must have the validation runtime available in the environment that the migration skill or `scai` prepares for that worker process. Workers that only run data-migration tasks can skip that runtime; they will not pick up `data_validation` tasks from the queue.
Deployment options¶
The Orchestrator and Workers can be deployed in multiple ways:
- Both on Snowpark Container Services (in the Snowflake account).
- Both in the customer’s environment, including custom hardware, virtual machines, or containers.
- Orchestrator on Snowpark Container Services and Workers in the customer’s environment, or the other way around.
The following requirements apply to the environment where the Orchestrator and Workers run:
- Use the Snowflake Migration Agent with Cortex Code, or the SnowConvert AI CLI (`scai`) from a SnowConvert AI project. The skill and CLI install and manage the underlying runtime when needed.
- Workers typically require an ODBC driver to connect to the source system. For Teradata, the pure-Python `teradatasql` driver is supported and preferred when the skill or environment provides it.
- The Orchestrator must be able to connect to the Snowflake account using a role that has privileges to create the `SNOWCONVERT_AI` database and create schemas and objects within it.
Validation levels¶
Cloud Data Validation performs comparisons between each source table (or view) and its corresponding target in Snowflake at three increasingly detailed levels. Each level can be enabled or disabled independently in the workflow configuration.
Schema validation (L1)¶
Schema validation confirms that the structure of each migrated table is preserved in Snowflake. It compares the following attributes between source and target:
- Table name
- Column names
- Ordinal position of each column
- Data types
- Character maximum length for text columns
- Numeric precision and scale for numeric columns
- Row count
Metrics validation (L2)¶
Metrics validation confirms that aggregate statistics of the migrated data match the original source. Specific metrics vary by column data type, but metrics validation typically compares:
- Minimum value
- Maximum value
- Average
- Null count
- Distinct count
- Standard deviation
- Variance
Row validation (L3)¶
Row validation performs row-level or cell-level comparison between source and target. Configure the mode with `validation_configuration.row_validation_mode` (see Validation configuration):

- `row` (default): MD5-chunked whole-row comparison using `index_column_list` alignment.
- `cell`: cell-level comparison with per-column mismatch reporting.
- `hybrid`: a two-phase flow in which a row-level fingerprint runs first, followed by cell drilldown on partitions that fail the first phase (reduces cost versus pure `cell` on large tables when supported).
Row validation is disabled by default and is typically applied only to the tables where it is needed because it is the most resource-intensive level.
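For example, a minimal sketch of the global `validation_configuration` block that enables hybrid row validation (all fields shown are documented in the configuration reference below; values are illustrative):

```json
{
  "validation_configuration": {
    "schema_validation": true,
    "metrics_validation": true,
    "row_validation": true,
    "row_validation_mode": "hybrid",
    "max_failed_rows_number": 100
  }
}
```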
Prerequisites¶
Before you use Cloud Data Validation, make sure the following are in place:
- SnowConvert AI project and skill: Cortex Code with the Snowflake Migration Agent, or the SnowConvert AI CLI (`scai`) with a migration project. Use the `data-validation` skill in the migration plugin for guided validation, or the `scai data validate …` commands on this page. Workers that run validation tasks receive the validation runtime through the same install path the skill or `scai` uses for that worker process.
- Snowflake access: Connections for the Orchestrator and Workers in your Snowflake `config.toml` or `connections.toml`, using a role that can create the `SNOWCONVERT_AI` database and its objects. The first time the Orchestrator starts, it creates that database and related resources if they don’t exist yet. On later runs, use a role that can administer `SNOWCONVERT_AI` and its objects; sticking with the role you used for the initial creation is the simplest way to avoid permission issues.
- Source connectivity: For SQL Server and Redshift sources, an ODBC driver on the machine where Workers run. For Teradata, prefer the `teradatasql` driver when provided by your environment, or configure `pyodbc` with the Teradata ODBC driver. Programmatic Access Tokens (PATs) are recommended for Snowflake connections; see Connecting to Snowflake with a PAT.
- Target data available: The tables you want to validate must already exist in Snowflake, either because you migrated them using Data Migration or by another means. For accurate validation, don’t alter the migrated data between the migration and the validation run.
- Snowpark Container Services (optional): If you deploy the Orchestrator or Workers on Snowflake compute, your account needs SPCS. See the Snowpark Container Services overview. Running both components outside Snowflake doesn’t require SPCS.
Source-platform specifics¶
Workers reuse the same `[connections.source.*]` TOML as data migration. The tabs below call out validation-specific behavior and common source-side deployment patterns. For full connectivity examples (including IAM Redshift, Oracle, and Teradata `regular`/`write_nos`/`tpt` migration settings), see SnowConvert AI: Data migration.
- Connectivity: ODBC to SQL Server from the Worker host; follow the same driver and encryption guidance as in SnowConvert AI: Data migration (SQL Server tab).
- Partitioning: Use `column_names_to_partition_by` and `target_partition_size_mb`/`target_partition_size_rows` in the validation JSON to keep L2/L3 work within reasonable bounds.
- L3 alignment: Set `index_column_list` (and `target_index_column_list` when names differ) so row and hybrid modes can align rows deterministically.
- Connectivity: Standard or IAM Redshift profiles in Worker TOML match migration; see the Redshift tab under SnowConvert AI: Data migration.
- After UNLOAD migrations: If tables were loaded via `UNLOAD` to S3, validation still runs SQL against live Redshift for source-side metrics and row checks. Ensure the Worker can reach the cluster and that result sets for large partitions stay within your timeout and spool limits.
- Targets in Snowflake: Validation compares whatever is in Snowflake (native or Iceberg). Iceberg targets do not change L2/L3 SQL on the Redshift side; they only affect how results land in Snowflake metadata and storage.
| Topic | Data migration (load) | Cloud data validation |
|---|---|---|
| Purpose | Move data with `extraction.strategy` `regular`, `write_nos`, or `tpt` | Compare live Teradata tables/views to Snowflake with L1/L2/L3 |
| Teradata connection | Required on Workers that run migration tasks | Required on Workers that run validation tasks (same `teradatasql`/ODBC patterns) |
| `write_nos_*` TOML | Required when strategy is `write_nos` | Not required for validation-only workloads (validation does not execute `WRITE_NOS`) |
| `tbuild`/TTU | Required on Workers that can receive `tpt` migration tasks | Not required for validation-only Workers (L2/L3 use SQL, not TPT) |
Connectivity
Match the guidance in SnowConvert AI: Data migration (Teradata tab): prefer `teradatasql` when the environment provides it; otherwise set `odbc_driver` to the exact registered driver name, with optional `dbc_name`, port 1025 by default, and optional authentication (TD2, LDAP, KRB5).
After a migration that used `tpt` or `write_nos`
The migrated data in Snowflake was produced by those extraction paths, but validation still reads the source over SQL (metrics and row/cell plans). You do not need to re-enable `write_nos_*` or install `tbuild` on a host solely to run validation, unless the same Worker process also executes migration `tpt` or `write_nos` tasks. Ensure the Teradata objects you validate are still reachable and representative (same database, schema, and table or view names as in the validation workflow).
Schema validation on views (L1)
For Teradata views, L1 is a reduced comparison (column existence and data type via `HELP COLUMN` metadata). Precision, scale, length, nullability, and ordinal checks are not available for Teradata views; use L2/L3 for deeper assurance.
Partitioning (L2/L3)
Use `column_names_to_partition_by` and `target_partition_size_mb`/`target_partition_size_rows` so wide tables do not time out; large `tpt`/`write_nos` migrations do not change how validation issues source SQL.
Warning
For accurate validation and to avoid false negatives, don’t alter the migrated data during the validation process.
Setup¶
Installation¶
Follow the Snowflake Migration Agent setup in Cortex Code so the migration plugin, `scai`, and worker dependencies (including validation components when you run validation workflows) are installed for you. For manual CLI-only use, install the SnowConvert AI CLI and work from a project directory; `scai data validate start` and related commands resolve missing worker-side components when orchestrating local workers.
Workers that pick up only data-migration tasks do not require the validation runtime; workers that execute data-validation tasks must have validation support available in the same environment the skill or scai prepares for that worker process.
Usage¶
To validate migrated data using this solution, complete the following high-level steps:
- Start the Orchestrator.
- Start the Workers.
- Create a Cloud Data Validation Workflow.
- Monitor the validation workflow until completion.
A Cloud Data Validation Workflow is a job submitted to the system that describes which tables to validate, at which levels, and with which comparison rules. You can submit multiple workflows simultaneously and monitor them. The Orchestrator breaks each workflow into smaller tasks, one or more per table, and dispatches them to available Workers.
The Orchestrator that runs validation is the same process that runs Data Migration. If you have already set up and started an Orchestrator and Workers for migration, you can reuse them for validation: ensure Workers are started with a configuration and environment that includes validation support (the `data-validation` skill, or `scai data validate start`/`create-workflow` with `--start-worker`, handles this when running locally), then create a validation workflow as described in Creating a Cloud Data Validation Workflow.
Ask the migration agent for validation (for example, “Run cloud data validation for my project”), or use the `scai data validate …` commands below. See the Snowflake Migration Agent for guided prompts.
Using SCAI CLI¶
Cloud validation commands mirror migration under `scai data validate …`, sharing the same `scai data worker start` and `scai data orchestrator` branches. Deprecated names still run with a short warning:
| Deprecated (still runs) | Preferred command |
|---|---|
| `scai data generate-cloud-validation-config` | `scai data validate generate-config` |
| `scai data cloud-validate` | `scai data validate create-workflow` |
| `scai data cloud-validate-status` | `scai data validate status` |
| `scai data cloud-list-validations` | `scai data validate list` |
Generate a JSON validation workflow file from the SCAI project (default `data-validation-config.json`):
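For example, from the project directory:

```bash
# Writes data-validation-config.json into the current SnowConvert AI project
scai data validate generate-config
```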
Create a validation workflow (options mirror migration’s `create-workflow`, but the workflow file must be JSON; `.json` only):
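For example (the platform value is illustrative and must match the `source_platform` field in the file):

```bash
scai data validate create-workflow --config data-validation-config.json --source-platform teradata
```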
Local all-in-one project run (auto-generates config when missing unless `--no-auto-config` is set; forces local Worker + Orchestrator + watch, same as migration’s `migrate start`):
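For example:

```bash
# Generates config if missing, then runs a local Worker and Orchestrator and watches
scai data validate start
```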
Status and listing:
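For example (the filter values are illustrative):

```bash
scai data validate status
scai data validate list --status completed --limit 10
```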
Use `scai data orchestrator setup|stop` and `scai data worker start` exactly as described in SnowConvert AI: Data migration.
Orchestrator metadata and environment¶
By default, workflow and task metadata objects are created under `SNOWCONVERT_AI.DATA_MIGRATION`, and data validation metadata is created under `SNOWCONVERT_AI.DATA_VALIDATION`. When you run a local Orchestrator through `scai` (for example, `scai data validate start` with an implied orchestrator), the CLI sets the Snowflake connection from your project. Advanced deployments can override the metadata location with the Orchestrator environment variables `CUSTOM_SNOWFLAKE_DATABASE_FOR_METADATA`, `CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_MIGRATION_METADATA`, and `CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_VALIDATION_METADATA`; if you do, set matching `snowflake_database_for_metadata` and `snowflake_schema_for_data_migration_metadata` on each Worker (see Worker configuration).
The Orchestrator runs until you stop it. Cloud Data Validation Workflows require an active Orchestrator to complete. The Orchestrator can be safely stopped and resumed; ongoing workflows resume at that point.
Creating a Cloud Data Validation Workflow¶
Recommended: From your SnowConvert AI project directory, generate a workflow JSON file with `scai data validate generate-config` and submit it with `scai data validate create-workflow`, as shown in Using SCAI CLI.
Or use a single local run with `scai data validate start`, which generates the config if needed, starts a local Worker and Orchestrator, and watches to completion.
Keep the following in mind:
- The validation configuration specification is in Validation workflow configuration reference.
- `scai data validate create-workflow` accepts a `.json` workflow file (the default from `generate-config` is `data-validation-config.json`). The `source_platform` field in that file must match your source (`sqlserver`, `redshift`, or `teradata`). Cloud data migration additionally supports Oracle; validation does not at this time.
- Workflow names must be alphanumeric and cannot start with a digit.
New workflow rows are inserted into the `WORKFLOW` table in the data migration metadata schema (default `SNOWCONVERT_AI.DATA_MIGRATION`) with `WORKFLOW_TYPE` set to `data-validation`. Validation results and related objects are stored under the data validation metadata schema (default `SNOWCONVERT_AI.DATA_VALIDATION`).
To migrate data before validating, use the cloud migration flow in SnowConvert AI: Data migration (`scai data migrate generate-config`, `create-workflow`, or `start`).
Monitoring a Cloud Data Validation Workflow¶
Each workflow goes through different stages throughout its lifecycle:
- Pending: No tasks have been created for this workflow yet.
- Executing: Tasks have been created for this workflow and there are still tasks that haven’t reached a terminal state (`COMPLETED` or `FAILED`).
- Completed: All tasks have reached a terminal state (`COMPLETED` or `FAILED`).
In the data validation metadata schema (default `SNOWCONVERT_AI.DATA_VALIDATION`), the following views can be queried to understand the status of validation workflows:
| View | Description |
|---|---|
| `TABLE_PROGRESS` | One row per validated table. Summarizes overall validation status. Can be filtered by `WORKFLOW_ID`. |
| `TABLE_PROGRESS_DETAIL` | Per-table breakdown with partition-level L2/L3 status (`VALID`, `INVALID`, `EXECUTION_ERROR`). Can be filtered by `WORKFLOW_ID`. |
| `DATA_VALIDATION_ERROR` | Errors encountered during validation. Can be filtered by `WORKFLOW_ID`. |
| `DATA_VALIDATION_WARNING` | Non-fatal warnings, for example, unsupported column types or metric exclusions. Can be filtered by `WORKFLOW_ID`. |
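For example, to check per-table status for a single workflow (the workflow ID is a placeholder):

```sql
-- One row per validated table for the given workflow
SELECT *
FROM SNOWCONVERT_AI.DATA_VALIDATION.TABLE_PROGRESS
WHERE WORKFLOW_ID = '<workflow_id>';
```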
In the same schema, the `DATA_VALIDATION_DASHBOARD` Streamlit dashboard provides a visual overview of validation progress and results, including a Table Progress tab that aggregates the views above.
You can also inspect validation queries in `QUERY_HISTORY` using the `QUERY_TAG` values set by the Orchestrator and Workers. See Query tagging.
Validation outcomes are classified into three categories:
| Category | Description |
|---|---|
| OK | Values match exactly between the source database and Snowflake. |
| Warning | The Snowflake table has minor differences that don’t affect the data (for example, higher numeric precision). |
| Error | Values don’t match between the original database and the Snowflake database. |
Considerations and recommendations¶
Connecting to Snowflake with a PAT¶
Use Programmatic Access Tokens (PATs) for connections used by the Orchestrator and Workers. This avoids the need to constantly authenticate through the browser or with an authenticator app. You need to establish a Network Policy or temporarily bypass the requirement for a Network Policy (which you can do from Snowsight).
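For example, a `connections.toml` profile that uses a PAT in place of a password might look like the following sketch (profile, account, and object names are illustrative):

```toml
[snowconvert_validation]
account = "myorg-myaccount"
user = "VALIDATION_USER"
password = "<programmatic-access-token>"  # PAT supplied as the password
role = "SNOWCONVERT_ROLE"
warehouse = "VALIDATION_WH"
```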
Running Orchestrator and Workers on SPCS¶
To leverage Snowflake compute for these tasks:
- Prepare Docker images that bundle the Orchestrator and Worker runtimes and configuration expected by your SnowConvert AI release (see Snowflake guidance for SPCS service images).
- Push those Docker images to an Image Repository in Snowflake.
- Execute the Orchestrator and/or Worker images using Snowpark Container Services.
Keep the following in mind:
- Run them as Services, not Jobs.
- You can run only one component (Orchestrator or Workers) in SPCS and the other on a different platform.
- Monitor the SPCS service and suspend it when it isn’t being used.
- Depending on the network configuration of the source system, you might need to configure an External Access Integration so that these services can connect to your source system.
Initial testing¶
For an early test run, use a separate validation configuration whose `tables` array lists only the table or small set of tables you want to validate. On each of those entries:

- Set `where_clause` (and optionally `target_where_clause`) to an SQL-like predicate so only a subset of rows is validated, for example, a bounded primary-key range or a narrow date range in the source dialect.
- Keep `validation_configuration.row_validation` disabled for the first run, and enable it later on a smaller subset of tables.
- Use a small `target_partition_size_mb` or `target_partition_size_rows` to keep partitions tiny during the test.
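A minimal test configuration might look like this sketch (the table name, predicate, and sizes are illustrative; all fields are documented in the configuration reference below):

```json
{
  "source_platform": "sqlserver",
  "validation_configuration": {
    "schema_validation": true,
    "metrics_validation": true,
    "row_validation": false
  },
  "tables": [
    {
      "fully_qualified_name": "SalesDB.dbo.Orders",
      "where_clause": "OrderID BETWEEN 1 AND 100000",
      "target_partition_size_rows": 10000
    }
  ]
}
```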
After you confirm connectivity and results, create your full workflow: remove or relax `where_clause`, adjust partition sizes, and enable row validation only on the tables where it is needed.
Managing Workers¶
The time it takes to complete a validation workflow depends on many variables, but the number of Workers (and threads per Worker) has the greatest impact, as it determines how many validation tasks can be executed in parallel. Consider the following:
- You don’t need to run two Workers on the same machine. If you want more parallelism on a single machine, increase the thread count in the Worker TOML (`max_parallel_tasks` under `[application]`) or use the options your skill or `scai` run exposes for local workers.
- Network bandwidth greatly affects Worker speed and is shared between the threads of a Worker.
- Even with many Workers and threads running in parallel, the source system might not have enough resources to handle the load.
- Keep a low Worker count to avoid overloading your source system.
- Consider stopping some or all Workers when the source system is already under heavy load from other operations.
Query tagging¶
Both the Orchestrator and the Worker automatically set Snowflake’s `QUERY_TAG` session parameter on every query they submit. Tags are compact JSON strings containing identifiers such as the workflow ID, task ID, and component version. You can use these tags to filter and attribute validation queries in `QUERY_HISTORY`:
| Tag key | Present on | Description |
|---|---|---|
| `DMVF_VERSION` | Infrastructure queries | Component package version. |
| `DMVF_WORKFLOW_ID` | Task-processing queries | Workflow that originated the task. |
| `DMVF_TASK_ID` | Task-processing queries | Individual task identifier. |
| `DMVF_ORCHESTRATOR_VERSION` | Orchestrator task-processing queries | Orchestrator package version. |
| `DMVF_WORKER_VERSION` | Worker task-processing queries | Worker package version. |
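For example, to attribute recent validation queries to one workflow (a sketch; because `QUERY_TAG` holds a compact JSON string, it is matched here as a substring, and the workflow ID is a placeholder):

```sql
SELECT QUERY_ID, START_TIME, TOTAL_ELAPSED_TIME, QUERY_TAG
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE QUERY_TAG LIKE '%DMVF_WORKFLOW_ID%'
  AND QUERY_TAG LIKE '%<workflow_id>%'
ORDER BY START_TIME DESC;
```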
Configuration reference¶
Validation workflow configuration reference¶
The validation workflow configuration file is a JSON object. The following sections describe its structure and properties.
Note
Names that require quoting (or brackets) must be manually quoted as they would normally be in JSON. For example: "tableName": "\"MyCaseSensitiveTable\"".
Top-level object¶
| Property | Type | Required | Description |
|---|---|---|---|
| `source_platform` | String | Yes | Source dialect identifier: `sqlserver`, `redshift`, or `teradata`. Must match the `--source-platform` argument used to create the workflow. |
| `target_platform` | String | No | Defaults to Snowflake. |
| `target_database` | String | No | Default target database name for tables that don’t specify one. |
| `validation_configuration` | Object | No | Global validation levels and options. See Validation configuration. |
| `comparison_configuration` | Object | No | Numeric tolerance and optional type mapping file. See Comparison configuration. |
| `database_mappings` | Object | No | Map of source database names to Snowflake database names. |
| `schema_mappings` | Object | No | Map of source schema names to Snowflake schema names. |
| `tables` | Array | Yes | At least one table to validate. See Per-table and per-view entry. |
| `views` | Array | No | Additional view entries. Uses the same shape as `tables`. |
| `use_snowflake_compute` | Boolean | No | When `true`, enables Snowflake-side computation paths where supported. Default `false`. |
| `target_partition_size_rows` | Integer | No | Desired rows per partition. Mutually exclusive with `target_partition_size_mb`. Must be greater than 0. Default is 200 MB when both are omitted. |
| `target_partition_size_mb` | Integer | No | Desired MB per partition. Mutually exclusive with `target_partition_size_rows`. Must be greater than 0. Default is 200 MB when both are omitted. |
| `use_snowpipe_for_results` | Boolean | No | When `true` (default), L2/L3 validation results are ingested into the shared results tables via Snowpipe. Set to `false` to fall back to per-partition `COPY INTO` tasks. |
Validation configuration¶
The `validation_configuration` object sets global defaults for which validation levels to run. Any field set here can be overridden per table by nesting a `validation_configuration` object on that table entry.
When `validation_configuration` is omitted, the Orchestrator applies these defaults: schema validation and metrics validation are enabled; row validation is disabled; `row_validation_mode` defaults to `row`; `continue_on_failure` defaults to `false`; `max_failed_rows_number` defaults to 100; `exclude_metrics` defaults to `false`; `apply_metric_column_modifier` defaults to `true`.
| Property | Type | Description |
|---|---|---|
| `schema_validation` | Boolean | Level 1: schema and column consistency checks. |
| `metrics_validation` | Boolean | Level 2: statistical metrics comparison. |
| `row_validation` | Boolean | Level 3: row-level or cell-level data comparison. |
| `row_validation_mode` | String | For row validation: `row`, `cell`, or `hybrid` (see Row validation (L3)). |
| `continue_on_failure` | Boolean | Whether to continue to the next validation level after a failure. |
| `max_failed_rows_number` | Integer | Cap on failed rows reported for L3 validation. Must be greater than 0 when set. |
| `exclude_metrics` | Boolean | Whether to exclude unsupported metric columns. |
| `apply_metric_column_modifier` | Boolean | Whether to apply metric column modifiers. |
Comparison configuration¶
The `comparison_configuration` object controls numeric tolerance and optional type-mapping overrides used during comparisons.
| Property | Type | Description |
|---|---|---|
| `tolerance` | Number | Numeric comparison tolerance for metrics. Must be greater than 0 when set. Default 0.001 when omitted. |
| `type_mapping_file_path` | String | Optional path to a custom type mapping file for comparisons. |
Per-table and per-view entry¶
Each object in `tables` (or `views`) describes one database object to validate.
| Property | Type | Required | Description |
|---|---|---|---|
| `fully_qualified_name` | String | Yes | Source object name. The exact format depends on the source platform. |
| `use_column_selection_as_exclude_list` | Boolean | No | Default `false`. |
| `column_selection_list` | String[] | No | Columns to include or exclude, depending on `use_column_selection_as_exclude_list`. |
| `target_name` | String | No | Target object name override. |
| `target_database` | String | No | Per-table target database override. |
| `target_schema` | String | No | Per-table target schema override. |
| `where_clause` | String | No | Filter on the source side. |
| `target_where_clause` | String | No | Filter on the target side. |
| `index_column_list` | String[] | No | Columns used to align rows on the source. |
| `target_index_column_list` | String[] | No | Columns used to align rows on the target. |
| `column_mappings` | Object | No | Map of source column name to target column name. |
| `is_case_sensitive` | Boolean | No | Case sensitivity for identifiers. |
| `chunk_number` | Integer | No | Chunking hint for validation. Must be greater than 0 when set. |
| `max_failed_rows_number` | Integer | No | Overrides the global cap for this object. |
| `exclude_metrics` | Boolean | No | Per-object metrics exclusion override. |
| `apply_metric_column_modifier` | Boolean | No | Per-object modifier override. |
| `object_type` | String | No | Typically `TABLE` or `VIEW`. |
| `column_names_to_partition_by` | String[] | No | Columns used for range-based (`NTILE`) partitioning during validation. Without this, the table is processed as a single partition. |
| `target_partition_size_rows` | Integer | No | Per-table override for desired rows per partition. Mutually exclusive with `target_partition_size_mb`. Must be greater than 0. |
| `target_partition_size_mb` | Integer | No | Per-table override for desired MB per partition. Mutually exclusive with `target_partition_size_rows`. Must be greater than 0. |
| `validation_configuration` | Object | No | Nested object with the same fields as the global `validation_configuration`, overriding defaults for this object only. |
Partitioning¶
When `column_names_to_partition_by` is set, the Orchestrator splits the table into range-based partitions. Cloud Data Validation and Data Migration share the same sizing logic:
1. Compute a target rows-per-partition from whichever user setting is provided. The two settings are mutually exclusive:
   - `target_partition_size_rows` is used as-is.
   - `target_partition_size_mb` is converted to rows using `target_mb / avg_row_mb`.
   - If neither is set, Cloud Data Validation defaults to 200 MB per partition.
2. Apply an internal cap. System-imposed maximums (not user-configurable) limit partition size to safe infrastructure bounds.
3. Derive the partition count as `ceil(row_count / effective_rows_per_partition)`, or 1 when the entire table fits in a single partition.
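For example, with `target_partition_size_mb = 200` and an average row size of 0.5 KB (0.0005 MB), the target is 200 / 0.0005 = 400,000 rows per partition, so a 10-million-row table is split into `ceil(10000000 / 400000)` = 25 partitions, subject to the internal cap.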
Example: SQL Server validation¶
The following configuration runs schema and metrics validation on two SQL Server tables, overrides defaults on the second table, and enables row-level validation there:
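A sketch of such a file (database, schema, table, and column names are illustrative; all fields are documented in the configuration reference above):

```json
{
  "source_platform": "sqlserver",
  "target_database": "ANALYTICS",
  "validation_configuration": {
    "schema_validation": true,
    "metrics_validation": true,
    "row_validation": false
  },
  "tables": [
    {
      "fully_qualified_name": "SalesDB.dbo.Customers"
    },
    {
      "fully_qualified_name": "SalesDB.dbo.Orders",
      "index_column_list": ["OrderID"],
      "column_names_to_partition_by": ["OrderID"],
      "target_partition_size_rows": 500000,
      "validation_configuration": {
        "schema_validation": true,
        "metrics_validation": true,
        "row_validation": true,
        "row_validation_mode": "row"
      }
    }
  ]
}
```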
Example: Redshift validation after UNLOAD migration¶
The following configuration validates Redshift tables that were migrated using the UNLOAD extraction strategy. Schema and metrics validation are enabled globally, and row validation is enabled on a single high-value table:
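A sketch (the Redshift object names and the mapped Snowflake names are illustrative):

```json
{
  "source_platform": "redshift",
  "database_mappings": { "analytics": "ANALYTICS" },
  "schema_mappings": { "public": "PUBLIC" },
  "validation_configuration": {
    "schema_validation": true,
    "metrics_validation": true,
    "row_validation": false
  },
  "tables": [
    { "fully_qualified_name": "analytics.public.page_views" },
    {
      "fully_qualified_name": "analytics.public.orders",
      "index_column_list": ["order_id"],
      "column_names_to_partition_by": ["order_id"],
      "validation_configuration": {
        "row_validation": true,
        "row_validation_mode": "hybrid"
      }
    }
  ]
}
```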
For the corresponding migration workflow, see Redshift UNLOAD in the Data Migration page.
Example: Teradata validation¶
The following configuration validates Teradata tables using schema and metrics validation, with a custom tolerance and per-table partition sizing:
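A sketch (object names are illustrative; `tolerance` and the partition sizes are the custom settings mentioned above):

```json
{
  "source_platform": "teradata",
  "comparison_configuration": { "tolerance": 0.01 },
  "validation_configuration": {
    "schema_validation": true,
    "metrics_validation": true,
    "row_validation": false
  },
  "tables": [
    {
      "fully_qualified_name": "SALES_DB.ORDERS",
      "column_names_to_partition_by": ["ORDER_DATE"],
      "target_partition_size_mb": 100
    },
    {
      "fully_qualified_name": "SALES_DB.CUSTOMERS",
      "target_partition_size_rows": 1000000
    }
  ]
}
```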
Worker configuration¶
The Worker configuration file uses TOML format and is shared between Data Migration and Cloud Data Validation. Workers that execute validation tasks must be started through a path that includes validation support (for example, the `data-validation` skill, or `scai data validate start`/`create-workflow` with `--start-worker`).
| Section | Property | Type | Description |
|---|---|---|---|
| Top level | `selected_task_source` | String | Required for cloud workflows. Use `"snowflake_stored_procedure"`. |
| `[task_source.snowflake_stored_procedure]` | `connection_name` | String | Snowflake profile used for task-queue stored procedures, or `"@SPCS_CONNECTION"` on SPCS. |
| `[application]` | `max_parallel_tasks` | Integer | Maximum parallel tasks (threads). |
| `[application]` | `task_fetch_interval` | Integer | Seconds between idle polls for new tasks. |
| `[application]` | `lease_refresh_interval` | Integer | Optional. Seconds between lease renewals (default 120). |
| `[application]` | `affinity` | String | Optional. Worker affinity for task routing. |
| `[application]` | `snowflake_database_for_metadata` | String | Optional. Task-queue database (default `SNOWCONVERT_AI`). Must match `CUSTOM_SNOWFLAKE_DATABASE_FOR_METADATA` on the Orchestrator if overridden. |
| `[application]` | `snowflake_schema_for_data_migration_metadata` | String | Optional. Task-queue schema (default `DATA_MIGRATION`). Must match `CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_MIGRATION_METADATA` if overridden. |
| `[application]` | `local_results_directory` | String | Optional. Export base directory (default `~/.data_exchange_agent/result_data`; expanded to an absolute path at load). |
| `[connections.source.*]` | (per engine) | Object | Source connection(s). |
| `[connections.target.snowflake_connection_name]` | `connection_name` | String | Snowflake profile for data sessions (validation queries, uploads). |
When `selected_task_source` is `"snowflake_stored_procedure"`, the Worker calls task-queue stored procedures using `application.snowflake_database_for_metadata` and `application.snowflake_schema_for_data_migration_metadata`. These values are independent of the Snowflake session defaults (`SNOWFLAKE_DATABASE`, `SNOWFLAKE_SCHEMA`) in the connection profile. Validation results metadata lives under `SNOWCONVERT_AI.DATA_VALIDATION` by default (`CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_VALIDATION_METADATA` on the Orchestrator); Workers do not set that schema in TOML. They reach it through the same Snowflake connection and orchestrated tasks.
An example configuration file looks like this:
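The sketch below uses only keys documented in the table above; the profile names are illustrative, and the engine-specific source keys are left as a placeholder (see the source connection examples that follow):

```toml
selected_task_source = "snowflake_stored_procedure"

[task_source.snowflake_stored_procedure]
connection_name = "my_snowflake_profile"

[application]
max_parallel_tasks = 4
task_fetch_interval = 10

[connections.source.sqlserver]
# Engine-specific keys; see the source connection examples below.

[connections.target.snowflake_connection_name]
connection_name = "my_snowflake_profile"
```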
Note
Only one source connection is needed.
Source connection configuration examples¶
The following examples show the supported source connection types for Cloud Data Validation.
1. SQL Server (standard authentication)
The Worker automatically detects the best available ODBC driver for SQL Server, preferring newer versions (ODBC Driver 18 > 17 > 13 > 11). To manually select a driver, set `odbc_driver` to the exact name returned by `pyodbc.drivers()`:
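A sketch; only `odbc_driver`, `encrypt`, and `trust_server_certificate` are documented on this page, and the remaining key names are assumptions for illustration:

```toml
[connections.source.sqlserver]
# Key names other than odbc_driver/encrypt/trust_server_certificate are illustrative
server = "sqlserver.example.com"
database = "SalesDB"
user = "validation_user"
password = "********"
odbc_driver = "ODBC Driver 18 for SQL Server"
encrypt = true
trust_server_certificate = false
```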
The `encrypt` and `trust_server_certificate` parameters are optional. When omitted, the ODBC driver uses its default behavior:

- ODBC Driver 17 and below: encryption is disabled by default.
- ODBC Driver 18 and above: encryption is mandatory by default.

For development environments or SQL Servers without encryption support, either omit the encryption parameters or set `encrypt = false`.
2. Amazon Redshift (IAM authentication)
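A sketch only; the IAM profile keys below are assumptions for illustration (see the Redshift tab in SnowConvert AI: Data migration for the authoritative key names):

```toml
[connections.source.redshift]
# Illustrative IAM-based profile; key names are assumptions
host = "my-cluster.abc123.us-east-1.redshift.amazonaws.com"
port = 5439
database = "analytics"
iam = true
cluster_identifier = "my-cluster"
region = "us-east-1"
db_user = "validation_user"
```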
3. Amazon Redshift (standard authentication)
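Likewise a sketch with assumed key names (standard user/password authentication):

```toml
[connections.source.redshift]
# Illustrative standard-authentication profile; key names are assumptions
host = "my-cluster.abc123.us-east-1.redshift.amazonaws.com"
port = 5439
database = "analytics"
user = "validation_user"
password = "********"
```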
4. Teradata
The Worker supports two Teradata drivers and automatically selects the best one available:
- `teradatasql` (preferred). Pure-Python driver; no OS-level ODBC installation required. The migration skill or Teradata-enabled worker environment supplies this driver when configured.
- ODBC fallback. If `teradatasql` isn’t installed, the Worker falls back to `pyodbc` with the Teradata ODBC driver. Set `driver_name` to the exact name returned by `pyodbc.drivers()`.
When `teradatasql` is available, `driver_name` is ignored and no ODBC driver needs to be installed on the host. Use `dbc_name` when your Teradata COP or TDPID alias differs from `host`.
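A sketch; the `driver_name`, `dbc_name`, port, and authentication values follow the guidance above, and the remaining key names are assumptions for illustration:

```toml
[connections.source.teradata]
# Key names other than those called out above are illustrative
host = "teradata.example.com"
port = 1025
user = "validation_user"
password = "********"
authentication = "TD2"
# ODBC fallback only; exact name as returned by pyodbc.drivers()
# driver_name = "Teradata Database ODBC Driver 20.00"
# dbc_name = "tdprod"
```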
Command reference¶
The following table lists `scai` commands used for Cloud Data Validation. For worker TOML, workflow JSON fields, and Snowflake observability, see Using SCAI CLI, Worker configuration, and Validation workflow configuration reference. Orchestrator and worker install paths are covered in the Snowflake Migration Agent.
| Command | Purpose |
|---|---|
| `scai data validate generate-config` | Generate `data-validation-config.json` from the SnowConvert AI project. |
| `scai data validate create-workflow` | Create a cloud validation workflow; `--config` must be a `.json` file. |
| `scai data validate start` | Local all-in-one: generate config if needed, start a local Worker and Orchestrator, and watch. |
| `scai data validate status` | Show or watch validation workflow status. |
| `scai data validate list` | List validation workflows (`--status`, `--limit`, `--connection`). |
| `scai data worker start <config.toml>` | Start the Data Exchange Worker (see SnowConvert AI: Data migration). |
| `scai data worker start --auto-config [PATH]` | Emit a Worker TOML template for you to edit. |
| `scai data orchestrator setup` / `stop` | Deploy or stop the SPCS Orchestrator service (shared with data migration). |