Snowflake Data Validation CLI - Quick Reference¶
This quick reference guide provides a condensed overview of commands, configuration options, and common usage patterns for the Snowflake Data Validation CLI tool, designed for easy lookup during validation tasks.
Installation¶
Prerequisites¶
Before running the commands below, ensure that Python 3.10 or later and pip are installed on your system.
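To verify the prerequisites, check the installed versions first:
python --version  # expect 3.10 or later
pip --version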
# Base installation
pip install snowflake-data-validation
# With source-specific drivers
pip install "snowflake-data-validation[sqlserver]"
pip install "snowflake-data-validation[teradata]"
pip install "snowflake-data-validation[redshift]"
Command Structure¶
snowflake-data-validation <dialect> <command> [options]
# or
sdv <dialect> <command> [options]
Dialects: sqlserver | teradata | redshift
Common Commands¶
run-validation¶
# SQL Server
sdv sqlserver run-validation --data-validation-config-file config.yaml
# Teradata
sdv teradata run-validation --data-validation-config-file config.yaml
# Redshift
sdv redshift run-validation --data-validation-config-file config.yaml
generate-validation-scripts¶
sdv <dialect> generate-validation-scripts --data-validation-config-file config.yaml
run-async-validation¶
sdv <dialect> run-async-validation --data-validation-config-file config.yaml
get-configuration-files¶
sdv <dialect> get-configuration-files --templates-directory ./templates
auto-generated-configuration-file¶
sdv <dialect> auto-generated-configuration-file
table-partitioning-helper¶
sdv <dialect> table-partitioning-helper
Interactive command to partition large tables for more efficient validation.
Configuration Template¶
This template provides the core structure for configuring data validation jobs, defining source and target connections, validation rules, and table-specific settings that control how data is compared between your source database and Snowflake.
# GLOBAL
source_platform: SqlServer # SqlServer | Teradata | Redshift
target_platform: Snowflake
output_directory_path: ./output
max_threads: auto # "auto" or 1-32
target_database: teradataTargetDatabase # For Teradata sources only - specify target database
# SOURCE CONNECTION
source_connection:
  mode: credentials
  host: "hostname"
  port: 1433
  username: "user"
  password: "pass"
  database: "db"
  # SQL Server only:
  trust_server_certificate: "no" # yes | no
  encrypt: "yes" # yes | no | optional
# TARGET CONNECTION
target_connection:
  mode: name # name | default
  name: "connection_name" # if mode=name
# VALIDATION
validation_configuration:
  schema_validation: true
  metrics_validation: true
  row_validation: false
  max_failed_rows_number: 100
  exclude_metrics: false
  apply_metric_column_modifier: false
# COMPARISON
comparison_configuration:
  tolerance: 0.01 # 1% tolerance
# LOGGING (optional)
logging_configuration:
  level: INFO # DEBUG | INFO | WARNING | ERROR | CRITICAL
  console_level: WARNING
  file_level: DEBUG
# MAPPINGS (optional)
database_mappings:
  source_db: target_db
schema_mappings:
  source_schema: target_schema
# TABLES
tables:
  - fully_qualified_name: database.schema.table1
    use_column_selection_as_exclude_list: false
    column_selection_list: []
    index_column_list: []
    where_clause: ""
    target_where_clause: ""
    chunk_number: 0
    column_mappings: {}
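Putting the pieces together, a minimal end-to-end configuration might look like the following; the host, credentials, connection name, and table name are placeholders:
source_platform: SqlServer
target_platform: Snowflake
output_directory_path: ./output
source_connection:
  mode: credentials
  host: "sqlserver.company.com"
  port: 1433
  username: "sql_user"
  password: "sql_pass"
  database: "prod_db"
target_connection:
  mode: name
  name: "my_snowflake_connection"
validation_configuration:
  schema_validation: true
  metrics_validation: true
  row_validation: false
tables:
  - fully_qualified_name: prod_db.dbo.orders
    use_column_selection_as_exclude_list: false
    column_selection_list: []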
Table Configuration Examples¶
Include All Columns¶
- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
Include Specific Columns¶
- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list:
    - column1
    - column2
    - column3
Exclude Specific Columns¶
- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: true
  column_selection_list:
    - audit_timestamp
    - internal_notes
With Filtering¶
- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  where_clause: "status = 'ACTIVE' AND created_date >= '2024-01-01'"
  target_where_clause: "status = 'ACTIVE' AND created_date >= '2024-01-01'"
With Column Mappings¶
- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  column_mappings:
    source_col1: target_col1
    source_col2: target_col2
Large Table with Chunking¶
- fully_qualified_name: db.schema.large_table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  index_column_list:
    - primary_key
  chunk_number: 50
  max_failed_rows_number: 500
Connection Examples¶
SQL Server¶
source_connection:
  mode: credentials
  host: "sqlserver.company.com"
  port: 1433
  username: "sql_user"
  password: "sql_pass"
  database: "prod_db"
  trust_server_certificate: "no"
  encrypt: "yes"
Teradata¶
source_connection:
  mode: credentials
  host: "teradata.company.com"
  username: "td_user"
  password: "td_pass"
  database: "prod_db"
Redshift¶
source_connection:
  mode: credentials
  host: "cluster.region.redshift.amazonaws.com"
  port: 5439
  username: "rs_user"
  password: "rs_pass"
  database: "prod_db"
Snowflake (Named)¶
Named connections are defined through the Snowflake CLI connection configuration; see https://docs.snowflake.com/en/developer-guide/snowflake-cli/connecting/configure-connections for details.
target_connection:
  mode: name
  name: "my_snowflake_connection"
Snowflake (Default)¶
target_connection:
  mode: default
Validation Levels¶
| Level | Type | Description | Cost |
|---|---|---|---|
| 1 | Schema | Column names, types, nullability | Low |
| 2 | Metrics | Row counts, distinct values, min/max, avg | Medium |
| 3 | Row | Hash-based row comparison | High |
Common CLI Options¶
| Option | Short | Description | Default |
|---|---|---|---|
| --data-validation-config-file | | Config file path | Required |
| --log-level | | Log level | INFO |
| --templates-directory | | Template output dir | Current dir |
| | | Include query templates | false |
| | | Results directory | From config |
Log Levels¶
DEBUG: Detailed diagnostic information
INFO: General informational messages
WARNING: Warning messages
ERROR: Error messages
CRITICAL: Critical errors
Configuration Field Reference¶
Required Fields¶
source_platform
target_platform
output_directory_path
source_connection
target_connection
tables
Optional Fields¶
max_threads (default: "auto")
target_database (required for Teradata)
validation_configuration
comparison_configuration
logging_configuration
database_mappings
schema_mappings
Table Configuration Fields¶
| Field | Required | Type | Description |
|---|---|---|---|
| fully_qualified_name | ✓ | String | Full table identifier |
| use_column_selection_as_exclude_list | ✓ | Boolean | Include/exclude mode |
| column_selection_list | ✓ | List | Columns to include/exclude |
| index_column_list | | List | Primary key columns |
| where_clause | | String | Source filter |
| target_where_clause | | String | Target filter |
| chunk_number | | Integer | Number of chunks (0=off) |
| max_failed_rows_number | | Integer | Max failures to report |
| column_mappings | | Dict | Source→Target mappings |
| | | Boolean | Case-sensitive matching |
Performance Tips¶
For Large Tables¶
Enable chunking:
chunk_number: 100
Increase threads:
max_threads: 32
Filter data:
where_clause: "date >= '2024-01-01'"
Exclude large columns:
use_column_selection_as_exclude_list: true
column_selection_list:
  - large_blob
  - large_text
Skip row validation initially:
validation_configuration:
  schema_validation: true
  metrics_validation: true
  row_validation: false # Enable after initial validation
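Combining these tips, a performance-oriented entry for a large table could look like this (table and column names are illustrative):
- fully_qualified_name: db.schema.large_table
  use_column_selection_as_exclude_list: true
  column_selection_list:
    - large_blob
    - large_text
  index_column_list:
    - primary_key
  where_clause: "date >= '2024-01-01'"
  target_where_clause: "date >= '2024-01-01'"
  chunk_number: 100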
Common Issues¶
Connection Failed¶
# SQL Server SSL issues
trust_server_certificate: "yes"
encrypt: "optional"
Out of Memory¶
# Reduce parallelism
max_threads: 4
# Enable chunking
chunk_number: 50
Tolerance for Numerical Differences¶
# Increase tolerance
comparison_configuration:
  tolerance: 0.05 # 5%
YAML Syntax Errors¶
Use spaces, not tabs
Quote values containing special characters: password: "p@ssw0rd!" or "table #1"
If a string value starts or ends with a double quote, escape the double quotes
Escape single quotes in SQL strings by doubling them:
name = 'O''Brien'
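Applied to a table filter, the doubled single quote looks like this (the column name is illustrative):
where_clause: "last_name = 'O''Brien'"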
Asynchronous Workflow¶
The asynchronous workflow allows you to decouple script generation from execution, which is useful when you need to run validation queries manually on the source database or when you have restricted access that requires scheduled execution.
Generate the validation scripts:
sdv sqlserver generate-validation-scripts --data-validation-config-file config.yaml
Execute the generated scripts manually on your source database and save the results to CSV files in the output directory.
Run the async validation to compare the saved results:
sdv sqlserver run-async-validation --data-validation-config-file config.yaml
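As one illustration of the manual step on SQL Server, a generated script can be run with sqlcmd and its output captured as CSV; the script and output file names below are placeholders that depend on what generate-validation-scripts produced:
# Illustrative only; adjust server, credentials, and file names
sqlcmd -S sqlserver.company.com -d prod_db -U sql_user -P sql_pass \
  -i ./output/validation_script.sql \
  -o ./output/validation_results.csv -s "," -W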
Example Workflows¶
Basic Validation¶
Get the configuration templates:
sdv sqlserver get-configuration-files
Edit the generated config.yaml file to configure your source and target connections, validation settings, and tables.
Run the validation:
sdv sqlserver run-validation --data-validation-config-file config.yaml
Interactive Setup¶
Generate a configuration file interactively by answering prompts:
sdv sqlserver auto-generated-configuration-file
Run the validation using the generated configuration:
sdv sqlserver run-validation --data-validation-config-file generated_config.yaml
Debug Mode¶
To troubleshoot issues or get detailed execution information, run validation with debug logging:
sdv sqlserver run-validation \
--data-validation-config-file config.yaml \
--log-level DEBUG
Large Table Partitioning¶
For validating very large tables, use the partitioning helper to divide tables into smaller segments:
Create a configuration file with your table definitions
Run the partitioning helper:
sdv sqlserver table-partitioning-helper
Follow the prompts to specify partition columns and counts
Run validation with the partitioned configuration
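Conceptually, the partitioned configuration splits one table into several entries with non-overlapping filters that validate independently; a sketch with illustrative values (the helper's actual output may differ):
- fully_qualified_name: db.schema.big_table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  where_clause: "id >= 1 AND id < 1000000"
  target_where_clause: "id >= 1 AND id < 1000000"
- fully_qualified_name: db.schema.big_table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  where_clause: "id >= 1000000 AND id < 2000000"
  target_where_clause: "id >= 1000000 AND id < 2000000"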
Output Files¶
Generated in output_directory_path:
Validation reports: Schema, metrics, row comparison results
Log files: data_validation_YYYY-MM-DD_HH-MM-SS.log
Difference files: differencesL1.csv, differencesL2.csv
Generated scripts: produced when using generate-validation-scripts
Environment Variables¶
For Snowflake connections using default mode, configure:
export SNOWFLAKE_ACCOUNT="account_name"
export SNOWFLAKE_USER="username"
export SNOWFLAKE_PASSWORD="password"
export SNOWFLAKE_DATABASE="database"
export SNOWFLAKE_SCHEMA="schema"
export SNOWFLAKE_WAREHOUSE="warehouse"
export SNOWFLAKE_ROLE="role"
Help Commands¶
# Main help
sdv --help
# Dialect-specific help
sdv sqlserver --help
sdv teradata --help
sdv redshift --help
# Command-specific help
sdv sqlserver run-validation --help
sdv sqlserver generate-validation-scripts --help
sdv sqlserver table-partitioning-helper --help
Resources¶
Full Documentation: CLI_USAGE_GUIDE.md