Snowflake Data Validation CLI - Quick Reference¶

This quick reference guide provides a condensed overview of commands, configuration options, and common usage patterns for the Snowflake Data Validation CLI tool, designed for easy lookup during validation tasks.


Installation¶

Prerequisites¶

Before running the commands below, ensure that Python 3.10 or later and pip are installed on your system.
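
To confirm the prerequisites, you can check the installed versions:

python --version  # should report Python 3.10 or later
pip --version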

# Base installation
pip install snowflake-data-validation

# With source-specific drivers
pip install "snowflake-data-validation[sqlserver]"
pip install "snowflake-data-validation[teradata]"
pip install "snowflake-data-validation[redshift]"

Command Structure¶

snowflake-data-validation <dialect> <command> [options]
# or
sdv <dialect> <command> [options]

Dialects: sqlserver | teradata | redshift


Common Commands¶

run-validation¶

# SQL Server
sdv sqlserver run-validation --data-validation-config-file config.yaml

# Teradata  
sdv teradata run-validation --data-validation-config-file config.yaml

# Redshift
sdv redshift run-validation --data-validation-config-file config.yaml

generate-validation-scripts¶

sdv <dialect> generate-validation-scripts --data-validation-config-file config.yaml

run-async-validation¶

sdv <dialect> run-async-validation --data-validation-config-file config.yaml

get-configuration-files¶

sdv <dialect> get-configuration-files --templates-directory ./templates

auto-generated-configuration-file¶

sdv <dialect> auto-generated-configuration-file

table-partitioning-helper¶

sdv <dialect> table-partitioning-helper

Interactive command to partition large tables for more efficient validation.
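
The exact output depends on your answers to the prompts, but as a rough sketch the helper produces table entries that use the chunking fields covered later in this reference (the column name and chunk count below are placeholders):

- fully_qualified_name: db.schema.large_table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  index_column_list:
    - id
  chunk_number: 20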


Configuration Template¶

This template provides the core structure for configuring data validation jobs, defining source and target connections, validation rules, and table-specific settings that control how data is compared between your source database and Snowflake.

# GLOBAL
source_platform: SqlServer  # SqlServer | Teradata | Redshift
target_platform: Snowflake
output_directory_path: ./output
max_threads: auto  # "auto" or 1-32
target_database: teradataTargetDatabase # For Teradata sources only - specify target database

# SOURCE CONNECTION
source_connection:
  mode: credentials
  host: "hostname"
  port: 1433
  username: "user"
  password: "pass"
  database: "db"
  # SQL Server only:
  trust_server_certificate: "no"  # yes | no
  encrypt: "yes"  # yes | no | optional

# TARGET CONNECTION
target_connection:
  mode: name  # name | default
  name: "connection_name"  # if mode=name

# VALIDATION
validation_configuration:
  schema_validation: true
  metrics_validation: true
  row_validation: false
  max_failed_rows_number: 100
  exclude_metrics: false
  apply_metric_column_modifier: false

# COMPARISON
comparison_configuration:
  tolerance: 0.01  # 1% tolerance

# LOGGING (optional)
logging_configuration:
  level: INFO  # DEBUG | INFO | WARNING | ERROR | CRITICAL
  console_level: WARNING
  file_level: DEBUG

# MAPPINGS (optional)
database_mappings:
  source_db: target_db

schema_mappings:
  source_schema: target_schema

# TABLES
tables:
  - fully_qualified_name: database.schema.table1
    use_column_selection_as_exclude_list: false
    column_selection_list: []
    index_column_list: []
    where_clause: ""
    target_where_clause: ""
    chunk_number: 0
    column_mappings: {}

Table Configuration Examples¶

Include All Columns¶

- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list: []

Include Specific Columns¶

- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list:
    - column1
    - column2
    - column3

Exclude Specific Columns¶

- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: true
  column_selection_list:
    - audit_timestamp
    - internal_notes

With Filtering¶

- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  where_clause: "status = 'ACTIVE' AND created_date >= '2024-01-01'"
  target_where_clause: "status = 'ACTIVE' AND created_date >= '2024-01-01'"

With Column Mappings¶

- fully_qualified_name: db.schema.table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  column_mappings:
    source_col1: target_col1
    source_col2: target_col2

Large Table with Chunking¶

- fully_qualified_name: db.schema.large_table
  use_column_selection_as_exclude_list: false
  column_selection_list: []
  index_column_list:
    - primary_key
  chunk_number: 50
  max_failed_rows_number: 500

Connection Examples¶

SQL Server¶

source_connection:
  mode: credentials
  host: "sqlserver.company.com"
  port: 1433
  username: "sql_user"
  password: "sql_pass"
  database: "prod_db"
  trust_server_certificate: "no"
  encrypt: "yes"

Teradata¶

source_connection:
  mode: credentials
  host: "teradata.company.com"
  username: "td_user"
  password: "td_pass"
  database: "prod_db"

Redshift¶

source_connection:
  mode: credentials
  host: "cluster.region.redshift.amazonaws.com"
  port: 5439
  username: "rs_user"
  password: "rs_pass"
  database: "prod_db"

Snowflake (Named)¶

For more information on configuring named connections, see https://docs.snowflake.com/en/developer-guide/snowflake-cli/connecting/configure-connections.

target_connection:
  mode: name
  name: "my_snowflake_connection"

Snowflake (Default)¶

target_connection:
  mode: default

Validation Levels¶

| Level | Type    | Description                               | Cost   |
|-------|---------|-------------------------------------------|--------|
| 1     | Schema  | Column names, types, nullability          | Low    |
| 2     | Metrics | Row counts, distinct values, min/max, avg | Medium |
| 3     | Row     | Hash-based row comparison                 | High   |


Common CLI Options¶

| Option                         | Short | Description              | Default     |
|--------------------------------|-------|--------------------------|-------------|
| --data-validation-config-file  | -dvf  | Config file path         | Required    |
| --log-level                    | -ll   | Log level                | INFO        |
| --templates-directory          | -td   | Template output dir      | Current dir |
| --query-templates              |       | Include query templates  | false       |
| --output-directory             |       | Results directory        | From config |
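
For example, to download configuration templates (including query templates) into a custom directory, assuming both template options apply to get-configuration-files as their descriptions suggest:

sdv sqlserver get-configuration-files -td ./templates --query-templates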


Log Levels¶

  • DEBUG: Detailed diagnostic information

  • INFO: General informational messages

  • WARNING: Non-fatal warnings about potential problems

  • ERROR: Errors encountered during the run

  • CRITICAL: Critical failures


Configuration Field Reference¶

Required Fields¶

  • source_platform

  • target_platform

  • output_directory_path

  • source_connection

  • target_connection

  • tables
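
A minimal configuration containing only the required fields could look like this sketch (all values are placeholders; see the full template above for the optional settings):

source_platform: SqlServer
target_platform: Snowflake
output_directory_path: ./output
source_connection:
  mode: credentials
  host: "hostname"
  port: 1433
  username: "user"
  password: "pass"
  database: "db"
target_connection:
  mode: default
tables:
  - fully_qualified_name: db.schema.table1
    use_column_selection_as_exclude_list: false
    column_selection_list: []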

Optional Fields¶

  • max_threads (default: "auto")

  • target_database (required for Teradata)

  • validation_configuration

  • comparison_configuration

  • logging_configuration

  • database_mappings

  • schema_mappings


Table Configuration Fields¶

| Field                                | Required | Type    | Description                |
|--------------------------------------|----------|---------|----------------------------|
| fully_qualified_name                 | ✓        | String  | Full table identifier      |
| use_column_selection_as_exclude_list | ✓        | Boolean | Include/exclude mode       |
| column_selection_list                | ✓        | List    | Columns to include/exclude |
| index_column_list                    |          | List    | Primary key columns        |
| where_clause                         |          | String  | Source filter              |
| target_where_clause                  |          | String  | Target filter              |
| chunk_number                         |          | Integer | Number of chunks (0 = off) |
| max_failed_rows_number               |          | Integer | Max failures to report     |
| column_mappings                      |          | Dict    | Source→Target mappings     |
| is_case_sensitive                    |          | Boolean | Case-sensitive matching    |


Performance Tips¶

For Large Tables¶

  1. Enable chunking:

    chunk_number: 100
    
  2. Increase threads:

    max_threads: 32
    
  3. Filter data:

    where_clause: "date >= '2024-01-01'"
    
  4. Exclude large columns:

    use_column_selection_as_exclude_list: true
    column_selection_list:
      - large_blob
      - large_text
    
  5. Skip row validation initially:

    validation_configuration:
      schema_validation: true
      metrics_validation: true
      row_validation: false  # Enable after initial validation
    

Common Issues¶

Connection Failed¶

# SQL Server SSL issues (these settings live under source_connection)
trust_server_certificate: "yes"
encrypt: "optional"

Out of Memory¶

# Reduce parallelism (global setting)
max_threads: 4

# Enable chunking (per-table setting)
chunk_number: 50

Tolerance for Numerical Differences¶

# Increase tolerance
comparison_configuration:
  tolerance: 0.05  # 5%
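
Assuming the tolerance is applied as a relative difference, a setting of 0.05 treats a source value of 1,000 and a target value of 1,040 (4% apart) as equal, while a target value of 1,060 (6% apart) is still reported as a difference.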

YAML Syntax Errors¶

  • Use spaces for indentation, not tabs

  • Quote values that contain special characters: password: "p@ssw0rd!"

  • If a string value starts or ends with a double quote, escape the embedded double quotes: "\"table #1\""

  • Double up single quotes inside SQL filter strings: name = 'O''Brien'
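
Putting these rules together (field names are from the configuration template above; values are placeholders):

password: "p@ssw0rd!"                           # special characters inside quotes
fully_qualified_name: "db.schema.\"table #1\""  # embedded double quotes escaped
where_clause: "name = 'O''Brien'"               # single quotes doubled for SQL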


Asynchronous Workflow¶

The asynchronous workflow allows you to decouple script generation from execution, which is useful when you need to run validation queries manually on the source database or when you have restricted access that requires scheduled execution.

  1. Generate the validation scripts:

    sdv sqlserver generate-validation-scripts --data-validation-config-file config.yaml
    
  2. Execute the generated scripts manually on your source database and save the results to CSV files in the output directory; one illustrative approach is sketched after these steps.

  3. Run the async validation to compare the saved results:

    sdv sqlserver run-async-validation --data-validation-config-file config.yaml
    
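
One illustrative way to run a generated script against SQL Server and capture delimited output for step 2 (a sketch only: the file names here are placeholders, and the result format and naming must match what the generated scripts and the tool expect):

sqlcmd -S sqlserver.company.com -d prod_db \
  -i generated_script.sql \
  -o ./output/results.csv \
  -s "," -W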

Example Workflows¶

Basic Validation¶

  1. Get the configuration templates:

    sdv sqlserver get-configuration-files
    
  2. Edit the generated config.yaml file to configure your source and target connections, validation settings, and tables.

  3. Run the validation:

    sdv sqlserver run-validation --data-validation-config-file config.yaml
    

Interactive Setup¶

  1. Generate a configuration file interactively by answering prompts:

    sdv sqlserver auto-generated-configuration-file
    
  2. Run the validation using the generated configuration:

    sdv sqlserver run-validation --data-validation-config-file generated_config.yaml
    

Debug Mode¶

To troubleshoot issues or get detailed execution information, run validation with debug logging:

sdv sqlserver run-validation \
  --data-validation-config-file config.yaml \
  --log-level DEBUG

Large Table Partitioning¶

For validating very large tables, use the partitioning helper to divide tables into smaller segments:

  1. Create a configuration file with your table definitions

  2. Run the partitioning helper:

    sdv sqlserver table-partitioning-helper
    
  3. Follow the prompts to specify partition columns and counts

  4. Run validation with the partitioned configuration


Output Files¶

Generated in output_directory_path:

  • Validation reports: Schema, metrics, row comparison results

  • Log files: data_validation_YYYY-MM-DD_HH-MM-SS.log

  • Difference files: differencesL1.csv, differencesL2.csv

  • Generated scripts (when using generate-validation-scripts)


Environment Variables¶

For Snowflake connections using default mode, configure:

export SNOWFLAKE_ACCOUNT="account_name"
export SNOWFLAKE_USER="username"
export SNOWFLAKE_PASSWORD="password"
export SNOWFLAKE_DATABASE="database"
export SNOWFLAKE_SCHEMA="schema"
export SNOWFLAKE_WAREHOUSE="warehouse"
export SNOWFLAKE_ROLE="role"

Help Commands¶

# Main help
sdv --help

# Dialect-specific help
sdv sqlserver --help
sdv teradata --help
sdv redshift --help

# Command-specific help
sdv sqlserver run-validation --help
sdv sqlserver generate-validation-scripts --help
sdv sqlserver table-partitioning-helper --help

Resources¶