Snowflake Data Validation CLI - Quick Reference¶
This quick reference guide provides a condensed overview of commands, configuration options, and common usage patterns for the Snowflake Data Validation CLI tool, designed for easy lookup during validation tasks.
Installation¶
Prerequisites¶
Before running the commands below, ensure that Python 3.10 or later and pip are installed on your system.
Command Structure¶
Dialects: sqlserver | teradata | redshift | snowflake
Common Commands¶
run-validation¶
generate-validation-scripts¶
run-async-validation¶
get-configuration-files¶
auto-generated-configuration-file¶
row-partitioning-helper¶
Interactive command to partition large tables by rows for more efficient validation.
column-partitioning-helper¶
Interactive command to partition wide tables by columns for more efficient validation.
Configuration Template¶
This template provides the core structure for configuring data validation jobs, defining source and target connections, validation rules, and table-specific settings that control how data is compared between your source database and Snowflake.
Table Configuration Examples¶
Include All Columns¶
Include Specific Columns¶
Exclude Specific Columns¶
With Filtering¶
With Column Mappings¶
Large Table with Chunking¶
View Configuration Examples¶
Views are validated similarly to tables but are configured in a separate views: section. View validation creates temporary tables internally to materialize view schema for comparison.
Basic View Validation¶
View with Column Selection¶
View with Filtering¶
View with Column Mappings¶
Combined Tables and Views Configuration¶
Connection Examples¶
SQL Server¶
Teradata¶
Redshift¶
Snowflake Target (Named) (See more info here: https://docs.snowflake.com/en/developer-guide/snowflake-cli/connecting/configure-connections)¶
Snowflake Target (Default)¶
Snowflake Source (for Snowflake-to-Snowflake validation)¶
Validation Levels¶
Level |
Type |
Description |
Cost |
|---|---|---|---|
1 |
Schema |
Column names, types, nullability |
Low |
2 |
Metrics |
Row counts, distinct values, min/max, avg |
Medium |
3 |
Row |
Hash-based row comparison |
High |
Common CLI Options¶
Option |
Short |
Description |
Default |
|---|---|---|---|
|
|
Config file path |
Required |
|
|
Log level |
INFO |
|
|
Template output dir |
Current dir |
|
Include query templates |
false |
|
|
Results directory |
From config |
Log Levels¶
DEBUG: Detailed diagnostic information
INFO: General informational messages
WARNING: Warning messages
ERROR: Error messages
CRITICAL: Critical errors
Configuration Field Reference¶
Required Fields¶
source_platformtarget_platformoutput_directory_pathsource_connectiontarget_connectiontablesorviews(at least one must have entries)
Optional Fields¶
max_threads(default: “auto”)target_database(required for Teradata)validation_configurationcomparison_configurationlogging_configurationdatabase_mappingsschema_mappings
Table Configuration Fields¶
Field |
Required |
Type |
Description |
|---|---|---|---|
|
✓ |
String |
Full table identifier |
|
✓ |
Boolean |
Include/exclude mode |
|
✓ |
List |
Columns to include/exclude |
|
List |
Primary key columns |
|
|
String |
Source filter |
|
|
String |
Target filter |
|
|
Integer |
Number of chunks (0=off) |
|
|
Integer |
Max failures to report |
|
|
Dict |
Source→Target mappings |
|
|
Boolean |
Case-sensitive matching |
View Configuration Fields¶
View configuration uses the same fields as table configuration. Views are defined in a separate views: section.
Field |
Required |
Type |
Description |
|---|---|---|---|
|
✓ |
String |
Full view identifier |
|
✓ |
Boolean |
Include/exclude mode |
|
✓ |
List |
Columns to include/exclude |
|
List |
Index columns for row validation |
|
|
String |
Source filter |
|
|
String |
Target filter |
|
|
Integer |
Number of chunks (0=off) |
|
|
Integer |
Max failures to report |
|
|
Dict |
Source→Target mappings |
|
|
Boolean |
Case-sensitive matching |
|
|
String |
Override target database |
|
|
String |
Override target schema |
|
|
String |
Override target view name |
Note: Views use temporary tables internally to materialize the schema of the view for validation.
Performance Tips¶
For Large Tables¶
Enable chunking:
Increase threads:
Filter data:
Exclude large columns:
Skip row validation initially:
Common Issues¶
Connection Failed¶
Out of Memory¶
Tolerance for Numerical Differences¶
YAML Syntax Errors¶
Use spaces, not tabs
Quote special characters in YAML: :code:
password: "p@ssw0rd!"If a string value starts or ends with a double quote, escape the double quotes “table #1”
Escape quotes:
name = 'O''Brien'
Asynchronous Workflow¶
The asynchronous workflow allows you to decouple script generation from execution, which is useful when you need to run validation queries manually on the source database or when you have restricted access that requires scheduled execution.
Generate the validation scripts:
Execute the generated scripts manually on your source database and save the results to CSV files in the output directory.
Run the async validation to compare the saved results:
Example Workflows¶
Basic Validation¶
Get the configuration templates:
Edit the generated
config.yamlfile to configure your source and target connections, validation settings, and tables.Run the validation:
Interactive Setup¶
Generate a configuration file interactively by answering prompts:
Run the validation using the generated configuration:
Debug Mode¶
To troubleshoot issues or get detailed execution information, run validation with debug logging:
Large Table Partitioning¶
For validating very large tables, use the partitioning helper to divide tables into smaller segments:
Create a configuration file with your table definitions
Run the partitioning helper:
Follow the prompts to specify partition columns and counts
Run validation with the partitioned configuration
Output Files¶
Generated in output_directory_path:
Validation reports: Schema, metrics, row comparison results
Log files:
data_validation_YYYY-MM-DD_HH-MM-SS.logDifference files:
differencesL1.csv,differencesL2.csvGenerated scripts: (when using
generate-validation-scripts)
Environment Variables¶
For Snowflake connections using default mode, configure:
Help Commands¶
Resources¶
Full Documentation: CLI_USAGE_GUIDE.md
SQL Server Commands: sqlserver_commands.md
Teradata Commands: teradata_commands.md
Redshift Commands: redshift_commands.md
Snowflake Commands: snowflake_commands.md
Configuration Examples: CONFIGURATION_EXAMPLES.md