Amazon Redshift Commands Reference¶
Overview¶
This page provides comprehensive reference documentation for Amazon Redshift-specific commands in the Snowflake Data Validation CLI. For SQL Server commands, see SQL Server Commands Reference. For Teradata commands, see Teradata Commands Reference. For Snowflake-to-Snowflake commands, see Snowflake Commands Reference.
Command Structure¶
All Amazon Redshift commands follow this consistent structure:
Where <command> is one of:
run-validation- Run synchronous validationrun-async-validation- Run asynchronous validationgenerate-validation-scripts- Generate validation scriptsget-configuration-files- Get configuration templatesauto-generated-configuration-file- Interactive config generationrow-partitioning-helper- Interactive row partitioning configurationcolumn-partitioning-helper- Interactive column partitioning configuration
Run Synchronous Validation¶
Validates data between Amazon Redshift and Snowflake in real-time.
Syntax¶
Options¶
--data-validation-config-file, -dvf (required)
- Type: String (path)
- Description: Path to YAML configuration file containing validation settings
- Example:
--data-validation-config-file ./configs/redshift_validation.yaml
--log-level, -ll (optional)
- Type: String
- Valid Values: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Default: INFO
- Description: Logging level for validation execution
- Example:
--log-level DEBUG
Example Usage¶
Use Cases¶
- Real-time validation during Redshift migration
- Pre-cutover validation checks
- Post-migration verification
- Continuous validation in CI/CD pipelines
- Data lake migration validation
Run Asynchronous Validation¶
Performs validation using pre-generated metadata files without connecting to databases.
Syntax¶
Options¶
--data-validation-config-file, -dvf (required)
- Type: String (path)
- Description: Path to YAML configuration file
- Note: Configuration must specify paths to pre-generated metadata files
--log-level, -ll (optional)
- Type: String
- Valid Values: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Default: INFO
- Description: Logging level for validation execution
- Example:
--log-level DEBUG
Example Usage¶
Prerequisites¶
Before running async validation:
- Generate validation scripts using
generate-validation-scripts - Execute the generated scripts on Redshift and Snowflake databases
- Save results to CSV/metadata files
- Ensure metadata files are available in the configured paths
Use Cases¶
- Validating in environments with restricted database access
- Separating metadata extraction from validation
- Batch validation workflows
- Scheduled validation jobs
- When database connections are intermittent
Generate Validation Scripts¶
Generates SQL scripts for Redshift and Snowflake metadata extraction.
Syntax¶
Options¶
--data-validation-config-file, -dvf (required)
- Type: String (path)
- Description: Path to YAML configuration file
--log-level, -ll (optional)
- Type: String
- Valid Values: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Default: INFO
- Description: Logging level for script generation
- Example:
--log-level DEBUG
Example Usage¶
Output¶
The command generates SQL scripts in the output directory configured in your YAML file:
Use Cases¶
- Generating scripts for execution by DBAs
- Compliance requirements for query review
- Environments where direct CLI database access is restricted
- Manual execution and validation workflows
- Separating metadata extraction from validation
Get Configuration Templates¶
Retrieves Redshift configuration templates.
Syntax¶
Options¶
--templates-directory, -td (optional)
- Type: String (path)
- Default: Current directory
- Description: Directory to save template files
- Example:
--templates-directory ./templates
--query-templates (optional)
- Type: Flag (no value required)
- Description: Include J2 (Jinja2) query template files for advanced customization
- Example:
--query-templates
Example Usage¶
Output Files¶
Without --query-templates flag:
With --query-templates flag:
Use Cases¶
- Starting a new Redshift validation project
- Learning Redshift-specific configuration options
- Customizing validation queries for Redshift
- Creating organization-specific templates
Auto-Generate Configuration File¶
Interactive command for Redshift configuration generation.
Syntax¶
Options¶
This command has no command-line options. All input is provided through interactive prompts.
Interactive Prompts¶
The command will prompt for the following information:
-
Redshift host
- Hostname/endpoint of Redshift cluster
- Example:
redshift-cluster.region.redshift.amazonaws.com
-
Redshift port (default: 5439)
- Port number for Redshift connection
- Press Enter to accept default
-
Redshift username
- Authentication username
- Example:
migration_user
-
Redshift password
- Authentication password (hidden input)
- Not displayed on screen for security
-
Redshift database
- Name of the database to validate
- Example:
analytics_db
-
Redshift schema
- Schema name within the database
- Example:
public
-
Output directory path
- Where to save validation results
- Example:
./validation_results
Example Session¶
Generated Configuration¶
The command generates a basic YAML configuration file:
Next Steps After Generation¶
-
Edit the configuration file to add:
- Target connection details (if not using default)
- Tables to validate
- Validation options
- Column selections and mappings
-
Review security settings:
- Consider using environment variables for passwords
- Verify IAM authentication if applicable
-
Add table configurations:
- Specify fully qualified table names
- Configure column selections
- Set up filtering where clauses
-
Test the configuration:
Use Cases¶
- Quick setup for new Redshift users
- Generating baseline configurations
- Testing connectivity during setup
- Creating template configurations for teams
Row Partitioning Helper¶
Interactive command to generate partitioned table configurations for large tables. This helper divides tables into smaller row partitions based on a specified column, enabling more efficient validation of large datasets.
Syntax¶
Options¶
This command has no command-line options. All input is provided through interactive prompts.
How It Works¶
The table partitioning helper:
- Reads an existing configuration file with table definitions
- For each table, prompts whether to apply partitioning
- If partitioning is enabled, collects partition parameters
- Queries the source Redshift database to determine partition boundaries
- Generates new table configurations with
WHEREclauses for each partition - Saves the partitioned configuration to a new file
Interactive Prompts¶
The command will prompt for the following information:
-
Configuration file path
- Path to existing YAML configuration file
- Example:
./configs/redshift_validation.yaml
-
For each table in the configuration:
a. Apply partitioning? (yes/no)
- Whether to partition this specific table
- Default: yes
b. Partition column (if partitioning)
- Column name used to divide the table
- Should be indexed for performance
- Example:
transaction_id,created_date
c. Is partition column a string type? (yes/no)
- Determines quoting in generated WHERE clauses
- Default: no (numeric)
d. Number of partitions
- How many partitions to create
- Example:
10,50,100
Example Session¶
Generated Output¶
The command generates partitioned table configurations with WHERE clauses:
Use Cases¶
- Large table validation: Break multi-billion row tables into manageable chunks
- Parallel processing: Enable concurrent validation of different partitions
- Memory optimization: Reduce memory footprint by processing smaller data segments
- Incremental validation: Validate specific data ranges independently
- Performance tuning: Optimize validation for tables with uneven data distribution
Best Practices¶
-
Choose appropriate partition columns:
- Use indexed columns for better query performance
- Prefer columns with sequential values (IDs, timestamps)
- Avoid columns with highly skewed distributions
-
Determine optimal partition count:
- Consider table size and available resources
- Start with 10-20 partitions for tables with 10M+ rows
- Increase partitions for very large tables (100M+ rows)
-
String vs numeric columns:
- Numeric columns are generally more efficient
- String columns work but may have uneven distribution
-
After partitioning:
- Review generated WHERE clauses
- Adjust partition boundaries if needed
- Test with a subset before full validation
Column Partitioning Helper¶
Interactive command to generate partitioned table configurations for wide tables with many columns. This helper divides tables into smaller column partitions, enabling more efficient validation of tables with a large number of columns.
Syntax¶
Options¶
This command has no command-line options. All input is provided through interactive prompts.
How It Works¶
The column partitioning helper:
- Reads an existing configuration file with table definitions
- For each table, prompts whether to apply column partitioning
- If partitioning is enabled, collects the number of partitions
- Queries the source Redshift database to retrieve all column names for the table
- Divides the columns into the specified number of partitions
- Generates new table configurations where each partition validates only a subset of columns
- Saves the partitioned configuration to a new file
Interactive Prompts¶
The command will prompt for the following information:
-
Configuration file path
- Path to existing YAML configuration file
- Example:
./configs/redshift_validation.yaml
-
For each table in the configuration:
a. Apply column partitioning? (yes/no)
- Whether to partition this specific table by columns
- Default: yes
b. Number of partitions (if partitioning)
- How many column partitions to create
- Example:
3,5,10
Example Session¶
Generated Output¶
The command generates partitioned table configurations with column subsets:
Use Cases¶
- Wide table validation: Break tables with hundreds of columns into manageable chunks
- Memory optimization: Reduce memory footprint by validating fewer columns at a time
- Parallel processing: Enable concurrent validation of different column groups
- Targeted validation: Validate specific column groups independently
- Performance tuning: Optimize validation for tables with many LOB or complex columns
Best Practices¶
-
Determine optimal partition count:
- Consider the total number of columns in the table
- For tables with 50+ columns, start with 3-5 partitions
- For tables with 100+ columns, consider 5-10 partitions
-
Column ordering:
- Columns are divided alphabetically
- Related columns may end up in different partitions
-
After partitioning:
- Review generated column lists
- Verify all required columns are included
- Test with a subset before full validation
-
Combine with row partitioning:
- For very large, wide tables, consider using both row and column partitioning
- First partition by columns, then apply row partitioning to each column partition if needed
Amazon Redshift Connection Configuration¶
Redshift connections require specific configuration in the YAML file.
Connection Example¶
Connection Fields¶
mode (required)
- Type: String
- Valid Values:
credentials - Description: Connection mode for Redshift
host (required)
- Type: String
- Description: Redshift cluster endpoint
- Format:
<cluster-name>.<cluster-id>.<region>.redshift.amazonaws.com - Examples:
"redshift-cluster-1.abc123.us-east-1.redshift.amazonaws.com""analytics-cluster.xyz789.eu-west-1.redshift.amazonaws.com""data-warehouse.def456.ap-southeast-1.redshift.amazonaws.com"
port (required)
- Type: Integer
- Default: 5439
- Description: Redshift port number
- Note: Use the port configured for your Redshift cluster
username (required)
- Type: String
- Description: Redshift authentication username
- Example:
"migration_admin"
password (required)
- Type: String
- Description: Redshift authentication password
- Security Note: Consider using environment variables or IAM authentication
database (required)
- Type: String
- Description: Redshift database name
- Example:
"analytics_database"
Connection Examples¶
Production Connection:
Development Connection:
Data Lake Migration Connection:
Complete Amazon Redshift Examples¶
Example 1: Basic Redshift Configuration¶
Example 2: Redshift Data Lake Migration¶
Example 3: Redshift with Complex Filtering¶
Example 4: Redshift View Validation¶
Validate Amazon Redshift views alongside tables for comprehensive migration verification.
Note: View validation creates temporary tables internally to materialize view data for comparison between Amazon Redshift and Snowflake.
Troubleshooting Redshift Connections¶
Issue: Connection Timeout¶
Symptom:
Solutions:
-
Verify the cluster endpoint and port:
-
Check VPC security groups allow inbound connections on port 5439
-
Verify the cluster is publicly accessible (if connecting from outside VPC)
-
Check route tables and network ACLs
-
Verify the cluster is in “available” state in AWS console
Issue: Authentication Failed¶
Symptom:
Solutions:
-
Verify credentials are correct
-
Check user has necessary permissions:
-
Verify user account exists:
-
Check if password has expired or needs to be reset
Issue: Database Not Found¶
Symptom:
Solutions:
-
Verify database name is correct (case-sensitive)
-
List available databases:
-
Ensure user has access to the database
Issue: SSL/TLS Certificate Errors¶
Symptom:
Solutions:
-
Verify SSL is required for the cluster
-
Check AWS Redshift SSL/TLS settings
-
Ensure you’re using the correct endpoint (not VPC endpoint)
Issue: Network/VPC Configuration¶
Symptom:
Solutions:
-
Check cluster publicly accessible setting:
- In AWS Console, verify “Publicly accessible” is enabled if connecting externally
-
Verify VPC security group rules:
- Inbound rule: Type = Custom TCP, Port = 5439, Source = Your IP
-
Check VPC route table:
- Ensure proper routing to internet gateway (for public access)
-
Verify VPC Network ACLs:
- Allow inbound/outbound traffic on port 5439
Best Practices for Amazon Redshift¶
Security¶
-
Use IAM authentication when possible:
-
Store passwords securely:
-
Use read-only accounts:
-
Restrict VPC access:
- Configure security groups to allow access only from specific IPs
- Use VPC endpoints for internal AWS connectivity
Performance¶
-
Enable chunking for large tables:
-
Use WHERE clauses to filter data:
-
Optimize thread count:
-
Consider cluster size and workload:
- Run validations during off-peak hours
- Monitor cluster performance during validation
Data Quality¶
-
Handle distribution and sort keys:
- Be aware that Redshift distribution/sort keys may affect data ordering
- Use appropriate index columns that match distribution keys
-
Start with schema validation:
-
Progress to metrics validation:
-
Enable row validation selectively:
AWS-Specific Considerations¶
-
Monitor cluster performance:
- Use AWS CloudWatch metrics during validation
- Monitor query performance and WLM queues
-
Consider cluster maintenance windows:
- Avoid running validations during maintenance windows
- Check cluster status before starting validation
-
Use appropriate cluster endpoints:
- Use cluster endpoint for direct connections
- Use VPC endpoint for internal AWS connectivity
-
Handle AWS region-specific configurations: