DCM Projects for data pipelines¶
DCM Projects provide a full-lifecycle developer experience that includes capabilities tailored to managing data pipelines.
The pipeline-specific commands don’t apply to all object types. They extend the core commands for the following pipeline use cases:
REFRESH command for dynamic tables managed by a DCM project.
TEST command for data quality expectations attached to managed objects.
PREVIEW command for checking sample output from a dynamic table, view, or table before deploying.
REFRESH command for dynamic tables¶
After you deploy a pipeline definition change, you can refresh the dynamic tables inside the pipeline project before testing data quality expectations, so that any new transformation logic is applied end to end.
You can refresh all dynamic tables managed by the DCM project and their required upstream dynamic tables with one command. This command applies only to dynamic tables that are deployed and managed by the referenced project, independent of any definition files. Other object types, such as tasks, are not affected.
See TEST command for data quality expectations for usage examples that combine REFRESH and TEST.
The command runs until all dynamic table refreshes are complete and returns a summary of the row changes or errors for each dynamic table.
To run the REFRESH command:
The JSON output contains the results of the dynamic table refresh operation in the following format:
| Property | Description |
|---|---|
|  | Contains the results of the dynamic table refresh operation. |
|  | An array of entries, one for each dynamic table that was refreshed. |
|  | Fully qualified name of the dynamic table that was refreshed. |
|  | Refresh statistics for the table. |
|  | Number of rows inserted during the refresh. |
|  | Number of rows deleted during the refresh. |
|  | ISO 8601 timestamp representing the point-in-time freshness of the data after the refresh. |
An example of the JSON output for a dynamic table refresh:
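As a rough illustration of how such a refresh summary can be post-processed, the following minimal Python sketch aggregates row changes per table. The JSON shape and all property names (`refreshResult`, `tables`, `name`, `statistics`, `insertedRows`, `deletedRows`, `dataTimestamp`) are assumptions for illustration only; the actual output schema is defined by the REFRESH command.

```python
import json

# Hypothetical output shape; the real property names may differ.
sample = json.loads("""
{
  "refreshResult": {
    "tables": [
      {
        "name": "DB.SCHEMA.ORDERS_DT",
        "statistics": {"insertedRows": 120, "deletedRows": 5},
        "dataTimestamp": "2025-01-01T00:00:00Z"
      }
    ]
  }
}
""")

for entry in sample["refreshResult"]["tables"]:
    stats = entry["statistics"]
    net = stats["insertedRows"] - stats["deletedRows"]
    print(f'{entry["name"]}: +{stats["insertedRows"]} / -{stats["deletedRows"]} (net {net})')
```

A summary like this can feed a CI log line or a dashboard without re-querying the tables themselves.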
TEST command for data quality expectations¶
You can set data quality expectations as quality gates on all stages of your data transformation:
Attach expectations to raw data in your bronze layer landing tables to ensure your raw input meets expectations and does not cause errors during transformation.
Attach expectations as quality gates to your silver layer to make it easier to debug data issues by having checkpoints at different transformation stages.
Attach expectations to your gold layer to ensure the output quality of your data product.
Attach expectations from downstream consumers of your data product to your gold layer so you can validate those expectations before deploying breaking changes.
See Data metric function for how to attach expectations in DCM projects.
You can test all data quality expectations attached to tables, dynamic tables, or views that are managed by the DCM project with one command.
Data metric functions that are attached without expectations are not checked.
You can use the CLI commands to set up automated testing as part of your CI/CD workflow. For example, if you have production-like data on a QA, test, or staging environment, you can follow these steps:
PLAN against QA to verify the expected project definition changes.
DEPLOY to QA.
REFRESH ALL dynamic tables on QA to update data based on any new transformation logic and updated definitions, so that expectations are not tested against outdated data.
TEST ALL data quality expectations attached to table objects on the QA environment to verify that the newly deployed logic works as expected and has no negative side effects on the expected shape of your data output.
If all expectations are met on QA, continue with PLAN and DEPLOY to your production environment.
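The QA gating logic in the steps above can be sketched as follows. This is a minimal sketch of the control flow only; the step names (`plan-qa`, `deploy-qa`, `refresh-qa`, `test-qa`) and the `run` callback are hypothetical stand-ins for whatever CLI invocations your CI system performs.

```python
# Sketch of a QA promotion gate; step names are hypothetical stand-ins
# for the actual PLAN / DEPLOY / REFRESH / TEST invocations.
def gate(run):
    """Run the QA sequence in order; return True only if every step succeeds.

    `run` executes one named step and returns True on success.
    """
    steps = ["plan-qa", "deploy-qa", "refresh-qa", "test-qa"]
    for step in steps:
        if not run(step):
            return False  # stop before promoting to production
    return True

# Example with a fake runner in which the TEST step fails:
executed = []
def fake_run(step):
    executed.append(step)
    return step != "test-qa"

promote = gate(fake_run)
```

The ordering matters: refreshing before testing ensures expectations are evaluated against data produced by the newly deployed logic, not stale data.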
To run the TEST command:
The TEST output contains the overall status and expectations with their values in the following format:
Important
During the preview phase, the exact output format might change.
| Property | Description |
|---|---|
|  | Overall result of the test run. Possible values: … |
|  | An array of expectation results, one for each data quality expectation evaluated. |
|  | Fully qualified name of the table or view on which the expectation was evaluated. |
|  | Database that contains the data metric function. |
|  | Schema that contains the data metric function. |
|  | Name of the data metric function (for example, …). |
|  | Name of the expectation as defined in the project. |
|  | Boolean expression that the metric value is evaluated against (for example, …). |
|  | The result of the data metric function evaluation. Present only when … |
|  | Whether the expectation was violated. |
|  | An array of column names on which the data metric function was evaluated. |
An example of the JSON output for a data quality test:
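One way to consume a test report like this in CI is to fail the build whenever any expectation is violated. The Python sketch below does that; the JSON shape and all property names (`status`, `expectations`, `table`, `metricName`, `expectation`, `violated`) are hypothetical and chosen only to mirror the descriptions in the table above.

```python
import json

# Hypothetical output shape; the real property names may differ.
report = json.loads("""
{
  "status": "FAILED",
  "expectations": [
    {"table": "DB.SCH.ORDERS", "metricName": "NULL_COUNT",
     "expectation": "no_null_ids", "violated": false},
    {"table": "DB.SCH.ORDERS", "metricName": "ROW_COUNT",
     "expectation": "min_rows", "violated": true}
  ]
}
""")

# Collect only the violated expectations for the CI log.
violations = [e for e in report["expectations"] if e["violated"]]
for e in violations:
    print(f'{e["table"]}: expectation {e["expectation"]!r} violated')

# Non-zero exit code signals the CI system to stop the pipeline.
exit_code = 0 if report["status"] == "PASSED" and not violations else 1
```

Checking both the overall status and the per-expectation flags makes the gate robust if one of the two fields is missing or inconsistent.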
PREVIEW command¶
When you write or alter the SELECT statement of a dynamic table or view, a sample output helps validate the shape of the data. For complex lineage graphs with multiple transformation steps, you can check the output of a downstream view or dynamic table when making changes further upstream.
To validate that the transformation in your code results in the expected data output before deploying, run the PREVIEW command.
The PREVIEW command runs PLAN to compile the current definitions, independent of any deployed state, and then returns a data sample for a specified dynamic table, view, or regular table.
Keep the following requirements and considerations in mind:
The PREVIEW command must always reference a fully qualified name of a table object, without Jinja variables.
To see sample data in the output, you must ensure that data is already available in the source tables.
PREVIEW queries all SELECT statements of referenced dynamic tables and views, but it does not run tasks or CREATE TABLE AS SELECT statements.
To run the PREVIEW command: