Data Lineage in Snowsight¶
Snowflake tracks how data flows from source to target objects, for example from a table to a view, and lets you see where the data in an object came from or where it goes. This information is called data lineage, and it helps you understand the relationships between your Snowflake objects.
Data lineage captures two types of relationship:
Data movement, such as when data is copied or materialized from one object to another. For example, CREATE TABLE AS SELECT (CTAS), INSERT, or MERGE operations on tables result in data movement.
Object dependencies, when an object references a base object but does not materialize or copy data, such as when a view references a table.
You can use the Lineage tab in Snowsight to view the lineage of objects and columns. Snowflake also provides a SQL function for retrieving lineage information programmatically. A related feature, ML Lineage, provides lineage information for machine learning models and related objects.
Snowflake data lineage provides these benefits:
Provides impact analysis by understanding the relationship between different objects.
Enhances monitoring and troubleshooting by viewing data movement lineage and object dependencies.
Facilitates compliance by tracking the flow of sensitive data.
Enhances trust in the data by understanding the source and target objects and columns.
Allows administration for viewing lineage to be delegated. For more information, see Access lineage information in Snowsight.
This topic provides information on how to use Snowsight to view object and column lineage, and set tags on columns that appear in either a downstream or upstream table.
About data lineage in Snowsight¶
You can use the Lineage tab in Snowsight to understand the relationships of an object in terms of source and target objects. In lineage terminology, the source object is “upstream” of the target object, and the target object is “downstream” of the source object. Snowsight reveals objects incrementally, one step at a time upstream or downstream from your selection.
For example, in this SQL statement:
CREATE TABLE table2 AS SELECT col1 FROM table1;
table2
is the target table, and is downstream of the source table, table1
. Column col1
, which originates
in table table1
, is included in table table2
; this is also a downstream lineage relationship. If you view the
details of table table1
in Snowsight, the Lineage tab displays an arrow pointing from table1
to
table2
to indicate the downstream lineage relationship. If you instead start at table table2
, an arrow points from
table2
upstream to table1
. You can also see the similar relationships for the column col1
in both tables.
Using this information, you can understand how sensitive data flows from source table to the target table and develop strategies to protect sensitive columns, for example with a masking policy, and tag the sensitive columns.
Supported operations¶
Snowflake tracks data lineage for these operations on tables and table-like objects, including dynamic tables, external tables, views, and materialized views:
CREATE TABLE … AS SELECT (CTAS)
UPDATE, for example:
UPDATE mydb.schema1.table1 FROM mydb.schema2.table2 SET table1.col1 = table2.col1;
Supported objects¶
Data lineage supports these objects:
Table
Dynamic table
External table
View
Materialized view
Stage
Feature view
Dataset
Model
Data lineage is supported between tables and table-like objects, including dynamic tables, external tables, views, and materialized views.
Column lineage is supported between columns in tables and table-like objects. You can select a table and a column in it to view downstream column lineage, which shows the other tables in which the selected column appears.
Additionally, you can view tag and masking policy associations if you are using a role that has privileges for managing tags and masking policies.
ML Lineage¶
ML Lineage supports the following types of objects:
Feature Views (which is actually a dynamic table or a model inside Snowflake)
Renamed and deleted table nodes¶
Renamed and deleted tables are shown in the graph; deleted tables are shown in gray.
Lineage retention and backfilling¶
The Lineage graph provides lineage information in real time; tracked operations appear without delay in the Snowsight UI. Historical information is retained as follows:
Column lineage is retained for one year.
Object lineage is retained for one year.
Lineage for deleted objects is retained for 14 days.
For this preview, dependencies that do not involve data movement are backfilled up to the retention limit from your Snowflake activity history. For example, if a view was created from a table six months ago, this dependency is visible as a downstream lineage relationship from the table, and an upstream relationship from the view. Table-to-table (data movement) relationships that happened prior to this preview (which launched in November, 2024) might not be captured.
Note
A few accounts might not see full historical lineage data immediately at the launch of this preview. Lineage data is still being backfilled for these accounts and should be available within the next few weeks.
Limitations¶
The Lineage tab is not available for the following kinds of objects:
Objects in a shared database.
Objects in the shared SNOWFLAKE database.
Objects in the INFORMATION_SCHEMA of a database.
Dynamic tables appear in the lineage graph for other objects, but the Lineage tab does not appear for dynamic tables themselves.
Query information is available only for data movement relationships, not for object dependencies. Only a user with the ACCOUNTADMIN role can view query information. Query information is not available to other roles.
Access lineage information in Snowsight¶
To access the Lineage tab and view all of an object’s upstream and downstream lineage objects and dependencies, use a role with the following privileges:
Any privilege on the objects for which you want to evaluate the lineage, such as SELECT on a table.
USAGE on the database and schema that contains the object.
VIEW LINEAGE on the account.
The following example creates a custom role and grants the necessary privileges to it:
USE ROLE ACCOUNTADMIN;
CREATE ROLE lineage_role;
GRANT VIEW LINEAGE ON ACCOUNT TO ROLE lineage_role;
GRANT SELECT ON mydb.sch.mytable TO ROLE lineage_role;
If you do not have privileges on an upstream or downstream object in the lineage graph, the object appears gray with a message stating that you have insufficient privileges to view the object. Ask your Snowflake account administrator to grant you privileges on the object. If necessary, also request the global VIEW LINEAGE privilege.
The gray object does not imply a terminal node in the lineage graph; it merely indicates that you cannot view lieage any further upstream or downstream from that point because you don’t have the privileges to retrieve that object’s lineage. This behavior also applies to objects and columns protected by other access policies.
Monitor data lineage and tag columns¶
Note
Tagging requires Snowflake Enterprise Edition.
To monitor the data lineage for tables and tag sensitive columns, do the following:
Use a role that has the necessary privileges to view the upstream and downstream lineage of an object.
In Snowsight, navigate to Data » Databases.
Select a database, schema, and a supported object.
Select the Lineage tab.
Select a warehouse if a warehouse is not already in use. Generating a lineage graph requires a warehouse.
The lineage graph begins with the object and one level of downstream and upstream objects, as applicable. An arrow points from left to right to indicate that a column in a target table includes a column from a source table. You can select a table and view its columns.
If a masking policy is set on a column and there is an error with the policy, you can finish the tag assignments in this procedure and switch to the Columns tab (for the table or view) to edit the masking policy assignment. If you hover over the Policy Error, an explanation of the error appears.
To see more details about a specific table column in the lineage graph, select a table. In the details panel that appears, hover over a column and select View Lineage or View Dependencies based on your graph selection.
If your table has many columns, you can use the search filter to locate the column.
In the View Column Lineage dialog, select Downstream Lineage or Upstream Lineage.
Evaluate the column and table details, as well as the tags assigned to the column.
The tags on the column are associated with the source column and whether or not the target column has tags.
If a table includes a column that should not be there, modify the SQL statement that creates the table.
If a table has a tag assigned to a column that should not have that tag, update the SQL statement that assigns the tag to the column or use Snowsight to update the tag assignment.
If you are using the ACCOUNTADMIN role or a role that is granted the global APPLY TAG privilege, the Review Columns and Apply Tags banner appears. To apply tags to columns, do the following:
In the banner, select Review Columns and Apply Tags.
Changes that have not yet been applied are highlighted with different colors and patterns:
New tags to set on the column have a dashed blue border.
Tag values to update have a yellow border.
If tags are not already set on the column based on the table you selected in the graph, the banner does not appear.
Select Apply Tags to apply the tags and update the tag values in the columns.
If any tags cannot be set on the columns, an error message displays the number of columns for tags that could not be set. Select Review Columns and Apply Tags to repeat the previous step to assign tags.
After you assign the tags to the columns, the banner disappears.