Data Lineage¶
Snowflake tracks how data flows from source to target objects, for example from a table to a view, and lets you see where the data in an object came from or where it goes. This information is called data lineage, and it helps you understand the relationships between your Snowflake objects.
Data lineage captures two types of relationship:
Data movement, such as when data is copied or materialized from one object to another. For example, CREATE TABLE AS SELECT (CTAS), INSERT, or MERGE operations on tables result in data movement.
Object dependencies, when an object references a base object but does not materialize or copy data, such as when a view references a table.
Snowflake data lineage provides these benefits:
Provides impact analysis by understanding the relationship between different objects.
Enhances monitoring and troubleshooting by viewing data movement lineage and object dependencies.
Facilitates compliance by tracking the flow of sensitive data.
Helps you work with tags and masking policies on columns to protect sensitive data.
Enhances trust in the data by understanding the source and target objects and columns.
Allows administration for viewing lineage to be delegated. For more information, see Access control for lineage information.
About upstream and downstream relationships¶
Data lineage helps you understand the relationships of an object in terms of source and target objects. In lineage terminology, the source object is “upstream” of the target object, and the target object is “downstream” of the source object. Snowsight reveals objects incrementally, one step at a time upstream or downstream from your selection.
For example, in this SQL statement:
CREATE TABLE table2 AS SELECT col1 FROM table1;
table2
is the target table, and is downstream of the source table, table1
. Column col1
, which originates
in table table1
, is included in table table2
; this is also a downstream lineage relationship.
If you view the details of table table1
in Snowsight, the Lineage tab displays an arrow pointing from table1
to
table2
to indicate the downstream lineage relationship. If you instead start at table table2
, an arrow points from
table2
upstream to table1
.
Get started¶
To start using data lineage in Snowsight, do the following:
Sign in to Snowsight with the necessary privileges.
Use the Data » Databases menu to select a supported object such as a table or view.
Select the Lineage tab.
Basic actions on the Lineage tab include the following:
A. Select an object to show additional details about it, including columns and tags on those columns.
B. Select +/- to show or hide objects that are further upstream or downstream.
C. Select the arrow that connects two objects to show information about how the downstream object was created (for example, the SQL statement that created an object). Your access control privileges determine what information appears.
D. Opens a new lineage diagram that focuses on the lineage of the selected object.
To learn about using the Lineage tab to perform other actions, see the following:
Column lineage¶
You can use Snowsight to trace the relationship between columns in a source object and columns in a target object. For a given column, you can determine all upstream and downstream columns that share lineage with the column.
To determine the lineage of a column:
Sign in to Snowsight with the necessary privileges.
Use the Data » Databases menu to select a supported object such as a table or view.
Select the Lineage tab.
Select the object that contains the column you are interested in tracing. A side panel opens.
Hover over the column name in the side panel, and select View Lineage.
Select Upstream Lineage or Downstream Lineage to list the columns in upstream or downstream objects.
You can use the Distance column to determine how far away a column is in the lineage. For example, if the downstream distance is 1, then the column is in an object that was created directly from the current object. If the downstream distance is 2, then the column exists in an object that was created from an object that was created from the current object.
Identify masking policies¶
Sign in to Snowsight with the necessary privileges.
Use the Data » Databases menu to select a supported object such as a table or view.
Select the Lineage tab.
Select the object you are interested in. A side panel opens.
To view the masking policy on a column of the object, find the column in the Columns section. If the column is protected by a masking policy, a symbol appears next to the column name. Hover over the symbol to see the masking policy name and details.
If there’s a problem with the masking policy, for example there are multiple masking policies assigned to the same column, Policy Error appears instead of the mask symbol. If you hover over Policy Error, an explanation of the error appears. For additional help identifying why the error might have occurred, see Tag and policy discovery and Troubleshoot tag-based masking policies.
Retrieve lineage programmatically¶
You can use the GET_LINEAGE (SNOWFLAKE.CORE) function to retrieve lineage information programmatically. This function returns a subset of the information provided by the Lineage tab in Snowsight.
Supported operations¶
The following operations create upstream and downstream relationships between a source object and a target object:
CREATE TABLE … AS SELECT (CTAS)
UPDATE, for example:
UPDATE mydb.schema1.table1 FROM mydb.schema2.table2 SET table1.col1 = table2.col1;
Supported objects¶
Data lineage supports data movement and dependency between table-like objects. A “table-like” object is any object that can be queried like a table, including tables (nothing is more table-like than a table). Table-like objects include:
Tables
Dynamic tables
External tables
Iceberg tables
Views
Materialized views
Stages can also participate in data lineage relationships, as can the following machine learning objects.
Feature Views (which are actually a dynamic tables or views inside Snowflake)
Column lineage is supported between columns in any two table-like objects. You can, for example, select a column in a table to view downstream column lineage, which shows the other table-like objects where that column appears.
Additionally, you can see tag and masking policy associations if you are using a role that has privileges for managing tags and masking policies.
ML Lineage¶
ML Lineage specifically supports machine learning relationships, which focus on how data is used and transformed in machine learning workflows, rather than on simpler movement or dependency relationships. Relationships between the following types of objects are supported:
Feature Views (which is actually a dynamic table or a view inside Snowflake)
Access control for lineage information¶
A role with the following privileges can access the Lineage tab and view an object’s upstream and downstream lineage objects and dependencies:
VIEW LINEAGE on the account.
Any privilege on the objects for which you want to evaluate the lineage, such as SELECT on a table.
USAGE on the database and schema that contains the object.
Tip
You can configure a role to view all objects in the lineage, even if it does not have privileges on the objects, database, or schema.
Simply grant the role the RESOLVE ALL privilege on the account, for example, GRANT RESOLVE ALL ON ACCOUNT TO ROLE lineage_role;
. The
role still requires the VIEW LINEAGE privilege.
The following example creates a custom role and grants the necessary privileges to it:
USE ROLE ACCOUNTADMIN;
CREATE ROLE lineage_role;
GRANT VIEW LINEAGE ON ACCOUNT TO ROLE lineage_role;
GRANT SELECT ON mydb.sch.mytable TO ROLE lineage_role;
If you do not have privileges on an upstream or downstream object in the lineage graph, the object appears gray with a message stating that you have insufficient privileges to view the object. The gray object does not imply a terminal node in the lineage graph; it merely indicates that you cannot view lineage any further upstream or downstream from that point because you don’t have the privileges to retrieve that object’s lineage. This behavior also applies to objects and columns protected by other access policies.
Renamed and deleted table nodes¶
Renamed tables are shown in the graph, but deleted tables are not shown.
Lineage retention¶
The Lineage graph provides lineage information in real time; tracked operations appear without delay in the Snowsight UI. Historical information is retained as follows:
Column lineage is retained for one year.
Object lineage is retained for one year.
Limitations¶
The Lineage tab is not available for the following kinds of objects:
Objects in a shared database.
Objects in the shared SNOWFLAKE database.
Objects in the INFORMATION_SCHEMA of a database.
Dynamic tables appear in the lineage graph for other objects, but the Lineage tab does not appear for dynamic tables themselves.