Data Lineage¶
Snowflake tracks how data flows from source to target objects, for example from a table to a view, and lets you see where the data in an object came from or where it goes. This information is called data lineage, and it helps you understand the relationships between your Snowflake objects.
Data lineage captures two types of relationship:
- Data movement, such as when data is copied or materialized from one object to another. For example, CREATE TABLE AS SELECT (CTAS), INSERT, or MERGE operations on tables result in data movement. 
- Object dependencies, when an object references a base object but does not materialize or copy data, such as when a view references a table. 
Snowflake data lineage provides these benefits:
- Provides impact analysis by understanding the relationship between different objects. 
- Enhances monitoring and troubleshooting by viewing data movement lineage and object dependencies. 
- Facilitates compliance by tracking the flow of sensitive data. 
- Helps you work with tags and masking policies on columns to protect sensitive data. 
- Enhances trust in the data by understanding the source and target objects and columns. 
- Allows administration for viewing lineage to be delegated. For more information, see Access control for lineage information. 
About upstream and downstream relationships¶
Data lineage helps you understand the relationships of an object in terms of source and target objects. In lineage terminology, the source object is “upstream” of the target object, and the target object is “downstream” of the source object. Snowsight reveals objects incrementally, one step at a time upstream or downstream from your selection.
For example, in this SQL statement:
CREATE TABLE table2 AS SELECT col1 FROM table1;
table2 is the target table, and is downstream of the source table, table1. Column col1, which originates
in table table1, is included in table table2; this is also a downstream lineage relationship.
If you view the details of table table1 in Snowsight, the Lineage tab displays an arrow pointing from table1 to
table2 to indicate the downstream lineage relationship. If you instead start at table table2, an arrow points from
table2 upstream to table1.
Get started¶
To start using data lineage in Snowsight, do the following:
- Sign in to Snowsight with the necessary privileges. 
- In the navigation menu, select Catalog » Database Explorer, and then select a supported object such as a table or view. 
- Select the Lineage tab. 
Basic actions on the Lineage tab include the following:
- A. Select an object to show additional details about it, including columns and tags on those columns. 
- B. Select +/- to show or hide objects that are further upstream or downstream. 
- C. Select the arrow that connects two objects to show information about how the downstream object was created (for example, the SQL statement that created an object). Your access control privileges determine what information appears. 
- D. Opens a new lineage diagram that focuses on the lineage of the selected object. 
To learn about using the Lineage tab to perform other actions, see the following:
Column lineage¶
You can use Snowsight to trace the relationship between columns in a source object and columns in a target object. For a given column, you can determine all upstream and downstream columns that share lineage with the column.
To determine the lineage of a column:
- Open the Lineage tab and select the object that contains the column you are interested in tracing. A side panel opens. 
- Hover over the column name in the side panel, and select View Lineage. 
- Select Upstream Lineage or Downstream Lineage to list the columns in upstream or downstream objects. - You can use the Distance column to determine how far away a column is in the lineage. For example, if the downstream distance is 1, then the column is in an object that was created directly from the current object. If the downstream distance is 2, then the column exists in an object that was created from an object that was created from the current object. 
Identify masking policies¶
- Open the Lineage tab, and select the object you are interested in. A side panel opens. 
- To view the masking policy on a column of the object, find the column in the Columns section. If the column is protected by a masking policy, a symbol appears next to the column name. Hover over the symbol to see the masking policy name and details. - If there’s a problem with the masking policy, for example there are multiple masking policies assigned to the same column, Policy Error appears instead of the mask symbol. If you hover over Policy Error, an explanation of the error appears. For additional help identifying why the error might have occurred, see Tag and policy discovery and Troubleshoot tag-based masking policies. 
Lineage created by a stored procedure or task¶
A stored procedure or task can result in lineage between an upstream object and a downstream object. You can select the arrow that connects the objects to obtain more information about the stored procedure or task. You must have privileges to access the stored procedure or task to view this information.
If the downstream object was created by a stored procedure, the Stored Procedures section contains the following information:
- Direct — Displays the name of the stored procedure that, when executed, resulted in the downstream object. 
- Root — If the direct stored procedure is nested within other stored procedures, this field displays the name of the stored procedure that is at the top of the hierarchy of nested procedures. 
To view additional information about a stored procedure, select the Go to procedure icon next to its name.
Keep in mind the following:
- If you call a stored procedure anonymously, details about the stored procedure do not appear in the lineage. 
- Details about stored procedures and tasks are not backfilled. Lineage that occurred before the introduction of support for stored procedures and tasks doesn’t include details about the stored procedure or task. 
Retrieve lineage programmatically¶
You can use the GET_LINEAGE (SNOWFLAKE.CORE) function to retrieve lineage information programmatically. This function returns a subset of the information provided by the Lineage tab in Snowsight.
Supported operations for data lineage¶
The following operations create upstream and downstream relationships between a source object and a target object:
- CREATE TABLE … AS SELECT (CTAS) 
- UPDATE, for example: - UPDATE mydb.schema1.table1 FROM mydb.schema2.table2 SET table1.col1 = table2.col1; 
Supported objects¶
Data lineage supports data movement and dependency between table-like objects. A “table-like” object is any object that can be queried like a table, including tables (nothing is more table-like than a table). Table-like objects include:
- Tables 
- Dynamic tables 
- External tables 
- Iceberg tables 
- Views 
- Materialized views 
Stages can also participate in data lineage relationships, as can the following machine learning objects.
- Feature Views (which are actually a dynamic tables or views inside Snowflake) 
Column lineage is supported between columns in any two table-like objects. You can, for example, select a column in a table to view downstream column lineage, which shows the other table-like objects where that column appears.
Additionally, you can see tag and masking policy associations if you are using a role that has privileges for managing tags and masking policies.
ML Lineage¶
ML Lineage specifically supports machine learning relationships, which focus on how data is used and transformed in machine learning workflows, rather than on simpler movement or dependency relationships. Relationships between the following types of objects are supported:
- Feature Views (which is actually a dynamic table or a view inside Snowflake) 
Access control for lineage information¶
A role with the following privileges can access the Lineage tab and view an object’s upstream and downstream lineage objects and dependencies:
- VIEW LINEAGE on the account. 
- Any privilege on the objects for which you want to evaluate the lineage, such as SELECT on a table. If you want to let users view the lineage of an object without being able to access its data, you can grant the REFERENCES privilege on the object. 
- USAGE on the database and schema that contains the object. 
The VIEW LINEAGE privilege controls whether a user can view data lineage for their objects. By default, the PUBLIC role has this privilege, which means everyone has the ability to view lineage. To narrow who can view lineage, you can revoke the VIEW LINEAGE privilege from the PUBLIC role and grant it to custom roles instead.
You can configure a role to view the full lineage of all objects, even if the role doesn’t have privileges on the objects, database, or schema.
Simply grant the role the RESOLVE ALL privilege on the account, for example, GRANT RESOLVE ALL ON ACCOUNT TO ROLE lineage_role;. The
role still requires the VIEW LINEAGE privilege.
If a user does not have privileges on an upstream or downstream object in the lineage graph, the object appears gray with a message stating that they have insufficient privileges to view the object. The gray object does not imply a terminal node in the lineage graph; it merely indicates that the user cannot view lineage any further upstream or downstream from that point because they don’t have the privileges to retrieve that object’s lineage. This behavior also applies to objects and columns protected by other access policies.
A user must have privileges to access the QUERY_HISTORY view to see the SQL statement that resulted in the target object.
Lineage history and retention¶
Lineage was introduced to Snowflake in November 2024. Lineage information is available as follows:
- Lineage for an object dependency (for example, a view based on a table) that occurred before this date is available. 
- Lineage for data movement (for example, using a CTAS statement to create a table from another table) that occurred before this date is not available. 
Historical information is retained as follows:
- Column lineage is retained for one year. 
- Object lineage is retained for one year. 
Limitations and considerations¶
- Lineage is not available for the following kinds of objects: - Objects in a shared database. 
- Objects in the shared SNOWFLAKE database. 
- Objects in the INFORMATION_SCHEMA of a database. 
 
- Dynamic tables appear in the lineage graph for other objects, but the Lineage tab does not appear for dynamic tables themselves. 
- Deleted tables are not shown in the lineage graph, but renamed tables are shown. 
- Temporary tables are not shown in the lineage graph. 
- Lineage does not include a table that was used for filtering or joining when data did not move from the table to the downstream object. In the following example, table - t2is not considered part of the lineage of table- target_table:- CREATE TABLE target_table AS SELECT t1.c1, t1.c2 FROM t1, t2 WHERE t1.c3 = t2.c3; 
- Lineage cannot track the movement of data that results from separate, disjointed queries. For example, the following set of queries does not result in lineage from table - sourceTable1to table- target_table.- SET read_output1 = (SELECT c1 FROM sourceTable1); INSERT INTO target_table(c1) VALUES ($read_output1); - This limitation applies to anything that caused the data movement, including stored procedures. 
- You cannot use the GET_LINEAGE (SNOWFLAKE.CORE) function to obtain lineage information related to a stored procedure. 
