Data Lineage

Snowflake tracks how data flows from source to target objects, for example from a table to a view, and lets you see where the data in an object came from or where it goes. This information is called data lineage, and it helps you understand the relationships between your Snowflake objects.

Data lineage captures two types of relationship:

  • Data movement, such as when data is copied or materialized from one object to another. For example, CREATE TABLE AS SELECT (CTAS), INSERT, or MERGE operations on tables result in data movement.

  • Object dependencies, when an object references a base object but does not materialize or copy data, such as when a view references a table.

Snowflake data lineage provides these benefits:

  • Provides impact analysis by understanding the relationship between different objects.

  • Enhances monitoring and troubleshooting by viewing data movement lineage and object dependencies.

  • Facilitates compliance by tracking the flow of sensitive data.

  • Helps you work with tags and masking policies on columns to protect sensitive data.

  • Enhances trust in the data by understanding the source and target objects and columns.

  • Allows administration for viewing lineage to be delegated. For more information, see Access control for lineage information.

About upstream and downstream relationships

Data lineage helps you understand the relationships of an object in terms of source and target objects. In lineage terminology, the source object is “upstream” of the target object, and the target object is “downstream” of the source object. Snowsight reveals objects incrementally, one step at a time upstream or downstream from your selection.

For example, in this SQL statement:

CREATE TABLE table2 AS SELECT col1 FROM table1;
Copy

table2 is the target table, and is downstream of the source table, table1. Column col1, which originates in table table1, is included in table table2; this is also a downstream lineage relationship.

If you view the details of table table1 in Snowsight, the Lineage tab displays an arrow pointing from table1 to table2 to indicate the downstream lineage relationship. If you instead start at table table2, an arrow points from table2 upstream to table1.

Get started

To start using data lineage in Snowsight, do the following:

  1. Sign in to Snowsight with the necessary privileges.

  2. Use the Data » Databases menu to select a supported object such as a table or view.

  3. Select the Lineage tab.

Basic actions on the Lineage tab include the following:

Example screenshot of data lineage in Snowsight
  • A. Select an object to show additional details about it, including columns and tags on those columns.

  • B. Select +/- to show or hide objects that are further upstream or downstream.

  • C. Select the arrow that connects two objects to show information about how the downstream object was created (for example, the SQL statement that created an object). Your access control privileges determine what information appears.

  • D. Opens a new lineage diagram that focuses on the lineage of the selected object.

To learn about using the Lineage tab to perform other actions, see the following:

Column lineage

You can use Snowsight to trace the relationship between columns in a source object and columns in a target object. For a given column, you can determine all upstream and downstream columns that share lineage with the column.

To determine the lineage of a column:

  1. Sign in to Snowsight with the necessary privileges.

  2. Use the Data » Databases menu to select a supported object such as a table or view.

  3. Select the Lineage tab.

  4. Select the object that contains the column you are interested in tracing. A side panel opens.

  5. Hover over the column name in the side panel, and select View Lineage.

  6. Select Upstream Lineage or Downstream Lineage to list the columns in upstream or downstream objects.

    You can use the Distance column to determine how far away a column is in the lineage. For example, if the downstream distance is 1, then the column is in an object that was created directly from the current object. If the downstream distance is 2, then the column exists in an object that was created from an object that was created from the current object.

Work with tags

The Lineage tab provides an integrated governance experience that lets you view the lineage of columns, identify columns that should have tags, and apply new tags all in the same workflow.

Whether you can see and apply tags depends on the access control privileges of the role you are using to view the Lineage tab. For information about the privileges required to work with tags, see Summary of DDL commands, operations, and privileges.

Find tags on an object and its columns

  1. Sign in to Snowsight with the necessary privileges.

  2. Use the Data » Databases menu to select a supported object such as a table or view.

  3. Select the Lineage tab.

  4. Select the object you’re interested in. A side panel opens.

  5. To view the tags on the object itself, look under the Details section of the side panel.

  6. To view the tags on a column of the object, find the column in the Columns section. If there is a tag, a tag symbol appears next to the column name. Hover over the symbol to see the tag name and value.

Identify and remedy missing tags or incorrect tag values

If there’s a tag on one column, there’s a good chance the same tag should be applied to upstream columns and downstream columns that share lineage with the column. Similarly, the value of a tag on upstream columns and downstream columns often needs to be the same.

The data lineage workflow identifies tags that are missing from upstream and downstream columns and tags that have a different value. It then helps you apply the missing tags or change the tag value on those columns.

  1. Sign in to Snowsight with the necessary privileges.

  2. Use the Data » Databases menu to select a supported object such as a table or view.

  3. Select the Lineage tab.

  4. Select the object that contains the column you’re interested in tracing. A side panel opens.

  5. Hover over the column name in the side panel, and select View Lineage.

  6. In the View Column Lineage dialog, select Downstream Lineage or Upstream Lineage.

    If there are missing tags or mismatched tag values on the downstream or upstream column, a banner appears. You can use the color coding in the Tags column to identify what is wrong with the tag:

    • If a tag has a dashed border, the column does not have the tag applied.

    • If a tag has a yellow border, the value of the tag doesn’t match.

  7. To remedy these missing or mismatched tags, do the following:

    1. Select Review and Apply.

    2. After confirming you’d like to accept the proposed changes, select Apply.

Identify masking policies

  1. Sign in to Snowsight with the necessary privileges.

  2. Use the Data » Databases menu to select a supported object such as a table or view.

  3. Select the Lineage tab.

  4. Select the object you are interested in. A side panel opens.

  5. To view the masking policy on a column of the object, find the column in the Columns section. If the column is protected by a masking policy, a symbol appears next to the column name. Hover over the symbol to see the masking policy name and details.

    If there’s a problem with the masking policy, for example there are multiple masking policies assigned to the same column, Policy Error appears instead of the mask symbol. If you hover over Policy Error, an explanation of the error appears. For additional help identifying why the error might have occurred, see Tag and policy discovery and Troubleshoot tag-based masking policies.

Retrieve lineage programmatically

You can use the GET_LINEAGE (SNOWFLAKE.CORE) function to retrieve lineage information programmatically. This function returns a subset of the information provided by the Lineage tab in Snowsight.

Supported operations

The following operations create upstream and downstream relationships between a source object and a target object:

Supported objects

Data lineage supports data movement and dependency between table-like objects. A “table-like” object is any object that can be queried like a table, including tables (nothing is more table-like than a table). Table-like objects include:

  • Tables

  • Dynamic tables

  • External tables

  • Iceberg tables

  • Views

  • Materialized views

Stages can also participate in data lineage relationships, as can the following machine learning objects.

Column lineage is supported between columns in any two table-like objects. You can, for example, select a column in a table to view downstream column lineage, which shows the other table-like objects where that column appears.

Additionally, you can see tag and masking policy associations if you are using a role that has privileges for managing tags and masking policies.

ML Lineage

ML Lineage specifically supports machine learning relationships, which focus on how data is used and transformed in machine learning workflows, rather than on simpler movement or dependency relationships. Relationships between the following types of objects are supported:

Access control for lineage information

A role with the following privileges can access the Lineage tab and view an object’s upstream and downstream lineage objects and dependencies:

  • VIEW LINEAGE on the account.

  • Any privilege on the objects for which you want to evaluate the lineage, such as SELECT on a table.

  • USAGE on the database and schema that contains the object.

Tip

You can configure a role to view all objects in the lineage, even if it does not have privileges on the objects, database, or schema. Simply grant the role the RESOLVE ALL privilege on the account, for example, GRANT RESOLVE ALL ON ACCOUNT TO ROLE lineage_role;. The role still requires the VIEW LINEAGE privilege.

The following example creates a custom role and grants the necessary privileges to it:

USE ROLE ACCOUNTADMIN;
CREATE ROLE lineage_role;
GRANT VIEW LINEAGE ON ACCOUNT TO ROLE lineage_role;
GRANT SELECT ON mydb.sch.mytable TO ROLE lineage_role;
Copy

If you do not have privileges on an upstream or downstream object in the lineage graph, the object appears gray with a message stating that you have insufficient privileges to view the object. The gray object does not imply a terminal node in the lineage graph; it merely indicates that you cannot view lineage any further upstream or downstream from that point because you don’t have the privileges to retrieve that object’s lineage. This behavior also applies to objects and columns protected by other access policies.

Renamed and deleted table nodes

Renamed tables are shown in the graph, but deleted tables are not shown.

Lineage retention

The Lineage graph provides lineage information in real time; tracked operations appear without delay in the Snowsight UI. Historical information is retained as follows:

  • Column lineage is retained for one year.

  • Object lineage is retained for one year.

Limitations

  • The Lineage tab is not available for the following kinds of objects:

    • Objects in a shared database.

    • Objects in the shared SNOWFLAKE database.

    • Objects in the INFORMATION_SCHEMA of a database.

  • Dynamic tables appear in the lineage graph for other objects, but the Lineage tab does not appear for dynamic tables themselves.