External lineage¶
External lineage extends Snowflake’s native lineage to include external data sources and destinations, providing you with visibility into data flows across your entire data ecosystem. It captures lineage from external ETL tools and source databases to create a unified view of how data moves through your data pipeline.
OpenLineage is an open standard for capturing and sharing data lineage information across diverse data tools and platforms. Snowflake leverages this framework by accepting OpenLineage-compatible events through a REST endpoint. External tools like dbt and Apache Airflow can use the endpoint to send lineage metadata to Snowflake, which then incorporates this information into the native lineage graph displayed in Snowsight.
- External lineage REST endpoint
/api/v2/lineage/external-lineage
- Snowflake base URL for REST endpoints
https://<account_identifier>.snowflakecomputing.com
Where
account-identifieris the account identifier of your Snowflake account. You can use either the account name format or the account locator format as your account identifier.For example, if your account identifier is
myorg-dev_account, then the base URL of the external lineage endpoint is:https://myorg-dev_account.snowflakecomputing.com
External lineage workflow¶
Implementing external lineage for a data tool consists of the following tasks:
Grant the necessary privileges to the user who is authenticating to the external lineage endpoint.
Configure your data tool to send OpenLineage events to the Snowflake REST endpoint.
Choose an authentication method that works for Snowflake REST APIs, and then configure your data tool to use it to authenticate its requests to the external lineage endpoint.
Use your data tool as usual. OpenLineage events are sent to Snowflake automatically and appear in the native lineage graph in Snowsight.
If you want to test the external lineage endpoint before you configure a data tool to emit OpenLineage events, see Send manual requests to establish lineage.
View your data lineage¶
To view data lineage in Snowsight, complete the following steps:
Sign in to Snowsight with the necessary privileges.
In the navigation menu, select Catalog » Database Explorer, and then select a supported object such as a table or view.
Select the Lineage tab.
When a data tool sends lineage information to Snowflake, external objects appear in the Snowsight lineage graph and are labeled as an external node. For example:
You can select an external object or the line connecting objects to obtain additional information just like you can with native lineage.
Grant Snowflake privileges¶
After a REST request is authenticated, Snowflake checks whether the user associated with the request is authorized to use external lineage. The user associated with the request must have a role that is granted the INGEST LINEAGE privilege on the account.
For example, suppose you want requests sent by the service user dbt_integration_user to show up in Snowsight lineage. As an
administrator, run the following commands to create a dedicated role, grant it the necessary privilege, and then grant the role to the user:
CREATE ROLE dbt_lineage_role;
GRANT INGEST LINEAGE ON ACCOUNT TO ROLE dbt_lineage_role;
GRANT ROLE dbt_lineage_role TO USER dbt_integration_user;
Configure your data tool¶
Note
Any data tool with an OpenLineage integration can be configured to send lineage data to Snowflake. For a full list of tools that have an integration, see OpenLineage Integrations.
The following sections provide basic instructions for using external lineage with dbt and Apache AirFlow.
Configure dbt to send lineage data to Snowflake¶
Note
Configuring dbt to emit OpenLineage events isn’t unique to Snowflake; the only thing specific to Snowflake is the endpoint and base URL of external lineage.
The following steps provide the minimum configuration you need to set up your dbt environment. Consult the OpenLineage dbt documentation and the OpenLineage specification to configure your OpenLineage-dbt integration.
Install the OpenLineage-dbt integration:
pip3 install openlineage-dbt
Set your transport variables to specify the base URL, endpoint, and security token for external lineage.
For example, if the account identifier of your account is
MYORG-DEV_ACCOUNT, define the following code in your YAML configuration file:transport: type: http url: https://MYORG-DEV_ACCOUNT.snowflakecomputing.com endpoint: /api/v2/lineage/external-lineage auth: type: api_key apiKey: eyJ0eXAiOiJKV1QiLsecuritytoken... compression: gzip
Replace
dbtcommands withdbt-ol. For example, change thedbt runcommand todbt-ol run.These
dbt-olcommands are required by the OpenLineage-dbt integration, and aren’t unique to Snowflake.
For more information about OpenLineage-dbt integrations, including other methods of setting variables, see the OpenLineage dbt documentation.
Configure Airflow to send lineage data to Snowflake¶
Note
Configuring Apache Airflow to emit OpenLineage events isn’t unique to Snowflake; the only thing specific to Snowflake is the endpoint and base URL of external lineage.
The following steps provide the minimum configuration you need to set up your Airflow environment for Airflow version 2.7+, which is the preferred version for OpenLineage. Consult the OpenLineage Airflow documentation and the OpenLineage specification to configure your OpenLineage-Airflow integration.
Install the OpenLineage Airflow integration for version 2.7+:
pip install apache-airflow-providers-openlineage
If you use an older version of Airflow, install
openlineage-airflowinstead.Set your transport variables to specify the base URL, endpoint, and security token for external lineage.
For example, if the account identifier of your account is
MYORG-DEV_ACCOUNT, define the following code in your YAML configuration file:transport: type: http url: https://MYORG-DEV_ACCOUNT.snowflakecomputing.com endpoint: /api/v2/lineage/external-lineage auth: type: api_key apiKey: eyJ0eXAiOiJKV1QiLsecuritytoken... compression: gzip
For more information about OpenLineage-Airflow integrations, including other methods of setting variables, see the OpenLineage Airflow documentation.
Choose an authentication method¶
Snowflake provides multiple ways to authenticate requests to a Snowflake REST endpoint like the one used by external lineage. For a complete list of authentication methods, see Authenticating Snowflake REST APIs with Snowflake.
After you select your preferred authentication method, you must generate a security token for a specific user. The token is used to associate a user with the REST request so that Snowflake can authenticate the user and verify that the user is authorized to use external lineage.
After successfully associating a user with a security token in Snowflake, you need to configure your data tool to authenticate its requests with this token. For example, if you use a YAML configuration file to set OpenLineage transport variables, use the following code to specify the security token that is sent in the header of the request:
transport:
auth:
type: api_key
apiKey: eyJ0eXAiOiJKV1QiLsecuritytoken...
For other methods of specifying a security token, see the OpenLineage documentation for your data tool.
Send manual requests to establish lineage¶
External lineage works by accepting JSON payloads that conform to the OpenLineage specification for COMPLETE events. When integrated with a data tool, the tool emits these COMPLETE events. But you can also construct a COMPLETE event, then send it to the endpoint by using any tool or language that can send POST requests to an endpoint.
A valid request consists of the following method, base URL, and endpoint:
POST https://<account_identifier>.snowflakecomputing.com/api/v2/lineage/external-lineage
Where account_identifier is the account identifier of your Snowflake account.
The following example shows how to use curl to send lineage information to external lineage:
curl -i -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLsecuritytoken..." \
-H "Accept: application/json" \
-H "User-Agent: myApplicationName/1.0" \
-H "X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT" \
-d "@request_body.json" \
"https://MYORG-DEV_ACCOUNT.snowflakecomputing.com/api/v2/lineage/external-lineage"
Where request_body.json conforms to the OpenLineage specification for COMPLETE events. For more information about this JSON payload, see Payload requirements.
Payload requirements¶
When you send the JSON payload in a manual request to the external lineage endpoint, the payload must meet the following requirements:
Must conform to the OpenLineage specification.
Must be a COMPLETE event. That is, the
eventTypeproperty must beCOMPLETE. Other types of events are ignored.The
inputsproperty andoutputsproperty must be a mix of Snowflake and external objects. You cannot use external lineage to establish lineage between two external objects or between two Snowflake objects. If both properties specify the same type of object (Snowflake or external), then the request returns a 404 HTTP status code.Must contain the following properties:
inputsoutputseventTypeeventTimejob
You can optionally include the
runproperty, which is useful in identifying the job. The payload can contain additional properties, but Snowflake ignores them.
Minimal payload example¶
The following example shows a minimal payload that you can send to the external lineage endpoint:
{
"eventType": "COMPLETE",
"eventTime": "2025-03-12T06:51:12.000Z",
"job": {"namespace": "exampleNamespace", "name": "exampleJob"},
"run": {"runId": "123e4567-e89b-12d3-a456-426614174000"},
"producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
"schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
"inputs": [{"namespace": "snowflake://AXORG-AX_TEST_PP8", "name": "OL_TEST.OL_TEST_SCH.TEST_DEMO"}],
"outputs": [{"namespace": "postgres://localhost:5432", "name": "PDB.SCH.OUTPUT"}]
}
Specifying object types¶
Within the outputs array of the payload, you can use the facets field to specify the type of the object, which can be any
user-defined string. For example, the following snippet of the payload specifies that the object is of type VIEW:
"outputs": [
{
"namespace": "postgres://db.company.com:5432",
"name": "db.schema.view",
"facets": {"datasetType": {"datasetType": "VIEW"}},
},
],
If you don’t specify a facets field, the type of object defaults to External Node.
Specifying multiple inputs¶
If a payload includes more than one input, the resulting lineage shows the output as a downstream object of both inputs. For example, if a payload has input A and B along with an output C, then the lineage shows both A-C and B-C.
Send requests to remove lineage¶
You can send a DELETE request to the external lineage endpoint to remove lineage that was established between a Snowflake object and an external object.
To break lineage between the source object and target object, use URL query parameters to specify details about the two objects.
To break lineage between an object and all of its downstream objects, specify the source object without specifying a target object.
To remove a target object from the lineage graph regardless of how many objects are upstream of it, specify the target object without specifying a source object.
A valid request to remove lineage consists of the following method, base URL, and endpoint:
DELETE https://<account_identifier>.snowflakecomputing.com/api/v2/lineage/external-lineage
Query parameter |
Description |
|---|---|
|
Namespace of the source dataset. |
|
Fully qualified name of the source dataset. |
|
Type of the source dataset (for example, TABLE, VIEW, DATASET). By default, the value should be External node.
If you provided a value in the |
|
Namespace of the target dataset. |
|
Fully qualified name of the target dataset. |
|
Type of the target dataset (for example, TABLE, VIEW, DATASET). By default, the value should be External node.
If you provided a value in the |
Access control for removing lineage¶
The user sending a request to remove lineage between objects must have the DELETE LINEAGE privilege on the account.
Limitations and considerations¶
A Snowflake object must be either the INPUT or the OUTPUT of a COMPLETE event. That is, external lineage doesn’t ingest lineage events when neither the input data nor the output data is a Snowflake object.
Snowflake doesn’t support OpenLineage version 1.
The retention period for external lineage events is one year.
Snowflake only recognizes COMPLETE lineage events. All other events emitted by a data tool are ignored.
Lineage from external sources doesn’t appear in the output of the GET_LINEAGE function.
External lineage doesn’t support the lineage of columns.
The fully qualified name of a dataset — that is, the input or output — can’t exceed 1000 characters.
You can’t store more than 10,000 events in the same account. If you reach this limit, you’ll have to delete events before adding new ones.