Access History

This topic provides concepts on the user access history in Snowflake.

In this Topic:

Overview

Access History in Snowflake refers to when the user query reads column data and when the SQL statement performs a data write operation, such as INSERT, UPDATE, and DELETE along with variations of the COPY command, from the source data object to the target data object. The user access history can be found by querying the Account Usage ACCESS_HISTORY view.

Each row in the ACCESS_HISTORY view contains a single record per SQL statement. The record describes the columns the query accessed directly and indirectly (i.e. the underlying tables that the data for the query comes from). These records facilitate regulatory compliance auditing and provide insights on popular and frequently accessed tables and columns since there is a direct link between the user (i.e. query operator), the query, the table or view, the column, and the data.

Tracking Read and Write Operations

The ACCESS_HISTORY view contains these columns:

QUERY_ID | QUERY_START_TIME | USER_NAME | DIRECT_OBJECTS_ACCESSED | BASE_OBJECTS_ACCESSED | OBJECTS_MODIFIED

Read operations are tracked through the first five columns, while the last column, OBJECTS_MODIFIED, specifies the data write information that involved Snowflake columns, tables, and stages.

The query in Snowflake and how the database objects were created determines the information Snowflake returns in the DIRECT_OBJECTS_ACCESSED, BASE_OBJECTS_ACCESSED, and OBJECTS_MODIFIED columns.

For column details, see the Columns section in the ACCESS_HISTORY view.

Read

Consider the following scenario to understand a read query and how the ACCESS_HISTORY view records this information:

  • A series of objects: base_table » view_1 » view_2 » view_3.

  • A read query on view_2, such as:

    select * from view_2;
    

In this example, Snowflake returns:

  • view_2 as the DIRECT_OBJECTS_ACCESSED because the query specifies view_2.

  • base_table as the BASE_OBJECTS_ACCESSED because that is the original source of the data in view_2.

Note that view_1 and view_3 are not included in the DIRECT_OBJECTS_ACCESSED and BASE_OBJECTS_ACCESSED because neither of those views were included in the query and they are not the base object that serves as the source for the data in view_2.

Write

Consider the following scenario to understand a write operation and how the ACCESS_HISTORY view records this information:

  • A data source: base_table

  • Create a table from the data source (i.e. CTAS):

    create table table_1 as select * from base_table;
    

In this example, Snowflake returns:

  • base_table in the BASE_OBJECTS_ACCESSED and DIRECT_OBJECTS_ACCESSED columns because the table was accessed directly and is the source of the data.

  • table_1 in the OBJECTS_MODIFIED with the columns that were written to when creating the table.

Column Lineage

Column lineage (i.e. access history for columns) extends the Account Usage ACCESS_HISTORY view to specify how data flows from the source column to the target column in a write operation. Snowflake tracks the data from the source columns through all subsequent table objects that reference data from the source columns (e.g. INSERT, MERGE, CTAS) provided that objects in the lineage chain are not dropped. Snowflake makes column lineage accessible by enhancing the OBJECTS_MODIFIED column in the ACCESS_HISTORY view.

For additional details, see:

Supported Operations

For a complete description of the read and write operations the ACCESS_HISTORY view supports, see the usage notes sections in the ACCESS_HISTORY View.

Benefits

Access history in Snowflake provides the following benefits pertaining to read and write operations:

Data discovery

Discover unused data to determine whether to archive or delete the data.

Track how sensitive data moves

Track data movement from an external cloud storage location (e.g. Amazon S3 bucket) to the target Snowflake table, and vice versa.

Track internal data movement from a Snowflake table to a different Snowflake table.

After tracing the movement of sensitive data, apply policies (masking and row access) to protect data, update access control settings to further regulate access to the stage and table, and set tags to ensure stages, tables, and columns with sensitive data can be tracked for compliance requirements.

Data validation

The accuracy and integrity of reports, dashboards, and data visualization products such as charts and graphs are validated since the data can be traced to its original source.

Data stewards can also notify users prior to dropping or altering a given table or view.

Compliance auditing

Identify the Snowflake user who performed a write operation on a table or stage and when the write operation occurred to meet compliance regulations, such as GDPR and CCPA.

Enhance overall data governance

The ACCESS_HISTORY view provides a unified picture of what data was accessed, when the data access took place, and how the accessed data moved from the data source object to the data target object.

Column lineage provides these additional benefits:

Protect Derived Objects

Data stewards can easily tag sensitive source columns without having to do additional work after creating derived objects (e.g. CTAS). Subsequently, the data steward can protect tables containing sensitive columns with a row access policy or protect the sensitive columns themselves with either a masking policy or a tag-based masking policy.

Sensitive Column Copy Frequency

Data privacy officers can quickly determine the object count (e.g. 1 table, 2 views) of a column containing sensitive data. By knowing how many times a column with sensitive data appears in a table object, data privacy officers can prove how they satisfy regulatory compliance standards (e.g. to meet General Data Protection Regulation (GDPR) standards in the European Union).

Root Cause Analysis

Column lineage provides a mechanism to trace the data to its source, which can help to pinpoint points of failure resulting from poor data quality and reduce the number of columns to analyze during the troubleshooting process.

Querying the ACCESS_HISTORY View

The following subsections provide example queries for the ACCESS_HISTORY view.

Note that the some of the example queries filter on the QUERY_START_TIME column to increase query performance. Another option to increase performance is to query over narrower time ranges.

Examples: Read Queries

The subsections below detail how to query the ACCESS_HISTORY view for read operations for the following use cases:

  • Obtain the access history for a specific user.

  • Facilitate compliance audits for sensitive data access in the last 30 days, based on object_id (e.g. a table id), to answer the following questions:

    • Who accessed the data?

    • When was the data accessed?

    • What columns were accessed?

Return the User Access History

Return the user access history, ordered by user and query start time, starting from the most recent access.

SELECT user_name
       , query_id
       , query_start_time
       , direct_objects_accessed
       , base_objects_accessed
FROM access_history
ORDER BY 1, 3 desc
;

Facilitate Compliance Audits

The following examples help to facilitate compliance audits:

  • Add the object_id value to determine who accessed a sensitive table in the last 30 days:

    SELECT distinct user_name
    FROM access_history
         , lateral flatten(base_objects_accessed) f1
    WHERE f1.value:"objectId"::int=<fill_in_object_id>
    AND f1.value:"objectDomain"::string='Table'
    AND query_start_time >= dateadd('day', -30, current_timestamp())
    ;
    
  • Using the object_id value of 32998411400350, determine when the access occurred in the last 30 days:

    SELECT query_id
           , query_start_time
    FROM access_history
         , lateral flatten(base_objects_accessed) f1
    WHERE f1.value:"objectId"::int=32998411400350
    AND f1.value:"objectDomain"::string='Table'
    AND query_start_time >= dateadd('day', -30, current_timestamp())
    ;
    
  • Using the object_id value of 32998411400350, determine which columns were accessed in the last 30 days:

    SELECT distinct f4.value AS column_name
    FROM access_history
         , lateral flatten(base_objects_accessed) f1
         , lateral flatten(f1.value) f2
         , lateral flatten(f2.value) f3
         , lateral flatten(f3.value) f4
    WHERE f1.value:"objectId"::int=32998411400350
    AND f1.value:"objectDomain"::string='Table'
    AND f4.key='columnName'
    ;
    

Examples: Write Operations

The subsections below detail how to query the ACCESS_HISTORY view for write operations for the following use cases:

  • Load data from a stage to a table.

  • Unload data from a table to a stage.

  • Use the PUT command to upload a local file to a stage.

  • Use the GET command to retrieve data files from a stage to a local directory.

  • Tracking sensitive stage data movement.

Load Data from a Stage to a Table

Load a set of values from a data file in external cloud storage into columns in a target table.

copy into table1(col1, col2)
from (select t.$1, t.$2 from @mystage1/data1.csv.gz);

The DIRECT_OBJECTS_ACCESSED and BASE_OBJECTS_ACCESSED column specify that an external named stage was accessed:

{
  "objectDomain": STAGE
  "objectName": "mystage1",
  "objectId": 1,
  "stageKind": "External Named"
}

The OBJECTS_MODIFIED column specifies that data was written to two columns of the table:

{
  "columns": [
     {
       "columnName": "col1",
       "columnId": 1
     },
     {
       "columnName": "col2",
       "columnId": 2
     }
  ],
  "objectId": 1,
  "objectName": "TEST_DB.TEST_SCHEMA.TABLE1",
  "objectDomain": TABLE
}

Unload Data from a Table to a Stage

Unload a set of values from a Snowflake table into cloud storage.

copy into @mystage1/data1.csv
from table1;

The DIRECT_OBJECTS_ACCESSED and BASE_OBJECTS_ACCESSED columns specify the table columns that were accessed:

{
  "objectDomain": TABLE
  "objectName": "TEST_DB.TEST_SCHEMA.TABLE1",
  "objectId": 123,
  "columns": [
     {
       "columnName": "col1",
       "columnId": 1
     },
     {
       "columnName": "col2",
       "columnId": 2
     }
  ]
}

The OBJECTS_MODIFIED column specifies the stage to which the accessed data was written:

{
  "objectId": 1,
  "objectName": "mystage1",
  "objectDomain": STAGE,
  "stageKind": "External Named"
}

Use the PUT Command to Upload a Local File to a Stage

Copy a data file to an internal (i.e. Snowflake) stage.

put file:///tmp/data/mydata.csv @my_int_stage;

The DIRECT_OBJECTS_ACCESSED and BASE_OBJECTS_ACCESSED columns specify the local path to the file that was accessed:

{
  "location": "file:///tmp/data/mydata.csv"
}

The OBJECTS_MODIFIED column specifies the stage where the accessed data was written:

{
  "objectId": 1,
  "objectName": "my_int_stage",
  "objectDomain": STAGE,
  "stageKind": "Internal Named"
}

Use the GET Command to Retrieve Data Files from a Stage to a Local Directory

Retrieve a data file from an internal stage to a directory on the local machine.

get @%mytable file:///tmp/data/;

The DIRECT_OBJECTS_ACCESSED and BASE_OBJECTS_ACCESSED columns specify the stage and local directory that were accessed:

{
  "objectDomain": Stage
  "objectName": "mytable",
  "objectId": 1,
  "stageKind": "Table"
}

The OBJECTS_MODIFIED column specifies the directory to which the accessed data was written:

{
  "location": "file:///tmp/data/"
}

Tracking Sensitive Stage Data Movement

Track sensitive stage data as it moves through a series of queries executed in chronological order.

Execute the following queries. Note that five of the statements access stage data. Therefore, when you query the ACCESS_HISTORY view for stage access, the result set should include five rows.

use test_db.test_schema;
create or replace table T1(content variant);
insert into T1(content) select parse_json('{"name": "A", "id":1}');

-- T1 -> T6
insert into T6 select * from T1;

-- S1 -> T1
copy into T1 from @S1;

-- T1 -> T2
create table T2 as select content:"name" as name, content:"id" as id from T1;

-- T1 -> S2
copy into @S2 from T1;

-- S1 -> T3
create or replace table T3(customer_info variant);
copy into T3 from @S1;

-- T1 -> T4
create or replace table T4(name string, id string, address string);
insert into T4(name, id) select content:"name", content:"id" from T1;

-- T6 -> T7
create table T7 as select * from T6;

Where:

  • T1, T2T7 specify the names of tables.

  • S1 and S2 specify the names of stages.

Query the access history to determine the access to stage S1.

The data for the DIRECT_OBJECTS_ACCESSED, BASE_OBJECTS_ACCESSED, and OBJECTS_MODIFIED columns are shown in the following table.

DIRECT_OBJECTS_ACCESSED

BASE_OBJECTS_ACCESSED

OBJECTS_MODIFIED

[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68611,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66566,
    "objectName": "TEST_DB.TEST_SCHEMA.T6"
  }
]
[
  {
    "objectDomain": "Stage",
    "objectId": 117,
    "objectName": "TEST_DB.TEST_SCHEMA.S1",
    "stageKind": "External Named"
  }
]
[
  {
    "objectDomain": "Stage",
    "objectId": 117,
    "objectName": "TEST_DB.TEST_SCHEMA.S1",
    "stageKind": "External Named"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68613,
        "columnName": "ID"
      },
      {
        "columnId": 68612,
        "columnName": "NAME"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66568,
    "objectName": "TEST_DB.TEST_SCHEMA.T2"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "objectDomain": "Stage",
    "objectId": 118,
    "objectName": "TEST_DB.TEST_SCHEMA.S2",
    "stageKind": "External Named"
  }
]
[
  {
    "objectDomain": "Stage",
    "objectId": 117,
    "objectName": "TEST_DB.TEST_SCHEMA.S1",
    "stageKind": "External Named"
  }
]
[
  {
    "objectDomain": "Stage",
    "objectId": 117,
    "objectName": "TEST_DB.TEST_SCHEMA.S1",
    "stageKind": "External Named"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68614,
        "columnName": "CUSTOMER_INFO"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66570,
    "objectName": "TEST_DB.TEST_SCHEMA.T3"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68610,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66564,
    "objectName": "TEST_DB.TEST_SCHEMA.T1"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68615,
        "columnName": "NAME"
      },
      {
        "columnId": 68616,
        "columnName": "ID"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66572,
    "objectName": "TEST_DB.TEST_SCHEMA.T4"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68611,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66566,
    "objectName": "TEST_DB.TEST_SCHEMA.T6"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68611,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66566,
    "objectName": "TEST_DB.TEST_SCHEMA.T6"
  }
]
[
  {
    "columns": [
      {
        "columnId": 68618,
        "columnName": "CONTENT"
      }
    ],
    "objectDomain": "Table",
    "objectId": 66574,
    "objectName": "TEST_DB.TEST_SCHEMA.T7"
  }
]

Note that the query in the following example uses a recursive common table expression.

with access_history_flatten as (
    select
        r.value:"objectId" as source_id,
        r.value:"objectName" as source_name,
        r.value:"objectDomain" as source_domain,
        w.value:"objectId" as target_id,
        w.value:"objectName" as target_name,
        w.value:"objectDomain" as target_domain,
        c.value:"columnName" as target_column,
        t.query_start_time as query_start_time
    from
        (select * from TEST_DB.ACCOUNT_USAGE.ACCESS_HISTORY) t,
        lateral flatten(input => t.BASE_OBJECTS_ACCESSED) r,
        lateral flatten(input => t.OBJECTS_MODIFIED) w,
        lateral flatten(input => w.value:"columns", outer => true) c
        ),
    sensitive_data_movements(path, target_id, target_name, target_domain, target_column, query_start_time)
    as
      -- Common Table Expression
      (
        -- Anchor Clause: Get the objects that access S1 directly
        select
            f.source_name || '-->' || f.target_name as path,
            f.target_id,
            f.target_name,
            f.target_domain,
            f.target_column,
            f.query_start_time
        from
            access_history_flatten f
        where
        f.source_domain = 'Stage'
        and f.source_name = 'TEST_DB.TEST_SCHEMA.S1'
        and f.query_start_time >= dateadd(day, -30, date_trunc(day, current_date))
        union all
        -- Recursive Clause: Recursively get all the objects that access S1 indirectly
        select sensitive_data_movements.path || '-->' || f.target_name as path, f.target_id, f.target_name, f.target_domain, f.target_column, f.query_start_time
          from
             access_history_flatten f
            join sensitive_data_movements
            on f.source_id = sensitive_data_movements.target_id
                and f.source_domain = sensitive_data_movements.target_domain
                and f.query_start_time >= sensitive_data_movements.query_start_time
      )
select path, target_name, target_id, target_domain, array_agg(distinct target_column) as target_columns
from sensitive_data_movements
group by path, target_id, target_name, target_domain;

The query produces the following result set related to stage S1 data movement:

PATH

TARGET_NAME

TARGET_ID

TARGET_DOMAIN

TARGET_COLUMNS

TEST_DB.TEST_SCHEMA.S1–>TEST_DB.TEST_SCHEMA.T1

TEST_DB.TEST_SCHEMA.T1

66564

Table

[“CONTENT”]

TEST_DB.TEST_SCHEMA.S1–>TEST_DB.TEST_SCHEMA.T1–>TEST_DB.TEST_SCHEMA.S2

TEST_DB.TEST_SCHEMA.S2

118

Stage

[]

TEST_DB.TEST_SCHEMA.S1–>TEST_DB.TEST_SCHEMA.T1–>TEST_DB.TEST_SCHEMA.T2

TEST_DB.TEST_SCHEMA.T2

66568

Table

[“NAME”,”ID”]

TEST_DB.TEST_SCHEMA.S1–>TEST_DB.TEST_SCHEMA.T1–>TEST_DB.TEST_SCHEMA.T4

TEST_DB.TEST_SCHEMA.T4

66572

Table

[“ID”,”NAME”]

TEST_DB.TEST_SCHEMA.S1–>TEST_DB.TEST_SCHEMA.T3

TEST_DB.TEST_SCHEMA.T3

66570

Table

[“CUSTOMER_INFO”]

Example: Column Lineage

The following example queries the ACCESS_HISTORY view and uses the FLATTEN function to flatten the OBJECTS_MODIFIED column.

As a representative example, execute the following SQL query in your Snowflake account to produce the table below, where the numbered comments indicate the following:

  • // 1: Get the mapping between the directSources field and the target column.

  • // 2: Get the mapping between the baseSources field and the target column.

// 1

select
  directSources.value: "objectId" as source_object_id,
  directSources.value: "objectName" as source_object_name,
  directSources.value: "columnName" as source_column_name,
  'DIRECT' as source_column_type,
  om.value: "objectName" as target_object_name,
  columns_modified.value: "columnName" as target_column_name
from
  (
    select
      *
    from
      snowflake.account_usage.access_history
  ) t,
  lateral flatten(input => t.OBJECTS_MODIFIED) om,
  lateral flatten(input => om.value: "columns", outer => true) columns_modified,
  lateral flatten(
    input => columns_modified.value: "directSources",
    outer => true
  ) directSources

union

// 2

select
  baseSources.value: "objectId" as source_object_id,
  baseSources.value: "objectName" as source_object_name,
  baseSources.value: "columnName" as source_column_name,
  'BASE' as source_column_type,
  om.value: "objectName" as target_object_name,
  columns_modified.value: "columnName" as target_column_name
from
  (
    select
      *
    from
      snowflake.account_usage.access_history
  ) t,
  lateral flatten(input => t.OBJECTS_MODIFIED) om,
  lateral flatten(input => om.value: "columns", outer => true) columns_modified,
  lateral flatten(
    input => columns_modified.value: "baseSources",
    outer => true
  ) baseSources
;

Returns:

SOURCE_OBJECT_ID

SOURCE_OBJECT_NAME

SOURCE_COLUMN_NAME

SOURCE_COLUMN_TYPE

TARGET_OBJECT_NAME

TARGET_COLUMN_NAME

1

D.S.T0

NAME

BASE

D.S.T1

NAME

2

D.S.V1

NAME

DIRECT

D.S.T1

NAME

Back to top