CKE document access history

To help providers know which documents are being accessed in their Cortex Knowledge Extensions (CKE), Snowflake provides the following features:

Prerequisites

Because primary keys define a unique identifier for each document, you must specify a primary key for the Cortex Search Service to get the access history.

Note

Modifying the primary key columns of an existing Cortex Search Service invalidates the previous CKE access history.

To interpret the previous CKE access history, save a mapping from the old primary key columns to the new primary key columns.

Understand document IDs

Document IDs are composed of Cortex Search Service primary keys. To protect customer data, Snowflake encodes and hashes the primary key columns when tracking the access history. You can map the primary keys to the provided hashed document ID using the following functions:

Example CKE access history in the LISTING_ACCESS_HISTORY view

This example performs the following actions:

  • Retrieves only CKE access information from the LISTING_ACCESS_HISTORY view view and excludes all other events

  • Uses the SYSTEM$ENCODE_CKE_PRIMARY_KEY function to build an encoded representation of the CKE document’s primary key columns

  • Retrieves the hash version and uses the SYSTEM$CKE_HASH_FUNCTION to compute a hashed document ID for every primary key

  • Joins the computed hashed IDs and versions to the view to recover the original primary key columns

Step 1. Create a daily access summary table that retrieves only CKE access information.

CREATE TABLE IF NOT EXISTS cke_document_daily_access AS
SELECT query_date,
       consumer_account_name,
       consumer_name,
       hashed_doc_id,
       hash_version,
       total_access_count
  FROM (
    SELECT query_date,
           consumer_account_name,
           consumer_name,
           flattened.value::string AS hashed_doc_id,
           lah.share_objects_accessed[0]:"hashVersion"::string AS hash_version,
      COUNT(*) AS total_access_count
      FROM snowflake.data_sharing_usage.listing_access_history AS lah,
        LATERAL FLATTEN(
          input => lah.share_objects_accessed[0]:"hashedDocumentIds"
        ) AS flattened
      WHERE lah.share_objects_accessed[0]:"objectDomain" = 'Cortex Search Service'
        AND lah.share_objects_accessed[0]:"hashVersion" IS NOT NULL
      GROUP BY query_date,
               consumer_account_name,
               consumer_name,
               hashed_doc_id,
               hash_version
);

Step 2. Create a table to store the encoded primary keys.

CREATE TABLE IF NOT EXISTS encoded_primary_keys AS
  (
    SELECT pkCol1,
           pkCol2,
           SYSTEM$ENCODE_CKE_PRIMARY_KEY(pkCol1, pkCol2) AS encoded_primary_key
      FROM your_cortex_search_table
  )

Step 3. From the table you created in the previous step, prepare hash versions and compute hashed IDs for your primary keys. Then join the cke_document_daily_access table with the hashed primary key view to recover the original primary key columns.

WITH hash_versions AS
  (
    SELECT DISTINCT hash_version AS hash_version
      FROM cke_document_daily_access
  ),
  hashed_primary_key AS
  (
    SELECT pkCol1,
           pkCol2,
           hash_version,
           SYSTEM$CKE_HASH_FUNCTION(hash_version, encoded_primary_key) AS hashed_doc_id
      FROM encoded_primary_keys
      CROSS JOIN hash_versions
  )
SELECT pk.pkCol1,
       pk.pkCol2,
       a.query_date,
       a.consumer_account_name,
       a.consumer_name,
       a.total_access_count
  FROM cke_document_daily_access AS a
  JOIN hashed_primary_key AS pk
    ON a.hashed_doc_id = pk.hashed_doc_id
    AND a.hash_version = pk.hash_version;