Managing Directory Tables

This topic provides instructions for creating and managing external or internal stages with directory tables.

In this Topic:

Automatically Refreshing Directory Table Metadata

The metadata for an directory table can be refreshed automatically using the event messaging service for your cloud storage service.

The refresh operation synchronizes the metadata with the latest set of associated files in the external stage and path, i.e.:

  • New files in the path are added to the table metadata.

  • Changes to files in the path are updated in the table metadata.

  • Files no longer in the path are removed from the table metadata.

For instructions to create stages with automatically refreshed directory tables, see Automated Directory Table Metadata Refreshes.

Note

Currently, the ability to automatically refresh the metadata is not available for directory tables for either internal stages or external stages that reference Google Cloud Storage buckets. For these types of stages, you must manually refresh the directory table metadata. For instructions, see Manually Refreshing Directory Table Metadata (in this topic).

We suggest following our best practices for staging your data files and periodically executing an ALTER STAGE … REFRESH statement to register any missed files. For satisfactory performance, we also recommend using a selective path prefix with ALTER STAGE to reduce the number of files that need to be listed and checked if they have been registered already (e.g. bucket_name/YYYY/MM/DD/ or even bucket_name/YYYY/MM/DD/HH/ depending on your volume).

Manually Refreshing Directory Table Metadata

This section provides instructions for creating stages (using CREATE STAGE) that layer a directory table to store metadata about the staged files. The directory tables created using the instructions require manual metadata refreshes.

Currently, directory tables in internal stages require manual metadata refreshes. You could also choose to include a directory table in external stages and refresh the metadata manually. For information about automated metadata refreshes, see Automatically Refreshing Directory Table Metadata (in this topic).

The syntax for creating a stage with a directory table is nearly identical to creating a standard external or internal stage. Set the optional DIRECTORY parameter to TRUE.

For the complete syntax and parameter descriptions, see CREATE STAGE.

Syntax

-- Internal stage
CREATE [ OR REPLACE ] [ TEMPORARY ] STAGE [ IF NOT EXISTS ] <internal_stage_name>
  [ DIRECTORY = ( { ENABLE = TRUE | FALSE } ) ]
  [ ENCRYPTION = (TYPE = 'SNOWFLAKE_FULL' | TYPE = 'SNOWFLAKE_SSE') ]
  [ FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] ) } ]
  [ COPY_OPTIONS = ( copyOptions ) ]
  [ COMMENT = '<string_literal>' ]

-- External stage
CREATE [ OR REPLACE ] [ TEMPORARY ] STAGE [ IF NOT EXISTS ] <external_stage_name>
      <cloud_storage_access_settings>
    [ DIRECTORY = ( { ENABLE = TRUE | FALSE } ) ]
    [ FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] ) } ]
    [ COPY_OPTIONS = ( copyOptions ) ]
    [ COMMENT = '<string_literal>' ]

Where:

ENABLE = TRUE | FALSE

Specifies whether to add a directory table to the stage. When the value is TRUE, a directory table is created with the stage.

Default: FALSE

[ ENCRYPTION = (TYPE = 'SNOWFLAKE_FULL' | TYPE = 'SNOWFLAKE_SSE')

Specifies the type of encryption supported for all files stored in the stage.

TYPE = ...

Specifies the encryption type used. Possible values are:

  • SNOWFLAKE_FULL: Client-side encryption. The files are encrypted by a client when it uploads them to the internal stage using PUT.

  • SNOWFLAKE_SSE: Server-side encryption. The files are encrypted when they arrive in the stage. Internal stages with server-side encryption only support file access using pre-signed URLs.

    When files in a stage are client-side encrypted, users cannot read the staged files without having access to the encryption key.

    For more information about internal named stages with server-side encryption only, see CREATE STAGE.

Default: SNOWFLAKE_FULL

Usage Notes

  • After you create a stage with a directory table, you must execute ALTER STAGE … REFRESH to manually refresh the directory table metadata.

Examples

Create an internal stage named mystage that includes a directory table. The stage references a file format named myformat:

CREATE STAGE mystage
  DIRECTORY = (ENABLE = TRUE)
  FILE_FORMAT = myformat;

Create an external stage named mystage that includes a directory table. The stage references a bucket or container named load with a path of files. Secure access to the cloud storage location is provided via the my_storage_int storage integration:

Note

The storage location in the URL value must end in a forward slash (/).

Amazon S3

CREATE STAGE mystage
  URL='s3://load/files/'
  STORAGE_INTEGRATION = my_storage_int
  DIRECTORY = (ENABLE = TRUE);

Google Cloud Storage

CREATE STAGE mystage
  URL='gcs://load/files/'
  STORAGE_INTEGRATION = my_storage_int
  DIRECTORY = (ENABLE = TRUE);

Microsoft Azure

CREATE STAGE mystage
  URL='azure://myaccount.blob.core.windows.net/load/files/'
  STORAGE_INTEGRATION = my_storage_int
  DIRECTORY = (ENABLE = TRUE);

Manually Refreshing Directory Tables

Refresh the metadata in a directory table manually using the ALTER STAGE command.

For example:

ALTER STAGE mystage REFRESH;

Retrieving File URLs from Directory Tables

Query a directory table:

SELECT * FROM DIRECTORY( @<stage_name> )

Where:

stage_name

Name of a stage that includes a directory table.

For information about SELECT as a statement, and the other clauses within the statement, see Query Syntax.

Output

The output from a directory table query can include the following columns:

Column

Data Type

Description

RELATIVE_PATH

TEXT

Path to the files to access using the file URL.

SIZE

NUMBER

Size of the file (in bytes).

LAST_MODIFIED

TIMESTAMP_LTZ

Timestamp when the file was last updated in the stage.

MD5

HEX

MD5 checksum for the file.

ETAG

HEX

ETag header for the file.

FILE_URL

TEXT

Snowflake-hosted file URL to the file.

The file URL has the following format:

https://<account_identifier>/api/files/<db_name>.<schema_name>.<stage_name>/<relative_path>

Where:

account_identifier

Hostname of the Snowflake account for your stage. The hostname starts with an account locator (provided by Snowflake) and ends with the Snowflake domain (snowflakecomputing.com):

account_locator.snowflakecomputing.com

For more details, see Account Identifiers.

db_name

Name of the database that contains the stage where your files are located.

schema_name

Name of the schema that contains the stage where your files are located.

stage_name

Name of the stage where your files are located.

relative_path

Path to the files to access using the file URL.

Examples

Retrieve all metadata columns in a directory table for a stage named mystage:

SELECT * FROM DIRECTORY(@mystage);

Return the FILE_URL column values from the same directory table. Only return the file URLs for files greater than 100 K bytes in size:

SELECT FILE_URL FROM DIRECTORY(@mystage) WHERE SIZE > 100000;

Return the FILE_URL column values from the same directory table. Only return the file URLs for comma-separated value files:

SELECT FILE_URL FROM DIRECTORY(@mystage) WHERE RELATIVE_PATH ILIKE '%.csv';

Streams on Directory Tables

Streams on a directory table track files that are added or dropped in the referenced cloud storage location. For the SQL command syntax and examples, see CREATE STREAM.