Snowflake Feature Store

Feature engineering, in which raw data is transformed into features that can be used to train machine learning models, is a vital part of building high-quality machine learning applications. A feature store lets you easily create, find, and use features that work with your data.

The Snowflake Feature Store is designed to make creating, storing, and managing features for data science and machine learning workloads easier and more efficient. It provides:

  • A Python SDK for defining, registering, retrieving, and managing features.

  • Back-end infrastructure with Snowflake dynamic tables, tables, views, and tags for automating feature pipelines and governance.

For an example of using the Snowflake Feature Store in an end-to-end machine learning pipeline, see Getting Started with Snowflake Feature Store.

Feature Store Workflow

The following diagram shows the high-level workflow of the Snowflake Feature Store.

[Diagram: Overall architecture of the Snowflake Feature Store]

Note

This release of the Snowflake Feature Store includes the following features:

  • Python SDK for feature definition and management

  • Support for externally created or user-maintained feature tables

  • Helper functions for defining time window features and AsOf JOINs

  • Continuous, incremental feature update pipelines using dynamic tables

  • API for retrieval of batch features and training datasets with point-in-time lookup

  • Linking features to model metadata with the Snowflake Model Registry

Additionally, these related features are available to selected accounts:

  • A Snowsight user interface for the Feature Store

  • APIs for tracking end-to-end lineage of ML artifacts (source data, features, datasets, and models)

Additional capabilities for feature quality monitoring and low-latency online feature serving are on the roadmap.

For details on the Python API, see Snowflake Feature Store API Reference.

Installation

The Snowflake Feature Store is part of the Snowpark ML Python package, snowflake-ml-python. For installation instructions, see Installing Snowpark ML.
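
For example, using pip:

pip install snowflake-ml-python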

Access Control Requirements

You should consider configuring a role hierarchy to represent your feature store roles.

The required privileges depend on what type of user is using the feature store. Typically, each type of user will have their own Snowflake database role with the necessary privileges. Most of the time, you will have a producer role to manipulate objects in the feature store, with an optional consumer role to govern data access.

[Diagram: Example of setting up consumer and producer roles in two feature stores]

Producers can create and operate on feature views. They require:

  • CREATE DYNAMIC TABLE, CREATE TAG, CREATE VIEW, and INSERT ON TABLE on the feature store schema

  • CREATE TABLE and CREATE DATASET on the feature store schema and/or the destination schema when generating datasets for training

  • OPERATE on the dynamic tables and tasks in the feature store schema

  • USAGE on the warehouse passed to the feature store initializer

  • All consumer privileges listed below

Consumers can read information about feature views and entities in the feature store. They require at minimum:

  • USAGE on the feature store database and schema

  • SELECT and MONITOR on dynamic tables in the feature store schema

  • SELECT and REFERENCES on views in the feature store schema

  • USAGE on the warehouse passed to the feature store initializer

Consumers may also have the following privileges to allow them to use feature store data:

  • CREATE TABLE and CREATE DATASET on the feature store schema and/or the destination schema for generating datasets for training

  • SELECT and REFERENCES on tables in the feature store or any schemas containing generated datasets

  • USAGE on datasets in the feature store schema or any schemas containing generated datasets

With multiple feature stores, you will typically have these two types of roles for each individual feature store, or for logical groupings of feature stores.

Note

A role with the MANAGE GRANTS, CREATE ROLE, and CREATE SCHEMA ON DATABASE <DB> privileges is needed to configure the necessary Feature Store roles and privileges. You may use the built-in ACCOUNTADMIN role or a custom role with these privileges.

Versions 1.5.1 and later of the snowflake-ml-python package include a setup_feature_store utility API for configuring a new feature store with producer and consumer roles and privileges.

from snowflake.ml.feature_store import setup_feature_store

setup_feature_store(
    session=session,
    database="<FS_DATABASE_NAME>",
    schema="<FS_SCHEMA_NAME>",
    warehouse="<FS_WAREHOUSE>",
    producer_role="<FS_PRODUCER_ROLE>",
    consumer_role="<FS_CONSUMER_ROLE>",
)

You may also manually configure the Feature Store roles and privileges using the SQL commands shown in Access Control Setup in SQL.

Key Concepts

Within Snowflake, feature stores are schemas. You may create as many feature stores as you need and organize them in the databases you choose.

A feature store contains feature views. A feature view encapsulates a pipeline for transforming raw data into one or more related features that are refreshed from the data source at the same time. Inside Snowflake, a feature view is a dynamic table or a view.

Tip

Users who have access to multiple feature stores can combine feature views from more than one feature store to create training and inference datasets, as shown in the sketch below.
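
As a minimal sketch (the store, feature view, and spine names here are hypothetical), feature views retrieved from two stores can be passed together when generating a dataset, as described later in this topic:

fs_a = FeatureStore(session=session, database="MY_DB", name="FS_A", default_warehouse="MY_WH")
fs_b = FeatureStore(session=session, database="MY_DB", name="FS_B", default_warehouse="MY_WH")

fv_users = fs_a.get_feature_view(name="USER_FV", version="1")
fv_movies = fs_b.get_feature_view(name="MOVIE_FV", version="1")

# feature views from both stores contribute features to one training dataset
dataset = fs_a.generate_dataset(
    name="COMBINED_DATASET",
    spine_df=spine_df,
    features=[fv_users, fv_movies],
    spine_timestamp_col="TS",
)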

A feature view created with a refresh frequency is materialized. Features in the materialized feature view are updated incrementally and efficiently as the source table receives new data. A materialized feature view is a Snowflake dynamic table. (This is different from a materialized view.)

Feature views are organized in the feature store according to the entity to which they apply. An entity is a higher-level abstraction that represents what the features are about.

For example, in a movie streaming service, the main entities might be users and movies. Raw movie data and user activity data can be converted into useful features such as per-movie viewing time and user session length. The entity also specifies other attributes of the raw data it’s meant to work with, such as the name of the column on which the entity’s feature data can be joined to the raw data.

Creating or Connecting to a Feature Store

Create a feature store, or connect to an existing feature store instance, by providing a Snowpark session, database name, feature store name, and default warehouse name to the FeatureStore constructor. The creation_mode parameter determines whether the feature store is created if it does not already exist.

When a feature store is created, a schema with the feature store name is created in the specified database. Generally, an admin role creates the feature store schema and the corresponding roles. After the back end exists, feature store clients (with consumer roles) can connect to it.

To create a feature store, use CreationMode.CREATE_IF_NOT_EXIST when instantiating FeatureStore. You can subsequently connect to the existing feature store using CreationMode.FAIL_IF_NOT_EXIST, which is the default mode.

Creating the feature store looks like this:

from snowflake.ml.feature_store import FeatureStore, CreationMode

fs = FeatureStore(
    session=session,
    database="MY_DB",
    name="MY_FEATURE_STORE",
    default_warehouse="MY_WH",
    creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
)

Subsequent use of the feature store looks like this:

fs = FeatureStore(
    session=session,
    database="MY_DB",
    name="MY_FEATURE_STORE",
    default_warehouse="MY_WH",
)

Creating and Registering Entities

Entities are the underlying objects that features and feature views are associated with. They encapsulate the join keys used for feature lookups. To create a new entity and register it in the feature store, use the feature store’s register_entity method.

from snowflake.ml.feature_store import Entity

entity = Entity(
    name="MY_ENTITY",
    join_keys=["UNIQUE_ID"],
    desc="my entity"
)
fs.register_entity(entity)

To see all the registered entities in your feature store, use the feature store’s list_entities method, which returns a Snowpark DataFrame.

fs.list_entities().show()

You can retrieve a previously registered entity using the get_entity method, for example to obtain its join keys.

entity = fs.get_entity(name="MY_ENTITY")
print(entity.join_keys)

Creating and Registering Feature Views

A feature view is a group of logically related features that are refreshed on the same schedule. The FeatureView constructor accepts a Snowpark DataFrame that contains the feature generation logic. The provided DataFrame must contain the join_keys columns specified in the entities associated with the feature view. A timestamp column name is required if your feature view includes time-series features.

The refresh frequency can be a time delta (minimum value 1 minute), or it can be a cron expression with a time zone (for example, * * * * * America/Los_Angeles).

from snowflake.ml.feature_store import FeatureView

managed_fv = FeatureView(
    name="MY_MANAGED_FV",
    entities=[entity],
    feature_df=my_df,               # a Snowpark DataFrame
    timestamp_col="ts",             # optional timestamp column name in the dataframe
    refresh_freq="5 minutes",       # optional time unit of how often feature data refreshes
    desc="my managed feature view"  # optional description string.
)
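
A cron-based refresh schedule looks like the following sketch (the schedule and feature view name are illustrative):

nightly_fv = FeatureView(
    name="MY_NIGHTLY_FV",
    entities=[entity],
    feature_df=my_df,
    timestamp_col="ts",
    refresh_freq="0 1 * * * America/Los_Angeles",  # refresh daily at 1:00 AM Pacific time
)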

The example above assumes that features of interest have already been defined in the my_df DataFrame. You can write custom feature logic using Snowpark Python or SQL. The Snowpark Python API provides utility functions for defining common feature types such as windowed aggregations. Examples of these are shown in Feature Examples and Common Query Patterns.
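
For instance, a windowed aggregation feature might be defined in Snowpark as follows (a minimal sketch; the source table and column names are hypothetical):

from snowflake.snowpark import Window
from snowflake.snowpark import functions as F

# rolling average of transaction amount over the last 10 rows per entity
window = Window.partition_by("UNIQUE_ID").order_by("TS").rows_between(-9, 0)

my_df = session.table("MY_DB.RAW.TRANSACTIONS").select(
    "UNIQUE_ID",
    "TS",
    F.avg("AMOUNT").over(window).alias("AVG_AMOUNT_LAST_10"),
)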

If you have ready-to-use features generated outside of the feature store, you can still register them by omitting the refresh frequency. The feature DataFrame could contain a simple projection from the existing feature table, or additional transformations that will be executed during feature consumption. This does not incur additional storage cost, and other feature store capabilities remain available.

With such features, refresh, immutability, consistency, and correctness are not managed by the feature store. The ready-to-use features must be maintained in some other fashion.

external_fv = FeatureView(
    name="MY_EXTERNAL_FV",
    entities=[entity],
    feature_df=my_external_df,
    timestamp_col="ts",
    refresh_freq=None,      # None = Feature Store will never refresh the feature data
    desc="my external feature view"
)

To enrich metadata at the feature level, you can add per-feature descriptions to the FeatureView. This makes it easier to find features using Snowsight Universal Search.

external_fv = external_fv.attach_feature_desc(
    {
        "SENDERID":"Sender account-id for the Transaction",
        "RECIEVERID":"Receiver account-id for the Transaction",
        "IBAN":"International Bank Identifier for the Receiver Bank",
        "AMOUNT":"Amount for the Transaction"
    }
)

At this point, the feature view has been completely defined and can be registered in the feature store.

Registering Feature Views

You register feature views using the register_feature_view method, with a customized name and version. Incremental maintenance (for supported query types) and automatic refresh will occur based on the specified refresh frequency.

When the provided query cannot be maintained via incremental maintenance using a dynamic table, the table will be fully refreshed from the query at the specified frequency. This may lead to greater lag in feature refresh and higher maintenance costs. You can alter the query logic, breaking the query into multiple smaller queries that support incremental maintenance, or provision a larger virtual warehouse for dynamic table maintenance. See General limitations for the latest information on dynamic table limitations.

registered_fv: FeatureView = fs.register_feature_view(
    feature_view=managed_fv,    # feature view created above, could also use external_fv
    version="1",
    block=True,         # whether function call blocks until initial data is available
    overwrite=False,    # whether to replace existing feature view with same name/version
)

A feature view pipeline definition is immutable after it has been registered, providing consistent feature computation as long as the feature view exists.

Retrieving Feature Views

Once a feature view has been registered with the feature store, you can retrieve it when needed using the feature store’s get_feature_view method.

retrieved_fv: FeatureView = fs.get_feature_view(
    name="MY_MANAGED_FV",
    version="1
)

Discovering Feature Views

You can list all registered feature views in the feature store, optionally filtering by entity name or feature view name, using the list_feature_views method. Information about the matching features is returned as a Snowpark DataFrame.

fs.list_feature_views(
    entity_name="<entity_name>",                # optional
    feature_view_name="<feature_view_name>",    # optional
).show()

Features can also be discovered using the Snowsight Feature Store UI (available to select customers) or Universal Search.

Generating Datasets for Training

You can generate a dataset for training with the feature store’s generate_dataset method, which enriches the Snowpark DataFrame that contains the source data with the derived feature values. To select a subset of features from a feature view, use fv.slice.
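
For example (the feature name is hypothetical), a slice selecting a subset of features can be passed in the features list of generate_dataset:

# use only the named features from the registered feature view
fv_subset = registered_fv.slice(["AVG_AMOUNT_LAST_10"])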

Generating a dataset materializes the data. Datasets provide an immutable, file-based snapshot of data, which helps ensure model reproducibility and is useful in distributed training. If you do not have these requirements, you can use feature views directly for training to avoid the additional storage costs incurred by datasets.

For time-series features, provide the timestamp column name to automate the point-in-time feature value lookup.

dataset: Dataset = fs.generate_dataset(
    name="MY_DATASET",
    spine_df=MySourceDataFrame,
    features=[registered_fv],
    version="v1",                               # optional
    spine_timestamp_col="TS",                   # optional
    spine_label_cols=["LABEL1", "LABEL2"],      # optional
    include_feature_view_timestamp_col=False,   # optional
    desc="my new dataset",                      # optional
)

Note

Here, spine_df is a DataFrame containing the entity IDs in the source data, the timestamp, label columns, and any additional columns containing training data. Requested features are retrieved for the list of entity IDs, with point-in-time correctness with respect to the provided timestamp.
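
As a minimal sketch (the table and column names are hypothetical), a spine might be constructed like this:

spine_df = session.table("MY_DB.DATA.OBSERVATIONS").select(
    "UNIQUE_ID", "TS", "LABEL1", "LABEL2"
)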

After creating a dataset, you can pass it to your model when training:

my_model = train_my_model(dataset.read.to_snowpark_dataframe())

The model can then be logged in the Snowflake Model Registry.
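
For example, logging the trained model might look like the following sketch (the model, names, and sample-data handling are illustrative assumptions):

from snowflake.ml.registry import Registry

registry = Registry(session=session)

model_ref = registry.log_model(
    my_model,
    model_name="MY_MODEL",
    version_name="V1",
    # sample input (without label columns) used to infer the model signature
    sample_input_data=dataset.read.to_snowpark_dataframe().drop("LABEL1", "LABEL2").limit(100),
)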

Note

For more information on datasets, see Snowflake Datasets.

Retrieving Features and Making Predictions

If you have created a model in your Python session, you can simply retrieve the features from the feature store and pass them to your model for prediction, as shown here. You may exclude specified columns using the exclude_columns argument or include the timestamp column from the feature view by setting include_feature_view_timestamp_col.

prediction_df: snowpark.DataFrame = fs.retrieve_feature_values(
    spine_df=prediction_source_dataframe,
    features=[registered_fv],
    spine_timestamp_col="TS",
    exclude_columns=[],
)

# predict with your previously-trained model
my_model.predict(prediction_df)

Examples

You can find more details with end-to-end examples in the demo notebooks folder inside a release folder on our Google Drive.

Cost Considerations

Materialized features use Snowflake dynamic tables. See About monitoring dynamic tables for information on monitoring dynamic tables and Understanding cost for dynamic tables for information on the costs of dynamic tables.
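
As an illustrative sketch (the feature view name is hypothetical), you can inspect refresh activity for the dynamic tables backing your feature views with the DYNAMIC_TABLE_REFRESH_HISTORY table function:

# recent refreshes for dynamic tables whose names start with the feature view name
session.sql("""
    SELECT name, state, refresh_start_time, refresh_end_time
    FROM TABLE(MY_DB.INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
    WHERE name LIKE 'MY_MANAGED_FV%'
    ORDER BY refresh_start_time DESC
""").show()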

Feature Store Back End and Data Model

Feature store objects map directly to Snowflake objects. All feature store objects are therefore subject to Snowflake access control rules.

Feature store object    Snowflake object
--------------------    ----------------
feature store           schema
feature view            dynamic table (internal features) or view (external features)
entity                  tag
feature                 column in a dynamic table (internal features) or view (external features)

Properties of feature views (such as name and entity) are implemented as tags on dynamic tables or views.

You can query or manipulate the Snowflake objects directly using SQL. Changes you make via SQL are reflected in the Python API.

All objects of a Snowflake Feature Store are stored in the feature store’s schema. You can easily delete an entire feature store by dropping the schema (but make sure the schema doesn’t contain other resources).
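
For example (names are hypothetical), you can inspect the underlying objects or drop an entire feature store directly with SQL:

# list the dynamic tables backing managed feature views
session.sql("SHOW DYNAMIC TABLES IN SCHEMA MY_DB.MY_FEATURE_STORE").show()

# drop the whole feature store by dropping its schema
session.sql("DROP SCHEMA MY_DB.MY_FEATURE_STORE").collect()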

Current Limitations and Known Issues