snowflake.ml.feature_store.FeatureView
- class snowflake.ml.feature_store.FeatureView(name: str, entities: list[Entity], feature_df: DataFrame, *, timestamp_col: Optional[str] = None, refresh_freq: Optional[str] = None, desc: str = '', warehouse: Optional[str] = None, initialize: str = 'ON_CREATE', refresh_mode: str = 'AUTO', cluster_by: Optional[list[str]] = None, online_config: Optional[OnlineConfig] = None, **_kwargs: Any)
Bases: LineageNode
A FeatureView instance encapsulates a logical group of features.
Create a FeatureView instance.
- Parameters:
name – The name of the FeatureView. This must follow Snowflake identifier rules.
entities – The entities that the FeatureView is associated with.
feature_df – The Snowpark DataFrame containing the data source and all feature transformation logic. The final projection of the DataFrame should contain the feature names, the join keys, and the timestamp if applicable.
timestamp_col – Name of the timestamp column used for point-in-time lookup when consuming the feature values.
refresh_freq – Time unit defining how often the new feature data should be generated, in the format { <num> { seconds | minutes | hours | days } | DOWNSTREAM | <cron expr> <time zone> }. The minimum refresh frequency is 1 minute. When using a cron format, you must provide a time zone. When you don’t provide a refresh value, the FeatureView is registered as a View on the Snowflake backend. There are no extra storage costs incurred for this view.
desc – Description of the FeatureView.
warehouse – The warehouse used to refresh this feature view. Not needed when refresh_freq is None. If specified, this warehouse overrides the default warehouse of the Feature Store; otherwise the default warehouse is used.
initialize – Specifies the behavior of the initial refresh of the feature view. This property cannot be altered after you register the feature view. It supports ON_CREATE (default) or ON_SCHEDULE. ON_CREATE refreshes the feature view synchronously at creation. ON_SCHEDULE refreshes the feature view at the next scheduled refresh. It is only effective when refresh_freq is not None.
refresh_mode – The refresh mode of a managed feature view. The value can be ‘AUTO’, ‘FULL’ or ‘INCREMENTAL’. For a managed feature view, the default value is ‘AUTO’. For a static feature view, it has no effect. For more information, see CREATE DYNAMIC TABLE.
cluster_by – Columns to cluster the feature view by. If timestamp_col is provided, it is added to the default clustering keys. Default is to use the join keys from entities in the view.
online_config – Configuration for online storage. If provided with enable=True, online storage will be enabled. Defaults to None (no online storage).
Note
This feature is currently in preview.
_kwargs – Reserved kwargs for system-generated args.
Caution
Use of additional keywords is prohibited.
Example:
>>> fs = FeatureStore(...)
>>> # draft_fv is a local object that hasn't materialized to Snowflake backend yet.
>>> feature_df = session.sql("select f_1, f_2 from source_table")
>>> draft_fv = FeatureView(
...     name="my_fv",
...     entities=[e1, e2],
...     feature_df=feature_df,
...     timestamp_col='TS',  # optional
...     refresh_freq='1d',  # optional
...     desc='A line about this feature view',  # optional
...     warehouse='WH'  # optional, the warehouse used to refresh (managed) feature view
... )
>>> print(draft_fv.status)
FeatureViewStatus.DRAFT
>>> # registered_fv is a local object that maps to a Snowflake backend object.
>>> registered_fv = fs.register_feature_view(draft_fv, "v1")
>>> print(registered_fv.status)
FeatureViewStatus.ACTIVE
>>> # Example with online configuration for online feature storage
>>> config = OnlineConfig(enable=True, target_lag='15s')
>>> online_fv = FeatureView(
...     name="my_online_fv",
...     entities=[e1, e2],
...     feature_df=feature_df,
...     timestamp_col='TS',
...     refresh_freq='1d',
...     desc='Feature view with online storage',
...     online_config=config  # optional, enables online feature storage
... )
>>> registered_online_fv = fs.register_feature_view(online_fv, "v1")
>>> print(registered_online_fv.online)
True
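The refresh_freq parameter accepts several shapes (an interval, DOWNSTREAM, or a cron expression with a time zone), and omitting it yields a static view. The sketch below only tells those shapes apart before a value is passed to FeatureView. It is a hypothetical helper, not part of snowflake.ml: the name classify_refresh_freq is invented here, the assumption that a cron expression has five fields before the time zone is not stated on this page, and Snowflake itself remains the authority on which values are valid.

```python
import re
from typing import Optional

# Hypothetical helper, not part of snowflake.ml. It mirrors the documented
# shapes of refresh_freq but does not validate units or the 1 minute minimum.
_INTERVAL = re.compile(r"^\d+\s*[A-Za-z]+$")


def classify_refresh_freq(refresh_freq: Optional[str]) -> str:
    """Return which documented form a refresh_freq value takes."""
    if refresh_freq is None:
        return "static"  # registered as a view, no scheduled refresh
    value = refresh_freq.strip()
    if value.upper() == "DOWNSTREAM":
        return "downstream"
    if _INTERVAL.match(value):
        return "interval"  # a number followed by a unit, e.g. "5 minutes"
    # Assumption: a cron form is five cron fields followed by a time zone.
    if len(value.split()) == 6:
        return "cron"
    raise ValueError(f"unrecognized refresh_freq: {refresh_freq!r}")


for candidate in (None, "DOWNSTREAM", "5 minutes", "0 0 * * * America/Los_Angeles"):
    print(repr(candidate), "->", classify_refresh_freq(candidate))
```

A check like this can fail fast on an obviously malformed string, such as a cron expression missing its time zone, before a registration call is attempted.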
Methods
- attach_feature_desc(descs: dict[str, str]) → FeatureView
Associate feature-level descriptions with the FeatureView.
- Parameters:
descs – Dictionary mapping feature names to their descriptions.
- Returns:
FeatureView with feature-level descriptions attached.
- Raises:
ValueError – if a feature name is not found in the FeatureView.
Example:
>>> fs = FeatureStore(...)
>>> e = fs.get_entity('TRIP_ID')
>>> feature_df = session.table(source_table).select('TRIPDURATION', 'START_STATION_LATITUDE', 'TRIP_ID')
>>> draft_fv = FeatureView(name='F_TRIP', entities=[e], feature_df=feature_df)
>>> draft_fv = draft_fv.attach_feature_desc({
...     "TRIPDURATION": "Duration of a trip.",
...     "START_STATION_LATITUDE": "Latitude of the start station."
... })
>>> registered_fv = fs.register_feature_view(draft_fv, version='1.0')
>>> registered_fv.feature_descs
OrderedDict([('TRIPDURATION', 'Duration of a trip.'),
    ('START_STATION_LATITUDE', 'Latitude of the start station.')])
- classmethod from_json(json_str: str, session: Session) → FeatureView
- fully_qualified_name() → str
Returns the fully qualified name (<database_name>.<schema_name>.<feature_view_name>) for the FeatureView in Snowflake.
- Returns:
fully qualified name string.
- Raises:
RuntimeError – if the FeatureView is not registered.
Example:
>>> fs = FeatureStore(...)
>>> e = fs.get_entity('TRIP_ID')
>>> feature_df = session.table(source_table).select(
...     'TRIPDURATION',
...     'START_STATION_LATITUDE',
...     'TRIP_ID'
... )
>>> draft_fv = FeatureView(name='F_TRIP', entities=[e], feature_df=feature_df)
>>> registered_fv = fs.register_feature_view(draft_fv, version='1.0')
>>> registered_fv.fully_qualified_name()
'MY_DB.MY_SCHEMA."F_TRIP$1.0"'
- fully_qualified_online_table_name() → str
Get the fully qualified name for the online feature table.
- Returns:
The fully qualified name (<database_name>.<schema_name>.<online_table_name>) for the online feature table in Snowflake.
- Raises:
RuntimeError – if the FeatureView is not registered or not configured for online storage.
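Since this method raises RuntimeError both for unregistered feature views and for those without online storage, code that handles a mix of feature views may want a guard around the call. The following is a minimal sketch under stated assumptions: online_table_name_or_none is a hypothetical helper, _StubFeatureView is a stand-in used so the snippet runs without a Snowflake session, and the table name in the stub is made up for illustration.

```python
from typing import Any, Optional


def online_table_name_or_none(fv: Any) -> Optional[str]:
    """Return the online table name, or None when the call raises."""
    try:
        return fv.fully_qualified_online_table_name()
    except RuntimeError:
        # Documented failure: not registered, or not configured for online storage.
        return None


class _StubFeatureView:
    """Stand-in for a FeatureView; it only mimics the documented behavior."""

    def __init__(self, online_name: Optional[str]) -> None:
        self._online_name = online_name

    def fully_qualified_online_table_name(self) -> str:
        if self._online_name is None:
            raise RuntimeError("not registered or online storage not configured")
        return self._online_name


offline_fv = _StubFeatureView(None)
online_fv = _StubFeatureView("MY_DB.MY_SCHEMA.MY_ONLINE_TABLE")  # illustrative name

print(online_table_name_or_none(offline_fv))
print(online_table_name_or_none(online_fv))
```

With a real registered FeatureView, checking the online attribute first is an alternative to catching the exception.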
- lineage(direction: Literal['upstream', 'downstream'] = 'downstream', domain_filter: Optional[set[Literal['feature_view', 'dataset', 'model', 'table', 'view']]] = None) → list[Union[FeatureView, Dataset, ModelVersion, LineageNode]]
Retrieves the lineage nodes connected to this node.
- Parameters:
direction – The direction to trace lineage. Defaults to “downstream”.
domain_filter – Set of domains to filter nodes. Defaults to None.
- Returns:
A list of connected lineage nodes.
- Return type:
List[LineageNode]
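The signature above has no accompanying example, so here is a hedged sketch of one way to call it: tracing upstream and restricting the result to the table and view domains. The helper upstream_tables_and_views and the stand-in class _RecordingFeatureView are hypothetical and exist only so the snippet runs without a Snowflake connection. A real call would return lineage nodes rather than an empty list.

```python
from typing import Any, List


def upstream_tables_and_views(fv: Any) -> List[Any]:
    """Ask a feature view for the tables and views upstream of it."""
    return fv.lineage(direction="upstream", domain_filter={"table", "view"})


class _RecordingFeatureView:
    """Stand-in that records the arguments instead of querying Snowflake."""

    def __init__(self) -> None:
        self.calls = []

    def lineage(self, direction="downstream", domain_filter=None):
        self.calls.append((direction, domain_filter))
        return []  # placeholder result; a real call returns lineage nodes


stub = _RecordingFeatureView()
result = upstream_tables_and_views(stub)
print(stub.calls)
```

Leaving domain_filter as None, the documented default, applies no domain filtering.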
- list_columns() → DataFrame
List all columns and their information.
- Returns:
A Snowpark DataFrame containing feature information.
Example:
>>> fs = FeatureStore(...)
>>> e = Entity("foo", ["id"], desc='my entity')
>>> fs.register_entity(e)
>>> draft_fv = FeatureView(
...     name="fv",
...     entities=[e],
...     feature_df=session.table(source_table).select(["NAME", "ID", "TITLE", "AGE", "TS"]),
...     timestamp_col="ts",
... ).attach_feature_desc({"AGE": "my age", "TITLE": '"my title"'})
>>> fv = fs.register_feature_view(draft_fv, '1.0')
>>> fv.list_columns().show()
--------------------------------------------------
|"NAME"  |"CATEGORY"  |"DTYPE"      |"DESC"      |
--------------------------------------------------
|NAME    |FEATURE     |string(64)   |            |
|ID      |ENTITY      |bigint       |my entity   |
|TITLE   |FEATURE     |string(128)  |"my title"  |
|AGE     |FEATURE     |bigint       |my age      |
|TS      |TIMESTAMP   |bigint       |NULL        |
--------------------------------------------------
- slice(names: list[str]) → FeatureViewSlice
Select a subset of features within the FeatureView.
- Parameters:
names – Feature names to select.
- Returns:
FeatureViewSlice instance containing selected features.
- Raises:
ValueError – if any selected feature name is not found in the FeatureView.
Example:
>>> fs = FeatureStore(...)
>>> e = fs.get_entity('TRIP_ID')
>>> # feature_df contains 3 features and 1 entity
>>> feature_df = session.table(source_table).select(
...     'TRIPDURATION',
...     'START_STATION_LATITUDE',
...     'END_STATION_LONGITUDE',
...     'TRIP_ID'
... )
>>> draft_fv = FeatureView(name='F_TRIP', entities=[e], feature_df=feature_df)
>>> fv = fs.register_feature_view(draft_fv, version='1.0')
>>> # shows all 3 features
>>> fv.feature_names
['TRIPDURATION', 'START_STATION_LATITUDE', 'END_STATION_LONGITUDE']
>>> # slice a subset of features
>>> fv_slice = fv.slice(['TRIPDURATION', 'START_STATION_LATITUDE'])
>>> fv_slice.names
['TRIPDURATION', 'START_STATION_LATITUDE']
>>> # query the full set of features in original feature view
>>> fv_slice.feature_view_ref.feature_names
['TRIPDURATION', 'START_STATION_LATITUDE', 'END_STATION_LONGITUDE']
- to_df(session: Optional[Session] = None) → DataFrame
Convert feature view to a Snowpark DataFrame object.
- Parameters:
session – [deprecated] This argument has no effect. No need to pass a session object.
- Returns:
A Snowpark DataFrame containing information about the feature view.
Example:
>>> fs = FeatureStore(...)
>>> e = Entity("foo", ["id"], desc='my entity')
>>> fs.register_entity(e)
>>> draft_fv = FeatureView(
...     name="fv",
...     entities=[e],
...     feature_df=session.table(source_table).select(["NAME", "ID", "TITLE", "AGE", "TS"]),
...     timestamp_col="ts",
... ).attach_feature_desc({"AGE": "my age", "TITLE": '"my title"'})
>>> fv = fs.register_feature_view(draft_fv, '1.0')
>>> fv.to_df().show()
----------------------------------------------------------------...
|"NAME"  |"ENTITIES"                |"TIMESTAMP_COL"  |"DESC"  |
----------------------------------------------------------------...
|FV      |[                         |TS               |foobar  |
|        |  {                       |                 |        |
|        |    "desc": "my entity",  |                 |        |
|        |    "join_keys": [        |                 |        |
|        |      "ID"                |                 |        |
|        |    ],                    |                 |        |
|        |    "name": "FOO",        |                 |        |
|        |    "owner": null         |                 |        |
|        |  }                       |                 |        |
|        |]                         |                 |        |
----------------------------------------------------------------...
- to_json() → str
Attributes
- cluster_by
- database
- desc
- entities
- feature_descs
- feature_df
- feature_names
- initialize
- name
- online
Check if online storage is enabled for this feature view.
- Returns:
True if online storage is enabled, False otherwise.
- online_config
- output_schema
- owner
- query
- refresh_freq
- refresh_mode
- refresh_mode_reason
- schema
- status
- timestamp_col
- version
- warehouse