Specifying model signatures¶
To ensure a consistent experience no matter where a model is run, the Snowflake Model Registry needs to know the input and output schema of the model’s inference methods: that is, the name and type of all columns in the input or output DataFrame. This allows these columns to be mapped between Python and SQL data types when necessary. This schema is referred to as a signature by analogy to the arguments of a function and their types.
For certain ML frameworks, the model registry can infer these schemas, either from data structures in the model itself or from sample input data. However, models often accept or return objects that lack this information, such as NumPy arrays. In these cases, Snowpark ML infers the input feature names as `input_feature_1`, `input_feature_2`, and so on. Similarly, output features are named `output_feature_1`, `output_feature_2`, and so on.
To use more meaningful names in your custom models, use one of the following methods:

- Update `sample_input_data` with column names, usually by converting the dataset to a pandas or Snowpark DataFrame.
- Explicitly pass signatures to `log_model`. When a model does not produce names in its output, explicit signatures might be the only option.
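The first option can be sketched as follows. This is a minimal illustration, assuming NumPy and pandas are installed; the array contents and the `pixel_*` column names are hypothetical. The resulting DataFrame is what you would then pass as `sample_input_data`.

```python
import numpy as np
import pandas as pd

# Hypothetical unnamed feature matrix: 3 rows, 4 features.
features = np.arange(12, dtype=np.float64).reshape(3, 4)

# Hypothetical column names; choose names that describe your own features.
column_names = [f"pixel_{i}" for i in range(features.shape[1])]

# Wrapping the array in a DataFrame attaches a name to every column.
sample_input = pd.DataFrame(features, columns=column_names)

print(list(sample_input.columns))
```

Because the names travel with the DataFrame, the registry has something better than positional placeholders to work with when it infers the input schema.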
Inferring a signature¶
Just as the model registry can infer signatures, you can generate them yourself. Use `snowflake.ml.model.model_signature.infer_signature` to infer a signature from sample input, sample output, and column names, and then apply that signature to the appropriate methods when logging the model, as in the following example:
import pandas as pd
from sklearn import svm, datasets
from snowflake.ml.model import model_signature

digits = datasets.load_digits()
target_digit = 6

def one_vs_all(dataset, digit):
    return [x == digit for x in dataset]

train_features = digits.data[:10]
train_labels = one_vs_all(digits.target[:10], target_digit)
clf = svm.SVC(gamma=0.001, C=10.0, probability=True)
clf.fit(train_features, train_labels)

# Sample output for `predict`, assumed here to be one boolean label per training row.
labels_df = pd.DataFrame({'is_target_digit': train_labels})

sig = model_signature.infer_signature(
    train_features,
    labels_df,
    input_feature_names=['column1', 'column2', ...],
    output_feature_names=['is_target_digit'])

# `reg` is assumed to be an existing model registry object.
# Supply a signature for every function the model exposes, in this case only `predict`.
mv = reg.log_model(
    clf,
    model_name='my_model',
    version_name='v1',
    signatures={"predict": sig}
)
This example applies the signature to only one method, but you can infer a signature for each method your model exposes. You can use the same signature object (`sig` in the example) for all methods that have the same signature.
Constructing a signature¶
You can also manually construct a signature by using `snowflake.ml.model.model_signature.ModelSignature`. Both scalar and tensor types (including ragged tensors) are supported.
Example:
from snowflake.ml.model.model_signature import ModelSignature, FeatureSpec, DataType

sig = ModelSignature(
    inputs=[
        # Scalar feature.
        FeatureSpec(dtype=DataType.DOUBLE, name="f_0"),
        # Tensor feature with a fixed 5x5 shape.
        FeatureSpec(dtype=DataType.INT64, name="sparse_0_fixed_len", shape=(5, 5)),
        # Ragged tensor feature; -1 marks a variable-length dimension.
        FeatureSpec(dtype=DataType.INT64, name="sparse_1_variable_len", shape=(-1,)),
    ],
    outputs=[
        FeatureSpec(dtype=DataType.FLOAT, name="output"),
    ]
)
Then pass the signature object, `sig`, to `log_model` in the `signatures` argument for each method to which it applies, as in the example above.
Data type mappings¶
This section describes the equivalence of types in the Snowflake Model Registry for supported type systems.
Column data types¶
The following table shows the equivalence of model signature (SQL) type, pandas DataFrames (NumPy) type, and Snowpark Python type.
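Model signature (SQL) type | pandas DataFrame (NumPy) type | Snowpark Python type |
---|---|---|
INT8 | `np.int8` | `ByteType` |
INT16 | `np.int16` | `ShortType` |
INT32 | `np.int32` | `IntegerType` |
INT64 | `np.int64` | `LongType` |
FLOAT | `np.float32` | `FloatType` |
DOUBLE | `np.float64` | `DoubleType` |
UINT8 | `np.uint8` | `ByteType` |
UINT16 | `np.uint16` | `ShortType` |
UINT32 | `np.uint32` | `IntegerType` |
UINT64 | `np.uint64` | `LongType` |
BOOL | `np.bool_` | `BooleanType` |
STRING | `np.str_` | `StringType` |
BYTES | `np.bytes_` | `BinaryType` |
TIMESTAMP_NTZ | `np.datetime64` | `TimestampType` |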
The representation of tensor features where the shape is specified uses `np.object_`.
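To make the object representation concrete, the sketch below builds a pandas column whose cells each hold a NumPy array, as a ragged feature like `sparse_1_variable_len` would. It assumes NumPy and pandas are installed, and the cell values are hypothetical.

```python
import numpy as np
import pandas as pd

# Each cell holds its own array; lengths differ from row to row (ragged).
tensor_df = pd.DataFrame({
    "sparse_1_variable_len": [
        np.array([1, 2, 3], dtype=np.int64),
        np.array([4], dtype=np.int64),
    ]
})

# pandas stores such a column with the generic object dtype.
print(tensor_df["sparse_1_variable_len"].dtype)
```

The element type of each array is still `int64`; only the column that holds the arrays is typed as object.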
Missing values¶
NULL values are not permitted in the sample input data or the inference input data.
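Because NULL values are rejected, it can help to check sample or inference data before passing it to the registry. The following is a minimal sketch, assuming pandas is installed; the helper `columns_with_nulls` and the sample data are hypothetical and not part of the Snowflake API.

```python
import pandas as pd

def columns_with_nulls(df):
    """Return the names of columns that contain at least one missing value."""
    return [col for col in df.columns if df[col].isna().any()]

# Hypothetical sample data with one missing value in column "b".
sample = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.5, None, 1.5]})

bad_columns = columns_with_nulls(sample)
print(bad_columns)

# One possible remedy: drop incomplete rows before using the data.
cleaned = sample.dropna()
```

Dropping rows is only one choice; imputing a default value may suit your model better, depending on what the missing values mean.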
Conversion from NumPy¶
If the NumPy data type can be safely cast to a NumPy type shown in Column data types, it is inferred as the corresponding data type.
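NumPy's own notion of a safe cast can be checked with `np.can_cast`. The sketch below only illustrates what "safely cast" means in NumPy terms; whether the registry performs exactly this check internally is an assumption.

```python
import numpy as np

# float16 widens to float32 without losing information, so the cast is safe.
widening_ok = np.can_cast(np.dtype("float16"), np.dtype("float32"), casting="safe")

# float64 to float32 can lose precision, so NumPy does not consider it safe.
narrowing_ok = np.can_cast(np.dtype("float64"), np.dtype("float32"), casting="safe")

print(widening_ok, narrowing_ok)
```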
Conversion from PyTorch¶
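PyTorch type | Model signature (SQL) type |
---|---|
`torch.uint8` | UINT8 |
`torch.int8` | INT8 |
`torch.int16` | INT16 |
`torch.int32` | INT32 |
`torch.int64` | INT64 |
`torch.float32` | FLOAT |
`torch.float64` | DOUBLE |
`torch.bool` | BOOL |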
Conversion from Snowpark¶
In addition to the mappings shown in Column data types, the following conversions apply:
- `DecimalType` with scale of 0 maps to INT64.
- `DecimalType` with scale greater than 0 maps to DOUBLE.
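The two rules above can be written as a small function. This is an illustrative sketch only: `signature_type_for_decimal` is a hypothetical helper, not a Snowflake API, and it takes the scale as a plain integer rather than a Snowpark type object.

```python
def signature_type_for_decimal(scale: int) -> str:
    """Map a decimal column's scale to a signature type name, per the two rules above."""
    if scale < 0:
        raise ValueError("scale must be non-negative")
    # No fractional digits: treat as an integer. Otherwise: treat as a double.
    return "INT64" if scale == 0 else "DOUBLE"

print(signature_type_for_decimal(0))
print(signature_type_for_decimal(2))
```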