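Preprocessing and postprocessing with models

This topic describes how to create models, log them to the Snowflake Model Registry, and deploy them, using examples that cover several model types and scenarios. These include:

In-memory scikit-learn models and pipelines.
Your own custom model.
Multiple models.

In-memory scikit-learn models and pipelines

With Snowflake ML, you can integrate in-memory scikit-learn models into the Model Registry by passing them as keyword arguments to the ModelContext class. The following example passes an in-memory scikit-learn model to the model context as a keyword argument and calls it from a custom model class.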

from sklearn import datasets, svm
import pandas as pd
from snowflake.ml.model import custom_model

# Step 1: Import the Iris dataset
iris_X, iris_y = datasets.load_iris(return_X_y=True)

# Step 2: Initialize a scikit-learn LinearSVC model and train it
svc = svm.LinearSVC()
svc.fit(iris_X, iris_y)

# Step 3: Initialize ModelContext with keyword arguments
mc = custom_model.ModelContext(
    my_model=svc,
)

# Step 4: Define a custom model class to utilize the context
class ExampleSklearnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the model from the context for predictions
        model_output = self.context['my_model'].predict(input)
        # Return the predictions in a DataFrame
        return pd.DataFrame({'output': model_output})


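Using a `scikit-learn` pipeline in Snowflake ML

The following example shows how to use a scikit-learn pipeline in Snowflake ML. The pipeline includes preprocessing steps such as imputation and scaling, followed by a predictive model, all managed inside a custom model class through ModelContext.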

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
import pandas as pd
from snowflake.ml.model import custom_model

# Step 1: Load the Iris dataset
iris_X, iris_y = datasets.load_iris(return_X_y=True)

# Step 2: Create a scikit-learn pipeline
# The pipeline includes:
# - A SimpleImputer to handle missing values
# - A StandardScaler to standardize the data
# - A Support Vector Classifier (SVC) for predictions
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
    ('classifier', SVC(kernel='linear', probability=True))
])

# Step 3: Fit the pipeline to the dataset
pipeline.fit(iris_X, iris_y)

# Step 4: Initialize ModelContext with the pipeline
mc = custom_model.ModelContext(
    pipeline_model=pipeline,
)

# Step 5: Define a custom model class to utilize the pipeline
class ExamplePipelineModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the pipeline from the context to process input and make predictions
        predictions = self.context['pipeline_model'].predict(input)
        probabilities = self.context['pipeline_model'].predict_proba(input)

        # Return predictions and per-class probabilities (Iris has three classes)
        return pd.DataFrame({
            'predictions': predictions,
            'probability_class_0': probabilities[:, 0],
            'probability_class_1': probabilities[:, 1],
            'probability_class_2': probabilities[:, 2]
        })

# Example usage:
# Convert new input data into a DataFrame
new_input = pd.DataFrame(iris_X[:5])  # Using the first 5 samples for demonstration

# Initialize the custom model and run predictions
custom_pipeline_model = ExamplePipelineModel(context=mc)
result = custom_pipeline_model.predict(new_input)

print(result)

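The Iris dataset has three classes, so `predict_proba` returns one probability column per class. Rather than writing out each column name by hand, the output columns can be derived from the shape of the probability matrix. The following is a minimal pure-Python sketch: the helper name `probability_columns` is hypothetical, and plain lists stand in for the NumPy array a real pipeline would return.

```python
from typing import Dict, List, Sequence


def probability_columns(
    probabilities: Sequence[Sequence[float]],
    prefix: str = "probability_class_",
) -> Dict[str, List[float]]:
    """Turn a row-major probability matrix into one named column per class."""
    if not probabilities:
        return {}
    n_classes = len(probabilities[0])
    if any(len(row) != n_classes for row in probabilities):
        raise ValueError("all rows must have the same number of classes")
    # Build one column per class index, named <prefix><index>
    return {
        f"{prefix}{i}": [row[i] for row in probabilities]
        for i in range(n_classes)
    }


# Two samples, three classes (as with Iris)
columns = probability_columns([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])
print(sorted(columns))
```

The resulting dictionary can be merged with the `predictions` column and passed to `pd.DataFrame`, so the same `predict` method keeps working if the number of classes changes.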

Using your own model

The following example uses your own model as a custom model.

from snowflake.ml.model import custom_model
import pandas as pd

# your_own_model is a placeholder for a model object you have already created
mc = custom_model.ModelContext(
    my_model=your_own_model,
)

class ExampleYourOwnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Run the model stored in the context on the input DataFrame
        model_output = self.context['my_model'].predict(input)
        return pd.DataFrame({'output': model_output})

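`your_own_model` is a placeholder for any object you supply. The custom class in this example only calls `predict` on it, so as far as this code is concerned the object needs to expose that one method. The sketch below illustrates that duck-typing contract in plain Python. `MajorityClassModel` and `SketchContext` are hypothetical stand-ins, not part of snowflake-ml-python, and the sketch says nothing about whether a given object can be serialized by the registry.

```python
from collections import Counter
from typing import Any, List, Sequence


class MajorityClassModel:
    """Hypothetical user-defined model: always predicts the most common training label."""

    def __init__(self) -> None:
        self.majority_ = None

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[Any]) -> "MajorityClassModel":
        # Remember the most frequent label seen during training
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[Any]:
        # One prediction per input row
        return [self.majority_ for _ in X]


class SketchContext:
    """Stand-in (assumption) that stores keyword arguments and looks them up by name."""

    def __init__(self, **objects: Any) -> None:
        self._objects = dict(objects)

    def __getitem__(self, name: str) -> Any:
        return self._objects[name]


your_own_model = MajorityClassModel().fit([[0.0], [1.0], [2.0]], ["a", "b", "b"])
context = SketchContext(my_model=your_own_model)
output = context["my_model"].predict([[5.0], [6.0]])
print(output)
```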

Using multiple models

The following custom model combines multiple models and uses a configuration file to apply a bias when generating predictions.

# model1, model2, and preproc are placeholders for objects you have already created
mc = custom_model.ModelContext(
    model1=model1,
    model2=model2,
    feature_preproc=preproc,
)


Note

model1 and model2 are objects of any model type natively supported by the registry. feature_preproc is a scikit-learn pipeline object.

from snowflake.ml.model import custom_model
import pandas as pd
import json

# Skeleton of a custom model: inference methods are marked with @custom_model.inference_api
class ExamplePipelineModel(custom_model.CustomModel):

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        ...
        return pd.DataFrame(...)


# Here is the fully-functional custom model that uses both model1 and model2
class ExamplePipelineModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        features = self.context['feature_preproc'].transform(input)
        model_output = self.context['model1'].predict(
            self.context['model2'].predict(features)
        )
        return pd.DataFrame({'output': model_output})

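The following sketch illustrates how a bias read from a configuration file could be applied as a postprocessing step on top of chained models. It is a pure-Python illustration under stated assumptions: the `bias` key, the JSON file layout, the `Scale` stand-in, and the helper names `load_bias` and `predict_with_bias` are all hypothetical. In a real custom model, the preprocessing and model objects would come from `self.context`.

```python
import json
import os
import tempfile
from typing import List


class Scale:
    """Hypothetical stand-in for a preprocessing step or model: multiplies each value."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, values: List[float]) -> List[float]:
        return [v * self.factor for v in values]

    # Expose predict as well so the same stand-in can play the role of a model
    predict = transform


def load_bias(config_path: str) -> float:
    """Read the (assumed) 'bias' key from a JSON configuration file."""
    with open(config_path, encoding="utf-8") as f:
        return float(json.load(f)["bias"])


def predict_with_bias(values, preproc, model1, model2, bias):
    # Preprocessing: transform raw input into features
    features = preproc.transform(values)
    # Same chaining order as the class above: model2's output feeds model1
    chained = model1.predict(model2.predict(features))
    # Postprocessing: shift every prediction by the configured bias
    return [v + bias for v in chained]


# Write a throwaway config file, then read the bias back from it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"bias": 0.5}, f)
    bias = load_bias(path)

result = predict_with_bias([1.0, 2.0], Scale(2.0), Scale(10.0), Scale(3.0), bias)
print(result)
```

Keeping the bias in a file rather than in code means it can be changed without redefining the model class. How such a file is packaged alongside a logged model is outside the scope of this sketch.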
