Bring your own model types via serialized files

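The model registry supports logging built-in model types directly in the registry. It also provides a way to log other model types with snowflake.ml.model.custom_model.CustomModel. Serializable models trained with external tools or obtained from open source repositories can be used with CustomModel.

This guide explains how to: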

Create a custom model.
Create a model context with files and model objects.
Include additional code with your model using code_paths.
Log the custom model to the Snowflake Model Registry.
Deploy the model for inference.

Note

This quickstart provides an example of logging a custom PyCaret model.

Defining the model context by keyword arguments

``snowflake.ml.model.custom_model.ModelContext`` can be instantiated with user-defined keyword arguments. Each value is either a string file path or an instance of a :doc:`supported model type </developer-guide/snowflake-ml/model-registry/built-in-models/overview>`. The files and serialized models are packaged with the model for use in the model inference logic.

Using in-memory model objects

When working with built-in model types, the recommended approach is to pass in-memory model objects directly to ModelContext. This lets Snowflake ML handle serialization automatically.

import pandas as pd
from snowflake.ml.model import custom_model

# Initialize ModelContext with an in-memory model object
# my_model can be any supported model type (e.g., sklearn, xgboost, lightgbm, and others)
model_context = custom_model.ModelContext(
    my_model=my_model,
)

# Define a custom model class that utilizes the context
class ExampleBringYourOwnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the model with key 'my_model' from the context to make predictions
        model_output = self.context['my_model'].predict(input)
        return pd.DataFrame({'output': model_output})

# Instantiate the custom model with the model context. This instance can be logged in the model registry.
my_model = ExampleBringYourOwnModel(model_context)

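Note

In your custom model class, always access model objects through the model context. For example, use self.model = self.context['my_model'] instead of assigning self.model = model directly (where model is the in-memory model object). Accessing the model directly captures a second copy of the model in a closure, which makes the model files significantly larger during serialization.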

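Using serialized files

For models or data stored in serialized files such as Python pickles or JSON, you can provide file paths to ModelContext. A file can be a serialized model, a configuration file, or a file containing parameters. This is useful when working with pre-trained models saved to disk or with configuration data.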

import pickle
import pandas as pd
from snowflake.ml.model import custom_model

# Initialize ModelContext with a file path
# my_file_path is a local pickle file path
model_context = custom_model.ModelContext(
    my_file_path='/path/to/file.pkl',
)

# Define a custom model class that loads the pickled object
class ExampleBringYourOwnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

        # Use 'my_file_path' key from the context to load the pickled object
        with open(self.context['my_file_path'], 'rb') as f:
            self.obj = pickle.load(f)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the loaded object to make predictions
        model_output = self.obj.predict(input)
        return pd.DataFrame({'output': model_output})

# Instantiate the custom model with the model context. This instance can be logged in the model registry.
my_model = ExampleBringYourOwnModel(model_context)
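
The example above assumes that a pickle file already exists at the path passed to ModelContext. The following is a minimal sketch, using only the standard library, of how such a file might be written and read back. The params dictionary and the score function are hypothetical stand-ins for a trained model and its prediction logic; they are not part of Snowflake ML.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical parameters standing in for a trained model
params = {"weights": [0.5, -1.0], "bias": 0.25}

# Serialize to a local file; this string path is what ModelContext would receive
model_path = str(Path(tempfile.mkdtemp()) / "file.pkl")
with open(model_path, "wb") as f:
    pickle.dump(params, f)

# Read the file back the same way the custom model's __init__ does
with open(model_path, "rb") as f:
    restored = pickle.load(f)


def score(rows: list[list[float]]) -> list[float]:
    """Score every row in the batch with the restored parameters."""
    return [
        sum(w * x for w, x in zip(restored["weights"], row)) + restored["bias"]
        for row in rows
    ]


scores = score([[1.0, 0.0], [0.0, 1.0]])
```

Loading the file once in __init__, as the example above does, avoids repeating the deserialization on every predict call.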

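Important

When combining a supported model type (such as XGBoost) with unsupported models or data, you don't need to serialize the supported model yourself. Set the supported model object directly in the context (for example, base_model = my_xgb_model) and it is serialized automatically.

Important

Methods decorated with @custom_model.inference_api must always be written to work on multi-row DataFrames. Keep in mind that the input DataFrame does not always contain exactly one row. In real-time inference in particular, server-side batching can combine single-record requests from multiple sources into one DataFrame.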

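Defining inference parameters

Custom model inference methods can accept optional parameters that control inference behavior, such as a temperature setting or a maximum number of tokens. Define parameters as keyword-only arguments (after ``*``) of the ``@inference_api`` method, with type annotations and default values.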

import pandas as pd
from snowflake.ml.model import custom_model

class TextGenerationModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(
        self,
        input: pd.DataFrame,
        *,
        temperature: float = 0.7,
        max_tokens: int = 256,
    ) -> pd.DataFrame:
        # Use temperature and max_tokens to control generation behavior
        output = self.context['my_model'].generate(
            input["input_text"],
            temperature=temperature,
            max_tokens=max_tokens,
        )
        return pd.DataFrame({"output_text": output})

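When this model is logged, the parameters are automatically included in the model signature. Callers can override them at inference time or omit them to use the defaults. For details, see :doc:`Specifying model signatures </developer-guide/snowflake-ml/model-registry/model-signature>`.

The following requirements apply to inference parameters:

They must be keyword-only (defined after ``*`` in the method signature).
They must have type annotations. Supported types are int, float, str, bool, bytes, datetime.datetime, and ``list`` with a supported element type (for example, list[str] or list[list[int]]).
They must have default values.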
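
The three requirements above can be checked locally before logging. The following is a minimal sketch of such a pre-check built on the standard inspect module. The check_inference_params helper and the Good and Bad classes are hypothetical and not part of Snowflake ML. The helper assumes the first two arguments are self and the input DataFrame, and it does not verify that an annotation is one of the supported types.

```python
import inspect
from typing import Callable


def check_inference_params(method: Callable) -> list[str]:
    """Return a list of problems found in the method's optional parameters.

    Hypothetical local pre-check; it is not part of Snowflake ML.
    """
    problems = []
    params = list(inspect.signature(method).parameters.values())
    # Skip `self` and the input DataFrame; everything after is a parameter
    for p in params[2:]:
        if p.kind is not inspect.Parameter.KEYWORD_ONLY:
            problems.append(f"{p.name}: must be keyword-only (defined after *)")
        if p.annotation is inspect.Parameter.empty:
            problems.append(f"{p.name}: missing type annotation")
        if p.default is inspect.Parameter.empty:
            problems.append(f"{p.name}: missing default value")
    return problems


class Good:
    def predict(self, input, *, temperature: float = 0.7, max_tokens: int = 256):
        return input


class Bad:
    # temperature is positional, unannotated, and has no default;
    # max_tokens is keyword-only with a default but lacks an annotation
    def predict(self, input, temperature, *, max_tokens=256):
        return input


good_problems = check_inference_params(Good.predict)
bad_problems = check_inference_params(Bad.predict)
```

Here good_problems is empty, while bad_problems lists three issues for temperature and one for max_tokens.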

Testing and logging a custom model

You can test a custom model by running it locally.

my_model = ExampleBringYourOwnModel(model_context)
output_df = my_model.predict(input_df)

When the model works as intended, log it to the Snowflake Model Registry. As shown here, specify conda_dependencies (or pip_requirements) to declare the libraries that the model class requires. Provide sample_input_data (a pandas or Snowpark DataFrame) so that the model's input signature can be inferred. Alternatively, provide a model signature.

from snowflake.ml.registry import Registry

# sp_session is an existing Snowpark Session
reg = Registry(session=sp_session, database_name="ML", schema_name="REGISTRY")
mv = reg.log_model(my_model,
            model_name="my_custom_model",
            version_name="v1",
            conda_dependencies=["scikit-learn"],
            comment="My Custom ML Model",
            sample_input_data=train_features)
output_df = mv.run(input_df)
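
Entries in conda_dependencies can carry version pins, as the PyCaret example later in this guide does. The following is a rough sketch of one way to build such a list from the versions installed in the local environment, using the standard importlib.metadata module. The pinned helper is hypothetical. It assumes the locally installed distribution name matches the conda package name and that the pinned version is available to Snowflake, neither of which is guaranteed.

```python
from importlib import metadata


def pinned(packages: list[str]) -> list[str]:
    """Pin each package to the locally installed version when one is found."""
    specs = []
    for name in packages:
        try:
            specs.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed locally: leave the requirement unpinned
            specs.append(name)
    return specs


# The second name is deliberately one that is not installed
deps = pinned(["scikit-learn", "a-package-that-is-not-installed"])
```

The resulting list could then be passed as conda_dependencies=deps in the log_model call above.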

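Adding additional code with code_paths

Use the code_paths parameter of Registry.log_model to package Python code, such as helper modules, utilities, and configuration files, with your model. You can import this code just as you would locally.

Provide either string paths, which copy files or directories, or CodePath objects. CodePath objects give you finer control over which subdirectories or files are included and over the import paths that the model uses.

Using string paths

Pass a list of string paths to include files or directories. The last component of each path becomes the importable module name.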

mv = reg.log_model(
    my_model,
    model_name="my_model",
    version_name="v1",
    code_paths=["src/mymodule"],  # import with: import mymodule
)

Using CodePath with a filter

Use the CodePath class when you want to package only part of a directory tree or to control the import paths that the model uses.

from snowflake.ml.model import CodePath

CodePath has two parameters:

root: A directory or file path.
filter (optional): A relative path under root that selects a subdirectory or file.

When filter is provided, the source is root/filter, and the filter value determines the import path. For example, filter="utils" allows import utils, and filter="pkg/subpkg" allows import pkg.subpkg.

Example: Given the following project structure:

my_project/src/
├── utils/
│   └── preprocessing.py
├── models/
│   └── classifier.py
└── tests/          # Not needed for inference

To package only utils/ and models/, excluding tests/:

mv = reg.log_model(
    my_model,
    model_name="my_model",
    version_name="v1",
    code_paths=[
        CodePath("my_project/src/", filter="utils/"),
        CodePath("my_project/src/", filter="models/"),
    ],
)

You can also filter a single file:

code_paths=[
    CodePath("my_project/src/", filter="utils/preprocessing.py"),
]
# Import with: import utils.preprocessing
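
The mapping from a filter value to an import path described in this section can be sketched as a small helper. The import_path function below is hypothetical and is not part of Snowflake ML. It only mirrors the examples given above and might be useful for checking which import statement a given filter implies.

```python
from pathlib import PurePosixPath


def import_path(filter_value: str) -> str:
    """Translate a CodePath filter into the dotted import path it implies."""
    path = PurePosixPath(filter_value)
    # Drop a trailing .py so a file filter maps to its module name
    if path.suffix == ".py":
        path = path.with_suffix("")
    return ".".join(path.parts)


filters = ["utils", "utils/", "pkg/subpkg", "utils/preprocessing.py"]
paths = {f: import_path(f) for f in filters}
```

For the filters used in this section, the helper yields utils, utils, pkg.subpkg, and utils.preprocessing respectively.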

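Example: Logging a PyCaret model

The following example uses PyCaret to log a custom model type. PyCaret is a low-code, high-efficiency third-party package that Snowflake does not support natively. You can bring your own model types using a similar method.

Step 1: Define the model context

Before logging your model, define the model context. The model context refers to your own custom model type. The following example specifies the path to the serialized (pickled) model using the context's ``model_file`` attribute. You can choose any name for the attribute as long as the name is not used for anything else.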

from snowflake.ml.model import custom_model

pycaret_model_context = custom_model.ModelContext(
    model_file='pycaret_best_model.pkl',
)

Step 2: Create a custom model class

Define a custom model class to log a model type that has no native support. In this example, a PyCaretModel class derived from CustomModel is defined so that the model can be logged in the registry.

import pandas as pd
from pycaret.classification import load_model, predict_model

class PyCaretModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)
        model_dir = self.context["model_file"][:-4]  # Remove '.pkl' suffix
        self.model = load_model(model_dir, verbose=False)
        self.model.memory = '/tmp/'  # Update memory directory

    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        model_output = predict_model(self.model, data=X)
        return pd.DataFrame({
            "prediction_label": model_output['prediction_label'],
            "prediction_score": model_output['prediction_score']
        })

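Note

As shown above, set the model's memory directory to /tmp/. Snowflake warehouse nodes have restricted directory access. /tmp is always writable and is a safe choice when the model needs a place to write files. This might not be necessary for other model types.

Step 3: Test the custom model

Test the PyCaret model locally using code like the following.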

test_data = [
    [1, 237, 1, 1.75, 1.99, 0.00, 0.00, 0, 0, 0.5, 1.99, 1.75, 0.24, 'No', 0.0, 0.0, 0.24, 1],
    # Additional test rows...
]
col_names = ['Id', 'WeekofPurchase', 'StoreID', 'PriceCH', 'PriceMM', 'DiscCH', 'DiscMM',
            'SpecialCH', 'SpecialMM', 'LoyalCH', 'SalePriceMM', 'SalePriceCH',
            'PriceDiff', 'Store7', 'PctDiscMM', 'PctDiscCH', 'ListPriceDiff', 'STORE']

test_df = pd.DataFrame(test_data, columns=col_names)

my_pycaret_model = PyCaretModel(pycaret_model_context)
output_df = my_pycaret_model.predict(test_df)

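Step 4: Define a model signature

In this example, sample data is used to infer a model signature for input validation.

from snowflake.ml.model import model_signature

predict_signature = model_signature.infer_signature(input_data=test_df, output_data=output_df)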

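Step 5: Log the model

The following code logs (registers) the model in the Snowflake Model Registry.

from snowflake.ml.registry import Registry

# session is an existing Snowpark Session
snowml_registry = Registry(session)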

custom_mv = snowml_registry.log_model(
    my_pycaret_model,
    model_name="my_pycaret_best_model",
    version_name="version_1",
    conda_dependencies=["pycaret==3.0.2", "scipy==1.11.4", "joblib==1.2.0"],
    options={"relax_version": False},
    signatures={"predict": predict_signature},
    comment = 'My PyCaret classification experiment using the CustomModel API'
)

Step 6: Verify the model in the registry

To verify that the model is available in the Model Registry, use the show_models function.

snowml_registry.show_models()

Step 7: Make predictions with the registered model

Use the run function to call the model for prediction.

snowpark_df = session.create_dataframe(test_data, schema=col_names)

custom_mv.run(snowpark_df).show()

Next steps

After deploying the PyCaret model through the Snowflake Model Registry, you can view the model in Snowsight. In the navigation menu, select AI & ML » Models. If you don't see it there, make sure you are using the ACCOUNTADMIN role or the role you used to log the model.

To use the model from SQL, use a query like the following:

SELECT
    my_pycaret_best_model!predict(*) AS predict_dict,
    predict_dict['prediction_label']::text AS prediction_label,
    predict_dict['prediction_score']::double AS prediction_score
FROM pycaret_input_data;