Sentence Transformer

The Snowflake Model Registry supports models that use Sentence Transformers (sentence_transformers.SentenceTransformer). For more information, see the `Sentence Transformers documentation <https://sbert.net/>`_.

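For the registry to know the signatures of the target methods, you must provide either sample input data or explicit signatures that define the input and output schema of the model's methods.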

To provide sample input data, pass a DataFrame as the value of the sample_input_data parameter; both pandas and Snowpark DataFrames are accepted. For example, you can specify a value such as :code:`sample_input = pd.DataFrame(["This is a sample sentence."], columns=["TEXT"])`.

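To provide signatures instead, pass a dictionary as the value of the signatures parameter. The dictionary maps each method name to the input and output schema of that method. For example, the following code defines the input and output schema for the model's encode method.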

from snowflake.ml.model.model_signature import ModelSignature, FeatureSpec, DataType

signatures = {
    "encode": ModelSignature(
        inputs=[FeatureSpec(dtype=DataType.STRING, name='TEXT')],
        outputs=[FeatureSpec(dtype=DataType.FLOAT, name='EMBEDDINGS', shape=(-1,))]
    )
}
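
As a minimal sketch of how such a dictionary might be handed to the registry, the helper below forwards it to ``log_model`` through the ``signatures`` parameter and deliberately omits ``sample_input_data``. The helper name ``log_with_signatures`` is hypothetical and not part of any Snowflake API. ``reg`` is assumed to be a ``snowflake.ml.registry.Registry`` instance and ``model`` a loaded ``SentenceTransformer``.

```python
def log_with_signatures(reg, model, signatures, model_name, version_name):
    """Log a model using explicit signatures instead of sample input data.

    Hypothetical helper for illustration only. `reg` is assumed to expose
    log_model() the way snowflake.ml.registry.Registry does.
    """
    return reg.log_model(
        model=model,
        model_name=model_name,
        version_name=version_name,
        signatures=signatures,  # explicit schema, so no sample data is passed
    )
```

A call might look like ``log_with_signatures(reg, model, signatures, "my_sentence_transformer", "v1")``, where the model and version names are placeholders.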

When you call ``log_model``, you can use the following additional options in the ``options`` dictionary.


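Option	Description
`target_methods`	A list of the names of the methods available on the model object. By default, Sentence Transformer models use `encode` as the target method, assuming the method exists.
`cuda_version`	The version of the CUDA runtime to use when deploying to a platform with a GPU; defaults to 11.8. If manually set to `None`, the model cannot be deployed to a platform that has a GPU.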
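
The two options above could be assembled as in the following sketch. It assumes that ``cuda_version`` is given as a string such as ``"11.8"``; the variable names ``gpu_options`` and ``cpu_only_options`` are illustrative only.

```python
# Options for a GPU-capable deployment: state the target method and the
# CUDA runtime version explicitly (the string form is an assumption).
gpu_options = {
    "target_methods": ["encode"],
    "cuda_version": "11.8",
}

# Options for a CPU-only deployment: setting cuda_version to None means
# the model cannot be deployed to a platform that has a GPU.
cpu_only_options = {
    "target_methods": ["encode"],
    "cuda_version": None,
}
```

Either dictionary would then be passed as the ``options`` argument of ``log_model``.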

The following example:

- Loads a pretrained Sentence Transformer model.
- Logs the model to the Snowflake ML Model Registry.
- Uses the logged model for inference.

Note

In this example, ``reg`` is an instance of ``snowflake.ml.registry.Registry``. For information on creating a registry object, see the Snowflake Model Registry section.

from sentence_transformers import SentenceTransformer
import pandas as pd

# 1. Initialize the model
# This example uses the 'all-MiniLM-L6-v2' model, which is a popular
# and efficient model for generating sentence embeddings.
model = SentenceTransformer('all-MiniLM-L6-v2')

# 2. Prepare sample input data
# Sentence Transformers expect a single column of text data for the 'encode' method.
sentences = ["This is an example sentence", "Each sentence is converted into a vector"]
sample_input = pd.DataFrame(sentences, columns=["TEXT"])

# 3. Log the model
# Provide the model object, a name, and a version.
# Including sample_input_data allows the registry to infer the input/output signatures.
model_ref = reg.log_model(
    model=model,
    model_name="my_sentence_transformer",
    version_name="v1",
    sample_input_data=sample_input,
)

# 4. Use the model for inference
# The 'run' method executes the default 'encode' function on the input data.
result_df = model_ref.run(sample_input, function_name="encode")

# The result is a DataFrame where the output column (usually named 'outputs')
# contains the embeddings as arrays of floats.
print(result_df)