Hugging Face pipeline¶

The Snowflake Model Registry supports Hugging Face models defined as transformers that can be loaded using transformers.Pipeline.

Use one of the following methods to log a Hugging Face model to the Model Registry:

Use Snowsight to import and deploy a model from Hugging Face. For instructions, see Import and deploy models from an external service.
Create a snowflake.ml.model.models.huggingface.TransformersPipeline instance and call log_model():
```
# reg: snowflake.ml.registry.Registry

from snowflake.ml.model.models import huggingface


model = huggingface.TransformersPipeline(
    task="text-classification",
    model="ProsusAI/finbert",
    # compute_pool_for_log=... # Optional
)

mv = reg.log_model(model, model_name='finbert', version_name='v5')
```
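Important
- If you do not specify the compute_pool_for_log argument, the model is logged using the default CPU compute pool.
- If you specify the compute_pool_for_log argument, the model is logged using the specified compute pool.
- If you set the compute_pool_for_log argument to None, the model files are downloaded locally and then uploaded to the Model Registry. This requires huggingface-hub to be installed.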

Load the model from Hugging Face into memory and log it to the Model Registry:

# reg: snowflake.ml.registry.Registry

import transformers

lm_hf_model = transformers.pipeline(
    task="text-generation",
    model="bigscience/bloom-560m",
    token="...",  # Put your HuggingFace token here.
    return_full_text=False,
    max_new_tokens=100,
)

lmv = reg.log_model(lm_hf_model, model_name='bloom', version_name='v560m')

If you are using Snowflake Notebooks, the notebook must have an external access integration attached in order to download the model weights. The integration is needed to allow egress to the following hosts:

huggingface.co
hub-ci.huggingface.co
cdn-lfs-us-1.hf.co
cdn-lfs-eu-1.hf.co
cdn-lfs.hf.co
transfer.xethub.hf.co
cas-server.xethub.hf.co
cas-bridge.xethub.hf.co

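Note

This list contains only the hosts needed to access Hugging Face and can change at any time. Your model might require artifacts from other sources.

The following example creates a network rule, huggingface_network_rule, and a new external access integration, huggingface_access_integration, for use with a notebook: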

CREATE NETWORK RULE huggingface_network_rule
TYPE = HOST_PORT
VALUE_LIST = (
    'huggingface.co',
    'hub-ci.huggingface.co',
    'cdn-lfs-us-1.hf.co',
    'cdn-lfs-eu-1.hf.co',
    'cdn-lfs.hf.co',
    'transfer.xethub.hf.co',
    'cas-server.xethub.hf.co',
    'cas-bridge.xethub.hf.co'
)
MODE = EGRESS
COMMENT = 'Network Rule for Hugging Face external access';

CREATE EXTERNAL ACCESS INTEGRATION huggingface_access_integration
ALLOWED_NETWORK_RULES = (huggingface_network_rule)
ENABLED = true;

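For more information, see Creating and using an external access integration.

After the external access integration has been created, attach it to the notebook so that the notebook can reach the Hugging Face model repository and download the model's weights and configuration. For more information, see Set up external access for Snowflake Notebooks.

Model Registry API¶

When you call log_model(), the options dictionary supports the following keys.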


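Option key	Description	Type
`target_methods`	A list of the names of the methods available on the model object. Hugging Face models use the object's `__call__` method by default, if it exists.	`list[str]`
`cuda_version`	The version of the CUDA runtime to use when deploying to a platform with a GPU. If set to `None`, the model cannot be deployed to a platform with a GPU. Defaults to `12.4`.	`Optional[str]`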

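The Model Registry infers the signatures argument when the pipeline has one of the tasks in the following list:

fill-mask
question-answering (single output, multiple output)
summarization
table-question-answering
text2text-generation
text-classification (single output, multiple output)
sentiment-analysis (single output, multiple output)
text-generation (including the OpenAI-compatible configuration)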
token-classification
ner
translation
translation_xx_to_yy, where xx and yy are two-letter language codes, for example translation_en_to_fr.
zero-shot-classification

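Note

Task names are case-sensitive.

The sample_input_data argument to log_model is ignored for Hugging Face models. When you log a Hugging Face model whose task is not in the preceding list, specify the signatures argument so that the registry knows the signatures of the target methods.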
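The rule above can be sketched as a small check that runs before logging. This is only an illustration: the helper name needs_explicit_signatures is hypothetical and not part of the Snowflake API, the task set is copied from the preceding list, and the pattern assumes that xx and yy are two lowercase letters.

```python
import re

# Task names for which the registry infers a signature (copied from the list above).
INFERRED_TASKS = {
    "fill-mask",
    "question-answering",
    "summarization",
    "table-question-answering",
    "text2text-generation",
    "text-classification",
    "sentiment-analysis",
    "text-generation",
    "token-classification",
    "ner",
    "translation",
    "zero-shot-classification",
}

# translation_xx_to_yy; two lowercase letters for xx and yy is an assumption.
TRANSLATION_PATTERN = re.compile(r"translation_[a-z]{2}_to_[a-z]{2}")


def needs_explicit_signatures(task: str) -> bool:
    """Return True when the task is outside the inferred list (hypothetical helper)."""
    if task in INFERRED_TASKS:
        return False
    # The comparison is exact, so differently cased names do not match.
    return TRANSLATION_PATTERN.fullmatch(task) is None


for name in ("text-classification", "Text-Classification",
             "translation_en_to_fr", "image-classification"):
    print(name, needs_explicit_signatures(name))
```

A True result means the signatures argument should be passed to log_model explicitly.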

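To see the inferred signature, call the show_functions() method. The signature gives the required types and column names of the model function's inputs and the format of its output. The following example shows the signature of the model bigscience/bloom-560m with the text-generation task: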

[{'name': '__CALL__',
  'target_method': '__call__',
  'signature': ModelSignature(
                      inputs=[
                          FeatureSpec(dtype=DataType.STRING, name='inputs')
                      ],
                      outputs=[
                          FeatureSpec(dtype=DataType.STRING, name='outputs')
                      ]
                  )}]

The following example shows how to call the model using the preceding signature:

# model: snowflake.ml.model.ModelVersion

import pandas as pd

remote_prediction = model.run(pd.DataFrame(["Hello, how are you?"], columns=["inputs"]))

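Usage notes¶

Many Hugging Face models are large and do not fit in a standard warehouse. Use a Snowpark-optimized warehouse or select a smaller version of the model. For example, instead of Llama-2-70b-chat-hf, try Llama-2-7b-chat-hf.
Snowflake warehouses do not have GPUs. Use only CPU-optimized Hugging Face models.
Some Hugging Face transformers return an array of dictionaries per input row. The Model Registry converts the array of dictionaries to a string that contains its JSON representation. For example, multiple-output question-answering output looks like this: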
```
'[{"score": 0.61094731092453, "start": 139, "end": 178, "answer": "learn more about the world of athletics"},
{"score": 0.17750297486782074, "start": 139, "end": 180, "answer": "learn more about the world of athletics.\""}]'
```
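Because such a cell is a JSON string rather than a list, it has to be decoded on the client before individual results can be used. The following is a minimal sketch using Python's standard json module; the sample value is the one shown above, and the variable names are illustrative only.

```python
import json

# One output cell, copied from the multiple-output question-answering example above.
raw_output = (
    '[{"score": 0.61094731092453, "start": 139, "end": 178, '
    '"answer": "learn more about the world of athletics"}, '
    '{"score": 0.17750297486782074, "start": 139, "end": 180, '
    '"answer": "learn more about the world of athletics.\\""}]'
)

# Decode the JSON string back into a list of dictionaries.
results = json.loads(raw_output)

# Pick the candidate with the highest confidence score.
best = max(results, key=lambda item: item["score"])
print(len(results), best["answer"])
```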

Example¶

# Prepare model

import transformers
import pandas as pd

finbert_model = transformers.pipeline(
    task="text-classification",
    model="ProsusAI/finbert",
    top_k=2,
)

# Log the model
mv = registry.log_model(
    finbert_model,
    model_name="finbert",
    version_name="v1",
)

# Use the model
mv.run(pd.DataFrame(
        [
            ["I have a problem with my Snowflake that needs to be resolved asap!!", ""],
            ["I would like to have udon for today's dinner.", ""],
        ]
    )
)

Result:

0  [{"label": "negative", "score": 0.8106237053871155}, {"label": "neutral", "score": 0.16587384045124054}]
1  [{"label": "neutral", "score": 0.9263970851898193}, {"label": "positive", "score": 0.05286872014403343}]

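Inferred signatures for Hugging Face pipelines¶

This section describes the inferred signatures of the supported Hugging Face pipelines, including descriptions and examples of the required inputs and expected outputs. All inputs and outputs are Snowpark DataFrames.

Fill-mask pipeline¶

A pipeline whose task is "fill-mask" has the following inputs and outputs.

Inputs¶

inputs: A string that contains a mask to fill.

Example: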

--------------------------------------------------
|"inputs"                                        |
--------------------------------------------------
|LynYuu is the [MASK] of the Grand Duchy of Yu.  |
--------------------------------------------------

Outputs¶

outputs: A string that contains a JSON representation of a list of result objects, each of which can contain keys such as score, token, token_str, and sequence. For details, see FillMaskPipeline.

Example:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"score": 0.9066258072853088, "token": 3007, "token_str": "capital", "sequence": "lynyuu is the capital of the grand duchy of yu."}, {"score": 0.08162177354097366, "token": 2835, "token_str": "seat", "sequence": "lynyuu is the seat of the grand duchy of yu."}, {"score": 0.0012052370002493262, "token": 4075, "token_str": "headquarters", "sequence": "lynyuu is the headquarters of the grand duchy of yu."}, {"score": 0.0006560495239682496, "token": 2171, "token_str": "name", "sequence": "lynyuu is the name of the grand duchy of yu."}, {"score": 0.0005427763098850846, "token": 3200, "token_str"...  |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Code example¶
import transformers
import pandas as pd

model = transformers.pipeline(
    task="fill-mask",
    model="google-bert/bert-base-uncased",
)

mv = registry.log_model(
    model=model,
    model_name="GOOGLE_BERT_BASE_UNCASED",
)

input_df = pd.DataFrame([{"inputs": "LynYuu is the [MASK] of the Grand Duchy of Yu."}])
mv.run(
    input_df,
    # function_name="__call__", # Optional
)

Token classification¶

A pipeline whose task is "ner" or "token-classification" has the following inputs and outputs.

Inputs¶

inputs: A string that contains the tokens to be classified.

Example:

------------------------------------------------
|"inputs"                                      |
------------------------------------------------
|My name is Izumi and I live in Tokyo, Japan.  |
------------------------------------------------

Outputs¶

outputs: A string that contains a JSON representation of a list of result objects, each of which can contain keys such as entity, score, index, word, name, start, and end. For details, see TokenClassificationPipeline.

Example:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"entity": "PRON", "score": 0.9994392991065979, "index": 1, "word": "my", "start": 0, "end": 2}, {"entity": "NOUN", "score": 0.9968984127044678, "index": 2, "word": "name", "start": 3, "end": 7}, {"entity": "AUX", "score": 0.9937735199928284, "index": 3, "word": "is", "start": 8, "end": 10}, {"entity": "PROPN", "score": 0.9928083419799805, "index": 4, "word": "i", "start": 11, "end": 12}, {"entity": "PROPN", "score": 0.997334361076355, "index": 5, "word": "##zumi", "start": 12, "end": 16}, {"entity": "CCONJ", "score": 0.999173104763031, "index": 6, "word": "and", "start": 17, "end": 20}, {...  |

Code example¶

import transformers
import pandas as pd

model = transformers.pipeline(
    task="token-classification",
    model="dslim/bert-base-NER",
)

mv = registry.log_model(
    model=model,
    model_name="BERT_BASE_NER",
)

mv.run(
    pd.DataFrame([{"inputs": "My name is Izumi and I live in Tokyo, Japan."}]),
    # function_name="__call__", # Optional
)

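Question answering (single output)¶

A pipeline whose task is "question-answering", where top_k is either unset or set to 1, has the following inputs and outputs.

Inputs¶

question: A string that contains the question to answer.
context: A string that contains the answer.

Example: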

-----------------------------------------------------------------------------------
|"question"                  |"context"                                           |
-----------------------------------------------------------------------------------
|What did Doris want to do?  |Doris is a cheerful mermaid from the ocean dept...  |
-----------------------------------------------------------------------------------

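Outputs¶

score: A floating-point confidence score from 0.0 to 1.0.
start: The integer index of the first token of the answer in the context.
end: The integer index of the last token of the answer in the original context.
answer: A string that contains the answer that was found.

Example: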

--------------------------------------------------------------------------------
|"score"           |"start"  |"end"  |"answer"                                 |
--------------------------------------------------------------------------------
|0.61094731092453  |139      |178    |learn more about the world of athletics  |
--------------------------------------------------------------------------------

Code example¶

import transformers
import pandas as pd


model = transformers.pipeline(
    task="question-answering",
    model="deepset/roberta-base-squad2",
)

QA_input = {
    "question": "Why is model conversion important?",
    "context": "The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.",
}

mv = registry.log_model(
    model=model,
    model_name="ROBERTA_BASE_SQUAD2",
)

mv.run(
    pd.DataFrame.from_records([QA_input]),
    # function_name="__call__", # Optional
)

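Question answering (multiple outputs)¶

A pipeline whose task is "question-answering", where top_k is set and is greater than 1, has the following inputs and outputs.

Inputs¶

question: A string that contains the question to answer.
context: A string that contains the answer.

Example: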

-----------------------------------------------------------------------------------
|"question"                  |"context"                                           |
-----------------------------------------------------------------------------------
|What did Doris want to do?  |Doris is a cheerful mermaid from the ocean dept...  |
-----------------------------------------------------------------------------------

Outputs¶

outputs: A string that contains a JSON representation of a list of result objects, each of which can contain keys such as score, start, end, and answer.

Example:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                                                                                                                                                        |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"score": 0.61094731092453, "start": 139, "end": 178, "answer": "learn more about the world of athletics"}, {"score": 0.17750297486782074, "start": 139, "end": 180, "answer": "learn more about the world of athletics.\""}, {"score": 0.06438097357749939, "start": 138, "end": 178, "answer": "\"learn more about the world of athletics"}]  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Code example¶

import transformers
import pandas as pd


model = transformers.pipeline(
    task="question-answering",
    model="deepset/roberta-base-squad2",
    top_k=3,
)

QA_input = {
    "question": "Why is model conversion important?",
    "context": "The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.",
}

mv = registry.log_model(
    model=model,
    model_name="ROBERTA_BASE_SQUAD2",
)

mv.run(
    pd.DataFrame.from_records([QA_input]),
    # function_name="__call__", # Optional
)

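Summarization¶

A pipeline whose task is "summarization", where return_tensors is False or unset, has the following inputs and outputs.

Inputs¶

documents: A string that contains the text to summarize.

Example: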

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"documents"                                                                                                                                                                                               |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Neuro-sama is a chatbot styled after a female VTuber that hosts live streams on the Twitch channel "vedal987". Her speech and personality are generated by an artificial intelligence (AI) system  wh...  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

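Outputs¶

summary_text: A string that contains the generated summary or, if num_return_sequences is greater than 1, a string that contains a JSON representation of a list of results, each of which is a dictionary with a summary_text field.

Example: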

---------------------------------------------------------------------------------
|"summary_text"                                                                 |
---------------------------------------------------------------------------------
| Neuro-sama is a chatbot styled after a female VTuber that hosts live streams  |
---------------------------------------------------------------------------------
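Because the summary_text cell can hold either plain text or a JSON list, client code may want to normalize both shapes. The following is a sketch under that assumption; as_text_list is a hypothetical helper, not a Snowflake or Hugging Face function, and the two-summary value is made up for illustration.

```python
import json


def as_text_list(value: str, field: str = "summary_text") -> list:
    """Return the texts held in one output cell as a list (hypothetical helper)."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        # Plain text that is not JSON: a single summary.
        return [value]
    # A non-empty list of dictionaries that all carry the field: several summaries.
    if (isinstance(parsed, list) and parsed
            and all(isinstance(item, dict) and field in item for item in parsed)):
        return [item[field] for item in parsed]
    # Anything else that happens to parse as JSON is treated as plain text.
    return [value]


# Single summary, taken from the example table above.
single = " Neuro-sama is a chatbot styled after a female VTuber that hosts live streams"

# Illustrative value for num_return_sequences=2; the texts are made up.
multiple = json.dumps([
    {"summary_text": "First candidate summary."},
    {"summary_text": "Second candidate summary."},
])

print(as_text_list(single))
print(as_text_list(multiple))
```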

Code example¶

import transformers
import pandas as pd


model = transformers.pipeline(
    task="summarization",
    model="facebook/bart-large-cnn",
)

text = "The transformers library is a great library for natural language processing which provides a unified interface for many different models and tasks."

mv = registry.log_model(
    model=model,
    model_name="BART_LARGE_CNN",
)

mv.run(
    pd.DataFrame.from_records([{"documents": text}]),
    # function_name="__call__", # Optional
)

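Table question answering¶

A pipeline whose task is "table-question-answering" has the following inputs and outputs.

Inputs¶

query: A string that contains the question to answer.
table: A string that contains a JSON-serialized dictionary in the form {column -> [values]}, representing the table that might contain the answer.

Example: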

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"query"                                  |"table"                                                                                                                                                                                                                                                   |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Which channel has the most subscribers?  |{"Channel": ["A.I.Channel", "Kaguya Luna", "Mirai Akari", "Siro"], "Subscribers": ["3,020,000", "872,000", "694,000", "660,000"], "Videos": ["1,200", "113", "639", "1,300"], "Created At": ["Jun 30 2016", "Dec 4 2017", "Feb 28 2014", "Jun 23 2017"]}  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
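When the source data is held as one dictionary per row, it has to be pivoted into the {column -> [values]} form before serialization. The following is a minimal sketch using only the standard library; rows_to_table_json is a hypothetical helper, and the sample rows are a subset of the table above.

```python
import json


def rows_to_table_json(rows):
    """Pivot row dictionaries into a {column -> [values]} JSON string (hypothetical helper)."""
    if not rows:
        raise ValueError("at least one row is required")
    column_names = list(rows[0])
    columns = {name: [] for name in column_names}
    for row in rows:
        # Every row must carry the same columns, otherwise the value lists would misalign.
        if list(row) != column_names:
            raise ValueError("all rows must have the same columns in the same order")
        for name in column_names:
            columns[name].append(row[name])
    return json.dumps(columns)


rows = [
    {"Channel": "A.I.Channel", "Subscribers": "3,020,000"},
    {"Channel": "Kaguya Luna", "Subscribers": "872,000"},
]

table_json = rows_to_table_json(rows)
print(table_json)
```

The resulting string is what goes into the table column of the input DataFrame.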

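Outputs¶

answer: A string that contains a possible answer.
coordinates: A list of integers that represent the coordinates of the cells where the answer was found.
cells: A list of strings that contain the contents of the cells where the answer was found.
aggregator: A string that contains the name of the aggregator used.

Example: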

----------------------------------------------------------------
|"answer"     |"coordinates"  |"cells"          |"aggregator"  |
----------------------------------------------------------------
|A.I.Channel  |[              |[                |NONE          |
|             |  [            |  "A.I.Channel"  |              |
|             |    0,         |]                |              |
|             |    0          |                 |              |
|             |  ]            |                 |              |
|             |]              |                 |              |
----------------------------------------------------------------

Code example¶

import transformers
import pandas as pd
import json

model = transformers.pipeline(
    task="table-question-answering",
    model="microsoft/tapex-base-finetuned-wikisql",
)

data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
}
query = "What is the city of the year 2004?"


mv = registry.log_model(
    model=model,
    model_name="TAPEX_BASE_FINETUNED_WIKISQL",
)

mv.run(
    pd.DataFrame.from_records([{"query": query, "table": json.dumps(data)}]),
    # function_name="__call__", # Optional
)

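Text classification (single output)¶

A pipeline whose task is "text-classification" or "sentiment-analysis", where top_k is unset or None, has the following inputs and outputs.

Inputs¶

text: A string to classify.
text_pair: A string to classify together with text, used with models that compute text similarity. Leave it empty if the model does not use it.

Example: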

----------------------------------
|"text"       |"text_pair"       |
----------------------------------
|I like you.  |I love you, too.  |
----------------------------------

Outputs¶

label: A string that represents the classification label of the text.
score: A floating-point confidence score from 0.0 to 1.0.

Example:

--------------------------------
|"label"  |"score"             |
--------------------------------
|LABEL_0  |0.9760091304779053  |
--------------------------------

Code example¶

import transformers
import pandas as pd

model = transformers.pipeline(
    task="text-classification",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

text = "I'm happy today!"


mv = registry.log_model(
    model=model,
    model_name="TWITTER_ROBERTA_BASE_SENTIMENT_LATEST",
)

mv.run(
    pd.DataFrame.from_records([{"text": text}]),
    # function_name="__call__", # Optional
)

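Text classification (multiple outputs)¶

A pipeline whose task is "text-classification" or "sentiment-analysis", where top_k is set to a number, has the following inputs and outputs.

Note

A text classification task is considered multiple-output if top_k is set to any number, even if that number is 1. To get a single output, set top_k to None.

Inputs¶

text: A string to classify.
text_pair: A string to classify together with text, used with models that compute text similarity. Leave it empty if the model does not use it.

Example: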

--------------------------------------------------------------------
|"text"                                              |"text_pair"  |
--------------------------------------------------------------------
|I am wondering if I should have udon or rice fo...  |             |
--------------------------------------------------------------------

Outputs¶

outputs: A string that contains a JSON representation of a list of results, each of which contains label and score fields.

Example:

--------------------------------------------------------
|"outputs"                                             |
--------------------------------------------------------
|[{"label": "NEGATIVE", "score": 0.9987024068832397}]  |
--------------------------------------------------------

Code example¶

import transformers
import pandas as pd

model = transformers.pipeline(
    task="text-classification",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    top_k=3,
)

text = "I'm happy today!"


mv = registry.log_model(
    model=model,
    model_name="TWITTER_ROBERTA_BASE_SENTIMENT_LATEST",
)

mv.run(
    pd.DataFrame.from_records([{"text": text}]),
    # function_name="__call__", # Optional
)

Text-to-text generation¶

A pipeline whose task is "text2text-generation", where return_tensors is False or unset, has the following inputs and outputs.

Inputs¶

inputs: A string that contains a prompt.

Example:

--------------------------------------------------------------------------------
|"inputs"                                                                      |
--------------------------------------------------------------------------------
|A descendant of the Lost City of Atlantis, who swam to Earth while saying, "  |
--------------------------------------------------------------------------------

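Outputs¶

generated_text: A string that contains the generated text if num_return_sequences is 1 or, if num_return_sequences is greater than 1, a string representation of a JSON list of result dictionaries that contain fields such as generated_text.

Example: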

----------------------------------------------------------------
|"generated_text"                                              |
----------------------------------------------------------------
|, said that he was a descendant of the Lost City of Atlantis  |
----------------------------------------------------------------

Code example¶

import transformers
import pandas as pd

model = transformers.pipeline(
    task="text2text-generation",
    model="google-t5/t5-small",
)

text = "Tell me a joke."


mv = registry.log_model(
    model=model,
    model_name="T5_SMALL",
)

mv.run(
    pd.DataFrame.from_records([{"inputs": text}]),
    # function_name="__call__", # Optional
)

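Note

Text-to-text generation pipelines where return_tensors is True are not supported.

Translation generation¶

A pipeline whose task is "translation", where return_tensors is False or unset, has the following inputs and outputs.

Note

Translation pipelines where return_tensors is True are not supported.

Inputs¶

inputs: A string that contains the text to translate.

Example: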

------------------------------------------------------------------------------------------------------
|"inputs"                                                                                            |
------------------------------------------------------------------------------------------------------
|Snowflake's Data Cloud is powered by an advanced data platform provided as a self-managed service.  |
------------------------------------------------------------------------------------------------------

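Outputs¶

translation_text: A string that represents the generated translation if num_return_sequences is 1, or a string representation of a JSON list of result dictionaries, each of which contains fields such as translation_text.

Example: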

---------------------------------------------------------------------------------------------------------------------------------
|"translation_text"                                                                                                             |
---------------------------------------------------------------------------------------------------------------------------------
|Le Cloud de données de Snowflake est alimenté par une plate-forme de données avancée fournie sous forme de service autogérés.  |
---------------------------------------------------------------------------------------------------------------------------------

Code example¶

import transformers
import pandas as pd

model = transformers.pipeline(
    task="translation",
    model="deepvk/kazRush-kk-ru",
)

text = "Иттерді кім шығарды?"


mv = registry.log_model(
    model=model,
    model_name="KAZRUSH_KK_RU",
)

mv.run(
    pd.DataFrame.from_records([{"inputs": text}]),
    # function_name="__call__", # Optional
)

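Zero-shot classification¶

A pipeline whose task is "zero-shot-classification" has the following inputs and outputs.

Inputs¶

sequences: A string that contains the text to classify.
candidate_labels: A list of strings that contain the labels to apply to the text.

Example: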

-----------------------------------------------------------------------------------------
|"sequences"                                                       |"candidate_labels"  |
-----------------------------------------------------------------------------------------
|I have a problem with Snowflake that needs to be resolved asap!!  |[                   |
|                                                                  |  "urgent",         |
|                                                                  |  "not urgent"      |
|                                                                  |]                   |
|I have a problem with Snowflake that needs to be resolved asap!!  |[                   |
|                                                                  |  "English",        |
|                                                                  |  "Japanese"        |
|                                                                  |]                   |
-----------------------------------------------------------------------------------------

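Outputs¶

sequence: The input string.
labels: A list of strings that represent the labels that were applied.
scores: A list of floating-point confidence scores, one for each label.

Example: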

--------------------------------------------------------------------------------------------------------------
|"sequence"                                                        |"labels"        |"scores"                |
--------------------------------------------------------------------------------------------------------------
|I have a problem with Snowflake that needs to be resolved asap!!  |[               |[                       |
|                                                                  |  "urgent",     |  0.9952737092971802,   |
|                                                                  |  "not urgent"  |  0.004726255778223276  |
|                                                                  |]               |]                       |
|I have a problem with Snowflake that needs to be resolved asap!!  |[               |[                       |
|                                                                  |  "Japanese",   |  0.5790848135948181,   |
|                                                                  |  "English"     |  0.42091524600982666   |
|                                                                  |]               |]                       |
--------------------------------------------------------------------------------------------------------------
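Because labels and scores are parallel lists, a result row is usually read by pairing them up. The following is a minimal sketch; top_label is a hypothetical helper, and the values are copied from the first row of the example output above.

```python
def top_label(labels, scores):
    """Return the (label, score) pair with the highest score (hypothetical helper)."""
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty lists of equal length")
    # zip pairs each label with the score at the same position.
    return max(zip(labels, scores), key=lambda pair: pair[1])


# First row of the example output above.
labels = ["urgent", "not urgent"]
scores = [0.9952737092971802, 0.004726255778223276]

label, score = top_label(labels, scores)
print(label, score)
```

The helper does not rely on the lists being sorted, although in the example output the labels appear in descending order of score.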

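Text generation¶

A pipeline whose task is "text-generation", where return_tensors is False or unset, has the following inputs and outputs.

Note

Text generation pipelines where return_tensors is True are not supported.

Inputs¶

inputs: A string that contains a prompt.

Example: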

--------------------------------------------------------------------------------
|"inputs"                                                                      |
--------------------------------------------------------------------------------
|A descendant of the Lost City of Atlantis, who swam to Earth while saying, "  |
--------------------------------------------------------------------------------

Outputs¶

outputs: A string that contains a JSON representation of a list of result objects, each of which contains a generated_text field.

Example:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"generated_text": "A descendant of the Lost City of Atlantis, who swam to Earth while saying, \"For my life, I don't know if I'm gonna land upon Earth.\"\n\nIn \"The Misfits\", in a flashback, wh...  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
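
Because outputs arrives as a JSON string rather than a parsed object, downstream code usually has to decode it before using the generated text. The following is a minimal sketch of that step using only the standard json module. The helper name extract_generated_texts and the sample string are hypothetical; the sample is modeled on the example row above and shortened.

```python
import json


def extract_generated_texts(outputs: str) -> list[str]:
    """Decode one "outputs" cell and return its generated_text values."""
    results = json.loads(outputs)  # a list of result objects
    return [result["generated_text"] for result in results]


# Hypothetical sample modeled on the example above (shortened).
sample_outputs = json.dumps(
    [
        {
            "generated_text": (
                "A descendant of the Lost City of Atlantis, "
                'who swam to Earth while saying, "For my life'
            )
        }
    ]
)

texts = extract_generated_texts(sample_outputs)
print(texts[0])
```

If the pipeline returns several sequences for one prompt, the list simply holds one entry per sequence.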

Code example¶

import transformers
import pandas as pd

# registry: snowflake.ml.registry.Registry

model = transformers.pipeline(
    task="text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)

mv = registry.log_model(
    model=model,
    model_name="TINYLLAMA",
)

text = "A descendant of the Lost City of Atlantis, who swam to Earth while saying,"
mv.run(
    pd.DataFrame.from_records([{"inputs": text}]),
    # function_name="__call__", # Optional
)

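Text generation (OpenAI-compatible)¶

Pipelines whose task is "text-generation" and whose return_tensors is False or unset have the following inputs and outputs.

Providing the snowflake.ml.model.openai_signatures.OPENAI_CHAT_SIGNATURE signature when logging the model makes the model compatible with the OpenAI API. Users can then pass openai.client.ChatCompletion-style requests to the model.

Note

Text generation pipelines with return_tensors set to True are not supported.

Inputs¶

messages: A list of dictionaries containing the messages to send to the model.
max_completion_tokens: The maximum number of tokens to generate.
temperature: The temperature to use for generation.
stop: The stop sequence to use for generation.
n: The number of generations to produce.
stream: Whether to stream the generation.
top_p: The top-p value to use for generation.
frequency_penalty: The frequency penalty to use for generation.
presence_penalty: The presence penalty to use for generation.

Example: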

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| messages                                                                                                                                                                                          |   max_completion_tokens |   temperature | stop   |   n | stream   |   top_p |   frequency_penalty |  presence_penalty |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| [{'role': 'system', 'content': 'Complete the sentence.'}, {'role': 'user', 'content': [{'type': 'text', 'text': 'A descendant of the Lost City of Atlantis, who swam to Earth while saying, '}]}] |                     250 |           0.9 |        |   3 | False    |       1 |                 0.1 |               0.2 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
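
Assembling the nested messages structure by hand is error-prone, so a small builder can help. The sketch below is illustrative only: the function name build_chat_row and its default values are assumptions, not part of the Snowflake or OpenAI APIs. It produces one input row with the fields listed above.

```python
from typing import Any


def build_chat_row(
    system_prompt: str,
    user_prompt: str,
    max_completion_tokens: int = 250,  # hypothetical default
    temperature: float = 0.9,  # hypothetical default
    stop: Any = None,
    n: int = 1,
    stream: bool = False,
    top_p: float = 1.0,
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> dict[str, Any]:
    """Return one input row containing the fields listed above."""
    return {
        "messages": [
            {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
            {"role": "user", "content": [{"type": "text", "text": user_prompt}]},
        ],
        "max_completion_tokens": max_completion_tokens,
        "temperature": temperature,
        "stop": stop,
        "n": n,
        "stream": stream,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }


row = build_chat_row(
    "Complete the sentence.",
    "A descendant of the Lost City of Atlantis, who swam to Earth while saying, ",
    n=3,
    frequency_penalty=0.1,
    presence_penalty=0.2,
)
print(sorted(row))
```

A list of such rows can be wrapped with pd.DataFrame.from_records before being passed to the model version.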

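Outputs¶

The output follows the OpenAI chat completion format, with the columns id, object, created, model, choices, and usage. The generated text is in the message content of each entry in choices.

Example: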

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id           | object          |     created | model                                      | choices                                                                                                                                      |  usage                                                               |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| chatcmpl-... | chat.completion | 1.76912e+09 | /shared/model/model/models/TINYLLAMA/model | [{'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': 'The descendant is not actually ...', 'role': 'assistant'}}] | {'completion_tokens': 399, 'prompt_tokens': 52, 'total_tokens': 451} |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
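
The generated text sits a few levels deep inside the choices column. The sketch below shows one way to pull it out, assuming the cell is already a Python list of dictionaries as the example displays; if it arrives as a JSON string instead, decode it with json.loads first. The helper name extract_message_contents and the sample data are hypothetical.

```python
def extract_message_contents(choices: list[dict]) -> list[str]:
    """Return the assistant message text of each choice, ordered by index."""
    ordered = sorted(choices, key=lambda choice: choice["index"])
    return [choice["message"]["content"] for choice in ordered]


# Hypothetical "choices" cell modeled on the example row above.
sample_choices = [
    {
        "finish_reason": "stop",
        "index": 0,
        "logprobs": None,
        "message": {"content": "The descendant is not actually ...", "role": "assistant"},
    }
]

print(extract_message_contents(sample_choices))
```

When n is greater than 1, choices holds one entry per generation, and the helper returns them in index order.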

Code example¶

import transformers
import pandas as pd
from snowflake.ml.model import openai_signatures

# registry: snowflake.ml.registry.Registry

model = transformers.pipeline(
    task="text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)


mv = registry.log_model(
    model=model,
    model_name="TINYLLAMA",
    signatures=openai_signatures.OPENAI_CHAT_SIGNATURE,
)

# create a pd.DataFrame with openai.client.chat.completion arguments
x_df = pd.DataFrame.from_records(
    [
        {
            "messages": [
                {
                    "role": "system",
                    "content": [
                        {
                            "type": "text",
                            "text": "Complete the sentence.",
                        }
                    ],
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "A descendant of the Lost City of Atlantis, who swam to Earth while saying, ",
                        }
                    ],
                },
            ],
            "max_completion_tokens": 250,
            "temperature": 0.9,
            "stop": None,
            "n": 3,
            "stream": False,
            "top_p": 1.0,
            "frequency_penalty": 0.1,
            "presence_penalty": 0.2,
        }
    ],
)

# OpenAI Chat Completion compatible output
output_df = mv.run(X=x_df)