Inferred signatures for Hugging Face pipelines

The Snowflake Model Registry automatically infers the signatures of Hugging Face pipelines that contain a single task from the following list:

  • conversational

  • fill-mask

  • question-answering

  • summarization

  • table-question-answering

  • text2text-generation

  • text-classification (alias sentiment-analysis)

  • text-generation

  • token-classification (alias ner)

  • translation

  • translation_xx_to_yy

  • zero-shot-classification

This topic describes the signatures of these types of Hugging Face pipelines, including a description and example of the required inputs and expected outputs. All inputs and outputs are Snowpark DataFrames.

For general guidance about logging Hugging Face pipelines in the registry, see Hugging Face pipeline.

Conversational pipeline

A pipeline whose task is “conversational” has the following inputs and outputs.

Inputs

  • user_inputs: A list of strings that represent the user’s previous and current inputs. The last one in the list is the current input.

  • generated_responses: A list of strings that represent the model’s previous responses.

Example:

---------------------------------------------------------------------------
|"user_inputs"                                    |"generated_responses"  |
---------------------------------------------------------------------------
|[                                                |[                      |
|  "Do you speak French?",                        |  "Yes I do."          |
|  "Do you know how to say Snowflake in French?"  |]                      |
|]                                                |                       |
---------------------------------------------------------------------------

Outputs

  • generated_responses: A list of strings that represent the model’s previous and current responses. The last one in the list is the current response.

Example:

-------------------------
|"generated_responses"  |
-------------------------
|[                      |
|  "Yes I do.",         |
|  "I speak French."    |
|]                      |
-------------------------
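The output's generated_responses column holds every model response so far. One way to continue a conversation is to append the next user message to user_inputs and pass the returned generated_responses back in. The following minimal sketch uses plain Python dictionaries in place of a single DataFrame row. The helper name next_turn_row and the third user message are hypothetical. They are not part of the Snowflake or Hugging Face APIs.

```python
from typing import Dict, List


def next_turn_row(
    previous_row: Dict[str, List[str]],
    output_row: Dict[str, List[str]],
    new_user_input: str,
) -> Dict[str, List[str]]:
    """Build the input row for the next conversational turn (hypothetical helper).

    previous_row: the row sent to the model last time (user_inputs, generated_responses).
    output_row: the row the model returned (generated_responses, including the newest reply).
    """
    return {
        # Earlier user messages followed by the new one, which becomes the current input.
        "user_inputs": previous_row["user_inputs"] + [new_user_input],
        # The output already contains the model's previous and current responses.
        "generated_responses": list(output_row["generated_responses"]),
    }


# Input and output rows from the examples above.
previous = {
    "user_inputs": ["Do you speak French?", "Do you know how to say Snowflake in French?"],
    "generated_responses": ["Yes I do."],
}
output = {"generated_responses": ["Yes I do.", "I speak French."]}

# Hypothetical follow-up message.
row = next_turn_row(previous, output, "Can you teach me a few words?")
print(row)
```

The new row has one more user input than model responses, matching the shape of the input example.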

Fill-mask pipeline

A pipeline whose task is “fill-mask” has the following inputs and outputs.

Inputs

  • inputs: A string that contains the mask token to fill, such as [MASK]. The exact mask token depends on the model's tokenizer.

Example:

--------------------------------------------------
|"inputs"                                        |
--------------------------------------------------
|LynYuu is the [MASK] of the Grand Duchy of Yu.  |
--------------------------------------------------

Outputs

  • outputs: A string that contains a JSON representation of a list of objects, each of which may contain keys such as score, token, token_str, or sequence. For details, see FillMaskPipeline.

Example:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"score": 0.9066258072853088, "token": 3007, "token_str": "capital", "sequence": "lynyuu is the capital of the grand duchy of yu."}, {"score": 0.08162177354097366, "token": 2835, "token_str": "seat", "sequence": "lynyuu is the seat of the grand duchy of yu."}, {"score": 0.0012052370002493262, "token": 4075, "token_str": "headquarters", "sequence": "lynyuu is the headquarters of the grand duchy of yu."}, {"score": 0.0006560495239682496, "token": 2171, "token_str": "name", "sequence": "lynyuu is the name of the grand duchy of yu."}, {"score": 0.0005427763098850846, "token": 3200, "token_str"...  |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
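The outputs column is a JSON string, not a structured column, so client code usually decodes it before use. A minimal sketch, assuming the value has already been fetched into a Python string (for example, from a collected Snowpark row). The sample reuses the first two results shown above.

```python
import json

# Stand-in for one "outputs" value: the first two results from the example above.
outputs_value = (
    '[{"score": 0.9066258072853088, "token": 3007, "token_str": "capital", '
    '"sequence": "lynyuu is the capital of the grand duchy of yu."}, '
    '{"score": 0.08162177354097366, "token": 2835, "token_str": "seat", '
    '"sequence": "lynyuu is the seat of the grand duchy of yu."}]'
)

candidates = json.loads(outputs_value)

# Select the highest-scoring candidate explicitly instead of relying on list order.
best = max(candidates, key=lambda c: c["score"])
print(best["token_str"], round(best["score"], 3))
print(best["sequence"])
```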

Token classification

A pipeline whose task is “ner” or “token-classification” has the following inputs and outputs.

Inputs

  • inputs: A string that contains the tokens to be classified.

Example:

------------------------------------------------
|"inputs"                                      |
------------------------------------------------
|My name is Izumi and I live in Tokyo, Japan.  |
------------------------------------------------

Outputs

  • outputs: A string that contains a JSON representation of a list of result objects, each of which may contain keys such as entity, score, index, word, start, or end. For details, see TokenClassificationPipeline.

Example:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"entity": "PRON", "score": 0.9994392991065979, "index": 1, "word": "my", "start": 0, "end": 2}, {"entity": "NOUN", "score": 0.9968984127044678, "index": 2, "word": "name", "start": 3, "end": 7}, {"entity": "AUX", "score": 0.9937735199928284, "index": 3, "word": "is", "start": 8, "end": 10}, {"entity": "PROPN", "score": 0.9928083419799805, "index": 4, "word": "i", "start": 11, "end": 12}, {"entity": "PROPN", "score": 0.997334361076355, "index": 5, "word": "##zumi", "start": 12, "end": 16}, {"entity": "CCONJ", "score": 0.999173104763031, "index": 6, "word": "and", "start": 17, "end": 20}, {...  |
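In the example above, word is the tokenizer's form of each token: lowercased, and possibly a subword such as ##zumi. The start and end values, however, are character offsets into the original input, with end exclusive. The sketch below uses those offsets to recover the original text of each token. It assumes the outputs value has been fetched into a Python string and reuses the first six results from the example.

```python
import json

text = "My name is Izumi and I live in Tokyo, Japan."

# Stand-in for the "outputs" value: the first six results from the example above.
outputs_value = (
    '[{"entity": "PRON", "score": 0.9994392991065979, "index": 1, "word": "my", "start": 0, "end": 2}, '
    '{"entity": "NOUN", "score": 0.9968984127044678, "index": 2, "word": "name", "start": 3, "end": 7}, '
    '{"entity": "AUX", "score": 0.9937735199928284, "index": 3, "word": "is", "start": 8, "end": 10}, '
    '{"entity": "PROPN", "score": 0.9928083419799805, "index": 4, "word": "i", "start": 11, "end": 12}, '
    '{"entity": "PROPN", "score": 0.997334361076355, "index": 5, "word": "##zumi", "start": 12, "end": 16}, '
    '{"entity": "CCONJ", "score": 0.999173104763031, "index": 6, "word": "and", "start": 17, "end": 20}]'
)

spans = []
for token in json.loads(outputs_value):
    # start/end index characters of the original input string; end is exclusive.
    span = text[token["start"]:token["end"]]
    spans.append((span, token["entity"]))
    print(f'{span!r:8} word={token["word"]!r:10} entity={token["entity"]}')
```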

Question answering (single output)

A pipeline whose task is “question-answering”, where top_k is either unset or set to 1, has the following inputs and outputs.

Inputs

  • question: A string that contains the question to answer.

  • context: A string that may contain the answer.

Example:

-----------------------------------------------------------------------------------
|"question"                  |"context"                                           |
-----------------------------------------------------------------------------------
|What did Doris want to do?  |Doris is a cheerful mermaid from the ocean dept...  |
-----------------------------------------------------------------------------------

Outputs

  • score: Floating-point confidence score from 0.0 to 1.0.

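  • start: Integer character offset in the context at which the answer begins.

  • end: Integer character offset in the context at which the answer ends (exclusive), so the answer is the substring of the context from start up to, but not including, end.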

  • answer: A string that contains the found answer.

Example:

--------------------------------------------------------------------------------
|"score"           |"start"  |"end"  |"answer"                                 |
--------------------------------------------------------------------------------
|0.61094731092453  |139      |178    |learn more about the world of athletics  |
--------------------------------------------------------------------------------
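In the example output, end minus start equals the length of answer (178 - 139 = 39 characters). This is consistent with start and end being character offsets into context, with end exclusive. The minimal sketch below uses the offsets to mark the answer inside its context. Because the example context above is truncated, the short context string is hypothetical, and its offsets are located with str.find purely for illustration, not produced by a model.

```python
def highlight_answer(context: str, start: int, end: int) -> str:
    """Wrap the answer span context[start:end] in square brackets."""
    return f"{context[:start]}[{context[start:end]}]{context[end:]}"


# Offsets from the example output: the span length equals the answer length.
example = {"start": 139, "end": 178, "answer": "learn more about the world of athletics"}
assert example["end"] - example["start"] == len(example["answer"])

# Hypothetical short context; offsets found with str.find for illustration only.
context = "Doris wanted to learn more about the world of athletics."
answer = "learn more about the world of athletics"
start = context.find(answer)
end = start + len(answer)
print(highlight_answer(context, start, end))
```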

Question answering (multiple outputs)

A pipeline whose task is “question-answering”, where top_k is set to a value greater than 1, has the following inputs and outputs.

Inputs

  • question: A string that contains the question to answer.

  • context: A string that may contain the answer.

Example:

-----------------------------------------------------------------------------------
|"question"                  |"context"                                           |
-----------------------------------------------------------------------------------
|What did Doris want to do?  |Doris is a cheerful mermaid from the ocean dept...  |
-----------------------------------------------------------------------------------

Outputs

  • outputs: A string that contains a JSON representation of a list of result objects, each of which may contain keys such as score, start, end, or answer.

Example:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                                                                                                                                                        |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"score": 0.61094731092453, "start": 139, "end": 178, "answer": "learn more about the world of athletics"}, {"score": 0.17750297486782074, "start": 139, "end": 180, "answer": "learn more about the world of athletics.\""}, {"score": 0.06438097357749939, "start": 138, "end": 178, "answer": "\"learn more about the world of athletics"}]  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
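With multiple outputs, each row holds a JSON list of candidate answers. The sketch below decodes that list, ranks the candidates by score, and keeps only those above a confidence threshold. The sample is the example output above, and the 0.1 threshold is an arbitrary illustrative choice.

```python
import json

# Stand-in for one "outputs" value (the example above); raw strings keep the \" escapes intact.
outputs_value = (
    r'[{"score": 0.61094731092453, "start": 139, "end": 178, '
    r'"answer": "learn more about the world of athletics"}, '
    r'{"score": 0.17750297486782074, "start": 139, "end": 180, '
    r'"answer": "learn more about the world of athletics.\""}, '
    r'{"score": 0.06438097357749939, "start": 138, "end": 178, '
    r'"answer": "\"learn more about the world of athletics"}]'
)

results = json.loads(outputs_value)
ranked = sorted(results, key=lambda r: r["score"], reverse=True)
for r in ranked:
    print(f'{r["score"]:.3f}  {r["answer"]}')

# Keep only answers at or above an arbitrary confidence threshold.
confident = [r["answer"] for r in ranked if r["score"] >= 0.1]
```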

Summarization

A pipeline whose task is “summarization”, where return_tensors is False or unset, has the following inputs and outputs.

Inputs

  • documents: A string that contains text to summarize.

Example:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"documents"                                                                                                                                                                                               |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Neuro-sama is a chatbot styled after a female VTuber that hosts live streams on the Twitch channel "vedal987". Her speech and personality are generated by an artificial intelligence (AI) system  wh...  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Outputs

  • summary_text: If num_return_sequences is 1, a string that contains the generated summary. If num_return_sequences is greater than 1, a string that contains a JSON representation of a list of result dictionaries, each of which contains fields that include summary_text.

Example:

---------------------------------------------------------------------------------
|"summary_text"                                                                 |
---------------------------------------------------------------------------------
| Neuro-sama is a chatbot styled after a female VTuber that hosts live streams  |
---------------------------------------------------------------------------------
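Because summary_text is plain text for a single sequence and a JSON list of dictionaries for several, client code may want one function that always returns a list. The following sketch infers the shape from the value itself. The helper name parse_generated and the two-summary sample are hypothetical. If you know num_return_sequences, branching on it directly is more robust than this heuristic.

```python
import json
from typing import List


def parse_generated(value: str, field: str = "summary_text") -> List[str]:
    """Return the generated sequence(s) from an output value (hypothetical helper).

    A single sequence arrives as plain text; several arrive as a JSON list of
    dictionaries that each contain `field`.
    """
    try:
        decoded = json.loads(value)
    except json.JSONDecodeError:
        # Not valid JSON, so treat it as one plain-text sequence.
        return [value]
    if (
        isinstance(decoded, list)
        and decoded
        and all(isinstance(item, dict) and field in item for item in decoded)
    ):
        return [item[field] for item in decoded]
    # Valid JSON but not the list-of-results shape: still a single sequence.
    return [value]


single = " Neuro-sama is a chatbot styled after a female VTuber that hosts live streams"
multiple = '[{"summary_text": "First summary."}, {"summary_text": "Second summary."}]'  # hypothetical
print(parse_generated(single))
print(parse_generated(multiple))
```

The text-to-text generation and translation outputs described on this page follow the same pattern with generated_text and translation_text, so the field argument can be set to match.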

Table question answering

A pipeline whose task is “table-question-answering” has the following inputs and outputs.

Inputs

  • query: A string that contains the question to be answered.

  • table: A string that contains a JSON-serialized dictionary in the form {column -> [values]} representing the table that may contain an answer.

Example:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"query"                                  |"table"                                                                                                                                                                                                                                                   |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Which channel has the most subscribers?  |{"Channel": ["A.I.Channel", "Kaguya Luna", "Mirai Akari", "Siro"], "Subscribers": ["3,020,000", "872,000", "694,000", "660,000"], "Videos": ["1,200", "113", "639", "1,300"], "Created At": ["Jun 30 2016", "Dec 4 2017", "Feb 28 2014", "Jun 23 2017"]}  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
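Tabular data is often held as a list of row records, whereas the table input expects a JSON-serialized {column -> [values]} dictionary. The minimal sketch below converts between the two using the values from the example above. It assumes every record has the same columns, and it keeps the values as strings, as the example does. The helper name to_table_json is hypothetical.

```python
import json
from typing import Dict, List

# Table rows as records (values copied from the example above).
rows = [
    {"Channel": "A.I.Channel", "Subscribers": "3,020,000", "Videos": "1,200", "Created At": "Jun 30 2016"},
    {"Channel": "Kaguya Luna", "Subscribers": "872,000", "Videos": "113", "Created At": "Dec 4 2017"},
    {"Channel": "Mirai Akari", "Subscribers": "694,000", "Videos": "639", "Created At": "Feb 28 2014"},
    {"Channel": "Siro", "Subscribers": "660,000", "Videos": "1,300", "Created At": "Jun 23 2017"},
]


def to_table_json(records: List[Dict[str, str]]) -> str:
    """Serialize row records into the {column -> [values]} JSON form (hypothetical helper)."""
    columns: Dict[str, List[str]] = {}
    for record in records:
        for column, value in record.items():
            # Append each value to its column, preserving row order.
            columns.setdefault(column, []).append(value)
    return json.dumps(columns)


table = to_table_json(rows)
print(table)
```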

Outputs

  • answer: A string that contains a possible answer.

  • coordinates: A list of [row, column] integer pairs that identify the cells where the answer was located.

  • cells: A list of strings that contain the content of the cells where the answer was located.

  • aggregator: A string that contains the name of the aggregator used.

Example:

----------------------------------------------------------------
|"answer"     |"coordinates"  |"cells"          |"aggregator"  |
----------------------------------------------------------------
|A.I.Channel  |[              |[                |NONE          |
|             |  [            |  "A.I.Channel"  |              |
|             |    0,         |]                |              |
|             |    0          |                 |              |
|             |  ]            |                 |              |
|             |]              |                 |              |
----------------------------------------------------------------

Text classification (single output)

A pipeline whose task is “text-classification” or “sentiment-analysis”, where top_k is not set or is None, has the following inputs and outputs.

Inputs

  • text: A string to classify.

  • text_pair: A second string that is classified together with text; used with models that compute text similarity. Leave empty if the model does not use it.

Example:

----------------------------------
|"text"       |"text_pair"       |
----------------------------------
|I like you.  |I love you, too.  |
----------------------------------

Outputs

  • label: A string that represents the classification label of the text.

  • score: A floating-point confidence score from 0.0 to 1.0.

Example:

--------------------------------
|"label"  |"score"             |
--------------------------------
|LABEL_0  |0.9760091304779053  |
--------------------------------

Text classification (multiple outputs)

A pipeline whose task is “text-classification” or “sentiment-analysis”, where top_k is set to a number, has the following inputs and outputs.

Note

A text classification task is considered multiple-output if top_k is set to any number, even if that number is 1. To get a single output, use a top_k value of None.

Inputs

  • text: A string to classify.

  • text_pair: A second string that is classified together with text; used with models that compute text similarity. Leave empty if the model does not use it.

Example:

--------------------------------------------------------------------
|"text"                                              |"text_pair"  |
--------------------------------------------------------------------
|I am wondering if I should have udon or rice fo...  |             |
--------------------------------------------------------------------

Outputs

  • outputs: A string that contains a JSON representation of a list of results, each of which contains fields that include label and score.

Example:

--------------------------------------------------------
|"outputs"                                             |
--------------------------------------------------------
|[{"label": "NEGATIVE", "score": 0.9987024068832397}]  |
--------------------------------------------------------
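As the note above says, any top_k value produces the multiple-output form, so even a single result arrives as a JSON list. The minimal sketch below decodes such a value and returns the highest-scoring label. The helper name top_label is hypothetical, and the sample is the example output above.

```python
import json
from typing import Tuple


def top_label(outputs_value: str) -> Tuple[str, float]:
    """Return (label, score) for the highest-scoring result (hypothetical helper)."""
    results = json.loads(outputs_value)
    best = max(results, key=lambda r: r["score"])
    return best["label"], best["score"]


# Even with a single result, the value is a JSON list of label/score objects.
outputs_value = '[{"label": "NEGATIVE", "score": 0.9987024068832397}]'
label, score = top_label(outputs_value)
print(label, score)
```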

Text generation

A pipeline whose task is “text-generation”, where return_tensors is False or unset, has the following inputs and outputs.

Note

Text generation pipelines where return_tensors is True are not supported.

Inputs

  • inputs: A string that contains a prompt.

Example:

--------------------------------------------------------------------------------
|"inputs"                                                                      |
--------------------------------------------------------------------------------
|A descendant of the Lost City of Atlantis, who swam to Earth while saying, "  |
--------------------------------------------------------------------------------

Outputs

  • outputs: A string that contains a JSON representation of a list of result objects, each of which contains fields that include generated_text.

Example:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs"                                                                                                                                                                                                 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"generated_text": "A descendant of the Lost City of Atlantis, who swam to Earth while saying, \"For my life, I don't know if I'm gonna land upon Earth.\"\n\nIn \"The Misfits\", in a flashback, wh...  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
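In the example above, each generated_text begins with the prompt itself. The sketch below decodes the outputs value and strips that prefix to keep only the newly generated continuation. The sample value is shortened from the example, and the helper name completions is hypothetical.

```python
import json
from typing import List

prompt = 'A descendant of the Lost City of Atlantis, who swam to Earth while saying, "'

# Stand-in for one "outputs" value, shortened from the example above.
outputs_value = r'''[{"generated_text": "A descendant of the Lost City of Atlantis, who swam to Earth while saying, \"For my life, I don't know if I'm gonna land upon Earth.\""}]'''


def completions(outputs_value: str, prompt: str) -> List[str]:
    """Return each generated_text with the leading prompt removed when present."""
    texts = [result["generated_text"] for result in json.loads(outputs_value)]
    return [t[len(prompt):] if t.startswith(prompt) else t for t in texts]


print(completions(outputs_value, prompt))
```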

Text-to-text generation

A pipeline whose task is “text2text-generation”, where return_tensors is False or unset, has the following inputs and outputs.

Note

Text-to-text generation pipelines where return_tensors is True are not supported.

Inputs

  • inputs: A string that contains a prompt.

Example:

--------------------------------------------------------------------------------
|"inputs"                                                                      |
--------------------------------------------------------------------------------
|A descendant of the Lost City of Atlantis, who swam to Earth while saying, "  |
--------------------------------------------------------------------------------

Outputs

  • generated_text: If num_return_sequences is 1, a string that contains the generated text. If num_return_sequences is greater than 1, a string that contains a JSON representation of a list of result dictionaries, each of which contains fields that include generated_text.

Example:

----------------------------------------------------------------
|"generated_text"                                              |
----------------------------------------------------------------
|, said that he was a descendant of the Lost City of Atlantis  |
----------------------------------------------------------------

Translation generation

A pipeline whose task is “translation”, where return_tensors is False or unset, has the following inputs and outputs.

Note

Translation generation pipelines where return_tensors is True are not supported.

Inputs

  • inputs: A string that contains text to translate.

Example:

------------------------------------------------------------------------------------------------------
|"inputs"                                                                                            |
------------------------------------------------------------------------------------------------------
|Snowflake's Data Cloud is powered by an advanced data platform provided as a self-managed service.  |
------------------------------------------------------------------------------------------------------

Outputs

  • translation_text: If num_return_sequences is 1, a string that contains the generated translation. If num_return_sequences is greater than 1, a string that contains a JSON representation of a list of result dictionaries, each of which contains fields that include translation_text.

Example:

---------------------------------------------------------------------------------------------------------------------------------
|"translation_text"                                                                                                             |
---------------------------------------------------------------------------------------------------------------------------------
|Le Cloud de données de Snowflake est alimenté par une plate-forme de données avancée fournie sous forme de service autogérés.  |
---------------------------------------------------------------------------------------------------------------------------------

Zero-shot classification

A pipeline whose task is “zero-shot-classification” has the following inputs and outputs.

Inputs

  • sequences: A string that contains the text to be classified.

  • candidate_labels: A list of strings that contain the labels to be applied to the text.

Example:

-----------------------------------------------------------------------------------------
|"sequences"                                                       |"candidate_labels"  |
-----------------------------------------------------------------------------------------
|I have a problem with Snowflake that needs to be resolved asap!!  |[                   |
|                                                                  |  "urgent",         |
|                                                                  |  "not urgent"      |
|                                                                  |]                   |
|I have a problem with Snowflake that needs to be resolved asap!!  |[                   |
|                                                                  |  "English",        |
|                                                                  |  "Japanese"        |
|                                                                  |]                   |
-----------------------------------------------------------------------------------------

Outputs

  • sequence: The input string.

  • labels: A list of strings that represent the candidate labels, sorted from highest to lowest score.

  • scores: A list of floating-point confidence scores, one for each label, in the same order as labels.

Example:

--------------------------------------------------------------------------------------------------------------
|"sequence"                                                        |"labels"        |"scores"                |
--------------------------------------------------------------------------------------------------------------
|I have a problem with Snowflake that needs to be resolved asap!!  |[               |[                       |
|                                                                  |  "urgent",     |  0.9952737092971802,   |
|                                                                  |  "not urgent"  |  0.004726255778223276  |
|                                                                  |]               |]                       |
|I have a problem with Snowflake that needs to be resolved asap!!  |[               |[                       |
|                                                                  |  "Japanese",   |  0.5790848135948181,   |
|                                                                  |  "English"     |  0.42091524600982666   |
|                                                                  |]               |]                       |
--------------------------------------------------------------------------------------------------------------
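Because labels and scores are parallel lists, pairing them with zip makes it straightforward to select the best label per row. Taking the maximum explicitly avoids depending on the ordering of the lists. The helper name best_label is hypothetical, and the rows are the example output above.

```python
from typing import List, Tuple


def best_label(labels: List[str], scores: List[float]) -> Tuple[str, float]:
    """Pair each label with its score and return the highest-scoring pair (hypothetical helper)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    return max(zip(labels, scores), key=lambda pair: pair[1])


# (labels, scores) from the two output rows in the example above.
rows = [
    (["urgent", "not urgent"], [0.9952737092971802, 0.004726255778223276]),
    (["Japanese", "English"], [0.5790848135948181, 0.42091524600982666]),
]
for labels, scores in rows:
    print(best_label(labels, scores))
```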