snowflake.ml.modeling.pipeline.Pipeline

class snowflake.ml.modeling.pipeline.Pipeline(steps: List[Tuple[str, Any]])

Bases: BaseTransformer

Pipeline of transforms.

Sequentially apply a list of transforms. Intermediate steps of the pipeline must be transforms, that is, they must implement fit and transform methods. The final step can be a transform or an estimator, that is, it must implement fit and either transform or predict.

TODO: The scikit-learn pipeline expects the last step (and only the last step) to be an estimator object or a dummy estimator (such as None or passthrough). Currently, this Pipeline class works with a list of all transforms or a list of transforms ending with an estimator. Whether to change this implementation to accept only lists of steps ending with an estimator or a dummy estimator, as scikit-learn does, is an open question.

Parameters:

steps – List of (name, transform) tuples (implementing fit/transform) that are chained in sequential order. The last transform can be an estimator.
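The chaining described above can be illustrated with a minimal pure-Python sketch. It is not the snowflake.ml implementation: MiniPipeline, AddOne, MeanCenter, and SignEstimator are hypothetical names, and plain lists stand in for DataFrames. It only shows the assumed contract that intermediate steps provide fit and transform while the final step may provide predict.

```python
from typing import Any, List, Tuple


class AddOne:
    """Hypothetical transform with nothing to learn: adds 1 to every value."""

    def fit(self, data: List[float]) -> "AddOne":
        return self

    def transform(self, data: List[float]) -> List[float]:
        return [x + 1 for x in data]


class MeanCenter:
    """Hypothetical transform: subtracts the mean learned during fit."""

    def fit(self, data: List[float]) -> "MeanCenter":
        self.mean_ = sum(data) / len(data)
        return self

    def transform(self, data: List[float]) -> List[float]:
        return [x - self.mean_ for x in data]


class SignEstimator:
    """Hypothetical final estimator: predicts 1 for positive inputs, else 0."""

    def fit(self, data: List[float]) -> "SignEstimator":
        return self

    def predict(self, data: List[float]) -> List[int]:
        return [1 if x > 0 else 0 for x in data]


class MiniPipeline:
    """Toy stand-in for a pipeline; not the snowflake.ml implementation."""

    def __init__(self, steps: List[Tuple[str, Any]]) -> None:
        self.steps = steps

    def _run_intermediate(self, data: List[float], fit: bool) -> List[float]:
        # Every step except the last must offer fit and transform.
        for _, step in self.steps[:-1]:
            if fit:
                step.fit(data)
            data = step.transform(data)
        return data

    def fit(self, data: List[float]) -> "MiniPipeline":
        # The final step is fitted on the output of the preceding transforms.
        self.steps[-1][1].fit(self._run_intermediate(data, fit=True))
        return self

    def predict(self, data: List[float]) -> List[int]:
        return self.steps[-1][1].predict(self._run_intermediate(data, fit=False))


pipe = MiniPipeline([("add", AddOne()), ("center", MeanCenter()), ("sign", SignEstimator())])
pipe.fit([1.0, 2.0, 3.0])  # MeanCenter learns mean 3.0 from [2.0, 3.0, 4.0]
predictions = pipe.predict([0.0, 5.0])
print(predictions)  # [0, 1]
```

Each (name, step) tuple keeps a label next to its object, which is what allows parameters to be addressed per step by name.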

Methods

fit(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame], squash: Optional[bool] = False) → Pipeline

Fit the entire pipeline using the dataset.

Parameters:
  • dataset – Input dataset.

  • squash – Whether to run the whole pipeline within a stored procedure.

Returns:

Fitted pipeline.

Raises:

ValueError – A pipeline incompatible with sklearn is used on MLRS

fit_predict(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Fits each transformer in order and transforms the data, then fits the estimator and predicts with it. This method is available only if the estimator (or final step) has a fit_predict or predict method.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.
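The availability rule for fit_predict can be pictured as a capability check on the final step. The sketch below is an assumption about the general pattern, not the library's code: fit_predict_final, Clusterer, and Classifier are hypothetical names, and lists stand in for datasets.

```python
from typing import Any, List


class Clusterer:
    """Hypothetical final step exposing fit_predict directly."""

    def fit_predict(self, data: List[float]) -> List[int]:
        midpoint = (min(data) + max(data)) / 2
        return [int(x > midpoint) for x in data]


class Classifier:
    """Hypothetical final step exposing only fit and predict."""

    def fit(self, data: List[float]) -> "Classifier":
        self.threshold_ = sum(data) / len(data)
        return self

    def predict(self, data: List[float]) -> List[int]:
        return [int(x > self.threshold_) for x in data]


def fit_predict_final(final_step: Any, data: List[float]) -> List[int]:
    """Fit the final step and predict on the same data, if it supports either route."""
    if hasattr(final_step, "fit_predict"):
        return final_step.fit_predict(data)
    if hasattr(final_step, "predict"):
        return final_step.fit(data).predict(data)
    raise AttributeError("final step has neither fit_predict nor predict")


data = [1.0, 2.0, 9.0]
via_fit_predict = fit_predict_final(Clusterer(), data)
via_predict = fit_predict_final(Classifier(), data)
```

A final step that offers only transform would fall through both checks, which mirrors why fit_predict is not available for pipelines made entirely of transforms.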

fit_transform(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Fits each transformer in order and transforms the data, then fits the estimator and transforms the data with it. This method is available only if the estimator (or final step) has a fit_transform or transform method.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

get_input_cols() → List[str]

Input columns getter.

Returns:

Input columns.

get_label_cols() → List[str]

Label column getter.

Returns:

Label column(s).

get_output_cols() → List[str]

Output columns getter.

Returns:

Output columns.

get_params(deep: bool = True) → Dict[str, Any]

Get parameters for this transformer.

Parameters:

deep – If True, will return the parameters for this transformer and contained subobjects that are transformers.

Returns:

Parameter names mapped to their values.
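The effect of deep can be sketched as follows. This is an illustration only: collect_params and Scaler are hypothetical, and the nested key format is assumed to follow the <component>__<parameter> convention described for set_params rather than taken from the library's source.

```python
from typing import Any, Dict, List, Tuple


class Scaler:
    """Hypothetical step with one constructor parameter."""

    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor


def collect_params(steps: List[Tuple[str, Any]], deep: bool = True) -> Dict[str, Any]:
    """Return top-level parameters, plus each step's parameters when deep is True."""
    params: Dict[str, Any] = {"steps": steps}
    if deep:
        for name, step in steps:
            params[name] = step
            # Assumed key format: "<component>__<parameter>".
            for attr, value in vars(step).items():
                params[f"{name}__{attr}"] = value
    return params


steps = [("scale", Scaler(factor=3.0))]
shallow = collect_params(steps, deep=False)
nested = collect_params(steps, deep=True)
print(sorted(shallow))  # ['steps']
print(sorted(nested))   # ['scale', 'scale__factor', 'steps']
```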

get_passthrough_cols() → List[str]

Passthrough columns getter.

Returns:

Passthrough column(s).

get_sample_weight_col() → Optional[str]

Sample weight column getter.

Returns:

Sample weight column.

get_sklearn_args(default_sklearn_obj: Optional[object] = None, sklearn_initial_keywords: Optional[Union[str, Iterable[str]]] = None, sklearn_unused_keywords: Optional[Union[str, Iterable[str]]] = None, snowml_only_keywords: Optional[Union[str, Iterable[str]]] = None, sklearn_added_keyword_to_version_dict: Optional[Dict[str, str]] = None, sklearn_added_kwarg_value_to_version_dict: Optional[Dict[str, Dict[str, str]]] = None, sklearn_deprecated_keyword_to_version_dict: Optional[Dict[str, str]] = None, sklearn_removed_keyword_to_version_dict: Optional[Dict[str, str]] = None) → Dict[str, Any]

Get sklearn keyword arguments.

This method enables modifying object parameters for special cases.

Parameters:
  • default_sklearn_obj – Sklearn object used to get default parameter values. Necessary when sklearn_added_keyword_to_version_dict is provided.

  • sklearn_initial_keywords – Initial keywords in sklearn.

  • sklearn_unused_keywords – Sklearn keywords that are unused in snowml.

  • snowml_only_keywords – Snowml-only keywords that are not present in sklearn.

  • sklearn_added_keyword_to_version_dict – Added keywords mapped to the sklearn versions in which they were added.

  • sklearn_added_kwarg_value_to_version_dict – Added keyword argument values mapped to the sklearn versions in which they were added.

  • sklearn_deprecated_keyword_to_version_dict – Deprecated keywords mapped to the sklearn versions in which they were deprecated.

  • sklearn_removed_keyword_to_version_dict – Removed keywords mapped to the sklearn versions in which they were removed.

Returns:

Sklearn parameter names mapped to their values.

predict(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and predict using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit and stored before calling this function.
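The fit-before-predict requirement can be sketched with a toy estimator. MeanPredictor is hypothetical and the error message is made up for illustration. Only the general pattern of raising ValueError when no fitted state exists is taken from the description above.

```python
from typing import List, Optional


class MeanPredictor:
    """Hypothetical estimator that predicts the training mean for every row."""

    def __init__(self) -> None:
        self._mean: Optional[float] = None  # set by fit

    def fit(self, data: List[float]) -> "MeanPredictor":
        self._mean = sum(data) / len(data)
        return self

    def predict(self, data: List[float]) -> List[float]:
        # Guard: refuse to predict when no fitted state is available.
        if self._mean is None:
            raise ValueError("Estimator has not been fit before calling predict.")
        return [self._mean for _ in data]


model = MeanPredictor()
try:
    model.predict([1.0])
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

fitted_output = model.fit([2.0, 4.0]).predict([10.0, 20.0])
```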

predict_log_proba(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply predict_log_proba using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.

predict_proba(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply predict_proba using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.

score(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply score using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.

score_samples(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply score_samples using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.

set_drop_input_cols(drop_input_cols: Optional[bool] = False) → None

set_input_cols(input_cols: Optional[Union[str, Iterable[str]]]) → Base

Input columns setter.

Parameters:

input_cols – A single input column or multiple input columns.

Returns:

self

set_label_cols(label_cols: Optional[Union[str, Iterable[str]]]) → Base

Label column setter.

Parameters:

label_cols – A single label column, or multiple label columns for multi-task learning.

Returns:

self

set_output_cols(output_cols: Optional[Union[str, Iterable[str]]]) → Base

Output columns setter.

Parameters:

output_cols – A single output column or multiple output columns.

Returns:

self

set_params(**params: Dict[str, Any]) → None

Set the parameters of this transformer.

The method works on simple transformers as well as on nested objects. The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params – Transformer parameter names mapped to their values.

Raises:

SnowflakeMLException – Invalid parameter keys.
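The <component>__<parameter> form can be illustrated with a small routing sketch. It is not the library's code: set_nested_params and Scaler are hypothetical, and ValueError stands in for SnowflakeMLException so that the example needs only the standard library.

```python
from typing import Any, List, Tuple


class Scaler:
    """Hypothetical step with one tunable parameter."""

    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor


def set_nested_params(steps: List[Tuple[str, Any]], **params: Any) -> None:
    """Route each "<component>__<parameter>" key to the named step."""
    by_name = dict(steps)
    for key, value in params.items():
        component, sep, parameter = key.partition("__")
        target = by_name.get(component)
        if not sep or target is None or not hasattr(target, parameter):
            # Stand-in for the SnowflakeMLException the real method documents.
            raise ValueError(f"Invalid parameter key: {key!r}")
        setattr(target, parameter, value)


steps = [("scale", Scaler()), ("rescale", Scaler(factor=2.0))]
set_nested_params(steps, scale__factor=10.0)  # only the "scale" step changes
```

Splitting on the first double underscore keeps the step name and the parameter name unambiguous, provided step names do not themselves contain a double underscore.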

set_passthrough_cols(passthrough_cols: Optional[Union[str, Iterable[str]]]) → Base

Passthrough columns setter.

Parameters:

passthrough_cols – Column(s) that should not be used or modified by the estimator/transformer. The estimator/transformer passes these columns through without modification.

Returns:

self

set_sample_weight_col(sample_weight_col: Optional[str]) → Base

Sample weight column setter.

Parameters:

sample_weight_col – A single column that represents sample weight.

Returns:

self

to_lightgbm() → Any

to_sklearn() → Pipeline

Returns an sklearn Pipeline representing the object, if possible.

Returns:

The previously fitted sklearn Pipeline if present; otherwise an unfitted pipeline.

Raises:

ValueError – The pipeline cannot be represented as an sklearn pipeline.

to_xgboost() → Any

transform(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Call transform of each transformer in the pipeline.

Parameters:

dataset – Input dataset.

Returns:

Transformed data. The output datatype is the same as the input datatype.

Attributes

model_signatures