You are viewing documentation about an older version (1.7.0).

snowflake.ml.modeling.pipeline.Pipeline

class snowflake.ml.modeling.pipeline.Pipeline(steps: List[Tuple[str, Any]])

Bases: BaseTransformer

Pipeline of transforms.

Sequentially applies a list of transforms. Every intermediate step of the pipeline must be a ‘transform’, that is, it must implement both fit and transform methods. The final step can be a transform or an estimator, that is, it must implement fit and either transform or predict.

Parameters:

steps – List of (name, transform) tuples (implementing fit/transform) that are chained in sequential order. The last transform can be an estimator.
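The chaining described above can be illustrated with a small, library-free sketch. The classes below (`ToyCenterer`, `ToyScaler`, `ToyPipeline`) are hypothetical stand-ins that operate on plain Python lists; they are not part of snowflake.ml and only mimic the fit/transform contract, so treat this as an illustration of the idea rather than the library's implementation.

```python
from typing import Any, List, Tuple


class ToyCenterer:
    """Hypothetical transform: subtracts the mean learned during fit."""

    def fit(self, data: List[float]) -> "ToyCenterer":
        self.mean_ = sum(data) / len(data)
        return self

    def transform(self, data: List[float]) -> List[float]:
        return [x - self.mean_ for x in data]


class ToyScaler:
    """Hypothetical transform: divides by the largest absolute value seen in fit."""

    def fit(self, data: List[float]) -> "ToyScaler":
        # Fall back to 1.0 so an all-zero input does not cause division by zero.
        self.scale_ = max(abs(x) for x in data) or 1.0
        return self

    def transform(self, data: List[float]) -> List[float]:
        return [x / self.scale_ for x in data]


class ToyPipeline:
    """Minimal stand-in for a pipeline built from (name, transform) tuples."""

    def __init__(self, steps: List[Tuple[str, Any]]) -> None:
        self.steps = steps

    def fit(self, data: List[float]) -> "ToyPipeline":
        current = data
        # Fit and apply every intermediate step, so that each later step
        # is fitted on the output of the steps before it.
        for _, step in self.steps[:-1]:
            current = step.fit(current).transform(current)
        # The final step is only fitted here.
        self.steps[-1][1].fit(current)
        return self

    def transform(self, data: List[float]) -> List[float]:
        current = data
        for _, step in self.steps:
            current = step.transform(current)
        return current


pipe = ToyPipeline([("center", ToyCenterer()), ("scale", ToyScaler())])
result = pipe.fit([1.0, 2.0, 3.0]).transform([1.0, 2.0, 3.0])
```

The step names ("center", "scale") play the same role as the name element of each (name, transform) tuple: they identify a step without affecting how the data flows.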

Methods

fit(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame], squash: Optional[bool] = False) → Pipeline

Fit the entire pipeline using the dataset.

Parameters:
  • dataset – Input dataset.

  • squash – If True, run the whole pipeline within a stored procedure.

Returns:

Fitted pipeline.

fit_predict(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Fits all the transformer objects one after another and transforms the data, then fits the estimator and predicts with it. This method is only available if the estimator (the final step) has a fit_predict or predict method.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.
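The availability rule for fit_predict can be sketched without any Snowflake dependency. The names below (`ToyShift`, `ToyThresholdModel`, `toy_fit_predict`) are hypothetical and work on plain lists; the sketch assumes that a final step lacking both fit_predict and predict should be rejected, which is one reading of the description above and not a statement about the library's internals.

```python
from typing import Any, List, Tuple


class ToyShift:
    """Hypothetical transform: adds a fixed offset to every value."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def fit(self, data: List[float]) -> "ToyShift":
        return self  # nothing to learn for a fixed offset

    def transform(self, data: List[float]) -> List[float]:
        return [x + self.offset for x in data]


class ToyThresholdModel:
    """Hypothetical estimator: predicts 1 for values above the training mean."""

    def fit(self, data: List[float]) -> "ToyThresholdModel":
        self.threshold_ = sum(data) / len(data)
        return self

    def predict(self, data: List[float]) -> List[int]:
        return [1 if x > self.threshold_ else 0 for x in data]


def toy_fit_predict(steps: List[Tuple[str, Any]], data: List[float]) -> List[int]:
    """Fit and apply every step but the last, then fit and predict with the last."""
    current = data
    for _, step in steps[:-1]:
        current = step.fit(current).transform(current)
    final = steps[-1][1]
    if hasattr(final, "fit_predict"):
        return final.fit_predict(current)
    if hasattr(final, "predict"):
        return final.fit(current).predict(current)
    raise AttributeError("final step has neither fit_predict nor predict")


labels = toy_fit_predict(
    [("shift", ToyShift(10.0)), ("model", ToyThresholdModel())],
    [1.0, 2.0, 6.0],
)
```

Passing a list of steps whose last element is a plain transform such as `ToyShift` makes `toy_fit_predict` raise, mirroring the "only available if" condition.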

fit_transform(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Fits all the transformer objects one after another and transforms the data, then fits the estimator and transforms the data with it. This method is only available if the estimator (the final step) has a fit_transform or transform method.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

get_input_cols() → List[str]

Input columns getter.

Returns:

Input columns.

get_label_cols() → List[str]

Label column getter.

Returns:

Label column(s).

get_output_cols() → List[str]

Output columns getter.

Returns:

Output columns.

get_params(deep: bool = True) → Dict[str, Any]

Get the snowflake-ml parameters for this transformer.

Parameters:

deep – If True, will return the parameters for this transformer and contained subobjects that are transformers.

Returns:

Parameter names mapped to their values.

get_passthrough_cols() → List[str]

Passthrough columns getter.

Returns:

Passthrough column(s).

get_sample_weight_col() → Optional[str]

Sample weight column getter.

Returns:

Sample weight column.

get_sklearn_args(default_sklearn_obj: Optional[object] = None, sklearn_initial_keywords: Optional[Union[str, Iterable[str]]] = None, sklearn_unused_keywords: Optional[Union[str, Iterable[str]]] = None, snowml_only_keywords: Optional[Union[str, Iterable[str]]] = None, sklearn_added_keyword_to_version_dict: Optional[Dict[str, str]] = None, sklearn_added_kwarg_value_to_version_dict: Optional[Dict[str, Dict[str, str]]] = None, sklearn_deprecated_keyword_to_version_dict: Optional[Dict[str, str]] = None, sklearn_removed_keyword_to_version_dict: Optional[Dict[str, str]] = None) → Dict[str, Any]

Get sklearn keyword arguments.

This method enables modifying object parameters for special cases.

Parameters:
  • default_sklearn_obj – Sklearn object used to get default parameter values. Necessary when sklearn_added_keyword_to_version_dict is provided.

  • sklearn_initial_keywords – Initial keywords in sklearn.

  • sklearn_unused_keywords – Sklearn keywords that are unused in snowml.

  • snowml_only_keywords – snowml only keywords not present in sklearn.

  • sklearn_added_keyword_to_version_dict – Added keywords mapped to the sklearn versions in which they were added.

  • sklearn_added_kwarg_value_to_version_dict – Added keyword argument values mapped to the sklearn versions in which they were added.

  • sklearn_deprecated_keyword_to_version_dict – Deprecated keywords mapped to the sklearn versions in which they were deprecated.

  • sklearn_removed_keyword_to_version_dict – Removed keywords mapped to the sklearn versions in which they were removed.

Returns:

Sklearn parameter names mapped to their values.

predict(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and predict using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

predict_log_proba(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply predict_log_proba using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.

predict_proba(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply predict_proba using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.
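The "has not been fit" error can be pictured with a toy estimator. `ToyProbaModel` below is hypothetical and unrelated to snowflake.ml; it assumes a fitted state is marked by a trailing-underscore attribute and uses a logistic curve purely for illustration. It shows only the guard, not the transformer chaining that precedes it in a real pipeline.

```python
import math
from typing import List


class ToyProbaModel:
    """Hypothetical estimator returning P(class 1) from a logistic curve."""

    def fit(self, data: List[float]) -> "ToyProbaModel":
        # The learned state: the mean of the training values.
        self.center_ = sum(data) / len(data)
        return self

    def predict_proba(self, data: List[float]) -> List[float]:
        # Guard: refuse to predict before fit has stored any state.
        if not hasattr(self, "center_"):
            raise ValueError("Model has not been fit before calling predict_proba.")
        return [1.0 / (1.0 + math.exp(-(x - self.center_))) for x in data]

    def predict_log_proba(self, data: List[float]) -> List[float]:
        # Reuses predict_proba, so the same guard applies.
        return [math.log(p) for p in self.predict_proba(data)]


model = ToyProbaModel().fit([0.0, 2.0])
probabilities = model.predict_proba([1.0])
```

Calling `ToyProbaModel().predict_proba([1.0])` without a prior `fit` raises ValueError, which is the behaviour the Raises entry describes.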

score(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply score using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.

score_samples(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Transform the dataset by applying all the transformers in order and apply score_samples using the estimator.

Parameters:

dataset – Input dataset.

Returns:

Output dataset.

Raises:

ValueError – An sklearn object has not been fit before calling this function.

set_drop_input_cols(drop_input_cols: Optional[bool] = False) → None

set_input_cols(input_cols: Optional[Union[str, Iterable[str]]]) → Base

Input columns setter.

Parameters:

input_cols – A single input column or multiple input columns.

Returns:

self
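Because the setters return self, calls can be chained. The sketch below uses a hypothetical `ToyColumnConfig` class, not the real base class, and assumes that a single column name is normalised to a one-element list and that None becomes an empty list; both are assumptions inferred from the setter and getter signatures.

```python
from typing import Iterable, List, Optional, Union

ColumnSpec = Optional[Union[str, Iterable[str]]]


def _as_list(cols: ColumnSpec) -> List[str]:
    """Normalise None, a single name, or an iterable of names to a list."""
    if cols is None:
        return []
    if isinstance(cols, str):
        return [cols]
    return list(cols)


class ToyColumnConfig:
    """Hypothetical holder for input and output column names."""

    def __init__(self) -> None:
        self._input_cols: List[str] = []
        self._output_cols: List[str] = []

    def set_input_cols(self, input_cols: ColumnSpec) -> "ToyColumnConfig":
        self._input_cols = _as_list(input_cols)
        return self  # returning self is what enables chaining

    def set_output_cols(self, output_cols: ColumnSpec) -> "ToyColumnConfig":
        self._output_cols = _as_list(output_cols)
        return self

    def get_input_cols(self) -> List[str]:
        return self._input_cols

    def get_output_cols(self) -> List[str]:
        return self._output_cols


# "AGE" and "AGE_SCALED" are made-up column names.
config = ToyColumnConfig().set_input_cols("AGE").set_output_cols(["AGE_SCALED"])
```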

set_label_cols(label_cols: Optional[Union[str, Iterable[str]]]) → Base

Label column setter.

Parameters:

label_cols – A single label column, or multiple label columns for multi-task learning.

Returns:

self

set_output_cols(output_cols: Optional[Union[str, Iterable[str]]]) → Base

Output columns setter.

Parameters:

output_cols – A single output column or multiple output columns.

Returns:

self

set_params(**params: Any) → None

Set the parameters of this transformer.

The method works on simple transformers as well as on sklearn compatible pipelines with nested objects, once the transformer has been fit. Nested objects have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params – Transformer parameter names mapped to their values.

Raises:

SnowflakeMLException – Invalid parameter keys.
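The `<component>__<parameter>` naming convention can be sketched as a small routing function. `ToyStep` and `toy_set_params` are hypothetical names, and ValueError stands in for SnowflakeMLException so that the example runs without the library; this is an illustration of the convention, not the library's code.

```python
from typing import Any, Dict


class ToyStep:
    """Hypothetical component with two tunable parameters."""

    def __init__(self, alpha: float = 1.0, verbose: bool = False) -> None:
        self.alpha = alpha
        self.verbose = verbose


def toy_set_params(components: Dict[str, Any], **params: Any) -> None:
    """Route '<component>__<parameter>' keys to the named component."""
    for key, value in params.items():
        # Split on the first double underscore only.
        name, sep, param = key.partition("__")
        if not sep or name not in components or not hasattr(components[name], param):
            # The real method raises SnowflakeMLException; ValueError is a stand-in.
            raise ValueError(f"Invalid parameter key: {key!r}")
        setattr(components[name], param, value)


components = {"scaler": ToyStep(), "model": ToyStep(alpha=0.5)}
toy_set_params(components, scaler__alpha=2.0, model__verbose=True)
```

Only the addressed parameters change: `components["model"].alpha` keeps its original value, while a key such as `scaler__gamma` is rejected because `ToyStep` has no such parameter.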

set_passthrough_cols(passthrough_cols: Optional[Union[str, Iterable[str]]]) → Base

Passthrough columns setter.

Parameters:

passthrough_cols – Column(s) that should not be used or modified by the estimator/transformer. The estimator/transformer passes these columns through without any modification.

Returns:

self

set_sample_weight_col(sample_weight_col: Optional[str]) → Base

Sample weight column setter.

Parameters:

sample_weight_col – A single column that represents sample weight.

Returns:

self

to_lightgbm() → Any

to_sklearn() → Pipeline

Returns an sklearn Pipeline representing the object, if possible.

Returns:

The previously fit sklearn Pipeline if present; otherwise an unfit pipeline.

Raises:

ValueError – The pipeline cannot be represented as an sklearn pipeline.

to_xgboost() → Any

transform(dataset: Union[snowflake.snowpark.DataFrame, pandas.DataFrame]) → Union[snowflake.snowpark.DataFrame, pandas.DataFrame]

Call transform of each transformer in the pipeline.

Parameters:

dataset – Input dataset.

Returns:

Transformed data. The output datatype is the same as the input datatype.

Attributes

model_signatures