You are viewing documentation about an older version (1.0.9). View latest version

snowflake.ml.modeling.compose.ColumnTransformer

class snowflake.ml.modeling.compose.ColumnTransformer(*, transformers, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=True, input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, label_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False, sample_weight_col: Optional[str] = None)

Bases: BaseTransformer

Applies transformers to columns of an array or pandas DataFrame For more details on this class, see sklearn.compose.ColumnTransformer

transformers: list of tuples

List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data.

name: str

Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.

transformer: {‘drop’, ‘passthrough’} or estimator

Estimator must support fit and transform. Special-cased strings ‘drop’ and ‘passthrough’ are accepted as well, to indicate to drop the columns or to pass them through untransformed, respectively.

columns: str, array-like of str, int, array-like of int, array-like of bool, slice or callable

Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above. To select multiple columns by name or dtype, you can use make_column_selector.

remainder: {‘drop’, ‘passthrough’} or estimator, default=’drop’

By default, only the specified columns in transformers are transformed and combined in the output, and the non-specified columns are dropped. (default of 'drop'). By specifying remainder='passthrough', all remaining columns that were not specified in transformers, but present in the data passed to fit will be automatically passed through. This subset of columns is concatenated with the output of the transformers. For dataframes, extra columns not seen during fit will be excluded from the output of transform. By setting remainder to be an estimator, the remaining non-specified columns will use the remainder estimator. The estimator must support fit and transform. Note that using this feature requires that the DataFrame columns input at fit and transform have identical order.

sparse_threshold: float, default=0.3

If the output of the different transformers contains sparse matrices, these will be stacked as a sparse matrix if the overall density is lower than this value. Use sparse_threshold=0 to always return dense. When the transformed output consists of all dense data, the stacked result will be dense, and this keyword will be ignored.

n_jobs: int, default=None

Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

transformer_weights: dict, default=None

Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.

verbose: bool, default=False

If True, the time elapsed while fitting each transformer will be printed as it is completed.

verbose_feature_names_out: bool, default=True

If True, get_feature_names_out() will prefix all feature names with the name of the transformer that generated that feature. If False, get_feature_names_out() will not prefix any feature names and will error if feature names are not unique.

input_cols: Optional[Union[str, List[str]]]

A string or list of strings representing column names that contain features. If this parameter is not specified, all columns in the input DataFrame except the columns specified by label_cols and sample-weight_col parameters are considered input columns.

label_cols: Optional[Union[str, List[str]]]

A string or list of strings representing column names that contain labels. This is a required param for estimators, as there is no way to infer these columns. If this parameter is not specified, then object is fitted without labels(Like a transformer).

output_cols: Optional[Union[str, List[str]]]

A string or list of strings representing column names that will store the output of predict and transform operations. The length of output_cols mus match the expected number of output columns from the specific estimator or transformer class used. If this parameter is not specified, output column names are derived by adding an OUTPUT_ prefix to the label column names. These inferred output column names work for estimator’s predict() method, but output_cols must be set explicitly for transformers.

sample_weight_col: Optional[str]

A string representing the column name containing the examples’ weights. This argument is only required when working with weighted datasets.

drop_input_cols: Optional[bool], default=False

If set, the response of predict(), transform() methods will not contain input columns.

Methods

fit(dataset)

Fit all transformers using X For more details on this function, see sklearn.compose.ColumnTransformer.fit

score(dataset)

Method not supported for this class.

set_input_cols(input_cols)

Input columns setter.

to_sklearn()

Get sklearn.compose.ColumnTransformer object.

transform(dataset)

Transform X separately by each transformer, concatenate results For more details on this function, see sklearn.compose.ColumnTransformer.transform

Attributes

model_signatures

Returns model signature of current class.