snowflake.ml.modeling.compose.ColumnTransformer
- class snowflake.ml.modeling.compose.ColumnTransformer(*, transformers, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=True, input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, label_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False, sample_weight_col: Optional[str] = None)
Bases: BaseTransformer
Applies transformers to columns of an array or pandas DataFrame. For more details on this class, see sklearn.compose.ColumnTransformer.
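To make the column routing concrete, here is a minimal pure-Python sketch of what a column transformer does with the transformers and remainder parameters described below. It is not the Snowflake or scikit-learn implementation: the helper name column_transform, the list-of-dicts row format, and the use of plain functions in place of fit/transform estimators are assumptions made for illustration, and the "remainder as an estimator" case is omitted. Output names are prefixed with the transformer name and a double underscore, mirroring verbose_feature_names_out=True in scikit-learn.

```python
from typing import Callable, Dict, List, Sequence, Tuple, Union

Row = Dict[str, float]
# Hypothetical stand-in for an estimator: maps rows restricted to the selected
# columns to output rows (fit and transform collapsed into one call).
TransformFn = Callable[[List[Row]], List[Row]]
Spec = Tuple[str, Union[TransformFn, str], Sequence[str]]


def column_transform(rows: List[Row], transformers: List[Spec],
                     remainder: str = "drop") -> List[Row]:
    """Apply each (name, transformer, columns) spec and concatenate the outputs."""
    all_cols = list(rows[0]) if rows else []
    claimed = set()  # columns named by any spec, including 'drop' specs
    blocks: List[List[Row]] = []
    for name, trans, cols in transformers:
        claimed.update(cols)
        if trans == "drop":
            continue
        selected = [{c: r[c] for c in cols} for r in rows]
        out = selected if trans == "passthrough" else trans(selected)
        # Prefix outputs with the transformer name (verbose_feature_names_out=True style).
        blocks.append([{f"{name}__{k}": v for k, v in r.items()} for r in out])
    if remainder == "passthrough":
        rest = [c for c in all_cols if c not in claimed]  # keeps input column order
        if rest:
            blocks.append([{f"remainder__{c}": r[c] for c in rest} for r in rows])
    # Horizontal concatenation: transformer blocks in list order, remainder last.
    result: List[Row] = []
    for i in range(len(rows)):
        merged: Row = {}
        for block in blocks:
            merged.update(block[i])
        result.append(merged)
    return result


def double(rows: List[Row]) -> List[Row]:
    """Toy 'transformer' that doubles every selected value."""
    return [{k: 2 * v for k, v in r.items()} for r in rows]


data = [{"a": 1.0, "b": 2.0, "c": 3.0}, {"a": 4.0, "b": 5.0, "c": 6.0}]
specs: List[Spec] = [("dbl", double, ["a"]), ("skip", "drop", ["b"])]
print(column_transform(data, specs, remainder="passthrough"))
# [{'dbl__a': 2.0, 'remainder__c': 3.0}, {'dbl__a': 8.0, 'remainder__c': 6.0}]
print(column_transform(data, specs))
# [{'dbl__a': 2.0}, {'dbl__a': 8.0}]
```

Note that column "b" is dropped in both calls: a column named by a 'drop' transformer is still "specified", so it never reaches the remainder.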
- transformers: list of tuples
List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data.
- name: str
Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.
- transformer: {'drop', 'passthrough'} or estimator
Estimator must support fit and transform. Special-cased strings 'drop' and 'passthrough' are accepted as well, to indicate to drop the columns or to pass them through untransformed, respectively.
- columns: str, array-like of str, int, array-like of int, array-like of bool, slice or callable
Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector); otherwise, a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above. To select multiple columns by name or dtype, you can use make_column_selector.
- remainder: {'drop', 'passthrough'} or estimator, default='drop'
By default, only the specified columns in transformers are transformed and combined in the output, and the non-specified columns are dropped (default of 'drop'). By specifying remainder='passthrough', all remaining columns that were not specified in transformers, but are present in the data passed to fit, will be automatically passed through. This subset of columns is concatenated with the output of the transformers. For dataframes, extra columns not seen during fit will be excluded from the output of transform. By setting remainder to be an estimator, the remaining non-specified columns will use the remainder estimator. The estimator must support fit and transform. Note that using this feature requires that the DataFrame columns input at fit and transform have identical order.
- sparse_threshold: float, default=0.3
If the output of the different transformers contains sparse matrices, these will be stacked as a sparse matrix if the overall density is lower than this value. Use sparse_threshold=0 to always return dense. When the transformed output consists of all dense data, the stacked result will be dense, and this keyword will be ignored.
- n_jobs: int, default=None
Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the scikit-learn Glossary for more details.
- transformer_weights: dict, default=None
Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.
- verbose: bool, default=False
If True, the time elapsed while fitting each transformer will be printed as it is completed.
- verbose_feature_names_out: bool, default=True
If True, get_feature_names_out() will prefix all feature names with the name of the transformer that generated that feature. If False, get_feature_names_out() will not prefix any feature names and will error if feature names are not unique.
- input_cols: Optional[Union[str, List[str]]]
A string or list of strings representing column names that contain features. If this parameter is not specified, all columns in the input DataFrame except the columns specified by the label_cols and sample_weight_col parameters are considered input columns.
- label_cols: Optional[Union[str, List[str]]]
A string or list of strings representing column names that contain labels. This is a required param for estimators, as there is no way to infer these columns. If this parameter is not specified, then the object is fitted without labels (like a transformer).
- output_cols: Optional[Union[str, List[str]]]
A string or list of strings representing column names that will store the output of predict and transform operations. The length of output_cols must match the expected number of output columns from the specific estimator or transformer class used. If this parameter is not specified, output column names are derived by adding an OUTPUT_ prefix to the label column names. These inferred output column names work for the estimator's predict() method, but output_cols must be set explicitly for transformers.
- sample_weight_col: Optional[str]
A string representing the column name containing the examples’ weights. This argument is only required when working with weighted datasets.
- drop_input_cols: Optional[bool], default=False
If True, the output of the predict() and transform() methods will not contain the input columns.
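The defaulting rules for input_cols, output_cols, and drop_input_cols described above can be restated in plain Python as follows. The function names (as_list, infer_input_cols, default_output_cols, kept_columns) are hypothetical and only encode the documented rules; they are not part of snowflake.ml, and the real library may order, validate, or append columns differently.

```python
from typing import Iterable, List, Optional, Union

ColSpec = Optional[Union[str, Iterable[str]]]


def as_list(cols: ColSpec) -> List[str]:
    """Normalize a column spec: None -> [], 'A' -> ['A'], iterable -> list."""
    if cols is None:
        return []
    if isinstance(cols, str):
        return [cols]
    return list(cols)


def infer_input_cols(all_cols: List[str], input_cols: ColSpec = None,
                     label_cols: ColSpec = None,
                     sample_weight_col: Optional[str] = None) -> List[str]:
    """Explicit input_cols win; otherwise every column except labels and weights."""
    if input_cols is not None:
        return as_list(input_cols)
    excluded = set(as_list(label_cols))
    if sample_weight_col is not None:
        excluded.add(sample_weight_col)
    return [c for c in all_cols if c not in excluded]


def default_output_cols(label_cols: ColSpec) -> List[str]:
    """Inferred output names: OUTPUT_ + label name (enough for predict, not transform)."""
    return ["OUTPUT_" + c for c in as_list(label_cols)]


def kept_columns(all_cols: List[str], input_cols: List[str],
                 drop_input_cols: bool = False) -> List[str]:
    """Columns of the incoming DataFrame that remain in the result."""
    if not drop_input_cols:
        return list(all_cols)
    dropped = set(input_cols)
    return [c for c in all_cols if c not in dropped]


cols = ["ID", "AGE", "CITY", "LABEL", "WEIGHT"]
print(infer_input_cols(cols, label_cols="LABEL", sample_weight_col="WEIGHT"))
# ['ID', 'AGE', 'CITY']
print(default_output_cols("LABEL"))
# ['OUTPUT_LABEL']
print(kept_columns(cols, ["AGE", "CITY"], drop_input_cols=True))
# ['ID', 'LABEL', 'WEIGHT']
```

Because ColumnTransformer is a transformer, the OUTPUT_ fallback does not apply to it in practice; output_cols has to be given explicitly.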
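The effect of verbose_feature_names_out on output names can be modeled with the small helper below. It is a hypothetical sketch, not the library's get_feature_names_out(): it assumes the "<transformer name>__<feature>" prefix format that scikit-learn uses, and it raises a plain ValueError, with a message invented here, when unprefixed names collide.

```python
from collections import Counter
from typing import List, Tuple


def feature_names_out(blocks: List[Tuple[str, List[str]]],
                      verbose_feature_names_out: bool = True) -> List[str]:
    """blocks holds (transformer name, feature names produced by that transformer)."""
    if verbose_feature_names_out:
        return [f"{name}__{feat}" for name, feats in blocks for feat in feats]
    names = [feat for _, feats in blocks for feat in feats]
    # Without prefixes, names from different transformers may collide.
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(f"feature names {duplicates} are not unique")
    return names


blocks = [("num", ["age"]), ("cat", ["city_a", "city_b"])]
print(feature_names_out(blocks))
# ['num__age', 'cat__city_a', 'cat__city_b']
print(feature_names_out(blocks, verbose_feature_names_out=False))
# ['age', 'city_a', 'city_b']
```

With verbose_feature_names_out=False, two transformers that both emit a feature named "x" would make the call fail, which is the collision the parameter description warns about.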
Methods
fit(dataset)
Fit all transformers using the dataset. For more details on this function, see sklearn.compose.ColumnTransformer.fit.
score(dataset)
Method not supported for this class.
set_input_cols(input_cols)
Input columns setter.
to_sklearn()
Get the sklearn.compose.ColumnTransformer object.
transform(dataset)
Transform the dataset separately by each transformer and concatenate the results. For more details on this function, see sklearn.compose.ColumnTransformer.transform.
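When the per-transformer results are concatenated, sparse_threshold decides whether the stacked output is sparse. In scikit-learn this decision is made during fitting and stored in the sparse_output_ attribute. The sketch below reproduces that density rule using block shapes only; the Block tuple layout and the helper name use_sparse_output are assumptions. As in scikit-learn, dense blocks count every cell as stored, and the threshold only matters if at least one block is sparse.

```python
from typing import List, Optional, Tuple

# (n_rows, n_cols, nnz); nnz is None for a dense block.
Block = Tuple[int, int, Optional[int]]


def use_sparse_output(blocks: List[Block], sparse_threshold: float = 0.3) -> bool:
    """Return True if the horizontally stacked result should be sparse."""
    if all(nnz is None for _, _, nnz in blocks):
        return False  # all dense: the threshold is ignored
    # Dense blocks count every cell as stored; sparse blocks count only nnz.
    stored = sum(r * c if nnz is None else nnz for r, c, nnz in blocks)
    total = sum(r * c for r, c, _ in blocks)
    if total == 0:
        return False  # sketch-only guard against empty inputs
    return stored / total < sparse_threshold


# 100 rows: one dense numeric column plus a 10-column one-hot block (one 1 per row).
blocks = [(100, 1, None), (100, 10, 100)]
print(use_sparse_output(blocks))  # True (density 200 / 1100, about 0.18)
print(use_sparse_output(blocks, sparse_threshold=0))  # False: always dense
```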
Attributes
model_signatures
Returns model signature of current class.