You are viewing documentation about an older version (1.0.9).

snowflake.ml.modeling.preprocessing.RobustScaler

class snowflake.ml.modeling.preprocessing.RobustScaler(*, with_centering: bool = True, with_scaling: bool = True, quantile_range: Tuple[float, float] = (25.0, 75.0), unit_variance: bool = False, input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False)

Bases: BaseTransformer

Scales features using statistics that are robust to outliers. Values must be of float type.

For more details on what this transformer does, see sklearn.preprocessing.RobustScaler.

Args:

with_centering: If True, center the data around zero before scaling.

with_scaling: If True, scale the data to the interquartile range.

quantile_range: Tuple like (q_min, q_max), where 0.0 < q_min < q_max < 100.0, default=(25.0, 75.0). Quantile range used to calculate scale_. By default, this is equal to the IQR, i.e., q_min is the first quartile and q_max is the third quartile.

unit_variance: If True, scale data so that normally distributed features have a variance of 1. In general, if the difference between the x-values of q_max and q_min for a standard normal distribution is greater than 1, the dataset is scaled down. If less than 1, the dataset is scaled up.

input_cols: The name(s) of one or more columns in a DataFrame containing a feature to be scaled.

output_cols: The name(s) of one or more columns in a DataFrame in which results will be stored. The number of columns specified must match the number of input columns. For dense output, the column names specified are used as base names for the columns created for each category.

drop_input_cols: Remove input columns from output if set True. False by default.
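The unit_variance rule above can be made concrete with a small sketch. It assumes the adjustment follows scikit-learn's approach, where scale_ is divided by the width of the quantile range measured on a standard normal distribution; the helper name unit_variance_factor is hypothetical and not part of the library.

```python
from statistics import NormalDist


def unit_variance_factor(quantile_range=(25.0, 75.0)):
    """Width of the quantile range on a standard normal distribution.

    Assumption: with unit_variance=True, scale_ is divided by this value,
    as scikit-learn does. This helper is illustrative only.
    """
    q_min, q_max = quantile_range
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.inv_cdf(q_max / 100.0) - std_normal.inv_cdf(q_min / 100.0)


iqr_factor = unit_variance_factor()                # about 1.349, greater than 1
narrow_factor = unit_variance_factor((40.0, 60.0))  # about 0.507, less than 1
```

With the default range the factor is greater than 1, which corresponds to the "scaled down" case in the description; a narrow range such as (40.0, 60.0) gives a factor below 1, the "scaled up" case.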

Attributes:

center_: Dictionary mapping input column name to the median value for that feature.

scale_: Dictionary mapping input column name to the (scaled) interquartile range for that feature.
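To illustrate what center_ and scale_ hold, here is a minimal pure-Python sketch that computes a per-column median and quantile range and then applies them. It is not the library's implementation: the functions percentile, robust_fit, and robust_transform and the column name PRICE are hypothetical, and the linear-interpolation percentile is an assumption, so values computed in Snowflake may differ slightly.

```python
from statistics import median


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def robust_fit(columns, quantile_range=(25.0, 75.0)):
    """Return (center, scale) dictionaries keyed by column name."""
    q_min, q_max = quantile_range
    center = {name: median(vals) for name, vals in columns.items()}
    scale = {
        name: percentile(vals, q_max) - percentile(vals, q_min)
        for name, vals in columns.items()
    }
    return center, scale


def robust_transform(columns, center, scale):
    """Subtract the median and divide by the quantile range, column by column."""
    return {
        name: [(v - center[name]) / scale[name] for v in vals]
        for name, vals in columns.items()
    }


data = {"PRICE": [1.0, 2.0, 3.0, 4.0, 100.0]}  # 100.0 is an outlier
center, scale = robust_fit(data)                # {"PRICE": 3.0}, {"PRICE": 2.0}
scaled = robust_transform(data, center, scale)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier 100.0 leaves the median and interquartile range unchanged, so the other four values keep a compact, interpretable scale.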

Methods

fit(dataset)

Compute center, scale and quantile values of the dataset.

transform(dataset)

Center and scale the data.
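A typical fit-then-transform call might look like the sketch below. It uses only the constructor arguments, methods, and attributes listed on this page; the wrapper function scale_columns and the column names in the usage note are hypothetical, and the DataFrame is assumed to be one the library accepts (for example a Snowpark DataFrame with float columns).

```python
def scale_columns(df, input_cols, output_cols):
    """Fit a RobustScaler on df and return the transformed DataFrame.

    Hypothetical wrapper for illustration; not part of the library.
    """
    # Imported lazily so the sketch can be defined without the package installed.
    from snowflake.ml.modeling.preprocessing import RobustScaler

    scaler = RobustScaler(
        input_cols=input_cols,
        output_cols=output_cols,
        quantile_range=(25.0, 75.0),  # the default, shown for clarity
    )
    scaler.fit(df)                    # computes center_ and scale_
    scaled_df = scaler.transform(df)  # centers and scales the input columns
    return scaled_df, scaler.center_, scaler.scale_
```

For example, scale_columns(df, ["PRICE"], ["PRICE_SCALED"]) would write the scaled values of a PRICE column into PRICE_SCALED, with the number of output columns matching the number of input columns as required above.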

Attributes

center_

scale_