Package com.snowflake.snowpark_java
Class DataFrameStatFunctions
- java.lang.Object
  - com.snowflake.snowpark_java.DataFrameStatFunctions
public class DataFrameStatFunctions extends Object
Provides eagerly computed statistical functions for DataFrames. To access an object of this class, use DataFrame.stat().
- Since:
- 1.1.0
Method Summary
Modifier and Type, Method, Description
- Optional<Double>[][] approxQuantile(String[] cols, double[] percentile)
  For an array of numeric columns and an array of desired quantiles, returns a matrix of approximate values for each column at each of the desired quantiles.
- Optional<Double>[] approxQuantile(String col, double[] percentile)
  For a specified numeric column and an array of desired quantiles, returns an approximate value for the column at each of the desired quantiles.
- Optional<Double> corr(String col1, String col2)
  Calculates the correlation coefficient for non-null pairs in two numeric columns.
- Optional<Double> cov(String col1, String col2)
  Calculates the sample covariance for non-null pairs in two numeric columns.
- DataFrame crosstab(String col1, String col2)
  Computes a pair-wise frequency table (a "contingency table") for the specified columns.
- DataFrame sampleBy(Column col, Map<?,Double> fractions)
  Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.
- DataFrame sampleBy(String colName, Map<?,Double> fractions)
  Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.
Method Detail
-
corr
public Optional<Double> corr(String col1, String col2)
Calculates the correlation coefficient for non-null pairs in two numeric columns.
- Parameters:
col1 - The name of the first numeric column to use.
col2 - The name of the second numeric column to use.
- Returns:
- The correlation of the two numeric columns. If there is not enough data to generate the correlation, the method returns an empty Optional.
- Since:
- 1.1.0
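To make "correlation coefficient for non-null pairs" concrete, the following is a minimal local sketch in plain Java. It is not how Snowpark computes the value (the real `corr` runs in Snowflake); the class name `CorrSketch`, the Pearson formula, and the rule that fewer than two complete pairs counts as "not enough data" are assumptions made for illustration.

```java
import java.util.Optional;

// Hypothetical illustration only; not part of the Snowpark API.
class CorrSketch {
    /** Pearson correlation over pairs where both values are non-null. Arrays must have equal length. */
    static Optional<Double> corr(Double[] xs, Double[] ys) {
        int n = 0;
        double sumX = 0.0, sumY = 0.0;
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] == null || ys[i] == null) continue; // skip incomplete pairs
            n++;
            sumX += xs[i];
            sumY += ys[i];
        }
        // Assumption: fewer than two complete pairs is treated as "not enough data".
        if (n < 2) return Optional.empty();

        double meanX = sumX / n, meanY = sumY / n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] == null || ys[i] == null) continue;
            double dx = xs[i] - meanX, dy = ys[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // A constant column has zero variance, so the coefficient is undefined.
        if (sxx == 0.0 || syy == 0.0) return Optional.empty();
        return Optional.of(sxy / Math.sqrt(sxx * syy));
    }
}
```

Against a real DataFrame the equivalent call would be `df.stat().corr("col1", "col2")`, and the returned `Optional<Double>` can be consumed with `ifPresent` or `orElse`.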
-
cov
public Optional<Double> cov(String col1, String col2)
Calculates the sample covariance for non-null pairs in two numeric columns.
- Parameters:
col1 - The name of the first numeric column to use.
col2 - The name of the second numeric column to use.
- Returns:
- The sample covariance of the two numeric columns. If there is not enough data to generate the covariance, the method returns an empty Optional.
- Since:
- 1.1.0
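As a rough illustration of what "sample covariance" means here, the sketch below divides the summed products of deviations by n - 1 rather than n. It is a local plain-Java approximation of the concept, not Snowpark's implementation; the class name `CovSketch` and the "at least two complete pairs" threshold are assumptions.

```java
import java.util.Optional;

// Hypothetical illustration only; not part of the Snowpark API.
class CovSketch {
    /** Sample covariance (n - 1 divisor) over pairs where both values are non-null. */
    static Optional<Double> sampleCov(Double[] xs, Double[] ys) {
        int n = 0;
        double sumX = 0.0, sumY = 0.0;
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] == null || ys[i] == null) continue; // skip incomplete pairs
            n++;
            sumX += xs[i];
            sumY += ys[i];
        }
        // Assumption: the n - 1 divisor needs at least two complete pairs.
        if (n < 2) return Optional.empty();

        double meanX = sumX / n, meanY = sumY / n;
        double acc = 0.0;
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] == null || ys[i] == null) continue;
            acc += (xs[i] - meanX) * (ys[i] - meanY);
        }
        return Optional.of(acc / (n - 1));
    }
}
```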
-
approxQuantile
public Optional<Double>[] approxQuantile(String col, double[] percentile)
For a specified numeric column and an array of desired quantiles, returns an approximate value for the column at each of the desired quantiles. This function uses the t-Digest algorithm.
- Parameters:
col - The name of the numeric column.
percentile - An array of double values greater than or equal to 0.0 and less than 1.0.
- Returns:
- An array of approximate percentile values. If there is not enough data to calculate the quantile, the result contains an empty Optional.
- Since:
- 1.1.0
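Two practical details follow from the description above: the `percentile` values must lie in the range [0.0, 1.0), and each result element is an `Optional<Double>` that may be empty. The helpers below are a hedged sketch of how calling code might handle both. `QuantileHelpers`, `checkPercentiles`, and `toPrimitive` are hypothetical names, and mapping an empty element to NaN is just one possible convention.

```java
import java.util.Optional;

// Hypothetical helper class; not part of the Snowpark API.
class QuantileHelpers {
    /** Rejects values outside [0.0, 1.0), the range stated for the percentile parameter. */
    static void checkPercentiles(double[] percentile) {
        for (double p : percentile) {
            // Written as a negation so that NaN is rejected as well.
            if (!(p >= 0.0 && p < 1.0)) {
                throw new IllegalArgumentException("percentile out of range: " + p);
            }
        }
    }

    /** Converts results to primitives, using NaN where no value is present. */
    static double[] toPrimitive(Optional<Double>[] quantiles) {
        double[] out = new double[quantiles.length];
        for (int i = 0; i < quantiles.length; i++) {
            out[i] = quantiles[i].orElse(Double.NaN);
        }
        return out;
    }
}
```

A caller could run `checkPercentiles` before `df.stat().approxQuantile("col", percentile)` and pass the returned array to `toPrimitive`.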
-
approxQuantile
public Optional<Double>[][] approxQuantile(String[] cols, double[] percentile)
For an array of numeric columns and an array of desired quantiles, returns a matrix of approximate values for each column at each of the desired quantiles. For example, `result[0][1]` contains the approximate value for column `cols[0]` at quantile `percentile[1]`. This function uses the t-Digest algorithm.
- Parameters:
cols - An array of column names.
percentile - An array of double values greater than or equal to 0.0 and less than 1.0.
- Returns:
- A matrix with the dimensions `cols.length` by `percentile.length` containing the approximate percentile values. If there is not enough data to calculate the quantile, the result contains an empty Optional.
- Since:
- 1.1.0
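Because the matrix is indexed first by column position and then by percentile position, a small lookup helper can make call sites easier to read. This is only a sketch: `QuantileMatrix` and `lookup` are hypothetical names, and returning an empty Optional for an unknown column or percentile is a simplification that does not distinguish "not requested" from "not enough data".

```java
import java.util.Optional;

// Hypothetical helper class; not part of the Snowpark API.
class QuantileMatrix {
    /**
     * Returns result[i][j], where i is the position of col in cols and
     * j is the position of p in percentile. Empty if either is not found.
     */
    static Optional<Double> lookup(String[] cols, double[] percentile,
                                   Optional<Double>[][] result, String col, double p) {
        for (int i = 0; i < cols.length; i++) {
            if (!cols[i].equals(col)) continue;
            for (int j = 0; j < percentile.length; j++) {
                // Exact comparison is intended: p should be one of the requested values.
                if (percentile[j] == p) return result[i][j];
            }
        }
        return Optional.empty();
    }
}
```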
-
crosstab
public DataFrame crosstab(String col1, String col2)
Computes a pair-wise frequency table (a "contingency table") for the specified columns. The method returns a DataFrame containing this table.
In the returned contingency table:
- The first column of each row contains the distinct values of col1.
- The name of the first column is the name of col1.
- The rest of the column names are the distinct values of col2.
- The counts are returned as Longs.
- For pairs that have no occurrences, the contingency table contains 0 as the count.
Note: The number of distinct values in col2 should not exceed 1000.
- Parameters:
col1 - The name of the first column to use.
col2 - The name of the second column to use.
- Returns:
- A DataFrame containing the contingency table.
- Since:
- 1.1.0
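The shape of the contingency table can be sketched locally with nested maps: one row per distinct `col1` value, one entry per distinct `col2` value, Long counts, and 0 for pairs that never occur. This is a plain-Java illustration of the table layout, not the Snowpark implementation; `CrosstabSketch` is a hypothetical name, and the sketch assumes non-null string values and equal-length inputs.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the Snowpark API.
class CrosstabSketch {
    /** Builds row value -> (column value -> count), with 0 for pairs that do not occur. */
    static Map<String, Map<String, Long>> crosstab(String[] col1, String[] col2) {
        // Distinct col2 values become the column names of the table.
        Set<String> columnValues = new TreeSet<>(Arrays.asList(col2));
        Map<String, Map<String, Long>> table = new TreeMap<>();
        for (int i = 0; i < col1.length; i++) {
            // Create each row pre-filled with zero counts for every column value.
            Map<String, Long> row = table.computeIfAbsent(col1[i], k -> {
                Map<String, Long> zeros = new TreeMap<>();
                for (String v : columnValues) zeros.put(v, 0L);
                return zeros;
            });
            row.merge(col2[i], 1L, Long::sum);
        }
        return table;
    }
}
```

For the inputs `{"a", "a", "b"}` and `{"x", "y", "x"}`, the row for `"b"` holds a count of 1 under `"x"` and 0 under `"y"`.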
-
sampleBy
public DataFrame sampleBy(Column col, Map<?,Double> fractions)
Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.
- Parameters:
col - An expression for the column that defines the strata.
fractions - A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.
- Returns:
- A new DataFrame that contains the stratified sample.
- Since:
- 1.1.0
-
sampleBy
public DataFrame sampleBy(String colName, Map<?,Double> fractions)
Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.
- Parameters:
colName - The name of the column that defines the strata.
fractions - A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.
- Returns:
- A new DataFrame that contains the stratified sample.
- Since:
- 1.1.0
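The role of the `fractions` Map, including the default of 0 for strata that are not listed, can be illustrated with a small local sketch. It assumes independent per-row (Bernoulli) selection, which may differ from how Snowflake actually samples; `SampleBySketch` is a hypothetical name, and rows are represented only by their stratum key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; not part of the Snowpark API.
class SampleBySketch {
    /**
     * Returns the indices of the rows that were kept. Each row is kept at most
     * once (no replacement), with probability equal to its stratum's fraction.
     */
    static List<Integer> sampleBy(List<String> strata, Map<String, Double> fractions, Random rng) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < strata.size(); i++) {
            // A stratum missing from the Map gets fraction 0, so its rows are never kept.
            double fraction = fractions.getOrDefault(strata.get(i), 0.0);
            // nextDouble() is in [0, 1): fraction 1.0 always keeps, 0.0 never keeps.
            if (rng.nextDouble() < fraction) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

With fractions of 1.0 for stratum "a" and 0.0 for stratum "b", every "a" row is kept and every "b" row, along with rows from any unlisted stratum, is dropped. Intermediate fractions give a random subset whose size varies from run to run.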