Snowpark Scala API Reference 1.17.0 - com.snowflake.snowpark.DataFrameStatFunctions

final def != ( arg0: Any ) : Boolean

Definition Classes: AnyRef → Any

final def ## () : Int

Definition Classes: AnyRef → Any

final def == ( arg0: Any ) : Boolean

Definition Classes: AnyRef → Any

def action [ T ] ( funcName: String ) ( func: ⇒ T ) : T

Attributes: protected
Annotations: @inline ()

def approxQuantile ( cols: Array [ String ] , percentile: Array [ Double ] ) : Array [ Array [ Option [ Double ]]]

For an array of numeric columns and an array of desired quantiles, returns a matrix of approximate values for each column at each of the desired quantiles. For example, result(0)(1) contains the approximate value for column cols(0) at quantile percentile(1).

This function uses the t-Digest algorithm.

For example, the following code:

import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res = df.stat.approxQuantile(Array("a", "b"), Array(0, 0.1, 0.6))

prints out the following result:

res: Array(Array(Some(0.05), Some(0.15000000000000002), Some(0.25)),
           Array(Some(0.45), Some(0.55), Some(0.6499999999999999)))

cols: An array of column names.
percentile: An array of double values greater than or equal to 0.0 and less than 1.0.
returns: A matrix with the dimensions (cols.size * percentile.size) containing the approximate percentile values. If there is not enough data to calculate the quantile, the method returns None.

Since: 0.2.0
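
The indexing rule above (result(i)(j) belongs to cols(i) and percentile(j)) can be made explicit by labelling each cell. The following is a minimal sketch in plain Scala that needs no Snowflake session; QuantileLookup and label are hypothetical names used only for illustration and are not part of Snowpark. The matrix in main is copied from the example output above.

```scala
object QuantileLookup {
  // Hypothetical helper (not part of Snowpark): pairs every column name and
  // quantile with the matching cell of the matrix returned by approxQuantile.
  def label(
      cols: Array[String],
      percentile: Array[Double],
      result: Array[Array[Option[Double]]]
  ): Map[(String, Double), Option[Double]] =
    (for {
      i <- cols.indices
      j <- percentile.indices
    } yield (cols(i), percentile(j)) -> result(i)(j)).toMap

  def main(args: Array[String]): Unit = {
    val cols = Array("a", "b")
    val percentile = Array(0.0, 0.1, 0.6)
    // Values copied from the documented example output.
    val result: Array[Array[Option[Double]]] = Array(
      Array(Some(0.05), Some(0.15000000000000002), Some(0.25)),
      Array(Some(0.45), Some(0.55), Some(0.6499999999999999))
    )
    val byName = label(cols, percentile, result)
    // A None cell means there was not enough data for that quantile.
    byName(("b", 0.6)) match {
      case Some(v) => println(s"b @ 0.6 = $v")
      case None    => println("b @ 0.6: not enough data")
    }
  }
}
```

Keeping the cells as Option[Double] until the point of use avoids an unchecked .get on a quantile that could not be computed.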

def approxQuantile ( col: String , percentile: Array [ Double ] ) : Array [ Option [ Double ]]

For a specified numeric column and an array of desired quantiles, returns an approximate value for the column at each of the desired quantiles.

This function uses the t-Digest algorithm.

For example, the following code:

import session.implicits._
val df = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 0).toDF("a")
val res = df.stat.approxQuantile("a", Array(0, 0.1, 0.4, 0.6, 1))

prints out the following result:

res: Array(Some(-0.5), Some(0.5), Some(3.5), Some(5.5), Some(9.5))

col: The name of the numeric column.
percentile: An array of double values greater than or equal to 0.0 and less than 1.0.
returns: An array of approximate percentile values. If there is not enough data to calculate the quantile, the method returns None.

Since: 0.2.0

final def asInstanceOf [ T0 ] : T0

Definition Classes: Any

def clone () : AnyRef

Attributes: protected[ lang ]
Definition Classes: AnyRef
Annotations: @throws ( ... ) @native () @HotSpotIntrinsicCandidate ()

def corr ( col1: String , col2: String ) : Option [ Double ]

Calculates the correlation coefficient for non-null pairs in two numeric columns.

For example, the following code:

import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res: Double = df.stat.corr("a", "b").get

prints out the following result:

res: 0.9999999999999991

col1: The name of the first numeric column to use.
col2: The name of the second numeric column to use.
returns: The correlation of the two numeric columns. If there is not enough data to generate the correlation, the method returns None.

Since: 0.2.0
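
To illustrate what "correlation coefficient for non-null pairs" means, the following plain-Scala sketch computes a Pearson coefficient locally. It is not Snowpark's implementation (the real computation runs in Snowflake); PearsonSketch is a hypothetical name, null values are modelled as None, and the conditions under which this sketch returns None (fewer than two pairs, or zero variance) are assumptions made for the illustration.

```scala
object PearsonSketch {
  // Pearson correlation over the pairs in which both values are present.
  // Assumption: returns None for fewer than two pairs or a zero-variance column.
  def corr(xs: Seq[Option[Double]], ys: Seq[Option[Double]]): Option[Double] = {
    // Drop every pair that has a missing value on either side.
    val pairs = xs.zip(ys).collect { case (Some(x), Some(y)) => (x, y) }
    if (pairs.size < 2) None
    else {
      val n = pairs.size
      val meanX = pairs.map(_._1).sum / n
      val meanY = pairs.map(_._2).sum / n
      val sxy = pairs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
      val sxx = pairs.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
      val syy = pairs.map { case (_, y) => (y - meanY) * (y - meanY) }.sum
      if (sxx == 0.0 || syy == 0.0) None
      else Some(sxy / math.sqrt(sxx * syy))
    }
  }

  def main(args: Array[String]): Unit = {
    // Same data as the documented example, wrapped in Option.
    val a = Seq(0.1, 0.2, 0.3).map(Option(_))
    val b = Seq(0.5, 0.6, 0.7).map(Option(_))
    println(corr(a, b))
  }
}
```

Because column b is column a shifted by a constant, the coefficient for this data is 1 up to floating-point rounding, which matches the documented result.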

def cov ( col1: String , col2: String ) : Option [ Double ]

Calculates the sample covariance for non-null pairs in two numeric columns.

For example, the following code:

import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res: Double = df.stat.cov("a", "b").get

prints out the following result:

res: 0.010000000000000037

col1: The name of the first numeric column to use.
col2: The name of the second numeric column to use.
returns: The sample covariance of the two numeric columns. If there is not enough data to generate the covariance, the method returns None.

Since: 0.2.0
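
As a cross-check of the documented value, the sample covariance can be computed by hand: sum the products of the deviations from each mean and divide by n - 1. The sketch below does this in plain Scala. SampleCovarianceSketch is a hypothetical name, and returning None for fewer than two pairs is an assumption of the sketch rather than documented Snowpark behavior.

```scala
object SampleCovarianceSketch {
  // Sample covariance: sum((x - meanX) * (y - meanY)) / (n - 1).
  // Assumption: with fewer than two pairs the divisor is undefined, so return None.
  def cov(pairs: Seq[(Double, Double)]): Option[Double] =
    if (pairs.size < 2) None
    else {
      val n = pairs.size
      val meanX = pairs.map(_._1).sum / n
      val meanY = pairs.map(_._2).sum / n
      val sumOfProducts = pairs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
      Some(sumOfProducts / (n - 1))
    }

  def main(args: Array[String]): Unit = {
    // Same data as the documented example: the deviations are -0.1, 0, 0.1 in
    // both columns, so the sum of products is 0.02 and the divisor is 2.
    println(cov(Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7))))
  }
}
```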

def crosstab ( col1: String , col2: String ) : DataFrame

Computes a pair-wise frequency table (a contingency table) for the specified columns. The method returns a DataFrame containing this table.

In the returned contingency table:

The first column of each row contains the distinct values of col1.
The name of the first column is the name of col1.
The rest of the column names are the distinct values of col2.
The counts are returned as Longs.
For pairs that have no occurrences, the contingency table contains 0 as the count.

Note: The number of distinct values in col2 should not exceed 1000.

For example, the following code:

import session.implicits._
val df = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)).toDF("key", "value")
val ct = df.stat.crosstab("key", "value")
ct.show()

prints out the following result:

---------------------------------------------------------------------------------------------
|"KEY"  |"CAST(1 AS NUMBER(38,0))"  |"CAST(2 AS NUMBER(38,0))"  |"CAST(3 AS NUMBER(38,0))"  |
---------------------------------------------------------------------------------------------
|1      |1                          |1                          |0                          |
|2      |2                          |0                          |1                          |
|3      |0                          |1                          |1                          |
---------------------------------------------------------------------------------------------

col1: The name of the first column to use.
col2: The name of the second column to use.
returns: A DataFrame containing the contingency table.

Since: 0.2.0
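
The layout rules of the contingency table can be reproduced on local collections. The sketch below is plain Scala, not the Snowpark implementation; CrosstabSketch is a hypothetical name, and sorting the col2 values to fix the column order is a choice made for this illustration only.

```scala
object CrosstabSketch {
  // Returns the distinct second-column values (used as column headers) and,
  // for each distinct first-column value, the counts in header order.
  def crosstab(pairs: Seq[(Int, Int)]): (Seq[Int], Map[Int, Seq[Long]]) = {
    // Column headers: distinct values of the second column (sorted here by choice).
    val headers = pairs.map(_._2).distinct.sorted
    // Number of occurrences of each (first, second) pair, as Long.
    val counts: Map[(Int, Int), Long] =
      pairs.groupBy(identity).map { case (pair, group) => pair -> group.size.toLong }
    // One row per distinct first-column value; pairs never seen count as 0.
    val rows = pairs.map(_._1).distinct.map { key =>
      key -> headers.map(value => counts.getOrElse((key, value), 0L))
    }.toMap
    (headers, rows)
  }

  def main(args: Array[String]): Unit = {
    // Same data as the documented example.
    val data = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3))
    val (headers, rows) = crosstab(data)
    println(("KEY" +: headers.map(_.toString)).mkString(" | "))
    rows.toSeq.sortBy(_._1).foreach { case (key, cells) =>
      println((key.toString +: cells.map(_.toString)).mkString(" | "))
    }
  }
}
```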

final def eq ( arg0: AnyRef ) : Boolean

Definition Classes: AnyRef

def equals ( arg0: Any ) : Boolean

Definition Classes: AnyRef → Any

final def getClass () : Class [_]

Definition Classes: AnyRef → Any
Annotations: @native () @HotSpotIntrinsicCandidate ()

def hashCode () : Int

Definition Classes: AnyRef → Any
Annotations: @native () @HotSpotIntrinsicCandidate ()

final def isInstanceOf [ T0 ] : Boolean

Definition Classes: Any

def log () : Logger

Attributes: protected[ internal ]
Definition Classes: Logging

def logDebug ( msg: String , throwable: Throwable ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logDebug ( msg: String ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logError ( msg: String , throwable: Throwable ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logError ( msg: String ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logInfo ( msg: String , throwable: Throwable ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logInfo ( msg: String ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logTrace ( msg: String , throwable: Throwable ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logTrace ( msg: String ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logWarning ( msg: String , throwable: Throwable ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

def logWarning ( msg: String ) : Unit

Attributes: protected[ internal ]
Definition Classes: Logging

final def ne ( arg0: AnyRef ) : Boolean

Definition Classes: AnyRef

final def notify () : Unit

Definition Classes: AnyRef
Annotations: @native () @HotSpotIntrinsicCandidate ()

final def notifyAll () : Unit

Definition Classes: AnyRef
Annotations: @native () @HotSpotIntrinsicCandidate ()

def sampleBy [ T ] ( col: String , fractions: Map [ T , Double ] ) : DataFrame

Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

For example, the following code:

import session.implicits._
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy("name", fractions).show()

prints out the following result:

------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------

T: The type of the stratum.
col: The name of the column that defines the strata.
fractions: A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.
returns: A new DataFrame that contains the stratified sample.

Since: 0.2.0

def sampleBy [ T ] ( col: Column , fractions: Map [ T , Double ] ) : DataFrame

Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

For example, the following code:

import session.implicits._
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy(col("name"), fractions).show()

prints out the following result:

------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------

T: The type of the stratum.
col: An expression for the column that defines the strata.
fractions: A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.
returns: A new DataFrame that contains the stratified sample.

Since: 0.2.0
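
The role of the fractions Map in both sampleBy overloads can be sketched on a local collection. The sketch below assumes that each row is kept independently with a probability equal to the fraction of its stratum; that per-row model is an assumption of this illustration, not a statement about how Snowflake performs the sampling. SampleBySketch and its parameters are hypothetical names.

```scala
import scala.util.Random

object SampleBySketch {
  // Keeps each row with probability fractions(stratum(row)).
  // A stratum missing from the Map gets fraction 0, as documented above.
  def sampleBy[T, R](
      rows: Seq[R],
      stratum: R => T,
      fractions: Map[T, Double],
      rng: Random
  ): Seq[R] =
    rows.filter { row =>
      val fraction = fractions.getOrElse(stratum(row), 0.0)
      // nextDouble() is in [0, 1): fraction 1.0 always keeps, 0.0 never keeps.
      rng.nextDouble() < fraction
    }

  def main(args: Array[String]): Unit = {
    // Same data and fractions as the documented example.
    val rows = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12))
    val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
    val sample = sampleBy[String, (String, Int)](rows, _._1, fractions, new Random())
    sample.foreach(println)
  }
}
```

Under this model the "Nico" row is always present, the "Alice" row never is, and the number of "Bob" rows varies from run to run, so the output shown in the example above is one possible outcome.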

final def synchronized [ T0 ] ( arg0: ⇒ T0 ) : T0

Definition Classes: AnyRef

def toString () : String

Definition Classes: AnyRef → Any

def transformation ( funcName: String ) ( func: ⇒ DataFrame ) : DataFrame

Attributes: protected
Annotations: @inline ()

final def wait ( arg0: Long , arg1: Int ) : Unit

Definition Classes: AnyRef
Annotations: @throws ( ... )

final def wait ( arg0: Long ) : Unit

Definition Classes: AnyRef
Annotations: @throws ( ... ) @native ()

final def wait () : Unit

Definition Classes: AnyRef
Annotations: @throws ( ... )

Packages

DataFrameStatFunctions

final class DataFrameStatFunctions extends Logging

Value Members

Deprecated Value Members

Inherited from Logging

Inherited from AnyRef

Inherited from Any

Ungrouped
