com.snowflake.snowpark

DataFrameStatFunctions

final class DataFrameStatFunctions extends Logging

Provides eagerly computed statistical functions for DataFrames.

To access an object of this class, use DataFrame.stat.

Since: 0.2.0

Linear Supertypes

Logging, AnyRef, Any


Value Members

final def != ( arg0: Any ) : Boolean

Definition Classes

AnyRef → Any
final def ## () : Int

Definition Classes

AnyRef → Any
final def == ( arg0: Any ) : Boolean

Definition Classes

AnyRef → Any
def action [ T ] ( funcName: String ) ( func: ⇒ T ) : T

Attributes

protected

Annotations

@inline ()
def approxQuantile ( cols: Array [ String ] , percentile: Array [ Double ] ) : Array [ Array [ Option [ Double ]]]
For an array of numeric columns and an array of desired quantiles, returns a matrix of approximate values for each column at each of the desired quantiles. For example, result(0)(1) contains the approximate value for column cols(0) at quantile percentile(1).

This function uses the t-Digest algorithm.

For example, the following code:
```
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res = df.stat.approxQuantile(Array("a", "b"), Array(0, 0.1, 0.6))
```
prints out the following result:
```
res: Array(Array(Some(0.05), Some(0.15000000000000002), Some(0.25)),
           Array(Some(0.45), Some(0.55), Some(0.6499999999999999)))
```
cols

An array of column names.

percentile

An array of double values greater than or equal to 0.0 and less than 1.0.

returns

A matrix with the dimensions (cols.size * percentile.size) containing the approximate percentile values. If there is not enough data to calculate the quantile, the method returns None.

Since

0.2.0
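
The row-and-column layout of the returned matrix can be sketched in plain Scala, without a Snowflake session. The snippet below hard-codes the result shown above (the numbers are copied from this page, not recomputed). `QuantileMatrixSketch` and `lookup` are hypothetical names used only for illustration and are not part of Snowpark.

```scala
// Illustration only: the matrix literal is copied from the example output above.
object QuantileMatrixSketch {
  val cols: Array[String] = Array("a", "b")
  val percentile: Array[Double] = Array(0.0, 0.1, 0.6)

  // Same shape as the value returned by approxQuantile(cols, percentile):
  // one inner array per column, one entry per requested quantile.
  val res: Array[Array[Option[Double]]] = Array[Array[Option[Double]]](
    Array[Option[Double]](Some(0.05), Some(0.15000000000000002), Some(0.25)),
    Array[Option[Double]](Some(0.45), Some(0.55), Some(0.6499999999999999))
  )

  // Hypothetical helper: look up a value by column name and quantile.
  // Returns None if the column or quantile was not requested.
  def lookup(col: String, q: Double): Option[Double] = {
    val i = cols.indexOf(col)
    val j = percentile.indexOf(q)
    if (i < 0 || j < 0) None else res(i)(j)
  }

  def main(args: Array[String]): Unit = {
    println(res(0)(1))        // value for column cols(0) at quantile percentile(1)
    println(lookup("b", 0.6)) // value for column "b" at quantile 0.6
  }
}
```

Indexing by position mirrors the `result(0)(1)` convention described above; the name-based helper is just a convenience for callers who would rather not track positions by hand.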
def approxQuantile ( col: String , percentile: Array [ Double ] ) : Array [ Option [ Double ]]
For a specified numeric column and an array of desired quantiles, returns an approximate value for the column at each of the desired quantiles.

This function uses the t-Digest algorithm.

For example, the following code:
```
import session.implicits._
val df = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 0).toDF("a")
val res = df.stat.approxQuantile("a", Array(0, 0.1, 0.4, 0.6, 1))
```
prints out the following result:
```
res: Array(Some(-0.5), Some(0.5), Some(3.5), Some(5.5), Some(9.5))
```
col

The name of the numeric column.

percentile

An array of double values greater than or equal to 0.0 and less than 1.0.

returns

An array of approximate percentile values. If there is not enough data to calculate the quantile, the method returns None.

Since

0.2.0
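
Because each entry of the result is an `Option[Double]`, callers usually need to decide what to do with `None`. A minimal sketch in plain Scala, assuming the caller simply wants to skip quantiles that could not be computed; `QuantileOptionSketch` and `definedQuantiles` are hypothetical names, and the sample values are copied from the output above.

```scala
object QuantileOptionSketch {
  // Hypothetical helper: pair each requested quantile with its value,
  // dropping quantiles for which no value could be computed (None).
  def definedQuantiles(
      percentile: Array[Double],
      res: Array[Option[Double]]): Seq[(Double, Double)] =
    percentile.zip(res).toSeq.collect { case (q, Some(v)) => (q, v) }

  def main(args: Array[String]): Unit = {
    val percentile = Array(0.0, 0.1, 0.4, 0.6, 1.0)
    // Copied from the example output above, not recomputed.
    val res = Array[Option[Double]](Some(-0.5), Some(0.5), Some(3.5), Some(5.5), Some(9.5))
    definedQuantiles(percentile, res).foreach { case (q, v) => println(s"q=$q -> $v") }
  }
}
```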
final def asInstanceOf [ T0 ] : T0

Definition Classes

Any
def clone () : AnyRef

Attributes

protected[ lang ]

Definition Classes

AnyRef

Annotations

@throws ( ... ) @native () @HotSpotIntrinsicCandidate ()
def corr ( col1: String , col2: String ) : Option [ Double ]
Calculates the correlation coefficient for non-null pairs in two numeric columns.

For example, the following code:
```
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res: Double = df.stat.corr("a", "b").get
```
prints out the following result:
```
res: 0.9999999999999991
```
col1

The name of the first numeric column to use.

col2

The name of the second numeric column to use.

returns

The correlation of the two numeric columns. If there is not enough data to generate the correlation, the method returns None.

Since

0.2.0
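
To make "correlation coefficient for non-null pairs" concrete, here is a minimal local sketch of the Pearson correlation in plain Scala. It illustrates the statistic only and is not how Snowpark or Snowflake computes it. `CorrSketch` is a hypothetical name, nulls are modeled as `None`, and returning `None` for fewer than two pairs or for a zero-variance column is an assumption of this sketch.

```scala
object CorrSketch {
  // Pearson correlation over the pairs in which both values are present.
  // Sketch-level assumption: None when fewer than two complete pairs remain
  // or when either column has zero variance.
  def corr(pairs: Seq[(Option[Double], Option[Double])]): Option[Double] = {
    val xs = pairs.collect { case (Some(a), Some(b)) => (a, b) }
    if (xs.size < 2) None
    else {
      val n = xs.size.toDouble
      val meanA = xs.map(_._1).sum / n
      val meanB = xs.map(_._2).sum / n
      val sab = xs.map { case (a, b) => (a - meanA) * (b - meanB) }.sum
      val saa = xs.map { case (a, _) => (a - meanA) * (a - meanA) }.sum
      val sbb = xs.map { case (_, b) => (b - meanB) * (b - meanB) }.sum
      if (saa == 0.0 || sbb == 0.0) None else Some(sab / math.sqrt(saa * sbb))
    }
  }

  def main(args: Array[String]): Unit = {
    // Same data as the example above, plus one incomplete pair that is ignored.
    val data: Seq[(Option[Double], Option[Double])] =
      Seq((Some(0.1), Some(0.5)), (Some(0.2), Some(0.6)), (Some(0.3), Some(0.7)), (None, Some(9.0)))
    println(corr(data)) // close to 1.0, up to floating-point rounding
  }
}
```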
def cov ( col1: String , col2: String ) : Option [ Double ]
Calculates the sample covariance for non-null pairs in two numeric columns.

For example, the following code:
```
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
val res: Double = df.stat.cov("a", "b").get
```
prints out the following result:
```
res: 0.010000000000000037
```
col1

The name of the first numeric column to use.

col2

The name of the second numeric column to use.

returns

The sample covariance of the two numeric columns. If there is not enough data to generate the covariance, the method returns None.

Since

0.2.0
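
The "sample" in sample covariance refers to dividing by n - 1 rather than n. A minimal local sketch in plain Scala, for illustration only; `CovSketch` and `sampleCov` are hypothetical names, nulls are modeled as `None`, and returning `None` for fewer than two complete pairs is an assumption of this sketch.

```scala
object CovSketch {
  // Sample covariance over the pairs in which both values are present:
  // sum((a - meanA) * (b - meanB)) / (n - 1).
  def sampleCov(pairs: Seq[(Option[Double], Option[Double])]): Option[Double] = {
    val xs = pairs.collect { case (Some(a), Some(b)) => (a, b) }
    if (xs.size < 2) None // sketch-level assumption: not enough data
    else {
      val n = xs.size.toDouble
      val meanA = xs.map(_._1).sum / n
      val meanB = xs.map(_._2).sum / n
      Some(xs.map { case (a, b) => (a - meanA) * (b - meanB) }.sum / (n - 1))
    }
  }

  def main(args: Array[String]): Unit = {
    // Same data as the example above.
    val data: Seq[(Option[Double], Option[Double])] =
      Seq((Some(0.1), Some(0.5)), (Some(0.2), Some(0.6)), (Some(0.3), Some(0.7)))
    println(sampleCov(data)) // close to 0.01, up to floating-point rounding
  }
}
```

For the three pairs in the example, the deviations from the means are (-0.1, -0.1), (0, 0), and (0.1, 0.1), so the sum of products is 0.02 and dividing by n - 1 = 2 gives approximately 0.01, in line with the result shown above.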

def crosstab ( col1: String , col2: String ) : DataFrame

Computes a pair-wise frequency table (a contingency table) for the specified columns. The method returns a DataFrame containing this table.

In the returned contingency table:

The first column of each row contains the distinct values of col1.
The name of the first column is the name of col1.
The rest of the column names are the distinct values of col2.
The counts are returned as Longs.
For pairs that have no occurrences, the contingency table contains 0 as the count.

Note: The number of distinct values in col2 should not exceed 1000.

For example, the following code:

```
import session.implicits._
val df = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)).toDF("key", "value")
val ct = df.stat.crosstab("key", "value")
ct.show()
```

prints out the following result:

```
---------------------------------------------------------------------------------------------
|"KEY"  |"CAST(1 AS NUMBER(38,0))"  |"CAST(2 AS NUMBER(38,0))"  |"CAST(3 AS NUMBER(38,0))"  |
---------------------------------------------------------------------------------------------
|1      |1                          |1                          |0                          |
|2      |2                          |0                          |1                          |
|3      |0                          |1                          |1                          |
---------------------------------------------------------------------------------------------
```

col1: The name of the first column to use.
col2: The name of the second column to use.
returns: A DataFrame containing the contingency table.

Since: 0.2.0
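
The counting rule behind the contingency table can be sketched locally in plain Scala: count each (col1, col2) pair and fill in 0 for pairs that never occur. This is an illustration of the table's semantics, not of how Snowpark builds it; `CrosstabSketch` and its `crosstab` helper are hypothetical names.

```scala
object CrosstabSketch {
  // For each distinct first value, map every distinct second value to the
  // number of times the pair occurs; pairs with no occurrences get 0.
  def crosstab[A, B](rows: Seq[(A, B)]): Map[A, Map[B, Long]] = {
    val counts: Map[(A, B), Long] =
      rows.groupBy(identity).map { case (pair, group) => pair -> group.size.toLong }
    val col2Values = rows.map(_._2).distinct
    rows.map(_._1).distinct.map { k =>
      k -> col2Values.map(v => v -> counts.getOrElse((k, v), 0L)).toMap
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Same data as the example above.
    val rows = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3))
    crosstab(rows).toSeq.sortBy(_._1).foreach { case (k, row) =>
      println(s"$k: ${row.toSeq.sortBy(_._1).mkString(", ")}")
    }
  }
}
```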

final def eq ( arg0: AnyRef ) : Boolean

Definition Classes

AnyRef
def equals ( arg0: Any ) : Boolean

Definition Classes

AnyRef → Any
final def getClass () : Class [_]

Definition Classes

AnyRef → Any

Annotations

@native () @HotSpotIntrinsicCandidate ()
def hashCode () : Int

Definition Classes

AnyRef → Any

Annotations

@native () @HotSpotIntrinsicCandidate ()
final def isInstanceOf [ T0 ] : Boolean

Definition Classes

Any
def log () : Logger

Attributes

protected[ internal ]

Definition Classes

Logging
def logDebug ( msg: String , throwable: Throwable ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logDebug ( msg: String ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logError ( msg: String , throwable: Throwable ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logError ( msg: String ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logInfo ( msg: String , throwable: Throwable ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logInfo ( msg: String ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logTrace ( msg: String , throwable: Throwable ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logTrace ( msg: String ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logWarning ( msg: String , throwable: Throwable ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
def logWarning ( msg: String ) : Unit

Attributes

protected[ internal ]

Definition Classes

Logging
final def ne ( arg0: AnyRef ) : Boolean

Definition Classes

AnyRef
final def notify () : Unit

Definition Classes

AnyRef

Annotations

@native () @HotSpotIntrinsicCandidate ()
final def notifyAll () : Unit

Definition Classes

AnyRef

Annotations

@native () @HotSpotIntrinsicCandidate ()
def sampleBy [ T ] ( col: String , fractions: Map [ T , Double ] ) : DataFrame
Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

For example, the following code:
```
import session.implicits._
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy("name", fractions).show()
```
prints a result similar to the following (because the sample is random, the rows returned for "Bob" can vary between runs):
```
------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------
```
T

The type of the stratum.

col

The name of the column that defines the strata.

fractions

A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.

returns

A new DataFrame that contains the stratified sample.

Since

0.2.0
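
The role of the fractions Map, including the default of 0 for strata that are not listed, can be sketched locally in plain Scala. The sketch assumes that each row is kept independently with a probability equal to its stratum's fraction; that is an assumption about the sampling model made for illustration, not a statement of how Snowpark implements `sampleBy`. `SampleBySketch` and its parameters are hypothetical names.

```scala
import scala.util.Random

object SampleBySketch {
  // Keep each row independently with the probability given for its stratum.
  // Strata missing from `fractions` use 0.0 and are therefore never kept.
  def sampleBy[R, T](rows: Seq[R])(stratum: R => T)(
      fractions: Map[T, Double], rng: Random): Seq[R] =
    rows.filter(r => rng.nextDouble() < fractions.getOrElse(stratum(r), 0.0))

  def main(args: Array[String]): Unit = {
    // Same data and fractions as the example above.
    val rows = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12))
    val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
    // A fixed seed makes this local sketch repeatable.
    sampleBy(rows)(_._1)(fractions, new Random(42)).foreach(println)
  }
}
```

Under this model, "Nico" rows (fraction 1.0) are always kept, "Alice" rows (not in the Map) are never kept, and each "Bob" row is kept about half the time.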
def sampleBy [ T ] ( col: Column , fractions: Map [ T , Double ] ) : DataFrame
Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

For example, the following code:
```
import session.implicits._
import com.snowflake.snowpark.functions.col
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy(col("name"), fractions).show()
```
prints a result similar to the following (because the sample is random, the rows returned for "Bob" can vary between runs):
```
------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------
```
T

The type of the stratum.

col

An expression for the column that defines the strata.

fractions

A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.

returns

A new DataFrame that contains the stratified sample.

Since

0.2.0
final def synchronized [ T0 ] ( arg0: ⇒ T0 ) : T0

Definition Classes

AnyRef
def toString () : String

Definition Classes

AnyRef → Any
def transformation ( funcName: String ) ( func: ⇒ DataFrame ) : DataFrame

Attributes

protected

Annotations

@inline ()
final def wait ( arg0: Long , arg1: Int ) : Unit

Definition Classes

AnyRef

Annotations

@throws ( ... )
final def wait ( arg0: Long ) : Unit

Definition Classes

AnyRef

Annotations

@throws ( ... ) @native ()
final def wait () : Unit

Definition Classes

AnyRef

Annotations

@throws ( ... )

Deprecated Value Members

def finalize () : Unit

Attributes

protected[ lang ]

Definition Classes

AnyRef

Annotations

@throws ( classOf[java.lang.Throwable] ) @Deprecated

Deprecated
