
# DataFrameStatFunctions 

#### final class DataFrameStatFunctions extends Logging

Provides eagerly computed statistical functions for DataFrames.

To access an object of this class, use `DataFrame.stat`.

Since

0.2.0

Linear Supertypes
Logging , AnyRef , Any

### Value Members

1. final def != ( arg0: Any ) : Boolean
Definition Classes
AnyRef → Any
2. final def ## () : Int
Definition Classes
AnyRef → Any
3. final def == ( arg0: Any ) : Boolean
Definition Classes
AnyRef → Any
4. def approxQuantile ( cols: Array [ String ] , percentile: Array [ Double ] ) : Array [ Array [ Option [ Double ]]]

For an array of numeric columns and an array of desired quantiles, returns a matrix of approximate values for each column at each of the desired quantiles. For example, `result(0)(1)` contains the approximate value for column `cols(0)` at quantile `percentile(1)`.

This function uses the t-Digest algorithm.

For example, the following code:

```scala
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
// Call approxQuantile through the stat accessor of the DataFrame defined above.
val res = df.stat.approxQuantile(Array("a", "b"), Array(0, 0.1, 0.6))
```

prints out the following result:

```
res: Array(Array(Some(0.05), Some(0.15000000000000002), Some(0.25)),
Array(Some(0.45), Some(0.55), Some(0.6499999999999999)))
```
cols

An array of column names.

percentile

An array of double values greater than or equal to 0.0 and less than 1.0.

returns

A matrix with the dimensions `cols.size * percentile.size` containing the approximate percentile values. If there is not enough data to calculate a quantile, the corresponding entry is None.

Since

0.2.0
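
As a sketch of how the returned matrix might be consumed, the plain-Scala snippet below copies the result shown above into a literal and labels each entry by column name and quantile. The object `QuantileMatrixExample` and the helper `labelled` are hypothetical names introduced for illustration and are not part of the API; no session is needed to run it.

```scala
object QuantileMatrixExample {
  val cols = Array("a", "b")
  val percentile = Array(0.0, 0.1, 0.6)

  // Literal copy of the result shown in the example above.
  val res: Array[Array[Option[Double]]] = Array(
    Array(Some(0.05), Some(0.15000000000000002), Some(0.25)),
    Array(Some(0.45), Some(0.55), Some(0.6499999999999999))
  )

  // Hypothetical helper: maps (column, quantile) to its value and
  // skips entries that are None.
  def labelled(
      cols: Array[String],
      percentile: Array[Double],
      res: Array[Array[Option[Double]]]
  ): Map[(String, Double), Double] =
    (for {
      (c, i) <- cols.toList.zipWithIndex
      (p, j) <- percentile.toList.zipWithIndex
      v <- res(i)(j).toList
    } yield (c, p) -> v).toMap

  def main(args: Array[String]): Unit = {
    // res(i)(j) is the value for cols(i) at percentile(j).
    println(res(0)(1))
    println(labelled(cols, percentile, res).get(("b", 0.6)))
  }
}
```

The outer index selects the column and the inner index selects the quantile, so the matrix has one row per entry in `cols`.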

5. def approxQuantile ( col: String , percentile: Array [ Double ] ) : Array [ Option [ Double ]]

For a specified numeric column and an array of desired quantiles, returns an approximate value for the column at each of the desired quantiles.

This function uses the t-Digest algorithm.

For example, the following code:

```scala
import session.implicits._
val df = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 0).toDF("a")
val res = df.stat.approxQuantile("a", Array(0, 0.1, 0.4, 0.6, 1))
```

prints out the following result:

`res: Array(Some(-0.5), Some(0.5), Some(3.5), Some(5.5), Some(9.5))`
col

The name of the numeric column.

percentile

An array of double values greater than or equal to 0.0 and less than 1.0.

returns

An array of approximate percentile values. If there is not enough data to calculate a quantile, the corresponding element is None.

Since

0.2.0
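
Because each element of the result is an `Option[Double]`, callers typically need to decide what to do with missing values. The following plain-Scala sketch pairs each requested quantile with its result and drops the entries that are None. The object `QuantileLookupExample` and the helper `byQuantile` are hypothetical names for illustration only, and the result literal is copied from the example above.

```scala
object QuantileLookupExample {
  // Hypothetical helper: pairs each requested quantile with its result
  // and drops the entries for which no value could be calculated.
  def byQuantile(percentile: Seq[Double], res: Seq[Option[Double]]): Map[Double, Double] =
    percentile.zip(res).collect { case (p, Some(v)) => p -> v }.toMap

  def main(args: Array[String]): Unit = {
    val percentile = Seq(0.0, 0.1, 0.4, 0.6, 1.0)
    // Literal copy of the result shown in the example above.
    val res: Seq[Option[Double]] =
      Seq(Some(-0.5), Some(0.5), Some(3.5), Some(5.5), Some(9.5))

    val lookup = byQuantile(percentile, res)
    println(lookup.get(0.4))
  }
}
```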

6. final def asInstanceOf [ T0 ] : T0
Definition Classes
Any
7. def clone () : AnyRef
Attributes
protected[ lang ]
Definition Classes
AnyRef
Annotations
@throws ( ... ) @native () @HotSpotIntrinsicCandidate ()
8. def corr ( col1: String , col2: String ) : Option [ Double ]

Calculates the correlation coefficient for non-null pairs in two numeric columns.

For example, the following code:

```scala
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
// corr returns Option[Double]; .get unwraps it and throws if the result is None.
val res = df.stat.corr("a", "b").get
```

prints out the following result:

`res: 0.9999999999999991`
col1

The name of the first numeric column to use.

col2

The name of the second numeric column to use.

returns

The correlation of the two numeric columns. If there is not enough data to generate the correlation, the method returns None.

Since

0.2.0
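
To make the statistic concrete, here is a plain-Scala sketch of the Pearson correlation coefficient over non-null pairs. It illustrates the quantity described above and is not the library's implementation. The object `CorrSketch`, the function `pearson`, and the rules for returning None (fewer than two complete pairs, or zero variance in either column) are assumptions made for this example.

```scala
object CorrSketch {
  // Pearson correlation over pairs where both values are present.
  // Assumption: None when fewer than two complete pairs remain or
  // when either column has zero variance.
  def pearson(pairs: Seq[(Option[Double], Option[Double])]): Option[Double] = {
    val xs = pairs.collect { case (Some(x), Some(y)) => (x, y) }
    if (xs.size < 2) None
    else {
      val n = xs.size.toDouble
      val mx = xs.map(_._1).sum / n
      val my = xs.map(_._2).sum / n
      val sxy = xs.map { case (x, y) => (x - mx) * (y - my) }.sum
      val sxx = xs.map { case (x, _) => (x - mx) * (x - mx) }.sum
      val syy = xs.map { case (_, y) => (y - my) * (y - my) }.sum
      if (sxx == 0.0 || syy == 0.0) None
      else Some(sxy / math.sqrt(sxx * syy))
    }
  }

  def main(args: Array[String]): Unit = {
    // Same data as the example above; column b is column a shifted by 0.4.
    val data = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7))
      .map { case (a, b) => (Option(a), Option(b)) }
    println(pearson(data))
  }
}
```

For the example data the two columns are perfectly linearly related, so the coefficient is 1 up to floating-point rounding, which is consistent with the result shown above.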

9. def cov ( col1: String , col2: String ) : Option [ Double ]

Calculates the sample covariance for non-null pairs in two numeric columns.

For example, the following code:

```scala
import session.implicits._
val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
// cov returns Option[Double]; .get unwraps it and throws if the result is None.
val res = df.stat.cov("a", "b").get
```

prints out the following result:

`res: 0.010000000000000037`
col1

The name of the first numeric column to use.

col2

The name of the second numeric column to use.

returns

The sample covariance of the two numeric columns. If there is not enough data to generate the covariance, the method returns None.

Since

0.2.0
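
The sample covariance divides the sum of the products of deviations by `n - 1`, where `n` is the number of non-null pairs. The plain-Scala sketch below illustrates that formula. It is not the library's implementation, and the names `CovSketch` and `sampleCov`, along with the choice to return None for fewer than two complete pairs, are assumptions made for this example.

```scala
object CovSketch {
  // Sample covariance over pairs where both values are present.
  // Assumption: None when fewer than two complete pairs remain,
  // because the n - 1 denominator would be zero.
  def sampleCov(pairs: Seq[(Option[Double], Option[Double])]): Option[Double] = {
    val xs = pairs.collect { case (Some(x), Some(y)) => (x, y) }
    if (xs.size < 2) None
    else {
      val n = xs.size.toDouble
      val mx = xs.map(_._1).sum / n
      val my = xs.map(_._2).sum / n
      val sxy = xs.map { case (x, y) => (x - mx) * (y - my) }.sum
      Some(sxy / (n - 1))
    }
  }

  def main(args: Array[String]): Unit = {
    // Same data as the example above.
    val data = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7))
      .map { case (a, b) => (Option(a), Option(b)) }
    println(sampleCov(data))
  }
}
```

For the example data the deviations from the means are -0.1, 0, and 0.1 in both columns, so the sum of products is 0.02 and the sample covariance is 0.02 / 2 = 0.01, matching the result above up to floating-point rounding.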

10. def crosstab ( col1: String , col2: String ) : DataFrame

Computes a pair-wise frequency table (a contingency table) for the specified columns. The method returns a DataFrame containing this table.

In the returned contingency table:

• The first column of each row contains the distinct values of `col1`.
• The name of the first column is the name of `col1`.
• The rest of the column names are the distinct values of `col2`.
• The counts are returned as Longs.
• For pairs that have no occurrences, the contingency table contains 0 as the count.

Note: The number of distinct values in `col2` should not exceed 1000.

For example, the following code:

```scala
import session.implicits._
val df = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)).toDF("key", "value")
val ct = df.stat.crosstab("key", "value")
ct.show()
```

prints out the following result:

```
---------------------------------------------------------------------------------------------
|"KEY"  |"CAST(1 AS NUMBER(38,0))"  |"CAST(2 AS NUMBER(38,0))"  |"CAST(3 AS NUMBER(38,0))"  |
---------------------------------------------------------------------------------------------
|1      |1                          |1                          |0                          |
|2      |2                          |0                          |1                          |
|3      |0                          |1                          |1                          |
---------------------------------------------------------------------------------------------
```
col1

The name of the first column to use.

col2

The name of the second column to use.

returns

A DataFrame containing the contingency table.

Since

0.2.0
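
The rules for the contingency table can be sketched with ordinary Scala collections. The snippet below is an in-memory illustration of the counting logic only, not the library's implementation. The names `CrosstabSketch` and `crosstab` are hypothetical, and sorting the distinct values is an assumption made to keep the output deterministic.

```scala
object CrosstabSketch {
  // Returns the distinct col2 values (the column headers) and one row per
  // distinct col1 value holding the count for each header, 0 if the pair
  // never occurs. Sorting is an assumption made for this sketch.
  def crosstab[A, B](rows: Seq[(A, B)])(
      implicit oa: Ordering[A], ob: Ordering[B]
  ): (Seq[B], Seq[(A, Seq[Long])]) = {
    val counts: Map[(A, B), Long] =
      rows.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
    val keys = rows.map(_._1).distinct.sorted
    val values = rows.map(_._2).distinct.sorted
    val table = keys.map(k => k -> values.map(v => counts.getOrElse((k, v), 0L)))
    (values, table)
  }

  def main(args: Array[String]): Unit = {
    // Same data as the example above.
    val rows = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3))
    val (headers, table) = crosstab(rows)
    println(("key" +: headers.map(_.toString)).mkString("\t"))
    table.foreach { case (k, cs) => println((k.toString +: cs.map(_.toString)).mkString("\t")) }
  }
}
```

For the example data this yields the same counts as the table above: key 2 pairs with value 1 twice, never with value 2, and once with value 3.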

11. final def eq ( arg0: AnyRef ) : Boolean
Definition Classes
AnyRef
12. def equals ( arg0: Any ) : Boolean
Definition Classes
AnyRef → Any
13. final def getClass () : Class [_]
Definition Classes
AnyRef → Any
Annotations
@native () @HotSpotIntrinsicCandidate ()
14. def hashCode () : Int
Definition Classes
AnyRef → Any
Annotations
@native () @HotSpotIntrinsicCandidate ()
15. final def isInstanceOf [ T0 ] : Boolean
Definition Classes
Any
16. def log () : Logger
Attributes
protected[ internal ]
Definition Classes
Logging
17. def logDebug ( msg: String , throwable: Throwable ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
18. def logDebug ( msg: String ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
19. def logError ( msg: String , throwable: Throwable ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
20. def logError ( msg: String ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
21. def logInfo ( msg: String , throwable: Throwable ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
22. def logInfo ( msg: String ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
23. def logTrace ( msg: String , throwable: Throwable ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
24. def logTrace ( msg: String ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
25. def logWarning ( msg: String , throwable: Throwable ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
26. def logWarning ( msg: String ) : Unit
Attributes
protected[ internal ]
Definition Classes
Logging
27. final def ne ( arg0: AnyRef ) : Boolean
Definition Classes
AnyRef
28. final def notify () : Unit
Definition Classes
AnyRef
Annotations
@native () @HotSpotIntrinsicCandidate ()
29. final def notifyAll () : Unit
Definition Classes
AnyRef
Annotations
@native () @HotSpotIntrinsicCandidate ()
30. def sampleBy [ T ] ( col: String , fractions: Map [ T , Double ] ) : DataFrame

Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

For example, the following code:

```scala
import session.implicits._
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy("name", fractions).show()
```

prints out the following result:

```
------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------
```
T

The type of the stratum.

col

The name of the column that defines the strata.

fractions

A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.

returns

A new DataFrame that contains the stratified sample.

Since

0.2.0
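
The effect of the `fractions` Map can be illustrated in memory with a per-row random draw. The sketch below is only a model of stratified sampling without replacement, under the assumption that each row is kept independently with the probability given for its stratum; the actual sampling mechanism used by the method may differ. The names `SampleBySketch` and `sampleBy`, and the fixed seed, are hypothetical choices for this example.

```scala
import scala.util.Random

object SampleBySketch {
  // Keeps each row with the probability configured for its stratum.
  // A stratum missing from the Map gets fraction 0, so its rows are
  // never kept; fraction 1.0 keeps every row of that stratum.
  def sampleBy[R, T](rows: Seq[R])(stratum: R => T)(
      fractions: Map[T, Double], rng: Random
  ): Seq[R] =
    rows.filter { r =>
      val fraction = fractions.getOrElse(stratum(r), 0.0)
      rng.nextDouble() < fraction
    }

  def main(args: Array[String]): Unit = {
    // Same data and fractions as the example above.
    val rows = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12))
    val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
    sampleBy(rows)(_._1)(fractions, new Random(42)).foreach(println)
  }
}
```

Because "Alice" is absent from the Map, no Alice row can appear in the sample, while the "Nico" row is always kept. Which "Bob" rows survive depends on the random draws, so repeated runs with different seeds can return different samples.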

31. def sampleBy [ T ] ( col: Column , fractions: Map [ T , Double ] ) : DataFrame

Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

For example, the following code:

```scala
import session.implicits._
// col builds a Column from a column name.
import com.snowflake.snowpark.functions.col
val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
df.stat.sampleBy(col("name"), fractions).show()
```

prints out the following result:

```
------------------
|"NAME"  |"AGE"  |
------------------
|Bob     |17     |
|Nico    |8      |
------------------
```
T

The type of the stratum.

col

An expression for the column that defines the strata.

fractions

A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.

returns

A new DataFrame that contains the stratified sample.

Since

0.2.0

32. final def synchronized [ T0 ] ( arg0: ⇒ T0 ) : T0
Definition Classes
AnyRef
33. def toString () : String
Definition Classes
AnyRef → Any
34. final def wait ( arg0: Long , arg1: Int ) : Unit
Definition Classes
AnyRef
Annotations
@throws ( ... )
35. final def wait ( arg0: Long ) : Unit
Definition Classes
AnyRef
Annotations
@throws ( ... ) @native ()
36. final def wait () : Unit
Definition Classes
AnyRef
Annotations
@throws ( ... )

### Deprecated Value Members

1. def finalize () : Unit
Attributes
protected[ lang ]
Definition Classes
AnyRef
Annotations
@throws ( classOf[java.lang.Throwable] ) @Deprecated @deprecated
Deprecated