com.snowflake.snowpark

DataFrameStatFunctions

final class DataFrameStatFunctions extends Logging

Provides eagerly computed statistical functions for DataFrames.

To access an object of this class, use DataFrame.stat.

Since

0.2.0

Linear Supertypes
Logging, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def approxQuantile(cols: Array[String], percentile: Array[Double]): Array[Array[Option[Double]]]

    For an array of numeric columns and an array of desired quantiles, returns a matrix of approximate values for each column at each of the desired quantiles. For example, result(0)(1) contains the approximate value for column cols(0) at quantile percentile(1).

    This function uses the t-Digest algorithm.

    For example, the following code:

    import session.implicits._
    val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
    val res = df.stat.approxQuantile(Array("a", "b"), Array(0.0, 0.1, 0.6))

    returns the following result:

    res: Array(Array(Some(0.05), Some(0.15000000000000002), Some(0.25)),
               Array(Some(0.45), Some(0.55), Some(0.6499999999999999)))

    cols

    An array of column names.

    percentile

    An array of double values greater than or equal to 0.0 and less than 1.0.

    returns

    A matrix with cols.size rows and percentile.size columns containing the approximate percentile values. If there is not enough data to calculate a quantile, the corresponding element is None.

    Since

    0.2.0
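The following pure-Scala sketch shows how to read the result matrix. It assumes only the layout described above: row i belongs to cols(i), and column j belongs to percentile(j). The helper object QuantileMatrix and its label method are hypothetical and are not part of Snowpark. The matrix values are copied from the documented example, so no Snowflake session is needed.

```scala
// QuantileMatrix is a hypothetical helper, not part of the Snowpark API.
object QuantileMatrix {
  // Maps each (column name, percentile) pair to its cell in the result matrix,
  // relying on result(i)(j) holding the value of cols(i) at percentile(j).
  def label(
      cols: Array[String],
      percentile: Array[Double],
      result: Array[Array[Option[Double]]]): Map[(String, Double), Option[Double]] = {
    require(result.length == cols.length, "expected one row per column")
    require(result.forall(_.length == percentile.length), "expected one cell per percentile")
    val cells = for {
      i <- cols.indices
      j <- percentile.indices
    } yield (cols(i), percentile(j)) -> result(i)(j)
    cells.toMap
  }
}

// Values copied from the documented approxQuantile example.
val cols = Array("a", "b")
val percentile = Array(0.0, 0.1, 0.6)
val res: Array[Array[Option[Double]]] = Array(
  Array[Option[Double]](Some(0.05), Some(0.15000000000000002), Some(0.25)),
  Array[Option[Double]](Some(0.45), Some(0.55), Some(0.6499999999999999)))

val labelled = QuantileMatrix.label(cols, percentile, res)
for (c <- cols; p <- percentile) {
  // None means there was not enough data for that quantile.
  println(s"$c at $p: ${labelled((c, p)).getOrElse("not enough data")}")
}
```

Keying by (column, percentile) avoids relying on positional indices. Missing quantiles surface as None instead of throwing.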

  5. def approxQuantile(col: String, percentile: Array[Double]): Array[Option[Double]]

    For a specified numeric column and an array of desired quantiles, returns an approximate value for the column at each of the desired quantiles.

    This function uses the t-Digest algorithm.

    For example, the following code:

    import session.implicits._
    val df = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 0).toDF("a")
    val res = df.stat.approxQuantile("a", Array(0, 0.1, 0.4, 0.6, 1))

    returns the following result:

    res: Array(Some(-0.5), Some(0.5), Some(3.5), Some(5.5), Some(9.5))

    col

    The name of the numeric column.

    percentile

    An array of double values greater than or equal to 0.0 and less than 1.0.

    returns

    An array of approximate percentile values. If there is not enough data to calculate a quantile, the corresponding element is None.

    Since

    0.2.0
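For comparison, here is a minimal sketch of exact (not approximate) quantiles. It computes them in memory by linear interpolation between the two nearest ranks. This is not the t-Digest algorithm that approxQuantile uses, so its results can differ from the approximate values above. For the example data it gives 0.0 at quantile 0 and 9.0 at quantile 1, rather than -0.5 and 9.5. The object ExactQuantile is a hypothetical illustration, not part of Snowpark.

```scala
// ExactQuantile is a hypothetical helper for comparison; it is NOT t-Digest
// and it holds every value in memory.
object ExactQuantile {
  def quantiles(values: Seq[Double], percentile: Seq[Double]): Seq[Option[Double]] = {
    val sorted = values.toArray
    java.util.Arrays.sort(sorted) // ascending numeric order
    percentile.map { p =>
      if (sorted.isEmpty || p < 0.0 || p > 1.0) None // no data or out-of-range request
      else {
        val h = (sorted.length - 1) * p // fractional rank of the quantile
        val lo = math.floor(h).toInt
        val hi = math.min(lo + 1, sorted.length - 1)
        // Interpolate linearly between the neighbouring sorted values.
        Some(sorted(lo) + (h - lo) * (sorted(hi) - sorted(lo)))
      }
    }
  }
}

// Same data as the approxQuantile example: the integers 0 to 9.
val values = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 0).map(_.toDouble)
println(ExactQuantile.quantiles(values, Seq(0.0, 0.1, 0.4, 0.6, 1.0)))
```

The exact approach needs all values sorted in one place. A streaming sketch such as t-Digest trades that exactness for bounded memory.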

  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(...) @native() @HotSpotIntrinsicCandidate()
  8. def corr(col1: String, col2: String): Option[Double]

    Calculates the correlation coefficient for non-null pairs in two numeric columns.

    For example, the following code:

    import session.implicits._
    val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
    val res = df.stat.corr("a", "b").get

    returns the following result:

    res: 0.9999999999999991

    col1

    The name of the first numeric column to use.

    col2

    The name of the second numeric column to use.

    returns

    The correlation of the two numeric columns. If there is not enough data to generate the correlation, the method returns None.

    Since

    0.2.0
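Here is a minimal in-memory sketch of the statistic described above: Pearson's correlation coefficient over the rows where both values are non-null. Null cells are modelled as None. The documentation above only says None is returned when there is not enough data. Returning None for fewer than two complete pairs, or for a constant column, is an assumption made for this sketch. PearsonCorr is a hypothetical name, not part of Snowpark.

```scala
// PearsonCorr is a hypothetical helper, not part of the Snowpark API.
// A null cell is modelled as None.
object PearsonCorr {
  def corr(col1: Seq[Option[Double]], col2: Seq[Option[Double]]): Option[Double] = {
    // Keep only the rows where both values are non-null.
    val pairs = col1.zip(col2).collect { case (Some(x), Some(y)) => (x, y) }
    val n = pairs.length
    if (n < 2) None // assumption: too few pairs means "not enough data"
    else {
      val meanX = pairs.map(_._1).sum / n
      val meanY = pairs.map(_._2).sum / n
      val sxy = pairs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
      val sxx = pairs.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
      val syy = pairs.map { case (_, y) => (y - meanY) * (y - meanY) }.sum
      // A constant column has no defined correlation (assumption: None).
      if (sxx == 0.0 || syy == 0.0) None
      else Some(sxy / math.sqrt(sxx * syy))
    }
  }
}

// Same data as the corr example above.
val a = Seq(Some(0.1), Some(0.2), Some(0.3))
val b = Seq(Some(0.5), Some(0.6), Some(0.7))
println(PearsonCorr.corr(a, b)) // approximately Some(1.0)
```

The (n - 1) factors of the sample covariance and of the two sample standard deviations cancel. That is why the sketch divides the raw sums directly.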

  9. def cov(col1: String, col2: String): Option[Double]

    Calculates the sample covariance for non-null pairs in two numeric columns.

    For example, the following code:

    import session.implicits._
    val df = Seq((0.1, 0.5), (0.2, 0.6), (0.3, 0.7)).toDF("a", "b")
    val res = df.stat.cov("a", "b").get

    returns the following result:

    res: 0.010000000000000037

    col1

    The name of the first numeric column to use.

    col2

    The name of the second numeric column to use.

    returns

    The sample covariance of the two numeric columns. If there is not enough data to generate the covariance, the method returns None.

    Since

    0.2.0
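Here is a minimal in-memory sketch of the sample covariance over non-null pairs. It divides by n - 1, where n is the number of complete pairs. With the example data, the means are 0.2 and 0.6 and the products of deviations sum to 0.02. Dividing gives 0.02 / 2 = 0.01, which matches the documented result up to floating-point rounding. SampleCov is a hypothetical name, and returning None for fewer than two pairs is an assumption.

```scala
// SampleCov is a hypothetical helper, not part of the Snowpark API.
object SampleCov {
  def cov(col1: Seq[Option[Double]], col2: Seq[Option[Double]]): Option[Double] = {
    // Keep only the rows where both values are non-null.
    val pairs = col1.zip(col2).collect { case (Some(x), Some(y)) => (x, y) }
    val n = pairs.length
    if (n < 2) None // the n - 1 denominator needs at least two pairs (assumption)
    else {
      val meanX = pairs.map(_._1).sum / n
      val meanY = pairs.map(_._2).sum / n
      // Sample covariance: sum of products of deviations divided by n - 1.
      Some(pairs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum / (n - 1))
    }
  }
}

// Same data as the cov example above.
val a = Seq(Some(0.1), Some(0.2), Some(0.3))
val b = Seq(Some(0.5), Some(0.6), Some(0.7))
println(SampleCov.cov(a, b)) // approximately Some(0.01)
```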

  10. def crosstab(col1: String, col2: String): DataFrame

    Computes a pair-wise frequency table (a contingency table) for the specified columns. The method returns a DataFrame containing this table.

    In the returned contingency table:

    • The first column contains the distinct values of col1, one per row.
    • The name of the first column is the name of col1.
    • The remaining column names are the distinct values of col2.
    • The counts are returned as Longs.
    • For pairs that have no occurrences, the contingency table contains 0 as the count.

    Note: The number of distinct values in col2 should not exceed 1000.

    For example, the following code:

    import session.implicits._
    val df = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)).toDF("key", "value")
    val ct = df.stat.crosstab("key", "value")
    ct.show()

    prints out the following result:

    ---------------------------------------------------------------------------------------------
    |"KEY"  |"CAST(1 AS NUMBER(38,0))"  |"CAST(2 AS NUMBER(38,0))"  |"CAST(3 AS NUMBER(38,0))"  |
    ---------------------------------------------------------------------------------------------
    |1      |1                          |1                          |0                          |
    |2      |2                          |0                          |1                          |
    |3      |0                          |1                          |1                          |
    ---------------------------------------------------------------------------------------------

    col1

    The name of the first column to use.

    col2

    The name of the second column to use.

    returns

    A DataFrame containing the contingency table.

    Since

    0.2.0
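Here is a minimal in-memory sketch of the contingency table described above, using plain Scala collections instead of a DataFrame. Each distinct col1 value maps to a count for every distinct col2 value, with 0 for pairs that never occur. The counts are Longs, as in crosstab. Crosstab is a hypothetical name, not part of Snowpark, and this sketch does not enforce the 1000-value limit on col2.

```scala
// Crosstab is a hypothetical helper, not part of the Snowpark API.
object Crosstab {
  // Returns the distinct col2 values and, for each distinct col1 value,
  // the number of rows with each col2 value (0 when the pair never occurs).
  def crosstab[A, B](rows: Seq[(A, B)]): (Seq[B], Map[A, Map[B, Long]]) = {
    val col2Values = rows.map(_._2).distinct
    val pairCounts: Map[(A, B), Long] =
      rows.groupBy(identity).map { case (pair, hits) => pair -> hits.size.toLong }
    val table = rows.map(_._1).distinct.map { key =>
      key -> col2Values.map(v => v -> pairCounts.getOrElse((key, v), 0L)).toMap
    }.toMap
    (col2Values, table)
  }
}

// Same (key, value) rows as the crosstab example above.
val rows = Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3))
val (valueColumns, table) = Crosstab.crosstab(rows)
for ((key, counts) <- table.toSeq.sortBy(_._1)) {
  val cells = valueColumns.sorted.map(v => s"$v=${counts(v)}")
  println(s"key $key: ${cells.mkString(", ")}")
}
```

For the example rows, the counts match the table printed by ct.show() above.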

  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  13. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  14. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  15. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  16. def log(): Logger
    Attributes
    protected[internal]
    Definition Classes
    Logging
  17. def logDebug(msg: String, throwable: Throwable): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  18. def logDebug(msg: String): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  19. def logError(msg: String, throwable: Throwable): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  20. def logError(msg: String): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  21. def logInfo(msg: String, throwable: Throwable): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  22. def logInfo(msg: String): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  23. def logTrace(msg: String, throwable: Throwable): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  24. def logTrace(msg: String): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  25. def logWarning(msg: String, throwable: Throwable): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  26. def logWarning(msg: String): Unit
    Attributes
    protected[internal]
    Definition Classes
    Logging
  27. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  28. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  29. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  30. def sampleBy[T](col: String, fractions: Map[T, Double]): DataFrame

    Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

    For example, the following code:

    import session.implicits._
    val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
    val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
    df.stat.sampleBy("name", fractions).show()

    prints out a result similar to the following (sampling is random, so the selected rows can vary between runs):

    ------------------
    |"NAME"  |"AGE"  |
    ------------------
    |Bob     |17     |
    |Nico    |8      |
    ------------------

    T

    The type of the stratum.

    col

    The name of the column that defines the strata.

    fractions

    A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.

    returns

    A new DataFrame that contains the stratified sample.

    Since

    0.2.0
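Here is a minimal in-memory sketch of stratified sampling without replacement. Each row is kept independently with the probability given for its stratum, and strata missing from the Map use 0.0. Per-row Bernoulli sampling is an assumption chosen for illustration; it need not match how Snowpark implements sampleBy. StratifiedSample is a hypothetical name, and a seeded scala.util.Random makes runs repeatable.

```scala
import scala.util.Random

// StratifiedSample is a hypothetical helper, not part of the Snowpark API.
object StratifiedSample {
  // Keeps each row independently with the probability given for its stratum
  // (per-row Bernoulli sampling, an assumption made for this sketch).
  // Strata that are missing from `fractions` use 0.0, so none of their rows are kept.
  def sampleBy[R, T](rows: Seq[R], stratum: R => T, fractions: Map[T, Double], rng: Random): Seq[R] =
    rows.filter { row =>
      val fraction = fractions.getOrElse(stratum(row), 0.0)
      rng.nextDouble() < fraction // nextDouble() is in [0, 1), so 1.0 always keeps the row
    }
}

// Same (name, age) rows and fractions as the sampleBy example above.
val people = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12))
val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
val sample = StratifiedSample.sampleBy(people, (p: (String, Int)) => p._1, fractions, new Random(42))
println(sample) // always contains ("Nico", 8), never Alice; each Bob row appears with probability 0.5
```

With this scheme, the number of rows kept per stratum is itself random. Each of the two Bob rows is kept with probability 0.5, so one Bob row is expected on average.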

  31. def sampleBy[T](col: Column, fractions: Map[T, Double]): DataFrame

    Returns a DataFrame containing a stratified sample without replacement, based on a Map that specifies the fraction for each stratum.

    For example, the following code:

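    import session.implicits._
    import com.snowflake.snowpark.functions.col
    val df = Seq(("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)).toDF("name", "age")
    val fractions = Map("Bob" -> 0.5, "Nico" -> 1.0)
    df.stat.sampleBy(col("name"), fractions).show()

    prints out a result similar to the following (sampling is random, so the selected rows can vary between runs):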

    ------------------
    |"NAME"  |"AGE"  |
    ------------------
    |Bob     |17     |
    |Nico    |8      |
    ------------------

    T

    The type of the stratum.

    col

    An expression for the column that defines the strata.

    fractions

    A Map that specifies the fraction to use for the sample for each stratum. If a stratum is not specified in the Map, the method uses 0 as the fraction.

    returns

    A new DataFrame that contains the stratified sample.

    Since

    0.2.0

  32. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  33. def toString(): String
    Definition Classes
    AnyRef → Any
  34. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(...)
  35. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(...) @native()
  36. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(...)

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated
