DataFrame
Classes
|
Represents a lazily-evaluated relational dataset that contains a collection of |
Provides functions for handling missing values in a |
|
Provides computed statistical functions for DataFrames. |
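The "lazily-evaluated" behavior described above can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: `LazyRows`, `load`, and the row layout are hypothetical names made up for this illustration. The point is only that transformations are recorded and nothing runs until the results are requested.

```python
from typing import Callable, Iterable, List


class LazyRows:
    """Hypothetical stand-in for a lazily evaluated dataset (not the library's class)."""

    def __init__(self, source: Callable[[], Iterable[dict]], steps=None):
        self._source = source            # called only when results are requested
        self._steps = list(steps or [])  # recorded transformations, not yet applied

    def filter(self, predicate: Callable[[dict], bool]) -> "LazyRows":
        # Record the step and return a new object; no rows are read here.
        step = lambda rows: (r for r in rows if predicate(r))
        return LazyRows(self._source, self._steps + [step])

    def select(self, *names: str) -> "LazyRows":
        # Record a projection onto the named keys.
        step = lambda rows: ({n: r[n] for n in names} for r in rows)
        return LazyRows(self._source, self._steps + [step])

    def collect(self) -> List[dict]:
        # Only now is the source read and every recorded step applied in order.
        rows = self._source()
        for step in self._steps:
            rows = step(rows)
        return list(rows)


calls = []  # records when the (hypothetical) data source is actually read


def load():
    calls.append("load")
    return [
        {"id": 1, "amount": 10},
        {"id": 2, "amount": 25},
        {"id": 3, "amount": 40},
    ]


pipeline = LazyRows(load).filter(lambda r: r["amount"] > 20).select("id")
calls_before_collect = list(calls)  # the source has not been read yet
result = pipeline.collect()         # evaluation happens here
```

Building `pipeline` does not touch the data source; only `collect()` does. In the same way, the methods below described as "Executes the query" are the ones that trigger evaluation.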
Methods
|
Aggregate the data in the DataFrame. |
|
For a specified numeric column and a list of desired quantiles, returns an approximate value for the column at each of the desired quantiles. |
|
For a specified numeric column and a list of desired quantiles, returns an approximate value for the column at each of the desired quantiles. |
|
Caches the content of this DataFrame to create a new cached Table DataFrame. |
|
Returns a reference to a column in the DataFrame. |
Executes the query representing this DataFrame and returns the result as a list of |
|
|
Executes the query representing this DataFrame asynchronously and returns an AsyncJob. |
|
Executes a COPY INTO <table> command to load data from files in a stage location into a specified table. |
|
Calculates the correlation coefficient for non-null pairs in two numeric columns. |
Executes the query representing this DataFrame and returns the number of rows in the result (similar to the COUNT function in SQL). |
|
|
Calculates the sample covariance for non-null pairs in two numeric columns. |
|
Creates a temporary view that returns the same results as this DataFrame. |
|
Creates a view that captures the computation expressed by this DataFrame. |
Creates a temporary view that returns the same results as this DataFrame. |
|
|
Creates a view that captures the computation expressed by this DataFrame. |
|
Performs a cross join, which returns the Cartesian product of the current |
|
Performs a cross join, which returns the Cartesian product of the current |
|
Computes a pair-wise frequency table (a |
|
Performs a SQL GROUP BY CUBE. |
|
Computes basic statistics for numeric columns, which includes |
Returns a new DataFrame that contains only the rows with distinct values from the current DataFrame. |
|
|
Returns a new DataFrame that excludes the columns with the specified names from the output. |
|
Creates a new DataFrame by removing duplicate rows based on a given subset of columns. |
|
Creates a new DataFrame by removing duplicate rows based on a given subset of columns. |
|
Returns a new DataFrame that excludes all rows containing fewer than a specified number of non-null and non-NaN values in the specified columns. |
|
Returns a new DataFrame that contains all the rows from the current DataFrame except for the rows that also appear in the |
Prints the list of queries that will be executed to evaluate this DataFrame. |
|
|
Returns a new DataFrame that replaces all null and NaN values in the specified columns with the values provided. |
|
Filters rows based on the specified conditional expression (similar to WHERE in SQL). |
Executes the query representing this DataFrame and returns the first |
|
|
Flattens (explodes) compound values into multiple rows. |
|
Groups rows by the columns specified by expressions (similar to GROUP BY in SQL). |
|
Groups rows by the columns specified by expressions (similar to GROUP BY in SQL). |
|
Performs a SQL GROUP BY GROUPING SETS. |
|
Returns a new DataFrame that contains the intersection of rows from the current DataFrame and another DataFrame ( |
|
Performs a join of the specified type ( |
|
Lateral joins the current DataFrame with the output of the specified table function. |
|
Returns a new DataFrame that contains at most |
|
Returns a new DataFrame that contains all the rows from the current DataFrame except for the rows that also appear in the |
|
Performs a natural join of the specified type ( |
|
Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL). |
|
Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL). |
|
Rotates this DataFrame by turning the unique values from one column in the input expression into multiple columns and aggregating results where required on any remaining column values. |
|
Randomly splits the current DataFrame into separate DataFrames, using the specified weights. |
|
Randomly splits the current DataFrame into separate DataFrames, using the specified weights. |
|
Returns a DataFrame with the specified column |
|
Returns a new DataFrame that replaces values in the specified columns. |
|
Performs a SQL GROUP BY ROLLUP. |
|
Samples rows based on either the number of rows to be returned or a percentage of rows to be returned. |
|
Returns a DataFrame containing a stratified sample without replacement, based on a |
|
Returns a DataFrame containing a stratified sample without replacement, based on a |
|
Returns a new DataFrame with the specified Column expressions as output (similar to SELECT in SQL). |
|
Projects a set of SQL expressions and returns a new |
|
Projects a set of SQL expressions and returns a new |
|
Evaluates this DataFrame and prints out the first |
|
Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL). |
|
Returns a new DataFrame that contains all the rows from the current DataFrame except for the rows that also appear in the |
|
Executes the query representing this DataFrame and returns the first |
|
Creates a new DataFrame containing columns with the specified names. |
|
Executes the query representing this DataFrame and returns an iterator of |
|
Executes the query representing this DataFrame and returns the result as a pandas DataFrame. |
|
Creates a new DataFrame containing columns with the specified names. |
Executes the query representing this DataFrame and returns an iterator of |
|
Executes the query representing this DataFrame and returns the result as a pandas DataFrame. |
|
Executes the query representing this DataFrame and returns an iterator of pandas DataFrames (each containing a subset of rows) that you can use to retrieve the results. |
|
|
Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( |
|
Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( |
|
Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( |
|
Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( |
|
Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( |
|
Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( |
|
Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( |
|
Rotates a table by transforming columns into rows. |
|
Filters rows based on the specified conditional expression (similar to WHERE in SQL). |
|
Returns a DataFrame with an additional column with the specified name |
|
Returns a DataFrame with the specified column |
|
Returns a DataFrame with an additional column with the specified name |
|
Returns a DataFrame with the specified column |
|
Returns a DataFrame with additional columns with the specified names |
|
Returns a new DataFrame that excludes all rows containing fewer than a specified number of non-null and non-NaN values in the specified columns. |
|
Returns a new DataFrame that replaces all null and NaN values in the specified columns with the values provided. |
|
Returns a new DataFrame that replaces values in the specified columns. |
|
For a specified numeric column and a list of desired quantiles, returns an approximate value for the column at each of the desired quantiles. |
|
For a specified numeric column and a list of desired quantiles, returns an approximate value for the column at each of the desired quantiles. |
|
Calculates the correlation coefficient for non-null pairs in two numeric columns. |
|
Calculates the sample covariance for non-null pairs in two numeric columns. |
|
Computes a pair-wise frequency table (a |
|
Returns a DataFrame containing a stratified sample without replacement, based on a |
|
Returns a DataFrame containing a stratified sample without replacement, based on a |
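Two of the statistical entries above are defined over "non-null pairs in two numeric columns". The following plain-Python sketch shows one reading of that wording: a pair is dropped when either value is null, and the covariance uses the sample (n - 1) denominator. The helper names and the sample data are hypothetical, and this illustrates the arithmetic only, not how the library computes it.

```python
import math
from typing import List, Optional, Sequence, Tuple

Number = Optional[float]


def non_null_pairs(xs: Sequence[Number], ys: Sequence[Number]) -> List[Tuple[float, float]]:
    # Keep a pair only when both values are present.
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def sample_covariance(xs: Sequence[Number], ys: Sequence[Number]) -> float:
    pairs = non_null_pairs(xs, ys)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two non-null pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sample covariance divides by n - 1, not n.
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def correlation(xs: Sequence[Number], ys: Sequence[Number]) -> float:
    pairs = non_null_pairs(xs, ys)
    px = [x for x, _ in pairs]
    py = [y for _, y in pairs]
    std_x = math.sqrt(sample_covariance(px, px))
    std_y = math.sqrt(sample_covariance(py, py))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sample_covariance(px, py) / (std_x * std_y)


# Hypothetical columns; the third row is excluded because a is null there.
a = [1.0, 2.0, None, 4.0]
b = [2.0, 4.0, 6.0, 8.0]

pairs_used = non_null_pairs(a, b)
cov_ab = sample_covariance(a, b)
corr_ab = correlation(a, b)
```

Because `b` is exactly twice `a` on the three remaining rows, the correlation is 1 (up to floating-point rounding), and the row with the null contributes to neither statistic.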
Attributes
Returns all column names as a list. |
|
Returns a |
|
Returns a |
|
The definition of the columns in this DataFrame (the "relational schema" for the DataFrame). |
|
Returns a new |
|
Whether the DataFrame is cached. |