class DataFrameReader extends AnyRef

Provides methods to load data in various supported formats from a Snowflake stage to a DataFrame. The paths provided to the DataFrameReader must refer to Snowflake stages.

To use this object:

  1. Access an instance of a DataFrameReader by calling the Session.read method.
  2. Specify any format-specific options and copy options by calling the option or options method. These methods return a DataFrameReader that is configured with these options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
  3. Specify the schema of the data that you plan to load by constructing a types.StructType object and passing it to the schema method. This method returns a DataFrameReader that is configured to read data that uses the specified schema.
  4. Specify the format of the data by calling the method named after the format (e.g. csv, json, etc.). These methods return a DataFrame that is configured to load data in the specified format.
  5. Call a DataFrame method that performs an action.

The following examples demonstrate how to use a DataFrameReader.

Example 1: Loading the first two columns of a CSV file and skipping the first header line.

// Import the package for StructType.
import com.snowflake.snowpark.types._
val filePath = "@mystage1"
// Define the schema for the data in the CSV file.
val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
// Create a DataFrame that is configured to load data from the CSV file.
val csvDF = session.read.option("skip_header", 1).schema(userSchema).csv(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = csvDF.collect()

Example 2: Loading a gzip-compressed JSON file.

val filePath = "@mystage2/data.json.gz"
// Create a DataFrame that is configured to load data from the gzipped JSON file.
val jsonDF = session.read.option("compression", "gzip").json(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = jsonDF.collect()

If you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.

Example 3: Loading only the CSV files from a stage location.

import com.snowflake.snowpark.types._
// Define the schema for the data in the CSV files.
val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
// Create a DataFrame that is configured to load data from the CSV files in the stage.
val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = csvDF.collect()
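
As a rough illustration of how a pattern such as `.*[.]csv` narrows down a set of files, the following sketch applies the same regular expression to a list of file paths in plain Scala. The file paths are hypothetical, and the sketch assumes that the pattern has to match the entire path (which is how `String.matches` behaves). It does not call Snowflake.

```scala
object PatternDemo {
  // Hypothetical staged file paths, used only for illustration.
  val stagedFiles: Seq[String] =
    Seq("data/a.csv", "data/b.csv.gz", "logs/c.json", "d.csv")

  // The same regular expression that Example 3 passes to the pattern option.
  val pattern: String = ".*[.]csv"

  // Keep only the paths that the pattern matches in full.
  val selected: Seq[String] = stagedFiles.filter(_.matches(pattern))

  def main(args: Array[String]): Unit =
    selected.foreach(println)
}
```

Under that assumption, `data/b.csv.gz` is not selected because the pattern requires the path to end in `.csv`. A pattern such as `.*[.]csv([.]gz)?` would also accept the gzipped file.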

In addition, if you want to load the files from the stage into a specified table by using the COPY INTO <table_name> command, you can call a copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

Example 4: Loading data from a JSON file in a stage to a table by using COPY INTO <table_name>.

val filePath = "@mystage1"
// Create a DataFrame that is configured to load data from the JSON file.
val jsonDF = session.read.json(filePath)
// Load the data into the specified table `T1`.
// The table "T1" should exist before calling copyInto().
jsonDF.copyInto("T1")
Since

0.1.0

Linear Supertypes
AnyRef , Any

Instance Constructors

  1. new DataFrameReader ( session: Session )

    session

    Snowflake Session

Value Members

  1. final def != ( arg0: Any ) : Boolean
    Definition Classes
    AnyRef → Any
  2. final def ## () : Int
    Definition Classes
    AnyRef → Any
  3. final def == ( arg0: Any ) : Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf [ T0 ] : T0
    Definition Classes
    Any
  5. def avro ( path: String ) : CopyableDataFrame

    Returns a CopyableDataFrame that is set up to load data from the specified Avro file.

    This method only supports reading data from files in Snowflake stages.

    Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

    For example:

    session.read.avro(path).where(col("$1:num") > 1)

    If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

    The following example loads the Avro files in the stage location specified by path into the table T1.

    // The table "T1" should exist before calling copyInto().
    session.read.avro(path).copyInto("T1")
    path

    The path to the Avro file (including the stage name).

    returns

    A CopyableDataFrame

    Since

    0.1.0

  6. def clone () : AnyRef
    Attributes
    protected[ lang ]
    Definition Classes
    AnyRef
    Annotations
    @throws ( ... ) @native ()
  7. def csv ( path: String ) : CopyableDataFrame

    Returns a CopyableDataFrame that is set up to load data from the specified CSV file.

    This method only supports reading data from files in Snowflake stages.

    Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

    For example:

    val filePath = "@mystage1/myfile.csv"
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.schema(userSchema).csv(filePath).filter(col("a") < 2)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()

    If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

    The following example loads the CSV files in the stage location specified by path into the table T1.

    // The table "T1" should exist before calling copyInto().
    session.read.schema(userSchema).csv(path).copyInto("T1")
    path

    The path to the CSV file (including the stage name).

    returns

    A CopyableDataFrame

    Since

    0.1.0

  8. final def eq ( arg0: AnyRef ) : Boolean
    Definition Classes
    AnyRef
  9. def equals ( arg0: Any ) : Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize () : Unit
    Attributes
    protected[ lang ]
    Definition Classes
    AnyRef
    Annotations
    @throws ( classOf[java.lang.Throwable] )
  11. final def getClass () : Class [_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native ()
  12. def hashCode () : Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native ()
  13. final def isInstanceOf [ T0 ] : Boolean
    Definition Classes
    Any
  14. def json ( path: String ) : CopyableDataFrame

    Returns a CopyableDataFrame that is set up to load data from the specified JSON file.

    This method only supports reading data from files in Snowflake stages.

    Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

    For example:

    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.json(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()

    If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

    The following example loads the JSON files in the stage location specified by path into the table T1.

    // The table "T1" should exist before calling copyInto().
    session.read.json(path).copyInto("T1")
    path

    The path to the JSON file (including the stage name).

    returns

    A CopyableDataFrame

    Since

    0.1.0

  15. final def ne ( arg0: AnyRef ) : Boolean
    Definition Classes
    AnyRef
  16. final def notify () : Unit
    Definition Classes
    AnyRef
    Annotations
    @native ()
  17. final def notifyAll () : Unit
    Definition Classes
    AnyRef
    Annotations
    @native ()
  18. def option ( key: String , value: Any ) : DataFrameReader

    Sets the specified option in the DataFrameReader.

    Use this method to configure any format-specific options and copy options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)

    Example 1: Loading an LZO-compressed Parquet file.

    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.option("compression", "lzo").parquet(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()

    Example 2: Loading an uncompressed JSON file.

    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.option("compression", "none").json(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()

    Example 3: Loading the first two columns of a colon-delimited CSV file in which the first line is the header:

    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV file.
    val csvDF = session.read.option("field_delimiter", ":").option("skip_header", 1).schema(userSchema).csv(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()

    In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.

    Example 4: Loading only the CSV files from a stage location.

    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV files in the stage.
    val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
    key

    Name of the option (e.g. compression, skip_header, etc.).

    value

    Value of the option.

    returns

    A DataFrameReader

    Since

    0.1.0

  19. def options ( configs: Map [ String , Any ] ) : DataFrameReader

    Sets multiple specified options in the DataFrameReader.

    Use this method to configure any format-specific options and copy options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)

    In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.

    Example 1: Loading an LZO-compressed Parquet file and removing any white space from the fields.

    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.options(Map("compression" -> "lzo", "trim_space" -> true)).parquet(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
    configs

    Map of the names of options (e.g. compression, skip_header, etc.) and their corresponding values.

    returns

    A DataFrameReader

    Since

    0.1.0
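
    Because options takes a Map[String, Any], the map can be assembled before it is passed to the reader. The sketch below is plain Scala and only builds the map. The option names come from the examples on this page, while the object name and the sharedOptions and csvOptions groupings are hypothetical.

```scala
object ReaderOptionsDemo {
  // Options that several reads might share (hypothetical grouping).
  val sharedOptions: Map[String, Any] = Map("compression" -> "gzip")

  // CSV-specific options that also appear in the option examples on this page.
  val csvOptions: Map[String, Any] =
    Map("field_delimiter" -> ":", "skip_header" -> 1)

  // ++ merges the maps; on duplicate keys the right-hand value wins.
  val configs: Map[String, Any] = sharedOptions ++ csvOptions

  def main(args: Array[String]): Unit = {
    // With a Session, a schema, and a stage path in scope, the map would be passed as:
    //   session.read.options(configs).schema(userSchema).csv(filePath)
    configs.foreach { case (key, value) => println(s"$key = $value") }
  }
}
```

    Declaring the value type as Any lets string, numeric, and boolean option values share one map, which matches the parameter type of options.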

  20. def orc ( path: String ) : CopyableDataFrame

    Returns a CopyableDataFrame that is set up to load data from the specified ORC file.

    This method only supports reading data from files in Snowflake stages.

    Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

    For example:

    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.orc(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()

    If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

    The following example loads the ORC files in the stage location specified by path into the table T1.

    // The table "T1" should exist before calling copyInto().
    session.read.orc(path).copyInto("T1")
    path

    The path to the ORC file (including the stage name).

    returns

    A CopyableDataFrame

    Since

    0.1.0

  21. def parquet ( path: String ) : CopyableDataFrame

    Returns a CopyableDataFrame that is set up to load data from the specified Parquet file.

    This method only supports reading data from files in Snowflake stages.

    Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

    For example:

    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.parquet(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()

    If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

    The following example loads the Parquet files in the stage location specified by path into the table T1.

    // The table "T1" should exist before calling copyInto().
    session.read.parquet(path).copyInto("T1")
    path

    The path to the Parquet file (including the stage name).

    returns

    A CopyableDataFrame

    Since

    0.1.0

  22. def schema ( schema: StructType ) : DataFrameReader

    Returns a DataFrameReader instance with the specified schema configuration for the data to be read.

    To define the schema for the data that you want to read, use a types.StructType object.

    schema

    Schema configuration for the data to be read.

    returns

    A DataFrameReader

    Since

    0.1.0

  23. final def synchronized [ T0 ] ( arg0: ⇒ T0 ) : T0
    Definition Classes
    AnyRef
  24. def table ( name: String ) : DataFrame

    Returns a DataFrame that is set up to load data from the specified table.

    For the name argument, you can specify an unqualified name (if the table is in the current database and schema) or a fully qualified name (db.schema.name).

    Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

    name

    Name of the table to use.

    returns

    A DataFrame

    Since

    0.1.0
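
    As a sketch of how table might be used, the following helper reads a table by name and counts its rows. It assumes that the Snowpark library is on the classpath and that the caller supplies an already-created Session. The object and method names, and the database, schema, and table names in the usage comment, are hypothetical.

```scala
import com.snowflake.snowpark.Session

object TableReadDemo {
  // Joins the three parts into a fully qualified name of the form db.schema.name.
  // This is a naive join that assumes the parts need no quoting.
  def qualifiedName(db: String, schema: String, name: String): String =
    Seq(db, schema, name).mkString(".")

  // Builds a DataFrame for the table; count() is the action that triggers the read.
  def countRows(session: Session, tableName: String): Long =
    session.read.table(tableName).count()
}

// Usage, given an existing Session named `session` (hypothetical names):
//   TableReadDemo.countRows(session, "T1")
//   TableReadDemo.countRows(session, TableReadDemo.qualifiedName("mydb", "myschema", "T1"))
```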

  25. def toString () : String
    Definition Classes
    AnyRef → Any
  26. final def wait () : Unit
    Definition Classes
    AnyRef
    Annotations
    @throws ( ... )
  27. final def wait ( arg0: Long , arg1: Int ) : Unit
    Definition Classes
    AnyRef
    Annotations
    @throws ( ... )
  28. final def wait ( arg0: Long ) : Unit
    Definition Classes
    AnyRef
    Annotations
    @throws ( ... ) @native ()
  29. def xml ( path: String ) : CopyableDataFrame

    Returns a CopyableDataFrame that is set up to load data from the specified XML file.

    This method only supports reading data from files in Snowflake stages.

    Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

    For example:

    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.xml(path).where(col("xmlget($1, 'num', 0):\"$\"") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()

    If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

    The following example loads the XML files in the stage location specified by path into the table T1.

    // The table "T1" should exist before calling copyInto().
    session.read.xml(path).copyInto("T1")
    path

    The path to the XML file (including the stage name).

    returns

    A CopyableDataFrame

    Since

    0.1.0
