class DataFrameReader extends AnyRef
Provides methods to load data in various supported formats from a Snowflake stage to a DataFrame. The paths provided to the DataFrameReader must refer to Snowflake stages.
To use this object:
- Access an instance of a DataFrameReader by calling the Session.read method.
- Specify any format-specific options and copy options by calling the option or options method. These methods return a DataFrameReader that is configured with these options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
- Specify the schema of the data that you plan to load by constructing a types.StructType object and passing it to the schema method. This method returns a DataFrameReader that is configured to read data that uses the specified schema.
- Specify the format of the data by calling the method named after the format (e.g. csv, json, etc.). These methods return a DataFrame that is configured to load data in the specified format.
- Call a DataFrame method that performs an action.
 - For example, to load the data from the file, call DataFrame.collect.
 - As another example, to save the data from the file to a table, call CopyableDataFrame.copyInto(tableName: String). This uses the COPY INTO <table_name> command.
 
The following examples demonstrate how to use a DataFrameReader.
Example 1: Loading the first two columns of a CSV file and skipping the first header line.
    // Import the package for StructType.
    import com.snowflake.snowpark.types._
    val filePath = "@mystage1"
    // Define the schema for the data in the CSV file.
    val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV file.
    val csvDF = session.read.option("skip_header", 1).schema(userSchema).csv(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
Example 2: Loading a gzip-compressed JSON file.
    val filePath = "@mystage2/data.json.gz"
    // Create a DataFrame that is configured to load data from the gzipped JSON file.
    val jsonDF = session.read.option("compression", "gzip").json(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = jsonDF.collect()
If you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.
Example 3: Loading only the CSV files from a stage location.
    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV files in the stage.
    val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
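The pattern option is a regular expression, and the leading .* in .*[.]csv suggests that the expression has to match the whole file path rather than only part of it. Assuming full-string, case-sensitive matching, the following Scala sketch filters a list of file paths the same way. It also shows that .*[.]csv skips gzip-compressed .csv.gz files unless the expression allows for that suffix. PatternSketch, select, and the file paths are hypothetical and used for illustration only.

```scala
object PatternSketch {
  // Hypothetical file paths in a stage, for illustration only.
  val stagedFiles: Seq[String] = Seq(
    "2021/sales.csv",
    "2021/sales.csv.gz",
    "2021/readme.txt",
    "2022/SALES.CSV")

  // String.matches succeeds only when the regex matches the entire string,
  // which is why the patterns below start with ".*".
  def select(files: Seq[String], pattern: String): Seq[String] =
    files.filter(_.matches(pattern))

  def main(args: Array[String]): Unit = {
    println(select(stagedFiles, ".*[.]csv"))         // only 2021/sales.csv
    println(select(stagedFiles, ".*[.]csv([.]gz)?")) // 2021/sales.csv and 2021/sales.csv.gz
  }
}
```

Under these assumptions, 2022/SALES.CSV matches neither pattern because the comparison is case-sensitive.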
In addition, if you want to load the files from the stage into a specified table with the COPY INTO <table_name> command, you can use a copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).
Example 4: Loading data from a JSON file in a stage to a table by using COPY INTO <table_name>.
    val filePath = "@mystage1"
    // Create a DataFrame that is configured to load data from the JSON file.
    val jsonDF = session.read.json(filePath)
    // Load the data into the specified table `T1`.
    // The table "T1" should exist before calling copyInto().
    jsonDF.copyInto("T1")
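To make the distinction between format-specific options and copy options more concrete, here is a minimal Scala sketch that renders the kind of COPY INTO <table_name> statement such options correspond to. It is only an illustration: CopyIntoSketch and copyIntoSql are hypothetical names, the option values are examples, and the SQL that Snowpark actually generates for copyInto() may look different. In Snowflake's COPY INTO syntax, format-specific options such as SKIP_HEADER go inside FILE_FORMAT = (...), while copy options such as ON_ERROR are separate clauses.

```scala
import java.util.Locale

object CopyIntoSketch {
  // Renders (key, value) pairs as "KEY = value", upper-casing the key.
  // Locale.ROOT keeps the upper-casing stable regardless of the default locale.
  private def render(options: Seq[(String, String)]): Seq[String] =
    options.map { case (key, value) => s"${key.toUpperCase(Locale.ROOT)} = $value" }

  // Hypothetical helper: format-specific options go inside FILE_FORMAT = ( ... ),
  // copy options (such as ON_ERROR) are appended as top-level clauses.
  def copyIntoSql(
      table: String,
      stagePath: String,
      fileType: String,
      formatOptions: Seq[(String, String)],
      copyOptions: Seq[(String, String)]): String = {
    val fileFormat = render(("type" -> fileType) +: formatOptions).mkString(" ")
    val clauses = Seq(s"COPY INTO $table FROM $stagePath", s"FILE_FORMAT = ($fileFormat)") ++
      render(copyOptions)
    clauses.mkString(" ")
  }

  def main(args: Array[String]): Unit = {
    val sql = copyIntoSql(
      table = "T1",
      stagePath = "@mystage1",
      fileType = "CSV",
      formatOptions = Seq("skip_header" -> "1"),
      copyOptions = Seq("on_error" -> "CONTINUE"))
    // Prints: COPY INTO T1 FROM @mystage1 FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) ON_ERROR = CONTINUE
    println(sql)
  }
}
```

ON_ERROR = CONTINUE is an example of a copy option that affects error handling, which is the trade-off the note about copy options and performance refers to.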
- Since
 - 0.1.0
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
- final def ##(): Int
- Definition Classes
 - AnyRef → Any
 
- final def ==(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
- final def asInstanceOf[T0]: T0
- Definition Classes
 - Any
 
- def avro(path: String): CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified Avro file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
For example:
    import com.snowflake.snowpark.functions._
    session.read.avro(path).where(col("$1:num") > 1)
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).
For example, the following code loads the Avro files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.avro(path).copyInto("T1")
- path
 - The path to the Avro file (including the stage name).
- returns
 - A CopyableDataFrame that is set up to load data from the specified Avro file.
- Since
 - 0.1.0
- def clone(): AnyRef
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws ( ... ) @native () @HotSpotIntrinsicCandidate ()
 
- def csv(path: String): CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified CSV file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
For example:
    import com.snowflake.snowpark.functions._
    import com.snowflake.snowpark.types._
    val filePath = "@mystage1/myfile.csv"
    // Define the schema for the data in the CSV file.
    val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.schema(userSchema).csv(filePath).filter(col("a") < 2)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).
For example, the following code loads the CSV files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.schema(userSchema).csv(path).copyInto("T1")
- path
 - The path to the CSV file (including the stage name).
- returns
 - A CopyableDataFrame that is set up to load data from the specified CSV file.
- Since
 - 0.1.0
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
 - AnyRef
 
- def equals(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
- final def getClass(): Class[_]
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native () @HotSpotIntrinsicCandidate ()
 
- def hashCode(): Int
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native () @HotSpotIntrinsicCandidate ()
 
- final def isInstanceOf[T0]: Boolean
- Definition Classes
 - Any
 
- def json(path: String): CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified JSON file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
For example:
    import com.snowflake.snowpark.functions._
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.json(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).
For example, the following code loads the JSON files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.json(path).copyInto("T1")
- path
 - The path to the JSON file (including the stage name).
- returns
 - A CopyableDataFrame that is set up to load data from the specified JSON file.
- Since
 - 0.1.0
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
 - AnyRef
 
- final def notify(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @native () @HotSpotIntrinsicCandidate ()
 
- final def notifyAll(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @native () @HotSpotIntrinsicCandidate ()
 
- def option(key: String, value: Any): DataFrameReader
Sets the specified option in the DataFrameReader.
Use this method to configure any format-specific options and copy options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
Example 1: Loading an LZO-compressed Parquet file.
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.option("compression", "lzo").parquet(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
Example 2: Loading an uncompressed JSON file.
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.option("compression", "none").json(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
Example 3: Loading the first two columns of a colon-delimited CSV file in which the first line is the header:
    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV file.
    val csvDF = session.read.option("field_delimiter", ":").option("skip_header", 1).schema(userSchema).csv(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
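As a rough illustration of what Example 3's options ask for, the following Scala sketch applies the same idea to a few in-memory lines: it skips one header line, splits each remaining line on ":", and keeps the first two fields as an Int and a String, mirroring userSchema. It is a hypothetical local model (ColonCsvSketch and parse are made-up names) and ignores quoting, escaping, and the other CSV rules that Snowflake applies when it actually parses a staged file.

```scala
object ColonCsvSketch {
  // Hypothetical local model of field_delimiter = ":" plus skip_header = 1,
  // keeping only the first two columns (a: Int, b: String) like userSchema.
  // No quoting or escaping is handled.
  def parse(lines: Seq[String], delimiter: Char, skipHeader: Int): Seq[(Int, String)] =
    lines.drop(skipHeader).map { line =>
      val fields = line.split(delimiter)
      (fields(0).trim.toInt, fields(1))
    }

  def main(args: Array[String]): Unit = {
    val file = Seq("a:b:c", "1:one:x", "2:two:y")
    println(parse(file, ':', 1)) // List((1,one), (2,two))
  }
}
```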
In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.
Example 4: Loading only the CSV files from a stage location.
    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV files in the stage.
    val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
- key
 - Name of the option (e.g. compression, skip_header, etc.).
- value
 - Value of the option.
- returns
 - A DataFrameReader that is configured with the specified option.
- Since
 - 0.1.0
- def options(configs: Map[String, Any]): DataFrameReader
Sets multiple specified options in the DataFrameReader.
Use this method to configure any format-specific options and copy options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.
Example 1: Loading an LZO-compressed Parquet file and removing any white space from the fields.
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.options(Map("compression" -> "lzo", "trim_space" -> true)).parquet(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
- configs
 - Map of the names of options (e.g. compression, skip_header, etc.) and their corresponding values.
- returns
 - A DataFrameReader that is configured with the specified options.
- Since
 - 0.1.0
- def orc(path: String): CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified ORC file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
For example:
    import com.snowflake.snowpark.functions._
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.orc(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).
For example, the following code loads the ORC files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.orc(path).copyInto("T1")
- path
 - The path to the ORC file (including the stage name).
- returns
 - A CopyableDataFrame that is set up to load data from the specified ORC file.
- Since
 - 0.1.0
- def parquet(path: String): CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified Parquet file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
For example:
    import com.snowflake.snowpark.functions._
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.parquet(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).
For example, the following code loads the Parquet files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.parquet(path).copyInto("T1")
- path
 - The path to the Parquet file (including the stage name).
- returns
 - A CopyableDataFrame that is set up to load data from the specified Parquet file.
- Since
 - 0.1.0
- def schema(schema: StructType): DataFrameReader
Returns a DataFrameReader instance with the specified schema configuration for the data to be read.
To define the schema for the data that you want to read, use a types.StructType object.
- schema
 - Schema configuration for the data to be read.
- returns
 - A DataFrameReader that is configured with the specified schema.
- Since
 - 0.1.0
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
 - AnyRef
 
- def table(name: String): DataFrame
Returns a DataFrame that is set up to load data from the specified table.
For the name argument, you can specify an unqualified name (if the table is in the current database and schema) or a fully qualified name (db.schema.name).
Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
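When a table name contains lowercase letters, spaces, or other special characters, each part of the fully qualified name has to be written as a double-quoted identifier in Snowflake, with any embedded double quote doubled. The following Scala sketch builds such a name. TableNameSketch, quote, and qualifiedName are hypothetical helpers, and the sketch assumes that the name argument accepts quoted identifier parts the same way Snowflake SQL does. Note that quoting makes a part case-sensitive, so a table created without quotes as t1 is stored as T1 and must be quoted as "T1".

```scala
object TableNameSketch {
  // Wraps one identifier part in double quotes and doubles any embedded
  // double quote, following Snowflake's rules for quoted identifiers.
  def quote(part: String): String = "\"" + part.replace("\"", "\"\"") + "\""

  // Builds db.schema.name from individually quoted parts.
  def qualifiedName(db: String, schema: String, table: String): String =
    Seq(db, schema, table).map(quote).mkString(".")

  def main(args: Array[String]): Unit = {
    // Prints: "MY_DB"."PUBLIC"."Sales 2021"
    println(qualifiedName("MY_DB", "PUBLIC", "Sales 2021"))
  }
}
```

Under that assumption, the result could be passed as the name argument, e.g. session.read.table(TableNameSketch.qualifiedName("MY_DB", "PUBLIC", "Sales 2021")).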
- name
 - Name of the table to use.
- returns
 - A DataFrame that is set up to load data from the specified table.
- Since
 - 0.1.0
- def toString(): String
- Definition Classes
 - AnyRef → Any
 
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws ( ... )
 
- final def wait(arg0: Long): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws ( ... ) @native ()
 
- final def wait(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws ( ... )
 
- def xml(path: String): CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified XML file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
For example:
    import com.snowflake.snowpark.functions._
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.xml(path).where(col("xmlget($1, 'num', 0):\"$\"") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).
For example, the following code loads the XML files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.xml(path).copyInto("T1")
- path
 - The path to the XML file (including the stage name).
- returns
 - A CopyableDataFrame that is set up to load data from the specified XML file.
- Since
 - 0.1.0
Deprecated Value Members
- def finalize(): Unit
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws ( classOf[java.lang.Throwable] ) @Deprecated
 - Deprecated