class DataFrameReader extends AnyRef
Provides methods to load data in various supported formats from a Snowflake stage to a DataFrame. The paths provided to the DataFrameReader must refer to Snowflake stages.
To use this object:
- Access an instance of a DataFrameReader by calling the Session.read method.
- Specify any format-specific options and copy options by calling the option or options method. These methods return a DataFrameReader that is configured with these options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
- Specify the schema of the data that you plan to load by constructing a types.StructType object and passing it to the schema method. This method returns a DataFrameReader that is configured to read data that uses the specified schema.
- Specify the format of the data by calling the method named after the format (for example, csv or json). These methods return a DataFrame that is configured to load data in the specified format.
- Call a DataFrame method that performs an action.
  - For example, to load the data from the file, call DataFrame.collect.
  - As another example, to save the data from the file to a table, call CopyableDataFrame.copyInto(tableName:String). This uses the COPY INTO <table_name> command.
The following examples demonstrate how to use a DataFrameReader.
Example 1: Loading the first two columns of a CSV file and skipping the first header line.
    // Import the package for StructType.
    import com.snowflake.snowpark.types._
    val filePath = "@mystage1"
    // Define the schema for the data in the CSV file.
    val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV file.
    val csvDF = session.read.option("skip_header", 1).schema(userSchema).csv(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
Example 2: Loading a gzip-compressed JSON file.
    val filePath = "@mystage2/data.json.gz"
    // Create a DataFrame that is configured to load data from the gzipped JSON file.
    val jsonDF = session.read.option("compression", "gzip").json(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = jsonDF.collect()
If you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.
Example 3: Loading only the CSV files from a stage location.
    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV files in the stage.
    val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
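To get a feel for which staged files a pattern value such as the one in Example 3 would select, the same regular expression can be tried locally against sample paths. The following is only a rough sketch using the Scala standard library: Snowflake evaluates the pattern on the server, the file paths are made up, and the object name PatternPreview is hypothetical rather than part of Snowpark. It assumes the pattern must match the whole path.

```scala
object PatternPreview {
  // Hypothetical staged file paths; real ones would come from listing the stage.
  val stagedFiles: Seq[String] =
    Seq("data/a.csv", "data/b.csv", "data/c.json", "notes.csv.bak")

  // The same regular expression that Example 3 passes to option("pattern", ...).
  val pattern: String = ".*[.]csv"

  // String.matches succeeds only if the entire string matches the expression,
  // so a path with a trailing ".bak" is rejected.
  val selected: Seq[String] = stagedFiles.filter(_.matches(pattern))

  def main(args: Array[String]): Unit =
    println(selected.mkString(", "))
}
```

In this sketch only the two paths that end in .csv are kept; [.] matches a literal dot, so the expression does not accept an arbitrary character before "csv".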
In addition, if you want to load the files from the stage into a specified table with the COPY INTO <table_name> command, you can use a copyInto() method, e.g. CopyableDataFrame.copyInto(tableName:String).
Example 4: Loading data from a JSON file in a stage to a table by using COPY INTO <table_name>.
    val filePath = "@mystage1"
    // Create a DataFrame that is configured to load data from the JSON file.
    val jsonDF = session.read.json(filePath)
    // Load the data into the specified table `T1`.
    // The table "T1" should exist before calling copyInto().
    jsonDF.copyInto("T1")
- Since
-
0.1.0
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def avro(path: String): CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified Avro file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (for example, DataFrame.collect or DataFrame.count).
For example:
    session.read.avro(path).where(col("$1:num") > 1)
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName:String)).
For example, the following loads the Avro files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.avro(path).copyInto("T1")
- path
-
The path to the Avro file (including the stage name).
- returns
- Since
-
0.1.0
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(...) @native() @HotSpotIntrinsicCandidate()
- def csv(path: String): CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified CSV file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (for example, DataFrame.collect or DataFrame.count).
For example:
    val filePath = "@mystage1/myfile.csv"
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    // userSchema is a StructType that describes the columns (see the schema method).
    val df = session.read.schema(userSchema).csv(filePath).filter(col("a") < 2)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName:String)).
For example, the following loads the CSV files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.schema(userSchema).csv(path).copyInto("T1")
- path
-
The path to the CSV file (including the stage name).
- returns
- Since
-
0.1.0
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def json(path: String): CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified JSON file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (for example, DataFrame.collect or DataFrame.count).
For example:
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.json(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName:String)).
For example, the following loads the JSON files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.json(path).copyInto("T1")
- path
-
The path to the JSON file (including the stage name).
- returns
- Since
-
0.1.0
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def option(key: String, value: Any): DataFrameReader
Sets the specified option in the DataFrameReader.
Use this method to configure any format-specific options and copy options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
Example 1: Loading an LZO-compressed Parquet file.
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.option("compression", "lzo").parquet(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
Example 2: Loading an uncompressed JSON file.
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.option("compression", "none").json(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
Example 3: Loading the first two columns of a colon-delimited CSV file in which the first line is the header.
    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV file.
    val csvDF = session.read.option("field_delimiter", ":").option("skip_header", 1).schema(userSchema).csv(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.
Example 4: Loading only the CSV files from a stage location.
    import com.snowflake.snowpark.types._
    // Define the schema for the data in the CSV files.
    val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
    // Create a DataFrame that is configured to load data from the CSV files in the stage.
    val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = csvDF.collect()
- key
-
Name of the option (for example, compression or skip_header).
- value
-
Value of the option.
- returns
- Since
-
0.1.0
- def options(configs: Map[String, Any]): DataFrameReader
Sets multiple specified options in the DataFrameReader.
Use this method to configure any format-specific options and copy options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.
Example 1: Loading an LZO-compressed Parquet file and removing any white space from the fields.
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.options(Map("compression" -> "lzo", "trim_space" -> true)).parquet(filePath)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
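Because the configs parameter is a Map[String, Any], values of different types (strings, Booleans, numbers) can share one map. The sketch below uses only the Scala standard library to show one way such a map might be assembled from shared defaults plus per-load overrides before it is handed to options. The object name ReaderOptions and the defaults and overrides values are hypothetical; the option names are the ones used in the examples on this page.

```scala
object ReaderOptions {
  // Shared defaults. The explicit annotation keeps the value type as Any,
  // which matches the Map[String, Any] parameter of options.
  val defaults: Map[String, Any] =
    Map("compression" -> "lzo", "trim_space" -> true)

  // Hypothetical per-load overrides.
  val overrides: Map[String, Any] =
    Map("compression" -> "none")

  // ++ keeps the keys of both maps; for duplicate keys the right-hand map wins.
  val configs: Map[String, Any] = defaults ++ overrides

  def main(args: Array[String]): Unit = {
    configs.foreach { case (key, value) => println(s"$key -> $value") }
    // With a live Snowpark session, the map would then be passed along as:
    //   session.read.options(configs).parquet(filePath)
  }
}
```

Keeping the options in a plain map makes it easy to reuse one set of settings across several reads and to inspect the final values before any data is loaded.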
- configs
-
Map of the names of options (for example, compression or skip_header) and their corresponding values.
- returns
- Since
-
0.1.0
- def orc(path: String): CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified ORC file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (for example, DataFrame.collect or DataFrame.count).
For example:
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.orc(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName:String)).
For example, the following loads the ORC files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.orc(path).copyInto("T1")
- path
-
The path to the ORC file (including the stage name).
- returns
- Since
-
0.1.0
- def parquet(path: String): CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified Parquet file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (for example, DataFrame.collect or DataFrame.count).
For example:
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.parquet(path).where(col("$1:num") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName:String)).
For example, the following loads the Parquet files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.parquet(path).copyInto("T1")
- path
-
The path to the Parquet file (including the stage name).
- returns
- Since
-
0.1.0
- def schema(schema: StructType): DataFrameReader
Returns a DataFrameReader instance with the specified schema configuration for the data to be read.
To define the schema for the data that you want to read, use a types.StructType object.
- schema
-
Schema configuration for the data to be read.
- returns
- Since
-
0.1.0
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def table(name: String): DataFrame
Returns a DataFrame that is set up to load data from the specified table.
For the name argument, you can specify an unqualified name (if the table is in the current database and schema) or a fully qualified name (db.schema.name).
Note that the data is not loaded in the DataFrame until you call a method that performs an action (for example, DataFrame.collect or DataFrame.count).
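As a small illustration of the two name forms, the sketch below assembles a fully qualified name from its parts. The helper qualifiedName and the sample database and schema names are hypothetical and not part of Snowpark, and the helper assumes simple identifiers that need no quoting.

```scala
object TableNames {
  // Hypothetical helper: joins database, schema, and table into db.schema.name.
  // It assumes plain identifiers and does not quote or escape anything.
  def qualifiedName(db: String, schema: String, name: String): String =
    Seq(db, schema, name).mkString(".")

  def main(args: Array[String]): Unit = {
    // An unqualified name relies on the current database and schema.
    val unqualified = "T1"
    // A fully qualified name spells out all three parts.
    val qualified = qualifiedName("MYDB", "PUBLIC", "T1")
    println(unqualified)
    println(qualified)
    // With a live Snowpark session, either string could be passed to
    // session.read.table(...).
  }
}
```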
- name
-
Name of the table to use.
- returns
- Since
-
0.1.0
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(...)
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(...) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(...)
- def xml(path: String): CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified XML file.
This method only supports reading data from files in Snowflake stages.
Note that the data is not loaded in the DataFrame until you call a method that performs an action (for example, DataFrame.collect or DataFrame.count).
For example:
    // Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
    val df = session.read.xml(path).where(col("xmlget($1, 'num', 0):\"$\"") > 1)
    // Load the data into the DataFrame and return an Array of Rows containing the results.
    val results = df.collect()
If you want to use the COPY INTO <table_name> command to load data from staged files to a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName:String)).
For example, the following loads the XML files in the stage location specified by path to the table T1:
    // The table "T1" should exist before calling copyInto().
    session.read.xml(path).copyInto("T1")
- path
-
The path to the XML file (including the stage name).
- returns
- Since
-
0.1.0
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated