DataFrameReader

class DataFrameReader extends AnyRef

Provides methods to load data in various supported formats from a Snowflake stage to a DataFrame. The paths provided to the DataFrameReader must refer to Snowflake stages.

To use this object:

1. Access an instance of a DataFrameReader by calling the Session.read method.
2. Specify any format-specific options and copy options by calling the option or options method. These methods return a DataFrameReader that is configured with these options. (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)
3. Specify the schema of the data that you plan to load by constructing a types.StructType object and passing it to the schema method. This method returns a DataFrameReader that is configured to read data that uses the specified schema.
4. Specify the format of the data by calling the method named after the format (e.g. csv, json, etc.). These methods return a DataFrame that is configured to load data in the specified format.
5. Call a DataFrame method that performs an action.
   - For example, to load the data from the file, call DataFrame.collect.
   - As another example, to save the data from the file to a table, call CopyableDataFrame.copyInto(tableName: String). This uses the COPY INTO <table_name> command.
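
The steps above can be sketched as a single helper. This is a minimal sketch, not part of the Snowpark API: the object name ReaderSteps, the function loadCsv, and the two-column schema are assumptions made for illustration. The caller is assumed to supply an already created Session and a stage path such as "@mystage1".

```scala
import com.snowflake.snowpark.{Row, Session}
import com.snowflake.snowpark.types.{IntegerType, StringType, StructField, StructType}

object ReaderSteps {
  // Hypothetical schema: an integer column "a" and a string column "b".
  val userSchema: StructType =
    StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))

  // Hypothetical helper that walks through the steps in order.
  def loadCsv(session: Session, stagePath: String): Array[Row] = {
    val reader = session.read       // Step 1: get a DataFrameReader.
      .option("skip_header", 1)     // Step 2: set a format-specific option.
      .schema(userSchema)           // Step 3: set the schema.
    val df = reader.csv(stagePath)  // Step 4: choose the format; nothing is loaded yet.
    df.collect()                    // Step 5: perform an action that loads the data.
  }
}
```

Keeping the reader configuration separate from the action makes it visible that no data is read until collect() is called.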

The following examples demonstrate how to use a DataFrameReader.

Example 1: Loading the first two columns of a CSV file and skipping the first header line.

```
// Import the package for StructType.
import com.snowflake.snowpark.types._
val filePath = "@mystage1"
// Define the schema for the data in the CSV file.
val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
// Create a DataFrame that is configured to load data from the CSV file.
val csvDF = session.read.option("skip_header", 1).schema(userSchema).csv(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = csvDF.collect()
```

Example 2: Loading a gzip-compressed JSON file.

```
val filePath = "@mystage2/data.json.gz"
// Create a DataFrame that is configured to load data from the gzipped JSON file.
val jsonDF = session.read.option("compression", "gzip").json(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = jsonDF.collect()
```

If you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.

Example 3: Loading only the CSV files from a stage location.

```
import com.snowflake.snowpark.types._
// Define the schema for the data in the CSV files.
val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
// Create a DataFrame that is configured to load data from the CSV files in the stage.
val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = csvDF.collect()
```
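
To get a feel for which files the regular expression in the example above selects, the expression can be tried locally against sample paths. This is a minimal sketch in plain Scala: the file paths are hypothetical, and it assumes that the pattern has to match the entire staged path, which is how String.matches behaves. The server-side matching in Snowflake is not exercised here.

```scala
object PatternCheck {
  // Same regular expression as in the example above.
  val pattern = ".*[.]csv"

  // Hypothetical staged file paths, for illustration only.
  val staged = Seq("data/2021/a.csv", "data/2021/b.json", "c.csv.gz", "d.csv")

  // String.matches succeeds only if the whole string matches the pattern,
  // so "c.csv.gz" is rejected because of its ".gz" suffix.
  val selected: Seq[String] = staged.filter(_.matches(pattern))
  // selected == Seq("data/2021/a.csv", "d.csv")
}
```

Under the same assumption, a pattern such as ".*[.]csv([.]gz)?" would also accept the gzipped file.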

In addition, to load the files from the stage into a specified table with the COPY INTO <table_name> command, call a copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

Example 4: Loading data from a JSON file in a stage to a table by using COPY INTO <table_name>.

```
val filePath = "@mystage1"
// Create a DataFrame that is configured to load data from the JSON file.
val jsonDF = session.read.json(filePath)
// Load the data into the specified table `T1`.
// The table "T1" should exist before calling copyInto().
jsonDF.copyInto("T1")
```

Since: 0.1.0

Instance Constructors

new DataFrameReader ( session: Session )

session

Snowflake Session

Value Members

final def != ( arg0: Any ) : Boolean

Definition Classes

AnyRef → Any
final def ## () : Int

Definition Classes

AnyRef → Any
final def == ( arg0: Any ) : Boolean

Definition Classes

AnyRef → Any
final def asInstanceOf [ T0 ] : T0

Definition Classes

Any
def avro ( path: String ) : CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified Avro file.

This method only supports reading data from files in Snowflake stages.

Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect , DataFrame.count , etc.).

For example:
```
session.read.avro(path).where(col("$1:num") > 1)
```
To use the COPY INTO <table_name> command to load data from staged files into a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

The following example loads the Avro files in the stage location specified by path into the table T1.
```
// The table "T1" should exist before calling copyInto().
session.read.avro(path).copyInto("T1")
```
path

The path to the Avro file (including the stage name).

returns

A CopyableDataFrame

Since

0.1.0
def clone () : AnyRef

Attributes

protected[ lang ]

Definition Classes

AnyRef

Annotations

@throws ( ... ) @native () @HotSpotIntrinsicCandidate ()
def csv ( path: String ) : CopyableDataFrame
Returns a CopyableDataFrame that is set up to load data from the specified CSV file.

This method only supports reading data from files in Snowflake stages.

Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect , DataFrame.count , etc.).

For example:
```
val filePath = "@mystage1/myfile.csv"
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.schema(userSchema).csv(filePath).filter(col("a") < 2)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```
To use the COPY INTO <table_name> command to load data from staged files into a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

The following example loads the CSV files in the stage location specified by path into the table T1.
```
// The table "T1" should exist before calling copyInto().
session.read.schema(userSchema).csv(path).copyInto("T1")
```
path

The path to the CSV file (including the stage name).

returns

A CopyableDataFrame

Since

0.1.0
final def eq ( arg0: AnyRef ) : Boolean

Definition Classes

AnyRef
def equals ( arg0: Any ) : Boolean

Definition Classes

AnyRef → Any
final def getClass () : Class [_]

Definition Classes

AnyRef → Any

Annotations

@native () @HotSpotIntrinsicCandidate ()
def hashCode () : Int

Definition Classes

AnyRef → Any

Annotations

@native () @HotSpotIntrinsicCandidate ()
final def isInstanceOf [ T0 ] : Boolean

Definition Classes

Any
def json ( path: String ) : CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified JSON file.

This method only supports reading data from files in Snowflake stages.

Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect , DataFrame.count , etc.).

For example:
```
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.json(path).where(col("$1:num") > 1)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```
To use the COPY INTO <table_name> command to load data from staged files into a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

The following example loads the JSON files in the stage location specified by path into the table T1.
```
// The table "T1" should exist before calling copyInto().
session.read.json(path).copyInto("T1")
```
path

The path to the JSON file (including the stage name).

returns

A CopyableDataFrame

Since

0.1.0
final def ne ( arg0: AnyRef ) : Boolean

Definition Classes

AnyRef
final def notify () : Unit

Definition Classes

AnyRef

Annotations

@native () @HotSpotIntrinsicCandidate ()
final def notifyAll () : Unit

Definition Classes

AnyRef

Annotations

@native () @HotSpotIntrinsicCandidate ()

def option ( key: String , value: Any ) : DataFrameReader

Sets the specified option in the DataFrameReader.

Use this method to configure any format-specific options and copy options . (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)

Example 1: Loading an LZO-compressed Parquet file.

```
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.option("compression", "lzo").parquet(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```

Example 2: Loading an uncompressed JSON file.

```
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.option("compression", "none").json(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```

Example 3: Loading the first two columns of a colon-delimited CSV file in which the first line is the header:

```
import com.snowflake.snowpark.types._
// Define the schema for the data in the CSV files.
val userSchema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
// Create a DataFrame that is configured to load data from the CSV file.
val csvDF = session.read.option("field_delimiter", ":").option("skip_header", 1).schema(userSchema).csv(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = csvDF.collect()
```

In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.

Example 4: Loading only the CSV files from a stage location.

```
import com.snowflake.snowpark.types._
// Define the schema for the data in the CSV files.
val userSchema: StructType = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
// Create a DataFrame that is configured to load data from the CSV files in the stage.
val csvDF = session.read.option("pattern", ".*[.]csv").schema(userSchema).csv("@stage_location")
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = csvDF.collect()
```

key: Name of the option (e.g. compression, skip_header, etc.).
value: Value of the option.
returns: A DataFrameReader

Since: 0.1.0

def options ( configs: Map [ String , Any ] ) : DataFrameReader
Sets multiple specified options in the DataFrameReader.

Use this method to configure any format-specific options and copy options . (Note that although specifying copy options can make error handling more robust during the reading process, it may have an effect on performance.)

In addition, if you want to load only a subset of files from the stage, you can use the pattern option to specify a regular expression that matches the files that you want to load.

Example 1: Loading an LZO-compressed Parquet file and removing leading and trailing white space from the fields.
```
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.options(Map("compression" -> "lzo", "trim_space" -> true)).parquet(filePath)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```
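
Because options takes an ordinary Map, a set of options can be defined once and reused across several reads. The following is a minimal sketch under stated assumptions: the object name CsvOptions, the value colonDelimited, and the function read are hypothetical names introduced for illustration, and the caller is assumed to pass in an existing Session, a schema, and a stage path.

```scala
import com.snowflake.snowpark.{DataFrame, Session}
import com.snowflake.snowpark.types.StructType

object CsvOptions {
  // Hypothetical reusable option set: colon-delimited CSV with one header line.
  val colonDelimited: Map[String, Any] =
    Map("field_delimiter" -> ":", "skip_header" -> 1)

  // Applies every entry of the map in one options call instead of
  // chaining a separate option call per key. No data is loaded here.
  def read(session: Session, schema: StructType, stagePath: String): DataFrame =
    session.read.options(colonDelimited).schema(schema).csv(stagePath)
}
```

The values are typed as Any, so strings, numbers, and booleans can be mixed in the same map.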
configs

Map of the names of options (e.g. compression, skip_header, etc.) and their corresponding values.

returns

A DataFrameReader

Since

0.1.0
def orc ( path: String ) : CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified ORC file.

This method only supports reading data from files in Snowflake stages.

Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect , DataFrame.count , etc.).

For example:
```
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.orc(path).where(col("$1:num") > 1)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```
To use the COPY INTO <table_name> command to load data from staged files into a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

The following example loads the ORC files in the stage location specified by path into the table T1.
```
// The table "T1" should exist before calling copyInto().
session.read.orc(path).copyInto("T1")
```
path

The path to the ORC file (including the stage name).

returns

A CopyableDataFrame

Since

0.1.0
def parquet ( path: String ) : CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified Parquet file.

This method only supports reading data from files in Snowflake stages.

Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect , DataFrame.count , etc.).

For example:
```
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.parquet(path).where(col("$1:num") > 1)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```
To use the COPY INTO <table_name> command to load data from staged files into a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

The following example loads the Parquet files in the stage location specified by path into the table T1.
```
// The table "T1" should exist before calling copyInto().
session.read.parquet(path).copyInto("T1")
```
path

The path to the Parquet file (including the stage name).

returns

A CopyableDataFrame

Since

0.1.0
def schema ( schema: StructType ) : DataFrameReader
Returns a DataFrameReader instance with the specified schema configuration for the data to be read.

To define the schema for the data that you want to read, use a types.StructType object.

schema

Schema configuration for the data to be read.

returns

A DataFrameReader

Since

0.1.0
final def synchronized [ T0 ] ( arg0: ⇒ T0 ) : T0

Definition Classes

AnyRef
def table ( name: String ) : DataFrame
Returns a DataFrame that is set up to load data from the specified table.

For the name argument, you can specify an unqualified name (if the table is in the current database and schema) or a fully qualified name ( db.schema.name ).

Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect , DataFrame.count , etc.).
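
For example, the following minimal sketch reads a table by its fully qualified name and counts its rows. The object name TableRead and the helpers qualifiedName, readTable, and countRows are hypothetical names introduced for illustration. The sketch assumes an existing Session and simple identifiers that need no quoting.

```scala
import com.snowflake.snowpark.{DataFrame, Session}

object TableRead {
  // Hypothetical helper: joins the parts into the db.schema.name form.
  // It does not quote or escape identifiers.
  def qualifiedName(db: String, schema: String, table: String): String =
    Seq(db, schema, table).mkString(".")

  // Returns a DataFrame for the table; no data is loaded at this point.
  def readTable(session: Session, db: String, schema: String, table: String): DataFrame =
    session.read.table(qualifiedName(db, schema, table))

  // count() is an action, so this call is what triggers the read.
  def countRows(session: Session, db: String, schema: String, table: String): Long =
    readTable(session, db, schema, table).count()
}
```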

name

Name of the table to use.

returns

A DataFrame

Since

0.1.0
def toString () : String

Definition Classes

AnyRef → Any
final def wait ( arg0: Long , arg1: Int ) : Unit

Definition Classes

AnyRef

Annotations

@throws ( ... )
final def wait ( arg0: Long ) : Unit

Definition Classes

AnyRef

Annotations

@throws ( ... ) @native ()
final def wait () : Unit

Definition Classes

AnyRef

Annotations

@throws ( ... )
def xml ( path: String ) : CopyableDataFrame
Returns a DataFrame that is set up to load data from the specified XML file.

This method only supports reading data from files in Snowflake stages.

Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect , DataFrame.count , etc.).

For example:
```
// Create a DataFrame that uses a DataFrameReader to load data from a file in a stage.
val df = session.read.xml(path).where(col("xmlget($1, 'num', 0):\"$\"") > 1)
// Load the data into the DataFrame and return an Array of Rows containing the results.
val results = df.collect()
```
To use the COPY INTO <table_name> command to load data from staged files into a specified table, call the copyInto() method (e.g. CopyableDataFrame.copyInto(tableName: String)).

The following example loads the XML files in the stage location specified by path into the table T1.
```
// The table "T1" should exist before calling copyInto().
session.read.xml(path).copyInto("T1")
```
path

The path to the XML file (including the stage name).

returns

A CopyableDataFrame

Since

0.1.0

Deprecated Value Members

def finalize () : Unit

Attributes

protected[ lang ]

Definition Classes

AnyRef

Annotations

@throws ( classOf[java.lang.Throwable] ) @Deprecated

Deprecated
