Class DataFrameReader


  • public class DataFrameReader
    extends Object
    Provides methods to load data in various supported formats from a Snowflake stage to a DataFrame. The paths provided to the DataFrameReader must refer to Snowflake stages.
    Since:
    1.1.0
    • Method Detail

      • table

        public DataFrame table(String name)
        Returns a DataFrame that is set up to load data from the specified table.

        For the name argument, you can specify an unqualified name (if the table is in the current database and schema) or a fully qualified name (`db.schema.name`).

        Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).
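
        As an illustration of the fully qualified form described above, the following minimal sketch assembles a `db.schema.name` string in plain Java. The `TableNames` class and its `qualify` method are hypothetical helpers, not part of the Snowpark API, and the sketch assumes identifiers that need no quoting.

```java
// Hypothetical helper for illustration only; not part of the Snowpark API.
class TableNames {
    /** Joins database, schema, and table into the db.schema.name form. */
    static String qualify(String db, String schema, String table) {
        // Assumes plain identifiers; names that need double quotes are not handled here.
        return String.join(".", db, schema, table);
    }

    public static void main(String[] args) {
        String name = qualify("MY_DB", "PUBLIC", "T1");
        System.out.println(name); // prints: MY_DB.PUBLIC.T1
        // With an existing Snowpark session (assumed), the name would be passed as:
        // DataFrame df = session.read().table(name);
    }
}
```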

        Parameters:
        name - Name of the table to use.
        Returns:
        A new DataFrame
        Since:
        1.1.0
      • schema

        public DataFrameReader schema(StructType schema)
        Returns a DataFrameReader instance with the specified schema configuration for the data to be read.

        To define the schema for the data that you want to read, use a types.StructType object.

        Parameters:
        schema - Schema configuration for the data to be read.
        Returns:
        A reference to this DataFrameReader
        Since:
        1.1.0
      • csv

        public CopyableDataFrame csv(String path)
        Returns a CopyableDataFrame that is set up to load data from the specified CSV file.

        This method only supports reading data from files in Snowflake stages.

        Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

        For example:

        
         String filePath = "@myStage/myFile.csv";
         DataFrame df = session.read().schema(userSchema).csv(filePath);
         
        To load data from staged files into a specified table with the `COPY INTO <table_name>` command, call the `copyInto()` method (see CopyableDataFrame.copyInto).

        The following example loads the CSV files in the stage location specified by `path` into the table `T1`.

        
         session.read().schema(userSchema).csv(path).copyInto("T1");
         
        Parameters:
        path - The path to the CSV file (including the stage name).
        Returns:
        A CopyableDataFrame
        Since:
        1.1.0
      • json

        public CopyableDataFrame json(String path)
        Returns a CopyableDataFrame that is set up to load data from the specified JSON file.

        This method only supports reading data from files in Snowflake stages.

        Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

        For example:

        
         DataFrame df = session.read().json(path);
         
        To load data from staged files into a specified table with the `COPY INTO <table_name>` command, call the `copyInto()` method (see CopyableDataFrame.copyInto).

        The following example loads the JSON files in the stage location specified by `path` into the table `T1`.

        
         session.read().json(path).copyInto("T1");
         
        Parameters:
        path - The path to the JSON file (including the stage name).
        Returns:
        A CopyableDataFrame
        Since:
        1.1.0
      • avro

        public CopyableDataFrame avro(String path)
        Returns a CopyableDataFrame that is set up to load data from the specified Avro file.

        This method only supports reading data from files in Snowflake stages.

        Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

        For example:

        
         session.read().avro(path).where(Functions.sqlExpr("$1:col").gt(Functions.lit(1)));
         
        To load data from staged files into a specified table with the `COPY INTO <table_name>` command, call the `copyInto()` method (see CopyableDataFrame.copyInto).

        The following example loads the Avro files in the stage location specified by `path` into the table `T1`.

        
         session.read().avro(path).copyInto("T1");
         
        Parameters:
        path - The path to the Avro file (including the stage name).
        Returns:
        A CopyableDataFrame
        Since:
        1.1.0
      • parquet

        public CopyableDataFrame parquet(String path)
        Returns a CopyableDataFrame that is set up to load data from the specified Parquet file.

        This method only supports reading data from files in Snowflake stages.

        Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

        For example:

        
         session.read().parquet(path).where(Functions.sqlExpr("$1:col").gt(Functions.lit(1)));
         
        To load data from staged files into a specified table with the `COPY INTO <table_name>` command, call the `copyInto()` method (see CopyableDataFrame.copyInto).

        The following example loads the Parquet files in the stage location specified by `path` into the table `T1`.

        
         session.read().parquet(path).copyInto("T1");
         
        Parameters:
        path - The path to the Parquet file (including the stage name).
        Returns:
        A CopyableDataFrame
        Since:
        1.1.0
      • orc

        public CopyableDataFrame orc(String path)
        Returns a CopyableDataFrame that is set up to load data from the specified ORC file.

        This method only supports reading data from files in Snowflake stages.

        Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

        For example:

        
         session.read().orc(path).where(Functions.sqlExpr("$1:col").gt(Functions.lit(1)));
         
        To load data from staged files into a specified table with the `COPY INTO <table_name>` command, call the `copyInto()` method (see CopyableDataFrame.copyInto).

        The following example loads the ORC files in the stage location specified by `path` into the table `T1`.

        
         session.read().orc(path).copyInto("T1");
         
        Parameters:
        path - The path to the ORC file (including the stage name).
        Returns:
        A CopyableDataFrame
        Since:
        1.1.0
      • xml

        public CopyableDataFrame xml(String path)
        Returns a CopyableDataFrame that is set up to load data from the specified XML file.

        This method only supports reading data from files in Snowflake stages.

        Note that the data is not loaded in the DataFrame until you call a method that performs an action (e.g. DataFrame.collect, DataFrame.count, etc.).

        For example:

        
         session.read().xml(path).where(Functions
           .sqlExpr("xmlget($1, 'num', 0):\"$\"").gt(Functions.lit(1)));
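
        The escaped quotes in the expression string above can be hard to read. The following minimal sketch shows the SQL text that such a Java literal produces. The `XmlExpr` class and its `tagContent` method are hypothetical helpers, not part of the Snowpark API, and the sketch assumes a tag name that contains no single quotes.

```java
// Hypothetical helper for illustration only; not part of the Snowpark API.
class XmlExpr {
    /** Builds the SQL text that reads the content of one tag instance from column $1. */
    static String tagContent(String tag, int instance) {
        // The Java escape \"$\" becomes the three characters "$" in the SQL text.
        // Assumes the tag name contains no single quotes; no escaping is done here.
        return "xmlget($1, '" + tag + "', " + instance + "):\"$\"";
    }

    public static void main(String[] args) {
        System.out.println(tagContent("num", 0)); // prints: xmlget($1, 'num', 0):"$"
        // The resulting string would be passed to Functions.sqlExpr(...) as in the example above.
    }
}
```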
         
        To load data from staged files into a specified table with the `COPY INTO <table_name>` command, call the `copyInto()` method (see CopyableDataFrame.copyInto).

        The following example loads the XML files in the stage location specified by `path` into the table `T1`.

        
         session.read().xml(path).copyInto("T1");
         
        Parameters:
        path - The path to the XML file (including the stage name).
        Returns:
        A CopyableDataFrame
        Since:
        1.1.0
      • option

        public DataFrameReader option(String key,
                                      Object value)
        Sets the specified option in the DataFrameReader.

        Use this method to configure any format-specific options and copy options. Note that although specifying copy options can make error handling more robust during the reading process, it may affect performance.

        In addition, if you want to load only a subset of files from the stage, you can use the `pattern` option to specify a regular expression that matches the files that you want to load.

        
         session.read().option("field_delimiter", ";").option("skip_header", 1)
           .schema(schema).csv(path);
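
        To get a rough idea of which files a `pattern` value would select, the regular expression can be tried locally first. The following is a minimal sketch using `java.util.regex`; the `PatternPreview` class and the file names are hypothetical, and Snowflake evaluates the pattern on the server against staged file paths, so local results are only an approximation.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the Snowpark API.
class PatternPreview {
    /** Returns the file names that the regular expression matches in full. */
    static List<String> matching(String regex, List<String> fileNames) {
        Pattern p = Pattern.compile(regex);
        return fileNames.stream()
            .filter(f -> p.matcher(f).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String regex = ".*sales_2023.*[.]csv";
        // Hypothetical staged file names.
        List<String> staged = List.of(
            "sales_2023_01.csv", "sales_2022_12.csv", "sales_2023_02.csv.bak");
        System.out.println(matching(regex, staged)); // prints: [sales_2023_01.csv]
        // With an existing Snowpark session (assumed), the same value would be passed as:
        // session.read().option("pattern", regex).schema(schema).csv(path);
    }
}
```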
         
        Parameters:
        key - Name of the option (e.g. compression, skip_header, etc.).
        value - Value of the option.
        Returns:
        A reference to this DataFrameReader
        Since:
        1.1.0
      • options

        public DataFrameReader options(Map<String,Object> configs)
        Sets multiple specified options in the DataFrameReader.

        Use this method to configure any format-specific options and copy options. Note that although specifying copy options can make error handling more robust during the reading process, it may affect performance.

        In addition, if you want to load only a subset of files from the stage, you can use the `pattern` option to specify a regular expression that matches the files that you want to load.

        
         Map<String, Object> configs = new HashMap<>();
         configs.put("field_delimiter", ";");
         configs.put("skip_header", 1);
         session.read().options(configs).schema(schema).csv(path);
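
        When several readers share most of their settings, one possible approach is to keep a map of defaults and layer per-read overrides on top before calling `options`. The following is a minimal sketch in plain Java; the `ReaderConfigs` class and its `merge` method are hypothetical helpers, not part of the Snowpark API, and the option values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Snowpark API.
class ReaderConfigs {
    /** Returns a new map holding the defaults, with any overrides applied on top. */
    static Map<String, Object> merge(Map<String, Object> defaults,
                                     Map<String, Object> overrides) {
        Map<String, Object> merged = new LinkedHashMap<>(defaults);
        merged.putAll(overrides); // overrides win; the input maps are left unchanged
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("field_delimiter", ";");
        defaults.put("skip_header", 1);

        Map<String, Object> overrides = new LinkedHashMap<>();
        overrides.put("field_delimiter", "|");
        overrides.put("compression", "gzip"); // illustrative value

        Map<String, Object> configs = merge(defaults, overrides);
        System.out.println(configs);
        // prints: {field_delimiter=|, skip_header=1, compression=gzip}
        // With an existing Snowpark session (assumed):
        // session.read().options(configs).schema(schema).csv(path);
    }
}
```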
         
        Parameters:
        configs - Map of the names of options (e.g. compression, skip_header, etc.) and their corresponding values.
        Returns:
        A reference to this DataFrameReader
        Since:
        1.1.0