Class CopyableDataFrame

  • All Implemented Interfaces:
    Cloneable

    public class CopyableDataFrame
    extends DataFrame
    DataFrame for loading data from files in a stage to a table. Objects of this type are returned by the DataFrameReader methods that load data from files (e.g. DataFrameReader.csv()).

    To save the data from the staged files to a table, call one of the `copyInto()` methods. These methods use the `COPY INTO <table_name>` command to copy the data to the specified table.

    Since:
    1.1.0
    • Method Detail

      • copyInto

        public void copyInto(String tableName)
        Executes a `COPY INTO <table_name>` command to load data from files in a stage into a specified table.

        copyInto is an action method (like the 'collect' method), so calling the method executes the SQL statement to copy the data.

        For example, the following code loads data from the path specified by `myFileStage` to the table `T`:

        
         session.read().schema(userSchema).csv(myFileStage).copyInto("T");
         
        Parameters:
        tableName - Name of the table where the data should be saved.
        Since:
        1.1.0
      • copyInto

        public void copyInto(String tableName,
                             Column[] transformations)
        Executes a `COPY INTO <table_name>` command with the specified transformations to load data from files in a stage into a specified table.

        copyInto is an action method (like the 'collect' method), so calling the method executes the SQL statement to copy the data.

        When copying the data into the table, you can apply transformations to the data from the files to rename the columns, change the order of the columns, omit or insert columns, or cast the value in a column to a specific type.

        You can use the same techniques described in Transforming Data During Load, expressed as an array of Column expressions that correspond to the SELECT statement parameters in the `COPY INTO <table_name>` command.

        For example, the following code loads data from the path specified by `myFileStage` to the table `T`. The example transforms the data from the file by inserting the value of the first column into the first column of table `T` and inserting the length of that value into the second column of table `T`.

        
         Column[] transformations = {Functions.col("$1"), Functions.length(Functions.col("$1"))};
         session.read().schema(userSchema).csv(myFileStage).copyInto("T", transformations);
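
        To make the correspondence between the Column expressions and the SELECT list more concrete, the following sketch assembles the general shape of a COPY INTO statement with transformations. It is an illustration only, not the SQL that the library generates: the class `CopyIntoSqlSketch`, the helper `sketchCopyInto`, and the stage location `@my_stage/data` are hypothetical, and file format and copy options are left out.

```java
import java.util.Arrays;
import java.util.List;

final class CopyIntoSqlSketch {
    // Hypothetical helper (not part of Snowpark): builds the general shape of a
    // COPY INTO statement whose SELECT list carries the transformations.
    static String sketchCopyInto(String tableName, String stageLocation, List<String> selectExpressions) {
        return "COPY INTO " + tableName
            + " FROM (SELECT " + String.join(", ", selectExpressions)
            + " FROM " + stageLocation + ")";
    }

    public static void main(String[] args) {
        // "$1" and "LENGTH($1)" mirror Functions.col("$1") and
        // Functions.length(Functions.col("$1")) from the example above.
        // "@my_stage/data" is an assumed stage location.
        String sql = sketchCopyInto("T", "@my_stage/data", Arrays.asList("$1", "LENGTH($1)"));
        System.out.println(sql);
        // prints: COPY INTO T FROM (SELECT $1, LENGTH($1) FROM @my_stage/data)
    }
}
```

        Each element of the transformations array maps to one expression in the SELECT list, in order, so the first expression feeds the first column of `T` and the second expression feeds the second column.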
         
        Parameters:
        tableName - Name of the table where the data should be saved.
        transformations - Array of Column expressions that specify the transformations to apply (similar to transformation parameters).
        Since:
        1.1.0
      • copyInto

        public void copyInto(String tableName,
                             Column[] transformations,
                             Map<String, ?> options)
        Executes a `COPY INTO <table_name>` command with the specified transformations to load data from files in a stage into a specified table.

        copyInto is an action method (like the 'collect' method), so calling the method executes the SQL statement to copy the data.

        In addition, you can specify format type options or copy options that determine how the copy operation should be performed.

        When copying the data into the table, you can apply transformations to the data from the files to rename the columns, change the order of the columns, omit or insert columns, or cast the value in a column to a specific type.

        You can use the same techniques described in Transforming Data During Load, expressed as an array of Column expressions that correspond to the SELECT statement parameters in the `COPY INTO <table_name>` command.

        For example, the following code loads data from the path specified by `myFileStage` to the table `T`. The example transforms the data from the file by inserting the value of the first column into the first column of table `T` and inserting the length of that value into the second column of table `T`. The example also uses a Map to set the FORCE and skip_header options for the copy operation.

        
         Map<String, Object> options = new HashMap<>();
         options.put("FORCE", "TRUE");
         options.put("skip_header", 1);
         Column[] transformations = {Functions.col("$1"), Functions.length(Functions.col("$1"))};
         session.read().schema(userSchema).csv(myFileStage).copyInto("T", transformations, options);
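
        The description of the options parameter states that options set on the DataFrameReader act as defaults and that the options passed to copyInto override them or add to them. The following sketch illustrates that precedence with plain Java maps. It is not the library's implementation: the class `CopyOptionsPrecedenceSketch`, the helper `effectiveOptions`, and the sample option values are hypothetical, and the sketch matches option names by exact string, which may differ from how the library normalizes names.

```java
import java.util.HashMap;
import java.util.Map;

final class CopyOptionsPrecedenceSketch {
    // Hypothetical helper (not part of Snowpark): starts from the reader-level
    // options and layers the call-level options on top, so call-level values win.
    static Map<String, Object> effectiveOptions(Map<String, ?> readerOptions, Map<String, ?> callOptions) {
        Map<String, Object> merged = new HashMap<>(readerOptions);
        merged.putAll(callOptions);
        return merged;
    }

    public static void main(String[] args) {
        // Assumed options set earlier on the reader.
        Map<String, Object> readerOptions = new HashMap<>();
        readerOptions.put("skip_header", 0);
        readerOptions.put("field_delimiter", ",");

        // Options passed to copyInto, as in the example above.
        Map<String, Object> callOptions = new HashMap<>();
        callOptions.put("FORCE", "TRUE");
        callOptions.put("skip_header", 1);

        Map<String, Object> merged = effectiveOptions(readerOptions, callOptions);
        System.out.println("skip_header = " + merged.get("skip_header"));         // overridden by the call
        System.out.println("field_delimiter = " + merged.get("field_delimiter")); // kept from the reader
        System.out.println("FORCE = " + merged.get("FORCE"));                     // added by the call
    }
}
```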
         
        Parameters:
        tableName - Name of the table where the data should be saved.
        transformations - Array of Column expressions that specify the transformations to apply (similar to transformation parameters).
        options - Map of the names of options (e.g. compression, skip_header, etc.) and their corresponding values. NOTE: By default, the CopyableDataFrame object uses the options set in the DataFrameReader used to create that object. You can use this options parameter to override the default options or set additional options.
        Since:
        1.1.0
      • copyInto

        public void copyInto(String tableName,
                             String[] targetColumnNames,
                             Column[] transformations,
                             Map<String, ?> options)
        Executes a `COPY INTO <table_name>` command with the specified transformations to load data from files in a stage into a specified table.

        copyInto is an action method (like the 'collect' method), so calling the method executes the SQL statement to copy the data.

        In addition, you can specify format type options or copy options that determine how the copy operation should be performed.

        When copying the data into the table, you can apply transformations to the data from the files to rename the columns, change the order of the columns, omit or insert columns, or cast the value in a column to a specific type.

        You can use the same techniques described in Transforming Data During Load, expressed as an array of Column expressions that correspond to the SELECT statement parameters in the `COPY INTO <table_name>` command.

        You can specify a subset of the table columns to copy into. The number of provided column names must match the number of transformations.

        For example, suppose the target table `T` has three columns: "ID", "A", and "A_LEN". "ID" is an `AUTOINCREMENT` column, which should be excluded from this copy operation. The following code loads data from the path specified by `myFileStage` to the table `T`. The example transforms the data from the file by inserting the value of the first column into the column `A` and inserting the length of that value into the column `A_LEN`. The example also uses a Map to set the FORCE and skip_header options for the copy operation.

        
         Map<String, Object> options = new HashMap<>();
         options.put("FORCE", "TRUE");
         options.put("skip_header", 1);
         Column[] transformations = {Functions.col("$1"), Functions.length(Functions.col("$1"))};
         String[] targetColumnNames = {"A", "A_LEN"};
         session.read().schema(userSchema).csv(myFileStage).copyInto("T", targetColumnNames, transformations, options);
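
        Because the number of target column names must match the number of transformations, it can be useful to check the two arrays before calling copyInto. The following is a minimal sketch of such a check. The class `CopyIntoArgumentCheckSketch` and the helper `requireSameLength` are hypothetical and not part of the library, `Object[]` stands in for `Column[]` so that the sketch compiles without the Snowpark dependency, and how the library itself reports a mismatch is not covered here.

```java
final class CopyIntoArgumentCheckSketch {
    // Hypothetical pre-flight check (not part of Snowpark): rejects argument
    // arrays whose lengths differ. A Column[] can be passed where Object[] is expected.
    static void requireSameLength(String[] targetColumnNames, Object[] transformations) {
        if (targetColumnNames.length != transformations.length) {
            throw new IllegalArgumentException(
                "Expected " + targetColumnNames.length + " transformations to match the target columns, but got "
                    + transformations.length);
        }
    }

    public static void main(String[] args) {
        String[] targetColumnNames = {"A", "A_LEN"};
        // Plain strings stand in for the Column expressions of the example above.
        Object[] transformations = {"$1", "LENGTH($1)"};

        // Two names and two transformations: the check passes silently.
        requireSameLength(targetColumnNames, transformations);

        try {
            // One name but two transformations: the check throws.
            requireSameLength(new String[] {"A"}, transformations);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```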
         
        Parameters:
        tableName - Name of the table where the data should be saved.
        targetColumnNames - Names of the columns in the table where the data should be saved.
        transformations - Array of Column expressions that specify the transformations to apply (similar to transformation parameters).
        options - Map of the names of options (e.g. compression, skip_header, etc.) and their corresponding values. NOTE: By default, the CopyableDataFrame object uses the options set in the DataFrameReader used to create that object. You can use this options parameter to override the default options or set additional options.
        Since:
        1.1.0
      • clone

        public CopyableDataFrame clone()
        Returns a clone of this CopyableDataFrame.
        Overrides:
        clone in class DataFrame
        Returns:
        A CopyableDataFrame
        Since:
        1.1.0
      • async

        public CopyableDataFrameAsyncActor async()
        Returns a CopyableDataFrameAsyncActor object that can be used to execute CopyableDataFrame actions asynchronously.
        Overrides:
        async in class DataFrame
        Returns:
        A CopyableDataFrameAsyncActor object
        Since:
        1.2.0