Load data from cloud storage: Microsoft Azure

Introduction

This tutorial shows you how to load data from Microsoft Azure cloud storage into Snowflake using SQL. You can access a pre-loaded Snowsight template worksheet to complete these tasks.

Note

Snowflake bills a minimal amount for the on-disk storage used for any sample data in this tutorial. The tutorial provides steps to drop objects and minimize storage cost. Snowflake requires a virtual warehouse to load the data and execute queries. A running virtual warehouse consumes Snowflake credits.

If you are using a 30-day trial account, which provides free credits, you won’t incur any costs.
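If you want to minimize credit consumption, you can suspend the warehouse when you are not running statements. This optional example uses the warehouse name from later in this tutorial (warehouses also auto-suspend after a period of inactivity):

-- Suspend the warehouse to stop credit consumption when you are done.
ALTER WAREHOUSE SNOWFLAKE_LEARNING_WH SUSPEND;
-- Resume it before running more statements (auto-resume can also restart it).
ALTER WAREHOUSE SNOWFLAKE_LEARNING_WH RESUME;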

What you will learn

In this tutorial you will learn how to:

  • Use a role that has the privileges to create and use the Snowflake objects required by this tutorial.
  • Use a warehouse to access resources.
  • Select a database and schema to use for the session.
  • Create a table.
  • Create a storage integration for your cloud platform.
  • Create a stage for your storage integration.
  • Load data into the table from the stage.
  • Query the data in the table.

Prerequisites

This tutorial assumes the following:

  • You have a Snowflake account (a 30-day trial account works) and a user with a role that can create the objects required by this tutorial.
  • You have access to a Microsoft Azure account in which you can create a storage container and upload the data file to load.

Step 1. Sign in using Snowsight

To access Snowsight over the public Internet, do the following:

  1. In a supported web browser, navigate to https://app.snowflake.com.
  2. Provide your account identifier or account URL. If you’ve previously signed in to Snowsight, you might see an account name that you can select.
  3. Sign in using your Snowflake account credentials.

Step 2. Open the SQL worksheet for loading data from Microsoft Azure

You can use worksheets to write and run SQL commands on your database. You can access a pre-loaded template worksheet for this tutorial. The worksheet has the SQL commands that you will run to create database objects, load data, and query the data. Because it is a template worksheet, you will be invited to enter your own values for certain SQL parameters. For more information about worksheets, see Getting started with worksheets.

To open the pre-loaded template worksheet, follow these steps:

  1. In the navigation menu, select Projects » Templates.

  2. Find and open Load data from Microsoft Azure.

    The beginning of your worksheet looks similar to the following image:

SQL load from cloud template worksheet, which contains the SQL commands for this tutorial, along with descriptive comments.

Step 3. Set the role and warehouse to use

The role you use determines the privileges you have. In this tutorial, use the ACCOUNTADMIN system role so that you can view and manage objects in your account. For more information, see Using the ACCOUNTADMIN Role.

A warehouse provides the compute resources that you need to execute DML operations, load data, and run queries. These resources include CPU, memory, and temporary storage. You can use the SNOWFLAKE_LEARNING_WH warehouse for this tutorial. For more information, see Virtual warehouses.

To set the role and warehouse to use, do the following:

  1. In the open worksheet, place your cursor in the USE ROLE line.

    USE ROLE ACCOUNTADMIN;
  2. At the top of the worksheet, select Run.

    Note

    In this tutorial, run SQL statements one at a time. Don’t select Run all.

  3. Place your cursor in the USE WAREHOUSE line, then select Run.

    USE WAREHOUSE SNOWFLAKE_LEARNING_WH;

Step 4. Set up a table that you can load

A database is a repository for your data. The data is stored in tables that you can manage and query. A schema is a logical grouping of database objects, such as tables and views. For example, a schema might contain the database objects required for a specific application. For more information, see Databases, Tables and Views - Overview.

In this tutorial, you use the database SNOWFLAKE_LEARNING_DB, a schema that concatenates your username with _LOAD_DATA_FROM_MICROSOFT_AZURE, and a table that you create named calendar.

To select this database and schema for use in the session and create the table, do the following:

  1. In the open worksheet, place your cursor in the USE DATABASE line, then select Run.

    USE DATABASE SNOWFLAKE_LEARNING_DB;
  2. Place your cursor in each SET line, then select Run.

    SET user_name = current_user();
    SET schema_name = CONCAT($user_name, '_LOAD_DATA_FROM_MICROSOFT_AZURE');
  3. Place your cursor in the USE SCHEMA IDENTIFIER line, then select Run.

    USE SCHEMA IDENTIFIER($schema_name);
  4. Place your cursor in the CREATE TABLE lines, complete the table definition, add an optional comment, and select Run. For example, the following table contains six columns:

    CREATE OR REPLACE TABLE calendar
      (
      full_date DATE
      ,day_name VARCHAR(10)
      ,month_name VARCHAR(10)
      ,day_number VARCHAR(2)
      ,full_year VARCHAR(4)
      ,holiday BOOLEAN
      )
      COMMENT = 'Table to be loaded from Azure calendar data file';
  5. To confirm that the table was created successfully, place your cursor in the SELECT line, then select Run.

    SELECT * FROM calendar;

    The output shows the columns of the table you created. Currently, the table doesn’t have any rows.
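    If you prefer to see the column definitions rather than an empty result set, you can also describe the table:

    -- Show the name, type, and comment of each column in the table.
    DESCRIBE TABLE calendar;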

Step 5. Create a storage integration

Before you can load data from cloud storage, you must configure a storage integration that is specific to your cloud provider. The following example is specific to Microsoft Azure storage.

Storage integrations are named, first-class Snowflake objects that avoid the need for passing explicit cloud provider credentials such as secret keys or access tokens. Integration objects store an Azure identity and access management (IAM) user ID called the app registration.

To create a storage integration for Azure, do the following:

  1. Use the Azure portal to configure an Azure container for loading data. For details, see Configure an Azure container for loading data.

  2. In the open worksheet, place your cursor in the CREATE STORAGE INTEGRATION lines, define the required parameters, and select Run. For example:

    CREATE OR REPLACE STORAGE INTEGRATION azure_data_integration
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'AZURE'
      AZURE_TENANT_ID = '075f576e-6f9b-4955-8e99-4086736225d9'
      ENABLED = TRUE
      STORAGE_ALLOWED_LOCATIONS = ('azure://tutorial99.blob.core.windows.net/snow-tutorial-container/');

    Set AZURE_TENANT_ID to the Office 365 tenant ID for the storage account that contains the allowed storage locations that you want to use. You can find this ID in the Azure portal under Microsoft Entra ID > Properties > Tenant ID. (Microsoft Entra ID is the new name for Azure Active Directory.)

    Set STORAGE_ALLOWED_LOCATIONS to the path for the Azure container where your source data file is stored. Use the format shown in this example, where tutorial99 is the storage account name and snow-tutorial-container is the container name.

  3. Place your cursor in the DESCRIBE INTEGRATION line, specify the name of the storage integration you created, and select Run.

    DESCRIBE INTEGRATION azure_data_integration;

    This command retrieves the AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME for the client application that was created automatically for your Snowflake account. You will use these values to configure permissions for Snowflake in the Azure portal.

    The output for this command looks similar to the following:

    Output of DESCRIBE INTEGRATION command, with the following columns: property, property_type, property_value, property_default.
  4. Place your cursor in the SHOW INTEGRATIONS line and select Run. This command returns information about the storage integration you created.

    SHOW INTEGRATIONS;

    The output for this command looks similar to the following:

    Output of SHOW INTEGRATIONS command, with the following columns: name, type, category, enabled, comment, created_on.
  5. Use the Azure portal to configure permissions for the client application (which was created automatically for your trial account) to access storage containers. Follow Step 2: Grant Snowflake Access to the Storage Locations under Configure an Azure container for loading data.

Step 6. Create a stage

A stage is a location that holds data files to load into a Snowflake database. This tutorial creates a stage that can load data from a specific type of cloud storage, such as an Azure container.

To create a stage, do the following:

  1. In the open worksheet, place your cursor in the CREATE STAGE lines, specify a name, the storage integration you created, the container URL, and the correct file format, then select Run. For example:

    CREATE OR REPLACE STAGE azuredata_stage
      STORAGE_INTEGRATION = azure_data_integration
      URL = 'azure://tutorial99.blob.core.windows.net/snow-tutorial-container/'
      FILE_FORMAT = (TYPE = CSV);
  2. Place your cursor in the SHOW STAGES line, then select Run to return information about the stage you created:

    SHOW STAGES;

    The output for this command looks similar to the following:

    Output of SHOW STAGES command, with the following columns: created_on, name, database_name, schema_name, url.
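    Optionally, before loading, you can verify that the stage can see your source file. Substitute the name of the stage you created:

    -- List the files in the Azure container that the stage points to.
    LIST @azuredata_stage;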

Step 7. Load data from the stage

Load the table from the stage you created by using the COPY INTO <table> command. For more information about loading from Azure containers, see Copy data from an Azure stage.

To load the data into the table, place your cursor in the COPY INTO lines, specify the table name, the stage you created, and the name of the file (or files) you want to load, then select Run. For example:

COPY INTO calendar
  FROM @azuredata_stage
    FILES = ('calendar.txt');

Your output looks similar to the following image:

Five rows are copied into the table. The output has the following columns: file, status, rows_parsed, rows_loaded, error_limit.
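If you want to check a file for load errors before actually loading it, the COPY INTO command also supports a validation mode. For example, using the table, stage, and file names from this tutorial:

-- Validate the file and return any errors without loading rows.
COPY INTO calendar
  FROM @azuredata_stage
    FILES = ('calendar.txt')
    VALIDATION_MODE = 'RETURN_ERRORS';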

Step 8. Query the table

Now that the data is loaded, you can run queries on the calendar table.

To run a query in the open worksheet, select the line or lines of the SELECT command, and then select Run. For example, run the following query:

SELECT * FROM calendar;

Your output looks similar to the following image:

All the rows in the table are selected. This example has full_date, day_name, month_name, day_number, full_year, and holiday columns.
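Because the loaded data is now in a regular table, you can run any SQL against it. For example, the following query (which assumes the holiday column was populated by the data file) returns only the holiday rows in date order:

-- Filter and sort the rows; the column names come from the CREATE TABLE step.
SELECT full_date, day_name
  FROM calendar
  WHERE holiday = TRUE
  ORDER BY full_date;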

Step 9. Cleanup, summary, and additional resources

Congratulations! You have successfully completed this tutorial for trial accounts.

Take a few minutes to review a short summary and the key points covered in the tutorial. You might also want to consider cleaning up by dropping any objects you created in the tutorial. For example, you might want to drop the table you created and loaded:

DROP TABLE calendar;

If they are no longer needed, you can also drop the other objects you created, such as the storage integration and stage. For details, see Data Definition Language (DDL) commands.
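For example, using the names from this tutorial, you can drop the stage and the storage integration as follows:

-- Remove the stage and the storage integration created in this tutorial.
DROP STAGE IF EXISTS azuredata_stage;
DROP STORAGE INTEGRATION IF EXISTS azure_data_integration;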

Summary and key points

In summary, you used a pre-loaded template worksheet in Snowsight to complete the following steps:

  1. Set the role and warehouse to use.
  2. Select a database and schema to use for the session.
  3. Create a table.
  4. Create a storage integration and configure permissions on cloud storage.
  5. Create a stage and load the data from the stage into the table.
  6. Query the data.

Here are some key points to remember about loading and querying data:

  • You need the required permissions to create and manage objects in your account. In this tutorial, you use the ACCOUNTADMIN system role for these privileges.

    This role is not normally used to create objects. Instead, we recommend creating a hierarchy of roles aligned with business functions in your organization. For more information, see Using the ACCOUNTADMIN Role.

  • You need a warehouse for the resources required to create and manage objects and run SQL commands. This tutorial uses the SNOWFLAKE_LEARNING_WH warehouse included with the template environment.

  • You used a database to store the data and a schema to group the database objects logically.

  • You created a storage integration and a stage to load data from a CSV file stored in an Azure container.

  • After the data was loaded into your database, you queried it using SELECT statements.

What’s next?

Continue learning about Snowflake using the following resources: