Getting started tutorial: dbt Projects on Snowflake¶
This tutorial guides you through creating a workspace for dbt Projects on Snowflake that is connected to a GitHub repository that you fork from our getting-started-with-dbt-on-snowflake repository in Snowflake Labs. You then use the workspace to update dbt project files and to test and run the dbt project, which materializes the project's data models in target Snowflake databases and schemas. You deploy the project to create a dbt project object on Snowflake. Finally, you set up a task that executes the project on a schedule that you define.
Prerequisites¶
GitHub
A GitHub account that can create a repository and manage access to that repository.
Git on the command line. For more information about installation, see Set up Git.
Snowflake
A Snowflake account and user with privileges as described in Access control for dbt projects on Snowflake.
Privileges to create and edit the following objects or access to an administrator who can create each of them on your behalf:
An API integration
If your GitHub repository is private, a secret
A network rule
An external access integration that references the network rule
Your user object
Set up your environment¶
Complete the following steps to set up your environment for this tutorial:
Fork and clone the dbt Projects on Snowflake getting started repository
(Optional) Create a warehouse for executing workspace actions
Create a database and schema for integrations and model materializations
Create an API integration in Snowflake for connecting to GitHub
Create an external access integration in Snowflake for dbt dependencies
Fork and clone the dbt Projects on Snowflake getting started repository¶
Go to https://github.com/Snowflake-Labs/getting-started-with-dbt-on-snowflake, select the down arrow next to Fork, and then select Create a new fork.
Specify the owner and name of your forked repository and other details. Later in the tutorial, we use the following URL to represent your forked repository:
https://github.com/my-github-account/getting-started-with-dbt-on-snowflake.git
(Optional) Create a warehouse for executing workspace actions¶
A dedicated warehouse assigned to your workspace can help you log, trace, and identify actions initiated from within that workspace. In this tutorial, we use a warehouse named TASTY_BYTES_DBT_WH. Alternatively, you can use an existing warehouse in your account. For more information about creating a warehouse, see Creating a warehouse.
The Tasty Bytes data model that you create for source data is fairly large, so we recommend using an XL warehouse.
To create a warehouse, run the following SQL command:
CREATE WAREHOUSE tasty_bytes_dbt_wh WAREHOUSE_SIZE = XLARGE;
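If you want to limit credit consumption between tutorial steps, you can optionally include auto-suspend and auto-resume settings in the CREATE WAREHOUSE statement instead. The following variant is a sketch; the 60-second suspend value is an arbitrary choice, not a tutorial requirement:
CREATE WAREHOUSE tasty_bytes_dbt_wh
  WAREHOUSE_SIZE = XLARGE
  AUTO_SUSPEND = 60           -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE          -- resume automatically when a query arrives
  INITIALLY_SUSPENDED = TRUE; -- don't consume credits until first use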
Create a database and schema for integrations and model materializations¶
This tutorial uses a database named TASTY_BYTES_DBT_DB. Within that database, you create a schema named INTEGRATIONS to store the objects that Snowflake needs for GitHub integration. You create schemas named DEV and PROD to store materialized objects that your dbt project creates.
To create the database and schema, run the following SQL commands:
CREATE DATABASE tasty_bytes_dbt_db;
CREATE SCHEMA tasty_bytes_dbt_db.integrations;
CREATE SCHEMA tasty_bytes_dbt_db.dev;
CREATE SCHEMA tasty_bytes_dbt_db.prod;
Create an API integration in Snowflake for connecting to GitHub¶
Snowflake needs an API integration to interact with GitHub.
If your repository is private, you must also create a secret in Snowflake to store GitHub credentials for your repository. You then specify the secret in the API integration definition as one of the ALLOWED_AUTHENTICATION_SECRETS. You also specify this secret when you create the workspace for your dbt project later in this tutorial.
Creating a secret requires a personal access token for your repository. For more information about creating a token, see Managing your personal access tokens in GitHub documentation.
In this tutorial, we use a secret named TB_DBT_GIT_SECRET. For more information about creating a secret, see Create a secret with credentials for authenticating.
To create a secret for GitHub, run the following SQL commands:
USE SCHEMA tasty_bytes_dbt_db.integrations;
CREATE OR REPLACE SECRET tasty_bytes_dbt_db.integrations.tb_dbt_git_secret
TYPE = password
USERNAME = 'your-gh-username'
PASSWORD = 'YOUR_PERSONAL_ACCESS_TOKEN';
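To confirm that the secret was created without revealing the token, you can optionally describe it; this check is an addition for this tutorial, not part of the setup script:
DESC SECRET tasty_bytes_dbt_db.integrations.tb_dbt_git_secret;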
To create an API integration for GitHub that uses the secret you just created, run the following SQL command. Replace https://github.com/my-github-account with the HTTPS URL of the GitHub account for your forked repository:
CREATE OR REPLACE API INTEGRATION tb_dbt_git_api_integration
API_PROVIDER = git_https_api
API_ALLOWED_PREFIXES = ('https://github.com/my-github-account')
-- Comment out the following line if your forked repository is public
ALLOWED_AUTHENTICATION_SECRETS = (tasty_bytes_dbt_db.integrations.tb_dbt_git_secret)
ENABLED = TRUE;
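Optionally, before you create the workspace, you can verify that the integration is enabled and that API_ALLOWED_PREFIXES matches your GitHub account URL. These statements are optional checks, not part of the setup script:
SHOW API INTEGRATIONS LIKE 'tb_dbt_git_api_integration';
DESC API INTEGRATION tb_dbt_git_api_integration;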
Create an external access integration in Snowflake for dbt dependencies¶
When you run dbt commands in a workspace, dbt might need to access remote URLs to download dependencies. For example, dbt might need to download packages from the dbt Package hub or from GitHub.
Most dbt projects specify dependencies in their packages.yml file. You must install these dependencies in the dbt project workspace; you can't update a deployed dbt project object with dependencies.
To get dependency files from remote URLs, Snowflake needs an external access integration that relies on a network rule.
For more information about external access integrations in Snowflake, see Creating and using an external access integration.
To create a network rule and an external access integration, run the following SQL commands:
-- Create NETWORK RULE for external access integration
CREATE OR REPLACE NETWORK RULE dbt_network_rule
MODE = EGRESS
TYPE = HOST_PORT
-- Minimal URL allowlist that is required for dbt deps
VALUE_LIST = (
'hub.getdbt.com',
'codeload.github.com'
);
-- Create EXTERNAL ACCESS INTEGRATION for dbt access to external dbt package locations
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION dbt_ext_access
ALLOWED_NETWORK_RULES = (dbt_network_rule)
ENABLED = TRUE;
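If the dbt deps command later fails with a network error, a common cause is a host missing from the network rule. You can optionally inspect the objects that you just created; these statements aren't part of the setup script:
SHOW NETWORK RULES LIKE 'dbt_network_rule';
SHOW EXTERNAL ACCESS INTEGRATIONS LIKE 'dbt_ext_access';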
Create a workspace connected to your Git repository¶
In this step, you create a workspace in Snowsight that is connected to your GitHub repository. For more information about workspaces, see Workspaces.
Sign in to Snowsight.
From the left-side navigation menu, select Projects » Workspaces.
From the Workspaces list above the workspace files area, under Create Workspace, select From Git repository. (The Workspaces list has a default selection of My Workspace.)
For Repository URL, enter the HTTPS URL of your forked GitHub repository; for example, https://github.com/my-github-account/getting-started-with-dbt-on-snowflake.git
For Workspace name, enter a name. Later in this tutorial, we use tasty_bytes_dbt.
Under API integration, select the name of the API integration that you created earlier; for example, TB_DBT_GIT_API_INTEGRATION.
If your GitHub repository is public, select Public repository, and then select Create.
If your GitHub repository is private, and you created a secret for your API integration during setup, do the following:
Select Personal access token.
Under Credentials secret, select Select database and schema.
Select the database from the list (for example, TASTY_BYTES_DBT_DB), and then select the schema from the list (for example, INTEGRATIONS) where you stored the secret.
Select Select secret, and then select your secret from the list; for example, tb_dbt_git_secret.
Select Create.
Snowflake connects to the GitHub repository that you specified and opens your new workspace. A single folder in the workspace named tasty_bytes_dbt_demo contains the dbt project that you will work with.
Verify the contents of the profiles.yml file in your dbt project root¶
Each dbt project folder in your Snowflake workspace must contain a profiles.yml file that specifies a target warehouse, database, schema, and role in Snowflake for the project. The type must be set to snowflake. dbt requires an account and user, but these can be left with an empty or arbitrary string because the dbt project runs in Snowflake under the current account and user context.
When you run dbt commands, your workspace reads the profiles.yml file. When you have at least one valid target specified in profiles.yml, each target is available to select from the Profile list in the menu bar above the workspace editing pane. When you run a dbt command, the workspace uses the selected profile (target) to run the command.
Open the tasty_bytes_dbt_demo/profiles.yml file, and then verify that its contents match the following example. If you specified different database or warehouse names earlier, replace them with your own.
tasty_bytes:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: 'not needed'
      user: 'not needed'
      role: accountadmin
      database: tasty_bytes_dbt_db
      schema: dev
      warehouse: tasty_bytes_dbt_wh
    prod:
      type: snowflake
      account: 'not needed'
      user: 'not needed'
      role: accountadmin
      database: tasty_bytes_dbt_db
      schema: prod
      warehouse: tasty_bytes_dbt_wh
Run the SQL commands in tasty_bytes_setup.sql to set up source data¶
As source data for its transformations, the dbt project in your repository uses the foundational data model for the fictitious Tasty Bytes food truck brand. The SQL script to create the data model is in the workspace.
In your workspace, navigate to the tasty_bytes_dbt_demo/setup/tasty_bytes_setup.sql file, and then open it.
From the context selector in the upper right of the workspace editor, select the warehouse you created earlier; for example, TASTY_BYTES_DBT_WH.
The SQL file contains commands that you already ran in this tutorial. Near the beginning of the file, find the following commands, and then comment them out so that you don’t run them again and create duplicate resources:
CREATE OR REPLACE WAREHOUSE ...;
CREATE OR REPLACE API INTEGRATION ...;
CREATE OR REPLACE NETWORK RULE ...;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION ...;
Run the uncommented SQL commands in the file.
Tip
Use cmd + Shift + Enter to run all uncommented commands.
The Output tab displays the following message:
tb_101 setup is now complete
Enable logging, tracing, and metrics¶
You can capture logging and tracing events for a dbt project object and for the task that runs it on a schedule, if applicable. For more information, see Monitor dbt Projects on Snowflake.
To enable this feature, you must set logging, tracing, and metrics on the schema where the dbt project object and task are deployed.
The following commands in the tasty_bytes_setup.sql file enable logging, tracing, and metrics for the DEV and PROD schemas in the TASTY_BYTES_DBT_DB database. You ran these in the previous step. They are shown here for reference so that you can enable logging, tracing, and metrics for your own projects.
ALTER SCHEMA tasty_bytes_dbt_db.dev SET LOG_LEVEL = 'INFO';
ALTER SCHEMA tasty_bytes_dbt_db.dev SET TRACE_LEVEL = 'ALWAYS';
ALTER SCHEMA tasty_bytes_dbt_db.dev SET METRIC_LEVEL = 'ALL';
ALTER SCHEMA tasty_bytes_dbt_db.prod SET LOG_LEVEL = 'INFO';
ALTER SCHEMA tasty_bytes_dbt_db.prod SET TRACE_LEVEL = 'ALWAYS';
ALTER SCHEMA tasty_bytes_dbt_db.prod SET METRIC_LEVEL = 'ALL';
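To confirm the levels that are currently in effect on a schema, you can optionally query its parameters; this check isn't part of the setup script:
SHOW PARAMETERS LIKE '%LEVEL' IN SCHEMA tasty_bytes_dbt_db.dev;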
Execute the dbt deps command for your project¶
You can use the workspace to execute common dbt commands for a project. For a list of available commands, see Supported dbt commands. To run a command, you select the dbt Project, Profile, and dbt command from the lists above the workspace editor. You then select the execute button. Use the down arrow next to the execute button to specify additional arguments that the dbt command supports.
When you execute any dbt command within the workspace, the Output tab shows the command that executes on Snowflake (in green) and the stdout for that command so that you can monitor command success or failure.
The first command you must execute for any dbt project is deps, which updates the dependencies specified in your project's packages.yml file. Other commands will fail unless you have updated dependencies. For more information, see dbt dependencies.
Below the workspace editor, open the Output tab so that you can see stdout after you run dbt commands from the workspace.
From the menu bar above the workspace editor, confirm that the default Project (tasty_bytes_dbt_demo) is selected. You can have any Profile selected. This project has the profiles dev and prod defined in the profiles.yml file.
Select the Profile list, and then choose Enable secondary roles to enable secondary roles for this project and this session.
Note
You must enable secondary roles initially to run dbt commands for each project during each Snowsight session. Enterprise Edition accounts might be configured to disallow secondary roles for a specific user or all users. For more information, see Secondary roles are required per project, per session.
From the command list, select Deps.
Next to the execute button, select the down arrow.
In the dbt Deps window, leave Run with defaults selected, and then enter the name of the External Access Integration you created during setup in the space provided; for example, dbt_ext_access.
To run the command, select Deps.
The Output tab displays the SQL command that runs on Snowflake, which is similar to the following:
execute dbt project from workspace "USER$"."PUBLIC"."tasty_bytes_dbt" project_root='tasty_bytes_dbt_demo' args='deps --target dev' external_access_integrations = (dbt_ext_access)
When the command finishes, stdout messages appear that are similar to the following:
14:47:19 Running with dbt=1.8.9
14:47:19 Updating lock file in file path: /tmp/dbt/package-lock.yml
14:47:19 Installing dbt-labs/dbt_utils
14:47:19 Installed from version 1.3.0
14:47:19 Up to date!
Uploading /tmp/dbt/package-lock.yml to snow://workspace/USER$ADMIN.PUBLIC."tasty_bytes_dbt"/versions/live/dbt//package-lock.yml
The package-lock.yml file is created and appears in your list of workspace files with an A next to it. This indicates that the file was added in the workspace for your dbt project, with contents that are similar to the following example:
packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0
Compile the dbt project, view the DAG, and view compiled SQL¶
Compiling a project in dbt creates executable SQL from modeled SQL files and a visual representation of the directed acyclic graph (DAG) for the project in the workspace. For more information about dbt project compilation, see compile in dbt documentation.
After you compile the project in the workspace, you can view the DAG. You can also open any SQL file in the models folder to see the model SQL and the compiled SQL in side-by-side tabs.
Select the project and target that you want to compile.
From the command list, select Compile, and then select the execute button (optionally, you can select the down arrow and specify compile command arguments).
In the area below the workspace editor, select the DAG tab.
You can use the DAG pane to visualize your dbt project transformations from source files to materialized data model objects in Snowflake.
Click and drag anywhere in the pane to pan the view.
Use the + and – buttons to zoom in and out.
To view the contents of an object’s source file in the editor, select a tile for any object.
To see compiled SQL in a split-pane view in the workspace editor:
In the DAG, select the tile for a dbt SQL model file; for example, orders.
–OR–
From the workspace file listing, select any file in the models subdirectory of your dbt project to open it in the workspace editor.
Choose View Compiled SQL in the upper-right of the workspace editor to see the compiled SQL in a split-pane view.
Run the dbt dev project and verify the materialized Snowflake objects¶
Executing the dbt run command runs your compiled SQL against the target database and schema using the Snowflake warehouse and role that are specified in the profiles.yml file of the project. In this step, we'll materialize the output of the Dev target in your dbt demo project. You then create a SQL worksheet named dbt_sandbox.sql in the workspace where you can run SQL to verify object creation.
Important
Choosing the dbt Run or Build command for a project from within a workspace materializes target output using both the role defined in the project's profiles.yml file and all secondary roles for your Snowflake user. Both the user and the role specified must have the required privileges to use the warehouse, perform operations on the database and schema that are specified in the project's profiles.yml file, and perform operations on any other Snowflake objects that the dbt model specifies.
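In this tutorial, profiles.yml specifies the ACCOUNTADMIN role, which already has these privileges. If you later substitute a dedicated role, grants along the following lines are typically required; this is a sketch, and the role name my_dbt_role is hypothetical and isn't created anywhere in this tutorial:
GRANT USAGE ON WAREHOUSE tasty_bytes_dbt_wh TO ROLE my_dbt_role;  -- my_dbt_role is a hypothetical role
GRANT USAGE ON DATABASE tasty_bytes_dbt_db TO ROLE my_dbt_role;
GRANT USAGE, CREATE TABLE, CREATE VIEW ON SCHEMA tasty_bytes_dbt_db.dev TO ROLE my_dbt_role;
GRANT USAGE ON DATABASE tb_101 TO ROLE my_dbt_role;               -- plus SELECT on the source schemas that the models read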
From the Profile list, select Dev.
From the command list, select Run, and then select the execute button.
The output pane shows the completion status of the run.
In your tasty_bytes_dbt_demo project, navigate to the examples folder, select the + next to the folder name, and then select SQL File.
Enter dbt_sandbox.sql, and then press Enter.
In the workspace tab for dbt_sandbox.sql, run the following query:
SHOW TABLES IN DATABASE tasty_bytes_dbt_db;
In the Status and Results pane, you should see the tables CUSTOMER_LOYALTY_METRICS, ORDERS, and SALES_METRICS_BY_LOCATION.
To see the views that your dbt project run created, run the following command:
SHOW VIEWS IN DATABASE tasty_bytes_dbt_db;
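As an additional spot check that isn't part of the tutorial script, you can query one of the materialized tables directly. This example assumes the Dev profile, which materializes into the DEV schema:
SELECT * FROM tasty_bytes_dbt_db.dev.orders LIMIT 10;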
Push your file updates from the workspace to your repository¶
Now that you have updated your workspace and compiled, tested, and run your project, you can push the changes you made in the workspace to your GitHub repository.
With your workspace open, select Changes.
The workspace file listing is filtered to show only files that have changed since you synchronized with the Git repository.
A indicates a file that was added in the workspace but not yet added to the Git repository.
M indicates a modified file.
D indicates a deleted file.
Select a file to view its diff with GitHub since the last pull (in this case, when the workspace was created).
On the menu bar above the workspace file listing, verify that the branch selector is set to main for this tutorial.
Select the Push button, and then type a commit message in the box provided; for example, Updating project with initial changes from dbt on Snowflake.
Select Push.
A push to your repository might take several minutes.
Deploy the dbt project object from the workspace¶
Deploying your dbt project from a workspace creates a dbt project object. You can use the object to schedule, run, and monitor a dbt project in Snowflake outside of the workspace.
When you deploy your dbt project object from the workspace to a Snowflake database and schema, you can create or overwrite an object that you previously created.
On the right side of the workspace editor, select Connect » Deploy dbt project.
Select Select database and schema, and then select the TASTY_BYTES_DBT_DB database and the DEV schema.
Under Select or Create dbt Object, select Create dbt Object.
Under Enter Name, type TASTY_BYTES_DBT_PROJECT, and then select Deploy.
The Output tab displays the command that runs on Snowflake, which is similar to the following example:
create or replace DBT PROJECT "TASTY_BYTES_DBT_DB"."DEV"."TASTY_BYTES_DBT_PROJECT" from snow://workspace/USER$MYUSER.PUBLIC."tasty_bytes_dbt_demo"/versions/live/dbt
tasty_bytes_dbt_project successfully created.
The Connect menu now displays the name of the dbt project object that you created, with the following options:
Redeploy dbt project - Updates the dbt project object with the current workspace version of the project by using ALTER. This increments the version of the dbt project object by one. For more information, see Versioning for dbt project objects and files.
Disconnect - Disconnects the workspace from the dbt project object, but doesn’t delete the dbt project object.
View project - Opens the dbt project object in the object explorer, where you can view the CREATE DBT PROJECT command for the dbt project object and run history for the project.
Create schedule - Provides options for you to create a task that runs the dbt project object on a schedule. For more information, see Create a task to schedule dbt project execution.
View schedules - Opens a list of schedules (tasks) that run the dbt project object, with the option to view task details in the object explorer.
To verify the creation of the project, do one or both of the following tasks:
From the menu for the dbt project, select View project to open the dbt project object in the object explorer.
–OR–
From the dbt_sandbox.sql worksheet that you created earlier, run the following command:
SHOW DBT PROJECTS LIKE 'tasty%';
Create a task to schedule dbt project execution¶
Now that you have deployed your dbt project object, you can use the workspace or SQL to set up a task that executes a dbt command on your dbt project object.
The following steps set up a schedule to execute the dbt project every hour at one minute after the hour. The task executes the dbt run command with the --select option to run the customer_loyalty_metrics model in the dbt project.
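If you want to try the operation once before scheduling it, you can run the same command ad hoc against the deployed dbt project object. This statement follows the ARGS pattern used by the generated task (shown at the end of this section) and assumes that you deployed the object to TASTY_BYTES_DBT_DB.DEV as described earlier:
EXECUTE DBT PROJECT tasty_bytes_dbt_db.dev.tasty_bytes_dbt_project
  ARGS='run --select customer_loyalty_metrics --target dev';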
From the dbt project menu in the upper right of the workspace editor, choose Create schedule.
In the Schedule a dbt run dialog box, do the following:
For Schedule name, enter a name for the task; for example, run_prepped_data_dbt.
For Frequency, leave Hourly at 01 for your time zone selected.
Under dbt properties:
For Operation, select run.
For Profile, select dev.
For Additional flags, enter --select customer_loyalty_metrics.
Choose Create.
Snowflake creates a task that runs an EXECUTE DBT PROJECT command using these parameters. For more information about tasks and task options, see Introduction to tasks and CREATE TASK.
From the dbt project menu, select View schedules, and then choose your schedule from the list.
The object explorer opens to your database with the Task Details pane opened for the task. The Task Definition shows a CREATE TASK command similar to the following:
CREATE OR REPLACE TASK tasty_bytes_dbt_db.dev.run_prepped_data_dbt
  WAREHOUSE = tasty_bytes_dbt_wh
  SCHEDULE = 'USING CRON 1 * * * * America/Los_Angeles'
  AS
    EXECUTE DBT PROJECT tasty_bytes_dbt_project ARGS='run --select customer_loyalty_metrics --target dev';
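A task created with CREATE TASK starts in a suspended state until it's resumed. If the schedule doesn't appear to run, you can optionally resume the task and check its recent executions with the TASK_HISTORY table function; these statements aren't generated by the tutorial:
ALTER TASK tasty_bytes_dbt_db.dev.run_prepped_data_dbt RESUME;

SELECT name, state, scheduled_time, completed_time
  FROM TABLE(tasty_bytes_dbt_db.information_schema.task_history(
    task_name => 'RUN_PREPPED_DATA_DBT'))
  ORDER BY scheduled_time DESC;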
Clean up¶
You can delete the databases, workspaces, and warehouse that you created to clean up after this tutorial.
Run the following SQL commands from your dbt_sandbox.sql worksheet to remove the warehouse, the TASTY_BYTES_DBT_DB and TB_101 databases that you created, and all schemas and objects created in the databases:
DROP WAREHOUSE IF EXISTS tasty_bytes_dbt_wh;
DROP DATABASE IF EXISTS tasty_bytes_dbt_db;
DROP DATABASE IF EXISTS tb_101;
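Dropping the databases also removes everything created inside them, including the secret, the network rule (if you created it in the INTEGRATIONS schema), the dbt project object, and the scheduled task. The API integration and the external access integration are account-level objects and remain; if nothing else in your account uses them, you can optionally drop them as well:
DROP API INTEGRATION IF EXISTS tb_dbt_git_api_integration;
DROP EXTERNAL ACCESS INTEGRATION IF EXISTS dbt_ext_access;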
To delete your tasty_bytes_dbt_demo workspace: