Using Snowflake Copilot

This topic gives an introduction to what Snowflake Copilot is and how to use it in your data analysis workflow.

Introduction

Snowflake Copilot is an LLM-powered assistant that simplifies data analysis while maintaining robust data governance, and seamlessly integrates into your existing Snowflake workflow.

Snowflake Copilot is powered by a model fine-tuned by Snowflake that runs securely inside Snowflake Cortex, Snowflake’s intelligent, fully managed AI service. This approach means that your enterprise data and metadata always stay securely inside Snowflake. Snowflake Copilot also fully respects RBAC and provides suggestions based only on the datasets that you can access.

Snowflake Copilot uses natural language requests to enable data analysis from start to finish. To start, Copilot can help answer questions about how your data is structured and guide you in exploring a new dataset. You can then ask Copilot to generate and refine SQL queries to extract useful information from your data. Snowflake Copilot can even help improve your SQL query by recommending optimizations or suggesting fixes for possible issues.

Snowflake Copilot can also help improve your SQL fluency or understanding of Snowflake features. Ask questions about how to perform a task in Snowflake and Copilot will return answers based on the Snowflake documentation.

You can interact with Copilot in SQL Worksheets in Snowsight. Using the Copilot panel, you can enter a question, and Snowflake Copilot will reply with an answer. You can run suggested SQL queries in your worksheet.

Supported use cases

  • Explore your data by asking open-ended questions to learn about the structure and nuances of a new dataset.

  • Generate SQL queries with questions in plain English.

  • Try out the SQL query suggested by Snowflake Copilot with the click of a button. You can also edit the query before running it.

  • Build complex queries through a conversation with Snowflake Copilot by asking follow-up questions to refine the suggested SQL query and dig deeper into the analysis.

  • Learn about Snowflake by asking questions about Snowflake concepts, capabilities, and features.

  • Improve your queries by asking Snowflake Copilot to help you assess query efficiency, find optimizations, or explain what the query does.

  • Provide feedback (thumbs up or thumbs down) on each response from Snowflake Copilot, which will be used to improve the product.

Limitations

  • Limited language support: Today, only English and SQL are supported.

  • No access to your data: Snowflake Copilot does not have access to the data inside your tables. If you want to filter on a particular value of a column, you should provide that value. For example, if you ask Snowflake Copilot to return all rows with a column A value equal to “X”, you should provide the value “X” in your request. See the Construct and run a SQL Statement example.

  • Cross database or schema queries are not supported: You can work around this by creating and using views that join data from different schemas and databases.

  • Delayed response: Snowflake Copilot might take a second to complete a response, depending on the length of the response provided.

  • SQL suggestions may not always work: Snowflake Copilot may sometimes suggest queries that contain invalid SQL syntax or non-existent tables or columns. Please provide feedback using the thumbs up or thumbs down buttons for the particular response. This feedback helps us improve this feature.

  • Delay in detecting new databases, schemas, and tables: It may take up to 3-4 hours for Snowflake Copilot to recognize newly created databases, schemas, and tables.

  • Limited number of tables and columns considered: To generate a response, Snowflake Copilot first searches for tables and columns most relevant for your request. The search results are then ranked by relevancy and only the top 10 tables and top 10 columns from each of those tables in the results are considered when generating a response.

How to use Snowflake Copilot

Snowflake Copilot is ready to use with no additional setup. Remember the following points when using Snowflake Copilot:

  • Each chat session with Snowflake Copilot is associated with a particular worksheet. Opening a new worksheet opens a new chat session.

  • You must have a database and schema in use during your session to use Snowflake Copilot. Copilot uses them to generate relevant responses.

  • Snowflake Copilot uses the names of your databases, schemas, tables, and columns and also the data types of your columns to determine what data is available to query.

  • If Snowflake Copilot cannot answer your question based on the selected database and schema, it may try to use other ways to answer, such as the Snowflake documentation or general SQL knowledge. If you get an unexpected response, you can leave feedback using the thumbs up and thumbs down buttons.

  • If you need to refer to a table name or a column name in your question, prefix the name with @. Referring to specific tables and columns can help Snowflake Copilot provide more accurate responses.

  • For optimal performance, use meaningful names for databases, schemas, tables, and columns, and ensure that columns are assigned the appropriate data type.

Follow these steps to start using Snowflake Copilot:

  1. Create a new worksheet or open an existing worksheet.

  2. Select Ask Copilot in the lower-right corner of the worksheet. The Snowflake Copilot panel opens on the right side of the worksheet.

  3. Make sure a database and a schema are selected for the current worksheet. If not, you can select them by using either the selector on the top of the worksheet or the selector below the Snowflake Copilot message box.

  4. In the message box, type in your question and then select the send icon or press Enter to submit it. Snowflake Copilot provides a response in the panel.

  5. If the response from Snowflake Copilot includes SQL statements:

    • Select Run to run the query. This adds the query to your worksheet and runs it.

    • Select Add to edit the query before running it. This adds the query to your worksheet.

Examples

The following sections provide examples that demonstrate how to:

These examples use a sample dataset from the Snowflake Marketplace.

Prerequisites

The examples in this section use the Cybersyn Github Archive dataset from the Snowflake Marketplace:

  1. Install the Cybersyn Github Archive dataset in your account.

  2. Create a new worksheet or open an existing worksheet.

  3. Select Ask Copilot in the lower-right corner of the worksheet.

  4. Select the Cybersyn Github Archive database and schema.

Explore your data

The following example demonstrates how to use Snowflake Copilot to explore a dataset.

  1. Enter an open-ended question such as “What types of questions can I ask about this dataset?”

  2. Press Enter and Snowflake Copilot will generate a response based on the database and schema you’ve selected.

  3. Ask further clarifying questions about the data, such as “What type of events can I filter by?” or “Are any of these tables joinable?”

  4. If the response from Snowflake Copilot includes a SQL statement, you can select Add to add the query to the end of your worksheet and edit it before running or select Run to add the query and run it automatically.

Construct and run a SQL Statement

The following example demonstrates how to use Snowflake Copilot to generate SQL queries.

  1. Enter the question “How many stars were given in the past year?” in the Snowflake Copilot message box, and press Enter. Snowflake Copilot responds with a SQL query that answers your question.

  2. Select Add to add the query to the end of your worksheet.

  3. Enter the question “Show me this for each month,” and press Enter. Snowflake Copilot responds with a SQL query that answers your question.

  4. Select Run to add the query to your worksheet and run the query.

Snowflake Copilot does not have access to the data inside your tables. If you want Snowflake Copilot to construct a SQL statement that filters based on a specific value of a column, you must provide the value to filter on.

  1. Enter the question “what are all the repo names that start with ‘snowflake’?” in the message box and press Enter. Snowflake Copilot responds with a SQL query that uses the filter value you provided.

  2. Select Add to edit the query before running or select Run to add the query to your worksheet and run it.

Explain a SQL statement

The following example demonstrates how to use Snowflake Copilot to explain a SQL statement you’re working on.

  • In the Snowflake Copilot message box, type the following question and SQL query:

    Can you explain this query to me step-by-step?
    
    Copy
    SELECT
      github_repos.repo_name,
      COUNT(github_stars.repo_id) AS total_stars
    FROM
      github_repos
      JOIN github_stars ON github_repos.repo_id = github_stars.repo_id
    GROUP BY
      github_repos.repo_name
    ORDER BY
      total_stars DESC;
    
    Copy

Snowflake Copilot responds with a step-by-step explanation of the provided query.

Ask questions about SQL and Snowflake

Snowflake Copilot has access to Snowflake documentation and can answer general questions about Snowflake or SQL. Here are some example questions you can try:

  • How do I write a SQL join?

  • What is Snowpark Cortex?

  • How do I ingest data into Snowflake?

Tips for using Snowflake Copilot

  • Creating curated views can significantly improve the performance of Snowflake Copilot.

    Follow these guidelines when creating the views:

    Guideline

    Example

    Use descriptive and easy-to-understand names for the views and their columns.

    When choosing the names, use the business and data taxonomy you are likely to use while using Snowflake Copilot.

    If a column contains the date for a specific sale, name the column sale_date.

    Make sure all columns have the appropriate data type.

    If a column contains the date for a specific sale, make sure it has the DATE type.

    Define commonly used metrics/expressions as new columns.

    If profit is defined as revenue - cost, create a column (revenue - cost) AS profit in your view.

    If possible, capture common and complex joins.

    If two tables products and sales are often joined, make sure that your view joins these tables.

    If there are multiple join paths between commonly joined tables, use the preferred join path in your view.

  • Be as specific as possible when you ask a question. Imagine that you are asking a question to a human who may have limited knowledge of your data.

  • If you want to filter on specific values inside columns, you might need to actively guide Snowflake Copilot. You can ask Snowflake Copilot for a query that returns all the distinct values in a column.

Costs

Snowflake Copilot will be free until May 31, 2024. Details on pricing and billing are coming soon.