Categories:

Query syntax

Working with Joins

A join combines rows from two tables to create a new combined row that can be used in the query.

Introduction

Joins are useful when the data in the tables is related. For example, one table might hold information about projects, and one table might hold information about employees working on those projects.

SELECT * FROM projects ORDER BY project_ID;
+------------+------------------+
| PROJECT_ID | PROJECT_NAME     |
|------------+------------------|
|       1000 | COVID-19 Vaccine |
|       1001 | Malaria Vaccine  |
|       1002 | NewProject       |
+------------+------------------+
Copy
SELECT * FROM employees ORDER BY employee_ID;
+-------------+-----------------+------------+
| EMPLOYEE_ID | EMPLOYEE_NAME   | PROJECT_ID |
|-------------+-----------------+------------|
|    10000001 | Terry Smith     | 1000       |
|    10000002 | Maria Inverness | 1000       |
|    10000003 | Pat Wang        | 1001       |
|    10000004 | NewEmployee     | NULL       |
+-------------+-----------------+------------+
Copy

The two joined tables usually contain one or more columns in common so that the rows in one table can be associated with the corresponding rows in the other table. For example, each row in the projects table might have a unique project ID number, and each row in the employees table might include the ID number of the project that the employee is currently assigned to.

The join operation specifies (explicitly or implicitly) how to relate rows in one table to the corresponding rows in the other table, typically by referencing the common column(s), such as project ID. For example, the following joins the project and employee tables shown above:

SELECT p.project_ID, project_name, employee_ID, employee_name, e.project_ID
    FROM projects AS p JOIN employees AS e
        ON e.project_ID = p.project_ID
    ORDER BY p.project_ID, e.employee_ID;
+------------+------------------+-------------+-----------------+------------+
| PROJECT_ID | PROJECT_NAME     | EMPLOYEE_ID | EMPLOYEE_NAME   | PROJECT_ID |
|------------+------------------+-------------+-----------------+------------|
|       1000 | COVID-19 Vaccine |    10000001 | Terry Smith     | 1000       |
|       1000 | COVID-19 Vaccine |    10000002 | Maria Inverness | 1000       |
|       1001 | Malaria Vaccine  |    10000003 | Pat Wang        | 1001       |
+------------+------------------+-------------+-----------------+------------+
Copy

Although a single join operation can join only two tables, joins can be chained together. The result of a join is a table-like object, and that table-like object can then be joined to another table-like object. Conceptually, the idea is similar to the following (this is not the actual syntax):

table1 join (table2 join table 3)
Copy

In this pseudo-code, table2 and table3 are joined first. The table that results from that join is then joined with table1.

Joins can be applied not only to tables, but also to other table-like objects. You can join:

  • A table.

  • A view (materialized or non-materialized).

  • A table literal.

  • An expression that evaluates to the equivalent of a table (containing one or more columns and zero or more rows). For example:

    • The result set returned by a table function.

    • The result set returned by a subquery that returns a table.

When this topic refers to joining a table, it generally means joining any table-like object.

Note

Snowflake can improve performance by eliminating unnecessary joins. For details, see Understanding How Snowflake Can Eliminate Redundant Joins.

Types of Joins

Snowflake supports the following types of joins:

  • Inner join.

  • Outer join.

  • Cross join.

  • Natural join.

  • ASOF JOIN.

Inner Join

An inner join pairs each row in one table with the matching row(s) in the other table.

The example below uses an inner join:

SELECT p.project_ID, project_name, employee_ID, employee_name, e.project_ID
    FROM projects AS p INNER JOIN employees AS e
        ON e.project_id = p.project_id
    ORDER BY p.project_ID, e.employee_ID;
+------------+------------------+-------------+-----------------+------------+
| PROJECT_ID | PROJECT_NAME     | EMPLOYEE_ID | EMPLOYEE_NAME   | PROJECT_ID |
|------------+------------------+-------------+-----------------+------------|
|       1000 | COVID-19 Vaccine |    10000001 | Terry Smith     | 1000       |
|       1000 | COVID-19 Vaccine |    10000002 | Maria Inverness | 1000       |
|       1001 | Malaria Vaccine  |    10000003 | Pat Wang        | 1001       |
+------------+------------------+-------------+-----------------+------------+
Copy

In this example, the output table contains two columns named “Project_ID”. One Project_ID column is from the projects table, and one is from the employees table. For each row in the output table, the values in the two Project_ID columns match because the query specified e.project_id = p.project_id.

The output includes only valid pairs (i.e. rows that match the join condition). In this example there is no row for the project named “NewProject” (which has no employees assigned yet) or the employee named “NewEmployee” (who hasn’t been assigned to any projects yet).

Outer Join

An outer join lists all rows in the specified table, even if those rows have no match in the other table. For example, a left outer join between projects and employees lists all projects, including projects that do not yet have any employee assigned.

SELECT p.project_name, e.employee_name
    FROM projects AS p LEFT OUTER JOIN employees AS e
        ON e.project_ID = p.project_ID
    ORDER BY p.project_name, e.employee_name;
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Terry Smith     |
| Malaria Vaccine  | Pat Wang        |
| NewProject       | NULL            |
+------------------+-----------------+
Copy

The project named “NewProject” is included in this output even though there is no matching row in the employees table. Because there are no matching employee names for the project named “NewProject”, the employee name is set to NULL.

A right outer join lists all employees (regardless of project).

SELECT p.project_name, e.employee_name
    FROM projects AS p RIGHT OUTER JOIN employees AS e
        ON e.project_ID = p.project_ID
    ORDER BY p.project_name, e.employee_name;
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Terry Smith     |
| Malaria Vaccine  | Pat Wang        |
| NULL             | NewEmployee     |
+------------------+-----------------+
Copy

A full outer join lists all projects and all employees.

SELECT p.project_name, e.employee_name
    FROM projects AS p FULL OUTER JOIN employees AS e
        ON e.project_ID = p.project_ID
    ORDER BY p.project_name, e.employee_name;
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Terry Smith     |
| Malaria Vaccine  | Pat Wang        |
| NewProject       | NULL            |
| NULL             | NewEmployee     |
+------------------+-----------------+
Copy

Cross Join

A cross join combines each row in the first table with each row in the second table, creating every possible combination of rows (called a “Cartesian product”). Because most of the result rows contain parts of rows that are not actually related, a cross join is rarely useful by itself. In fact, cross joins are usually the result of accidentally omitting the join condition.

The result of a cross join can be very large (and expensive). If the first table has N rows and the second table has M rows, then the result is N x M rows. For example, if the first table has 100 rows and the second table has 1000 rows, then the result set contains 100,000 rows.

The following query shows a cross join:

SELECT p.project_name, e.employee_name
    FROM projects AS p CROSS JOIN employees AS e
    ORDER BY p.project_ID, e.employee_ID;
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Terry Smith     |
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Pat Wang        |
| COVID-19 Vaccine | NewEmployee     |
| Malaria Vaccine  | Terry Smith     |
| Malaria Vaccine  | Maria Inverness |
| Malaria Vaccine  | Pat Wang        |
| Malaria Vaccine  | NewEmployee     |
| NewProject       | Terry Smith     |
| NewProject       | Maria Inverness |
| NewProject       | Pat Wang        |
| NewProject       | NewEmployee     |
+------------------+-----------------+
Copy

Note that this query contains no ON clause and no filter.

The output of a cross join can be made more useful by applying a filter in the WHERE clause:

SELECT p.project_name, e.employee_name
    FROM projects AS p CROSS JOIN employees AS e
    WHERE e.project_ID = p.project_ID
    ORDER BY p.project_ID, e.employee_ID;
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Terry Smith     |
| COVID-19 Vaccine | Maria Inverness |
| Malaria Vaccine  | Pat Wang        |
+------------------+-----------------+
Copy

The result of this cross join and filter is the same as the result of the following inner join:

SELECT p.project_name, e.employee_name
    FROM projects AS p INNER JOIN employees AS e
        ON e.project_ID = p.project_ID
    ORDER BY p.project_ID, e.employee_ID;
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Terry Smith     |
| COVID-19 Vaccine | Maria Inverness |
| Malaria Vaccine  | Pat Wang        |
+------------------+-----------------+
Copy

Important

Although the two queries in this example produce the same output when they use the same condition (e.project_id = p.project_id) in different clauses (WHERE vs. FROM ... ON ...), it is possible to construct pairs of queries that use the same condition but that do not produce the same output.

The most common examples involve outer joins. If you execute table1 LEFT OUTER JOIN table2, then for rows in table1 that have no match, the columns that would have come from table2 contain NULL. A filter like WHERE table2.ID = table1.ID filters out rows in which either table2.id or table1.id contains a NULL, while an explicit outer join in the FROM ... ON ... clause does not filter out rows with NULL values. In other words, an outer join with a filter might not actually act like an outer join.

Natural Join

A natural join is used when two tables contain columns that have the same name and in which the data in those columns corresponds. In the employees and projects tables shown above, both tables have columns named “project_ID”. A natural join implicitly constructs the ON clause: ON projects.project_ID = employees.project_ID.

If two tables have multiple columns in common, then all the common columns are used in the ON clause. For example, if you had two tables that each had columns named “city” and “province”, then a natural join would construct the following ON clause:

ON table2.city = table1.city AND table2.province = table1.province.

The output of a natural join includes only one copy of each of the shared columns. For example, the following query produces a natural join containing all columns in the two tables, except that it omits all but one copy of the redundant project_ID column:

SELECT *
    FROM projects NATURAL JOIN employees
    ORDER BY employee_ID;
+------------+------------------+-------------+-----------------+
| PROJECT_ID | PROJECT_NAME     | EMPLOYEE_ID | EMPLOYEE_NAME   |
|------------+------------------+-------------+-----------------|
|       1000 | COVID-19 Vaccine |    10000001 | Terry Smith     |
|       1000 | COVID-19 Vaccine |    10000002 | Maria Inverness |
|       1001 | Malaria Vaccine  |    10000003 | Pat Wang        |
+------------+------------------+-------------+-----------------+
Copy

A natural join can be combined with an outer join.

A natural join cannot be combined with an ON clause because the join condition is already implied. However, you can use a WHERE clause to filter the results of a natural join.

Implementing Joins

Syntactically, there are two ways to join tables:

  • Use the JOIN operator in the ON sub-clause of the FROM clause.

  • Use the WHERE with the FROM clause.

Snowflake recommends using the ON sub-clause in the FROM clause. The syntax is more flexible. And specifying the predicate in the ON clause avoids the problem of accidentally filtering rows with NULLs when using a WHERE clause to specify the join condition for an outer join.