Categories:

Query Syntax

Working with Joins

A join combines rows from two tables to create a new combined row that can be used in the query.

In this Topic:

Introduction

When two tables are joined, the data in the two tables is related in some way. For example, one table might hold information about projects, and one table might hold information about the employees working on those projects.

+------------+------------------+
| Project_ID | Project_Name     |
+------------+------------------+
|       1000 | COVID-19 Vaccine |
|       1001 | Malaria Vaccine  |
|       1002 | NewProject       |
+------------+------------------+

+-------------+------------------+------------+
| Employee_ID | Employee_Name    | Project_ID |
+-------------+------------------+------------+
|    10000001 | Terry Smith      |       1000 |
|    10000002 | Maria Inverness  |       1000 |
|    10000003 | Pat Wang         |       1001 |
|    99999999 | NewEmployee      |       NULL |
+-------------+------------------+------------+

The two joined tables usually contain one or more columns in common so that the rows in one table can be associated with the corresponding rows in the other table. For example, each row in the projects table might have a unique project ID number, and each row in the employees table might include the ID number of the project that the employee is currently assigned to.

The join operation specifies (explicitly or implicitly) how to relate rows in one table to the corresponding rows in the other table, typically by referencing the common column(s), such as project ID. For example, the following joins the project and employee tables shown above:

SELECT *
    FROM projects AS p JOIN employees AS e
        ON e.project_ID = p.project_ID;

Although a single join operation can join only two tables, joins can be chained together. The result of a join is a table-like object, and that table-like object can then be joined to another table-like object. Conceptually, the idea is similar to the following (this is not the actual syntax):

table1 join (table2 join table 3)

In this pseudo-code, table2 and table3 are joined first. The table that results from that join is then joined with table1.

Joins can be applied not only to tables, but also to other table-like objects. You can join:

  • A table.

  • A view (materialized or non-materialized).

  • A table literal.

  • An expression that evaluates to the equivalent of a table (containing one or more columns and zero or more rows). For example:

    • The result set returned by a table function.

    • The result set returned by a subquery that returns a table.

When this topic refers to joining a table, it generally means joining any table-like object.

Types of Joins

Snowflake supports the following types of joins:

  • Inner join.

  • Outer join.

  • Cross join.

  • Natural join.

Inner Join

Each row in one table is paired with the matching row(s) in the other table.

Suppose that you use the tables shown above and the query shown below:

SELECT *
    FROM projects AS p INNER JOIN employees AS e
        ON e.project_id = p.project_id;

The result would be similar to:

+-------------+------------------+------------+------------+------------------+
| Employee_ID | Employee_Name    | Project_ID | Project_ID | Project_name     |
+-------------+------------------+------------+------------+------------------+
|    10000001 | Terry Smith      |       1000 |       1000 | COVID-19 Vaccine |
|    10000002 | Maria Inverness  |       1001 |       1001 | Malaria Vaccine  |
|    10000003 | Pat Wang         |       1001 |       1001 | Malaria Vaccine  |
+-------------+------------------+------------+------------+------------------+

In this example, the output table contains two columns named “Project_ID”. One Project_ID column is from the projects table, and one is from the employees table. For each row in the output table, the values in the two Project_ID columns match because the query specified e.project_id = p.project_id.

The output includes only valid pairs, i.e. rows that match the join condition. In this example there is no row for the new project (which has no employees assigned yet) or the new employee (who hasn’t been assigned to any projects yet).

Outer Join

An outer join lists all rows in the specified table, even if those rows have no match in the other table. For example, a left outer join between projects and employees would list all projects, including projects that did not yet have any employee assigned.

SELECT p.project_name, e.employee_name
    FROM projects AS p LEFT OUTER JOIN employees AS e
        ON e.project_ID = p.project_ID;

The result would be similar to:

+------------------+------------------+
| Project_name     | Employee_Name    |
+------------------+------------------+
| COVID-19 Vaccine | Terry Smith      |
| Malaria Vaccine  | Maria Inverness  |
| Malaria Vaccine  | Pat Wang         |
| New Project      | NULL             |
+------------------+------------------+

The project named “New Project” is included in this output even though there is no matching row in the employees table. Because there are no matching employee names for the project named “New Project”, the employee name is set to NULL.

A right outer join would lists all employees (regardless of project).

A full outer join lists all projects and all employees. The output would look similar to the following:

+------------------+------------------+
| Project_name     | Employee_Name    |
+------------------+------------------+
| COVID-19 Vaccine | Terry Smith      |
| Malaria Vaccine  | Maria Inverness  |
| Malaria Vaccine  | Pat Wang         |
| New Project      | NULL             |
| NULL             | NewEmployee      |
+------------------+------------------+

Cross Join

A cross join combines each row in the first table with each row in the second table, creating every possible combination of rows (called a “Cartesian Product”). Because most of the result rows contain parts of rows that are not actually related, a cross join is rarely useful by itself. In fact, cross joins are usually the result of accidentally omitting the join condition.

The result of a cross join can be very large (and expensive). If the first table has N rows and the second table has M rows, then the result is N x M rows. For example, if the first table has 100 rows and the second table has 1000 rows, then the result set contains 100,000 rows.

The following query shows a cross join:

SELECT p.project_name, e.employee_name
    FROM projects AS p CROSS JOIN employees AS e;

Note that this query contains no ON clause and no filter.

The output of a cross join can be made more useful by applying a filter in the WHERE clause:

SELECT p.project_name, e.employee_name
    FROM projects AS p INNER JOIN employees AS e
    WHERE e.project_ID = p.project_ID;

The result of this cross join and filter is the same as the result of the following inner join:

SELECT p.project_name, e.employee_name
    FROM projects AS p INNER JOIN employees AS e
        ON e.project_ID = p.project_ID;

Important

Although the two queries in this example produce the same output when they use the same condition (e.project_id = p.project_id) in different clauses (WHERE vs. FROM ... ON ...), it is possible to construct pairs of queries that use the same condition but that do not produce the same output.

The most common examples involve outer joins. If you execute table1 LEFT OUTER JOIN table2, then for rows in table1 that have no match, the columns that would have come from table2 contain NULL. A filter like WHERE table2.ID = table1.ID filters out rows in which either table2.id or table1.id contains a NULL, while an explicit outer join in the FROM ... ON ... clause does not filter out rows with NULL values. In other words, an outer join with a filter might not actually act like an outer join.

Natural Join

A natural join is used when two tables contain columns that have the same name and in which the data in those columns corresponds. For example, suppose that you have two tables, each of which contains a column named “City” and a column named “Province”. A natural join implicitly constructs the ON clause ON table2.city = table1.city AND table2.province = table1.province.

A natural join also automatically includes only one copy of each of those columns in the output.

For example, the following query would produce a natural join containing all columns in the two tables, except that it would omit the redundant city and province information from one of the tables:

SELECT *
    FROM geographic_data_by_city_and_province NATURAL JOIN demographic_data_by_city_and_province;

A natural join can be combined with an outer join.

A natural join cannot be combined with an ON clause because the join condition is already implied. However, you can use a WHERE clause to filter the results of a natural join.

Implementing Joins

Syntactically, there are two ways to join tables:

  • Use the JOIN operator in the ON sub-clause of the FROM clause.

  • Use the WHERE with the FROM clause.

Snowflake recommends using the FROM clause. The syntax is more flexible. And specifying the predicate in the ON clause avoids the problem of accidentally filtering rows with NULLs when using a WHERE clause to specify the join condition for an outer join.