Categories:

Query syntax

Working with joins¶

A join combines rows from two tables to create a new combined row that can be used in the query.

Introduction¶

Joins are useful when the data in the tables is related. For example, one table might hold information about projects, and one table might hold information about employees working on those projects.

CREATE TABLE projects (
  project_id INT,
  project_name VARCHAR);

INSERT INTO projects VALUES
  (1000, 'COVID-19 Vaccine'),
  (1001, 'Malaria Vaccine'),
  (1002, 'NewProject');

CREATE TABLE employees (
  employee_id INT,
  employee_name VARCHAR,
  project_id INT);

INSERT INTO employees VALUES
  (10000001, 'Terry Smith', 1000),
  (10000002, 'Maria Inverness', 1000),
  (10000003, 'Pat Wang', 1001),
  (10000004, 'NewEmployee', NULL);
Copy

Query the tables to view the data:

SELECT * FROM projects ORDER BY project_ID;
Copy
+------------+------------------+
| PROJECT_ID | PROJECT_NAME     |
|------------+------------------|
|       1000 | COVID-19 Vaccine |
|       1001 | Malaria Vaccine  |
|       1002 | NewProject       |
+------------+------------------+
SELECT * FROM employees ORDER BY employee_ID;
Copy
+-------------+-----------------+------------+
| EMPLOYEE_ID | EMPLOYEE_NAME   | PROJECT_ID |
|-------------+-----------------+------------|
|    10000001 | Terry Smith     |       1000 |
|    10000002 | Maria Inverness |       1000 |
|    10000003 | Pat Wang        |       1001 |
|    10000004 | NewEmployee     |       NULL |
+-------------+-----------------+------------+

The two joined tables usually contain one or more columns in common so that the rows in one table can be associated with the corresponding rows in the other table. For example, in these sample tables, each row in the projects table has a unique project ID number, and each row in the employees table includes the ID number of the project that the employee is currently assigned to.

The join operation specifies, explicitly or implicitly, how to relate rows in one table to the corresponding rows in the other table, typically by referencing one or more common columns, such as project_id. For example, the following joins the projects and employees tables that were created previously:

SELECT p.project_ID, project_name, employee_ID, employee_name, e.project_ID
  FROM projects AS p JOIN employees AS e
    ON e.project_ID = p.project_ID
  ORDER BY p.project_ID, e.employee_ID;
Copy
+------------+------------------+-------------+-----------------+------------+
| PROJECT_ID | PROJECT_NAME     | EMPLOYEE_ID | EMPLOYEE_NAME   | PROJECT_ID |
|------------+------------------+-------------+-----------------+------------|
|       1000 | COVID-19 Vaccine |    10000001 | Terry Smith     |       1000 |
|       1000 | COVID-19 Vaccine |    10000002 | Maria Inverness |       1000 |
|       1001 | Malaria Vaccine  |    10000003 | Pat Wang        |       1001 |
+------------+------------------+-------------+-----------------+------------+

Although a single join operation can join only two tables, joins can be chained together. The result of a join is a table-like object, and that table-like object can then be joined to another table-like object. Conceptually, the idea is similar to the following; this isn’t the actual syntax:

table1 JOIN (table2 JOIN table3)
Copy

In this pseudo-code, table2 and table3 are joined first. The table that results from that join is then joined with table1.

Joins can be applied not only to tables, but also to other table-like objects. You can join:

  • A table.

  • A view (materialized or non-materialized).

  • A table literal.

  • An expression that evaluates to the equivalent of a table (containing one or more columns and zero or more rows). For example:

    • The result set returned by a table function.

    • The result set returned by a subquery that returns a table.

When this topic refers to joining a table, it generally means joining any table-like object.

Note

Snowflake can improve performance by eliminating unnecessary joins. For more information, see Understanding How Snowflake Can Eliminate Redundant Joins.

Types of joins¶

Snowflake supports the following types of joins:

Note

Snowflake also supports ASOF JOIN for analyzing time-series data. For more information, see ASOF JOIN and Analyzing time-series data.

Inner join¶

An inner join pairs each row in one table with the matching rows in the other table.

The following example shows an inner join:

SELECT p.project_ID, project_name, employee_ID, employee_name, e.project_ID
  FROM projects AS p INNER JOIN employees AS e
    ON e.project_id = p.project_id
  ORDER BY p.project_ID, e.employee_ID;
Copy
+------------+------------------+-------------+-----------------+------------+
| PROJECT_ID | PROJECT_NAME     | EMPLOYEE_ID | EMPLOYEE_NAME   | PROJECT_ID |
|------------+------------------+-------------+-----------------+------------|
|       1000 | COVID-19 Vaccine |    10000001 | Terry Smith     |       1000 |
|       1000 | COVID-19 Vaccine |    10000002 | Maria Inverness |       1000 |
|       1001 | Malaria Vaccine  |    10000003 | Pat Wang        |       1001 |
+------------+------------------+-------------+-----------------+------------+

In this example, the output contains two columns named PROJECT_ID. One PROJECT_ID column is from the projects table, and one is from the employees table. For each row in the output, the values in the two PROJECT_ID columns match because the query specified e.project_id = p.project_id.

The output includes only valid pairs; that is, rows that match the join condition. In this example, there is no row for the project named NewProject, which has no employees assigned yet, or the employee named NewEmployee, who hasn’t been assigned to any projects yet.

Outer join¶

An outer join lists all rows in the specified table, even if those rows have no match in the other table. For example, a left outer join between projects and employees lists all projects, including projects that don’t yet have any employee assigned.

SELECT p.project_name, e.employee_name
  FROM projects AS p LEFT OUTER JOIN employees AS e
    ON e.project_ID = p.project_ID
  ORDER BY p.project_name, e.employee_name;
Copy
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Terry Smith     |
| Malaria Vaccine  | Pat Wang        |
| NewProject       | NULL            |
+------------------+-----------------+

The project named NewProject is included in this output, even though there is no matching row in the employees table. Because there are no matching employee names for the project named NewProject, the employee name is NULL.

A right outer join lists all employees (regardless of project).

SELECT p.project_name, e.employee_name
  FROM projects AS p RIGHT OUTER JOIN employees AS e
    ON e.project_ID = p.project_ID
  ORDER BY p.project_name, e.employee_name;
Copy
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Terry Smith     |
| Malaria Vaccine  | Pat Wang        |
| NULL             | NewEmployee     |
+------------------+-----------------+

A full outer join lists all projects and all employees.

SELECT p.project_name, e.employee_name
  FROM projects AS p FULL OUTER JOIN employees AS e
    ON e.project_ID = p.project_ID
  ORDER BY p.project_name, e.employee_name;
Copy
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Terry Smith     |
| Malaria Vaccine  | Pat Wang        |
| NewProject       | NULL            |
| NULL             | NewEmployee     |
+------------------+-----------------+

Cross join¶

A cross join combines each row in the first table with each row in the second table, creating every possible combination of rows, which is called a Cartesian product. Because most of the result rows contain parts of rows that aren’t actually related, a cross join is rarely useful by itself. In fact, cross joins are usually the result of accidentally omitting the join condition.

The result of a cross join can be very large and expensive. If the first table has N rows and the second table has M rows, then the result is N x M rows. For example, if the first table has 100 rows and the second table has 1000 rows, then the result set contains 100,000 rows.

The following query shows a cross join:

Note

This query contains no ON clause and no filter.

SELECT p.project_name, e.employee_name
  FROM projects AS p CROSS JOIN employees AS e
  ORDER BY p.project_ID, e.employee_ID;
Copy
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Terry Smith     |
| COVID-19 Vaccine | Maria Inverness |
| COVID-19 Vaccine | Pat Wang        |
| COVID-19 Vaccine | NewEmployee     |
| Malaria Vaccine  | Terry Smith     |
| Malaria Vaccine  | Maria Inverness |
| Malaria Vaccine  | Pat Wang        |
| Malaria Vaccine  | NewEmployee     |
| NewProject       | Terry Smith     |
| NewProject       | Maria Inverness |
| NewProject       | Pat Wang        |
| NewProject       | NewEmployee     |
+------------------+-----------------+

You can make the output of a cross join more useful by applying a filter in the WHERE clause:

SELECT p.project_name, e.employee_name
  FROM projects AS p CROSS JOIN employees AS e
  WHERE e.project_ID = p.project_ID
  ORDER BY p.project_ID, e.employee_ID;
Copy
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Terry Smith     |
| COVID-19 Vaccine | Maria Inverness |
| Malaria Vaccine  | Pat Wang        |
+------------------+-----------------+

The result of this cross join and filter is the same as the result of the following inner join:

SELECT p.project_name, e.employee_name
  FROM projects AS p INNER JOIN employees AS e
    ON e.project_ID = p.project_ID
  ORDER BY p.project_ID, e.employee_ID;
Copy
+------------------+-----------------+
| PROJECT_NAME     | EMPLOYEE_NAME   |
|------------------+-----------------|
| COVID-19 Vaccine | Terry Smith     |
| COVID-19 Vaccine | Maria Inverness |
| Malaria Vaccine  | Pat Wang        |
+------------------+-----------------+

Important

Although the two queries in this example produce the same output when they use the same condition (e.project_id = p.project_id) in different clauses (WHERE and FROM ... ON ...), it is possible to construct pairs of queries that use the same condition but that don’t produce the same output.

The most common examples involve outer joins. If you run table1 LEFT OUTER JOIN table2, then for rows in table1 that have no match, the columns that would have come from table2 contain NULL. A filter such as WHERE table2.ID = table1.ID filters out rows in which either table2.id or table1.id contains a NULL, while an explicit outer join in the FROM ... ON ... clause doesn’t filter out rows with NULL values. In other words, an outer join with a filter might not act like an outer join.

Natural join¶

A natural join joins two tables on columns that have the same names and compatible data types. Both the employees and the projects table created previously, have a column named project_ID. A natural join implicitly constructs the ON clause: ON projects.project_ID = employees.project_ID.

If two tables have multiple columns in common, then a natural join uses all of the common columns in the constructed ON clause. For example, if two tables each have columns named city and province, then a natural join constructs the following ON clause:

ON table2.city = table1.city AND table2.province = table1.province
Copy

The output of a natural join includes only one copy of each of the shared columns. For example, the following query produces a natural join that contains all of columns in the two tables, except that it omits all but one copy of the redundant project_id columns:

SELECT *
  FROM projects NATURAL JOIN employees
  ORDER BY employee_id;
Copy
+------------+------------------+-------------+-----------------+
| PROJECT_ID | PROJECT_NAME     | EMPLOYEE_ID | EMPLOYEE_NAME   |
|------------+------------------+-------------+-----------------|
|       1000 | COVID-19 Vaccine |    10000001 | Terry Smith     |
|       1000 | COVID-19 Vaccine |    10000002 | Maria Inverness |
|       1001 | Malaria Vaccine  |    10000003 | Pat Wang        |
+------------+------------------+-------------+-----------------+

You can combine a natural join with an outer join.

You can’t combine a natural join ON clause because the join condition is already implied. However, you can use a WHERE clause to filter the results of a natural join.

Implementing joins¶

Syntactically, there are two ways to join tables:

  • Use the JOIN subclause in the ON subclause of the FROM clause.

  • Use the WHERE clause with the FROM clause.

Snowflake recommends using the ON subclause in the FROM clause because the syntax is more flexible. Also, specifying the predicate in the ON subclause avoids the problem of accidentally filtering rows with NULL values when using a WHERE clause to specify the join condition for an outer join.

In addition, you can use the DIRECTED keyword to enforce the join order of the tables. When you specify this keyword, the first, or left, table is scanned before the second, or right, table. For example, o1 INNER DIRECTED JOIN o2 scans the o1 table before the o2 table. If the DIRECTED keyword is added, the join type — for example, INNER or OUTER — is required. For more information, see JOIN.

Note

Directed join is a preview feature that is available to all accounts.