- Categories:
GROUP BY GROUPING SETS¶
GROUP BY GROUPING SETS is a powerful extension of the GROUP BY clause that computes multiple group-by clauses in a single statement. The group set is a set of dimension columns.
GROUP BY GROUPING SETS is equivalent to the UNION
of two or more GROUP BY operations in the same result set:
GROUP BY GROUPING SETS(a)
is equivalent to the single grouping set operationGROUP BY a
.GROUP BY GROUPING SETS(a,b)
is equivalent toGROUP BY a UNION ALL GROUP BY b
.
Syntax¶
SELECT ...
FROM ...
[ ... ]
GROUP BY GROUPING SETS ( groupSet [ , groupSet [ , ... ] ] )
[ ... ]
Where:
groupSet ::= { <column_alias> | <position> | <expr> }
Parameters¶
Usage notes¶
Snowflake allows up to 128 grouping sets in the same query block.
The output typically contains some NULL values. Because
GROUP BY ROLLUP
merges the results of two or more result sets, each of which was grouped by different criteria, some columns that have a single value in one result set might have many corresponding values in the other result set. For example, if you do aUNION
of a set of employees grouped by department with a set grouped by seniority, the members of the set with the greatest seniority are not necessarily all in the same department, so the value of department_name is set to NULL. The following examples contain NULLs for this reason.
Examples¶
These examples use a table of information about nurses who are trained to assist in disasters. All of these nurses have a license as nurses (e.g. an RN has a license as a “Registered Nurse”), and an additional license in a disaster-related specialty, such as search and rescue, radio communications, etc. This example simplifies and uses just two categories of licenses:
Nursing: RN (Registered Nurse) and LVN (Licensed Vocational Nurse).
Amateur (“ham”) Radio: Ham radio licenses include “Technician”, “General”, and “Amateur Extra”.
Here are the commands to create and load the table:
CREATE or replace TABLE nurses ( ID INTEGER, full_name VARCHAR, medical_license VARCHAR, -- LVN, RN, etc. radio_license VARCHAR -- Technician, General, Amateur Extra ) ; INSERT INTO nurses (ID, full_name, medical_license, radio_license) VALUES (201, 'Thomas Leonard Vicente', 'LVN', 'Technician'), (202, 'Tamara Lolita VanZant', 'LVN', 'Technician'), (341, 'Georgeann Linda Vente', 'LVN', 'General'), (471, 'Andrea Renee Nouveau', 'RN', 'Amateur Extra') ;
This query uses GROUP BY GROUPING SETS
:
SELECT COUNT(*), medical_license, radio_license FROM nurses GROUP BY GROUPING SETS (medical_license, radio_license);Output:
The first two rows show the count of RNs and LVNs (two types of nursing licenses). The NULL values in the RADIO_LICENSE column for those two rows are deliberate; the query grouped all of the LVNs together (and all the RNs together) regardless of their radio license, so the results can’t show one value in the RADIO_LICENSE column for each row that necessarily applies to all the LVNs or RNs grouped in that row.
The next three rows show the number of nurses with each type of ham radio license (“Technician”, “General”, and “Amateur Extra”). The NULL value for MEDICAL_LICENSE in each of those three rows is deliberate because no single medical license necessarily applies to all members of each of those rows.
+----------+-----------------+---------------+ | COUNT(*) | MEDICAL_LICENSE | RADIO_LICENSE | |----------+-----------------+---------------| | 3 | LVN | NULL | | 1 | RN | NULL | | 2 | NULL | Technician | | 1 | NULL | General | | 1 | NULL | Amateur Extra | +----------+-----------------+---------------+
The next example shows what happens when some columns contain NULL values. Start by adding three new nurses who don’t yet have ham radio licenses.
INSERT INTO nurses (ID, full_name, medical_license, radio_license) VALUES (101, 'Lily Vine', 'LVN', NULL), (102, 'Larry Vancouver', 'LVN', NULL), (172, 'Rhonda Nova', 'RN', NULL) ;Then run the same query as before:
SELECT COUNT(*), medical_license, radio_license FROM nurses GROUP BY GROUPING SETS (medical_license, radio_license);Output:
The first 5 lines are the same as in the previous query.
The last line might be confusing at first – why is there a line that has NULL in both columns? And if all the values are NULL, why is the COUNT(*) equal to 3?
The answer is that the NULL in the RADIO_LICENSE column of that row occurs because three nurses don’t have any radio license. (“SELECT DISTINCT RADIO_LICENSE FROM nurses” now returns four distinct values: “Technician”, “General”, “Amateur Extra”, and “NULL”.)
The NULL in the MEDICAL_LICENSES column occurs for the same reason that NULL values occur in the earlier query results: the nurses counted in this row have different MEDICAL_LICENSES, so no one value (“RN” or “LVN”) necessarily applies to all of the nurses counted in this row.
+----------+-----------------+---------------+ | COUNT(*) | MEDICAL_LICENSE | RADIO_LICENSE | |----------+-----------------+---------------| | 5 | LVN | NULL | | 2 | RN | NULL | | 2 | NULL | Technician | | 1 | NULL | General | | 1 | NULL | Amateur Extra | | 3 | NULL | NULL | +----------+-----------------+---------------+
If you’d like, you can compare this output to the output of a GROUP BY
without the GROUPING SETS
clause:
SELECT COUNT(*), medical_license, radio_license FROM nurses GROUP BY medical_license, radio_license;Output:
+----------+-----------------+---------------+ | COUNT(*) | MEDICAL_LICENSE | RADIO_LICENSE | |----------+-----------------+---------------| | 2 | LVN | Technician | | 1 | LVN | General | | 1 | RN | Amateur Extra | | 2 | LVN | NULL | | 1 | RN | NULL | +----------+-----------------+---------------+