Snowpark Migration Accelerator: Group By¶
Descrição¶
A cláusula GROUP BY
agrupa linhas com base em expressões especificadas e calcula funções agregadas para cada grupo. O Databricks SQL oferece opções avançadas de agrupamento por meio das cláusulas GROUPING SETS
, CUBE
e ROLLUP
, que permitem várias agregações no mesmo conjunto de dados. Você pode combinar expressões regulares de agrupamento com essas opções avançadas na cláusula GROUP BY
e aninhá-las em GROUPING SETS
. (Referência da linguagem SQL do Databricks GROUP BY)
Agrupa linhas que compartilham os mesmos valores em colunas especificadas e calcula funções agregadas (como SUM, COUNT ou AVG) para cada grupo. A cláusula GROUP BY pode incluir:
O nome de uma coluna
Um número que se refere a uma posição na lista SELECT
Qualquer expressão válida
Extensões:
GROUPBYCUBE, GROUPBYGROUPINGSETS, e GROUPBYROLLUP
Referência da linguagem SQL do Snowflake GROUP BY
Sintaxe¶
GROUP BY ALL
GROUP BY group_expression [, ...] [ WITH ROLLUP | WITH CUBE ]
GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [, ...] ) } [, ...]
grouping_set
{ expression |
( [ expression [, ...] ] ) }
SELECT ...
FROM ...
[ ... ]
GROUP BY groupItem [ , groupItem [ , ... ] ]
[ ... ]
SELECT ...
FROM ...
[ ... ]
GROUP BY ALL
[ ... ]
groupItem ::= { <column_alias> | <position> | <expr> }
SELECT ...
FROM ...
[ ... ]
GROUP BY CUBE ( groupCube [ , groupCube [ , ... ] ] )
[ ... ]
groupCube ::= { <column_alias> | <position> | <expr> }
SELECT ...
FROM ...
[ ... ]
GROUP BY GROUPING SETS ( groupSet [ , groupSet [ , ... ] ] )
[ ... ]
groupSet ::= { <column_alias> | <position> | <expr> }
SELECT ...
FROM ...
[ ... ]
GROUP BY ROLLUP ( groupRollup [ , groupRollup [ , ... ] ] )
[ ... ]
groupRollup ::= { <column_alias> | <position> | <expr> }
Amostra de padrões da origem¶
Dados de configuração¶
Databricks¶
CREATE TEMP VIEW dealer (id, city, car_model, quantity) AS
VALUES (100, 'Fremont', 'Honda Civic', 10),
(100, 'Fremont', 'Honda Accord', 15),
(100, 'Fremont', 'Honda CRV', 7),
(200, 'Dublin', 'Honda Civic', 20),
(200, 'Dublin', 'Honda Accord', 10),
(200, 'Dublin', 'Honda CRV', 3),
(300, 'San Jose', 'Honda Civic', 5),
(300, 'San Jose', 'Honda Accord', 8);
Snowflake¶
CREATE TEMP TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
INSERT INTO dealer VALUES
(100, 'Fremont', 'Honda Civic', 10),
(100, 'Fremont', 'Honda Accord', 15),
(100, 'Fremont', 'Honda CRV', 7),
(200, 'Dublin', 'Honda Civic', 20),
(200, 'Dublin', 'Honda Accord', 10),
(200, 'Dublin', 'Honda CRV', 3),
(300, 'San Jose', 'Honda Civic', 5),
(300, 'San Jose', 'Honda Accord', 8);
Código do padrão¶
Databricks¶
-- 1. Sum of quantity per dealership. Group by `id`.
SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
-- 2. Use column position in GROUP by clause.
SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
-- 3. Multiple aggregations.
-- 3.1. Sum of quantity per dealership.
-- 3.2. Max quantity per dealership.
SELECT id, sum(quantity) AS sum, max(quantity) AS max
FROM dealer GROUP BY id ORDER BY id;
-- 4. Count the number of distinct dealers in cities per car_model.
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
-- 5. Count the number of distinct dealers in cities per car_model, using GROUP BY ALL
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY ALL;
-- 6. Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
SELECT id,
sum(quantity) FILTER (WHERE car_model IN ('Honda Civic', 'Honda CRV')) AS `sum(quantity)`
FROM dealer
GROUP BY id ORDER BY id;
-- 7. Aggregations using multiple sets of grouping columns in a single statement.
-- Following performs aggregations based on four sets of grouping columns.
-- 7.1. city, car_model
-- 7.2. city
-- 7.3. car_model
-- 7.4. Empty grouping set. Returns quantities for all city and car models.
SELECT city, car_model, sum(quantity) AS sum
FROM dealer
GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
ORDER BY city;
-- 8.Group by processing with `ROLLUP` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
SELECT city, car_model, sum(quantity) AS sum
FROM dealer
GROUP BY city, car_model WITH ROLLUP
ORDER BY city, car_model;
-- 9. Group by processing with `CUBE` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
SELECT city, car_model, sum(quantity) AS sum
FROM dealer
GROUP BY city, car_model WITH CUBE
ORDER BY city, car_model;
id |
sum(quantity) |
---|---|
100 |
32 |
200 |
33 |
300 |
13 |
id |
sum(quantity) |
---|---|
100 |
32 |
200 |
33 |
300 |
13 |
id |
sum |
max |
---|---|---|
100 |
32 |
15 |
200 |
33 |
20 |
300 |
13 |
8 |
car_model |
count |
---|---|
Honda Civic |
3 |
Honda CRV |
2 |
Honda Accord |
3 |
car_model |
count |
---|---|
Honda Civic |
3 |
Honda CRV |
2 |
Honda Accord |
3 |
id |
sum(quantity) |
---|---|
100 |
17 |
200 |
23 |
300 |
5 |
city |
car_model |
sum |
---|---|---|
NULL |
Honda Civic |
35 |
NULL |
Honda Accord |
33 |
NULL |
NULL |
78 |
NULL |
Honda CRV |
10 |
Dublin |
Honda Civic |
20 |
Dublin |
NULL |
33 |
Dublin |
Honda CRV |
3 |
Dublin |
Honda Accord |
10 |
Fremont |
Honda Accord |
15 |
Fremont |
Honda Civic |
10 |
Fremont |
NULL |
32 |
Fremont |
Honda CRV |
7 |
São José |
Honda Accord |
8 |
São José |
NULL |
13 |
São José |
Honda Civic |
5 |
city |
car_model |
sum |
---|---|---|
NULL |
NULL |
78 |
Dublin |
NULL |
33 |
Dublin |
Honda Accord |
10 |
Dublin |
Honda CRV |
3 |
Dublin |
Honda Civic |
20 |
Fremont |
NULL |
32 |
Fremont |
Honda Accord |
15 |
Fremont |
Honda CRV |
7 |
Fremont |
Honda Civic |
10 |
São José |
NULL |
13 |
São José |
Honda Accord |
8 |
São José |
Honda Civic |
5 |
city |
car_model |
sum |
---|---|---|
NULL |
NULL |
78 |
NULL |
Honda Accord |
33 |
NULL |
Honda CRV |
10 |
NULL |
Honda Civic |
35 |
Dublin |
NULL |
33 |
Dublin |
Honda Accord |
10 |
Dublin |
Honda CRV |
3 |
Dublin |
Honda Civic |
20 |
Fremont |
NULL |
32 |
Fremont |
Honda Accord |
15 |
Fremont |
Honda CRV |
7 |
Fremont |
Honda Civic |
10 |
São José |
NULL |
13 |
São José |
Honda Accord |
8 |
São José |
Honda Civic |
5 |
Snowflake¶
-- 1. Sum of quantity per dealership. Group by `id`.
SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
-- 2. Use column position in GROUP by clause.
SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
-- 3. Multiple aggregations.
-- 3.1. Sum of quantity per dealership.
-- 3.2. Max quantity per dealership.
SELECT id, sum(quantity) AS sum, max(quantity) AS max
FROM dealer GROUP BY id ORDER BY id;
-- 4. Count the number of distinct dealers in cities per car_model.
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
-- 5. Count the number of distinct dealers in cities per car_model, using GROUP BY ALL
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY ALL;
-- 6. Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
SELECT
id,
SUM(CASE WHEN car_model='Honda Civic' OR car_model='Honda CRV' THEN quantity ELSE NULL END) AS `sum(quantity)`
FROM dealer
GROUP BY id ORDER BY id;
-- 7. Aggregations using multiple sets of grouping columns in a single statement.
-- Following performs aggregations based on four sets of grouping columns.
-- 7.1. city, car_model
-- 7.2. city
-- 7.3. car_model
-- 7.4. Empty grouping set. Returns quantities for all city and car models.
SELECT city, car_model, sum(quantity) AS sum
FROM dealer
GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
ORDER BY city NULLS FIRST;
-- 8. Group by processing with `ROLLUP` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
SELECT city, car_model, sum(quantity) AS sum
FROM dealer
GROUP BY ROLLUP (city, car_model)
ORDER BY city NULLS FIRST, car_model NULLS FIRST;
-- 9. Group by processing with `CUBE` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
SELECT city, car_model, sum(quantity) AS sum
FROM dealer
GROUP BY CUBE (city, car_model)
ORDER BY city NULLS FIRST, car_model NULLS FIRST;
id |
sum(quantity) |
---|---|
100 |
32 |
200 |
33 |
300 |
13 |
id |
sum(quantity) |
---|---|
100 |
32 |
200 |
33 |
300 |
13 |
id |
sum |
max |
---|---|---|
100 |
32 |
15 |
200 |
33 |
20 |
300 |
13 |
8 |
car_model |
count |
---|---|
Honda Civic |
3 |
Honda CRV |
2 |
Honda Accord |
3 |
car_model |
count |
---|---|
Honda Civic |
3 |
Honda CRV |
2 |
Honda Accord |
3 |
id |
sum(quantity) |
---|---|
100 |
17 |
200 |
23 |
300 |
5 |
city |
car_model |
sum |
---|---|---|
NULL |
Honda Civic |
35 |
NULL |
Honda Accord |
33 |
NULL |
NULL |
78 |
NULL |
Honda CRV |
10 |
Dublin |
Honda Civic |
20 |
Dublin |
NULL |
33 |
Dublin |
Honda CRV |
3 |
Dublin |
Honda Accord |
10 |
Fremont |
Honda Accord |
15 |
Fremont |
Honda Civic |
10 |
Fremont |
NULL |
32 |
Fremont |
Honda CRV |
7 |
São José |
Honda Accord |
8 |
São José |
NULL |
13 |
São José |
Honda Civic |
5 |
city |
car_model |
sum |
---|---|---|
NULL |
NULL |
78 |
Dublin |
NULL |
33 |
Dublin |
Honda Accord |
10 |
Dublin |
Honda CRV |
3 |
Dublin |
Honda Civic |
20 |
Fremont |
NULL |
32 |
Fremont |
Honda Accord |
15 |
Fremont |
Honda CRV |
7 |
Fremont |
Honda Civic |
10 |
São José |
NULL |
13 |
São José |
Honda Accord |
8 |
São José |
Honda Civic |
5 |
city |
car_model |
sum |
---|---|---|
NULL |
NULL |
78 |
NULL |
Honda Accord |
33 |
NULL |
Honda CRV |
10 |
NULL |
Honda Civic |
35 |
Dublin |
NULL |
33 |
Dublin |
Honda Accord |
10 |
Dublin |
Honda CRV |
3 |
Dublin |
Honda Civic |
20 |
Fremont |
NULL |
32 |
Fremont |
Honda Accord |
15 |
Fremont |
Honda CRV |
7 |
Fremont |
Honda Civic |
10 |
São José |
NULL |
13 |
São José |
Honda Accord |
8 |
São José |
Honda Civic |
5 |
Problemas conhecidos¶
Não foram encontrados problemas