SELECT¶

Applies to
  • Hive SQL

  • Spark SQL

  • Databricks SQL

Description¶

Spark supports a SELECT statement and conforms to the ANSI SQL standard. Queries are used to retrieve result sets from one or more tables. (Spark SQL Language Reference SELECT)

Warning

This grammar is partially supported in Snowflake. Translation pending for these CREATE VIEW elements:

[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
[ CLUSTER BY { expression [ , ... ] } ]
[ DISTRIBUTE BY { expression [, ... ] } ]
[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
[ PIVOT clause ]
[ UNPIVOT clause ]
[ LATERAL VIEW clause ] [ ... ]
[ regex_column_names ]
[ TRANSFORM (...) ]
[ LIMIT non_literal_expression ]

from_item :=
join_relation
table_value_function
LATERAL(subquery)
file_format.`file_path`

select_statement { INTERSECT | EXCEPT } { ALL | DISTINCT } select_statement
Copy

Grammar Syntax¶

[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
    [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
    [ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
    [ CLUSTER BY { expression [ , ... ] } ]
    [ DISTRIBUTE BY { expression [, ... ] } ]
    [ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
    [ LIMIT { ALL | expression } ]

select_statement :=
SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...) ] }
    FROM { from_item [ , ... ] }
    [ PIVOT clause ]
    [ UNPIVOT clause ]
    [ LATERAL VIEW clause ] [ ... ] 
    [ WHERE boolean_expression ]
    [ GROUP BY expression [ , ... ] ]
    [ HAVING boolean_expression ]
    
with_query :=
expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( query )

from_item :=
table_relation |
join_relation |
table_value_function |
inline_table |
LATERAL(subquery) |
file_format.`file_path` 
Copy

Sample Source Patterns¶

GROUP BY¶

The WITH { CUBE | ROLLUP } syntax is transformed to its CUBE(expr1, ...) or ROLLUP(expr1, ...) equivalent

Input Code:¶

-- Basic case of GROUP BY
SELECT id, sum(quantity) FROM dealer GROUP BY 1;

-- Grouping by GROUPING SETS
SELECT city, car_model, sum(quantity) AS sum FROM dealer
    GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ());

-- Grouping by ROLLUP
SELECT city, car_model, sum(quantity) AS sum FROM dealer
    GROUP BY ROLLUP(city, car_model);

SELECT city, car_model, sum(quantity) AS sum FROM dealer
    GROUP BY city, car_model WITH ROLLUP;

-- Grouping by CUBE
SELECT city, car_model, sum(quantity) AS sum FROM dealer
    GROUP BY CUBE(city, car_model);

SELECT city, car_model, sum(quantity) AS sum FROM dealer
    GROUP BY city, car_model WITH CUBE;
Copy

Output Code:¶

-- Basic case of GROUP BY
SELECT id,
    SUM(quantity) FROM
    dealer
GROUP BY 1;

-- Grouping by GROUPING SETS
SELECT city, car_model,
    SUM(quantity) AS sum FROM
    dealer
    GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), () !!!RESOLVE EWI!!! /*** SSC-EWI-0073 - PENDING FUNCTIONAL EQUIVALENCE REVIEW FOR 'EmptyGroupingSet' NODE ***/!!!);

-- Grouping by ROLLUP
SELECT city, car_model,
    SUM(quantity) AS sum FROM
    dealer
    GROUP BY
    ROLLUP(city, car_model);

SELECT city, car_model,
    SUM(quantity) AS sum FROM
    dealer
GROUP BY
    ROLLUP(city, car_model);

-- Grouping by CUBE
SELECT city, car_model,
    SUM(quantity) AS sum FROM
    dealer
    GROUP BY CUBE(city, car_model) !!!RESOLVE EWI!!! /*** SSC-EWI-0073 - PENDING FUNCTIONAL EQUIVALENCE REVIEW FOR 'CUBE' NODE ***/!!!;

SELECT city, car_model,
    SUM(quantity) AS sum FROM
    dealer
GROUP BY
    CUBE(city, car_model);
Copy

Hints¶

Snowflake performs automatic optimization of JOINs and partitioning, meaning that hints are unnecessary, they are preserved as comments in the output code.

Input Code:¶

SELECT
/*+ REBALANCE */ /*+ COALESCE(2) */
*
FROM my_table;
Copy

Output Code:¶

SELECT
/*+ REBALANCE */ /*+ COALESCE(2) */
*
FROM
my_table;
Copy

CTE¶

The AS keyword is optional in Spark/Databricks, however in Snowflake is required so it is added.

Input Code:¶

WITH my_cte (
   SELECT id, name FROM my_table
)
SELECT *
FROM my_cte
WHERE id = 1;
Copy

Output Code:¶

WITH my_cte AS (
     SELECT id, name FROM
        my_table
  )
SELECT *
FROM
     my_cte
WHERE id = 1;
Copy

LIMIT¶

LIMIT ALL is removed as it is not needed in Snowflake, LIMIT with a literal value is preserved as-is.

Input Code:¶

SELECT * FROM my_table LIMIT ALL;

SELECT * FROM my_table LIMIT 5;
Copy

Output Code:¶

SELECT * FROM
my_table;

SELECT * FROM
my_table
LIMIT 5;
Copy

ORDER BY¶

Note

This clause is fully supported in Snowflake

WHERE¶

Note

This clause is fully supported in Snowflake

HAVING¶

Note

This clause is fully supported in Snowflake

FROM table_relation¶

Note

This clause is fully supported in Snowflake

FROM inline_table¶

Note

This clause is fully supported in Snowflake

UNION [ALL | DISTINCT]¶

Note

This clause is fully supported in Snowflake

INTERSECT (no keywords)¶

Note

This clause is fully supported in Snowflake

EXCEPT (no keywords)¶

Note

This clause is fully supported in Snowflake