Snowpark Migration Accelerator: 병합

설명

The MERGE statement combines data from one or more source tables with a target table, allowing you to perform updates and inserts in a single operation. Based on conditions you define, it determines whether to update existing rows or insert new ones in the target table. This makes it more efficient than using separate INSERT, UPDATE, and DELETE statements. The MERGE statement always produces consistent results when run multiple times with the same data.

In Spark, you can find the MERGE syntax in the Spark documentation.

MERGE INTO target_table_name [target_alias]
   USING source_table_reference [source_alias]
   ON merge_condition
   { WHEN MATCHED [ AND matched_condition ] THEN matched_action |
     WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ] THEN not_matched_action |
     WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } [...]

matched_action
 { DELETE |
   UPDATE SET * |
   UPDATE SET { column = { expr | DEFAULT } } [, ...] }

not_matched_action
 { INSERT * |
   INSERT (column1 [, ...] ) VALUES ( expr | DEFAULT ] [, ...] )

not_matched_by_source_action
 { DELETE |
   UPDATE SET { column = { expr | DEFAULT } } [, ...] }

In Snowflake, the MERGE statement follows this syntax (For additional details, refer to the Snowflake documentation):

MERGE INTO <target_table> USING <source> ON <join_expr> { matchedClause | notMatchedClause } [ ... ]

matchedClause ::=
  WHEN MATCHED [ AND <case_predicate> ] THEN { UPDATE SET <col_name> = <expr> [ , <col_name2> = <expr2> ... ] | DELETE } [ ... ]

notMatchedClause ::=
   WHEN NOT MATCHED [ AND <case_predicate> ] THEN INSERT [ ( <col_name> [ , ... ] ) ] VALUES ( <expr> [ , ... ] )

The key distinction is that Snowflake lacks a direct equivalent to the WHEN NOT MATCHED BY SOURCE clause. A workaround solution is required to achieve similar functionality in Snowflake.

샘플 소스 패턴

샘플 보조 데이터

참고

작동 방식을 더 잘 이해할 수 있도록 다음 코드 예제를 실행해 보았습니다.

CREATE OR REPLACE people_source (
  person_id  INTEGER NOT NULL PRIMARY KEY,
  first_name STRING NOT NULL,
  last_name  STRING NOT NULL,
  title      STRING NOT NULL,
);

CREATE OR REPLACE TABLE people_target (
  person_id  INTEGER NOT NULL PRIMARY KEY,
  first_name STRING NOT NULL,
  last_name  STRING NOT NULL,
  title      STRING NOT NULL DEFAULT 'NONE'
);


INSERT INTO people_target VALUES (1, 'John', 'Smith', 'Mr');
INSERT INTO people_target VALUES (2, 'alice', 'jones', 'Mrs');
INSERT INTO people_source VALUES (2, 'Alice', 'Jones', 'Mrs.');
INSERT INTO people_source VALUES (3, 'Jane', 'Doe', 'Miss');
INSERT INTO people_source VALUES (4, 'Dave', 'Brown', 'Mr');
CREATE OR REPLACE TABLE people_source (
    person_id  INTEGER NOT NULL PRIMARY KEY,
    first_name VARCHAR(20) NOT NULL,
    last_name VARCHAR(20) NOT NULL,
    title VARCHAR(10) NOT NULL
);

CREATE OR REPLACE TABLE people_target (
    person_id  INTEGER NOT NULL PRIMARY KEY,
    first_name VARCHAR(20) NOT NULL,
    last_name VARCHAR(20) NOT NULL,
    title VARCHAR(10) NOT NULL DEFAULT 'NONE'
);


INSERT INTO people_target VALUES (1, 'John', 'Smith', 'Mr');
INSERT INTO people_target VALUES (2, 'alice', 'jones', 'Mrs');
INSERT INTO people_source VALUES (2, 'Alice', 'Jones', 'Mrs.');
INSERT INTO people_source VALUES (3, 'Jane', 'Doe', 'Miss');
INSERT INTO people_source VALUES (4, 'Dave', 'Brown', 'Mr');

MERGE 문 - 케이스 삽입 및 업데이트

Spark

MERGE INTO people_target pt
USING people_source ps
ON    (pt.person_id = ps.person_id)
WHEN MATCHED THEN UPDATE
  SET pt.first_name = ps.first_name,
      pt.last_name = ps.last_name,
      pt.title = DEFAULT
WHEN NOT MATCHED THEN INSERT
  (pt.person_id, pt.first_name, pt.last_name, pt.title)
  VALUES (ps.person_id, ps.first_name, ps.last_name, ps.title);


SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        2|Alice     |Jones    |NONE |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

Snowflake

MERGE INTO people_target2 pt
USING people_source ps
ON    (pt.person_id = ps.person_id)
WHEN MATCHED THEN UPDATE
  SET pt.first_name = ps.first_name,
      pt.last_name = ps.last_name,
      pt.title = DEFAULT
WHEN NOT MATCHED THEN INSERT
  (pt.person_id, pt.first_name, pt.last_name, pt.title)
  VALUES (ps.person_id, ps.first_name, ps.last_name, ps.title);


SELECT * FROM PUBLIC.people_target ORDER BY person_id;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        2|Alice     |Jones    |NONE |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

The INSERT and UPDATE operations work the same way in Snowflake. In both SQL dialects, you can use DEFAULT as an expression to set a column to its default value.

Spark에서는 열을 명시적으로 나열하지 않고도 삽입 및 업데이트 작업을 수행할 수 있습니다. 열을 지정하지 않으면 작업은 테이블의 모든 열에 영향을 줍니다. 이 기능이 올바르게 작동하려면 소스 테이블과 대상 테이블의 열 구조가 동일해야 합니다. 열 구조가 일치하지 않으면 구문 분석 오류가 발생합니다.

UPDATE SET *
-- This is equivalent to UPDATE SET col1 = source.col1 [, col2 = source.col2 ...]

INSERT *
-- This command copies all columns from the source table to the target table, matching columns by name. It is the same as explicitly listing all columns in both the INSERT and VALUES clauses.

Since Snowflake doesn't support these options, the migration process will instead list all columns from the target table.

### MERGE Statement - Delete Case


```{code} sql
:force:
MERGE INTO people_target pt
USING people_source ps
ON    (pt.person_id = ps.person_id)
WHEN MATCHED AND pt.person_id < 3 THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT *;

SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

Snowflake

MERGE INTO people_target pt
USING people_source ps
ON    (pt.person_id = ps.person_id)
WHEN MATCHED AND pt.person_id < 3 THEN DELETE
WHEN NOT MATCHED THEN INSERT
  (pt.person_id, pt.first_name, pt.last_name, pt.title)
  VALUES (ps.person_id, ps.first_name, ps.last_name, ps.title);


SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

The DELETE action in Snowflake works the same way as in other databases. You can also add additional conditions to the MATCHED and NOT MATCHED clauses.

WHEN NOT MATCHED BY TARGET and WHEN NOT MATCHED are equivalent clauses that can be used interchangeably in SQL merge statements.

MERGE 문 - WHEN NOT MATCHED BY SOURCE

WHEN NOT MATCHED BY SOURCE clauses are triggered when a row in the target table has no matching rows in the source table. This occurs when both the merge_condition and the optional not_match_by_source_condition evaluate to true. For more details, see the Spark documentation.

Snowflake does not support this clause directly. To handle this limitation, you can use the following workaround for both DELETE and UPDATE actions.

MERGE INTO people_target pt
USING people_source ps
ON pt.person_id = ps.person_id
WHEN NOT MATCHED BY SOURCE THEN DELETE;


SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        2|Alice     |Jones    |NONE |

Snowflake

MERGE INTO people_target pt
USING (
    SELECT
        pt.person_id
    FROM
        people_target pt LEFT
    JOIN people_source ps ON pt.person_id = ps.person_id
    WHERE
        ps.person_id is null
) s_src
    ON s_src.person_id = pt.person_id
WHEN MATCHED THEN DELETE;

SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        2|Alice     |Jones    |NONE |

The DELETE action in Snowflake works the same way as in other databases. You can also add additional conditions to the MATCHED and NOT MATCHED clauses.

알려진 문제

1. MERGE 는 두 언어에서 매우 유사합니다

Apache Spark는 추가 기능을 제공하지만, 이전 예제에서 설명한 것처럼 다른 접근법을 사용하여 Snowflake에서 유사한 기능을 구현할 수 있습니다.