Snowpark Migration Accelerator: 마이그레이션 랩

Note

This is also part of the Snowflake End-to-End Migration Quickstart available in the Snowflake quickstarts.

Moving the logic and data in a data warehouse is essential to getting an operational database running on a new platform. But to take full advantage of the new platform, any pipelines moving data into or out of that platform need to be repointed or replatformed as well. This can be challenging, as a variety of pipelines are usually in use. This hands-on lab (HoL) focuses on just one for which Snowflake can provide some acceleration, though note that new ETL and pipeline accelerators are constantly being developed.

Let’s talk about the pipeline and the notebook we are moving in this scenario. As a reminder, this is a SQL Server database migration scoped to a proof of concept (POC). AdventureWorks has moved a small data mart from SQL Server to Snowflake, and has included a basic pipeline script and a reporting notebook as part of this POC. Here is a summary of each artifact:

  • The pipeline script is written in Python using Spark. It reads a file generated by an older POS system from a local directory, and is run at regular intervals by an orchestration tool. (Something like Airflow, but the orchestration is not part of the POC, so we’re not 100% sure what it is.)

  • The notebook is a reporting notebook that reads from the existing SQL Server database and reports a few summary metrics.
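To make the first artifact concrete, here is a minimal sketch of what such a pipeline script might look like. Everything in it (the function name, file path, table name, and JDBC URL) is hypothetical; the actual POC script is not reproduced here.

```python
def run_pos_pipeline(input_path: str, jdbc_url: str) -> None:
    """Hypothetical sketch: read a POS extract with Spark and load it into SQL Server."""
    # Import deferred so this sketch can be read without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pos_pipeline").getOrCreate()

    # Read the file the POS system dropped into the local directory.
    df = spark.read.csv(input_path, header=True, inferSchema=True)

    # Append the new rows to the SQL Server data mart over JDBC.
    (df.write.format("jdbc")
       .option("url", jdbc_url)           # e.g. a jdbc:sqlserver:// connection string
       .option("dbtable", "dbo.pos_sales")  # hypothetical target table
       .mode("append")
       .save())
```

An orchestration tool would call `run_pos_pipeline` on a schedule; as noted above, which tool is out of scope for this POC.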

Neither of these is very complex, but both are just the tip of the iceberg: there are hundreds more pipeline scripts and notebooks associated with other data marts. In this POC, we are moving only these two.

Both of these use Spark and access the SQL Server database, so our goal is essentially to move the Spark operations into Snowpark. Let’s see how to do this using the Snowpark Migration Accelerator (SMA). The SMA is a sister tool to SnowConvert and is built on the same foundation. We are going to walk through many steps (most of which will be similar to what we did with SnowConvert), but note that we are still working through the same assessment -> conversion -> validation flow that we have already walked through.
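To see why this move is tractable, note that Snowpark’s DataFrame API deliberately mirrors PySpark’s, so much of the conversion consists of one-to-one rewrites. The mappings below are a hedged illustration of the kind of substitutions involved, not an exhaustive or authoritative list of SMA output.

```python
# Before (PySpark):
#     from pyspark.sql import SparkSession
#     spark = SparkSession.builder.getOrCreate()
# After (Snowpark):
#     from snowflake.snowpark import Session
#     session = Session.builder.configs(connection_parameters).create()

# A few one-to-one API mappings of the kind the SMA applies (illustrative, not exhaustive):
SPARK_TO_SNOWPARK = {
    "pyspark.sql.SparkSession": "snowflake.snowpark.Session",
    "pyspark.sql.DataFrame": "snowflake.snowpark.DataFrame",
    "pyspark.sql.functions.col": "snowflake.snowpark.functions.col",
    "pyspark.sql.functions.lit": "snowflake.snowpark.functions.lit",
}

for spark_api, snowpark_api in SPARK_TO_SNOWPARK.items():
    print(f"{spark_api}  ->  {snowpark_api}")
```

Because the two APIs line up this closely, most of the SMA’s work on these files is mechanical, which is what lets us focus on the handful of cases that are not.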

A Note on This Lab Environment

This lab uses the Snowpark Migration Accelerator and the Snowflake VS Code extension. But to make the most of it, you will also need to run Python with PySpark installed. The simplest way to get started is to create an environment with the Anaconda distribution, which includes most of the packages needed to run the code in this lab.
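If you want to check whether your environment is ready before starting, a quick, dependency-free sketch like the following can help. The package names it checks are assumptions about what this lab needs; adjust them to match your setup.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if the named package is importable in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A dotted name whose parent package is missing raises instead of returning None.
        return False


# Packages this lab is assumed to rely on; edit the tuple as needed.
for pkg in ("pyspark", "pandas", "snowflake.snowpark"):
    status = "found" if has_package(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```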

You should have access to the following resources.

That said, you can run this lab with just a Snowflake account, the SMA, and the Snowflake VS Code extension. You won’t be able to run all of the resources (particularly the source code), but everything that gets converted can be used in Snowflake.

Now, let’s assess the resources we currently have.