Snowpark Migration Accelerator: Migration Lab

Note

This lab is also part of the Snowflake End-to-End Migration Quickstart, available among the Snowflake quickstarts.

Moving the logic and data in a data warehouse is essential to getting an operational database on a new platform. But to take full advantage of the new platform, any pipelines moving data into or out of it need to be repointed or replatformed as well. This can be challenging because organizations usually run a variety of pipelines. This hands-on lab (HoL) will focus on just one, for which Snowflake can provide some acceleration. Note, though, that new ETL and pipeline accelerators are constantly being developed.

Let’s talk about the pipeline and the notebook we are moving in this scenario. As a reminder, this is a SQL Server database migration scoped to a proof of concept (POC): AdventureWorks has moved a small SQL Server data mart to Snowflake. AdventureWorks has also included a basic pipeline script and a reporting notebook as part of this POC. Here is a summary of each artifact:

  • The pipeline script is written in Python using Spark. At regular intervals, an orchestration tool runs the script, which reads a file that an older POS system generates in a local directory. (The orchestrator is something like Airflow, but orchestration is not part of the POC, so we’re not 100% sure what it is.)

  • The notebook is a reporting notebook that reads from the existing SQL Server database and generates reports with some summary metrics.

Neither of these is very complex, but both are just the tip of the iceberg. There are hundreds of other pipeline scripts and notebooks tied to other data marts. This POC will move only these two.
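As a rough mental model of the pipeline script, consider the sketch below. This is not the lab's actual code: the POS file layout (pipe-delimited), the column names, and the target table name are all assumptions made for illustration.

```python
"""Sketch of a POS file-ingestion pipeline like the one described above.

Assumptions (not from the lab): the POS system drops pipe-delimited
text files shaped store_id|sku|quantity|unit_price, and the target
is a warehouse table named "pos_sales".
"""


def parse_pos_line(line: str) -> dict:
    """Parse one pipe-delimited POS record into a dict of typed fields."""
    store_id, sku, qty, price = line.strip().split("|")
    return {
        "store_id": store_id,
        "sku": sku,
        "quantity": int(qty),
        "amount": int(qty) * float(price),
    }


def run_pipeline(input_path: str) -> None:
    """PySpark wiring: read the dropped file, parse each line, and
    append the result to the warehouse table. Requires pyspark."""
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.appName("pos_ingest").getOrCreate()
    raw = spark.read.text(input_path)  # one column named "value"
    parsed = raw.rdd.map(lambda r: Row(**parse_pos_line(r.value))).toDF()
    parsed.write.mode("append").saveAsTable("pos_sales")
```

The parsing logic is kept in a plain-Python function so it can be exercised without a Spark session; only `run_pipeline` needs PySpark installed.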

Both of these use Spark and access the SQL Server database, so our goal is essentially to move the Spark operations into Snowpark. Let’s see how to do this using the Snowpark Migration Accelerator (SMA), a sister tool to SnowConvert built on the same foundation. We will walk through many steps (most of which will be similar to what we did with SnowConvert), but note that we are still working through the same assessment -> conversion -> validation flow we have already walked through.
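To see why this conversion is tractable: Snowpark’s DataFrame API deliberately mirrors PySpark’s, so much of the code survives with little more than an import swap. A hedged sketch of the same summary query written against both APIs (the table and column names are invented, and neither function runs without its library installed):

```python
# Illustration only: one query, two APIs. The table/column names
# ("sales", "amount", "store_id") are invented for this sketch.


def report_pyspark(spark):
    """Original pipeline style (requires pyspark)."""
    from pyspark.sql.functions import col

    return (spark.table("sales")
                 .filter(col("amount") > 0)
                 .groupBy("store_id")
                 .count())


def report_snowpark(session):
    """Converted style (requires snowflake-snowpark-python)."""
    from snowflake.snowpark.functions import col

    return (session.table("sales")
                   .filter(col("amount") > 0)
                   .group_by("store_id")  # snake_case is the main visible change
                   .count())
```

Method renames like `groupBy` -> `group_by` are exactly the kind of mechanical change a tool such as the SMA can apply across hundreds of scripts.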

Notes on this lab environment

This lab uses the Snowpark Migration Accelerator and the Snowflake VS Code extension. But to make the most of it, you will also need to run Python with PySpark. The simplest way to start is with an environment based on the Anaconda distribution, which includes most of the packages needed to run the code in this lab.
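One way to set up such an environment is with a conda environment file. This is a sketch: the package list reflects what the lab's code uses (PySpark, a notebook runtime, Snowpark), but the exact names and versions are suggestions, not requirements from the lab.

```yaml
# environment.yml -- suggested conda environment for this lab.
# Create it with: conda env create -f environment.yml
name: sma-lab
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pyspark
  - pandas
  - jupyter
  - pip
  - pip:
      - snowflake-snowpark-python
```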

You will still need to make the following resources available:

That said, you can still run this lab with only a Snowflake account, the SMA, and the Snowflake VS Code extension. You won’t be able to run everything (notably the source code), but you will be able to use all of the converted elements in Snowflake.

Now, let’s start by assessing what we have.