Snowpark Migration Accelerator: Migration Lab

Note

This lab is also part of the Snowflake End-to-End Migration Quickstart, available among the Snowflake quickstarts.

Moving the logic and data of a data warehouse is essential to getting an operational database running on a new platform. But to take full advantage of the new platform, any pipelines moving data into or out of that platform need to be repointed or replatformed as well. This can be challenging because a variety of pipeline technologies are usually in use. This hands-on lab (HoL) focuses on just one, for which Snowflake can provide some acceleration. Note, however, that new ETL and pipeline accelerators are constantly being developed.

Let’s talk about the pipeline and the notebook we are moving in this scenario. As a reminder, this is a SQL Server database migration scoped to a proof of concept (POC). AdventureWorks has moved a small data mart from SQL Server to Snowflake, and has included a basic pipeline script and a reporting notebook as part of this POC. Here is a summary of each artifact:

  • The pipeline script is written in Python using Spark. It reads a file that an older POS system generates in a local directory, and it is run at regular intervals by an orchestration tool. (Something like Airflow, but the orchestration is not part of the POC, so we’re not 100% sure what it is.)

  • The notebook is a reporting notebook that reads from the existing SQL Server database and reports on a few summary metrics.

Neither of these artifacts is very complex, but they are only the tip of the iceberg. There are hundreds of other pipeline scripts and notebooks tied to other data marts. This POC will move only these two.
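To make the first artifact concrete, here is a minimal sketch of what such a pipeline script might look like. This is not the lab’s actual script: the file layout, directory path, column names, and target table here are all assumptions.

```python
# Hypothetical POS pipeline sketch. Assumed record layout:
# "store_id|sku|qty|unit_price" (pipe-delimited, one sale per line).

def parse_pos_line(line):
    """Parse one assumed POS record into a dict with a computed line total."""
    store_id, sku, qty, unit_price = line.strip().split("|")
    return {
        "store_id": store_id,
        "sku": sku,
        "qty": int(qty),
        "total": int(qty) * float(unit_price),
    }

if __name__ == "__main__":
    # The real script uses Spark; a periodic read of the drop directory
    # might look roughly like this (paths and table names are assumed).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pos_pipeline").getOrCreate()
    df = spark.read.option("header", True).csv("/data/pos/incoming/")
    df.write.mode("append").saveAsTable("pos_staging")
```

It is exactly this kind of Spark read/write logic that the SMA will analyze and help move to Snowpark.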

Both of these artifacts use Spark and access the SQL Server database, so our goal is essentially to move the Spark operations into Snowpark. Let’s see how to do this using the Snowpark Migration Accelerator (SMA). The SMA is a sister tool to SnowConvert, built on the same foundation. We are going to walk through many steps (most of which will be similar to what we did with SnowConvert), but note that we are still following the same assessment -> conversion -> validation flow that we have already walked through.
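To illustrate the kind of change involved, here is a hedged side-by-side of how a PySpark read of the SQL Server mart might map onto a Snowpark table read after migration. The connection options, table names, and columns are assumptions for illustration, not the lab’s actual code.

```python
# Before: PySpark pulls the mart table from SQL Server over JDBC.
def read_mart_pyspark(spark, jdbc_url):
    # 'spark' is a pyspark.sql.SparkSession; table name is hypothetical.
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.FactSales")
        .load()
    )

# After: the data now lives in Snowflake, so the same read becomes a
# Snowpark table lookup against a snowflake.snowpark.Session.
def read_mart_snowpark(session):
    return session.table("FACT_SALES")  # hypothetical Snowflake table
```

Because the Snowpark DataFrame API closely mirrors PySpark’s, much of the downstream transformation code can carry over with few changes; identifying and rewriting the parts that differ is what the SMA automates.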

Notes on this lab environment

This lab uses the Snowpark Migration Accelerator and the Snowflake VS Code Extension. To get the most out of it, you will also need to run Python with PySpark installed. The simplest way to get started is to create an environment with the Anaconda distribution, which includes most of the packages needed to run the code in this lab.

You will also need access to the following resources:

That said, you can still run this lab with just a Snowflake account, the SMA, and the Snowflake VS Code extension. You won’t be able to run everything (in particular, the source code), but you will be able to use all of the converted artifacts in Snowflake.

Let’s start by assessing what we have.