Snowpark Migration Accelerator: Hands-On Migration Exercises¶
Note
This is also part of the Snowflake End-to-End Migration Quickstart available in the Snowflake quickstarts.
Moving the logic and data in a data warehouse is essential to getting an operational database on a new platform. But to take full advantage of the new platform, any pipelines that move data into or out of that data platform need to be repointed or replatformed as well. This can be challenging because a variety of pipeline technologies are usually in use. This hands-on lab focuses on just one, for which Snowflake can provide some acceleration, but note that new ETL and pipeline accelerators are constantly being developed.
Let’s talk about the pipeline and the notebook we are moving in this scenario. As a reminder, this is a SQL Server database migration, but scoped to a proof of concept (POC). AdventureWorks has moved a small data mart from SQL Server to Snowflake and has included a basic pipeline script and a reporting notebook as part of this POC. Here is a summary of each artifact:
The pipeline script is written in Python using Spark. It reads a file that an older POS system generates in a local directory at regular intervals, and it is run by an orchestration tool. (Something like Airflow, but the orchestration is not part of the POC, so we’re not 100% sure what it is.)
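The actual pipeline script is not shown here, but its core logic is reading a POS extract file and filtering out unusable rows before loading. The stdlib sketch below illustrates that kind of logic; the column names (`transaction_id`, `store_id`, `amount`) and the CSV layout are assumptions for illustration only, not the real file format.

```python
import csv
import io

def load_pos_rows(csv_text):
    """Parse a POS extract (CSV text) into a list of dicts,
    skipping rows that are missing a transaction id.
    Hypothetical column names; the real POS file layout is not shown."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get("transaction_id")]

# Tiny sample in the assumed layout: one row has no transaction id.
sample = "transaction_id,store_id,amount\n1001,7,19.99\n,7,5.00\n1002,3,42.50\n"
rows = load_pos_rows(sample)
print(len(rows))  # rows that survived the filter
```

In the real script, the same read-filter-load shape would be expressed with the Spark DataFrame API instead of the `csv` module.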
The notebook is a reporting notebook that reads from the existing SQL Server database and reports on some summary metrics.
Keine dieser Aufgaben ist zu komplex, aber beide stellen nur die Spitze des Eisbergs dar. Es gibt Hunderte von weiteren Pipeline-Skripten und -Notebooks, die sich auf andere Data Marts beziehen. Dieses POC verschiebt einfach diese beiden.
Both of these use Spark and access the SQL Server database, so our goal is essentially to move the operations in Spark into Snowpark. Let’s see how we would do this using the Snowpark Migration Accelerator (SMA). The SMA is a sister tool to SnowConvert and is built on the same foundation. We will walk through many steps (most of which will be similar to what we did with SnowConvert), but note that we are still working through the same assessment -> conversion -> validation flow that we have already walked through.
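One reason moving Spark code to Snowpark is tractable at all is that the Snowpark DataFrame API deliberately mirrors the PySpark DataFrame API, so many changes amount to renaming imports and session objects. The toy rewriter below is emphatically *not* how the SMA works (the SMA parses and analyzes the code rather than doing string substitution); it only illustrates how closely a few real import statements in the two APIs correspond.

```python
# A few genuine PySpark -> Snowpark import correspondences.
# This naive line-for-line substitution is a toy illustration only;
# the SMA builds a full semantic model of the source code instead.
SPARK_TO_SNOWPARK = {
    "from pyspark.sql import SparkSession":
        "from snowflake.snowpark import Session",
    "from pyspark.sql.functions import col":
        "from snowflake.snowpark.functions import col",
}

def toy_rewrite(line):
    """Return the Snowpark equivalent of a known PySpark line, else the line unchanged."""
    return SPARK_TO_SNOWPARK.get(line, line)

print(toy_rewrite("from pyspark.sql import SparkSession"))
```

The close correspondence is also why the SMA can produce a meaningful readiness score during assessment: it can count which Spark API calls have direct Snowpark equivalents and which do not.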
Notes on this hands-on lab environment¶
This lab uses the Snowpark Migration Accelerator and the Snowflake VS Code Extension. To make the most of it, you will also need to run Python with PySpark. The simplest way to get started is to create an environment with the Anaconda distribution, which includes most of the packages needed to run the code in this lab.
You will still need to provide the following resources:

Python libraries

VS Code extension

Other
Otherwise, you can still complete the hands-on exercises with just a Snowflake account, the SMA, and the Snowflake VS Code extension. You will not be able to run everything (particularly the source code), but you can use all of the converted artifacts in Snowflake.
Now let’s begin by assessing what we have.