JDBC Datasources Setup for Snowpark Connect for Spark¶
This section provides a guide and sample code for reading data from and writing data to external databases (such as MySQL and PostgreSQL) using the Snowpark Connect JDBC data source feature. It covers both client-side and Snowflake Notebook setup.
Part 1: Client-side setup (MySQL)¶
This setup is required when running Snowpark Connect from a local client application, such as a Python script or IDE.
Prerequisites¶
Java Runtime Environment (JRE) / Java Development Kit (JDK):
Install a JRE or JDK. The architecture (for example, 64-bit) of your Java installation must match the architecture of your Python installation.
For example, if using Java 11, you can install it from the Adoptium Temurin releases.
Set ``JAVA_HOME`` Environment Variable:
Configure the ``JAVA_HOME`` environment variable to point to the root directory of your Java installation.
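For example, on macOS or Linux (the path below is illustrative; point it at your actual Java installation):

```shell
# Illustrative path for a Temurin 11 installation on macOS -- adjust for your system.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home
export PATH="$JAVA_HOME/bin:$PATH"
```

On Windows, set the variable through System Properties > Environment Variables instead.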
Set ``CLASSPATH`` Environment Variable:
Add the path to your database's JDBC driver ``.jar`` file to the ``CLASSPATH`` environment variable so that the Java environment can find the driver.
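For example, for the MySQL Connector/J driver (the jar path and version below are illustrative):

```shell
# Append the MySQL Connector/J jar to CLASSPATH (path and version are illustrative).
export CLASSPATH="$HOME/drivers/mysql-connector-j-8.4.0.jar:$CLASSPATH"
```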
Sample client code (read from MySQL)¶
This example demonstrates how to read a table from a MySQL database using ``spark_session.read.jdbc()``.
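A minimal sketch of such a read, assuming ``spark_session`` is the Spark session obtained from your Snowpark Connect setup; the host, database, table, and credential values are placeholders:

```python
# Sketch: read a MySQL table over JDBC. All connection details below are
# placeholders -- replace them with your own.

def mysql_jdbc_url(host, port=3306, database="mydb"):
    # MySQL JDBC URLs take the form jdbc:mysql://<host>:<port>/<database>
    return f"jdbc:mysql://{host}:{port}/{database}"

def read_mysql_table(spark_session, table_name):
    connection_properties = {
        "user": "my_user",          # placeholder
        "password": "my_password",  # placeholder
        "driver": "com.mysql.cj.jdbc.Driver",
    }
    return spark_session.read.jdbc(
        url=mysql_jdbc_url("mysql.example.com"),
        table=table_name,
        properties=connection_properties,
    )

# Usage, once a Snowpark Connect session exists:
#   df = read_mysql_table(spark_session, "orders")
#   df.show()
```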
Sample client code (write to MySQL)¶
This example demonstrates how to write data into a MySQL database using ``DataFrame.write.jdbc()``.
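A minimal sketch of such a write, assuming ``df`` is a DataFrame produced in your Snowpark Connect session; connection details are placeholders:

```python
# Sketch: write a DataFrame to a MySQL table over JDBC. All connection
# details below are placeholders -- replace them with your own.

def mysql_jdbc_url(host, port=3306, database="mydb"):
    # MySQL JDBC URLs take the form jdbc:mysql://<host>:<port>/<database>
    return f"jdbc:mysql://{host}:{port}/{database}"

def write_mysql_table(df, table_name, mode="append"):
    # mode may be "append", "overwrite", "ignore", or "error"
    connection_properties = {
        "user": "my_user",          # placeholder
        "password": "my_password",  # placeholder
        "driver": "com.mysql.cj.jdbc.Driver",
    }
    df.write.jdbc(
        url=mysql_jdbc_url("mysql.example.com"),
        table=table_name,
        mode=mode,
        properties=connection_properties,
    )

# Usage, given a DataFrame `df` from the session:
#   write_mysql_table(df, "orders_copy", mode="overwrite")
```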
Part 2: Snowflake Warehouse Notebook setup (PostgreSQL)¶
This setup is used when running Snowpark Connect directly within a Snowflake Notebook environment.
Setup steps¶
Add the ``snowpark-connect`` Package:
Ensure the ``snowflake-snowpark-connect`` package is added to your notebook environment.
Download and Upload JDBC Driver:
Download the appropriate JDBC driver ``.jar`` file for your external database (for example, the PostgreSQL JDBC driver).
Upload the downloaded ``.jar`` file directly into your notebook environment.
Activate External Integrations (Network Rule & Integration):
Snowflake requires an External Access Integration to allow the notebook to communicate with external network locations. You must define a Network Rule for the host and port of your external database.
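The rule and integration can be created with Snowflake SQL along these lines (the object names and the host:port value are illustrative); run each statement with ``session.sql(...)`` or in a worksheet before using the notebook:

```python
# Illustrative Snowflake SQL for the network rule and external access
# integration. Object names and the host:port value are placeholders.
create_network_rule = """
CREATE OR REPLACE NETWORK RULE pg_egress_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('mypg.example.com:5432')
"""

create_integration = """
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pg_access_integration
  ALLOWED_NETWORK_RULES = (pg_egress_rule)
  ENABLED = TRUE
"""

# In a notebook with a Snowpark session:
#   session.sql(create_network_rule).collect()
#   session.sql(create_integration).collect()
```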
Sample Warehouse Notebook code (read from PostgreSQL)¶
This example shows the necessary Python code to initialize the session, load the driver, and read data from PostgreSQL.
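A minimal sketch of such a read, assuming ``spark`` is the Spark session from your Snowpark Connect setup and the uploaded PostgreSQL driver ``.jar`` is available to the session; host, database, and credentials are placeholders:

```python
# Sketch: read a PostgreSQL table over JDBC in a notebook. The uploaded
# driver .jar must be available to the session; all connection details
# below are placeholders.

def postgres_jdbc_url(host, port=5432, database="mydb"):
    # PostgreSQL JDBC URLs take the form jdbc:postgresql://<host>:<port>/<database>
    return f"jdbc:postgresql://{host}:{port}/{database}"

def read_postgres_table(spark, table_name):
    connection_properties = {
        "user": "my_user",          # placeholder
        "password": "my_password",  # placeholder
        "driver": "org.postgresql.Driver",
    }
    return spark.read.jdbc(
        url=postgres_jdbc_url("mypg.example.com"),
        table=table_name,
        properties=connection_properties,
    )

# Usage in the notebook:
#   df = read_postgres_table(spark, "public.customers")
#   df.show()
```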
Sample Warehouse Notebook code (write to PostgreSQL)¶
This example shows the necessary Python code to initialize the session, load the driver, and write data into PostgreSQL.
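A minimal sketch of such a write, assuming ``df`` is a DataFrame in your Snowpark Connect session and the PostgreSQL driver ``.jar`` is available; connection details are placeholders:

```python
# Sketch: write a DataFrame to a PostgreSQL table over JDBC. All
# connection details below are placeholders.

def postgres_jdbc_url(host, port=5432, database="mydb"):
    # PostgreSQL JDBC URLs take the form jdbc:postgresql://<host>:<port>/<database>
    return f"jdbc:postgresql://{host}:{port}/{database}"

def write_postgres_table(df, table_name, mode="append"):
    # mode may be "append", "overwrite", "ignore", or "error"
    connection_properties = {
        "user": "my_user",          # placeholder
        "password": "my_password",  # placeholder
        "driver": "org.postgresql.Driver",
    }
    df.write.jdbc(
        url=postgres_jdbc_url("mypg.example.com"),
        table=table_name,
        mode=mode,
        properties=connection_properties,
    )

# Usage, given a DataFrame `df` from the session:
#   write_postgres_table(df, "public.customers_copy")
```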
Part 3: Snowflake Workspace Notebook setup (PostgreSQL)¶
This setup is used when running Snowpark Connect directly within a Snowflake Workspace Notebook environment.
Setup steps¶
The ``snowpark-connect`` package is included in Workspace Notebooks by default.
Download and Upload JDBC Driver:
Download the appropriate JDBC driver ``.jar`` file for your external database (for example, the PostgreSQL JDBC driver).
Upload the downloaded ``.jar`` file directly into your notebook environment.
Activate External Integrations (Network Rule & Integration):
Snowflake requires an External Access Integration to allow the notebook to communicate with external network locations. You must define a Network Rule for the host and port of your external database.
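As in the warehouse setup, the rule and integration can be created with Snowflake SQL along these lines (object names and the host:port value are illustrative); run each statement with ``session.sql(...)`` or in a worksheet:

```python
# Illustrative Snowflake SQL for the network rule and external access
# integration. Object names and the host:port value are placeholders.
create_network_rule = """
CREATE OR REPLACE NETWORK RULE pg_egress_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('mypg.example.com:5432')
"""

create_integration = """
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pg_access_integration
  ALLOWED_NETWORK_RULES = (pg_egress_rule)
  ENABLED = TRUE
"""

# In a notebook with a Snowpark session:
#   session.sql(create_network_rule).collect()
#   session.sql(create_integration).collect()
```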
Sample Workspace Notebook code (read from PostgreSQL)¶
This example shows the necessary Python code to initialize the session, load the driver, and read data from PostgreSQL.
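A minimal sketch of such a read in a Workspace Notebook, assuming ``spark`` is the Spark session from your Snowpark Connect setup and the uploaded PostgreSQL driver ``.jar`` is available to the session; connection details are placeholders:

```python
# Sketch: read a PostgreSQL table over JDBC in a Workspace Notebook.
# All connection details below are placeholders.

def postgres_jdbc_url(host, port=5432, database="mydb"):
    # PostgreSQL JDBC URLs take the form jdbc:postgresql://<host>:<port>/<database>
    return f"jdbc:postgresql://{host}:{port}/{database}"

def read_postgres_table(spark, table_name):
    connection_properties = {
        "user": "my_user",          # placeholder
        "password": "my_password",  # placeholder
        "driver": "org.postgresql.Driver",
    }
    return spark.read.jdbc(
        url=postgres_jdbc_url("mypg.example.com"),
        table=table_name,
        properties=connection_properties,
    )

# Usage in the notebook:
#   df = read_postgres_table(spark, "public.customers")
#   df.show()
```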
Sample Workspace Notebook code (write to PostgreSQL)¶
This example shows the necessary Python code to initialize the session, load the driver, and write data into PostgreSQL.
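A minimal sketch of such a write in a Workspace Notebook, assuming ``df`` is a DataFrame in your Snowpark Connect session and the PostgreSQL driver ``.jar`` is available; connection details are placeholders:

```python
# Sketch: write a DataFrame to a PostgreSQL table over JDBC in a
# Workspace Notebook. All connection details below are placeholders.

def postgres_jdbc_url(host, port=5432, database="mydb"):
    # PostgreSQL JDBC URLs take the form jdbc:postgresql://<host>:<port>/<database>
    return f"jdbc:postgresql://{host}:{port}/{database}"

def write_postgres_table(df, table_name, mode="append"):
    # mode may be "append", "overwrite", "ignore", or "error"
    connection_properties = {
        "user": "my_user",          # placeholder
        "password": "my_password",  # placeholder
        "driver": "org.postgresql.Driver",
    }
    df.write.jdbc(
        url=postgres_jdbc_url("mypg.example.com"),
        table=table_name,
        mode=mode,
        properties=connection_properties,
    )

# Usage, given a DataFrame `df` from the session:
#   write_postgres_table(df, "public.customers_copy")
```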
Supported datasources¶
SQL Server
MySQL
PostgreSQL