Using pandas DataFrames with the Python Connector¶
pandas is a library for data analysis. With pandas, you use a data structure called a DataFrame to analyze and manipulate two-dimensional data (such as data from a database table).
If you need to get data from a Snowflake database to a pandas DataFrame, you can use the API methods provided with the Snowflake Connector for Python. The connector also provides API methods for writing data from a pandas DataFrame to a Snowflake database.
Currently, the pandas-oriented API methods in the Python connector API work with:
Snowflake Connector 2.1.2 (or higher) for Python.
PyArrow library version 3.0.x.
If you do not have PyArrow installed, you do not need to install PyArrow yourself; installing the Python Connector as documented below automatically installs the appropriate version of PyArrow.
If you already have any version of the PyArrow library other than the recommended version listed above, please uninstall PyArrow before installing the Snowflake Connector for Python. Do not re-install a different version of PyArrow after installing the Snowflake Connector for Python.
pandas 0.25.2 (or higher). Earlier versions might work, but have not been tested.
To install the pandas-compatible version of the Snowflake Connector for Python, execute the command:
pip install "snowflake-connector-python[pandas]"
You must enter the square brackets (
]) as shown in the command. The square brackets specify the
extra part of the package that should be installed.
Use quotes around the name of the package (as shown) to prevent the square brackets from being interpreted as a wildcard.
pip install "snowflake-connector-python[secure-local-storage,pandas]"
Reading data from a Snowflake database to a pandas DataFrame¶
Writing data from a pandas DataFrame to a Snowflake database¶
To write data from a pandas DataFrame to a Snowflake database, do one of the following:
Snowflake to pandas data mapping¶
The table below shows the mapping from Snowflake data types to pandas data types:
Snowflake Data Type
pandas Data Type
FIXED NUMERIC type (scale = 0) except DECIMAL
FIXED NUMERIC type (scale > 0) except DECIMAL
FIXED NUMERIC type DECIMAL
TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ
If the Snowflake data type is FIXED NUMERIC and the scale is zero, and if the value is NULL, then the value is converted to
float64, not an integer type.
If any conversion causes overflow, the Python connector throws an exception.
Customarily, pandas is imported with the following statement:
import pandas as pd
You might see references to pandas objects as either
Migrating to pandas DataFrames¶
This section is primarily for users who have used pandas (and possibly SQLAlchemy) previously.
Previous pandas users might have code similar to either of the following:
This example shows the original way to generate a pandas DataFrame from the Python connector:
import pandas as pd def fetch_pandas_old(cur, sql): cur.execute(sql) rows = 0 while True: dat = cur.fetchmany(50000) if not dat: break df = pd.DataFrame(dat, columns=cur.description) rows += df.shape print(rows)
This example shows how to use SQLAlchemy to generate a pandas DataFrame:
import pandas as pd def fetch_pandas_sqlalchemy(sql): rows = 0 for chunk in pd.read_sql_query(sql, engine, chunksize=50000): rows += chunk.shape print(rows)
Code that is similar to either of the preceding examples can be converted to use the Python connector pandas API calls listed in Reading Data from a Snowflake Database to a pandas DataFrame (in this topic).
With support for pandas in the Python connector, SQLAlchemy is no longer needed to convert data in a cursor into a DataFrame.
However, you can continue to use SQLAlchemy if you wish; the Python connector maintains compatibility with SQLAlchemy.