Using Pandas DataFrames with the Python Connector

Pandas is a library for data analysis. With Pandas, you use a data structure called a DataFrame to analyze and manipulate two-dimensional data (such as data from a database table).

If you need to get data from a Snowflake database into a Pandas DataFrame, you can use the API methods provided with the Snowflake Connector for Python. The connector also provides API methods for writing data from a Pandas DataFrame to a Snowflake database.

Note

Some of these API methods require a specific version of the PyArrow library. See Requirements for details.


Requirements

The Pandas-oriented API methods in the Python connector work with:

  • Snowflake Connector for Python 2.1.2 (or higher).

  • PyArrow library version 0.17.0.

    If you do not have PyArrow installed, you do not need to install PyArrow yourself; installing the Python Connector as documented below automatically installs the appropriate version of PyArrow.

    Caution

    If you already have any version of the PyArrow library other than the recommended version listed above, please uninstall PyArrow before installing the Snowflake Connector for Python. Do not re-install a different version of PyArrow after installing the Snowflake Connector for Python.

  • Pandas 0.25.2 (or higher). Earlier versions might work, but have not been tested.

  • pip 19.0 (or higher).

  • Python

    • For MS-Windows: 3.5, 3.6, or 3.7.

    • For Linux and macOS: 3.5, 3.6, 3.7, or 3.8.
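Because the Pandas API depends on specific package versions, it can help to check what is installed before going further. The script below is a minimal sketch, not a Snowflake-provided tool: it assumes Python 3.8 or later (for the standard-library importlib.metadata module, which is newer than the 3.5 to 3.7 range listed above), uses a simplified version parser, and the names check_requirements, parse_version, and installed_version are hypothetical.

```python
"""Report whether installed packages meet the Pandas-oriented API requirements."""
import re
from importlib import metadata  # standard library in Python 3.8+

# Minimum versions from the Requirements list.
MINIMUM_VERSIONS = {
    "snowflake-connector-python": (2, 1, 2),
    "pandas": (0, 25, 2),
    "pip": (19, 0),
}
# PyArrow must match this version exactly.
REQUIRED_PYARROW = (0, 17, 0)


def parse_version(text):
    """Return the leading numeric release segment, e.g. '1.0.0rc1' -> (1, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements():
    """Map each package name to 'ok', 'missing', or a short problem description."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif parse_version(found) < minimum:
            report[name] = "version %s is older than required" % found
        else:
            report[name] = "ok"
    # PyArrow is checked for an exact match rather than a minimum.
    found = installed_version("pyarrow")
    if found is None:
        report["pyarrow"] = "missing"
    elif parse_version(found) != REQUIRED_PYARROW:
        report["pyarrow"] = "version %s differs from the required version" % found
    else:
        report["pyarrow"] = "ok"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print("%-28s %s" % (package, status))
```

Tuple comparison does the ordering here, so (19, 0, 3) counts as at least (19, 0); a full version parser such as the one in the third-party packaging library would also handle pre-release ordering.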

Installation

To install the Pandas-compatible version of the Snowflake Connector for Python, execute the command:

pip install "snowflake-connector-python[pandas]"

You must enter the square brackets ([ and ]) as shown in the command. The square brackets specify the extra part of the package that should be installed.

Use quotes around the name of the package (as shown) to prevent the square brackets from being interpreted as a wildcard.

If you need to install other extras (for example, secure-local-storage for caching connections with browser-based SSO), use a comma between the extras:

pip install "snowflake-connector-python[secure-local-storage,pandas]"

Reading Data from a Snowflake Database to a Pandas DataFrame

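To read data into a Pandas DataFrame, you use a Cursor to retrieve the data and then call one of these Cursor methods to put the data into a Pandas DataFrame:

  • fetch_pandas_all(), which returns all of the rows in the result set as a single DataFrame.

  • fetch_pandas_batches(), which returns an iterator of DataFrames, each holding a subset of the rows.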

Writing Data from a Pandas DataFrame to a Snowflake Database

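To write data from a Pandas DataFrame to a Snowflake database, do one of the following:

  • Call the write_pandas() function in the snowflake.connector.pandas_tools module.

  • Call the pandas.DataFrame.to_sql() method with a SQLAlchemy engine, and specify pd_writer() (also in snowflake.connector.pandas_tools) as the method to use to insert the data into the database.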
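A minimal sketch using write_pandas() from snowflake.connector.pandas_tools follows. It assumes conn is an open connection returned by snowflake.connector.connect() and that the target table already exists with columns matching the DataFrame; upload_dataframe is a hypothetical helper name. The to_sql() route with pd_writer needs a SQLAlchemy engine and is not shown here.

```python
"""Write a Pandas DataFrame to an existing Snowflake table with write_pandas()."""


def upload_dataframe(conn, df, table_name):
    """Write df into table_name and return the number of rows written.

    Assumes table_name already exists with columns matching df.
    """
    from snowflake.connector.pandas_tools import write_pandas

    # write_pandas returns (success, number_of_chunks, number_of_rows, copy_output).
    success, _nchunks, nrows, _output = write_pandas(conn, df, table_name)
    if not success:
        raise RuntimeError("write_pandas reported failure for table %s" % table_name)
    return nrows
```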

Snowflake to Pandas Data Mapping

The table below shows the mapping from Snowflake data types to Pandas data types:

Snowflake Data Type                              Pandas Data Type
---------------------------------------------    ----------------------------------------
FIXED NUMERIC type (scale = 0) except DECIMAL    (u)int{8,16,32,64} or float64 (for NULL)
FIXED NUMERIC type (scale > 0) except DECIMAL    float64
FIXED NUMERIC type DECIMAL                       decimal
FLOAT/DOUBLE                                     float64
VARCHAR                                          str
BINARY                                           str
VARIANT                                          str
DATE                                             object (with datetime.date objects)
TIME                                             pandas.Timestamp(np.datetime64[ns])
TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ       pandas.Timestamp(np.datetime64[ns])

Notes:

  • If the Snowflake data type is FIXED NUMERIC with a scale of zero and the column contains NULL values, the column is converted to float64 rather than to an integer type.

  • If any conversion causes overflow, the Python connector raises an exception.
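The NULL rule follows from how pandas stores missing values: NumPy integer dtypes have no representation for a missing value, so pandas uses NaN, which is a float. The snippet below uses plain pandas, with no Snowflake connection, to show the same effect. The conversion to the nullable "Int64" dtype at the end is an optional pandas feature (added in pandas 0.24), not something the connector does for you.

```python
"""Show why an integer column containing NULL ends up as float64 in pandas."""
import pandas as pd

# Without missing values, integers keep an integer dtype.
no_nulls = pd.Series([1, 2, 3])
print(no_nulls.dtype)  # int64

# A missing value is stored as NaN, which forces the whole column to float64.
with_nulls = pd.Series([1, 2, None])
print(with_nulls.dtype)  # float64

# One option if you need integers back: pandas' nullable "Int64" extension dtype.
restored = with_nulls.astype("Int64")
print(restored.dtype)  # Int64
print(restored.isna().tolist())  # [False, False, True]
```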

Importing Pandas

Customarily, Pandas is imported with the following statement:

import pandas as pd

You might see references to Pandas objects as either pandas.object or pd.object.

Migrating to Pandas DataFrames

This section is primarily for users who have used Pandas (and possibly SQLAlchemy) previously.

Previous Pandas users might have code similar to either of the following:

  • This example shows the original way to generate a Pandas DataFrame from the Python connector:

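    import pandas as pd

    def fetch_pandas_old(cur, sql):
        """Fetch a query result 50,000 rows at a time, building a DataFrame per chunk."""
        cur.execute(sql)
        # Each entry in cur.description describes one column; element 0 is its name.
        columns = [col[0] for col in cur.description]
        rows = 0
        while True:
            dat = cur.fetchmany(50000)
            if not dat:
                break
            df = pd.DataFrame(dat, columns=columns)
            rows += df.shape[0]
        print(rows)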
    
  • This example shows how to use SQLAlchemy to generate a Pandas DataFrame:

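    import pandas as pd

    def fetch_pandas_sqlalchemy(sql, engine):
        """Read a query result 50,000 rows at a time through SQLAlchemy."""
        # engine is a SQLAlchemy engine, e.g. one created with the snowflake-sqlalchemy dialect.
        rows = 0
        for chunk in pd.read_sql_query(sql, engine, chunksize=50000):
            rows += chunk.shape[0]
        print(rows)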
    

Code that is similar to either of the preceding examples can be converted to use the Python connector Pandas API calls listed in Reading Data from a Snowflake Database to a Pandas DataFrame (in this topic).
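As an illustration of that conversion, here is fetch_pandas_old rewritten around the fetch_pandas_batches() Cursor method. It is a sketch: fetch_pandas_batched is a hypothetical name, and unlike the original it also returns the row count. The SQLAlchemy example converts the same way, with this cursor loop replacing the pd.read_sql_query() chunk loop.

```python
"""Batch-oriented replacement for fetch_pandas_old, using the connector's Pandas API."""


def fetch_pandas_batched(cur, sql):
    """Count result rows by iterating over DataFrame batches instead of fetchmany()."""
    cur.execute(sql)
    rows = 0
    # Each batch is already a DataFrame with its column names set, so there is
    # no need to build one from raw tuples and cur.description.
    for df in cur.fetch_pandas_batches():
        rows += df.shape[0]
    print(rows)
    return rows
```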

Note

With support for Pandas in the Python connector, SQLAlchemy is no longer needed to convert data in a cursor into a DataFrame.

However, you can continue to use SQLAlchemy if you wish; the Python connector maintains compatibility with SQLAlchemy.