modin.pandas.read_parquet

modin.pandas.read_parquet(path: FilePath, engine: str | None = None, columns: list[str] | None = None, storage_options: StorageOptions = None, use_nullable_dtypes: bool | NoDefault = _NoDefault.no_default, dtype_backend: DtypeBackend | NoDefault = _NoDefault.no_default, filesystem: str = None, filters: list[tuple] | list[list[tuple]] | None = None, **kwargs)

Read parquet file(s) into a Snowpark pandas DataFrame. This API can read files stored locally or on a Snowflake stage.

Snowpark pandas stages files (unless they’re already staged) and then reads them using Snowflake’s parquet reader.

Parameters:
  • path (str) – Local file location or staged file location to read from. Staged file locations start with a ‘@’ symbol. To read a local file whose name starts with @, escape it with a backslash (\@). For more information on staged files, see the Snowflake documentation on stages.

  • engine ({'auto', 'pyarrow', 'fastparquet'}, default None) – This parameter is not supported and will be ignored.

  • storage_options (StorageOptions, default None) – This parameter is not supported and will be ignored.

  • columns (list, default None) – If not None, only these columns will be read from the file.

  • use_nullable_dtypes (bool, default False) – This parameter is not supported and will raise an error.

  • dtype_backend ({'numpy_nullable', 'pyarrow'}, default 'numpy_nullable') – This parameter is not supported and will be ignored.

  • filesystem (fsspec or pyarrow filesystem, default None) – This parameter is not supported and will be ignored.

  • filters (List[Tuple] or List[List[Tuple]], default None) – This parameter is not supported and will be ignored.

  • **kwargs (Any, default None) – This parameter is not supported and will be ignored.
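
The path rule above (a leading @ marks a stage location, so a local name that starts with @ must be escaped) can be sketched as a small helper. This is only an illustration: escape_local_path is a hypothetical name, not part of Snowpark pandas, and it assumes the escape is a single backslash placed before the @.

```python
def escape_local_path(path: str) -> str:
    """Hypothetical helper: mark a LOCAL path so it is not read as a stage location.

    Call this only for local paths; stage locations such as
    '@mytempstage/myprefix' should be passed to read_parquet unchanged.
    """
    # Assumption: a backslash before the leading "@" marks the path as local.
    if path.startswith("@"):
        return "\\" + path
    return path


# A local file whose name happens to start with "@".
print(escape_local_path("@reports.parquet"))
# An ordinary local path is returned as-is.
print(escape_local_path("data/reports.parquet"))
```

The result would then be passed as the path argument, for example pd.read_parquet(escape_local_path(name)).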

Return type:

Snowpark pandas DataFrame

Raises:

NotImplementedError if a parameter is not supported.

Notes

Both local files and files staged on Snowflake can be passed into path. The path can name a single file, or a folder (or stage prefix) that matches a set of files. The order of rows in the resulting DataFrame may differ from the order of records in the input file. When reading multiple files, the order in which the files are read is not deterministic.
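
Because row order is not guaranteed, code that depends on a stable order may want to sort explicitly after reading. The following is a minimal sketch under that assumption: read_sorted is a hypothetical wrapper, not part of the library, and it relies only on the DataFrame methods sort_values and reset_index.

```python
def read_sorted(read_fn, path, by):
    """Hypothetical wrapper: read `path` with `read_fn`, then impose a row order.

    read_fn -- a reader such as pd.read_parquet
    path    -- a local or staged file or folder location
    by      -- list of column names that together define the desired order
    """
    frame = read_fn(path)
    # Sort on the key columns, then renumber the index so it matches the new order.
    return frame.sort_values(by).reset_index(drop=True)
```

With the staged files from the examples below, this could be called as read_sorted(pd.read_parquet, '@mytempstage/myprefix', by=["foo", "bar"]). The result is only fully deterministic when the chosen columns distinguish every row.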

Examples

Read a local parquet file.

>>> import modin.pandas as pd
>>> import snowflake.snowpark.modin.plugin
>>> import pandas as native_pd
>>> import tempfile
>>> temp_dir = tempfile.TemporaryDirectory()
>>> temp_dir_name = temp_dir.name
>>> df = native_pd.DataFrame(
...     {"foo": range(3), "bar": range(5, 8)}
...    )
>>> df
   foo  bar
0    0    5
1    1    6
2    2    7
>>> _ = df.to_parquet(f'{temp_dir_name}/snowpark-pandas.parquet')
>>> restored_df = pd.read_parquet(f'{temp_dir_name}/snowpark-pandas.parquet')
>>> restored_df
   foo  bar
0    0    5
1    1    6
2    2    7
>>> restored_bar = pd.read_parquet(f'{temp_dir_name}/snowpark-pandas.parquet', columns=["bar"])
>>> restored_bar
   bar
0    5
1    6
2    7

Read a staged parquet file. Here session is assumed to be an active Snowpark Session.

>>> _ = session.sql("create or replace temp stage mytempstage").collect()
>>> _ = session.file.put(f'{temp_dir_name}/snowpark-pandas.parquet', '@mytempstage/myprefix')
>>> df2 = pd.read_parquet('@mytempstage/myprefix/snowpark-pandas.parquet')
>>> df2
   foo  bar
0    0    5
1    1    6
2    2    7

Read parquet files from a local folder.

>>> _ = df.to_parquet(f'{temp_dir_name}/snowpark-pandas2.parquet')
>>> df3 = pd.read_parquet(f'{temp_dir_name}')
>>> df3
   foo  bar
0    0    5
1    1    6
2    2    7
3    0    5
4    1    6
5    2    7

Read parquet files from a staged location.

>>> _ = session.file.put(f'{temp_dir_name}/snowpark-pandas2.parquet', '@mytempstage/myprefix')
>>> df3 = pd.read_parquet('@mytempstage/myprefix')
>>> df3
   foo  bar
0    0    5
1    1    6
2    2    7
3    0    5
4    1    6
5    2    7