modin.pandas.read_parquet
- modin.pandas.read_parquet(path: FilePath, engine: str | None = None, columns: list[str] | None = None, storage_options: StorageOptions = None, use_nullable_dtypes: bool | NoDefault = _NoDefault.no_default, dtype_backend: DtypeBackend | NoDefault = _NoDefault.no_default, filesystem: str = None, filters: list[tuple] | list[list[tuple]] | None = None, **kwargs)
Read parquet file(s) into a Snowpark pandas DataFrame. This API can read files stored locally or on a Snowflake stage.
Snowpark pandas stages files (unless they’re already staged) and then reads them using Snowflake’s parquet reader.
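The local-versus-staged decision described above can be sketched in plain Python. The helper classify_path below is hypothetical and is not part of Snowpark pandas; it only encodes the path rules given under the path parameter, assuming that a local name starting with @ is escaped with a backslash (\@) and that the backslash is dropped before the file is read.

```python
from typing import Tuple

# Hypothetical helper illustrating the documented path convention.
# It is NOT part of the Snowpark pandas API.
ESCAPED_AT = "\\@"  # a literal backslash followed by "@"


def classify_path(path: str) -> Tuple[str, str]:
    """Return ("local" | "stage", normalized_path) for a read_parquet path.

    Assumed rules:
      * a path starting with "@" refers to a Snowflake stage;
      * a local name starting with "@" is escaped as "\\@";
      * anything else is a local path that would be staged before reading.
    """
    if path.startswith(ESCAPED_AT):
        # Drop the escaping backslash to recover the real local file name.
        return "local", path[1:]
    if path.startswith("@"):
        # Already staged: read directly from the stage.
        return "stage", path
    # Plain local path: it would be uploaded to a stage first.
    return "local", path
```

With these rules, "@mytempstage/myprefix" is treated as a stage location, while "\@odd_name.parquet" refers to a local file named @odd_name.parquet that would first be uploaded to a stage.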
- Parameters:
path (str) – Local file location or staged file location to read from. Staged file locations start with an ‘@’ symbol. To read a local file whose name starts with @, escape it as \@. For more information on staged files, see the Snowflake documentation on stages.
engine ({'auto', 'pyarrow', 'fastparquet'}, default None) – This parameter is not supported and will be ignored.
storage_options (StorageOptions, default None) – This parameter is not supported and will be ignored.
columns (list, default None) – If not None, only these columns will be read from the file.
use_nullable_dtypes (bool, default False) – This parameter is not supported and will raise an error.
dtype_backend ({'numpy_nullable', 'pyarrow'}, default 'numpy_nullable') – This parameter is not supported and will be ignored.
filesystem (fsspec or pyarrow filesystem, default None) – This parameter is not supported and will be ignored.
filters (List[Tuple] or List[List[Tuple]], default None) – This parameter is not supported and will be ignored.
**kwargs (Any, default None) – This parameter is not supported and will be ignored.
- Return type:
Snowpark pandas DataFrame
- Raises:
NotImplementedError – if a parameter is not supported.
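The parameter descriptions above amount to a simple policy: path and columns are used, use_nullable_dtypes raises, and everything else (including **kwargs) is ignored. The sketch below encodes that policy; the function name partition_read_parquet_params and the convention that None means "not passed" are assumptions for illustration, not the library's actual validation code.

```python
# Hypothetical sketch of the documented parameter policy for read_parquet.
# Names and the "None means not passed" convention are assumptions;
# this is not Snowpark pandas source code.
SUPPORTED_PARAMS = frozenset({"path", "columns"})
RAISING_PARAMS = frozenset({"use_nullable_dtypes"})


def partition_read_parquet_params(**params):
    """Split explicitly passed arguments into (used, ignored) name lists.

    Raises NotImplementedError for parameters documented as raising.
    Every other name, including arbitrary **kwargs, counts as ignored.
    """
    used, ignored = [], []
    for name, value in params.items():
        if value is None:  # treat None as "left at its default"
            continue
        if name in RAISING_PARAMS:
            raise NotImplementedError(f"read_parquet: {name} is not supported")
        (used if name in SUPPORTED_PARAMS else ignored).append(name)
    return sorted(used), sorted(ignored)
```

For example, partition_read_parquet_params(path="a.parquet", engine="pyarrow") returns (["path"], ["engine"]).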
Notes
Both local files and files staged on Snowflake can be passed into path. A single file, or a folder that matches a set of files, can be passed into path. The order of rows in the DataFrame may differ from the order of records in an input file. When reading multiple files, there is no deterministic order in which the files are read.
Examples
Read a local parquet file.
>>> import modin.pandas as pd
>>> import snowflake.snowpark.modin.plugin  # noqa: F401
>>> import pandas as native_pd
>>> import tempfile
>>> temp_dir = tempfile.TemporaryDirectory()
>>> temp_dir_name = temp_dir.name
>>> df = native_pd.DataFrame(
...     {"foo": range(3), "bar": range(5, 8)}
... )
>>> df
   foo  bar
0    0    5
1    1    6
2    2    7
>>> _ = df.to_parquet(f'{temp_dir_name}/snowpark-pandas.parquet')
>>> restored_df = pd.read_parquet(f'{temp_dir_name}/snowpark-pandas.parquet')
>>> restored_df
   foo  bar
0    0    5
1    1    6
2    2    7
>>> restored_bar = pd.read_parquet(f'{temp_dir_name}/snowpark-pandas.parquet', columns=["bar"])
>>> restored_bar
   bar
0    5
1    6
2    7
Read a staged parquet file.
>>> _ = session.sql("create or replace temp stage mytempstage").collect()
>>> _ = session.file.put(f'{temp_dir_name}/snowpark-pandas.parquet', '@mytempstage/myprefix')
>>> df2 = pd.read_parquet('@mytempstage/myprefix/snowpark-pandas.parquet')
>>> df2
   foo  bar
0    0    5
1    1    6
2    2    7
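In the staged example, the file uploaded to '@mytempstage/myprefix' is read back from '@mytempstage/myprefix/snowpark-pandas.parquet', that is, the stage location followed by the local file's base name. The hypothetical helper staged_file_path below builds that path. It assumes the upload keeps the file name unchanged, which holds in this example but not necessarily in general (for instance, PUT can gzip-compress a file and append .gz to its name).

```python
import os
import posixpath


def staged_file_path(stage_location: str, local_file: str) -> str:
    """Hypothetical helper: expected staged path of a file after upload.

    Assumes the uploaded file keeps its base name (no renaming on upload).
    """
    if not stage_location.startswith("@"):
        raise ValueError("stage locations start with '@'")
    # Local paths use OS conventions; stage paths always use "/".
    base_name = os.path.basename(local_file)
    return posixpath.join(stage_location.rstrip("/"), base_name)
```

The returned string is the kind of value the example passes to pd.read_parquet.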
Read parquet files from a local folder.
>>> _ = df.to_parquet(f'{temp_dir_name}/snowpark-pandas2.parquet')
>>> df3 = pd.read_parquet(f'{temp_dir_name}')
>>> df3
   foo  bar
0    0    5
1    1    6
2    2    7
3    0    5
4    1    6
5    2    7
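The folder read returns the rows of both files, but because the read order across files is not deterministic, either file's rows may come first. A check on such a result should compare rows without regard to order, or sort them first; on a DataFrame that would typically be sort_values on the key columns followed by reset_index(drop=True). The sketch below shows the same idea with the standard library only, modeling rows as (foo, bar) tuples; the helper names and the second file's values are hypothetical, chosen so that the two read orders actually differ.

```python
from collections import Counter


def same_rows_any_order(left, right):
    """True if both sequences contain the same rows, each the same number of times."""
    return Counter(left) == Counter(right)


def canonical_order(rows):
    """Sort rows lexicographically so that different read orders compare equal."""
    return sorted(rows)


# Two hypothetical files with different contents.
file_1 = [(0, 5), (1, 6), (2, 7)]
file_2 = [(3, 8), (4, 9)]

# The reader may concatenate the files in either order.
read_a = file_1 + file_2
read_b = file_2 + file_1

print(read_a == read_b)  # False: positional comparison is order-sensitive
print(same_rows_any_order(read_a, read_b))  # True
print(canonical_order(read_a) == canonical_order(read_b))  # True
```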
Read parquet files from a staged location.
>>> _ = session.file.put(f'{temp_dir_name}/snowpark-pandas2.parquet', '@mytempstage/myprefix')
>>> df3 = pd.read_parquet('@mytempstage/myprefix')
>>> df3
   foo  bar
0    0    5
1    1    6
2    2    7
3    0    5
4    1    6
5    2    7