snowflake.ml.fileset.sfcfs.SFFileSystem
- class snowflake.ml.fileset.sfcfs.SFFileSystem(*args, **kwargs)
Bases: AbstractFileSystem
A file system that allows users to access Snowflake stages and stage files through valid Snowflake locations.
The file system is based on fsspec (https://filesystem-spec.readthedocs.io/). It is a wrapper built on top of SFStageFileSystem. It takes a Snowflake stage file path as input and supports read operations. A valid Snowflake location has the form “@{database_name}.{schema_name}.{stage_name}/{path_to_file}”.
>>> import snowflake.connector
>>> from snowflake.ml.fileset.sfcfs import SFFileSystem
>>> conn = snowflake.connector.connect(**connection_parameters)
>>> sffs = SFFileSystem(sf_connection=conn)
>>> sffs.ls("@MYDB.public.FOO/nytrain")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
>>> with sffs.open('@MYDB.public.FOO/nytrain/data_0_0_1.csv', mode='rb') as f:
...     print(f.readline())
b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'

>>> import fsspec
>>> import snowflake.connector
>>> conn = snowflake.connector.connect(**connection_parameters)
>>> sffs = fsspec.filesystem("sfc", sf_connection=conn)
>>> sffs.ls("@MYDB.public.FOO/nytrain")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
>>> with sffs.open('@MYDB.public.FOO/nytrain/data_0_0_1.csv', mode='rb') as f:
...     print(f.readline())
b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'

>>> import fsspec
>>> import snowflake.connector
>>> conn = snowflake.connector.connect(**connection_parameters)
>>> with fsspec.open("sfc://@MYDB.public.FOO/nytrain/data_0_0_1.csv", mode='rb', sf_connection=conn) as f:
...     print(f.readline())
b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'
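The location format described above can be checked before a path is handed to the file system. The following is a minimal sketch, not the library's own parser: `parse_stage_location`, `StageLocation`, and the regular expression are hypothetical names introduced for illustration, and the pattern assumes unquoted identifiers and a mandatory "/" after the stage name.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for "@{database_name}.{schema_name}.{stage_name}/{path_to_file}".
# Assumption: identifiers are unquoted (letters, digits, "_" and "$").
_LOCATION_RE = re.compile(
    r"^@(?P<database>[A-Za-z_][\w$]*)"
    r"\.(?P<schema>[A-Za-z_][\w$]*)"
    r"\.(?P<stage>[A-Za-z_][\w$]*)"
    r"/(?P<path>.*)$"
)


class StageLocation(NamedTuple):
    database: str
    schema: str
    stage: str
    path: str


def parse_stage_location(location: str) -> StageLocation:
    """Split a stage location into its four parts, or raise ValueError."""
    match = _LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"Not a valid stage location: {location!r}")
    return StageLocation(**match.groupdict())
```

For example, `parse_stage_location("@MYDB.public.FOO/nytrain/data_0_0_1.csv")` yields the database `MYDB`, schema `public`, stage `FOO`, and path `nytrain/data_0_0_1.csv`. Quoted identifiers would need a more permissive pattern.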
Initialize file system with a Snowflake Python connection.
- Parameters:
sf_connection – A Snowflake Python connection object. Exactly one of sf_connection and snowpark_session must be provided.
snowpark_session – A Snowpark session. Exactly one of snowpark_session and sf_connection must be provided.
kwargs – Optional. Other parameters that are passed on to fsspec. Currently supported:
- skip_instance_cache: Int. Controls reuse of instances.
- cache_type, cache_options, block_size: Configure file buffering.
For more information on these options, see https://filesystem-spec.readthedocs.io/en/latest/features.html
- Raises:
ValueError – Raised when not exactly one of sf_connection and snowpark_session is given.
SnowflakeMLException – Raised when recreating the SFFileSystem from a serialized state fails.
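The "exactly one of" rule behind the ValueError can be sketched as follows. This is an illustration of the rule as documented, not the library's actual validation code; `pick_connection_source` is a hypothetical helper, and the arguments are treated as opaque objects.

```python
from typing import Any, Optional


def pick_connection_source(
    sf_connection: Optional[Any] = None,
    snowpark_session: Optional[Any] = None,
) -> str:
    """Return the name of the argument that was supplied."""
    # Both None and both set are rejected: the two "is None" checks
    # agree exactly in those two invalid cases.
    if (sf_connection is None) == (snowpark_session is None):
        raise ValueError(
            "Provide exactly one of sf_connection and snowpark_session."
        )
    return "sf_connection" if sf_connection is not None else "snowpark_session"
```

Checking this before constructing the file system gives the caller an early, local error rather than one raised from inside the constructor.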
Methods
- info(path: str, **kwargs: Any) → Dict[str, Any]
Overrides the fsspec info method. Returns details of the entry at path.
- ls(path: str, detail: bool = False, **kwargs: Any) → Union[List[str], List[Dict[str, Any]]]
Overrides the fsspec ls method. Lists a single “directory” with or without details.
- Parameters:
path – location at which to list files. It should be in the format of “@{database}.{schema}.{stage}/{path}”
detail – if True, each list item is a dict of file properties; otherwise, returns list of filenames.
kwargs – additional arguments passed on.
- Returns:
A list of filenames if detail is False, or a list of dicts if detail is True.
Example:
>>> sffs.ls("@MYDB.public.FOO/")
['@MYDB.public.FOO/nytrain']
>>> sffs.ls("@MYDB.public.FOO/nytrain")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
>>> sffs.ls("@MYDB.public.FOO/nytrain/")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
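Because ls returns two different shapes depending on detail, calling code sometimes needs to normalize the result. The helper below is a small sketch of that; `listing_names` is a hypothetical name, and it assumes that detailed entries store the path under the "name" key, as is conventional for fsspec file systems. The exact keys returned by SFFileSystem are not stated here.

```python
from typing import Any, Dict, List, Union


def listing_names(entries: Union[List[str], List[Dict[str, Any]]]) -> List[str]:
    """Return plain paths from either form of ls() output."""
    # Assumption: a detailed entry is a dict with the path under "name"
    # (the fsspec convention); plain entries are already path strings.
    return [e["name"] if isinstance(e, dict) else e for e in entries]
```

With this, `listing_names(sffs.ls(path))` and `listing_names(sffs.ls(path, detail=True))` would produce the same list of paths under the stated assumption.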
- optimize_read(files: Optional[List[str]] = None) → None
Prefetches and caches the presigned URLs for all the given files to speed up file opening.
All files passed here have their URLs cached. A later open() on any cached file triggers a batch refresh of the cached URLs in the same stage if that file's URL is no longer active.
- Parameters:
files – A list of file paths whose presigned URLs should be cached.
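A typical pattern is to call optimize_read once with the full list of files and then open them one by one. The sketch below shows that ordering; `read_first_lines` is a hypothetical helper, and `fs` is assumed to be an SFFileSystem instance (or any object exposing the same `optimize_read` and `open` methods).

```python
from typing import Any, Dict, List


def read_first_lines(fs: Any, files: List[str]) -> Dict[str, bytes]:
    """Prefetch URLs for all files, then read the first line of each."""
    # One call up front caches the presigned URLs for every file,
    # so the open() calls below do not each fetch a URL separately.
    fs.optimize_read(files)

    first_lines: Dict[str, bytes] = {}
    for path in files:
        with fs.open(path, mode="rb") as f:
            first_lines[path] = f.readline()
    return first_lines
```

Passing the whole list in a single call matters here: prefetching file by file would give up the batching that the method is meant to provide.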
Attributes
- async_impl = False
- blocksize = 4194304
- cachable = True
- fsid
Persistent filesystem id that can be used to compare filesystems across sessions.
- mirror_sync_methods = False
- protocol = 'sfc'
- root_marker = ''
- sep = '/'
- transaction
A context within which files are committed together upon exit.
Requires the file class to implement .commit() and .discard() for the normal and exception cases.