snowflake.ml.fileset.sfcfs.SFFileSystem
- class snowflake.ml.fileset.sfcfs.SFFileSystem(*args, **kwargs)
- Bases: AbstractFileSystem
- A file system that lets users access Snowflake stages and stage files through valid Snowflake locations.
- The file system is based on fsspec (https://filesystem-spec.readthedocs.io/). It is a wrapper built on top of SFStageFileSystem. It takes a Snowflake stage file path as input and supports read operations. A valid Snowflake location has the form “@{database_name}.{schema_name}.{stage_name}/{path_to_file}”.
- >>> conn = snowflake.connector.connect(**connection_parameters)
  >>> sffs = SFFileSystem(sf_connection=conn)
  >>> sffs.ls("@MYDB.public.FOO/nytrain")
  ['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
  >>> with sffs.open('@MYDB.public.FOO/nytrain/data_0_0_1.csv', mode='rb') as f:
  ...     print(f.readline())
  b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'
- >>> conn = snowflake.connector.connect(**connection_parameters)
  >>> sffs = fsspec.filesystem("sfc", sf_connection=conn)
  >>> sffs.ls("@MYDB.public.FOO/nytrain")
  ['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
  >>> with sffs.open('@MYDB.public.FOO/nytrain/data_0_0_1.csv', mode='rb') as f:
  ...     print(f.readline())
  b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'
- >>> conn = snowflake.connector.connect(**connection_parameters)
  >>> with fsspec.open("sfc://@MYDB.public.FOO/nytrain/data_0_0_1.csv", mode='rb', sf_connection=conn) as f:
  ...     print(f.readline())
  b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'
- Initialize the file system with a Snowflake Python connection or a Snowpark session.
- Parameters:
- sf_connection – A Snowflake Python connection object. Exactly one of sf_connection and snowpark_session must be provided.
- snowpark_session – A Snowpark session. Exactly one of sf_connection and snowpark_session must be provided.
- kwargs – Optional. Other parameters that are passed on to fsspec. Currently supported:
  - skip_instance_cache: Int. Controls reuse of instances.
  - cache_type, cache_options, block_size: Configure file buffering.
  For more information on these options, see https://filesystem-spec.readthedocs.io/en/latest/features.html
 
- Raises:
- ValueError – Raised when not exactly one of sf_connection and snowpark_session is given.
- SnowflakeMLException – Raised when recreating the SFFileSystem from a serialized state fails.
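
The ValueError condition above amounts to an exactly-one-of check on the two constructor arguments. The following is a minimal sketch of such a check, not the library's own code: `pick_connection_source` is a hypothetical helper written for illustration, and it assumes that "non-empty" means "not None".

```python
from typing import Any, Optional


def pick_connection_source(
    sf_connection: Optional[Any] = None,
    snowpark_session: Optional[Any] = None,
) -> str:
    """Return the name of the argument that was supplied.

    Hypothetical helper (not part of snowflake.ml). It mirrors the documented
    rule that exactly one of sf_connection and snowpark_session must be given,
    under the assumption that "non-empty" means "is not None".
    """
    supplied = [
        name
        for name, value in (
            ("sf_connection", sf_connection),
            ("snowpark_session", snowpark_session),
        )
        if value is not None
    ]
    if len(supplied) != 1:
        # Covers both "neither given" and "both given".
        raise ValueError(
            "Exactly one of sf_connection and snowpark_session must be given; "
            f"got {len(supplied)}."
        )
    return supplied[0]
```

Running a check like this before constructing the file system surfaces a misconfiguration early, with a message that names both arguments.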
 
 - Methods
 - info(path: str, **kwargs: Any) -> dict[str, Any]
- Override fsspec info method. Give details of entry at path. 
 - ls(path: str, detail: bool = False, **kwargs: Any) -> Union[list[str], list[dict[str, Any]]]
- Override fsspec ls method. List a single “directory” with or without details.
- Parameters:
- path – Location at which to list files, in the format “@{database}.{schema}.{stage}/{path}”.
- detail – If True, each list item is a dict of file properties; otherwise, each item is a filename.
- kwargs – additional arguments passed on. 
 
- Returns:
- A list of filenames if detail is False, or a list of dicts if detail is True.
 - Example:
   >>> sffs.ls("@MYDB.public.FOO/")
   ['@MYDB.public.FOO/nytrain']
   >>> sffs.ls("@MYDB.public.FOO/nytrain")
   ['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
   >>> sffs.ls("@MYDB.public.FOO/nytrain/")
   ['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
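
The locations that ls accepts and returns follow the “@{database}.{schema}.{stage}/{path}” format, so they can be split into their parts with a regular expression. The sketch below is illustrative only: `StageLocation` and `split_stage_location` are hypothetical names that are not part of snowflake.ml, and the pattern assumes simple unquoted identifiers that contain no dots, slashes, or "@" characters.

```python
import re
from typing import NamedTuple


class StageLocation(NamedTuple):
    """Parts of a stage location (hypothetical type, for illustration)."""

    database: str
    schema: str
    stage: str
    path: str


# Assumption: identifiers are unquoted and contain no '.', '/', or '@'.
_LOCATION_RE = re.compile(
    r"^@(?P<database>[^./@]+)\.(?P<schema>[^./@]+)\.(?P<stage>[^./@]+)"
    r"(?:/(?P<path>.*))?$"
)


def split_stage_location(location: str) -> StageLocation:
    """Split '@db.schema.stage/path' into its four parts."""
    match = _LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"Not a valid stage location: {location!r}")
    # The path part is optional; normalize a missing path to ''.
    return StageLocation(
        match["database"], match["schema"], match["stage"], match["path"] or ""
    )


loc = split_stage_location("@MYDB.public.FOO/nytrain/data_0_0_0.csv")
print(loc.stage, loc.path)  # FOO nytrain/data_0_0_0.csv
```

Validating a location this way before calling ls or open can catch a missing "@" prefix or a missing schema name without a round trip to Snowflake.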
 - optimize_read(files: Optional[list[str]] = None) -> None
- Prefetch and cache the presigned URLs for the given files to speed up opening them.
- All files passed here will have their URLs cached. A later open() on any cached file triggers a batch refresh of the cached URLs in the same stage if that file's URL is no longer active.
- Parameters:
- files – A list of file paths whose presigned URLs should be cached.
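
A typical pattern is to list a stage directory, prefetch the presigned URLs in one call, and then open the files one by one. The sketch below uses only the ls, optimize_read, and open calls documented on this page. `read_first_lines` is a hypothetical helper, `fs` is assumed to be an already constructed SFFileSystem (or any object with the same three methods), and the directory is assumed to contain only files.

```python
from typing import Any


def read_first_lines(fs: Any, stage_dir: str) -> dict[str, bytes]:
    """Read the first line of every file under stage_dir.

    Hypothetical helper for illustration; assumes stage_dir holds only files.
    """
    # With detail=False (the default), ls returns plain file paths.
    files = fs.ls(stage_dir)
    # Prefetch and cache the presigned URLs for all files in one call.
    fs.optimize_read(files)
    first_lines = {}
    for path in files:
        # Each open() can now reuse the cached URL for its file.
        with fs.open(path, mode="rb") as f:
            first_lines[path] = f.readline()
    return first_lines
```

With a real connection this could be called as `read_first_lines(sffs, "@MYDB.public.FOO/nytrain")`.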
 
 - Attributes
 - async_impl = False
 - blocksize = 4194304
 - cachable = True
 - fsid
- Persistent filesystem id that can be used to compare filesystems across sessions. 
 - mirror_sync_methods = False
 - protocol: ClassVar[str | tuple[str, ...]] = 'sfc'
 - root_marker = ''
 - sep = '/'
 - transaction
- A context within which files are committed together upon exit. Requires the file class to implement .commit() and .discard() for the normal and exception cases.
 - storage_args: Tuple[Any, ...]
 - storage_options: Dict[str, Any]