snowflake.ml.data.data_connector.DataConnector¶
- class snowflake.ml.data.data_connector.DataConnector(ingestor: DataIngestor)¶
Bases: object
A Snowflake data reader that provides application integration connectors.
Methods
- classmethod from_dataframe(df: DataFrame, ingestor_class: Optional[Type[DataIngestor]] = None, **kwargs: Any) DataConnectorType ¶
This method has been in private preview since version 1.6.0.
- classmethod from_dataset(ds: Dataset, ingestor_class: Optional[Type[DataIngestor]] = None, **kwargs: Any) DataConnectorType ¶
- classmethod from_sources(session: Session, sources: List[Union[DataFrameInfo, DatasetInfo, str]], ingestor_class: Optional[Type[DataIngestor]] = None, **kwargs: Any) DataConnectorType ¶
- to_pandas(limit: Optional[int] = None) pd.DataFrame ¶
Retrieve the Snowflake data as a Pandas DataFrame.
- Parameters:
limit – If specified, the maximum number of rows to load into the DataFrame.
- Returns:
A Pandas DataFrame.
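As a rough usage sketch, the helper below accepts any object exposing the `to_pandas(limit=...)` method documented above and loads only a small preview. The `preview` helper and the `_StubConnector` class are hypothetical names introduced for illustration so that the snippet runs without a Snowflake session; with a real connector you would pass the object returned by, for example, `DataConnector.from_dataframe(df)`.

```python
from typing import Any, List, Optional


class _StubConnector:
    """Hypothetical stand-in for DataConnector that holds rows in memory."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def to_pandas(self, limit: Optional[int] = None) -> List[dict]:
        # The real method returns a pandas DataFrame; a plain list keeps
        # this stub free of third-party dependencies.
        return self._rows if limit is None else self._rows[:limit]


def preview(connector: Any, limit: int = 5) -> Any:
    """Load at most `limit` rows instead of materializing the full data."""
    return connector.to_pandas(limit=limit)


stub = _StubConnector([{"id": i} for i in range(10)])
sample = preview(stub, limit=3)
```

Passing `limit` is mainly useful for inspecting a few rows before committing to a full load.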
- to_tf_dataset(*, batch_size: int, shuffle: bool = False, drop_last_batch: bool = True) tf.data.Dataset ¶
Transform the Snowflake data into a ready-to-use TensorFlow tf.data.Dataset.
- Parameters:
batch_size – The size of each data batch yielded by the resulting dataset.
shuffle – Whether to shuffle the data. If True, the files are shuffled, and the rows within each file are also shuffled.
drop_last_batch – Whether to drop the last batch. If True, the last batch is dropped when it contains fewer than batch_size rows.
- Returns:
A tf.data.Dataset that yields batched tf.Tensors.
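The interaction between `batch_size` and `drop_last_batch` can be illustrated in plain Python. This is a minimal sketch of the documented semantics only, not the library's implementation; `batch_rows` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_rows(
    rows: Iterable[T], batch_size: int, drop_last_batch: bool = True
) -> Iterator[List[T]]:
    """Group rows into lists of `batch_size`, optionally dropping a short final batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Any leftover batch is smaller than batch_size; keep it only on request.
    if batch and not drop_last_batch:
        yield batch


# Ten rows with batch_size=4 leave a final partial batch of two rows.
kept = list(batch_rows(range(10), batch_size=4, drop_last_batch=False))
dropped = list(batch_rows(range(10), batch_size=4, drop_last_batch=True))
```

With the default `drop_last_batch=True`, every batch has the same size, at the cost of discarding up to `batch_size - 1` trailing rows.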
- to_torch_datapipe(*, batch_size: int, shuffle: bool = False, drop_last_batch: bool = True) torch_data.IterDataPipe ¶
Transform the Snowflake data into a ready-to-use PyTorch datapipe.
Return a PyTorch datapipe that iterates over rows of data.
- Parameters:
batch_size – The size of each data batch yielded by the resulting datapipe.
shuffle – Whether to shuffle the data. If True, the files are shuffled, and the rows within each file are also shuffled.
drop_last_batch – Whether to drop the last batch. If True, the last batch is dropped when it contains fewer than batch_size rows.
- Returns:
A PyTorch iterable datapipe that yields data.
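The two-level shuffle described for the `shuffle` parameter (files first, then rows within each file) can be sketched in plain Python. This only illustrates the documented behavior; `shuffle_files_and_rows` is a hypothetical helper, and the sketch assumes rows from different files are not interleaved, which the description above does not specify.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffle_files_and_rows(
    files: Sequence[Sequence[T]], seed: Optional[int] = None
) -> List[T]:
    """Shuffle the file order, then shuffle the rows inside each file."""
    rng = random.Random(seed)
    order = list(range(len(files)))
    rng.shuffle(order)  # first level: randomize which file is read first
    out: List[T] = []
    for idx in order:
        rows = list(files[idx])
        rng.shuffle(rows)  # second level: randomize rows within this file
        out.extend(rows)
    return out


# Three "files" of three rows each; a fixed seed makes the run repeatable.
files = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shuffled = shuffle_files_and_rows(files, seed=0)
```

Under this assumption, rows from the same file stay adjacent in the output, so the result is not a uniform shuffle over all rows.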
- to_torch_dataset(*, batch_size: Optional[int] = None, shuffle: bool = False, drop_last_batch: bool = True) torch_data.IterableDataset ¶
Transform the Snowflake data into a PyTorch Iterable Dataset to be used with a DataLoader.
Return a PyTorch Dataset that iterates over rows of data.
- Parameters:
batch_size – The size of each data batch yielded by the resulting dataset. Batching is pushed down to the data ingestion level, which may be more performant than DataLoader batching.
shuffle – Whether to shuffle the data. If True, the files are shuffled, and the rows within each file are also shuffled.
drop_last_batch – Whether to drop the last batch. If True, the last batch is dropped when it contains fewer than batch_size rows.
- Returns:
A PyTorch Iterable Dataset that yields data.
Attributes
- data_sources¶