snowflake.snowpark.Session.createDataFrame

Session.createDataFrame(data: Union[List, Tuple, DataFrame, Table], schema: Optional[Union[StructType, Iterable[str], str]] = None, **kwargs: Dict[str, Any]) → DataFrame

Creates a new DataFrame containing the specified values from the local data.

If creating a new DataFrame from a pandas DataFrame or a PyArrow Table, the data is stored in a temporary table, and the returned DataFrame points to that temporary table so you can run further transformations on it. The temporary table is dropped at the end of your session. If you would like to save the pandas DataFrame or PyArrow Table, use the write_pandas() or write_arrow() method instead.

Note: When data is a pandas DataFrame or PyArrow Table, schema inference may be affected by chunk size. You can control it by passing the chunk_size keyword argument. For details, see write_pandas() or write_arrow(), which are used internally by this function.

Parameters:
  • data – The local data for building a DataFrame. data can be a list, a tuple, a pandas DataFrame, or a PyArrow Table. When data is a list or tuple, every element constitutes a row in the DataFrame.

  • schema

    A StructType containing names and data types of columns, a list of column names, a schema string, or None.

    • When passing a string, it can be either an explicit struct (e.g. "struct<a: int, b: string>") or an implicit struct (e.g. "a: int, b: string"). Internally, the string is parsed and converted into a StructType using Snowpark’s type parsing.

    • When schema is a list of column names or None, the schema of the DataFrame will be inferred from the data across all rows.

    Providing a schema improves performance with large data sets, because it avoids the need to infer data types from the data.

  • **kwargs – Additional keyword arguments passed to write_pandas() or write_arrow() when data is a pandas DataFrame or PyArrow Table, respectively. These can include options such as chunk_size or compression.
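To see why chunk_size can influence schema inference, it helps to picture how the data is uploaded in row chunks. The following is a hypothetical sketch of the chunking idea using plain pandas; split_into_chunks is a made-up helper, not part of Snowpark or the connector:

```python
import pandas as pd

def split_into_chunks(df: pd.DataFrame, chunk_size: int) -> list[pd.DataFrame]:
    """Split df into consecutive row chunks of at most chunk_size rows each,
    mimicking how the data is uploaded chunk by chunk."""
    return [df.iloc[i : i + chunk_size] for i in range(0, len(df), chunk_size)]

df = pd.DataFrame({"a": range(10)})
chunks = split_into_chunks(df, chunk_size=4)
print([len(c) for c in chunks])  # → [4, 4, 2]
```

Because each chunk is materialized separately, the rows a given chunk happens to contain can affect what types are inferred from it, which is why adjusting chunk_size can change the result.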

Examples:

>>> # create a dataframe with a schema
>>> from snowflake.snowpark.types import IntegerType, StringType, StructField
>>> schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
>>> session.create_dataframe([[1, "snow"], [3, "flake"]], schema).collect()
[Row(A=1, B='snow'), Row(A=3, B='flake')]

>>> # create a dataframe by inferring a schema from the data
>>> from snowflake.snowpark import Row
>>> # infer schema
>>> session.create_dataframe([1, 2, 3, 4], schema=["a"]).collect()
[Row(A=1), Row(A=2), Row(A=3), Row(A=4)]
>>> session.create_dataframe([[1, 2, 3, 4]], schema=["a", "b", "c", "d"]).collect()
[Row(A=1, B=2, C=3, D=4)]
>>> session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"]).collect()
[Row(A=1, B=2), Row(A=3, B=4)]
>>> session.create_dataframe([Row(a=1, b=2, c=3, d=4)]).collect()
[Row(A=1, B=2, C=3, D=4)]
>>> session.create_dataframe([{"a": 1}, {"b": 2}]).collect()
[Row(A=1, B=None), Row(A=None, B=2)]

>>> # create a dataframe from a pandas DataFrame
>>> import pandas as pd
>>> session.create_dataframe(pd.DataFrame([(1, 2, 3, 4)], columns=["a", "b", "c", "d"])).collect()
[Row(a=1, b=2, c=3, d=4)]

>>> # create a dataframe using an implicit struct schema string
>>> session.create_dataframe([[10, 20], [30, 40]], schema="x: int, y: int").collect()
[Row(X=10, Y=20), Row(X=30, Y=40)]
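An implicit struct string like "x: int, y: int" is just a compact way of naming columns and their types. The toy parser below illustrates the mapping from string to (name, type) pairs; it is a simplified illustration only, and Snowpark's real parser handles much more (nested structs, decimals with precision, and so on):

```python
def parse_implicit_struct(schema_str: str) -> list[tuple[str, str]]:
    """Split a flat 'name: type, name: type' schema string into
    (column_name, type_name) pairs. Illustration only."""
    pairs = []
    for field in schema_str.split(","):
        name, _, type_name = field.partition(":")
        pairs.append((name.strip(), type_name.strip()))
    return pairs

print(parse_implicit_struct("x: int, y: int"))  # → [('x', 'int'), ('y', 'int')]
```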

Note

When data is a pandas DataFrame, snowflake.connector.pandas_tools.write_pandas is called, which requires permission to (1) CREATE STAGE, (2) CREATE TABLE, and (3) CREATE FILE FORMAT under the current database and schema.