snowflake.snowpark.Session.createDataFrame
- Session.createDataFrame(data: Union[List, Tuple, DataFrame, Table], schema: Optional[Union[StructType, Iterable[str], str]] = None, **kwargs: Dict[str, Any]) → DataFrame
Creates a new DataFrame containing the specified values from the local data.
If creating a new DataFrame from a pandas DataFrame or a PyArrow Table, the data is stored in a temporary table, and the returned DataFrame points to that temporary table so you can run further transformations on it. The temporary table is dropped at the end of your session. If you would like to save the pandas DataFrame or PyArrow Table, use the write_pandas() or write_arrow() method instead. Note: when data is a pandas DataFrame or PyArrow Table, schema inference may be affected by chunk size. You can control it by passing the chunk_size keyword argument. For details, see write_pandas() or write_arrow(), which are used internally by this function.
- Parameters:
data – The local data for building a DataFrame. data can only be a list, tuple, pandas DataFrame, or PyArrow Table. Every element in data will constitute a row in the DataFrame.
schema –
A StructType containing names and data types of columns, a schema string, a list of column names, or None.
When passing a string, it can be either an explicit struct (e.g. "struct<a: int, b: string>") or an implicit struct (e.g. "a: int, b: string"). Internally, the string is parsed and converted into a StructType using Snowpark’s type parsing.
When schema is a list of column names or None, the schema of the DataFrame will be inferred from the data across all rows.
To improve performance, provide a schema. This avoids the need to infer data types with large data sets.
**kwargs – Additional keyword arguments passed to write_pandas() or write_arrow() when data is a pandas DataFrame or PyArrow Table, respectively. These can include options such as chunk_size or compression.
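To illustrate the cross-row inference described for schema=None, here is a minimal pure-Python sketch (not Snowpark internals) of how column names can be unified across a list of dicts, with missing keys filled with None and names upper-cased the way Snowflake upper-cases unquoted identifiers:

```python
from typing import Any, Dict, List, Tuple

def infer_rows(data: List[Dict[str, Any]]) -> Tuple[List[str], List[tuple]]:
    """Unify column names across all rows; absent keys become None.

    A simplified illustration of the documented behavior of
    create_dataframe([{"a": 1}, {"b": 2}]) -- not the actual implementation.
    """
    columns: List[str] = []
    for row in data:
        for key in row:
            name = key.upper()  # unquoted identifiers are upper-cased
            if name not in columns:
                columns.append(name)
    # Fill each row in column order; missing keys become None
    rows = [tuple(row.get(c.lower()) for c in columns) for row in data]
    return columns, rows

columns, rows = infer_rows([{"a": 1}, {"b": 2}])
# columns == ['A', 'B']; rows == [(1, None), (None, 2)]
```

This mirrors the documented result [Row(A=1, B=None), Row(A=None, B=2)]; the real inference also unifies data types across rows, which this sketch omits.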
Examples:
>>> # create a dataframe with a schema
>>> from snowflake.snowpark.types import IntegerType, StringType, StructType, StructField
>>> schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
>>> session.create_dataframe([[1, "snow"], [3, "flake"]], schema).collect()
[Row(A=1, B='snow'), Row(A=3, B='flake')]

>>> # create a dataframe by inferring a schema from the data
>>> from snowflake.snowpark import Row
>>> # infer schema
>>> session.create_dataframe([1, 2, 3, 4], schema=["a"]).collect()
[Row(A=1), Row(A=2), Row(A=3), Row(A=4)]
>>> session.create_dataframe([[1, 2, 3, 4]], schema=["a", "b", "c", "d"]).collect()
[Row(A=1, B=2, C=3, D=4)]
>>> session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"]).collect()
[Row(A=1, B=2), Row(A=3, B=4)]
>>> session.create_dataframe([Row(a=1, b=2, c=3, d=4)]).collect()
[Row(A=1, B=2, C=3, D=4)]
>>> session.create_dataframe([{"a": 1}, {"b": 2}]).collect()
[Row(A=1, B=None), Row(A=None, B=2)]

>>> # create a dataframe from a pandas DataFrame
>>> import pandas as pd
>>> session.create_dataframe(pd.DataFrame([(1, 2, 3, 4)], columns=["a", "b", "c", "d"])).collect()
[Row(a=1, b=2, c=3, d=4)]

>>> # create a dataframe using an implicit struct schema string
>>> session.create_dataframe([[10, 20], [30, 40]], schema="x: int, y: int").collect()
[Row(X=10, Y=20), Row(X=30, Y=40)]
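The chunk_size keyword described above controls how many rows are written per batch. As a rough illustration of what chunking means (plain Python, not the write_pandas() implementation, which writes each batch to a stage rather than collecting it in memory):

```python
from typing import Iterator, List, Sequence

def chunked(rows: Sequence, chunk_size: int) -> Iterator[List]:
    """Yield successive batches of at most chunk_size rows.

    Illustrates the idea behind the chunk_size argument that
    create_dataframe() forwards to write_pandas()/write_arrow().
    """
    for start in range(0, len(rows), chunk_size):
        yield list(rows[start:start + chunk_size])

batches = list(chunked(list(range(10)), chunk_size=4))
# batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Smaller batches reduce peak memory per upload; because schema inference can sample per chunk, chunk size may also affect the inferred schema, as noted above.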
Note
When data is a pandas DataFrame, snowflake.connector.pandas_tools.write_pandas is called, which requires permission to (1) CREATE STAGE, (2) CREATE TABLE, and (3) CREATE FILE FORMAT under the current database and schema.