You are viewing documentation about an older version (1.1.0).

snowflake.snowpark.Session.createDataFrame

Session.createDataFrame(data: List | Tuple | pandas.DataFrame, schema: StructType | List[str] | None = None) → DataFrame

Creates a new DataFrame containing the specified values from the local data. createDataFrame() is an alias of create_dataframe(), which the examples below use.

If data is a pandas DataFrame, its contents are stored in a temporary table, and the returned DataFrame points to that table so that you can apply further transformations to it. The temporary table is dropped at the end of your session. To save the pandas DataFrame permanently, use the write_pandas() method instead.

Parameters:
  • data – The local data for building the DataFrame. data must be a list, a tuple, or a pandas DataFrame. Each element of data becomes one row in the DataFrame.

  • schema – A StructType containing the names and data types of the columns, a list of column names, or None. When schema is a list of column names or None, the column types are inferred by scanning the data in every row. For large data sets, providing a StructType improves performance because it avoids this type inference.
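The schema parameter description says that, without a StructType, column types are inferred from every row. The plain-Python sketch below illustrates why that inference costs time on large inputs and why an explicit schema skips it. It is not Snowpark's implementation: infer_column_types and resolve_schema are hypothetical helpers, Python types stand in for Snowpark data types, and the rules for conflicting types and all-None columns are assumptions made only for this sketch.

```python
from typing import Any, Dict, Optional, Sequence


def infer_column_types(rows: Sequence[Sequence[Any]], names: Sequence[str]) -> Dict[str, type]:
    """Infer one Python type per column by scanning every row (illustration only)."""
    inferred: Dict[str, Optional[type]] = {name: None for name in names}
    for row in rows:  # every row is visited, so the cost grows with the data size
        for name, value in zip(names, row):
            if value is None:
                continue  # None (NULL) carries no type information
            current = inferred[name]
            if current is None:
                inferred[name] = type(value)
            elif current is not type(value):
                # Assumption for this sketch: conflicting types are an error
                raise TypeError(
                    f"column {name!r} mixes {current.__name__} and {type(value).__name__}"
                )
    # Assumption for this sketch: a column holding only None falls back to str
    return {name: (t if t is not None else str) for name, t in inferred.items()}


def resolve_schema(
    rows: Sequence[Sequence[Any]],
    names: Sequence[str],
    schema: Optional[Dict[str, type]] = None,
) -> Dict[str, type]:
    """Return the explicit schema if one is given; otherwise infer it from the data."""
    if schema is not None:
        return dict(schema)  # no pass over the rows is needed
    return infer_column_types(rows, names)
```

Passing an explicit schema short-circuits the scan entirely, which is the performance benefit the parameter description refers to.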

Examples:

>>> # create a dataframe with a schema
>>> from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType
>>> schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
>>> session.create_dataframe([[1, "snow"], [3, "flake"]], schema).collect()
[Row(A=1, B='snow'), Row(A=3, B='flake')]

>>> # create a dataframe by inferring a schema from the data
>>> from snowflake.snowpark import Row
>>> session.create_dataframe([1, 2, 3, 4], schema=["a"]).collect()
[Row(A=1), Row(A=2), Row(A=3), Row(A=4)]
>>> session.create_dataframe([[1, 2, 3, 4]], schema=["a", "b", "c", "d"]).collect()
[Row(A=1, B=2, C=3, D=4)]
>>> session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"]).collect()
[Row(A=1, B=2), Row(A=3, B=4)]
>>> session.create_dataframe([Row(a=1, b=2, c=3, d=4)]).collect()
[Row(A=1, B=2, C=3, D=4)]
>>> session.create_dataframe([{"a": 1}, {"b": 2}]).collect()
[Row(A=1, B=None), Row(A=None, B=2)]

>>> # create a dataframe from a pandas DataFrame
>>> import pandas as pd
>>> session.create_dataframe(pd.DataFrame([(1, 2, 3, 4)], columns=["a", "b", "c", "d"])).collect()
[Row(a=1, b=2, c=3, d=4)]
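The examples above show how each element of data becomes a row: a scalar becomes a one-column row, a list supplies one value per column, and dicts with different keys are merged so that missing keys become None. The sketch below mimics that mapping in plain Python, without a Snowflake connection. normalize_rows is a hypothetical helper, not part of the Snowpark API; it does not handle Row objects or pandas input, and the uppercasing only mirrors how Snowflake treats unquoted identifiers, as in the Row(A=..., B=...) outputs above.

```python
from collections.abc import Mapping
from typing import Any, List, Optional, Sequence, Tuple


def normalize_rows(
    data: Sequence[Any], names: Optional[Sequence[str]] = None
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Map each element of data to one row tuple (illustration only)."""
    if data and all(isinstance(item, Mapping) for item in data):
        # dict elements: the columns are the union of all keys, in first-seen order
        keys = list(dict.fromkeys(key for item in data for key in item))
        # a key missing from an element becomes None (NULL) in that row
        dict_rows = [tuple(item.get(key) for key in keys) for item in data]
        return [str(key).upper() for key in keys], dict_rows
    if names is None:
        raise ValueError("this sketch needs column names for non-dict elements")
    rows: List[Tuple[Any, ...]] = []
    for item in data:
        # a list or tuple supplies one value per column; anything else is a scalar
        row = tuple(item) if isinstance(item, (list, tuple)) else (item,)
        if len(row) != len(names):
            raise ValueError(f"row {row!r} does not match columns {list(names)!r}")
        rows.append(row)
    # unquoted identifiers are uppercased, matching the example outputs
    return [name.upper() for name in names], rows
```

For instance, normalize_rows([{"a": 1}, {"b": 2}]) yields the columns A and B with rows (1, None) and (None, 2), matching the shape of the dict example above.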