snowflake.snowpark.Session.createDataFrame¶
- Session.createDataFrame(data: Union[List, Tuple, DataFrame, Table], schema: Optional[Union[StructType, Iterable[str], str]] = None, **kwargs: Dict[str, Any]) DataFrame[source]¶
- Creates a new DataFrame containing the specified values from the local data.

  If a new DataFrame is created from a pandas DataFrame or a PyArrow Table, the data is stored in a temporary table, and the returned DataFrame points to that temporary table so you can run further transformations on it. The temporary table is dropped at the end of your session. To save the pandas DataFrame or PyArrow Table instead, use the `write_pandas()` or `write_arrow()` method.

  Note: When `data` is a pandas DataFrame or PyArrow Table, schema inference may be affected by chunk size. You can control it by passing the `chunk_size` keyword argument. For details, see `write_pandas()` or `write_arrow()`, which are used internally by this function.

- Parameters:
  - data – The local data for building a `DataFrame`. `data` can be a `list`, a `tuple`, a pandas DataFrame, or a PyArrow Table. Every element in `data` constitutes a row in the DataFrame.
  - schema – A `StructType` containing names and data types of columns, a list of column names, a schema string, or `None`.
    - When passing a string, it can be either an explicit struct (e.g. `"struct<a: int, b: string>"`) or an implicit struct (e.g. `"a: int, b: string"`). Internally, the string is parsed and converted into a `StructType` using Snowpark's type parsing.
    - When `schema` is a list of column names or `None`, the schema of the DataFrame is inferred from the data across all rows.
    - To improve performance, provide a schema; this avoids the need to infer data types with large data sets.
  - **kwargs – Additional keyword arguments passed to `write_pandas()` or `write_arrow()` when `data` is a pandas DataFrame or PyArrow Table, respectively. These can include options such as `chunk_size` or `compression`.
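For instance, `chunk_size` can be forwarded when loading a large pandas DataFrame. This is a sketch, not a definitive recipe: it assumes an already-established Snowpark `session`, and the column names and sizes are made up.

```python
import pandas as pd

# Hypothetical large local frame; column names and row count are placeholders.
big = pd.DataFrame({"ID": range(1_000_000), "VAL": ["x"] * 1_000_000})

# chunk_size is forwarded to write_pandas(): smaller chunks reduce peak
# memory per upload at the cost of more round trips to the stage.
# (Assumption: the right value depends on your data and environment.)
df = session.create_dataframe(big, chunk_size=100_000)
```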
 
  - Examples:

    ```python
    >>> # create a dataframe with a schema
    >>> from snowflake.snowpark.types import IntegerType, StringType, StructType, StructField
    >>> schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
    >>> session.create_dataframe([[1, "snow"], [3, "flake"]], schema).collect()
    [Row(A=1, B='snow'), Row(A=3, B='flake')]

    >>> # create a dataframe by inferring a schema from the data
    >>> from snowflake.snowpark import Row
    >>> session.create_dataframe([1, 2, 3, 4], schema=["a"]).collect()
    [Row(A=1), Row(A=2), Row(A=3), Row(A=4)]
    >>> session.create_dataframe([[1, 2, 3, 4]], schema=["a", "b", "c", "d"]).collect()
    [Row(A=1, B=2, C=3, D=4)]
    >>> session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"]).collect()
    [Row(A=1, B=2), Row(A=3, B=4)]
    >>> session.create_dataframe([Row(a=1, b=2, c=3, d=4)]).collect()
    [Row(A=1, B=2, C=3, D=4)]
    >>> session.create_dataframe([{"a": 1}, {"b": 2}]).collect()
    [Row(A=1, B=None), Row(A=None, B=2)]

    >>> # create a dataframe from a pandas DataFrame
    >>> import pandas as pd
    >>> session.create_dataframe(pd.DataFrame([(1, 2, 3, 4)], columns=["a", "b", "c", "d"])).collect()
    [Row(a=1, b=2, c=3, d=4)]

    >>> # create a dataframe using an implicit struct schema string
    >>> session.create_dataframe([[10, 20], [30, 40]], schema="x: int, y: int").collect()
    [Row(X=10, Y=20), Row(X=30, Y=40)]
    ```

  - Note

    When `data` is a pandas DataFrame, `snowflake.connector.pandas_tools.write_pandas` is called, which requires permission to (1) CREATE STAGE, (2) CREATE TABLE, and (3) CREATE FILE FORMAT under the current database and schema.
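The session-scoped temporary table behavior described in the overview can be sketched as follows. This assumes an active Snowpark `session`; `MY_TABLE` is a hypothetical table name chosen for illustration.

```python
import pandas as pd

pdf = pd.DataFrame({"ID": [1, 2], "NAME": ["snow", "flake"]})

# create_dataframe() stages pdf in a temporary table that is dropped when
# the session ends; df merely points at that table.
df = session.create_dataframe(pdf)

# To persist the data beyond the session, write it to a regular table
# instead (auto_create_table creates MY_TABLE if it does not exist).
session.write_pandas(pdf, "MY_TABLE", auto_create_table=True)
```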