PySpark APIs supported for Snowpark Connect for Spark¶
Snowpark Connect for Spark supports PySpark APIs as described in this topic.
Snowpark Connect for Spark provides compatibility with the PySpark 3.5.3 Spark Connect API, allowing you to run Spark workloads on Snowflake. Compatibility is defined by execution behavior when running a Spark application that uses the PySpark 3.5.3 Spark Connect API. This guide details which APIs are supported and their compatibility levels.
Compatibility level definitions¶
- Full compatibility APIs
APIs with full compatibility behave identically to native PySpark. You can use these APIs with confidence that results will match exactly.
- High compatibility APIs
APIs with high compatibility work correctly but might have minor differences:
Error message formatting might differ.
Output display format might vary (such as decimal precision or column name casing).
Edge cases might produce slightly different results.
- Partial compatibility APIs
APIs with partial compatibility are functional but have notable limitations:
Only a subset of functionality might be available.
Behavior might differ from PySpark in specific scenarios.
Additional configuration might be required.
Performance characteristics might differ.
- Unsupported APIs
APIs that are not currently implemented or cannot be supported on Snowflake.
DataFrame APIs¶
Coverage for the core DataFrame API.
Full compatibility APIs¶
cache, coalesce, collect, count, crossJoin, dropDuplicates, drop_duplicates, dropna, fillna, first, head, isEmpty, join, limit, melt, offset, persist, repartitionByRange, replace, select, show, tail, take, toDF, toLocalIterator, toPandas, unionAll, unpersist, unpivot, where, withColumnsRenamed
High compatibility APIs¶
agg, colRegex, corr, cov, crosstab, cube, describe, distinct, drop, exceptAll, groupBy, groupby, intersect, intersectAll, isLocal, mapInPandas, orderBy, rollup, sort, union, unionByName, withColumn
Notes¶
orderBy/sort: Column ordering is inferred from the last DataFrame in the chain.
union/unionByName: Type widening behavior might differ slightly.
describe: Statistical output format might vary.
Partial compatibility APIs¶
alias, approxQuantile, createGlobalTempView, createOrReplaceGlobalTempView, createOrReplaceTempView, createTempView, explain, filter, freqItems, hint, inputFiles, printSchema, randomSplit, repartition, sameSemantics, sample, sampleBy, selectExpr, semanticHash, sortWithinPartitions, subtract, summary, transform, withColumns, withMetadata
Notes¶
explain: Query plan format differs from Spark.
repartition: Partition count might not be exact.
sample: Random sampling implementation differs.
createTempView: View lifecycle might differ.
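Because the sampling implementation differs, a test that compares the exact rows returned by `sample` is likely to be brittle. The following is a minimal sketch of a looser check that only verifies the sampled fraction is close to the requested one. The helper name `sample_size_within`, the default tolerance, and the default seed are hypothetical choices for illustration; `df` is assumed to be an existing DataFrame.

```python
def sample_size_within(df, fraction, tolerance=0.05, seed=42):
    """Check that df.sample() returns roughly `fraction` of the rows.

    Hypothetical helper: it compares row counts only and never inspects
    which rows were selected, since that can differ between engines.
    Note that it triggers two count() actions and one sample().
    """
    total = df.count()
    sampled = df.sample(fraction=fraction, seed=seed).count()
    if total == 0:
        # An empty input can only produce an empty sample.
        return sampled == 0
    return abs(sampled / total - fraction) <= tolerance
```

A tolerance that is too tight will fail on small DataFrames, where the sampled fraction naturally varies more.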
Unsupported APIs¶
checkSameSparkSession, dropDuplicatesWithinWatermark, observe, pandas_api, registerTempTable, to_pandas_on_spark, withWatermark
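Before migrating an existing PySpark script, it can help to scan it for calls to the unsupported DataFrame APIs listed above. The following is a minimal sketch using Python's standard `ast` module. The function name `find_unsupported_calls` and the sample script are hypothetical, and the check matches on method name only, so a same-named method on an unrelated object is reported too.

```python
import ast

# Unsupported DataFrame methods, copied from the list above.
UNSUPPORTED_DATAFRAME_METHODS = {
    "checkSameSparkSession",
    "dropDuplicatesWithinWatermark",
    "observe",
    "pandas_api",
    "registerTempTable",
    "to_pandas_on_spark",
    "withWatermark",
}


def find_unsupported_calls(source):
    """Return sorted (line number, method name) pairs for flagged calls.

    Heuristic only: the receiver's type is not checked, just the name
    of the method being called.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in UNSUPPORTED_DATAFRAME_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


# Hypothetical script to scan; it is parsed, never executed.
sample = '''
df = spark.table("events")
df = df.withWatermark("ts", "10 minutes")
pdf = df.pandas_api()
'''

for lineno, name in find_unsupported_calls(sample):
    print(f"line {lineno}: {name} is listed as unsupported")
```

The same approach could be extended with the unsupported lists for the other API groups on this page.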
Column APIs¶
Coverage for column operations.
Full compatibility APIs¶
asc, between, contains, desc, eqNullSafe, getItem, isNull, isin, like, otherwise, startswith, substr, when
High compatibility APIs¶
alias, asc_nulls_first, asc_nulls_last, astype, bitwiseAND, bitwiseOR, bitwiseXOR, cast, desc_nulls_first, desc_nulls_last, endswith, isNotNull
Notes¶
cast: Some invalid casts return NULL in Spark but error in Snowpark.
alias: Struct field display format might differ.
Partial compatibility APIs¶
dropFields, ilike, over, rlike, withField
Notes¶
over: Window frame specifications might have subtle differences.
rlike: Regex syntax follows Snowflake conventions.
SparkSession APIs¶
Full compatibility APIs¶
range, sql, table
High compatibility APIs¶
createDataFrame
Notes¶
Schema inference might produce different types (such as NUMBER(38,0) vs LONG).
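One way to reduce reliance on schema inference is to pass an explicit schema to `createDataFrame`, which accepts a DDL-formatted string in PySpark. The following is a minimal sketch. The helper name `create_events_df` and the column names are hypothetical, and `spark` is assumed to be an existing SparkSession connected through Snowpark Connect for Spark. Whether this avoids every type difference is not confirmed here.

```python
# DDL-style schema string; the column names and types are illustrative only.
EVENT_SCHEMA = "id long, name string, score double"


def create_events_df(spark, rows):
    """Build a DataFrame with declared column types instead of inferred ones.

    `spark` is assumed to be an already-created SparkSession; creating
    the session is outside the scope of this sketch.
    """
    return spark.createDataFrame(rows, schema=EVENT_SCHEMA)


# Example call (requires a live session, so it is left commented out):
# df = create_events_df(spark, [(1, "a", 0.5), (2, "b", 1.5)])
```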
Partial compatibility APIs¶
addArtifact, addArtifacts, addTag, clearTags, getTags, interruptAll, interruptOperation, interruptTag, removeTag
Notes¶
Tags are mapped to Snowflake query tags.
Interrupt operations use Snowflake query IDs instead of operation IDs.
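Tags are easiest to manage when they are always removed after the tagged work finishes. The following is a minimal sketch of a context manager built on `addTag` and `removeTag`. The name `tagged` and the example tag are hypothetical, and `spark` is assumed to be an existing SparkSession. How the tag appears on the Snowflake side is not shown here.

```python
from contextlib import contextmanager


@contextmanager
def tagged(spark, tag):
    """Attach `tag` to operations started inside the block, then remove it.

    The tag is removed even if the block raises, so it does not leak
    onto later, unrelated operations in the same session.
    """
    spark.addTag(tag)
    try:
        yield spark
    finally:
        spark.removeTag(tag)


# Example use (requires a live session, so it is left commented out):
# with tagged(spark, "nightly-load"):
#     spark.table("events").count()
```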
Unsupported APIs¶
copyFromLocalToFs, stop
GroupedData APIs¶
Full compatibility APIs¶
agg, mean, pivot
High compatibility APIs¶
agg, mean, pivot
Partial compatibility APIs¶
apply, avg, sum
Unsupported APIs¶
applyInPandasWithState, cogroup
DataFrameReader APIs¶
Full compatibility APIs¶
table
High compatibility APIs¶
csv
Partial compatibility APIs¶
json, load, parquet, jdbc
Notes¶
File paths use Snowflake stages or cloud storage (S3, GCS, Azure).
Schema inference might differ from native Spark.
Some format-specific options might not be supported.
Unsupported APIs¶
orc
DataFrameWriter APIs¶
Full compatibility APIs¶
mode, saveAsTable, text
Partial compatibility APIs¶
csv, json, options, parquet
Notes¶
Writes go to Snowflake stages or cloud storage.
Partitioning behavior might differ.
Unsupported APIs¶
bucketBy, insertInto, jdbc, orc, sortBy
DataFrameWriterV2 APIs¶
Coverage for the newer DataFrameWriterV2 API.
Full compatibility APIs¶
replace
Partial compatibility APIs¶
append, create, createOrReplace, option, options, partitionedBy, tableProperty, using
Catalog APIs¶
Full compatibility APIs¶
cacheTable, clearCache, dropGlobalTempView, dropTempView, isCached, refreshByPath, refreshTable, uncacheTable
High compatibility APIs¶
currentCatalog, listCatalogs, listColumns, recoverPartitions, setCurrentCatalog
Notes¶
listColumns: Column names are uppercase and types are Snowflake-specific.
Error messages might differ in format.
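Because `listColumns` returns uppercase column names, code that checks for specific columns is safer when it compares names case-insensitively. The following is a minimal sketch. The helper name `missing_columns` is hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
def missing_columns(spark, table_name, expected):
    """Return the expected column names that the table lacks, ignoring case.

    Both sides are lowercased before comparison, so "ID" and "id" are
    treated as the same column.
    """
    actual = {col.name.lower() for col in spark.catalog.listColumns(table_name)}
    return [name for name in expected if name.lower() not in actual]


# Example call (requires a live session, so it is left commented out):
# missing = missing_columns(spark, "events", ["id", "name"])
```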
Unsupported APIs¶
createExternalTable, createTable, functionExists, getFunction, listFunctions, registerFunction
Window & WindowSpec APIs¶
Coverage for window functions.
Window (all D0) APIs¶
partitionBy, orderBy, rangeBetween, rowsBetween, unboundedPreceding, unboundedFollowing, currentRow
WindowSpec (all D0) APIs¶
partitionBy, orderBy, rangeBetween, rowsBetween