Hybrid Execution (Public Preview)¶
Snowpark pandas supports workloads on mixed underlying execution engines and will automatically move data to the most appropriate engine for a given dataset size and operation. Currently you can use either Snowflake or local pandas to back a DataFrame object. Decisions on when to move data are dominated by dataset size.
For Snowflake, specific API calls will trigger hybrid backend evaluation. These are registered as either a pre-operation switch point or a post-operation switch point. These switch points may change over time as the feature matures and as APIs are updated.
Only DataFrames whose data types are all compatible with Snowflake can be moved to the Snowflake backend. If a DataFrame backed by pandas contains an incompatible type (‘categorical’, for example), use astype to coerce that type before changing the backend.
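As a minimal sketch of that coercion, using plain pandas only (the column names and values are made up for illustration), a categorical column can be converted to strings before the DataFrame is moved:

```python
import pandas as pd

# Hypothetical data: "grade" is categorical, the example type this page
# says must be coerced before a move to the Snowflake backend.
df = pd.DataFrame({"id": [1, 2, 3], "grade": ["a", "b", "a"]})
df["grade"] = df["grade"].astype("category")

# Coerce only the categorical column to plain strings with astype.
coerced = df.astype({"grade": str})

print(coerced.dtypes)
```

The same call should apply to a pandas-backed Snowpark pandas DataFrame, since astype is the method named above, but that has not been verified here.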
Example Pre-Operation Switchpoints:¶
apply, iterrows, itertuples, items, plot, quantile, __init__, T, read_csv, read_json, concat, merge
Many methods that are not yet implemented in Snowpark pandas are also registered as pre-operation switch points, and automatically move data to local pandas for execution when called. This covers most methods that Snowpark pandas does not otherwise support, that is, those with N as their implemented status in the DataFrame and Series supported API lists.
Post-Operation Switchpoints:¶
read_snowflake, value_counts, tail, var, std, sum, sem, max, min, mean, agg, aggregate, count, nunique, cummax, cummin, cumprod, cumsum
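The difference between the two kinds of switch point can be illustrated with a toy model. This is not how Snowpark pandas is implemented: the function name, the abbreviated operation sets, and the single size rule are all hypothetical. It assumes that a pre-operation switch point is evaluated on the input before the operation runs, and a post-operation switch point on the result afterwards.

```python
# Toy model only: names and logic are hypothetical, not Snowpark pandas internals.
MAX_LOCAL_ROWS = 10_000_000  # default local pandas limit quoted on this page

PRE_OP = {"apply", "iterrows", "merge"}   # abbreviated from the list above
POST_OP = {"sum", "value_counts", "agg"}  # abbreviated from the list above


def backend_after(op: str, backend: str, rows_in: int, rows_out: int) -> str:
    """Return the backend holding the result of `op` in this toy model."""
    if op in PRE_OP and rows_in <= MAX_LOCAL_ROWS:
        # Pre-operation: decide from the input size, before the operation runs.
        return "Pandas"
    if op in POST_OP and rows_out <= MAX_LOCAL_ROWS:
        # Post-operation: run where the data lives, then decide from the result size.
        return "Pandas"
    return backend


# An aggregation over 50M rows yields one row, so the small result can move locally.
print(backend_after("sum", "Snowflake", 50_000_000, 1))
# apply on the same 50M rows is too large to move before the operation.
print(backend_after("apply", "Snowflake", 50_000_000, 50_000_000))
```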
Examples¶
Enabling Hybrid Execution¶
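A minimal sketch of switching the feature on and off. It assumes that the `AutoSwitchBackend` configuration variable in `modin.config` controls automatic backend switching. That name comes from modin rather than from the text on this page, so check it against your installed version. The helper name `set_hybrid_execution` is hypothetical, and the import is deferred into the function so the definition loads even where modin is not installed.

```python
def set_hybrid_execution(enabled: bool) -> None:
    """Turn automatic backend switching on or off."""
    # Assumption: modin.config exposes AutoSwitchBackend, and put() sets its value.
    from modin.config import AutoSwitchBackend

    AutoSwitchBackend.put(enabled)
```

In a session that has already imported modin.pandas and the Snowpark pandas plugin, `set_hybrid_execution(True)` would enable switching and `set_hybrid_execution(False)` would disable it.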
Manually Changing DataFrame Backends¶
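The sketch below moves a DataFrame to local pandas and back. It assumes that the DataFrame exposes `get_backend()` and `move_to()` methods, that `move_to()` returns a new DataFrame, and that the backends are named "Pandas" and "Snowflake". None of these names appear in the text on this page, so treat them as assumptions to confirm. The helper name `round_trip` is hypothetical.

```python
def round_trip(df):
    """Move a DataFrame to local pandas and back, printing the backend each time."""
    print(df.get_backend())          # backend before any move

    local_df = df.move_to("Pandas")  # assumed to return a new DataFrame
    print(local_df.get_backend())

    remote_df = local_df.move_to("Snowflake")
    print(remote_df.get_backend())
    return remote_df
```

Moving back to Snowflake is subject to the data type restriction described earlier, so a pandas-backed DataFrame with incompatible types would need astype first.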
Configuring Local Pandas Backend¶
Currently, automatic switching is driven mainly by dataset size, with some exceptions for specific operations. The default limit for running workloads on the local pandas backend is 10M rows. This limit can be configured through the modin environment variables.
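A minimal sketch of changing the row limit from Python. It assumes the limit is exposed as the `NativePandasMaxRows` configuration variable in `modin.config`. That name is not given on this page, so confirm it against your modin version. The helper name `set_local_row_limit` and its input check are illustrative only.

```python
def set_local_row_limit(max_rows: int) -> None:
    """Set the maximum row count allowed on the local pandas backend."""
    if max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")

    # Assumption: modin.config exposes NativePandasMaxRows, and put() sets its value.
    from modin.config import NativePandasMaxRows

    NativePandasMaxRows.put(max_rows)
```

For example, `set_local_row_limit(1_000_000)` would lower the limit from the 10M-row default to one million rows.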
Debugging Hybrid Execution¶
pd.explain_switch() provides information on how execution engine decisions are made. This method prints a simplified version of the command unless simple=False is passed as an argument.
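The two calling modes can be wrapped as follows. The helper name `show_switch_decisions` is hypothetical. It takes the imported `modin.pandas` module as an argument so that the sketch has no hard dependency, and it assumes nothing about `explain_switch` beyond the `simple` argument described above.

```python
def show_switch_decisions(pd_module, detailed: bool = False):
    """Call explain_switch on an imported modin.pandas module."""
    if detailed:
        # Request the full, non-simplified output.
        return pd_module.explain_switch(simple=False)
    # Default call: simplified output.
    return pd_module.explain_switch()
```

With `import modin.pandas as pd` in scope, `show_switch_decisions(pd)` is equivalent to `pd.explain_switch()`, and `show_switch_decisions(pd, detailed=True)` to `pd.explain_switch(simple=False)`.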