Parallel Hyperparameter Optimization (HPO) on Container Runtime for ML
The Snowflake ML Hyperparameter Optimization (HPO) API is a model-agnostic framework that enables efficient, parallelized hyperparameter tuning of models using popular tuning algorithms.
Today, this API is available for use within a Snowflake Notebook configured to use the Container Runtime on Snowpark Container Services (SPCS). After you create such a notebook, you can:
Train a model using any open source package, and use this API to distribute the hyperparameter tuning process
Train a model using Snowflake ML distributed training APIs, and scale HPO while also scaling each of the training runs
The HPO workload, initiated from the Notebook, executes inside Snowpark Container Services on either CPU or GPU instances, and scales out to the cores (CPUs or GPUs) available on a single node in the SPCS compute pool.
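To make the scaling model concrete, here is a minimal sketch in plain Python of running independent trials concurrently with a cap on how many run at once. It is not the HPO API's implementation: `run_trials`, `toy_objective`, and the thread pool are illustrative assumptions standing in for the per-core scheduling that the API handles for you.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_objective(config):
    """Hypothetical stand-in for a training run; returns a score to maximize."""
    # Peaks at learning_rate == 0.1, so configs closer to 0.1 score higher.
    return 1.0 - abs(config["learning_rate"] - 0.1)


def run_trials(configs, objective, max_concurrent_trials):
    """Evaluate every config, running at most max_concurrent_trials at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent_trials) as pool:
        # pool.map preserves input order, so scores line up with configs.
        scores = list(pool.map(objective, configs))
    return list(zip(configs, scores))


configs = [{"learning_rate": lr} for lr in (0.01, 0.05, 0.1, 0.3)]
results = run_trials(configs, toy_objective, max_concurrent_trials=2)
best_config, best_score = max(results, key=lambda pair: pair[1])
print(best_config, best_score)
```

Because each trial is independent of the others, the only coordination needed in this sketch is collecting the scores and comparing them at the end.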
The parallelized HPO API provides the following benefits:
A single API that automatically handles all the complexities of distributing the training across multiple resources
The ability to train with virtually any framework or algorithm using open-source ML frameworks or the Snowflake ML modeling APIs
A selection of tuning and sampling options, including Bayesian and random search algorithms along with various continuous and non-continuous sampling functions
Tight integration with the rest of Snowflake; for example, efficient data ingestion via Snowflake Datasets or DataFrames, and automatic ML lineage capture
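The difference between continuous and non-continuous sampling can be sketched in plain Python. The helpers below (`uniform`, `randint`, `choice`, `sample_config`) are hypothetical stand-ins written with the standard `random` module, not the `tune` functions themselves. They only illustrate how a random search might draw one configuration per trial from a search space.

```python
import random


def uniform(low, high):
    """Continuous sampler: returns a float in [low, high]."""
    return lambda rng: rng.uniform(low, high)


def randint(low, high):
    """Discrete sampler: returns an int in [low, high), an assumed convention."""
    return lambda rng: rng.randrange(low, high)


def choice(options):
    """Categorical sampler: returns one of the given options."""
    return lambda rng: rng.choice(options)


def sample_config(search_space, rng):
    """Draw one value per hyperparameter to form a trial configuration."""
    return {name: sampler(rng) for name, sampler in search_space.items()}


search_space = {
    "learning_rate": uniform(0.01, 0.3),
    "max_depth": randint(3, 10),
    "booster": choice(["gbtree", "dart"]),
}
rng = random.Random(0)  # seeded so repeated runs draw the same configs
trial_configs = [sample_config(search_space, rng) for _ in range(5)]
for cfg in trial_configs:
    print(cfg)
```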
Examples
This example illustrates a typical HPO use case by first ingesting data from a Snowflake table through the Container Runtime DataConnector API, then defining a training function that creates an XGBoost model. The Tuner interface provides the tuning functionality, based on the given training function and search space.
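import xgboost as xgb
from sklearn.metrics import accuracy_score

from snowflake.ml.modeling import tune
from snowflake.ml.modeling.tune import get_tuner_context
# Assumption: the search algorithms are exposed by the tune.search module;
# confirm the import path with help(tune) in your environment.
from snowflake.ml.modeling.tune import search as search_algorithm

# Define a training function, with any models you choose within it.
def train_func():
    # A context object provided by the HPO API to expose data for the current HPO trial
    tuner_context = get_tuner_context()
    config = dict(tuner_context.get_hyper_params())
    dm = tuner_context.get_dataset_map()

    # tune.uniform samples floats; cast the integer-valued hyperparameters
    # (see Limitations).
    config["n_estimators"] = int(config["n_estimators"])
    config["max_depth"] = int(config["max_depth"])

    x_train = dm["x_train"].to_pandas()
    y_train = dm["y_train"].to_pandas()
    model = xgb.XGBClassifier(**config, random_state=42)
    model.fit(x_train, y_train)
    # Note: this scores the model on the training data itself.
    accuracy = accuracy_score(y_train, model.predict(x_train))
    tuner_context.report(metrics={"accuracy": accuracy}, model=model)

tuner = tune.Tuner(
    train_func=train_func,
    search_space={
        "n_estimators": tune.uniform(50, 200),
        "max_depth": tune.uniform(3, 10),
        "learning_rate": tune.uniform(0.01, 0.3),
    },
    tuner_config=tune.TunerConfig(
        metric="accuracy",
        mode="max",
        search_alg=search_algorithm.BayesOpt(),
        num_trials=2,
        max_concurrent_trials=1,
    ),
)

# dataset_map must be defined beforehand: a dict that maps each name used in
# train_func ("x_train", "y_train") to a DataConnector for that dataset.
tuner_results = tuner.run(dataset_map=dataset_map)
# Access the best result info with tuner_results.best_result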
The expected output looks similar to this:
accuracy  should_checkpoint  trial_id  time_total_s  config/learning_rate  config/max_depth  config/n_estimators
1.0       True               ec632254  7.161971      0.118617              9.655             159.799091
The tuner_results object contains all results, the best model, and the best model path.
print(tuner_results.results)
print(tuner_results.best_model)
print(tuner_results.best_model_path)
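Conceptually, the best result is the trial whose reported metric is highest when the mode is "max" or lowest when the mode is "min". A minimal sketch over plain dictionaries follows. The rows and the `pick_best` helper are made up for illustration and do not reflect the actual type of `tuner_results.results`.

```python
def pick_best(rows, metric, mode):
    """Return the row with the best value of `metric` for the given mode."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    selector = max if mode == "max" else min
    return selector(rows, key=lambda row: row[metric])


# Made-up trial rows; real results carry more columns (see the output above).
rows = [
    {"trial_id": "a", "accuracy": 0.91, "config/learning_rate": 0.05},
    {"trial_id": "b", "accuracy": 0.97, "config/learning_rate": 0.12},
    {"trial_id": "c", "accuracy": 0.88, "config/learning_rate": 0.29},
]
best = pick_best(rows, metric="accuracy", mode="max")
print(best["trial_id"], best["accuracy"])
```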
API overview
The HPO API is in the snowflake.ml.modeling.tune namespace. The main HPO API is the tune.Tuner class. When instantiating this class, you specify the following:
A training function that fits a model
A search space (tune.SearchSpace) that defines the method of sampling hyperparameters
A tuner configuration object (tune.TunerConfig) that defines the search algorithm, the metric to optimize, and the number of trials
After instantiating Tuner, call its run method with a dataset map (which specifies a DataConnector for each input dataset) to start the tuning process.
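These pieces fit together roughly as follows: a tuner samples a configuration, exposes it to the training function through a context object, and collects whatever the function reports. The toy below mimics that flow in plain Python. Every name in it (`ToyContext`, `ToyTuner`, `get_toy_context`) is hypothetical, and the sequential loop is an assumption for illustration, not how `tune.Tuner` is implemented.

```python
import random

_current_context = None  # set by ToyTuner before each trial


class ToyContext:
    """Hypothetical per-trial context: holds the config and reported metrics."""

    def __init__(self, config, dataset_map):
        self._config = config
        self._dataset_map = dataset_map
        self.metrics = None

    def get_hyper_params(self):
        return self._config

    def get_dataset_map(self):
        return self._dataset_map

    def report(self, metrics):
        self.metrics = metrics


def get_toy_context():
    return _current_context


class ToyTuner:
    def __init__(self, train_func, search_space, num_trials, seed=0):
        self.train_func = train_func
        self.search_space = search_space  # name -> (low, high) bounds
        self.num_trials = num_trials
        self.rng = random.Random(seed)

    def run(self, dataset_map):
        global _current_context
        results = []
        for _ in range(self.num_trials):
            config = {name: self.rng.uniform(low, high)
                      for name, (low, high) in self.search_space.items()}
            _current_context = ToyContext(config, dataset_map)
            self.train_func()  # takes no arguments; reads the context
            results.append({**_current_context.metrics, "config": config})
        return results


def train_func():
    ctx = get_toy_context()
    x = ctx.get_hyper_params()["x"]
    target = ctx.get_dataset_map()["target"]
    ctx.report(metrics={"loss": (x - target) ** 2})


toy_tuner = ToyTuner(train_func, {"x": (0.0, 1.0)}, num_trials=3)
toy_results = toy_tuner.run({"target": 0.5})
print(min(toy_results, key=lambda r: r["loss"]))
```

The training function takes no arguments, so the same function can be handed to the tuner unchanged regardless of which hyperparameters or datasets a given trial receives.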
For more information, execute the following Python statements to retrieve documentation on each class:
from snowflake.ml.modeling import tune
help(tune.Tuner)
help(tune.TunerConfig)
help(tune.SearchSpace)
Limitations
Bayesian optimization works only with the uniform sampling function. It relies on Gaussian processes as surrogate models and therefore requires continuous search spaces, so it is incompatible with discrete parameters sampled using the tune.randint or tune.choice methods. To work around this limitation, either use tune.uniform and cast the parameter inside the training function, or switch to a search algorithm that handles both discrete and continuous spaces, such as tune.RandomSearch.
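The first workaround can be sketched without any Snowflake dependency. The helper below is hypothetical: it takes a configuration whose values were all sampled as floats (as tune.uniform would produce) and rounds the parameters that the model expects as integers. Which parameters count as integer-valued is an assumption you would adapt to your model.

```python
# Hyperparameters that the model expects as integers (assumed for XGBoost).
INTEGER_PARAMS = ("n_estimators", "max_depth")


def cast_discrete_params(config, integer_params=INTEGER_PARAMS):
    """Return a copy of config with the named float values rounded to int."""
    casted = dict(config)
    for name in integer_params:
        if name in casted:
            casted[name] = int(round(casted[name]))
    return casted


# Values as they might arrive from a continuous search space.
sampled = {"n_estimators": 159.799091, "max_depth": 9.655, "learning_rate": 0.118617}
params = cast_discrete_params(sampled)
print(params)
```

Inside a training function, a helper like this would be applied to the sampled hyperparameters before the model is constructed. The tuner still records the original float values in its results.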
Troubleshooting
Error message | Possible causes | Possible solutions
---|---|---
Invalid search space configuration: BayesOpt requires all sampling functions to be of type ‘Uniform’. | Bayesian optimization works only with uniform sampling, not with discrete samples. (See Limitations above.) | Use tune.uniform and cast the parameter inside the training function, or switch to a search algorithm such as tune.RandomSearch.
Insufficient CPU resources. Required: 16, Available: 8. (The message may refer to CPU or GPU, and the numbers of required and available resources may differ.) | | The full error message describes several options you can try.
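One way to avoid requesting more resources than the node offers is to derive the concurrency cap from the core count before configuring the tuner. This is a sketch under assumptions: `safe_concurrency` and `cpus_per_trial` are hypothetical names, and `os.cpu_count()` reports the cores visible to the Python process, which may differ from what the HPO scheduler considers available.

```python
import os


def safe_concurrency(requested_trials, cpus_per_trial=1, available_cpus=None):
    """Cap concurrent trials so that the total CPU demand fits on the node."""
    if cpus_per_trial < 1:
        raise ValueError("cpus_per_trial must be at least 1")
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1  # cpu_count() can return None
    # At least one trial must run, even if a single trial oversubscribes.
    return max(1, min(requested_trials, available_cpus // cpus_per_trial))


# With 8 cores available, a request for 16 single-core trials is capped at 8.
print(safe_concurrency(16, cpus_per_trial=1, available_cpus=8))
```

The returned value could then be passed as max_concurrent_trials when building the tuner configuration.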