
Hyperparameter Optimization with Meta Ax

2023-11-13 | By ShawnHymel

License: Attribution

Hyperparameter optimization (HPO) (also known as “hyperparameter tuning”) is the process of automatically choosing the best set of hyperparameters for an algorithm, and it is popular in machine learning (ML) and, more specifically, reinforcement learning (RL). In this guide, we examine the three main types of HPO and provide a demonstration of using Bayesian Optimization on a simple RL problem.

In machine learning, hyperparameters are any of the values used to control the learning process that are not part of the model itself. The model contains “parameters” that are automatically updated and tuned during the learning process. Hyperparameters are values like the learning rate, batch size, number of layers, dropout rate, etc. that are often manually set by the designer prior to learning. 

Reinforcement learning is notoriously sensitive to hyperparameters: making minor changes to a single hyperparameter can mean the difference between an agent that learns and an agent that does nothing. In addition, RL algorithms often have many hyperparameters that require tuning. For example, the PPO algorithm has over 20 hyperparameters that can be tuned. This leads to a large search space when trying to find a combination that can work effectively for your particular problem.

If you would like to see this performed on a real RL problem, check out the following video:

 

Overview of Hyperparameter Optimization

HPO is a relatively straightforward task: you first choose some hyperparameters for your task, train your model, test the model against some known test set, and then log the hyperparameters and test results. This is known as a single “trial.”

Hyperparameter tuning loop

You can perform this test for a wide range of hyperparameter values. Once you have discovered the set of hyperparameters that produce the optimal model result, retrain the model using those hyperparameters for deployment.
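
To make a single trial concrete, here is a minimal, self-contained sketch in Python. It uses scikit-learn's digits dataset and a small MLP classifier purely as stand-ins for your own training pipeline; the run_trial() helper name and the hyperparameter choices are illustrative, not part of the original demo.

# A minimal sketch of one HPO "trial": train with one candidate hyperparameter
# set, test on held-out data, and return the metric so it can be logged
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def run_trial(hparams):
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = MLPClassifier(
        learning_rate_init=hparams["learning_rate"],
        batch_size=hparams["batch_size"],
        max_iter=50,
        random_state=0,
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)   # Test-set accuracy is the logged metric

# One trial: pick hyperparameters, train, test, log the result
print(run_trial({"learning_rate": 1e-3, "batch_size": 64}))

Each call to run_trial() is one trial; the HPO methods below only differ in how they pick the hyperparameters to pass in.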

As you probably guessed, we can automate this process of picking hyperparameters. There are three popular methods for hyperparameter optimization: grid search, random search, and Bayesian optimization.

Grid Search

With grid search, you perform an exhaustive search of all possible (reasonable) combinations of hyperparameters. In the example below, we assume that you only have two hyperparameters to tune: learning rate and batch size.

Grid search

You would train a model with each possible combination of hyperparameters, test the model on a test set, and then choose the set of hyperparameters that yielded the best results.

Because this is an exhaustive search, it requires running many training and testing steps. Even if you parallelize the task (e.g., using a GPU or multiple computers), grid search is still the HPO technique that requires the most compute resources and time. As you increase the number of tunable hyperparameters, the number of combinations (and therefore the time required) grows exponentially.
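
As a sketch of what grid search looks like in code, the loop below reuses the run_trial() helper from the earlier sketch and exhaustively tries every combination of two illustrative hyperparameter ranges (the values shown are assumptions, not the ones from the video):

from itertools import product

learning_rates = [1e-4, 1e-3, 1e-2]
batch_sizes = [32, 64, 128]

# Train and test a model for every combination (9 trials here)
results = []
for lr, bs in product(learning_rates, batch_sizes):
    score = run_trial({"learning_rate": lr, "batch_size": bs})
    results.append(((lr, bs), score))

# Keep the combination that produced the best test result
(best_lr, best_bs), best_score = max(results, key=lambda r: r[1])
print(f"Best: lr={best_lr}, batch_size={best_bs}, accuracy={best_score:.3f}")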

Random Search

Another option is to choose a random selection of hyperparameter values, perform training and testing with each set, and then estimate the optimal value set based on the test results.

Random search

While this may seem like a step down from an exhaustive search, random search has been shown to be surprisingly effective. The magic number seems to be around 60 trials (as discussed in this blog post) to get close to the result that grid search would produce.
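
A sketch of random search using the same run_trial() helper is shown below; it draws roughly 60 random hyperparameter sets (per the rule of thumb above) from assumed ranges rather than stepping through a grid:

import random

random.seed(0)

# Draw ~60 random hyperparameter sets and run one trial for each
results = []
for _ in range(60):
    hparams = {
        "learning_rate": 10 ** random.uniform(-5, -1),        # log-uniform draw
        "batch_size": random.choice([16, 32, 64, 128, 256]),
    }
    results.append((hparams, run_trial(hparams)))

best_hparams, best_score = max(results, key=lambda r: r[1])
print(f"Best random draw: {best_hparams}, accuracy={best_score:.3f}")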

Bayesian Optimization

The third alternative is to use a form of “smart” optimization. Because tuning hyperparameters is an optimization problem, we can turn to a number of ML techniques for this task (most of which are designed to tackle optimization problems). However, note that because we cannot compute the derivative of the objective function with respect to the hyperparameter values, we cannot use gradient descent (the technique that makes neural networks work). The current popular technique is to use Bayesian optimization (BO) to estimate this objective function.

Bayesian optimization

We start BO by choosing a set of random hyperparameters and performing a trial. The BO algorithm estimates the next (likely) best set of hyperparameters, which we use for another trial. We repeat this process until we converge on the (hopefully) global optimum that provides the best set of hyperparameters.

Bayesian optimization works by constructing a surrogate model (a function) that predicts a metric (e.g., model accuracy) given a set of hyperparameter values as inputs. This surrogate model is built using Gaussian processes (GPs) and captures a wide range of possible outcomes, which are modeled as probability distributions. After each trial, the surrogate model is updated, which usually means its uncertainties are reduced. This process repeats until the surrogate model can effectively estimate the desired metric for a given set of hyperparameters. See here to learn more about BO.
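
To make the surrogate-model idea concrete, here is a minimal, self-contained sketch of a single BO step: a Gaussian-process surrogate (from scikit-learn) is fit to a made-up trial history, and an expected-improvement acquisition function picks the next learning rate to try. Frameworks like Ax handle all of this internally, so you would not normally write it yourself.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical trial history: log10(learning rate) tried so far and the metric
# (e.g. test accuracy) each one produced -- these numbers are made up
X_tried = np.log10([[1e-4], [1e-3], [1e-2]])
y_metric = np.array([0.62, 0.71, 0.55])

# Fit the GP surrogate to the observed trials
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_tried, y_metric)

# Score candidate learning rates with the expected improvement (EI) acquisition
candidates = np.log10(np.logspace(-5, -1, 200)).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)
best = y_metric.max()
z = (mu - best) / np.maximum(sigma, 1e-9)
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# The candidate with the highest EI becomes the next trial's learning rate
next_lr = 10 ** candidates[np.argmax(ei)][0]
print(f"Next learning rate to try: {next_lr:.2e}")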

Using a Framework for HPO

In practice, this turns out to be relatively straightforward, as we can rely on frameworks to perform BO for us, such as Meta’s Ax and the sweep function in Weights & Biases. Both of these frameworks offer the ability to do any of the HPO techniques described above.

In the video, we use Ax to perform a simple BO loop to find the hyperparameters for a reinforcement learning problem (inverted pendulum). See here for an introduction to reinforcement learning, Gymnasium, and Stable-Baselines3.
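
In the snippet below, the hparams list passed to create_experiment() defines the search space using Ax's standard parameter dictionaries. The exact parameters and ranges tuned in the video may differ; an assumed example for a PPO agent might look like this:

# Assumed search space for illustration; the names, ranges, and constraints
# used in the video's code may differ
hparams = [
    {"name": "learning_rate", "type": "range", "bounds": [1e-5, 1e-2], "log_scale": True},
    {"name": "gamma", "type": "range", "bounds": [0.9, 0.9999]},
    {"name": "batch_size", "type": "choice", "values": [32, 64, 128, 256]},
]

# Optional linear constraints between parameters (none needed for this sketch)
parameter_constraints = []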

To perform BO with Ax, we create our Ax client, use it to get our next set of hyperparameters, perform a trial, and repeat until we’ve found an ML model that works for our purposes. Here is a snippet of code from the video demonstrating how to do BO with Ax:

# Import the Ax Service API client
from ax.service.ax_client import AxClient

# Configure Ax client
ax_client = AxClient(
    random_seed=settings['seed'],
    verbose_logging=settings['verbose_ax']
)
ax_client.create_experiment(
    name=settings['ax_experiment_name'],
    parameters=hparams,
    objective_name=settings['ax_objective_name'],
    minimize=False,
    parameter_constraints=parameter_constraints,
)

# Perform trials to optimize hyperparameters
while True:

    # Get next hyperparameters and end experiment if we've reached max trials
    next_hparams, trial_index = ax_client.get_next_trial()
    if trial_index >= settings['num_trials']:
        break

    # Show that we're starting a new trial
    if settings['verbose_trial'] > 0:
        print(f"--- Trial {trial_index} ---")

    # Perform trial and report the resulting metric back to Ax
    avg_ep_rew = do_trial(settings, next_hparams)
    ax_client.complete_trial(
        trial_index=trial_index,
        raw_data=avg_ep_rew,
    )

# Save experiment snapshot
ax_client.save_to_json_file(ax_snapshot_path)
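
The loop above relies on a do_trial() function that trains the agent with the suggested hyperparameters and returns the metric being maximized (average episode reward). The real implementation is in the linked repository; a rough sketch under assumed settings keys and hyperparameter names, using Gymnasium and Stable-Baselines3, might look like this:

# A rough, assumed sketch of the do_trial() helper used above; the settings key
# 'steps_per_trial' and the hyperparameter names are illustrative assumptions
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def do_trial(settings, hparams):
    """Train PPO with the suggested hyperparameters and return the average episode reward."""
    env = gym.make("Pendulum-v1")
    model = PPO(
        "MlpPolicy",
        env,
        learning_rate=hparams["learning_rate"],
        gamma=hparams["gamma"],
        batch_size=hparams["batch_size"],
        seed=settings["seed"],
        verbose=0,
    )
    model.learn(total_timesteps=settings["steps_per_trial"])

    # Average episode reward over a few evaluation episodes is the metric Ax maximizes
    avg_ep_rew, _ = evaluate_policy(model, env, n_eval_episodes=10)
    env.close()
    return float(avg_ep_rew)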

Conclusion

HPO is an important part of many machine learning pipelines, especially when working with larger models or complex learning algorithms like those found in reinforcement learning. While we demonstrate BO with Ax, you likely want to use a combination of HPO techniques. Often, you would start with random search to help narrow down your hyperparameter ranges to a manageable set and then perform BO in that smaller range.

Going Further

The full code shown in the video for performing BO on an RL task can be found in this repository.

To understand more about HPO, check out the following links:

TechForum

Have questions or comments? Continue the conversation on TechForum, DigiKey's online community and technical resource.
