Introduction: Bayesian Optimization for Exploring Optimal Solutions with Limited Trials
Bayesian optimization (BO) is a powerful technique for optimizing expensive black-box functions: functions whose internal structure is unknown and whose evaluation incurs significant costs (time, computational resources, money, etc.), especially when little existing data is available. Compared with other optimization techniques, BO is known for its ability to make progress under a limited evaluation budget.
A black-box function refers to a function whose internal structure is unknown; it provides an output value for a given input value, but the relationship is not explicitly expressed by mathematical formulas or similar means. Examples include complex simulations, experiments in chemical synthesis, or materials development. Bayesian optimization cleverly utilizes two core components: a Surrogate Model and an Acquisition Function. First, it builds a probabilistic model (the surrogate model, often a Gaussian process regression) that predicts the function's behavior based on known evaluation points (input-output pairs). Then, based on the predictions of this surrogate model (both the predicted value and its uncertainty), it uses the acquisition function to determine the most promising point to evaluate next. By repeating this cycle of "predict -> evaluate -> update model," it efficiently searches for the optimal solution with a small number of evaluations.
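To make the "predict -> evaluate -> update model" cycle concrete, here is a minimal sketch in Python using a hand-rolled Gaussian process surrogate (RBF kernel) and Expected Improvement as the acquisition function. The toy objective `f`, the kernel length scale, and the candidate grid are illustrative assumptions, not part of any specific published method:

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(X_train, y_train, X_query, noise=1e-6):
    """Predictive mean and std of the GP surrogate at the query points."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - (v**2).sum(axis=0)            # prior variance is 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected improvement over the incumbent."""
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def f(x):
    """Toy 'black-box' objective (assumed for illustration; minimize)."""
    return np.sin(3 * x) + x**2 - 0.7 * x

rng = np.random.default_rng(0)
X = rng.uniform(-1, 2, size=(3, 1))           # small initial design
y = f(X).ravel()
grid = np.linspace(-1, 2, 500)[:, None]       # candidate points

for _ in range(10):                           # predict -> evaluate -> update
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).item())

print(len(y), round(y.min(), 3))              # 13 evaluations; best value found
```

The whole run uses only 13 function evaluations, which is the regime this article is concerned with.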
However, traditional BO can still face challenges when operating under extremely low budget constraints. In these situations, the algorithm may struggle to construct an accurate surrogate model of the entire search space due to insufficient sampling. If the surrogate model's accuracy is low, it might overlook promising regions or, conversely, intensively explore unpromising areas, potentially leading to suboptimal performance (converging to a local optimum).
To address this issue, researchers have begun exploring innovative approaches to enhance the efficiency of BO in limited-budget settings. In this article, we will first define the concept of limited-budget BO, then briefly discuss the key ideas related to the existing approaches addressing this challenge. Furthermore, we will explain the associated benefits and challenges of these approaches.
Definition of limited-budget BO
In the context of optimization, we define limited-budget Bayesian Optimization as scenarios where the available number of function evaluations is highly constrained. This can happen for systems where evaluating a datapoint (a specific input parameter setting) may be extremely time-consuming or may require significant computational or financial resources.
For example, in polycrystalline materials, optimizing the grain boundary structure, which strongly affects material properties, is crucial but computationally intensive. Bayesian optimization has been applied to this problem using Density Functional Theory (DFT) simulations to evaluate material properties. These DFT calculations are highly accurate but can take weeks to complete for a single structure. Performing many such costly evaluations is impractical, so the experimenters must achieve the best possible performance (e.g., stability or specific physical properties) within a very limited number of function evaluations, making efficient optimization strategies essential.
Differentiating limited-budget BO from cost-sensitive BO
In the Bayesian optimization literature, 'budget' or 'cost' can refer not just to a limit on the number of evaluations, but also to the actual cost (time, price, resource consumption, etc.) of running the experiments or simulations.
In many real-world scenarios, this cost can be heterogeneous. This means the cost of evaluation can vary depending on the location (parameter settings) within the search space. For example, some experimental conditions might yield results quickly, while others take a long time, or specific parameter regions might require expensive reagents.
In such cases, the user often wants to achieve the primary objective (e.g., maximizing material performance) while simultaneously minimizing the total cost. To achieve this, the optimization algorithm must recommend cost-effective experimental settings, being aware of the varying cost across the search space. This problem setting is known as ‘cost-aware BO’ or ‘cost-sensitive BO’ and is an important topic within budget-constrained Bayesian optimization research.
However, this topic is outside the scope of this article. Here we focus on techniques for a finite evaluation budget with homogeneous cost, i.e., assuming the cost per evaluation is uniform.
How traditional BO suffers when evaluation budget is very limited
Traditional BO algorithms can suffer in a few ways if the evaluation budget is highly constrained.
- Insufficient sampling leading to poor surrogate model accuracy: With limited function evaluations, BO algorithms may not have enough data to accurately model the search space. Insufficient samples can lead the surrogate model to fail in capturing the true function's behavior, potentially causing issues like overfitting (fitting too closely to known points, leading to poor prediction in unknown areas) or under-learning (leaving promising regions unmodeled). This insufficiency in sampling can lead to poor performance, recommending points far from the optimum.
- Difficulty in balancing the exploration-exploitation tradeoff: The core of Bayesian optimization lies in balancing exploration of unknown regions of the search space and exploitation of promising areas found so far. However, with fewer evaluations, balancing this tradeoff becomes more challenging. Over-emphasizing exploration might waste the limited budget on unpromising regions. Conversely, over-emphasizing exploitation increases the risk of getting stuck in local optima, missing potentially better solutions in unexplored areas. Resolving this dilemma with very few evaluations is extremely difficult.
- Exacerbation of dimensionality challenges: Traditional BO algorithms are known to suffer from the curse of dimensionality. As the number of dimensions (input parameters) in the search space increases, the volume of the space grows exponentially, and the number of samples required to model the black-box function also increases dramatically. In high-dimensional spaces, limited samples become very sparse, making surrogate model construction inherently difficult. With a smaller evaluation budget, this issue becomes more prominent, significantly limiting the applicability of traditional BO to high-dimensional problems.
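To make the exploration-exploitation tradeoff above concrete, here is a toy snippet using the lower confidence bound (LCB, the minimization form of UCB). The candidate means and standard deviations are made-up numbers, not from any real model; the parameter kappa directly sets the exploration weight:

```python
import numpy as np

# Hypothetical GP posterior over 5 candidates: predicted mean (lower is
# better for minimization) and predictive std (higher = more uncertain).
mu    = np.array([0.10, 0.30, 0.05, 0.50, 0.20])
sigma = np.array([0.02, 0.40, 0.01, 0.60, 0.05])

def lcb_pick(kappa):
    """Pick the candidate minimizing mu - kappa * sigma (lower confidence bound)."""
    return int(np.argmin(mu - kappa * sigma))

print(lcb_pick(0.0))  # kappa = 0: pure exploitation -> candidate 2 (best mean)
print(lcb_pick(2.0))  # kappa = 2: uncertainty rewarded -> candidate 3 (most uncertain)
```

With a generous budget, a poor kappa is eventually corrected by further samples; with only a handful of evaluations, there is little opportunity to recover from either extreme.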
Due to these issues, researchers have proposed various strategies that are specific to limited-budget systems so that BO can achieve optimal performance even in these adverse conditions. These strategies aim to extend or modify the traditional BO framework to make the most of the limited evaluation opportunities.
Trends in limited budget BO research
While exploring the research relevant to limited-budget or finite budget BO, we see two primary trends:
- Search space refinement: An approach that increases search efficiency by concentrating the limited budget on more promising regions. Instead of exploring the entire space uniformly, it aims to focus modeling and searching efforts on likely sub-regions.
- Look-ahead approach or Non-myopic BO: An approach that considers the impact of each evaluation step on future exploration. It decides the next evaluation point not just based on immediate gain (acquisition function value) but by maximizing the long-term cumulative effect over multiple future steps.
We will briefly discuss the essence of these two approaches and introduce relevant strategies for each approach.
Search space refinement: A Focused Approach on Promising Regions
Search space refinement is a strategy to increase the efficiency of search by focusing on smaller areas when the evaluation budget is limited. The assumption underlying this approach is that intensively exploring comparatively promising but smaller regions should yield better surrogates as well as better recommendations within a limited budget.
In "A Simple Heuristic for Bayesian Optimization with A Low Budget", Nomura & Abe proposed a strategy to manipulate the search space so as to make the best use of the remaining evaluations.
The search space is divided into k equal sub-sections along one dimension, and the center point of each sub-section is evaluated. The sub-section with the best center point is then divided into k equal sub-sections along another dimension, and this is repeated until all dimensions are covered. Figure 1 visualizes the refinement process for a two-dimensional space with k=3. A portion of the evaluation budget B, denoted B_ref, is spent on refining the search space. The result is a smaller but promising region that can then be optimized efficiently by traditional BO.
Fig.1 Search Space Refinement (Created by miLab Editorial Team)
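The following is our own simplified sketch of this dimension-by-dimension refinement, not the authors' reference implementation; in particular, we fix the off-axis coordinates at the center of the current box and assume minimization:

```python
import numpy as np

def refine_search_space(f, bounds, k=3):
    """Shrink the box one dimension at a time: split the current range into
    k equal sub-sections, evaluate the center of each, and keep the
    sub-section with the best (lowest) value. Uses B_ref = k * n_dims
    evaluations in total."""
    bounds = np.array(bounds, dtype=float)        # shape (n_dims, 2)
    for d in range(len(bounds)):
        lo, hi = bounds[d]
        edges = np.linspace(lo, hi, k + 1)
        centers = (edges[:-1] + edges[1:]) / 2
        best_val, best_i = np.inf, 0
        for i, c in enumerate(centers):
            x = bounds.mean(axis=1)               # center of current box...
            x[d] = c                              # ...moved along dimension d
            val = f(x)
            if val < best_val:
                best_val, best_i = val, i
        bounds[d] = edges[best_i], edges[best_i + 1]
    return bounds                                 # smaller, promising box

# toy 2-D quadratic with its minimum at (0.7, -0.3)
f = lambda x: (x[0] - 0.7) ** 2 + (x[1] + 0.3) ** 2
refined = refine_search_space(f, [(-1, 1), (-1, 1)], k=4)
print(refined)  # a small box that should still contain (0.7, -0.3)
```

Traditional BO would then be run inside the returned box with the remaining budget B - B_ref.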
Advantages:
- Optimized use of limited evaluation budget: Focusing the search on smaller, promising regions allows efficient use of limited resources. It might find promising solutions faster than exploring the vast space uniformly.
- No prior knowledge required: The search space refinement heuristic presented is relatively simple and easy to implement. It does not require prior knowledge about the search space (e.g., rough areas where the optimum might exist).
Risks:
- Dependency on initial space partitioning: The effectiveness of this method heavily depends on how the search space is initially divided. Specifically, if the order of dimensions chosen for partitioning or the number of divisions k is poorly selected, it might refine down to a subspace that does not contain the true optimal region, failing to find the global optimum.
- Requires heuristic tuning: The heuristics used for refining the search space (like the division number k, dimension selection order, etc.) might require additional tuning as the optimal settings can be problem-dependent. This might reduce the generalizability of the approach.
- Scalability: For high-dimensional data, the computational cost of the refinement process can increase. Also, because it refines dimension by dimension sequentially, in high-dimensional cases, the approach may focus too heavily on certain combinations of dimensions, potentially limiting overall exploration and missing other promising regions.
Look-ahead approach or Non-myopic BO
In the research on limited-budget BO, we observe another trend that can be called the "look-ahead approach". In traditional BO, the algorithm always tries to find the immediately best next point without considering its long-term effect. The most popular acquisition functions (AFs) in regular use (e.g., Expected Improvement (EI), Probability of Improvement (PI), etc.) do not consider the long-term effects of a decision. This is referred to as the 'myopia' issue of Bayesian optimization: myopia, or near-sightedness, is the inability to see far away, i.e., having only a short-term perspective.
| Feature | Myopic acquisition function (e.g., EI, PI, UCB) | Non-myopic acquisition function (e.g., KG, ESA, multi-step lookahead) |
| --- | --- | --- |
| Horizon | Next step only | Multiple steps ahead (up to r steps) |
| Objective | Maximize immediate reward (improvement) | Maximize cumulative reward (improvement) over future steps |
| Computational cost | Relatively low | High (especially for large lookahead steps r) |
| Complexity | Simple | Complex (may require dynamic programming or Monte Carlo approximation) |
When the evaluation budget is highly restricted, considering the long-term effect of an evaluation (e.g., how much it reduces future uncertainty, how much it helps discover more promising regions) becomes more valuable than prioritizing short-term gains.
For example, a traditional acquisition function like Expected Improvement (EI) prioritizes short-term gains: based only on past observations, it maximizes the immediate expected reward (the expected improvement over the current best value). Past samples help EI locate areas with high mean (regions to exploit) as well as high variance (uncertain regions to explore), and EI recommends points by balancing these two criteria. However, while the recommended point reflects the chosen exploration-exploitation balance, the impact of this choice on future decisions (the next evaluation, the one after that, etc.) is not considered. This is analogous to a chess player who chooses the next move based only on its immediate reward (e.g., capturing a piece) without considering how it affects future board positions.
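For reference, under a Gaussian process posterior with predictive mean $$ \mu(x) $$ and standard deviation $$ \sigma(x) $$, EI has a well-known closed form (written here in the minimization convention):

$$ \mathrm{EI}(x) = \left(f_{\text{best}} - \mu(x)\right)\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{f_{\text{best}} - \mu(x)}{\sigma(x)} $$

where $$ f_{\text{best}} $$ is the best value observed so far and $$ \Phi $$, $$ \phi $$ are the standard normal CDF and PDF. The first term rewards a promising mean (exploitation); the second rewards uncertainty (exploration). Neither term looks beyond the very next evaluation.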
By contrast, a non-myopic acquisition function resembles a chess player who plans several moves ahead: the impact of the current move (the choice of evaluation point) on possible future moves (future evaluation points) and potential future rewards (future improvement) is taken into account. For example, one formulation of a non-myopic EI is the sum of the immediate gain from myopic EI maximization and the expected value of future expected improvement, often estimated through Monte Carlo simulation. Such expressions are usually formulated using Dynamic Programming (breaking a complex problem down into smaller sub-problems).
In practice, calculating the expected cumulative future reward can become computationally very expensive, because the number of future scenarios (combinations of evaluation choices and outcomes) to consider grows exponentially. For example, the computational cost can grow on the order of $$ O(m^r) $$ for a non-myopic EI, where $$ r $$ is the number of look-ahead steps and $$ m $$ is the number of simulations performed at each step. Because of this computational burden, even though theoretical expressions are available for general r-step look-ahead, many practical non-myopic acquisition functions limit $$ r $$ to small values, typically 2 to 5. This limited lookahead window is also referred to as the "rolling horizon" in the literature, and finding an appropriate value for the rolling horizon is a popular research question in the study of non-myopic BO.
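As an illustration of the smallest non-trivial case, here is a Monte Carlo sketch of a two-step (r = 2) lookahead score: the immediate EI at a candidate plus the average, over m fantasized outcomes at that candidate, of the best achievable next-step EI. The tiny GP, toy objective, and candidate grid are our own assumptions for the sketch, not a specific published algorithm:

```python
import numpy as np
from scipy.stats import norm

def kern(A, B, ls=0.25):
    """1-D RBF kernel with unit prior variance."""
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) ** 2) / ls**2)

def posterior(Xt, yt, Xq, noise=1e-8):
    """GP predictive mean and std at query points Xq."""
    K = kern(Xt, Xt) + noise * np.eye(len(Xt))
    Ks = kern(Xt, Xq)
    mu = Ks.T @ np.linalg.solve(K, yt)
    var = 1.0 - np.einsum('ij,ji->i', Ks.T, np.linalg.solve(K, Ks))
    return mu, np.sqrt(np.maximum(var, 1e-12))

def ei(mu, sigma, best):
    """Expected Improvement for minimization."""
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def two_step_score(x, Xt, yt, grid, m=8, rng=None):
    """Immediate EI at x plus the Monte Carlo average (over m fantasized
    outcomes at x) of the best next-step EI: a 2-step lookahead."""
    rng = rng or np.random.default_rng(0)
    mu, sigma = posterior(Xt, yt, np.array([x]))
    immediate = ei(mu, sigma, yt.min())[0]
    fantasies = rng.normal(mu[0], sigma[0], m)   # possible outcomes at x
    future = 0.0
    for y_f in fantasies:                        # one rollout per fantasy
        mu2, sig2 = posterior(np.append(Xt, x), np.append(yt, y_f), grid)
        future += ei(mu2, sig2, min(yt.min(), y_f)).max()
    return immediate + future / m

# toy setup: a 1-D objective, 3 known points, 60 candidate locations
f = lambda x: np.sin(3 * x) + x**2 - 0.7 * x
Xt = np.array([-0.8, 0.3, 1.6])
yt = f(Xt)
grid = np.linspace(-1.0, 2.0, 60)
rng = np.random.default_rng(1)
scores = np.array([two_step_score(x, Xt, yt, grid, m=8, rng=rng) for x in grid])
x_next = grid[np.argmax(scores)]
print(round(float(x_next), 2))
```

Even this minimal r = 2 version requires m GP refits per candidate; extending the recursion to r = 3 would multiply the cost by another factor of m, which is the $$ O(m^r) $$ growth described above.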
Here are some examples where researchers address the 'myopia' issue of Bayesian optimization in the study of limited-budget BO:
- Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach
- GLASSES: Relieving the myopia of Bayesian optimization.
- Practical Two-Step Lookahead Bayesian Optimization
- The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Next, we discuss the associated advantages and risks of using non-myopic BO approaches.
Advantages:
- Improved efficiency of evaluation: By considering multiple steps ahead and the limited budget, these methods can make more efficient use of the available function evaluations, potentially finding better solutions with fewer evaluations. Instead of just picking points with high immediate expected improvement, they can choose points that offer high "information value" or points that pave the way for better optimization later.
- Better exploration-exploitation balance: The non-myopic nature of these approaches allows for a more nuanced balance between exploration and exploitation compared to the search space refinement approach. This potentially reduces the chances of getting stuck in local optima. (See Figure 2 below). Myopic acquisition functions (like EI) can greedily evaluate the same promising-looking regions and end up getting stuck in a suboptimal area. However, non-myopic acquisition functions, by considering future uncertainty reduction, create a better balance between exploiting promising regions and exploring highly uncertain regions, thereby increasing the likelihood of finding the global optimum.
- Adaptability: Some methods can adapt their strategies based on the remaining budget and the current state of knowledge about the objective function. For instance, they might shift focus from exploration to exploitation as the budget dwindles. This leads to more flexible and robust optimization.
Fig.2 Myopic vs. Non-myopic Bayesian Optimization (Created by miLab Editorial Team)
Risks:
- Computational complexity: The increased sophistication of these methods often comes at the cost of higher computational complexity. Multi-step lookahead approaches can be more time-consuming to compute than simpler acquisition functions. Nevertheless, in limited budget BO applications, where experimental or simulation costs are significant, the elevated modelling (computation) cost often remains justifiable due to the demonstrated superior optimization performance relative to simpler acquisition functions. That is, the time spent calculating the next point (minutes to hours) is often acceptable compared to the cost of a single evaluation (hours to weeks).
- Approximations: Due to the complexity of exact multi-step lookahead calculations, many of these methods rely on approximations or heuristics. These approximations may not always lead to optimal decisions. Approximation error can become significant for larger rolling horizons, especially when used with a misspecified model (i.e., when the surrogate model does not capture the true function well). Choosing an appropriate number of steps for lookahead (the rolling horizon) is crucial for mitigating approximation error.
Considerations for practical MI applications
Bayesian Optimization has achieved widespread popularity and is frequently favored due to its black-box optimization capability, efficiency with limited sample sizes, and derivative-free nature. Especially in Materials Informatics (MI), where experiments and simulations are often costly, BO is recognized as a powerful tool.
However, the domain of limited-budget Bayesian optimization remains comparatively new and is still evolving. Therefore, applying these advanced approaches to practical MI scenarios (e.g., new material discovery, process optimization) requires careful consideration of several factors.
(1) Generalization capability and Approach Selection
The "search space refinement" and "look-ahead" approaches discussed have their own strengths and weaknesses. However, it might be difficult to find a single best approach that universally covers a wide range of MI applications. The choice of approach may depend on many factors, including:
- Dimension of the search space: For high-dimensional problems, search space refinement might risk focusing too narrowly, while look-ahead approaches can suffer from prohibitive computational costs.
- Quantity of remaining evaluation budget: With an extremely small budget, refinement strategies might be effective in quickly identifying a promising region. If a moderate budget remains, the long-term perspective of look-ahead approaches might be more advantageous.
- Computational budget for acquisition function calculation: Look-ahead approaches, especially those involving Monte Carlo simulations, can require significant time to determine the next evaluation point. Whether this computational overhead is acceptable needs assessment.
- Tolerance for approximation error: The potential impact of approximation errors in look-ahead methods on optimization performance and the tolerance for such errors must be considered.
(2) Scalability (Applicability to High Dimensions)
Most of the limited-budget BO approaches presented have primarily reported empirical results on relatively low-dimensional problems. Their effectiveness in high-dimensional search spaces (e.g., with tens or hundreds of parameters) may not be sufficiently verified. Since MI often involves high-dimensional parameter spaces, applying these methods to such problems may require additional validation or consideration of methods specifically designed for high dimensionality.
(3) Approximation Error and Tuning
In look-ahead approaches, depending on the number of look-ahead steps or simulation steps, the approximation error might nullify the advantages gained from the non-myopic strategy. Selecting an appropriate number of look-ahead steps (rolling horizon) can be problem-specific and may require prior tuning or dynamic adjustment based on the optimization progress. It should be noted that this tuning itself can incur costs.
(4) Integration with MI-Specific Challenges
Furthermore, practical MI applications often involve challenges beyond a limited budget, such as experimental noise, reproducibility issues, and the need for multi-objective optimization (e.g., optimizing performance, cost, and stability simultaneously). How to effectively combine these limited-budget BO approaches with solutions for these MI-specific challenges requires further investigation.
Conclusion: Making the Most of Limited Opportunities
The advanced BO strategies discussed in this article—search space refinement and look-ahead/non-myopic BO—offer promising improvements over traditional approaches in scenarios constrained by a limited evaluation budget. These methods aim to utilize precious evaluation opportunities more strategically to discover better solutions with fewer trials.
However, they also introduce additional complexity and potential challenges that need to be carefully considered when applying them to real-world optimization problems. Increased computational cost, the impact of approximation errors, the need for problem-specific tuning, and scalability to high-dimensional problems are points that warrant thorough consideration before implementation.
Limited-budget Bayesian optimization is expected to grow in importance, particularly in fields where evaluation costs are a bottleneck, such as materials science, drug discovery, expensive simulations, and robotics. Future development of more scalable, computationally efficient, and robust limited-budget BO methods holds the potential to further expand the possibilities for optimization in previously intractable problems. Practitioners are required to weigh the benefits and risks of these advanced techniques, considering the specific characteristics of their problem and available resources, to select the most appropriate approach.
References
- Nomura, M., & Abe, K. (2019). A simple heuristic for Bayesian optimization with a low budget. arXiv preprint arXiv:1911.07790. https://doi.org/10.48550/arXiv.1911.07790
- Lam, R., Willcox, K., & Wolpert, D. H. (2016). Bayesian optimization with a finite budget: An approximate dynamic programming approach. Advances in Neural Information Processing Systems, 29. https://proceedings.neurips.cc/paper/2016/file/5ea1649a31336092c05438df996a3e59-Paper.pdf
- González, J., Osborne, M., & Lawrence, N. (2016). GLASSES: Relieving the myopia of Bayesian optimisation. In Artificial Intelligence and Statistics (pp. 790-799). PMLR. https://proceedings.mlr.press/v51/gonzalez16b.pdf
- Wu, J., & Frazier, P. (2019). Practical two-step lookahead Bayesian optimization. Advances in Neural Information Processing Systems, 32. https://proceedings.neurips.cc/paper/2019/file/2e6d9c6052e99fcdfa61d9b9da273ca2-Paper.pdf
- Wu, J., & Frazier, P. (2016). The parallel knowledge gradient method for batch Bayesian optimization. Advances in Neural Information Processing Systems, 29. https://proceedings.neurips.cc/paper_files/paper/2016/file/18d10dc6e666eab6de9215ae5b3d54df-Paper.pdf
- Lee, E., Eriksson, D., Bindel, D., Cheng, B., & McCourt, M. (2020). Efficient rollout strategies for Bayesian optimization. In Conference on Uncertainty in Artificial Intelligence (pp. 260-269). PMLR. https://www.auai.org/uai2020/proceedings/124_main_paper.pdf