Abstract
We discuss guidelines for evaluating the performance of parameterized stochastic solvers for optimization problems, with particular attention to systems that employ novel hardware, such as digital quantum processors running variational algorithms, analog processors performing quantum annealing, or coherent Ising machines. We illustrate through an example a benchmarking procedure grounded in the statistical analysis of the expectation of a given performance metric measured in a test environment. In particular, we discuss the necessity and cost of setting parameters that affect the algorithm's performance, whose optimal values can vary significantly between instances of the same target problem. We present an open-source software package that facilitates the design, evaluation, and visualization of practical parameter tuning strategies for the complex use of the heterogeneous components of the solver. We examine in detail an example using parallel tempering and a simulator of a photonic coherent Ising machine, and we display the scoring of an illustrative baseline family of parameter setting strategies that feature an exploration-exploitation trade-off.
Keywords: Benchmarking, Ising solvers, Quantum computing
Introduction
We present an approach to benchmarking the performance of hybrid quantum-classical algorithms and quantum-inspired algorithms based on a characterization of parameterized stochastic optimization solvers. Technological progress in quantum computing and engineering has led to the proliferation of generic quantum computational methods, algorithmic applications, and hardware platforms where they can be tested. The Noisy Intermediate-Scale Quantum (NISQ) era (Preskill 2018) has catalyzed a myriad of ideas and implementations of physics-based hardware approaches to optimization that do not benefit from superposition and entanglement but whose performance is nonetheless grounded in complex, difficult-to-simulate dynamics. One class of such approaches is optimization solvers whose search algorithm is described by many coupled stochastic differential equations. These methods include analog computing with oscillators (Albertsson and Rusu 2023) and optical coherent Ising machines (CIMs) (McMahon et al. 2016). Another approach is given by probabilistic bits, better known as p-bits, an intermediate between the standard bits of digital electronics and the emerging qubits of quantum computing (Camsari et al. 2019; Patel et al. 2022), which can be physically implemented with perpendicular magnets. In perhaps an abuse of terminology, these physics-based solvers are often named quantum-inspired systems. Recent examples of the utilization of physics-inspired technologies include the design of 5G telecommunication networks via coherent Ising machines and parallel tempering (Kim et al. 2021; Singh et al. 2022). Another class of approaches comprises parameterized quantum circuits encoding variational algorithms, such as the quantum approximate optimization algorithm (QAOA), as well as quantum annealing. This type of quantum computation has been implemented on a variety of platforms, including ion traps (Perez 2020), neutral atoms (Dalyac et al. 2023; Andrist et al. 2023), and superconducting qubits (Kim et al. 2023; Maciejewski et al. 2023).
Empirical observations reveal that quantum and analog solvers can have an advantage over random search, producing probability distributions that potentially yield high-quality solutions (Kim et al. 2021; King et al. 2018; Coffrin 2023; Maciejewski et al. 2023; Mohseni et al. 2022). However, these solvers often struggle to generate samples of the global optimum and cannot guarantee its optimality, especially in the presence of noise. Existing techniques to address this issue, such as error mitigation, primarily focus on enhancing the quality of scalar observables, such as the expectation values of functions, rather than correcting the algorithm's output (bitstrings) (Kim et al. 2023). Several other means can be employed to address this issue. First, the distribution of solution quality can be improved through pre-processing techniques and by tuning the algorithm's parameters. This matters for practical expected performance, since good parameter settings might not generalize to other problem instances, success metrics, or available resources. Moreover, the parameter tuning strategy is resource-consuming and must be reported when discussing the solvers' expected performance. Second, the weaknesses of these solvers can be mitigated by algorithmic approaches that leverage the solver's capabilities to enhance the expectation value itself, e.g., Brown et al. (2022) and Dupont et al. (2023). Assessing the performance of such methods becomes challenging, as the solution of the specific sub-problems only accounts for a portion of the total solution method.
As these quantum and quantum-inspired methods improve in performance and capabilities, the problems they can solve become more sophisticated. Since the purpose of NISQ systems and quantum-inspired Ising machines is to solve problems, it is paramount to rigorously benchmark their performance (McGeoch 2015). There is a need to develop guidelines for evaluating the performance of new computing devices in the context of their future deployment in production, i.e., guidelines for operational benchmarking, as opposed to previous efforts that were mostly confined to research and development environments. A full operational evaluation must include overheads such as the cost of tuning; without considering such overheads, it is easy to reach misleading conclusions (Aaronson 2019).
We propose a benchmarking framework supported by an open-source software package intended to collect statistically relevant data when running a parameterized stochastic optimization solver attempting to solve instances from a distribution of representative problems. This work aims to provide guidelines for presenting “Window stickers” - i.e., a user-friendly and self-explanatory scorecard displaying the real-world performance expectation of a self-contained method using fixed and varying resources to solve applied problems of interest. While our considerations will mainly focus on optimization systems, this approach can be adapted to other computational tasks (e.g., sampling, learning) and platforms (e.g., neuromorphic chips).
Algorithmic benchmarking dates back to the 1970s with the work in Rice (1976) on algorithm selection based on performance. These ideas have been applied to optimization algorithms, where performance profiles (Dolan and Moré 2002) are among the most popular proposals. These diagrams show the performance of different optimization algorithms, reporting the number of problems each method can solve with respect to time. Although criticized for leading to misleading conclusions when more than two algorithms are included (Gould and Scott 2016), they have been widely used in the literature. Best practices for producing the most informative benchmark analyses have been reported in Bartz-Beielstein et al. (2020). Among these best practices, automated benchmarking software ensures reproducibility, and several such tools have been proposed (Bussieck et al. 2014; Moreau et al. 2022).
Parameter setting can be understood as an algorithm selection, where each parameterization is interpreted as a separate algorithm. Hyper-parameter tuning and benchmarking are also relevant in other fields of computational algorithms, such as machine learning (Dai and Berleant 2019), where hardware accelerators provide advantages that need to be quantified across the boundaries of different hardware implementations. Several tools have been proposed to automate this parameter setting in that context, e.g., Hyperopt (Bergstra et al. 2022).
Although there is a rich literature on algorithmic selection and parameter setting, quantum and physics-inspired optimization methods have characteristics that make their benchmarking unique and challenging (McGeoch 2015). For example, new performance metrics, such as time-to-target (King et al. 2015), have been proposed to represent the trade-offs between solution quality and efficiency for these methods, and recently there has been more emphasis on measuring performance as a function of the resources employed and on distinguishing classes of instances (Lykov et al. 2023). Note that many quantum computer benchmarks have mainly focused on the circuit sizes that can be implemented without noise affecting their fidelity (Mills et al. 2021). Additionally, in the NISQ era, the performance of quantum devices fluctuates over time, requiring frequent calibration. This introduces an extra layer of noise in the observed output distributions that must be factored into a benchmark.
This work aims to develop a methodology that adapts well to these quantum and physics-inspired optimization methods, with supporting software to automate the benchmarking and parameter setting strategies and to correctly account for their costs, providing practical and actionable information about solver performance. We list our contributions below:
Characterization of solution methods as instantiations of parameterized stochastic optimization solvers (see Fig. 1)
Proposal of benchmarking, visualizing, and designing parameter setting strategies (see Fig. 2)
Implementation in open-source software Stochastic-Benchmarking (Bernal Neira et al. 2023)
Ultimately, the question that such a pragmatic benchmarking procedure should answer can be framed as: given well-specified resources and a new, previously unseen problem instance from a known distribution, what are the expectations for its resolution with a specific solution method? As discussed throughout this paper, the key to answering this question is to properly define the concepts of resources, expectations, solution, and solution method. In particular, the definition of the solution method has to address how various parameters that define the solver are set (see Fig. 1).
Fig. 1.

Abstract conceptualization of a solution method and a solver. The black box indicates the core processing optimizer (e.g., a quantum device) primarily responsible for the method performance
Fig. 2.
Flowchart with the main steps to generate the “Window stickers” implemented in Stochastic-Benchmark (Bernal Neira et al. 2023)
Glossary of key terms
Solver: A parameterized stochastic optimization algorithm, potentially hardware-based (e.g., quantum or quantum-inspired), used to generate solution samples for a given problem instance.
Solution Method: The end-to-end computational process that includes both the solver and a parameter setting strategy (PSS), including any meta-parameter tuning.
Parameter Setting Strategy (PSS): A defined procedure for choosing solver parameters. This may include fixed values (fPSS) or adaptive procedures based on resource allocation and instance features.
Fixed Parameter Setting Strategy (fPSS): A strategy that uses constant solver parameters for all problem instances and resource levels.
Adaptive PSS: A parameter selection strategy that dynamically adjusts solver parameters based on meta-parameters and observed performance, typically involving an exploration-exploitation trade-off.
Meta-Parameters: Higher-level configuration variables that influence the behavior of an adaptive PSS (e.g., exploration fraction, budget per trial).
Virtual Best (VB): A performance benchmark defined by the best-performing parameter configuration for each individual instance and resource level, used as an upper bound reference.
Performance Score: A normalized metric assessing solver quality, typically defined with respect to known optimal and random solutions for each instance.
Resource: A quantifiable measure of computational expenditure (e.g., time, number of samples) used to assess solver performance.
Window Sticker: A performance visualization summarizing solver behavior and parameter recommendations across a range of resources and problem instances.
Solution methods and parameterized stochastic solvers
This study uses a framework focused on analyzing parameterized stochastic optimization solvers. Here, a "solver" is an integrated system, where hardware (the device) and software (algorithms) work together to solve optimization problems. Solvers have multiple parameters that can significantly affect their performance, but these effects are usually unknown beforehand. For our analysis, the solver is seen as a sampler of random variables from an unknown distribution, a concept familiar in classical optimization as stochastic optimization methods (Fouskakis and Draper 2002). This approach is relevant for quantum heuristics and Ising machines, as they fit well within this category of optimization methods.
The raw output of such stochastic methods is a finite set, or string, of $N$ bits or binary values $x_i \in \{0, 1\}$, obtained by a single measurement at the end of the computation,1 to which we associate a vector variable $\mathbf{x} = (x_1, \ldots, x_N) \in \{0, 1\}^N$. The algorithm description does not specify how the distribution is updated or how the samples are obtained.
Additionally, the stochastic nature of these solvers generates a distribution of solutions, which necessitates applying postprocessing techniques to determine the required output. A comparison between stochastic optimization algorithms and deterministic solution methods, which return the same solution to a problem every time they are executed and might even provide guarantees on the optimality of such a solution, might not be valid in general, given the heuristic nature of sampling associated with stochastic methods. From the bitstring $\mathbf{x}$, we can define a transformed real-valued variable $X = f(\mathbf{x})$, where $f: \{0, 1\}^N \to \mathbb{R}$ is known as a pseudo-Boolean function (Boros and Hammer 2002).2 This variable, defined by a scalar function, takes the bitstring values and returns a real-valued cost or objective. The variable $X$ can represent the solver's progress toward solving a single problem. The solver's performance can then be assessed through $X$, which can subsequently be used to learn how the solver behaves across different problem instances and to compare it against other solvers. The choice of $f$ is critical, as it must reflect the objective of the underlying optimization task (e.g., the Ising energy in spin models). An improperly chosen or misaligned transformation may lead to misleading performance estimates. This transformation should therefore be defined using domain-specific knowledge and validated, where possible, to ensure it captures solver progress or quality.
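As a concrete example of such a transformation, the zero-field Ising energy used later in this paper can serve as the pseudo-Boolean function $f$; the following is a minimal sketch, assuming the couplings are given as an upper-triangular NumPy array:

```python
import numpy as np

def ising_energy(bits: np.ndarray, J: np.ndarray) -> float:
    """Pseudo-Boolean objective f: map a 0/1 bitstring to the zero-field
    Ising energy E(s) = sum_{i<j} J_ij s_i s_j, with s_i = 1 - 2 x_i."""
    spins = 1 - 2 * bits                       # {0,1} -> {+1,-1}
    return float(spins @ np.triu(J, k=1) @ spins)

# Toy usage: 3 spins and a single sample from a hypothetical solver
J = np.array([[0.0, 1.0, -0.5],
              [0.0, 0.0, 0.25],
              [0.0, 0.0, 0.0]])
sample = np.array([0, 1, 1])
X = ising_energy(sample, J)                    # the scalar variable X
print(X)
```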
Analyzing experimental results from specific cases is crucial to accurately benchmark solvers, which perform differently across various problem instances. This helps determine the solver's effectiveness for specific problem classes or families, requiring analysis over multiple instances. In studying stochastic solvers, the objective is to estimate the probability density function (PDF) of the output variable X based on the samples collected during the solution process. This PDF offers empirical insight into the distribution of the solver samples. Stochastic solvers lead to a change in the solution paradigm, where the new goal is to skew these distributions towards the desired output and sample from them as efficiently as possible. From this perspective, deterministic solvers search over a Dirac delta distribution centered at the optimal solution; in this case, sampling becomes irrelevant, and the deterministic search becomes equivalent to finding such a distribution. Additionally, we focus on reporting the output variable X and the confidence level with which a specific solution quality can be ensured for a new, unknown problem instance within the targeted instance class.
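For instance, the empirical distribution of X and summary quantiles can be estimated directly from the collected samples; the following is a minimal sketch with hypothetical sample values:

```python
import numpy as np

# Hypothetical objective values X returned by repeated solver calls
samples = np.array([-12.0, -10.5, -12.0, -9.0, -11.5, -12.5, -12.0, -8.5])

# Empirical PDF estimate (normalized histogram) and an empirical quantile
pdf, edges = np.histogram(samples, bins=5, density=True)
best_decile = np.quantile(samples, 0.1)   # objective reached by the best 10% of samples

print(pdf, edges, best_decile)
```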
Benchmarking framework
Following the concepts for a reproducible benchmark, we present our framework as shown in Fig. 2. We consider an instance generation procedure, which produces a population of instances containing enough information to predict, based on their solution, the behavior on a new, unseen problem. This procedure is followed by selecting (meta-)parameter values for evaluating the different parameter setting strategies (PSSs). After establishing a performance metric, we set up the benchmark.
The solution methods are then run, (attempting to) solve the instances to identify promising solver parameters for the figure of merit. Recording the solution trajectory for each instance provides the information required to establish the performance profile of each method. This information can be used to choose the best solution strategy at each resource value along the trajectory. Considering this performance envelope, one can construct the virtual best (VB) performance profile. This is equivalent to having an oracle that indicates which solution method is best for each resource value and each instance. The virtual best also provides an upper bound on the performance attainable by selecting among solution methods. Moreover, by aggregating the parameter settings that result in the best performance for each instance, one can define a fixed parameter setting strategy (fPSS). This approach is used extensively in the literature, where, e.g., for each resource value, the best-performing value of each parameter is averaged across the different instances. If the aggregation results in a parameter setting not initially included in the PSSs, it should be rerun for all instances to verify its performance.
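The virtual best and a simple fixed strategy can be derived directly from the recorded profiles; the following sketch illustrates the aggregation (the column names and values are hypothetical, not the Stochastic-Benchmark API):

```python
import pandas as pd

# Hypothetical recorded profiles: one row per (instance, parameter setting, resource)
profiles = pd.DataFrame({
    "instance": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "params":   ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2"],
    "resource": [10, 100, 10, 100, 10, 100, 10, 100],
    "score":    [0.2, 0.6, 0.3, 0.5, 0.1, 0.7, 0.4, 0.65],
})

# Virtual best (VB): best score over parameter settings, per instance and resource
vb = profiles.groupby(["instance", "resource"])["score"].max()

# Fixed PSS (fPSS): per resource value, pick the parameter setting whose mean score
# across instances is highest, mimicking the aggregation of best settings
mean_by_params = profiles.groupby(["resource", "params"])["score"].mean()
fpss = mean_by_params.groupby(level="resource").idxmax()

print(vb, fpss, sep="\n")
```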
One observation is that fixed parameters for the solution methods might perform suboptimally on unseen instances, as the assumption that the instance population is "well-behaved" or representative might fail. This can be addressed by an advanced parameter tuning algorithm, such as Hyperopt (Bergstra et al. 2022), governed by meta-parameters. These meta-parameters affect the behavior of the tuning procedure itself and can be used to balance the exploration and exploitation phases of the solution-method parameter tuning. This exploration-exploitation balance can be expressed by determining which fraction of a total budget is spent looking for the best parameters and which is spent exploiting the best-found parameters. Moreover, during the exploration stage, each parameter setting considered could be explored for a variable amount of resources, presenting a trade-off between checking many different parameter settings and realizing the potential of each explored setting by investing a larger amount of resources in it.
All these steps result in a trajectory of (meta-)parameters to be evaluated in the solvers. Depending on the solution methods and the family of instances, these trajectories might need to be made actionable: they might appear erratic due to a small number of instances or because outliers affect the different aggregations, e.g., across instances or parameter values. Trajectories are therefore smoothed and, if they do not correspond to any evaluated PSS, rerun to gather information about their performance. The instance family is divided into training and testing sets to avoid overfitting the results. The procedure for finding good PSSs is repeated over several different instance splits, and a cross-validation scheme then aggregates these results.
The resulting "Window stickers" then consist of parameter trajectories, or plots that show the value each parameter should take at various resource levels; meta-parameter trajectories that yield the different parameter settings in adaptive PSSs; and performance profiles that show the expected merit-function response for each PSS. These analyses can then be aggregated across different problem families to show how performance scales with a feature of the instances.
The stochastic benchmark framework
Stochastic-Benchmark is an open-source package implementing the methodology described in the previous section (Bernal Neira et al. 2023). It introduces a statistical analysis methodology for evaluating and comparing the performance of (potentially quantum and quantum-inspired) optimization solvers. By incorporating visual presentation techniques and robust statistical analysis, Stochastic-Benchmark provides researchers with a comprehensive framework to assess solver performance and facilitate informed decision-making on design and production readiness in the field of quantum and quantum-inspired optimization. The package is particularly relevant for analyzing quantum-inspired methodologies, which often produce a large set of solutions as output. The analysis framework addresses these issues by providing a general platform for performance comparison and parameter setting strategy evaluation.
To practically implement the methodology illustrated in Fig. 2, we provide an efficient implementation of these methods. In this section, we explain how the Stochastic-Benchmark framework operates. Assume that the following is given (a minimal configuration sketch follows the list):
- Resource to be evaluated.
- Performance metric to be considered, P.
- Set of instances.
- Set of solvers.
- Set of pre-evaluated parameters for each solver s.
- Set of meta-parameters, in case an adaptive PSS is to be included.
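A minimal sketch of how these inputs might be organized is shown below; the names and values are hypothetical and do not reflect the Stochastic-Benchmark API:

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkInputs:
    resources: list            # ordered resource values R (e.g., number of reads)
    metric: str                # performance metric P (e.g., "performance_score")
    instances: list            # identifiers of the problem instances
    solvers: list              # solver identifiers, e.g., ["CIM-CAC", "PySA"]
    parameter_grid: dict       # pre-evaluated parameter settings per solver
    meta_parameters: dict = field(default_factory=dict)  # for adaptive PSSs

inputs = BenchmarkInputs(
    resources=[10, 100, 1000],
    metric="performance_score",
    instances=[f"wishart_{i:02d}" for i in range(50)],
    solvers=["CIM-CAC", "PySA"],
    parameter_grid={"PySA": {"sweeps": [32, 128, 512], "replicas": [4, 8]}},
    meta_parameters={"ExploreFrac": [0.1, 0.3, 0.5]},
)
print(inputs.solvers)
```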
For each solver and each parameter setting, a performance profile is evaluated on each instance. The ordered set R of resources indicates the energy, time, or memory used for each call to the solver. Although some solvers report the performance metric as the resource progresses, e.g., the logs provided by mixed-integer programming solvers with incumbent solutions against time, some quantum- and physics-based methods only provide the final distribution of solutions. One could execute the solver for a grid of resource values; however, this would be highly costly, considering that access to these solvers is limited and expensive. We instead implement bootstrapping in a parallelizable manner to efficiently regenerate these profiles, using only the distribution of solutions obtained at the largest resource value, and compute confidence intervals for these metric predictions, which are then propagated through the data aggregations in the "Window stickers" framework. By incorporating confidence intervals, Stochastic-Benchmark provides a robust framework for evaluating solver performance and comparing different algorithms.
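As an illustration of this bootstrap idea (not the package's internal implementation), assume the raw output is a vector of objective values collected at the largest resource value:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_best(values, n_resource, n_boot=200, ci=0.68):
    """Estimate the best objective reachable with `n_resource` samples by
    resampling (with replacement) from the full distribution of solutions,
    together with a bootstrap confidence interval."""
    best = np.array([rng.choice(values, size=n_resource, replace=True).min()
                     for _ in range(n_boot)])
    lo, hi = np.quantile(best, [(1 - ci) / 2, (1 + ci) / 2])
    return best.mean(), (lo, hi)

# Hypothetical objective values from 1000 shots at the largest resource level
values = rng.normal(loc=-10.0, scale=2.0, size=1000)
for r in (10, 100, 1000):
    mean, (lo, hi) = bootstrap_best(values, r)
    print(f"resource={r:5d}  E[best]={mean:6.2f}  CI=({lo:6.2f}, {hi:6.2f})")
```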
The performance profiles are aggregated to compute the VB and fPSS automatically within Stochastic-Benchmark. Moreover, an adaptive PSS is implemented by connecting to the hyper-parameter optimizer Hyperopt, and a multi-armed-bandit strategy is implemented to evaluate the balance between exploring parameter values for the solvers and exploiting the best-found parameters. The main idea is that, given a resource budget, a fraction of those resources (ExploreFrac) is spent exploring the parameter space to get a sense of which parameters are suitable, and the remaining resources are then spent running the solver with one well-informed choice of parameters.
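A sketch of this exploration-exploitation split using Hyperopt's fmin interface is shown below; run_solver and its parameter are hypothetical stand-ins for a real solver call, and the wiring differs from the actual package:

```python
from hyperopt import fmin, hp, tpe, Trials
import numpy as np

rng = np.random.default_rng(1)

def run_solver(params, resource):
    """Hypothetical stand-in for a solver call: returns a performance score
    in [0, 1] that improves with resource and depends on one parameter."""
    quality = np.exp(-abs(params["beta"] - 0.4))
    return float(quality * (1 - np.exp(-resource / 50)) + 0.02 * rng.standard_normal())

total_budget = 1000          # total resource budget (e.g., spin reads)
explore_frac = 0.3           # ExploreFrac meta-parameter
tau = 50                     # resource spent per explored parameter setting
explore_budget = int(explore_frac * total_budget)
n_trials = max(1, explore_budget // tau)

space = {"beta": hp.uniform("beta", 0.0, 1.0)}
trials = Trials()
best = fmin(fn=lambda p: -run_solver(p, tau),   # Hyperopt minimizes the objective
            space=space, algo=tpe.suggest,
            max_evals=n_trials, trials=trials)

# Exploitation: spend the remaining budget on the best-found setting
score = run_solver({"beta": best["beta"]}, total_budget - explore_budget)
print(best, score)
```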
Each PSS outputs a parameter strategy plot, which shows the variation of the parameter values for different resource values. Actionable parameter strategy plots can be computed through callbacks in the code, which allow fitting these parameter profiles with functional forms using the Python numerical libraries numpy and scipy. Finally, the software automatically partitions the instance set into training and testing sets and repeats the benchmarking procedure for each partition, ultimately applying a cross-validation technique to counter overfitting of the parameter strategies.
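For example, a raw best-parameter trajectory can be smoothed into an actionable one by fitting a simple functional form with scipy; the power-law form and the values below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical best-found "sweeps" values at several resource levels (noisy/erratic)
resources = np.array([10, 30, 100, 300, 1000], dtype=float)
best_sweeps = np.array([12, 20, 55, 70, 180], dtype=float)

def power_law(r, a, b):
    # Assumed functional form: sweeps ~ a * r**b
    return a * np.power(r, b)

(a, b), _ = curve_fit(power_law, resources, best_sweeps, p0=(1.0, 0.5))
actionable = power_law(resources, a, b)   # smoothed trajectory to be re-evaluated
print(a, b, actionable)
```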
Illustrative example
This section describes results obtained by applying the Stochastic-Benchmark framework to an illustrative example. We describe the operational resources and constraints of the benchmark, the set of problem instances, the figure of merit, the information accessible to the solvers before solving the problems, the parameter setting strategies, and the test used to assess a successful run. We consider these to be the elements of a conscientious benchmark.
Operational resources and constraints: Solution methods
We seek to minimize the energy of a class of zero-field Ising models, i.e., $E(\mathbf{s}) = \sum_{i<j} J_{ij} s_i s_j$ with $s_i \in \{-1, +1\}$. The configuration that minimizes the problem, $\mathbf{s}^{*}$, and its corresponding objective or ground-state energy, $E^{*} = E(\mathbf{s}^{*})$, are desired. For this illustrative example, we consider two solvers: parallel tempering and a chaotic-amplitude-control coherent Ising machine simulator. Both methods were run on a single Ivy Bridge node of NASA's Pleiades supercomputer, which has two ten-core Intel Xeon E5-2680v2 (2.8 GHz) processors and 64 GB of RAM per node (3.2 GB per core). The resource considered here was the number of reads of the problem variables (spins), which is proportional to the execution time.
Solver 1: Coherent Ising machine simulator. Ising machines are a class of solvers based on the dynamics of physical hardware that aim to find the minimum-energy solution of the Ising model (Mohseni et al. 2022). Coherent Ising machines (CIMs) are an example of Ising machines that exploit mixed-state density operators in a quantum oscillator network (Wang et al. 2013). Currently, the CIM is primarily benchmarked by simulating a quantitative model of its behavior in different applications. Although this is a widely accepted approach, no single model of the CIM's dynamics exists; instead, models with varying degrees of fidelity have been constructed depending on how quantum mechanical effects are treated. A specific type of CIM model, chaotic amplitude control (CIM-CAC), appears to provide advantages over other types of CIM (Leleu et al. 2021; Reifenstein et al. 2021). Recent improvements to the simulated model have also emerged based on machine-learning insights (Brown et al. 2024).
A set of ordinary differential equations describes the CIM dynamics. In the case of CIM-CAC, the spin variables $s_i \in \{-1, +1\}$ are relaxed to continuous amplitude variables $x_i \in \mathbb{R}$, coupled to auxiliary error variables $e_i$ that control the amplitudes; their dynamics are governed, among other parameters, by the squared target oscillation amplitude $a$ and the pump schedule parameter R. After integrating these differential equations, the values of the $x_i$ variables are projected back onto the $\{-1, +1\}$ domain.
This solver considers four parameters, including the pump schedule parameter R, and the resources are given by the number of shots, each accounting for one integration of the differential system in the time domain, simulating the execution on the CIM hardware. We use the Python-based simulation library CIM-optimizer (Chen et al. 2022) to simulate CIM-CAC.
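The paper's specific equations are not reproduced here; as a rough, hedged illustration, the following sketch integrates one commonly used form of the chaotic-amplitude-control equations (following Leleu et al. 2021), which may differ from the parameterization used by CIM-optimizer:

```python
import numpy as np

def cim_cac(J, p=0.9, beta=0.3, a=1.0, dt=0.01, n_steps=3000, rng=None):
    """Euler integration of one common form of the CAC equations (assumed here;
    the CIM-optimizer parameterization and sign conventions may differ):
        dx_i/dt = (p - 1) x_i - x_i^3 + e_i * sum_j J_ij x_j
        de_i/dt = -beta * (x_i^2 - a) * e_i
    Returns the sign-projected spin configuration."""
    rng = rng or np.random.default_rng(0)
    n = J.shape[0]
    x = 0.1 * rng.standard_normal(n)     # soft spin amplitudes
    e = np.ones(n)                       # amplitude-control (error) variables
    for _ in range(n_steps):
        x += dt * ((p - 1.0) * x - x**3 + e * (J @ x))
        e += dt * (-beta * (x**2 - a) * e)
        x = np.clip(x, -3.0, 3.0)        # guard against blow-up in this simple Euler scheme
    return np.sign(x)

# Toy usage on a 4-spin coupling matrix
J = np.array([[0, -1, 0, -1],
              [-1, 0, -1, 0],
              [0, -1, 0, -1],
              [-1, 0, -1, 0]], dtype=float)
print(cim_cac(J))
```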
Solver 2: Parallel tempering. Replica-exchange MCMC sampling (Hukushima and Nemoto 1996), also known as parallel tempering, is a state-of-the-art heuristic for Ising-like optimization problems. Parallel tempering aims to overcome the issues faced by simulated annealing (Kirkpatrick et al. 1983) by initializing multiple 'replicas' at different temperatures. The replicas undergo a number of Metropolis-Hastings updates, followed by a temperature swap between two replicas. Here, we briefly describe the solver and the parameters determining its performance and refer the reader to Zhu et al. (2015) and Mandrà and Katzgraber (2018) for more details.
In parallel tempering, several replicas are initiated at temperatures ranging between a user-determined minimum $T_{\min}$ and maximum $T_{\max}$, which can be encoded in terms of two probabilities, $p_{\min}$ and $p_{\max}$, that control how likely a spin flip is to be accepted in a Metropolis update at $T_{\min}$ and $T_{\max}$, respectively. $p_{\min}$ quantifies the probability of the least likely spin flip at the lowest temperature, and $p_{\max}$ denotes the likelihood of the most likely spin flip at the highest temperature. Both probabilities depend on the values of the J matrix and can be approximated from its extreme coupling values.
In addition to the number of replicas, the execution time is affected by another parameter, the number of sweeps s, which denotes the number of Metropolis updates performed in the algorithm. Thus, the solver takes four parameters (the number of replicas, the number of sweeps s, $p_{\min}$, and $p_{\max}$), and the resources are given by the product of replicas and sweeps, accounting for a serial execution of the replicas. We benchmark the Python-based implementation of parallel tempering PySA (Mandra et al. 2023).
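For concreteness, a minimal, self-contained sketch of the replica-exchange loop described above is given below (not PySA's implementation; temperatures are specified directly rather than via the transition probabilities):

```python
import numpy as np

rng = np.random.default_rng(2)

def parallel_tempering(J, temps, n_sweeps=200):
    """Minimal replica-exchange (parallel tempering) sketch: Metropolis sweeps
    on every replica, then swap attempts between neighbors in temperature.
    J is assumed strictly upper-triangular with zero diagonal."""
    n = J.shape[0]
    Jsym = J + J.T                                   # symmetric couplings, zero diagonal
    energy = lambda s: 0.5 * float(s @ Jsym @ s)     # E(s) = sum_{i<j} J_ij s_i s_j
    replicas = [rng.choice([-1, 1], size=n) for _ in temps]
    energies = [energy(s) for s in replicas]
    best = min(energies)
    for _ in range(n_sweeps):
        for k, T in enumerate(temps):                # one Metropolis sweep per replica
            s = replicas[k]
            for i in range(n):
                dE = -2.0 * s[i] * (Jsym[i] @ s)     # energy change of flipping spin i
                if dE <= 0 or rng.random() < np.exp(-dE / T):
                    s[i] = -s[i]
                    energies[k] += dE
        for k in range(len(temps) - 1):              # temperature swaps between neighbors
            d_beta = 1.0 / temps[k] - 1.0 / temps[k + 1]
            if rng.random() < min(1.0, np.exp(d_beta * (energies[k] - energies[k + 1]))):
                replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
        best = min(best, min(energies))
    return best

# Toy usage: random upper-triangular couplings on 8 spins, 4 replicas
n = 8
J = np.triu(rng.normal(size=(n, n)), k=1)
print(parallel_tempering(J, temps=[0.1, 0.5, 1.0, 2.0]))
```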
Choice of problems for benchmarking: Wishart instances
The values of J are selected from the Wishart planted ensemble (Hamze et al. 2020) to generate problem instances with planted solutions. In particular, the planted solution corresponds to an element of the null space of a system of linear equations, from which, after a perturbation with Gaussian noise, the J matrix is constructed. The difficulty of these problems is controlled by a parameter $\alpha$, the ratio of the number of equations to the number of spins, with a non-monotonic easy-hard-easy profile as $\alpha$ is varied around a critical value. We fix $\alpha$ for illustrative purposes in the following unless otherwise noted.
The Python library Chook (Perera et al. 2020) was used to generate 50 instances for each size N.
Figure of merit: performance ratio
We quantify the performance using a normalized performance score, defined for each instance as
$$\mathrm{score} = \frac{\langle E_{\mathrm{rand}}\rangle - E_{\mathrm{sol}}}{\langle E_{\mathrm{rand}}\rangle - E^{*}},$$
where $E_{\mathrm{sol}}$ is the objective value achieved by the solver, $E^{*}$ is the known optimal (ground-state) energy, and $\langle E_{\mathrm{rand}}\rangle$ is the average objective value of uniformly random bitstrings. Thus, the score ranges from 0, when the solver performs no better than random sampling, to 1, when the solver obtains the optimal solution. Since we know the solution a priori (the Wishart instances have known planted solutions by design), this performance score is closely related to the optimality gap. To compute the baseline for random performance, we sampled 1,000 random bitstrings per instance and evaluated their average objective value. This sample size was selected to ensure stability in the estimated expectation and reproducibility across test-train splits.
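A small sketch of this score computation, using the definition reconstructed above and a hypothetical instance (not the Stochastic-Benchmark code):

```python
import numpy as np

rng = np.random.default_rng(3)

def performance_score(e_solver, e_optimal, J, n_random=1000):
    """Normalized score: 0 when no better than random sampling, 1 at the optimum.
    Assumes E(s) = sum_{i<j} J_ij s_i s_j with known optimal energy e_optimal."""
    n = J.shape[0]
    spins = rng.choice([-1, 1], size=(n_random, n))
    e_random = np.mean(np.einsum("ki,ij,kj->k", spins, np.triu(J, k=1), spins))
    return (e_random - e_solver) / (e_random - e_optimal)

# Toy usage with a hypothetical 6-spin instance and solver output
J = np.triu(rng.normal(size=(6, 6)), k=1)
print(performance_score(e_solver=-4.0, e_optimal=-5.0, J=J))
```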
Accessible prior information
Although the solvers did not exploit any particular structure of the problems when solving the Wishart instances, their developers guided us on the ranges of the parameter values, discussed below, to use for good performance. This guidance was based solely on the size of the instances; the problem type was not revealed to the developers to avoid biases in the parameter recommendations.
Parameter setting and run strategy
We provide a search space for each of the parameters considered, usually a uniform distribution around nominal values provided by the developers, except for the transition probabilities in parallel tempering, which were varied over truncated normal distributions to avoid numerical errors in the solvers. The grid for the Hyperopt meta-parameters, namely ExploreFrac and the per-evaluation exploration budget (the resource expense of every parameter setting queried during the exploration phase), and the distributions for the parameters are reported in the Appendix.
Success test
To obtain the performance profiles (the "Window stickers"), we analyze the performance profiles for ten test-train splits, with a fixed fraction of the instances chosen as training instances and the rest as testing instances. We combine the confidence intervals and the aggregated value (mean or median) of the performance across all splits to provide cross-validated results. The results are automatically produced by the Stochastic-Benchmark software and are part of the examples in the repository (Bernal Neira et al. 2023).
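A minimal sketch of this cross-validation aggregation step follows; the 80/20 split fraction is an assumption for illustration, and the software automates the procedure:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical per-instance scores of one PSS at one resource level (50 instances)
scores = rng.beta(5, 2, size=50)

# Repeated test-train splits, aggregating the test-set medians across splits
n_splits, train_frac = 10, 0.8
test_medians = []
for _ in range(n_splits):
    perm = rng.permutation(len(scores))
    test = scores[perm[int(train_frac * len(scores)):]]
    test_medians.append(np.median(test))

median = np.median(test_medians)
lo, hi = np.quantile(test_medians, [0.16, 0.84])   # spread across splits
print(median, (lo, hi))
```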
Results
The cross-validated performance profiles for both solvers, CIM-CAC and PySA, are shown in Fig. 3. These profiles, obtained from 10 test-train splits of 50 Wishart instances of fixed size N and hardness parameter $\alpha$, illustrate the empirical trade-offs between parameter setting strategies, namely the virtual best (VB), fixed suggested parameters (fPSS), and Hyperopt-based exploration-exploitation approaches. The plots quantify the expected performance score as a function of the computational resource and highlight how each strategy performs under varying budget constraints.
Fig. 3.
Cross-validated performance profiles from 10 train-test splits of 50 Wishart instances (fixed N and $\alpha$), solved using (left) CIM-CAC (Chen et al. 2022) and (right) PySA (Mandra et al. 2023). Each curve represents a parameter setting strategy: virtual best (VB), fixed suggested parameters, and Hyperopt-based exploration-exploitation. The vertical axes (performance score) are aligned for comparability; the horizontal axes reflect solver-specific resource definitions (spin reads for CIM-CAC and replica-sweep products for PySA), as detailed in Section 4. All results were generated using the Stochastic-Benchmark framework
Following the performance analysis, the framework identifies the best-performing parameter values for each solver based on training data. These values are visualized in Fig. 4, which shows parameter strategy plots for both CIM-CAC and PySA. Each curve illustrates how individual solver parameters vary with increasing resource budget under different parameter setting strategies. These visualizations enable a concise understanding of the solver behavior across resource levels and highlight which parameters are most sensitive to resource allocation. Full details of the parameter search spaces used to generate these plots are provided in Appendix.
Fig. 4.
Parameter strategy plots for (left) CIM-CAC (Chen et al. 2022) and (right) PySA (Mandra et al. 2023), showing the variation of key solver parameters as a function of resource budget. Each curve represents the fixed suggested values, the result of cross-validated virtual best strategies, or the projections from adaptive exploration strategies. The full parameter ranges used in these experiments are provided in Appendix, and were selected based on developer guidance and prior domain knowledge. The ranges span multiple orders of magnitude where appropriate (e.g., sweeps for PySA or Gamma for CIM-CAC) to accommodate different resource scaling behaviors and ensure coverage of realistic operating regimes. Parameter values shown here are the empirically recommended settings derived from cross-validation across 50 Wishart instances. All results were generated using the Stochastic-Benchmark software framework
To complement the fixed parameter profiles shown in Fig. 4, Fig. 5 presents strategy plots for the meta-parameters used in the adaptive exploration-exploitation schemes, specifically ExploreFrac and the per-evaluation exploration budget. These meta-parameters govern how solver resources are allocated between exploration (parameter search) and exploitation (solver execution). The dashed lines denote configurations with the highest observed performance during training, while the solid lines represent actionable fits used for deployment. For CIM-CAC, performance peaked with ExploreFrac around 0.3 and a per-evaluation exploration budget of 1, whereas PySA exhibited broader tolerance to meta-parameter variation. These results help quantify how tuning overhead can be effectively managed for different solver classes.
Fig. 5.
Meta-parameter strategy plots for the Hyperopt-based exploration-exploitation strategy, applied to (left) CIM-CAC (Chen et al. 2022) and (right) PySA (Mandra et al. 2023). Each plot shows the evolution of two meta-parameters: ExploreFrac, the fraction of the total resource spent in the exploration phase, and the per-evaluation exploration budget, which controls the resources allocated to each parameter setting evaluated during exploration. The dashed curves correspond to the meta-parameter settings that yielded the highest observed performance in training, while the solid curves represent fitted actionable profiles for use in deployment. For CIM-CAC, optimal performance was consistently achieved with ExploreFrac around 0.3 and a per-evaluation budget of 1, while PySA exhibited flatter performance across a wider range. These plots are key to understanding how meta-parameter values influence solver behavior across resource budgets, and they are derived from cross-validation across 50 problem instances. Additional context and performance implications are discussed in Section 4
To further contextualize the performance of the solvers under practical constraints, Fig. 6 presents two comparative analyses. The left panel shows a direct performance comparison between CIM-CAC and PySA when normalized by wall-clock time, offering a realistic view of resource efficiency. The right panel illustrates how each solver scales with increasing instance size N under a fixed resource budget. These plots demonstrate that the relative advantage of each solver depends on both resource availability and problem size: CIM-CAC with Hyperopt performs best under sufficient resources, while PySA with fixed parameters becomes favorable for larger instances or tighter resource budgets.
Fig. 6.
(Left) Performance comparison: the performance profiles of CIM-CAC (Chen et al. 2022) and PySA (Mandra et al. 2023) overlaid on the same plot, with the resource chosen to be the wall-clock time. (Right) Scaling of the performance of both technologies with instance size N. (Generated by Stochastic-Benchmark)
Our results indicate that the optimal exploration-exploitation balance is solver- and context-dependent. For CIM-CAC, the Hyperopt-based strategy with ExploreFrac = 0.3 and a per-evaluation exploration budget of 1 consistently outperformed both fixed and other adaptive strategies across test-train splits (see Fig. 5). This behavior is also reflected in Fig. 6 (left), where this configuration leads to superior performance beyond a 10-second resource budget. In contrast, for PySA, fixed strategies outperform adaptive ones for limited resources or larger problem sizes (see Fig. 6, right). These findings suggest that moderate exploration fractions (20–40%) often yield competitive trade-offs, and further tuning may be solver-specific.
The resulting plots summarize solver behavior across a range of conditions, offering empirically grounded suggestions for parameter settings that performed well on past instances. While not universally optimal, such summaries can assist practitioners in configuring solvers for similar problems. Moreover, they allow for more specialized analyses. We include two examples in Fig. 6: a matching of both methods to the same resource, in this case wall-clock time, leading to a head-to-head comparison of the methods, and an instance-size scaling analysis. From the N=50 results, it is apparent that in this illustrative case our analysis allows us to evaluate the benefit of using CIM-CAC with Hyperopt, with ExploreFrac = 0.3 and a per-evaluation exploration budget of 1, over all other tested options, provided a sufficient amount of resources (at least 10 s in this case). However, if the resource budget cannot be increased, PySA with a fixed PSS appears to be the best solver for larger problems (right plot).
Conclusions
We presented an approach to benchmarking the performance of hybrid quantum-classical algorithms and physics-based algorithms based on a characterization of parameterized stochastic optimization solvers. In addition, we introduced methods for conscientious benchmarking that provide a scheme for holistic reporting of algorithmic performance. The analysis presented here is well suited to stochastic optimization methods, among which we classify the quantum methods for optimization, e.g., quantum annealing and gate-based variational parametric algorithms. The main contribution is a set of rules that characterize what an objective benchmarking procedure needs to consider, particularly with solvers spanning different hardware architectures, together with software that implements this methodology for broad usage by the community. Moreover, the methodology presented here allows for comparing different setups for a given solver, making it useful for parameter setting and tuning procedures.
Acknowledgements
We thank the NASA Quantum AI Laboratory (QuAIL) for valuable discussions, especially Salvatore Mandrá, Max Wilson, and Jeffrey Marshall. The authors thank the CIM-Optimizer and PySA developers for their advice on parameter tuning and the Pleiades supercomputer team for support in running the experiments. This work was supported by NSF CCF (#1918549) and NSF CNS (#1824470) and NASA Academic Mission Services (contract NNA16BD14C – funded under SAA2-403506) and DARPA under IAA 8839 Annex 130. R.B. acknowledges support from the NASA/USRA Feynman Quantum Academy internship program, and P.S. acknowledges support from the USRA internship program.
Appendix: Parameter values for illustrative examples
Appendix A.1. CIM-CAC
Nominal values: as provided by the CIM-optimizer developers for the CIM-CAC parameters.
Search spaces: uniform distributions around the nominal values for the solver parameters, and a grid of values for the meta-parameters ExploreFrac and the per-evaluation exploration budget.
Appendix A.2. PySA
Search spaces: ranges for the number of sweeps and the number of replicas around the developers' suggestions (with sweeps spanning several orders of magnitude), truncated normal distributions for the transition probabilities, and a grid of values for the meta-parameters ExploreFrac and the per-evaluation exploration budget.
Author contribution
F.W., E.R., and D.V. conceived the study. D.E.B.N., R.B., and P.S. developed the benchmarking methodology and implemented the Stochastic-Benchmark software framework. D.E.B.N., R.B., and P.S. conducted the experiments and data analysis. D.E.B.N., P.S., and D.V. wrote the main manuscript text. M.P., E.R., and D.V. provided supervision and feedback on the benchmarking protocol and contributed to interpreting the results. All authors reviewed and approved the final manuscript.
Data availability
No datasets were generated or analysed during the current study.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
1. In the case of an Ising model framework, the output is a spin configuration $\mathbf{s} \in \{-1, +1\}^N$; without loss of generality, both representations are equivalent up to a linear transformation.
2. It is well known that any pseudo-Boolean function can be written uniquely as a multilinear polynomial, i.e., $f(\mathbf{x}) = \sum_{S \subseteq \{1, \ldots, N\}} c_S \prod_{i \in S} x_i$.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Aaronson S (2019) Quantum computing motte-and-baileys. https://scottaaronson.blog/?p=4447
- Albertsson DI, Rusu A (2023) Highly reconfigurable oscillator-based Ising machine through quasiperiodic modulation of coupling strength. Sci Rep 13(1):4005
- Andrist RS, Schuetz MJA, Minssen P, Yalovetzky R, Chakrabarti S, Herman D, Kumar N, Salton G, Shaydulin R, Sun Y et al (2023) Hardness of the maximum-independent-set problem on unit-disk graphs and prospects for quantum speedups. Phys Rev Res 5(4):043277
- Bartz-Beielstein T, Doerr C, van den Berg D, Bossek J, Chandrasekaran S, Eftimov T, Fischbach A, Kerschke P, La Cava W, Lopez-Ibanez M, Malan KM, Moore JH, Naujoks B, Orzechowski P, Volz V, Wagner M, Weise T (2020) Benchmarking in optimization: best practice and open issues
- Bergstra J, Yamins D, Cox DD (2022) Hyperopt: distributed asynchronous hyper-parameter optimization. Astrophys Source Code Library, pp ascl–2205
- Bernal Neira DE, Brown R, Sathe P, Venturelli D (2023) Stochastic benchmark: toolkit for performance evaluation and parameter tuning of stochastic parameterized stochastic optimization solvers. https://github.com/usra-riacs/stochastic-benchmark
- Boros E, Hammer PL (2002) Pseudo-Boolean optimization. Discrete Appl Math 123(1–3):155–225
- Brown R, Bernal Neira DE, Venturelli D, Pavone M (2022) A copositive framework for analysis of hybrid Ising-classical algorithms. arXiv:2207.13630
- Brown R, Venturelli D, Pavone M, Bernal Neira DE (2024) Accelerating continuous variable coherent Ising machines via momentum. arXiv:2401.12135
- Bussieck MR, Dirkse SP, Vigerske S (2014) PAVER 2.0: an open source environment for automated performance analysis of benchmarking data. J Global Optim 59:259–275
- Camsari KY, Sutton BM, Datta S (2019) P-bits for probabilistic spin logic. Appl Phys Rev 6(1)
- Chen F, Isakov B, King T, Leleu T, McMahon P, Onodera T (2022) Cim-Optimizer: a simulator of the coherent Ising machine. https://github.com/mcmahon-lab/cim-optimizer
- Coffrin CJ (2023) On the emerging potential of quantum annealing hardware for combinatorial optimization. Technical report, Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Dai W, Berleant D (2019) Benchmarking contemporary deep learning hardware and frameworks: a survey of qualitative metrics. In: 2019 IEEE first international conference on cognitive machine intelligence (CogMI). IEEE, pp 148–155
- Dalyac C, Henry L-P, Kim M, Ahn J, Henriet L (2023) Exploring the impact of graph locality for the resolution of the maximum-independent-set problem with neutral atom devices. Phys Rev A 108(5):052423
- Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213
- Dupont M, Evert B, Hodson MJ, Sundar B, Jeffrey S, Yamaguchi Y, Feng D, Maciejewski FB, Hadfield S, Alam MS et al (2023) Quantum-enhanced greedy combinatorial optimization solver. Sci Adv 9(45):eadi0487
- Fouskakis D, Draper D (2002) Stochastic optimization: a review. Int Stat Rev 70(3):315–349
- Gould N, Scott J (2016) A note on performance profiles for benchmarking software. ACM Trans Math Softw (TOMS) 43(2):1–5
- Hamze F, Raymond J, Pattison CA, Biswas K, Katzgraber HG (2020) Wishart planted ensemble: a tunably rugged pairwise Ising model with a first-order phase transition. Phys Rev E 101(5):052102
- Hukushima K, Nemoto K (1996) Exchange Monte Carlo method and application to spin glass simulations. J Phys Soc Jpn 65(6):1604–1608
- Kim Y, Eddins A, Anand S, Wei KX, Van Den Berg E, Rosenblatt S, Nayfeh H, Wu Y, Zaletel M, Temme K et al (2023) Evidence for the utility of quantum computing before fault tolerance. Nature 618(7965):500–505
- Kim M, Mandrà S, Venturelli D, Jamieson K (2021) Physics-inspired heuristics for soft MIMO detection in 5g new radio and beyond. In: Proceedings of the 27th annual international conference on mobile computing and networking, pp 42–55
- King AD, Carrasquilla J, Raymond J, Ozfidan I, Andriyash E, Berkley A, Reis M, Lanting T, Harris R et al (2018) Observation of topological phenomena in a programmable lattice of 1,800 qubits. Nature 560(7719):456–460
- King J, Yarkoni S, Nevisi MM, Hilton JP, McGeoch CC (2015) Benchmarking a quantum annealing processor with the time-to-target metric
- Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
- Leleu T, Khoyratee F, Levi T, Hamerly R, Kohno T, Aihara K (2021) Scaling advantage of chaotic amplitude control for high-performance combinatorial optimization. Commun Phys 4(1):1–10
- Lykov D, Wurtz J, Poole C, Saffman M, Noel T, Alexeev Y (2023) Sampling frequency thresholds for the quantum advantage of the quantum approximate optimization algorithm. npj Quant Inf 9(1):73
- Maciejewski FB, Hadfield S, Hall B, Hodson M, Dupont M, Evert B, Sud J, Alam MS, Wang Z, Jeffrey S et al (2023) Design and execution of quantum circuits using tens of superconducting qubits and thousands of gates for dense ising optimization problems. arXiv:2308.12423
- Mandrà S, Katzgraber HG (2018) A deceptive step towards quantum speedup detection. Quant Sci Technol 3(4):04LT01
- Mandra S, Akbari Asanjan A, Brady L, Lott A, Bernal Neira DE (2023) PySA: fast simulated annealing in native python. https://github.com/nasa/pysa
- McGeoch CC (2015) Benchmarking D-wave quantum annealing systems: some challenges. In: Electro-optical and infrared systems: technology and applications XII; and quantum information science and technology, vol 9648. SPIE, pp 264–273
- McMahon PL, Marandi A, Haribara Y, Hamerly R, Langrock C, Tamate S, Inagaki T, Takesue H, Utsunomiya S, Aihara K et al (2016) A fully programmable 100-spin coherent Ising machine with all-to-all connections. Science 354(6312):614–617
- Mills D, Sivarajah S, Scholten TL, Duncan R (2021) Application-motivated, holistic benchmarking of a full quantum computing stack. Quantum 5:415
- Mohseni N, McMahon PL, Byrnes T (2022) Ising machines as hardware solvers of combinatorial optimization problems. Nat Rev Phys 4(6):363–379
- Moreau T, Massias M, Gramfort A, Ablin P, Bannier P-A, Charlier B, Dagréou M, Dupre la Tour T, Durif G, Dantas CF et al (2022) Benchopt: reproducible, efficient and collaborative optimization benchmarks. Adv Neural Inf Process Syst 35:25404–25421
- Patel S, Canoza P, Salahuddin S (2022) Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization. Nat Electron 5(2):92–101
- Perera D, Akpabio I, Hamze F, Mandra S, Rose N, Aramon M, Katzgraber HG (2020) Chook–a comprehensive suite for generating binary optimization problems with planted solutions
- Perez MA (2020) Transitioning quantum atomic technologies from the lab to the real world. In: Quantum photonics: enabling technologies, vol 11579. SPIE, p 1157906
- Preskill J (2018) Quantum computing in the NISQ era and beyond. Quantum 2:79
- Reifenstein S, Kako S, Khoyratee F, Leleu T, Yamamoto Y (2021) Coherent Ising machines with optical error correction circuits. Adv Quant Technol 4(11):2100077
- Rice JR (1976) The algorithm selection problem. In: Advances in computers, vol 15. Elsevier, pp 65–118
- Singh AK, Jamieson K, McMahon PL, Venturelli D (2022) Ising machines' dynamics and regularization for near-optimal MIMO detection. IEEE Trans Wireless Commun 21(12):11080–11094
- Wang Z, Marandi A, Wen K, Byer RL, Yamamoto Y (2013) Coherent Ising machine based on degenerate optical parametric oscillators. Phys Rev A 88(6):063853
- Zhu Z, Ochoa AJ, Katzgraber HG (2015) Efficient cluster algorithm for spin glasses in any space dimension. Phys Rev Lett 115(7):077201