Parameter Tuning Guide
Introduction to TitanQ Tuning
TitanQ does much of the heavy lifting to solve your most challenging problems. To allow the greatest flexibility, however, some knobs are exposed so that performance can be tuned across a variety of problems. This guide explains these knobs and gives insight into how to tune TitanQ to deliver the best results for the specific problem at hand.
Model Hyperparameters
Model hyperparameters are mathematical adjustments that help the system's stochastic sampler converge to optimal solutions. TitanQ's two hyperparameters, coupling_mult and beta, drastically affect the quality of the resulting solution and are therefore the first place to investigate when tuning.
coupling_mult
Problems are mapped onto our system by transforming the initial QUBO graph into the graph that is actually solved on the hardware (a bipartite RBM graph). This is the mapping from A) to B) in the figure below.
This mapping is enforced as a soft constraint between the logical nodes and the physical nodes, where there are 2 physical nodes for each logical node (for more context on the difference between logical and physical nodes, see https://link.springer.com/article/10.1007/s11128-008-0082-9). This constraint is represented by the red dotted line in figure B) above. The coupling_mult parameter sets the strength of this constraint and is usually set to a value between [0, 1]. In most cases we have found that coupling_mult=0.3 gives sufficiently good solution quality and is a good starting point.
When coupling_mult is too low, the solution to the problem diverges from the solution returned by the hardware: the constraint is not strong enough, and the system is effectively solving the "wrong" problem. When coupling_mult is too high, the system is solving the correct problem, but may converge very slowly. This is acceptable in many cases, as you will still receive an answer that is "good enough". We suggest starting with a coupling_mult value that is slightly too high and working downwards.
As a strategy for setting this parameter, we usually start at coupling_mult=0.3 and adjust from there. If the solution quality is very bad, then coupling_mult is usually too low and must be raised. If the solution is reasonable but we suspect a better one exists, we can test this by reducing coupling_mult gradually until we suddenly start seeing degraded solution performance. The best solutions are produced when coupling_mult is set just before this "fall-off" point, represented by the graph below. In this graph, you can see that setting the coupling too low is catastrophic for solution quality, while setting it too high simply means slightly sub-optimal solutions.
For further reading, you can see our paper on RBM-based mappings (https://arxiv.org/pdf/2008.04436), specifically discussions around the C parameter and graph embeddings.
A visual example of coupling_mult set too low or too high is included below, as seen in our support example for Clustering:
In summary:

- Most optimal coupling_mult values are between 0.03 and 0.5, and the maximum coupling seen for most problems is ~1.
- We recommend starting with coupling_mult=0.3.
- Solutions tend to improve as coupling_mult decreases, up until a "fall-off" point.
- As coupling_mult gets closer and closer to the fall-off value from above, solutions tend to improve.
- When coupling_mult gets too close to the fall-off value, an increasing fraction of runs begin to give degenerate solutions.
  - These degenerate solutions have very high energies (which is bad in the context of minimization) compared to other runs.
  - These solutions tend to be all zeros or all ones.
- A common tuning strategy is to iteratively lower coupling_mult as much as possible before solutions become degenerate (see the sketch after this list).
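A minimal sketch of this strategy, assuming a hypothetical solve(coupling_mult) helper that submits the problem to TitanQ and returns a solution vector with its energy, is shown below. The function name, degeneracy check, and step size are illustrative, not part of the TitanQ API.

```python
import numpy as np

def sweep_coupling_mult(solve, start=0.3, step=0.05, floor=0.03):
    """Lower coupling_mult until solutions become degenerate, then stop.

    `solve` is a hypothetical callable: solve(coupling_mult) -> (solution, energy).
    """
    best = None
    cm = start
    while cm >= floor:
        solution, energy = solve(cm)

        # Degenerate solutions tend to be all zeros or all ones.
        if np.all(solution == 0) or np.all(solution == 1):
            break  # fall-off point reached; the previous value was the sweet spot

        if best is None or energy < best[1]:
            best = (cm, energy)
        cm -= step

    return best  # (coupling_mult just before fall-off, best energy seen)
```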
beta
To best explore the state space for optimal solutions, a scaling factor for the problem can be used. An array of scaling factors, here called beta, helps ensure that local minima can be escaped while problem constraints, if present, are still respected.
To find the endpoints and values of this beta array, we usually work with the inverse of beta (which we refer to as temperature), as this maps directly to the values in the weights and bias matrices.
For unconstrained problems:
- Tmin = min(abs(weights[weights != 0]), abs(bias[bias != 0]))
- Tmax = median(abs(weights[weights != 0]), abs(bias[bias != 0]))
For constrained problems:
- Tmin = min(abs(weights[weights != 0]), abs(bias[bias != 0]))
- Tmax = 10*max_row_sum
Here, "max_row_sum" refers to the maximum sum of values across any single row of your weights matrix/bias vector. This will generally yield a much higher Tmax than for unconstrained problems.
From these endpoints, interpolation can be used to fill the array with appropriately spaced values as shown below
beta = 1.0 / np.geomspace(T_min, T_max, num_chains)
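As a slightly fuller sketch (assuming weights and bias are NumPy arrays; the max_row_sum computation below is our reading of the definition above), the endpoints can be computed directly from the problem data:

```python
import numpy as np

# Illustrative problem data; substitute your own weights matrix and bias vector.
rng = np.random.default_rng(0)
weights = rng.normal(size=(100, 100))
bias = rng.normal(size=100)

# Magnitudes of all non-zero coefficients in the problem.
nonzero = np.concatenate([np.abs(weights[weights != 0]),
                          np.abs(bias[bias != 0])])

# Unconstrained problem: endpoints from the coefficient magnitudes.
T_min = nonzero.min()
T_max = np.median(nonzero)

# Constrained problem: Tmax = 10 * max_row_sum (read here as the largest row sum
# of absolute weights plus the matching bias magnitude).
max_row_sum = np.max(np.abs(weights).sum(axis=1) + np.abs(bias))
T_max_constrained = 10 * max_row_sum

# Fill the beta array with geometrically spaced temperatures (np.linspace can be
# substituted for linear spacing, depending on the problem).
num_chains = 32
beta = 1.0 / np.geomspace(T_min, T_max, num_chains)
```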
Some notes & observations on setting the Tmin and Tmax values when generating the beta vector are as follows:
- Tmin
  - If given a set of hyperparameters, decreasing Tmin to a very small number such as 0.0005 usually doesn't damage the solution quality (sometimes it improves it slightly).
- Tmax
  - Sometimes, Tmax has an upper fall-off value, i.e. setting Tmax beyond a certain value gives degenerate solutions.
  - Sometimes, increasing Tmax beyond a certain peak value no longer improves solutions.
    - These high-temperature chains are essentially useless, since they all trickle down to the lower-temperature ones.
While defining individual values for the beta array is possible, it is not recommended: proper spacing between values is important, and this is well handled by either linear spacing (np.linspace in NumPy) or geometric spacing (np.geomspace in NumPy), depending on the problem.
System Parameters
System parameters are directly related to solve request efficiency and resource utilization. These are not mathematically related to the problem to be solved, and instead define the configuration of the solve request to be launched.
num_chains & num_engines
The number of chains and engines determines how many copies of the problem are launched simultaneously. num_chains is the same as the length of beta, as each chain is assigned one value from the beta array. num_engines is the number of independent batches of chains to run in parallel. The number of engines equals the number of solutions returned, as the best chain from each engine is returned.
A smaller total number of chains (num_chains * num_engines) means more samples are taken per chain for the same timeout, so the best solution any single chain could potentially find is of higher quality. However, running more chains in parallel tends to yield a higher average solution quality: while fewer samples are taken in each chain, more chains are exploring the space.
Below are some recommendations:
- num_chains = 16 is usually the minimum, but higher granularity may be needed.
- num_chains = 32 is usually a good value for this parameter. Larger than this usually does not have much benefit, though up to 64 has been seen to be effective.
  - Hyperparameters may need to be retuned with a new num_chains value.
- A previous misconception was that it would be better to have more than 32 chains if the gap between Tmin and Tmax was very large, e.g. Tmin = 2 and Tmax = 32000. Increasing num_chains did not help in most cases.
- num_chains * num_engines = 256 for best results. For num_chains=32, use num_engines=8. The system will return 8 solutions back to you, and you can decide which solution is best by your own metrics (constraint violations, minimal error, etc.); see the sketch after this list.
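For example, with a QUBO-style objective such as x^T W x + b^T x (illustrative, not necessarily your exact formulation), picking the best of the returned solutions might look like the sketch below; a real workflow would also check constraint violations. The solutions list stands in for the vectors returned by the solver.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(100, 100))   # placeholder problem data
bias = rng.normal(size=100)

def qubo_energy(x, weights, bias):
    """Illustrative QUBO-style objective: x^T W x + b^T x."""
    x = np.asarray(x, dtype=float)
    return float(x @ weights @ x + bias @ x)

# Stand-in for the num_engines solution vectors returned by the solver.
solutions = [rng.integers(0, 2, size=100) for _ in range(8)]

# Rank the returned solutions by your own metric (here: lowest energy).
energies = [qubo_energy(x, weights, bias) for x in solutions]
best_index = int(np.argmin(energies))
best_solution, best_energy = solutions[best_index], energies[best_index]
```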
timeout_in_secs
The timeout for the solve request is problem-size dependent: larger problems should generally have larger timeouts. If your problem has fewer than 1000 variables, a timeout of around 30 seconds is reasonable. If your problem has more than 10000 variables, a minimum timeout of 60 seconds is recommended.
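Encoded as a simple heuristic (the mid-range value is our own assumption, not a documented recommendation):

```python
def suggest_timeout(num_variables: int) -> float:
    """Rough timeout heuristic based on the guidance above."""
    if num_variables < 1_000:
        return 30.0   # small problems: ~30 s is usually reasonable
    if num_variables > 10_000:
        return 60.0   # large problems: at least 60 s recommended
    return 45.0       # in between: assumption, tune for your workload
```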
Potential Tuning Strategies
Sandwich Tuning
1. Begin with a decent set of hyperparameters (as suggested above).
2. Begin with num_chains=16 and num_engines=8.
3. Decrease Tmin to a very low number such as 0.0005.
4. Increase Tmax to a higher number (100-200 is frequently seen).
5. Iteratively lower coupling_mult to just before solutions become degenerate (find the "fall-off" point).
6. Take turns increasing Tmin and decreasing Tmax until the solution stops improving (see the sketch after this list).
   - If solutions start to become degenerate, consider readjusting coupling_mult, usually by raising it in steps of 0.01 magnitude.
7. Increase num_chains to a maximum of approximately 64 to see if better temperature resolution improves quality.
   - If so, repeat steps 1-6 with the new system configuration.
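The "take turns" step might look something like the sketch below. It assumes a hypothetical solve(T_min, T_max) helper that builds the beta array, runs TitanQ, and returns the best energy found; the factors used to move the endpoints are illustrative.

```python
def sandwich_tune(solve, T_min=0.0005, T_max=200.0, max_rounds=20, tol=1e-6):
    """Alternately raise T_min and lower T_max while the solution keeps improving.

    `solve` is a hypothetical callable: solve(T_min, T_max) -> best energy (lower is better).
    """
    best = solve(T_min, T_max)
    for _ in range(max_rounds):
        improved = False

        # Try raising T_min (illustrative factor of 2).
        energy = solve(T_min * 2.0, T_max)
        if energy < best - tol:
            best, T_min, improved = energy, T_min * 2.0, True

        # Try lowering T_max (illustrative factor of 2).
        energy = solve(T_min, T_max / 2.0)
        if energy < best - tol:
            best, T_max, improved = energy, T_max / 2.0, True

        if not improved:
            break  # the solution stopped improving; stop squeezing the endpoints
    return T_min, T_max, best
```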
This strategy makes the following assumptions:
- Tmin below a certain value stops improving solution quality.
- Tmax above a certain value stops improving solution quality.
- The fall-off point for coupling_mult is not dependent (or has only low dependence) on Tmin or Tmax.
3rd Party Packages (Optuna)
- Run Optuna with a 1 min timeout and 100 trials (300 or more trials is even better); see the sketch after this list.
- Take either the best performing set of hyperparameters (min), or the “mean of the sets of hyperparameters that gave the most consistently good solutions”.
- Run these hyperparameters on TitanQ with the same 1 min or longer timeout.
- Using Optuna with a shorter timeout and more trials almost always performs better than a longer timeout and fewer trials.
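A minimal Optuna sketch under these assumptions is shown below. The search ranges follow the recommendations in this guide, and run_titanq is a hypothetical helper that builds the beta array from T_min/T_max, submits the problem with the given coupling_mult and a 1-minute timeout, and returns the best energy found; it is not part of the Optuna or TitanQ APIs.

```python
import optuna

def run_titanq(coupling_mult, T_min, T_max):
    """Hypothetical helper: submit the problem to TitanQ and return the best energy."""
    raise NotImplementedError  # replace with your own solve call

def objective(trial):
    # Search ranges follow the recommendations in this guide.
    coupling_mult = trial.suggest_float("coupling_mult", 0.03, 0.5, log=True)
    T_min = trial.suggest_float("T_min", 0.0005, 1.0, log=True)
    T_max = trial.suggest_float("T_max", 10.0, 200.0, log=True)
    return run_titanq(coupling_mult, T_min, T_max)  # lower energy is better

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```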