ASTRO is an adaptive sampling trust-region algorithm designed to solve (unconstrained) smooth stochastic optimization problems with a first-order oracle constructed using a Monte Carlo simulation or a large dataset of "scenarios." Like its derivative-free counterpart ASTRO-DF, ASTRO adaptively constructs a stochastic local model, which is then optimized and updated iteratively. Crucially, however, and unlike ASTRO-DF, the extent of sampling in ASTRO is directly proportional to the inverse squared norm of the sampled gradient at the incumbent point, leading to clearly observable gains in numerical performance. The sequence of (true) gradient norms measured at ASTRO's iterates converges almost surely to zero. We also present a work complexity result that expresses ASTRO's convergence rate in terms of the total number of oracle calls (as opposed to the total number of iterations). This appears to be one of the few results of its kind and confirms the strong numerical performance observed in practice.
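As a rough illustration of the adaptive-sampling rule described above, the Python sketch below averages noisy gradient samples until the sample size reaches a constant times the inverse squared norm of the current gradient estimate. The oracle grad_oracle, the constants c, n_min, and n_max, and the stopping test are illustrative assumptions, not ASTRO's actual parameters or sampling rule.

    import numpy as np

    def adaptive_gradient_estimate(grad_oracle, x, c=1.0, n_min=2, n_max=10_000):
        """Average noisy gradient samples until the sample size is at least
        c / ||gbar||^2, mimicking the inverse-squared-gradient-norm idea.
        (Illustrative sketch only; ASTRO's actual rule and constants differ.)"""
        samples = [grad_oracle(x) for _ in range(n_min)]
        while len(samples) < n_max:
            gbar = np.mean(samples, axis=0)
            # Sample size proportional to the inverse squared norm of the
            # estimated gradient: smaller gradients demand more averaging.
            if len(samples) >= c / max(np.dot(gbar, gbar), 1e-16):
                break
            samples.append(grad_oracle(x))
        return np.mean(samples, axis=0), len(samples)

Intuitively, near a stationary point the gradient norm is small, so the rule forces heavier sampling exactly where noise would otherwise dominate the signal.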
joint work with Coralia Cartis, Jan Fiala, Benjamin Marteau
In existing techniques for model-based derivative-free optimization, the computational cost of constructing local models and Lagrange polynomials can be high. As a result, these algorithms are less suitable for large-scale problems than derivative-based methods. In this talk, I will introduce a derivative-free method for nonlinear least-squares problems based on the exploration of random subspaces. This method has a substantially reduced computational cost (in terms of linear algebra), while still making progress using few objective evaluations.
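The Python sketch below conveys the random-subspace idea for nonlinear least squares: draw a random low-dimensional subspace, build a cheap model of the residuals restricted to it, and take a (clipped) Gauss-Newton step within it, so the linear algebra scales with the subspace dimension p rather than the ambient dimension d. The function subspace_dfo_ls_step and all parameters are assumptions for illustration, and forward differences along the subspace directions stand in for the interpolation models a model-based method would build.

    import numpy as np

    def subspace_dfo_ls_step(r, x, p=2, h=1e-6, delta=1.0):
        """One illustrative step of a random-subspace derivative-free
        Gauss-Newton method for min ||r(x)||^2 (a sketch, not the talk's
        actual algorithm). r: R^d -> R^m, p: subspace dimension."""
        d = x.size
        # Draw a random orthonormal basis Q for a p-dimensional subspace.
        Q, _ = np.linalg.qr(np.random.randn(d, p))
        r0 = r(x)
        # Estimate the reduced Jacobian J @ Q column by column using
        # forward differences along subspace directions (p+1 evaluations).
        JQ = np.column_stack([(r(x + h * Q[:, j]) - r0) / h for j in range(p)])
        # Gauss-Newton step in the subspace, crudely clipped to a trust region.
        s, *_ = np.linalg.lstsq(JQ, -r0, rcond=None)
        if np.linalg.norm(s) > delta:
            s *= delta / np.linalg.norm(s)
        return x + Q @ s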
joint work with Albert Berahas, Katya Scheinberg
To optimize a black-box function, gradient-based methods can be used in conjunction with finite-difference approximations. Alternatively, optimization can be carried out with polynomial interpolation models that approximate the black-box function. A class of random gradient-free algorithms based on Gaussian smoothing of the objective function was analyzed by Yurii Nesterov in 2011. Similar randomized algorithms have been implemented and discussed in the machine learning community in more recent years. We compare these methods in terms of the accuracy of their gradient approximations and their overall performance.
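For concreteness, here is a minimal Python comparison of the two gradient approximations mentioned above: a forward finite-difference estimate and a Nesterov-style Gaussian smoothing estimate. The test function, sample counts, and step sizes are illustrative choices, not those used in the talk.

    import numpy as np

    def fd_gradient(f, x, h=1e-6):
        """Forward finite-difference gradient estimate (d+1 evaluations)."""
        fx, d = f(x), x.size
        return np.array([(f(x + h * np.eye(d)[i]) - fx) / h for i in range(d)])

    def gaussian_smoothing_gradient(f, x, mu=1e-4, n=100, rng=None):
        """Random gradient-free estimate in the style of Nesterov (2011):
        average of (f(x + mu*u) - f(x))/mu * u over Gaussian directions u
        (n+1 evaluations)."""
        if rng is None:
            rng = np.random.default_rng()
        fx = f(x)
        g = np.zeros_like(x)
        for _ in range(n):
            u = rng.standard_normal(x.size)
            g += (f(x + mu * u) - fx) / mu * u
        return g / n

    # Quick check on a smooth quadratic f(x) = ||x||^2, true gradient 2x.
    f = lambda x: float(x @ x)
    x = np.ones(5)
    print(np.linalg.norm(fd_gradient(f, x) - 2 * x))
    print(np.linalg.norm(gaussian_smoothing_gradient(f, x) - 2 * x))

On a smooth function the coordinate-wise finite-difference estimate is typically the more accurate per evaluation, while the Gaussian-smoothing estimate trades per-direction accuracy for dimension-independent sampling; the comparison in the talk quantifies this trade-off.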