134 Pages
English

Studies in Continuous Black-box Optimization [Elektronische Ressource] / Tom Schaul. Gutachter: Hans-Joachim Bungartz ; H. Jürgen Schmidhuber. Betreuer: H. Jürgen Schmidhuber

Gain access to the library to view online
Learn more

Description

Technische Universität MünchenLehrstuhl VI: Echtzeitsysteme und RobotikStudies in ContinuousBlack-box OptimizationTom SchaulVollständiger Abdruck der von der Fakultät für Informatik der TechnischenUniversität München zur Erlangung des akademischen Grades einesDoktors der Naturwissenschaften (Dr. rer. nat.)genehmigten Dissertation.Vorsitzender: Univ.-Prof. B. Brügge, Ph.DPrüfer der Dissertation: 1. Univ.-Prof. Dr. H.-J. Bungartz2. Univ.-Prof. Dr. H. J. SchmidhuberDieDissertationwurdeam06.06.2011beiderTechnischenUniversitätMüncheneingereicht und durch die Fakultät für Informatik am 28.06.2011 angenommen.iiAbstractptimization is the research field that studies that studies the design ofO algorithms for finding the best solutions to problems we humans throwat them. While the whole domain is of important practical utility, the presentthesis will focus on the subfield of continuous black-box optimization, presentinga collection of novel, state-of-the-art algorithms for solving problems in thatclass.First, we introduce a general-purpose algorithm called Natural EvolutionStrategies (NES). In contrast to typical evolutionary algorithms which searchin the vicinity of the fittest individuals in a population, evolution strategiesaim at repeating the type of mutations that led to those individuals. We cancharacterize those mutations by a search distribution.

Subjects

Informations

Published by
Published 01 January 2011
Reads 10
Language English
Document size 2 MB

Technische Universität München
Lehrstuhl VI: Echtzeitsysteme und Robotik
Studies in Continuous
Black-box Optimization
Tom Schaul
Vollständiger Abdruck der von der Fakultät für Informatik der Technischen
Universität München zur Erlangung des akademischen Grades eines
Doktors der Naturwissenschaften (Dr. rer. nat.)
genehmigten Dissertation.
Vorsitzender: Univ.-Prof. B. Brügge, Ph.D
Prüfer der Dissertation: 1. Univ.-Prof. Dr. H.-J. Bungartz
2. Univ.-Prof. Dr. H. J. Schmidhuber
DieDissertationwurdeam06.06.2011beiderTechnischenUniversitätMünchen
eingereicht und durch die Fakultät für Informatik am 28.06.2011 angenommen.iiAbstract
ptimization is the research field that studies that studies the design of
O algorithms for finding the best solutions to problems we humans throw
at them. While the whole domain is of important practical utility, the present
thesis will focus on the subfield of continuous black-box optimization, presenting
a collection of novel, state-of-the-art algorithms for solving problems in that
class.
First, we introduce a general-purpose algorithm called Natural Evolution
Strategies (NES). In contrast to typical evolutionary algorithms which search
in the vicinity of the fittest individuals in a population, evolution strategies
aim at repeating the type of mutations that led to those individuals. We can
characterize those mutations by a search distribution. The key idea of NES is
to ascend the gradient on the parameters of that distribution towards higher
expected fitness. We show how plain gradient ascent is destined to fail, and
provide a viable alternative that instead descends along the natural gradient
to adapt the search distribution, which appropriately normalizes the update
step with respect to its uncertainty. Being derived from first principles, the
NES approach can be extended to all types of search distributions that allow a
parametric form, not just the classical multivariate Gaussian one. We derive a
numberofNESvariantsfordifferentdistributions,andshowhowtheyareuseful
on different problem classes. In addition, we rein in the computational cost,
avoidingcostly matrixinversionsthroughanincrementalchangeof coordinates.
Two additional, novel techniques, importance mixing and adaptation sampling,
allow us to automatically tune the learning rate and batch size to the problem,
and thereby further reduce the average number of required fitness evaluations.
A third technique, restart strategies, provides the algorithm with additional
robustness in the presence of multiple local optima, or noise.
Second,weintroduceanewapproachtocostly black-boxoptimization,when
fitnessevaluationsareveryexpensive. Here,we modelthe fitnessfunctionusing
state-of-the-art Gaussian process regression, and use the principle of artificial
curiosity to direct exploration towards the most informative next evaluation
candidate. Boththeexpectedfitnessimprovementandtheexpectedinformation
gaincanbederivedexplicitlyfromtheGaussianprocessmodel,andourmethod
constructsa frontof Pareto-optimalpoints accordingto these twocriteria. This
iiimakes the exploration-exploitation trade-off explicit, and permits maximally
informed candidate selection.
We empirically validate our algorithms on a broad set of benchmarks. We
also include a study of how continuous black-box optimization can solve chal-
lengingneuro-evolutionproblems,wheremulti-dimensionalrecurrentneuralnet-
works are trained to play the game of Go. At the same time, we establish to
what degree those network architectures can transfer good playing behavior
from small to large board sizes.
In summary, this dissertation presents a collection of novel algorithms, for
the general problem of continuous black-box optimization as well as a number
of special cases, each validated empirically.
ivDedicated to
Joé Schaul
In memory of
Adina Mosincat
vviTable of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Figures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
0.1 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
0.2 Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . xviii
0.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
1 Introduction: Continuous Black-box Optimization . . . . . . 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Continuous Optimization: Real-valued Solution Space . . 2
1.1.2 Black-box Optimization: Unknown Function . . . . . . . 2
1.1.3 Local Optimization: Find a Nearby Optimum . . . . . . . 3
1.1.4 Noisy Optimization: Functions Corrupted by Noise . . . . 3
1.1.5 Multi-objective Optimization . . . . . . . . . . . . . . . . 4
1.2 Evaluating Optimization Algorithms . . . . . . . . . . . . . . . . 5
1.3 State-of-the-Art Approaches . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Evolutionary Methods . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Response Surface Methods . . . . . . . . . . . . . . . . . 7
1.4 Impact and Applications . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Related Problem Domains . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Open Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Natural Evolution Strategies . . . . . . . . . . . . . . . . . . . 11
2.1 The NES Family . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Chapter Outline . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Search Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
vii2.2.1 Search Gradients for Gaussian Distributions . . . . . . . . 14
2.2.2 Limitations of Plain Search Gradients . . . . . . . . . . . 15
2.3 Using the Natural Gradient . . . . . . . . . . . . . . . . . . . . . 16
2.4 Performance and Robustness Techniques . . . . . . . . . . . . . . 19
2.4.1 Fitness Shaping . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.2 Fitness Baselines . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.3 Importance Mixing . . . . . . . . . . . . . . . . . . . . . . 22
2.4.4 Adaptation Sampling. . . . . . . . . . . . . . . . . . . . . 24
2.4.5 Restart Strategies . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Techniques for Multinormal Distributions . . . . . . . . . . . . . 28
2.5.1 Using Exponential Parameterization . . . . . . . . . . . . 28
2.5.2 Using Natural Coordinates . . . . . . . . . . . . . . . . . 29
2.5.3 OrthogonalDecompositionofMultinormalParameterSpace 31
2.5.4 Connection to CMA-ES . . . . . . . . . . . . . . . . . . . 32
2.5.5 Elitism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Use for Multi-objective NES . . . . . . . . . . . . . . . . . 37
2.6 Beyond Multinormal Distributions . . . . . . . . . . . . . . . . . 38
2.6.1 Separable NES . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6.2 Rotationally-symmetric Distributions . . . . . . . . . . . 40
Sampling from Radial Distributions . . . . . . . . . . . . 41
Computing the Fisher Information Matrix . . . . . . . . . 41
2.6.3 Heavy-tailed NES . . . . . . . . . . . . . . . . . . . . . . 42
Groups of Invariances . . . . . . . . . . . . . . . . . . . . 43
Cauchy Distribution . . . . . . . . . . . . . . . . . . . . . 44
2.7 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.7.1 Experimental Setup and Hyperparameters . . . . . . . . . 46
2.7.2 Black-box Optimization Benchmarks . . . . . . . . . . . . 47
2.7.3 Separable NES . . . . . . . . . . . . . . . . . . . . . . . . 48
Separable and Non-separable Benchmarks . . . . . . . . . 48
Neuro-evolution. . . . . . . . . . . . . . . . . . . . . . . . 53
Lennard-Jones Potentials . . . . . . . . . . . . . . . . . . 54
2.7.4 Heavy Tails and Global Optimization . . . . . . . . . . . 56
2.7.5 Results Summary . . . . . . . . . . . . . . . . . . . . . . . 57
2.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3 Curiosity-driven Optimization . . . . . . . . . . . . . . . . . . 61
3.1 Artificial Curiosity . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.1.2 Artificial Curiosity as a Guide for Optimization . . . . . . 62
3.1.3 Formal Framework . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Exploration/Exploitation Trade-off.. . . . . . . . . . . . . . . . . 63
3.3 Curiosity-driven Optimization (General Form) . . . . . . . . . . 65
3.4 Models of Expected Fitness and Information Gain . . . . . . . . 65
viii3.4.1 A Good Model Choice: Gaussian Processes . . . . . . . . 66
3.4.2 Derivation of Gaussian Process Information Gain . . . . . 66
3.5 Curiosity-driven Optimization with Gaussian Processes. . . . . . 67
3.6 Minimal Asymptotic Requirements . . . . . . . . . . . . . . . . . 68
3.6.1 Reaching Optima at Arbitrary Distance . . . . . . . . . . 68
3.6.2 Locating Optima with Arbitrary Precision . . . . . . . . . 69
3.6.3 Guaranteed to Find Global Optimum . . . . . . . . . . . 69
3.7 Proof-of-concept . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4 A Study in Scalability . . . . . . . . . . . . . . . . . . . . . . . 73
4.1 Scalability in Problem Size . . . . . . . . . . . . . . . . . . . . . 73
4.2 Example Domain: the Game of Go . . . . . . . . . . . . . . . . . 74
4.2.1 Rules of the Game . . . . . . . . . . . . . . . . . . . . . . 74
4.2.2 Challenges of Go . . . . . . . . . . . . . . . . . . . . . . . 74
4.2.3 Simpler Variants of Go . . . . . . . . . . . . . . . . . . . . 75
4.2.4 Computer Opponents . . . . . . . . . . . . . . . . . . . . 76
4.3 Scalable Neural Architectures . . . . . . . . . . . . . . . . . . . . 76
4.3.1 Multi-dimensional RNN . . . . . . . . . . . . . . . . . . . 77
4.3.2 Multi-dimensional LSTM . . . . . . . . . . . . . . . . . . 78
4.3.3 A Custom Architecture for Go . . . . . . . . . . . . . . . 78
4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 79
4.4.2 Random-weight Networks . . . . . . . . . . . . . . . . . . 80
4.4.3 Optimized Networks . . . . . . . . . . . . . . . . . . . . . 82
4.4.4 Coevolved Networks . . . . . . . . . . . . . . . . . . . . . 84
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.1 Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2 Closing Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
A Appendix: Implementations in Python . . . . . . . . . . . . . 95
B Appendix: Weighted Mann-Whitney Test . . . . . . . . . . . 97
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
ixx