Good ideas for performing architecture search in CNNs/deep learning.
Abstract: Architecture search aims to automatically find neural architectures that are competitive with architectures designed by human experts. While recent approaches have come close to matching the predictive performance of manually designed architectures for image recognition, these approaches are problematic under constrained resources for two reasons: first, the architecture search itself requires vast computational resources for most proposed methods; second, the resulting neural architectures are optimized solely for high predictive performance, without penalizing excessive resource consumption. We address the first shortcoming by proposing NASH, an architecture search method that considerably reduces the computational resources required for training novel architectures by applying network morphisms and aggressive learning rate schedules. On CIFAR-10, NASH finds architectures with errors below 4% in only 3 days. We address the second shortcoming by proposing Pareto-NASH, a multi-objective architecture search method that approximates the Pareto front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. Within 56 GPU days of architecture search, Pareto-NASH finds a model with 4M parameters and a test error of 3.5%, as well as a model with fewer than 1M parameters and a test error of 4.6%.
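The network morphisms the abstract mentions are operators that change an architecture while exactly preserving the function it computes, so the widened child network can continue training from the parent's weights instead of starting from scratch. Below is a minimal sketch of the classic "widen" morphism on a toy two-layer linear net in plain Python; the names `forward` and `widen`, and the no-activation linear setup, are illustrative assumptions, not NASH's actual implementation.

```python
def forward(weights_in, weights_out, x):
    """Toy 1-hidden-layer linear net (no activation): y = W_out @ (W_in @ x)."""
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in weights_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights_out]

def widen(weights_in, weights_out, idx):
    """Function-preserving 'widen' morphism (illustrative sketch):
    duplicate hidden unit `idx` and halve its outgoing weights, so the
    widened network computes exactly the same function as before."""
    # Copy the incoming weight row of unit `idx` to create the duplicate unit.
    new_in = [row[:] for row in weights_in] + [weights_in[idx][:]]
    new_out = []
    for row in weights_out:
        row = row[:]
        half = row[idx] / 2.0
        row[idx] = half      # original unit now contributes half ...
        row.append(half)     # ... and its duplicate contributes the other half
        new_out.append(row)
    return new_in, new_out
```

Because the output is unchanged, the search can apply such a morphism, train the child briefly with an aggressive learning rate schedule, and compare it fairly against the parent.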
Conclusion: We proposed NASH, a simple and fast method for automated architecture search based on a hill-climbing strategy, network morphisms, and training via SGDR. Experiments on CIFAR-10 showed that our method yields competitive results while requiring considerably fewer computational resources for architecture search than most alternative approaches. However, in most practical applications not only predictive performance but also resource consumption plays an important role. To address this, we proposed Pareto-NASH, a multi-objective architecture search method that employs additional operators for shrinking models and extends NASH’s hill-climbing strategy to an evolutionary algorithm. Pareto-NASH is designed to exploit the fact that evaluating the performance of a neural network is orders of magnitude more expensive than evaluating, e.g., the model’s size. Experiments on CIFAR-10 showed that Pareto-NASH is able to find models that are competitive in terms of both predictive performance and resource efficiency.
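The "Pareto front" that Pareto-NASH approximates is simply the set of candidate architectures not dominated by any other candidate when both test error and parameter count are minimized. A minimal sketch of that filtering step in plain Python, assuming each model is a `(name, error, n_params)` tuple (the function name and tuple layout are illustrative, not from the paper):

```python
def pareto_front(models):
    """Return the non-dominated subset of `models` under two objectives,
    test error and parameter count, both to be minimized.
    A model is dominated if another model is no worse on both objectives
    and strictly better on at least one."""
    front = []
    for name, err, size in models:
        dominated = any(
            e2 <= err and s2 <= size and (e2 < err or s2 < size)
            for _, e2, s2 in models
        )
        if not dominated:
            front.append((name, err, size))
    return front
```

With illustrative candidates, e.g. a 4M-parameter model at 3.5% error, a 1M-parameter model at 4.6% error, and a 5M-parameter model at 5.0% error, the third is dominated by the first and only the first two survive. The cheap part of this computation (comparing sizes) is exactly what Pareto-NASH exploits: dominated candidates can be discarded before paying for expensive performance evaluations.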