Learning Scalable Deep Kernels with Recurrent Structure

Abstract: Many applications in speech, robotics, finance, and biology deal with sequential data, where ordering matters and recurrent structures are common. However, this structure cannot be easily captured by standard kernel functions. To model such structure, we propose expressive closed-form kernel functions for Gaussian processes. The resulting model, GP-LSTM, fully encapsulates the inductive biases of long short-term memory (LSTM) recurrent networks, while retaining the non-parametric probabilistic advantages of Gaussian processes. We learn the properties of the proposed kernels by optimizing the Gaussian process marginal likelihood using a new provably convergent semi-stochastic gradient procedure, and exploit the structure of these kernels for scalable training and prediction. This approach provides a practical representation for Bayesian LSTMs. We demonstrate state-of-the-art performance on several benchmarks, and thoroughly investigate a consequential autonomous driving application, where the predictive uncertainties provided by GP- LSTM are uniquely valuable.
Discussion: We proposed a method for learning kernels with recurrent long short-term memory structure on sequences. Gaussian processes with such kernels, termed the GP-LSTM, have the structure and learning biases of LSTMs, while retaining a probabilistic Bayesian nonparametric representation. The GP-LSTM outperforms a range of alternatives on several sequence-toreals regression tasks. The GP-LSTM also works on data with low and high signal-to-noise ratios, and can be scaled to very large datasets, all with a straightforward, practical, and generally applicable model specification. Moreover, the semi-stochastic scheme proposed in our paper is provably convergent and efficient in practical settings, in conjunction with structure exploiting algebra. In short, the GP-LSTM provides a natural mechanism for Bayesian LSTMs, quantifying predictive uncertainty while harmonizing with the standard deep learning toolbox. Predictive uncertainty is of high value in robotics applications, such as autonomous driving, and could also be applied to other areas such as financial modeling and computational biology.
Learning Scalable Deep Kernels with Recurrent Structure

Nested LSTMs

By Joel Ruben Antony Moniz, David Krueger

We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as coutert=ftct1+itgt, NLSTM memory cells use the concatenation (ftct1,itgt) as input to an inner LSTM (or NLSTM) memory cell, and set coutert = hinnert. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer term dependencies compared with the higher-level units of a stacked LSTM.

 

Full implementation in Github.

Nested LSTMs