DeepMind papers at ICLR 2018

Maximum a posteriori policy optimisation

Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Yuval Tassa, Remi Munos

We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence and robustness to hyperparameter settings.
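
For intuition, the E-step at the heart of an MPO-style update can be sketched in a few lines of Python (a minimal sketch with hypothetical names; the full algorithm also optimises the temperature eta via a dual function, which we omit here):

    import numpy as np

    def mpo_e_step_weights(q_values, eta):
        # E-step of an MPO-style update: weight each sampled action in
        # proportion to exp(Q(s, a) / eta). The temperature eta controls
        # how far the re-weighted distribution may move from the current
        # policy; MPO tunes it via a dual, here it is a fixed constant.
        logits = q_values / eta
        logits = logits - logits.max()    # subtract max for numerical stability
        weights = np.exp(logits)
        return weights / weights.sum()    # normalised weights for the M-step

The M-step then fits the parametric policy to these weighted action samples by maximum likelihood, subject to a KL constraint on how much the policy may change.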

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 11:00 to 13:00

Hierarchical representations for efficient architecture search

Authors: Hanxiao Liu (CMU), Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, Koray Kavukcuoglu

We explore efficient neural architecture search methods and show that a simple yet powerful evolutionary algorithm can discover new architectures with excellent performance. Our approach combines a novel hierarchical genetic representation scheme that imitates the modularized design pattern commonly adopted by human experts, and an expressive search space that supports complex topologies. Our algorithm efficiently discovers architectures that outperform a large number of manually designed models for image classification, obtaining top-1 error of 3.6% on CIFAR-10 and 20.3% when transferred to ImageNet, which is competitive with the best existing neural architecture search approaches. We also present results using random search, achieving 0.3% less top-1 accuracy on CIFAR-10 and 0.1% less on ImageNet whilst reducing the search time from 36 hours down to 1 hour.
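
As a rough illustration of the search loop (not the paper's hierarchical representation itself), a tournament-selection evolutionary search over architecture genotypes might look like this, where fitness_fn and mutate_fn are hypothetical placeholders:

    import random

    def evolve(population, fitness_fn, mutate_fn, steps, tournament_size=5):
        # Generic evolutionary architecture search: repeatedly sample a
        # small tournament, copy-and-mutate its fittest genotype, and add
        # the child back to the population.
        scored = [(fitness_fn(g), g) for g in population]
        for _ in range(steps):
            tournament = random.sample(scored, tournament_size)
            _, parent = max(tournament, key=lambda t: t[0])
            child = mutate_fn(parent)
            scored.append((fitness_fn(child), child))
        return max(scored, key=lambda t: t[0])[1]    # best genotype found

The paper's contribution lies in what the genotype encodes: a hierarchy of motifs in which small graphs are composed into larger ones, so that simple mutations can produce complex, well-structured topologies.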

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 11:00 to 13:00

Learning an embedding space for transferable robot skills

Authors: Karol Hausman, Jost Tobias Springenberg, Ziyu Wang, Nicolas Heess, Martin Riedmiller

We present a method for reinforcement learning of closely related skills that are parameterized via a skill embedding space. We learn such skills by taking advantage of latent variables and exploiting a connection between reinforcement learning and variational inference.

The main contribution of our work is an entropy-regularized policy gradient formulation for hierarchical policies, and an associated, data-efficient and robust off-policy gradient algorithm based on stochastic value gradients. We demonstrate the effectiveness of our method on several simulated robotic manipulation tasks.
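
The structural core of the approach, a single policy conditioned on a latent skill embedding z, can be sketched as follows (a PyTorch sketch with hypothetical names; the paper additionally learns the embedding distribution and an inference network under entropy regularisation):

    import torch
    import torch.nn as nn

    class SkillConditionedPolicy(nn.Module):
        # A policy pi(a | s, z): actions depend on both the state and a
        # latent skill embedding z, so one set of weights represents a
        # whole family of related skills indexed by the embedding space.
        def __init__(self, state_dim, latent_dim, action_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + latent_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, action_dim))

        def forward(self, state, z):
            return self.net(torch.cat([state, z], dim=-1))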

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 11:00 to 13:00

Learning awareness models

Authors: Brandon Amos, Laurent Dinh, Serkan Cabi, Thomas Rothörl, Sergio Gómez Colmenarejo, Alistair M Muldal, Tom Erez, Yuval Tassa, Nando de Freitas, Misha Denil

We show that models trained to predict proprioceptive information about an agent’s body come to represent objects in the external world. The models are able to successfully predict sensor readings over 100 steps into the future and continue to represent the shape of external objects even after contact is lost. We show that active data collection by maximizing uncertainty over future sensor readings leads to models that show superior performance when used for control. We also collect data from a real robotic hand and show that the same models can be used to answer questions about the properties of objects in the real world.
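
A minimal sketch of such a predictive model (PyTorch; the architecture and names are illustrative assumptions, not the paper's): a recurrent network predicts the next sensor reading from the current reading and action, and multi-step prediction feeds its own outputs back in:

    import torch
    import torch.nn as nn

    class ProprioceptivePredictor(nn.Module):
        # Predicts the next proprioceptive sensor reading from the current
        # reading and the applied action.
        def __init__(self, sensor_dim, action_dim, hidden=128):
            super().__init__()
            self.cell = nn.GRUCell(sensor_dim + action_dim, hidden)
            self.readout = nn.Linear(hidden, sensor_dim)

        def rollout(self, sensor, actions, h):
            # Roll the model forward by feeding its own predictions back
            # in, e.g. for 100 future steps.
            preds = []
            for a in actions:
                h = self.cell(torch.cat([sensor, a], dim=-1), h)
                sensor = self.readout(h)
                preds.append(sensor)
            return preds, h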

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 11:00 to 13:00

Kronecker-factored curvature approximations for recurrent neural networks

Authors: James Martens, Jimmy Ba (Vector Institute), Matthew Johnson (Google)

Kronecker-factored Approximate Curvature (K-FAC; Martens & Grosse, 2015) is a second-order optimization method which has been shown to give state-of-the-art performance on large-scale neural network optimization tasks (Ba et al., 2017). It is based on an approximation to the Fisher information matrix (FIM) that makes assumptions about the particular structure of the network and the way it is parameterized. The original K-FAC method was applicable only to fully-connected networks, although it was recently extended by Grosse & Martens (2016) to handle convolutional networks as well. In this work we extend the method to handle RNNs by introducing a novel approximation to the FIM for RNNs. This approximation works by modelling the covariance structure between the gradient contributions at different time-steps using a chain-structured linear Gaussian graphical model, summing the various cross-covariances, and computing the inverse in closed form. We demonstrate in experiments that our method significantly outperforms state-of-the-art general-purpose optimizers such as SGD with momentum and Adam on several challenging RNN training tasks.
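
For reference, the classic fully-connected K-FAC block update that this work generalises looks roughly as follows (a NumPy sketch; the RNN extension additionally models the covariances between gradient contributions at different time-steps, which this simple version ignores):

    import numpy as np

    def kfac_precondition(dW, a, g, damping=1e-3):
        # The Fisher block for one layer is approximated as the Kronecker
        # product A (x) G, with A = E[a a^T] over the layer's inputs and
        # G = E[g g^T] over the gradients w.r.t. its outputs. The Kronecker
        # structure lets us invert the block by inverting two small factors.
        # a: (batch, in_dim), g: (batch, out_dim), dW: (out_dim, in_dim).
        A = a.T @ a / a.shape[0] + damping * np.eye(a.shape[1])
        G = g.T @ g / g.shape[0] + damping * np.eye(g.shape[1])
        return np.linalg.solve(G, dW) @ np.linalg.inv(A)    # G^-1 dW A^-1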

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 11:00 to 13:00

Distributed distributional deterministic policy gradients

Authors: Gabriel Barth-Maron, Matthew Hoffman, David Budden, Will Dabney, Daniel Horgan, Dhruva Tirumala Bukkapatnam, Alistair M Muldal, Nicolas Heess, Timothy Lillicrap

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements, such as the use of N-step returns and prioritized experience replay. Experimentally we examine the contribution of each of these components individually and in combination, and show how they interact. Our results show that, across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks, the D4PG algorithm achieves state-of-the-art performance.
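
One of those simple components, the N-step return used in the critic target, is easy to sketch (plain Python; in D4PG the bootstrap term would itself be a return distribution rather than a scalar value):

    def n_step_return(rewards, discount, bootstrap_value):
        # Accumulate N observed rewards plus a discounted bootstrap from
        # the value estimate at the state N steps ahead.
        g = bootstrap_value
        for r in reversed(rewards):
            g = r + discount * g
        return g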

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 16:30 to 18:30

The Kanerva Machine: A generative distributed memory

Authors: Yan Wu, Greg Wayne, Alex Graves, Timothy Lillicrap

We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation.
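
The analytic tractability comes from the memory being linear-Gaussian, so a write reduces to a closed-form, Kalman-style posterior update. A NumPy sketch of that flavour of update (an illustration under simplifying assumptions, not the paper's exact parameterisation):

    import numpy as np

    def bayesian_write(mean, row_cov, w, z, noise_var=1.0):
        # mean: (K, D) memory matrix; row_cov: (K, K) posterior uncertainty
        # over its rows. Observing a code z addressed by weights w, with
        # the generative assumption z ~ w^T M + noise, updates the memory
        # posterior in closed form.
        s = row_cov @ w
        gain = s / (w @ s + noise_var)               # Kalman-style gain
        mean = mean + np.outer(gain, z - w @ mean)   # move towards the residual
        row_cov = row_cov - np.outer(gain, s)        # shrink the uncertainty
        return mean, row_cov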

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 16:30 to 18:30

Memory-based parameter adaptation

Authors: Pablo Sprechmann, Siddhant Jayakumar, Jack Rae, Alexander Pritzel, Adria P Badia, Benigno Uria, Oriol Vinyals, Demis Hassabis, Razvan Pascanu, Charles Blundell

Humans and animals are able to incorporate new knowledge quickly, from just a few examples, and continually throughout much of their lifetime. In contrast, neural network-based models rely on the data distribution being stationary and on a gradual training procedure to obtain good generalisation. Drawing inspiration from the theory of complementary learning systems, we propose Memory-based Parameter Adaptation (MbPA), a method for augmenting neural networks with an episodic memory to allow rapid acquisition of new knowledge while preserving the high performance and good generalisation of standard deep models. MbPA stores examples in memory and then uses a context-based lookup to directly modify the weights of a neural network. This alleviates several shortcomings of neural networks, such as catastrophic forgetting, and enables fast, stable acquisition of new knowledge and fast learning during evaluation.
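
The lookup step can be sketched in a few lines (NumPy, with hypothetical names). Given a memory of (embedding, target) pairs, an MbPA-style model retrieves the nearest neighbours of the current query and then takes a few gradient steps on just those examples before predicting:

    import numpy as np

    def episodic_lookup(keys, values, query, k=5):
        # Context-based retrieval: return the k stored examples whose key
        # embeddings lie closest to the query embedding. These neighbours
        # then drive a handful of local gradient steps that temporarily
        # adapt the network's weights.
        dists = np.linalg.norm(keys - query, axis=1)
        idx = np.argsort(dists)[:k]
        return keys[idx], values[idx]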

  • Read the paper
  • Check out the poster at East Meeting level 1,2,3 from 16:30 to 18:30

Source: https://deepmind.com/blog/article/deepmind-papers-iclr-2018
