
Reinforcement Learning with Feedback Graphs. (arXiv:2005.03789v1 [cs.LG])

[Submitted on 7 May 2020]


Abstract: We study episodic reinforcement learning in Markov decision processes when
the agent receives additional feedback per step in the form of several
transition observations. Such additional observations are available in a range
of tasks through extended sensors or prior knowledge about the environment
(e.g., when certain actions yield similar outcomes). We formalize this setting
using a feedback graph over state-action pairs and show that model-based
algorithms can leverage the additional feedback for more sample-efficient
learning. We give a regret bound that, ignoring logarithmic factors and
lower-order terms, depends only on the size of the maximum acyclic subgraph of
the feedback graph, in contrast with a polynomial dependency on the number of
states and actions in the absence of a feedback graph. Finally, we highlight
challenges when leveraging a small dominating set of the feedback graph as
compared to the bandit setting and propose a new algorithm that can use
knowledge of such a dominating set for more sample-efficient learning of a
near-optimal policy.
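
To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of how a feedback graph over state-action pairs can yield several transition observations per step: playing one state-action pair also reveals samples for its out-neighbors in the graph. The class and function names, the toy transition kernel, and the example edges are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of side observations via a feedback graph over
# state-action pairs. Names and the toy MDP are hypothetical.
from collections import defaultdict
import random


class FeedbackGraph:
    """Directed graph over state-action pairs: an edge (x, a) -> (x', a')
    means that playing action a in state x also reveals a transition
    sample for the pair (x', a')."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, src_pair, dst_pair):
        self.edges[src_pair].add(dst_pair)

    def observed_pairs(self, pair):
        # The pair actually played is always observed (self-loop),
        # plus all of its out-neighbors in the feedback graph.
        return {pair} | self.edges[pair]


def simulate_step(state, action, rng):
    # Toy stochastic transition kernel, for illustration only.
    noise = 1 if rng.random() < 0.1 else 0
    next_state = (state + action + noise) % 3
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward


def collect_observations(state, action, graph, rng):
    """Return a dict {(s, a): (s', r)} of all transition samples revealed
    by playing `action` in `state`, as dictated by the feedback graph."""
    samples = {}
    for (s, a) in graph.observed_pairs((state, action)):
        samples[(s, a)] = simulate_step(s, a, rng)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    g = FeedbackGraph()
    # Prior knowledge: actions 0 and 1 in state 2 have similar outcomes,
    # so observing one also reveals a sample for the other.
    g.add_edge((2, 0), (2, 1))
    g.add_edge((2, 1), (2, 0))
    print(collect_observations(2, 0, g, rng))
```

A model-based learner would feed every revealed sample into its empirical transition and reward estimates, which is why, informally, the regret can scale with the size of the maximum acyclic subgraph of the feedback graph rather than with the full number of state-action pairs.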

Submission history

From: Christoph Dann
[v1]
Thu, 7 May 2020 22:35:37 UTC (388 KB)

Source: http://arxiv.org/abs/2005.03789
