Adaptive Discretization for Decision Making in Large Continuous Spaces

Abstract: In this talk, I will present two works that explore adaptive discretization for decision making in continuous state and action spaces. In the first work, I will present a Q-learning policy with adaptive, data-driven discretization for episodic RL on continuous state-action spaces. It recovers the regret guarantees of prior algorithms for continuous state-action spaces, and experiments demonstrate how it automatically adapts to the underlying structure of the problem, resulting in better performance than both heuristics and Q-learning with uniform discretization.

In the second work, we consider the question of learning the metric amongst actions when it is unknown, for a nonparametric contextual multi-armed bandit problem. Suppose that there is a large set of actions, yet there is a simple but unknown structure amongst them, e.g. finitely many types, or smoothness with respect to an unknown metric. We present an algorithm that learns data-driven similarities amongst the arms in order to adaptively partition the context-arm space for more efficient learning. We will discuss regret bounds and show simulations that illustrate the potential gains from learning data-driven similarities.
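To make the first idea concrete, here is a minimal toy sketch of Q-learning over an adaptively refined partition of a 1-D state and 1-D action space. All names, the quadtree-style splitting, and the split threshold (`n >= 2 / diameter`) are invented for illustration and are not the algorithm or bounds from the talk; they only convey the mechanism of concentrating resolution where the data falls.

```python
class Cell:
    """A hypercube region of the joint state-action space [0,1]^2,
    carrying its own Q-value estimate and visit count."""
    def __init__(self, lo, hi, q_init):
        self.lo, self.hi = lo, hi      # (s_lo, a_lo), (s_hi, a_hi)
        self.q = q_init                # Q estimate for the whole cell
        self.n = 0                     # number of visits to this cell
        self.children = []             # filled in once the cell is split

    def diameter(self):
        return max(h - l for l, h in zip(self.lo, self.hi))

    def contains(self, s, a):
        return all(l <= x <= h for x, l, h in zip((s, a), self.lo, self.hi))

    def split(self):
        """Refine the cell into 4 equal sub-cells (halve each dimension)."""
        (sl, al), (sh, ah) = self.lo, self.hi
        sm, am = (sl + sh) / 2, (al + ah) / 2
        self.children = [
            Cell((lo0, lo1), (hi0, hi1), self.q)
            for lo0, hi0 in ((sl, sm), (sm, sh))
            for lo1, hi1 in ((al, am), (am, ah))
        ]


class AdaptiveQ:
    """Q-learning on an adaptive partition: a cell is split once its
    visit count exceeds a threshold tied to its diameter, so resolution
    concentrates in the regions the policy actually visits."""
    def __init__(self, q_init=1.0, lr=0.5, gamma=0.9):
        self.root = Cell((0.0, 0.0), (1.0, 1.0), q_init)
        self.lr, self.gamma = lr, gamma

    def _leaf(self, s, a):
        cell = self.root
        while cell.children:
            cell = next(c for c in cell.children if c.contains(s, a))
        return cell

    def select_action(self, s, n_candidates=20):
        """Greedy action over a grid of candidates, scored by leaf Q-values."""
        cands = [i / (n_candidates - 1) for i in range(n_candidates)]
        return max(cands, key=lambda a: self._leaf(s, a).q)

    def update(self, s, a, r, s_next):
        cell = self._leaf(s, a)
        cell.n += 1
        target = r + self.gamma * self._leaf(s_next, self.select_action(s_next)).q
        cell.q += self.lr * (target - cell.q)
        # Split once visits outgrow the cell's resolution; `2 / diameter`
        # is an illustrative rule, not the paper's exact threshold.
        if not cell.children and cell.n >= 2.0 / cell.diameter():
            cell.split()
```

Uniform discretization would fix the grid in advance; here the partition near frequently visited, high-reward state-action regions becomes progressively finer while the rest of the space stays coarse.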
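The second idea — learning similarities amongst arms and sharing data within groups of similar arms — can likewise be sketched in a toy form. Everything below (uniform context bins, the greedy profile-matching merge rule, the `merge_tol` parameter, UCB over groups) is an invented illustration, not the algorithm or analysis from the talk.

```python
import math


class SimilarityBandit:
    """Toy contextual bandit that learns arm similarities: arms whose
    empirical mean-reward profiles (across context bins) stay within a
    tolerance are grouped, and statistics are pooled inside a group."""
    def __init__(self, n_arms, n_context_bins=4, merge_tol=0.2):
        self.n_arms = n_arms
        self.bins = n_context_bins
        self.tol = merge_tol
        # per (context bin, arm): reward sums and pull counts
        self.sums = [[0.0] * n_arms for _ in range(n_context_bins)]
        self.cnts = [[0] * n_arms for _ in range(n_context_bins)]
        self.groups = [[a] for a in range(n_arms)]  # start: every arm alone

    def _bin(self, x):
        return min(int(x * self.bins), self.bins - 1)

    def _mean(self, b, arm):
        return self.sums[b][arm] / max(self.cnts[b][arm], 1)

    def _group_stats(self, b, group):
        s = sum(self.sums[b][a] for a in group)
        n = sum(self.cnts[b][a] for a in group)
        return s / max(n, 1), n

    def select(self, x, t):
        """UCB over groups; play the least-pulled arm in the winning group."""
        b = self._bin(x)
        def ucb(group):
            mean, n = self._group_stats(b, group)
            return mean + math.sqrt(2 * math.log(t + 1) / max(n, 1))
        best = max(self.groups, key=ucb)
        return min(best, key=lambda a: self.cnts[b][a])

    def update(self, x, arm, reward):
        b = self._bin(x)
        self.sums[b][arm] += reward
        self.cnts[b][arm] += 1
        self._regroup()

    def _regroup(self):
        """Greedy re-clustering: an arm joins the first group whose
        representative's mean profile it matches within merge_tol."""
        profiles = [[self._mean(b, a) for b in range(self.bins)]
                    for a in range(self.n_arms)]
        groups = []
        for a in range(self.n_arms):
            for g in groups:
                rep = g[0]
                if all(abs(profiles[a][b] - profiles[rep][b]) <= self.tol
                       for b in range(self.bins)):
                    g.append(a)
                    break
            else:
                groups.append([a])
        self.groups = groups
```

The payoff of grouping is that exploration cost scales with the number of learned groups (e.g. the number of arm types) rather than with the raw number of arms, which is the kind of gain the simulations in the talk illustrate.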