Multi-armed bandit problems with history dependent rewards
Date: 11 April 2022, 14:30
Location: Room 5016 (Lab LM).
Speaker: Ciara Pike-Burke (Imperial College London)
Contact person: Nicolò Cesa-Bianchi
The multi-armed bandit problem is a common sequential decision-making framework in which, at each time step, a player selects an action and receives a reward for that action. The aim is to select actions so as to maximize the total reward. It is commonly assumed that the (expected) reward of each action is unknown but constant, and does not depend on the actions the player has previously taken. In many practical settings, however, this is not realistic. For example, in web advertising, the benefit of showing an advert is likely to depend on the number of times the user has seen it in the past, and in product recommendation the reward of suggesting an item will depend on the time since it was last suggested to the customer. In this talk we will consider several variants of the multi-armed bandit problem where the reward depends on the history of the player's actions. For each problem, we will discuss whether learning is possible and, if so, provide algorithms that perform well theoretically.
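The talk concerns history-dependent variants, but the standard stationary setting described above can be sketched with the classical UCB1 strategy. The snippet below is a minimal illustrative implementation (arm probabilities, horizon, and function names are hypothetical, not from the talk):

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Minimal UCB1 sketch on a stationary Bernoulli bandit.

    `means` are the success probabilities, unknown to the player;
    the algorithm observes only the sampled rewards.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # times each arm has been pulled
    totals = [0.0] * k    # cumulative reward per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise estimates
        else:
            # choose the arm maximising empirical mean + exploration bonus
            arm = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += r
        total_reward += r
    return total_reward, counts

total, counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

In the history-dependent settings discussed in the talk, this constant-mean assumption breaks down: the exploration bonus above is justified only because each arm's expected reward does not change with the player's past pulls.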
Ciara Pike-Burke is a lecturer in Statistics at Imperial College London. Her research is in statistical machine learning, specialising in sequential decision making problems. She is interested in decision making under uncertainty and potentially limited feedback, and has worked on multi-armed bandit, reinforcement learning and online learning problems. Prior to joining Imperial, Ciara was a postdoc at Universitat Pompeu Fabra (Barcelona) working with Gabor Lugosi and Gergely Neu. She obtained her PhD at Lancaster University as part of the STOR-i program, supervised by Steffen Grünewälder and Anton Altmann at Sparx.
04 April 2022