Presented at the 2017 Multi-Disciplinary Conference on Reinforcement Learning and Decision Making