SLMC //student projects: Controlling Exploration//

Project Leader
Philipp Robbel
MSc, School of Informatics, University of Edinburgh
Project Supervisor
Sethu Vijayakumar, PhD
IPAB, School of Informatics, University of Edinburgh

Project Title
Exploring the exploration problem


Project Goal

The control of a manipulator (such as a robot arm) in a non-deterministic environment must account for factors such as the device's dynamics, inertia, and time lags in the control signal flow. To learn the sensorimotor laws that move the device along a desired trajectory, data must be collected from exploratory movements in the environment. These exploratory movements have to be generated before the precise sensorimotor laws are known, and they should cover the state space of the device reasonably well.

This project addresses the problem of how these exploratory movements can be generated. Going beyond simple exploration mechanisms such as random drifting or oscillating motor signals, the goal of this project is to find theoretically grounded techniques that explore the state space in a way that is optimal from the learning point of view, i.e. so that the data collected in this way leads to the most efficient learning.
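One family of such theoretically grounded techniques is optimal data selection from statistical learning (active learning, in the spirit of the Cohn et al. and MacKay references below): instead of sampling at random, query the input the current model is least certain about. The sketch below illustrates this for a simple linear model, where predictive uncertainty at a candidate input x is proportional to x^T (X^T X)^{-1} x. The target function, candidate set, and all names here are illustrative assumptions, not part of the project proposal.

```python
import numpy as np

# Illustrative ground truth: a noisy linear target y = w^T x + noise.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])

def observe(x):
    return x @ w_true + 0.1 * rng.standard_normal()

# Pool of candidate query points; seed the data set with two of them.
candidates = rng.uniform(-1, 1, size=(200, 2))
X = candidates[:2].copy()
y = np.array([observe(x) for x in X])

for _ in range(20):
    # Predictive variance at each candidate (up to the noise scale):
    # x^T (X^T X + eps I)^{-1} x; eps regularises the early inversions.
    S_inv = np.linalg.inv(X.T @ X + 1e-6 * np.eye(2))
    var = np.einsum('ij,jk,ik->i', candidates, S_inv, candidates)
    # Greedily query the least-constrained candidate and add the observation.
    x_new = candidates[np.argmax(var)]
    X = np.vstack([X, x_new])
    y = np.append(y, observe(x_new))

# Fit the model to the actively collected data.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The greedy variance criterion tends to pick inputs near the extremes of the candidate pool, which is exactly the "reasonable coverage of the state space" that random drifting does not guarantee.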

Firstly, approaches to closely related problems in other areas will be explored. Among these are techniques for optimal data selection in statistical learning (active learning), model-based reinforcement learning approaches with standard behaviour planning techniques (dynamic programming), as well as guided and imitation learning for motor control. Secondly, the question could also be addressed analytically in a simple dynamic environment, e.g. a 2D variable with linear dynamics, assuming noisy data.
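The analytical setting mentioned above can also be simulated directly. The following sketch assumes a toy 2D system with linear dynamics x' = A x + B u plus Gaussian noise, explores it with random control signals, and recovers the dynamics matrices by least squares; the specific matrices and noise level are illustrative assumptions.

```python
import numpy as np

# Hypothetical 2D linear system with noisy dynamics: x' = A x + B u + noise.
# A, B, and the noise level are illustrative, not taken from the proposal.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [-0.1, 0.9]])
B = np.array([[0.5, 0.0], [0.0, 0.5]])
noise_std = 0.05

def step(x, u):
    return A @ x + B @ u + noise_std * rng.standard_normal(2)

# Simple exploration: random control signals, recording all transitions.
T = 500
xs, us, xs_next = [], [], []
x = np.zeros(2)
for _ in range(T):
    u = rng.uniform(-1, 1, 2)
    x_next = step(x, u)
    xs.append(x); us.append(u); xs_next.append(x_next)
    x = x_next

# Least-squares estimate of the stacked dynamics matrix [A B]:
# x'^T = [x^T u^T] theta, so theta = [A B]^T.
Z = np.hstack([np.array(xs), np.array(us)])   # (T, 4) regressors
Y = np.array(xs_next)                         # (T, 2) next states
theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
A_hat, B_hat = theta[:2].T, theta[2:].T
```

A setup like this makes the exploration question concrete: one can compare how quickly the estimates A_hat and B_hat converge under random controls versus under an uncertainty-driven exploration strategy.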

References:
  • Cohn, D. A., Ghahramani, Z., and Jordan, M. I. (1995). Active learning with statistical models. In Tesauro, G., Touretzky, D., and Alspector, J. (eds.), Advances in Neural Information Processing Systems, volume 7. Morgan Kaufmann.
  • MacKay, D. J. C. (1992). Information-based objective functions for active data selection. Neural Computation, 4.
  • Thrun, S. B. and Möller, K. (1991). On planning and exploration in non-discrete environments. Technical Report 528, GMD, Sankt Augustin, FRG.
  • Storck, J., Hochreiter, S., and Schmidhuber, J. (1995). Reinforcement driven information acquisition in non-deterministic environments. In Proceedings of the International Conference on Artificial Neural Networks (ICANN'95).
This description is based on an MSc project proposal by Marc Toussaint (6 October 2004).

Project Timeline

Time Frame            Task (completed or scheduled)
Oct 2004 – Dec 2004   Background reading and exploration of the relevant fields; construction of a literature review.
3 Dec 2004            Submission of the research report.
4 Feb 2005            Submission of a full research proposal.
Feb 2005 – Aug 2005   Pursuit of the project and writing of the MSc dissertation.

This preliminary timeline will be updated with more detail as the research report and proposal are completed.

Project Results & Conclusions

Thesis