Development of goal-oriented behavior in self-learning robots

Martius, Georg; Herrmann, J Michael

doi:10.1186/1471-2202-12-S1-P149

Volume 12 Supplement 1

Twentieth Annual Computational Neuroscience Meeting: CNS*2011

Poster presentation
Open access
Published: 18 July 2011

Georg Martius¹ & J Michael Herrmann²,³

BMC Neuroscience volume 12, Article number: P149 (2011)


The homeokinetic principle [1] describes a mechanism for the self-organisation of behaviour in early development. It implies a self-tuned balance between the sensitivity of motor actions with respect to sensory inputs and the predictability of the perceptual consequences of actions. The principle gives rise to a synaptic plasticity rule for artificial motor neurons, which has been shown to generate coherent and coordinated movements in autonomous robots [2] and which can be interpreted as a model of early behavioural learning. Learning in this sense consists in the construction of a behavioural manifold, which must, however, remain modifiable in order to incorporate goal-related information or rewards in the course of further development. Goal-related optimisation that shapes, rather than replaces, ongoing exploration is referred to as guided self-organisation and is the subject of the present paper.

We present three strategies for guided self-organisation: using rewards, teaching signals, or assumptions about the symmetries of the desired behaviour. The strategies are analysed for several different robots (a Khepera-like robot, a spherical robot, a snake, and a multi-segment chain robot) in a physically realistic simulation [2].

Guidance by reward

An online reward signal can act as a global modulation of the learning speed and can thereby be used to bias the exploratory behaviour. In a spherical robot, rewarding high forward velocity leads to fast locomotion, while rewarding high rotational velocity leads to curved driving and spinning. In a more challenging example, a snake with 20 degrees of freedom develops various crawling behaviours when rewarded for high velocity.
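As a minimal sketch of what "global modulation of the learning speed" could look like, the snippet below scales a base learning rate by a reward-dependent factor before a parameter update is applied. The function names, the `tanh` squashing, the `gain` parameter, and the choice that high reward slows learning (so that a rewarded behaviour persists longer) are all illustrative assumptions; the abstract does not specify the modulation function.

```python
import math


def reward_modulated_rate(base_rate, reward, gain=1.0):
    """Scale the learning rate by a factor in (0, 2) that falls as reward rises.

    Assumption: high reward -> slower learning, low or negative reward ->
    faster learning. The abstract only states that reward modulates the
    learning speed globally, not in which functional form.
    """
    factor = 1.0 - math.tanh(gain * reward)
    return base_rate * factor


def apply_update(params, update, rate):
    """Apply one update step, using the same (global) rate for every parameter."""
    return [p + rate * u for p, u in zip(params, update)]


# Usage: the self-organising update direction is left untouched;
# only its step size depends on the current reward.
rate = reward_modulated_rate(base_rate=0.1, reward=2.0)
new_params = apply_update([0.0, 0.0], [1.0, -1.0], rate)
```

Because only the step size changes, the exploratory dynamics keeps its direction and is merely biased towards staying longer in rewarded regimes.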

Guidance by teaching

If target values for motors or sensors are given, a natural gradient approach is found to be optimal, even when embedded in a dynamical system. In experiments with a spherical robot, revolving behaviour about different axes is achieved by a perceptual teaching scheme based on a single sensor at a time.
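To make the idea of a motor teaching signal concrete, the following sketch performs one gradient step that pulls the motor outputs towards given target values. It is a simplified stand-in under stated assumptions: the controller form y = tanh(Cx + h), the names `controller` and `teaching_step`, and the learning rate are hypothetical, and the step uses a plain gradient on the squared teaching error rather than the natural gradient approach mentioned in the text.

```python
import math


def controller(C, h, x):
    """Assumed controller: y_i = tanh(sum_j C[i][j] * x[j] + h[i])."""
    return [
        math.tanh(sum(c * xj for c, xj in zip(row, x)) + hi)
        for row, hi in zip(C, h)
    ]


def teaching_error(y, y_target):
    """Squared distance between motor values and their teaching targets."""
    return 0.5 * sum((t - yi) ** 2 for yi, t in zip(y, y_target))


def teaching_step(C, h, x, y_target, rate=0.1):
    """One plain gradient-descent step on the teaching error."""
    y = controller(C, h, x)
    new_C, new_h = [], []
    for row, hi, yi, ti in zip(C, h, y, y_target):
        delta = (ti - yi) * (1.0 - yi * yi)  # error times tanh derivative
        new_C.append([c + rate * delta * xj for c, xj in zip(row, x)])
        new_h.append(hi + rate * delta)
    return new_C, new_h


# Usage: two sensors, two motors, starting from a silent controller.
x = [1.0, 0.5]
target = [0.5, -0.5]
C, h = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
before = teaching_error(controller(C, h, x), target)
C, h = teaching_step(C, h, x, target)
after = teaching_error(controller(C, h, x), target)
```

In a guided setting such a step would presumably be added with a small weight to the self-organising update, so that the teaching signal shapes the exploration instead of replacing it.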

Guidance by cross-motor teaching

Internal teaching signals are generated by exploiting symmetries in the motor patterns of a desired behaviour and are realised as mutual teaching between the motor units. This self-supervised scheme induces soft constraints that reduce the effective dimension of the dynamics and thus guide the self-organisation process into a sub-space of the control problem. The effectiveness of the method is demonstrated using a multi-segment chain robot which develops locomotion within a very short time. The direction of locomotion can be inverted by changing the mutual teaching scheme.
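The mutual teaching described above can be sketched as a mapping that assigns each motor another motor's current value as its target. The specific pairing used here (motor i is taught by motor i + shift, with wrap-around) and the names `chain_scheme`, `cross_motor_targets`, and `mutual_teaching_error` are hypothetical illustrations; the abstract does not state which motors teach which.

```python
def chain_scheme(n, shift=1):
    """Hypothetical scheme: motor i is taught by motor (i + shift) mod n."""
    return [(i + shift) % n for i in range(n)]


def cross_motor_targets(y, teacher_of):
    """Teaching target for motor i is the current value of motor teacher_of[i]."""
    return [y[j] for j in teacher_of]


def mutual_teaching_error(y, teacher_of):
    """Soft-constraint penalty: zero when every motor matches its teacher."""
    targets = cross_motor_targets(y, teacher_of)
    return 0.5 * sum((t - yi) ** 2 for yi, t in zip(y, targets))


# Usage: reversing the sign of the shift swaps who teaches whom,
# an illustrative analogue of changing the mutual teaching scheme.
y = [0.1, 0.2, 0.3, 0.4]
forward = cross_motor_targets(y, chain_scheme(4, shift=1))
backward = cross_motor_targets(y, chain_scheme(4, shift=-1))
```

Because the penalty only couples motor values to one another, it restricts the dynamics to patterns with the assumed symmetry without prescribing any particular trajectory.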

Constraining the process of behavioural self-organisation by a given or evolutionarily acquired objective leads to an exploration of behaviours within a lower-dimensional manifold. This manifold characterises all behaviours that are compatible with the objective and that can be represented by the internal model of the robot. It is also possible that the external goal does not allow for the calculation of a gradient. Here the exploration produces cases that can be compared from the point of view of the robot, such that learning becomes possible even under very general conditions. In the context of development, the influence of learning by self-organisation and by reward mechanisms may vary. Although in the early stages the pure self-organisation of sensorimotor loops can be expected to follow mainly intrinsic principles, later stages will see a combination of different learning mechanisms, as described here in an exemplary case. We find that the maintenance of criticality, which is essential in the homeokinetic approach, is not abandoned with goal-oriented learning, as rather weak effects of the objective are most efficient. It can furthermore be predicted that an early exploratory phase that is not subject to directed learning increases the efficiency with which the objective is later met.

References

1. Der R: Self-organized acquisition of situated behavior. Theory in Biosciences. 2001, 120: 179-187.
2. Martius G: Ressources on homeokinesis. 2011, [http://robot.informatik.uni-leipzig.de]

Acknowledgments

The work was supported by DIP F1.2 and by the BMBF grants 01GQ0432 and 01GQ1005A.

Author information

Authors and Affiliations

Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, 04103, Leipzig, Germany
Georg Martius
Bernstein Focus: Neurotechnology, Bunsenstr. 10, 37073, Göttingen, Germany
J Michael Herrmann
University of Edinburgh, IPAB & ILSI, School of Informatics, 10 Crichton St, Edinburgh, EH8 9AB, UK
J Michael Herrmann

Corresponding author

Correspondence to Georg Martius.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Martius, G., Michael Herrmann, J. Development of goal-oriented behavior in self-learning robots. BMC Neurosci 12 (Suppl 1), P149 (2011). https://doi.org/10.1186/1471-2202-12-S1-P149

