Learning course adjustments during arm movements with reversed sensitivity derivatives
© Abdelghani and Tweed; licensee BioMed Central Ltd. 2010
Received: 3 May 2010
Accepted: 26 November 2010
Published: 26 November 2010
To learn, a motor system needs to know its sensitivity derivatives, which quantify how its neural commands affect motor error. But are these derivatives themselves learned, or are they purely innate? Here we test a recent theory that the brain's estimates of sensitivity derivatives are revisable based on sensory feedback. In its simplest form, the theory says that each control system has a single, adjustable estimate of its sensitivity derivatives which affects all aspects of its task. For example, if you learn to reach to mirror-reversed targets, then your revised estimate should reverse not only your initial aiming but also your online course adjustments when the target jumps in mid-movement.
Human subjects bent a joystick to move a cursor to a target on a computer screen, but the cursor's motion was reversed relative to the joystick's. The target jumped once during each movement. Subjects had up to 4000 trials to practice aiming and responding to target jumps.
All subjects learned to reverse both initial aiming and course adjustments.
Our study confirms that sensitivity derivatives can be relearned. It is consistent with the idea of a single, all-purpose estimate of those derivatives; and it suggests that the estimate is a function of context, as one would expect given that the true sensitivity derivatives may vary with the state of the controlled system, the target, and the motor commands.
To learn effectively, a motor system needs to know how its error e (for instance, the vector from target to hand in a reach) depends on the vector of neural commands u sent to the muscles (e.g. the signals to biceps, triceps, brachioradialis etc.). Mathematically, what the system needs is the matrix ∂e/∂u, called the control Jacobian or the matrix of sensitivity derivatives. But is ∂e/∂u itself learned, or is it known innately? Here we test a recent theory which holds that the brain's estimates of sensitivity derivatives are not solely innate but are deduced from sensory feedback.
Abdelghani et al. pointed out the importance, for this question, of the signs of the elements of ∂e/∂u: if your brain knew ∂e/∂u innately, then over time its innate estimates would of course become inaccurate (owing to your growth, aging, injuries, and healing), but so long as the signs of the estimates were correct, you could usually maintain good performance. But if the signs of ∂e/∂u reversed (e.g. if you put on reversing goggles and tried to reach for things) then your innate estimates would make you "learn" the wrong way, strengthening those components of u that should be weakened and vice versa. Given this kind of reversal, recovery is possible only for systems that can revise their estimates of ∂e/∂u. In that paper we argued that neural controllers can learn this kind of task, and we proposed a mechanism, called implicit supervision, by which the brain might deduce ∂e/∂u.
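The role of the sign can be illustrated with a toy scalar model. This is only a sketch under simplifying assumptions, not the implicit-supervision mechanism itself: the plant is a hypothetical linear gain (e = true_gain * u - target), and the learner adjusts the command u directly by gradient descent, using its own estimate of de/du. All names and parameter values are illustrative.

```python
def final_error(true_gain, estimated_gain, steps=50, rate=0.5, target=1.0):
    """Size of the error left after gradient learning with an estimated de/du.

    Toy assumption: the error is e = true_gain * u - target, so the true
    sensitivity derivative de/du equals true_gain.
    """
    u = 0.0
    for _ in range(steps):
        e = true_gain * u - target        # error produced by the plant
        u -= rate * estimated_gain * e    # update relies on the learner's estimate
    return abs(true_gain * u - target)

accurate = final_error(true_gain=1.0, estimated_gain=1.0)     # exact estimate
wrong_size = final_error(true_gain=1.0, estimated_gain=0.3)   # right sign, wrong size
wrong_sign = final_error(true_gain=1.0, estimated_gain=-0.3)  # reversed sign
```

In this sketch each step multiplies the error by (1 - rate * true_gain * estimated_gain). An estimate with the right sign but the wrong size still shrinks the error, only more slowly, whereas a wrong-sign estimate makes the error grow on every step.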
There is empirical support for implicit supervision. It is the only theory that explains how neural controllers can deal with sign changes in ∂e/∂u, as happen with reversed vision or nerve transposition [4–14]. And it explains why it is harder to adapt to reversals than to other changes [3, 15, 16].
The next question is whether adjustable estimates of sensitivity derivatives govern all aspects of a task. For instance, when you learn to move to mirror-reversed targets, does your adapted estimate of ∂e/∂u reverse both your initial aiming and your online course adjustments: when the target jumps in mid-movement, is your path adjustment appropriately reversed?
Data relevant to this issue have come from a novel experiment by Gritsenko and Kalaska. They trained people to reach to stationary (i.e. non-jumping) right-left-reversed targets. After the training was complete, they tested the subjects' responses when the mirror-reversed target jumped suddenly in mid-reach, and they found that in many cases the subjects' earliest course adjustments were not appropriately reversed, as they should have been if ∂e/∂u had been learned. What are the implications of this fact for the theory of implicit supervision? Does it mean that the reach controller in the brain has multiple estimates of ∂e/∂u -- one perhaps concerned with launching a reach toward its target, and a different one concerned with course adjustments? Is this latter estimate incapable of adapting, or might it adapt given a different training regimen -- the point of Gritsenko and Kalaska's study was to train on stationary targets and then test generalization to jumps, but what if subjects were trained on jumping targets? And finally, might Gritsenko and Kalaska's findings be compatible after all with a single, all-purpose estimate of ∂e/∂u rather than separate ones for launch and adjustment?
Here we put subjects through many trials with jumping targets, and show that they can learn to reverse their rapid, online course adjustments; i.e. we show that these adjustments are governed by an adjustable estimate of ∂e/∂u. And we argue that all the available data are compatible with a single, adaptable, all-purpose estimate.
This study complies with the Helsinki Declaration and was approved by the Ethics Review Office of the University of Toronto, reference number 16210. All subjects gave their informed consent.
Subjects bent a joystick to move a cursor toward a jumping target on a computer screen. They sat facing the screen at a distance of 80 cm, and used their dominant arm to manipulate an Impulse Stick -- a USB force feedback joystick made by Immersion Inc. (San Jose, CA, USA) -- through its full range of ±40°, or about ±6 cm. The joystick was placed to the subject's right or left side, its x-axis parallel with the screen.
During each trial the target jumped once. Jump time was determined randomly, though based on cursor motion to help ensure that it occurred during the arm movement: the target jumped when ||e||, the size of the error vector e from target to cursor, first fell below a threshold value of random(0.25, 0.75)||e0||, where e0 was the initial error vector when the target appeared at the start of the trial, and random(0.25, 0.75) was a number chosen at the start of each trial from a uniform distribution between 0.25 and 0.75. The size of the target jump was 60% as large as the error at jump time, i.e. 0.6||e||, except when a jump of that magnitude would have carried the target outside the movement range, in which case the x- and/or y-components of the jump were truncated to stay in range. The jump was orthogonal to the vector from target to cursor at the time of the jump (Figure 1b), except, again, when x- or y-components were truncated to stay in range. The direction of the jump, along this orthogonal line, was random: half the time in one direction, the rest in the other.
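The jump rule described above can be sketched as follows. This is a minimal illustration, not the authors' experimental code: the function names and the `limit` parameter (half-width of the allowed range) are hypothetical, and "truncated to stay in range" is assumed here to mean clamping each coordinate.

```python
import math
import random

def jump_threshold(e0, rng=random):
    """Error size at which the target jumps: random(0.25, 0.75) * ||e0||."""
    return rng.uniform(0.25, 0.75) * math.hypot(e0[0], e0[1])

def jumped_target(target, cursor, limit, rng=random):
    """Return the target position after a jump.

    target, cursor: (x, y) positions at jump time.
    limit: half-width of the allowed range (hypothetical parameter).
    """
    ex, ey = cursor[0] - target[0], cursor[1] - target[1]  # e, from target to cursor
    sign = rng.choice((-1, 1))                             # either direction, equally likely
    # (-ey, ex) is e rotated by 90 degrees: orthogonal to e and of equal length,
    # so scaling it by 0.6 gives a jump of size 0.6 * ||e||.
    jx, jy = sign * 0.6 * -ey, sign * 0.6 * ex
    clamp = lambda v: max(-limit, min(limit, v))           # ASSUMPTION: truncation = clamping
    return clamp(target[0] + jx), clamp(target[1] + jy)
```

A trial loop would draw `jump_threshold(e0)` once when the target appears, then call `jumped_target` at the first sample where ||e|| falls below that threshold.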
After the jump, subjects adjusted their motion to try to reach the target. If they managed to get the center of the cursor within 0.3 cm of the center of the target and hold it there for at least 100 ms within 2 s of the target's initial appearance then they were rewarded with a beep and a flash, i.e. the target changed momentarily from a pair of concentric blue circles to a filled-in red disk (Figure 1c). If they scored a beep, the next trial began immediately. If not, the next trial began 2 s after the initial appearance of the target. The initial cursor location, at the start of each new trial, was simply wherever the cursor happened to be at the end of the previous trial. Subjects saw the cursor and the target at all times throughout each trial, so they got plenty of feedback about their performance.
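The reward criterion can be expressed as a check on the sampled positions. The sketch below is one reading of the rule, with hypothetical names: it assumes positions are sampled every 10 ms from the target's initial appearance, that the hold must be continuous, and that the whole 100-ms hold must finish within the 2-s limit.

```python
import math

SAMPLE_MS = 10    # joystick sampling interval
RADIUS_CM = 0.3   # required closeness of cursor and target centers
HOLD_MS = 100     # minimum hold duration
LIMIT_MS = 2000   # time allowed from the target's initial appearance

def trial_succeeded(cursor_path, target_path):
    """True if the cursor stays within RADIUS_CM of the target for HOLD_MS.

    cursor_path, target_path: lists of (x, y) in cm, one entry per sample,
    starting when the target first appears (the target path includes the jump).
    """
    start = None                               # index where the current hold began
    for i, (c, t) in enumerate(zip(cursor_path, target_path)):
        if i * SAMPLE_MS > LIMIT_MS:           # ASSUMPTION: hold must end by 2 s
            break
        if math.dist(c, t) <= RADIUS_CM:
            if start is None:
                start = i
            if (i - start) * SAMPLE_MS >= HOLD_MS:
                return True
        else:
            start = None                       # hold broken, start over
    return False
```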
Subjects performed multiple blocks of 50 reaches. They had the option to rest as long as they liked between blocks. In control blocks, pushing the joystick forward moved the cursor up, and pushing right moved it rightward (Figure 1d). In test blocks, the relation between joystick and cursor was altered, in different ways in the two experiments described below, changing ∂e/∂u. On Day 1 subjects performed 20 blocks of 50 control reaches each, for 1000 reaches in all. On each of three or four subsequent days they did 20 blocks of reversed reaches, for a total of 3000 or 4000 reversed reaches. Finally, they did another 20 blocks of control trials. Through all these trials we sampled joystick position at 10-ms intervals.
Experiment 1. Course adjustment with reversed sensitivity derivatives
In test blocks, both dimensions of cursor motion were reversed from control, flipping the signs of all components of ∂e/∂u (Figure 1e). Five subjects took part -- one female, four males, all healthy, aged 21-48. Three of them knew the experiment involved a reversed relation between joystick and cursor. One of these three had experience with joystick experiments, and one with joystick computer games. All our single-person data plots (Figures 2, 3, and 4) are of subjects who were unfamiliar both with joysticks and with the idea of motor adaptation to reversals, but the key findings were the same for all subjects, as shown in Figure 5.
Experiment 2. Reversal and rotation
Here the relation between joystick and cursor was more complex: reflected vertically through the midline and rotated 30 degrees counterclockwise (Figure 1f). Five subjects took part -- one female, four males, all healthy, aged 21-48. None of them knew the joystick-cursor relation beforehand. All found it bewildering, and none was able to state it afterwards based on their experience. Four of the subjects were veterans of Experiment 1, and therefore had more joystick experience in this second part, but that fact is irrelevant here because our hypothesis and analysis involved no comparisons of the two experiments. The single-person data plot (Figure 6) is of the new subject, without joystick experience, but the key results were the same for all, as shown in Figure 7.
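The three joystick-to-cursor relations can be written as 2 × 2 matrices acting on a joystick displacement. The sketch below makes two labeled assumptions that the text does not settle: the reflection is taken to flip the horizontal component (a mirror image about the vertical midline), and it is applied before the rotation. The names are illustrative only.

```python
import math

def rotation(deg):
    """Counterclockwise rotation matrix for an angle in degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a)],
            [math.sin(a),  math.cos(a)]]

def matmul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(M, v):
    """Cursor displacement produced by joystick displacement v under mapping M."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

CONTROL = [[1, 0], [0, 1]]        # forward -> up, right -> right
REVERSED = [[-1, 0], [0, -1]]     # Experiment 1: both dimensions reversed
MIRROR_X = [[-1, 0], [0, 1]]      # ASSUMPTION: reflection flips the x component
REFLECT_ROTATE = matmul(rotation(30), MIRROR_X)  # ASSUMPTION: reflect, then rotate

for name, M in [("control", CONTROL), ("reversed", REVERSED),
                ("reflect+rotate", REFLECT_ROTATE)]:
    print(name, apply(M, (1.0, 0.0)))   # where a rightward push sends the cursor
```

Under these assumptions a rightward push moves the cursor rightward in control blocks, leftward in Experiment 1, and down and to the left in Experiment 2.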
Experiment 1: Course adjustments with reversed sensitivity derivatives
In early reversed trials, both launch and course adjustment go the wrong way, as shown in Figure 2d. The errors are revealed also in the velocity trace for the movement, Figure 2e: before the target jump, cursor velocity is not consistently positive, i.e. not in the direction of the target; after the jump, cursor velocity is mostly negative, i.e. opposite the jump. These plots also show that, even in early reversed trials, subjects don't move relentlessly in the wrong direction, but rather their trajectories are confused, with a tendency to go the wrong way and then try to correct. But after the subject has performed 3000 trials under reversed conditions, launch and adjustment are both appropriate, as shown in Figure 2g and 2h. This is the key result of our study: movement traces like these show that the subject, after training, could make online course adjustments with no wrong-way response, as predicted by the theory of implicit supervision.
To show that this behavior was consistent, we averaged cursor velocities over many trials. Figure 2f and 2i show mean velocity and its standard deviation for one subject, with control data superimposed on the data for trials with reversed sensitivity derivatives. In control traces (Figure 2c), cursor velocity is appropriately positive in both the launch and adjustment stages of the movement. In early reversed trials (Figure 2f), velocities are not consistently positive during launch or adjustment. In late reversed trials (Figure 2i), velocities are again appropriate, and resemble controls as regards direction, size, timing, and variance. This same result was seen in all five subjects.
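A signed velocity of this kind can be computed from the sampled positions. The sketch below shows one plausible definition, not necessarily the one used for Figure 2: it projects the finite-difference cursor velocity onto the direction from cursor to the current target, so positive values mean motion toward the target. The function names are hypothetical, and the traces passed to `mean_and_sd` are assumed to be already aligned and of equal length.

```python
import math
import statistics

DT = 0.010  # sampling interval in seconds (10 ms)

def velocity_toward(cursor_path, goal_path):
    """Signed cursor speed (cm/s) along the direction to the goal, per step."""
    out = []
    for k in range(len(cursor_path) - 1):
        (x0, y0), (x1, y1) = cursor_path[k], cursor_path[k + 1]
        gx, gy = goal_path[k][0] - x0, goal_path[k][1] - y0   # cursor -> goal
        dist = math.hypot(gx, gy)
        if dist == 0:
            out.append(0.0)            # direction undefined when on the goal
            continue
        # dot product of the displacement with the unit vector toward the goal
        out.append(((x1 - x0) * gx + (y1 - y0) * gy) / (dist * DT))
    return out

def mean_and_sd(traces):
    """Pointwise mean and standard deviation across equal-length traces."""
    columns = list(zip(*traces))
    return ([statistics.mean(c) for c in columns],
            [statistics.stdev(c) for c in columns])
```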
Further, in all four plots there was no significant difference, at the same p level, between control and late reversed trials, i.e. subjects returned to something like control performance. This finding is interesting but peripheral to our purposes, because the theory of implicit supervision doesn't imply anything about whether post-adaptation performance will be identical with controls. The point of Figure 5 is that all subjects improved, driving down their adjustment errors without appreciably slowing their responses.
Three subjects knew the experiment involved a reversed relation between joystick and cursor. The other two subjects never recognized the relation, i.e. they couldn't state it in words when questioned after the experiment was over. None of the five subjects felt, introspectively, that it helped to try to work out the relation of cursor to joystick, or to imagine the target in some reversed location on the computer screen, or to reverse their hand motion deliberately. What worked was simply to chase the target with the cursor, giving no thought to hand motion, improving gradually and automatically.
Experiment 2: Reversal and rotation
The theory of implicit supervision holds that the brain's estimates of sensitivity derivatives, ∂e/∂u, can be revised based on sensory feedback [4, 5, 13]. This theory explains how neural controllers can deal with sign changes in ∂e/∂u. For instance, humans and monkeys can learn to handle objects and navigate while wearing reversing prisms [3, 15, 16]. People can learn to mirror-draw, and dentists can drill teeth seen in a mirror. When antagonist muscles or nerves are transposed, animals can sometimes regain their coordination [6–11, 19]. And facial-palsy patients treated by hypoglossal nerve transposition learn to control face and tongue independently [12, 14, 20]. The theory also explains why it is harder to adapt to reversals than to other changes: displacing, magnifying, and minifying goggles don't flip the signs of ∂e/∂u, so we can adapt to them without revising our estimates of the sensitivity derivatives; reversing prisms, on the other hand, do flip the signs, so we can't adapt without re-estimating ∂e/∂u.
Here we have confirmed another prediction of the theory: in arm-movement tasks with reversed and rotated sensitivity derivatives, our subjects learned to make appropriate course adjustments when the target jumped. After training, individual movements often showed no wrong-way response (naturally some movements did show mistakes, during launch or adjustment, but similar mistakes were seen also in control trials). Averaged velocity traces in the reversed task after training resembled control traces as regards direction, size, timing, and variance. For all subjects, wrong-way responses shrank (as quantified by adjustment errors), to near control levels. So the neural estimate of ∂e/∂u that is used for course adjustments is clearly revisable.
How do our results fit with those of Gritsenko and Kalaska? For our purpose -- testing implicit supervision -- what is important about that study is the discrepancy between launch and adjustment: some of the subjects who learned to launch toward the target still made course adjustments in the wrong direction. That finding raised a question for our theory: if the subjects improved their launches by re-estimating their sensitivity derivatives then why didn't the revised estimate correct their adjustments?
One possible explanation is that there are two (or more) separate estimates of ∂e/∂u for different aspects of a task, e.g. for launch and adjustment. In this view, a subject might reverse their launch-related estimates of ∂e/∂u but not their adjustment-related estimates, maybe because the latter change more slowly, or because the training included no practice adjusting to target jumps. (A different issue is whether launch and adjustment involve separate controllers, e.g. one using feedback and the other not. This is a separate question because even entirely disjoint launchers and adjusters, whether feedback-guided or not, could still be governed by a single estimate of ∂e/∂u. Our concern here is with ∂e/∂u, not with other possible contrasts between launching and adjusting.)
But a simpler explanation is that there is one all-purpose estimate of ∂e/∂u, and Gritsenko and Kalaska's subjects revised it over only part of its domain. The key point is that ∂e/∂u is not a constant matrix but varies over a domain D. For instance we might have D = X × X* × U, where X is the state space of the plant (e.g. the space of all possible combinations of arm joint angles and velocities), X* is the space of target states, U is the space of motor commands, and × is the Cartesian product. When a target jumps during an arm movement, it suddenly transports the subject to a new region of D. (In Gritsenko and Kalaska's experiment, subjects may not have been transported very far through D, as the target jumped only 10°, measured from the starting point of the movement. But by the time they reacted, their angular errors would be larger than 10°. And even if the new region of D were close to the old, the appropriate motor command might be quite different there, as the subjects would need lateral acceleration in situations where the target had jumped.) Gritsenko and Kalaska's study was designed to train people with no jumps and then test their generalization to jumping targets, so their subjects had little experience with the post-jump regions of D, and therefore, we suggest, didn't completely revise their estimates of ∂e/∂u there; some learning may have generalized from nearby regions, but not enough to abolish their inappropriate, unreversed responses. Our study was designed to give subjects plenty of experience with jumps during their training, and so they learned ∂e/∂u over the relevant parts of D.
This idea doesn't imply that there are "boundaries" within D, or that different regions of it are linked with different learning mechanisms or controllers. The point is simply that a learner trained in one domain usually does poorly in others, e.g. a neural network trained to approximate the function x² over the domain [0, 0.1] does poorly when tested over a different region, say [0.1, 0.2]. And the failure is worse, the more the target function differs between the two regions. Similarly, implicit supervision trained exclusively on one subset of D -- the subset inhabited by reaches to fixed targets -- yields poor estimates of ∂e/∂u elsewhere.
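The x² example can be reproduced in miniature. As a stand-in for the neural network mentioned above, the sketch below fits a straight line by least squares, a deliberately simpler learner chosen only to keep the example short. It trains on [0, 0.1] and then measures the worst-case error on [0.1, 0.2]. All names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def max_abs_error(slope, intercept, xs):
    """Largest gap between the fitted line and x**2 over the points xs."""
    return max(abs(slope * x + intercept - x ** 2) for x in xs)

train_x = [0.1 * i / 100 for i in range(101)]        # training domain [0, 0.1]
test_x = [0.1 + 0.1 * i / 100 for i in range(101)]   # test domain [0.1, 0.2]

slope, intercept = fit_line(train_x, [x ** 2 for x in train_x])
train_err = max_abs_error(slope, intercept, train_x)
test_err = max_abs_error(slope, intercept, test_x)
print(train_err, test_err)
```

The fit is close where it was trained and markedly worse on the neighboring interval, even though the two domains share an endpoint and no boundary or second learner is involved.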
The four types of learning curves in Figure 4 -- launch error, adjustment error, launch latency, and adjustment latency -- decline with roughly similar time courses. Unfortunately their shapes offer no clues as to how many estimates of ∂e/∂u are being adapted. The similarity between the four curves need not imply a single estimate of ∂e/∂u underlying them all; it is also compatible with multiple estimates of ∂e/∂u if those estimates learn in similar ways. And conversely, even markedly dissimilar curves would be compatible with a single estimate of ∂e/∂u because the four curves reflect different aspects of the task, occurring in different regions of the domain D. They are expected to differ, even if they all depend on the same estimate of ∂e/∂u. In simulations, the correlations and other similarities between these curves vary enormously depending on assumptions about learning algorithms, neural coding, and noise throughout the control system, i.e. both single and multiple estimates are compatible with a wide variety of curves.
In both our experiments, subjects' responses were often delayed, e.g. in Figures 5 and 7, launch latency (LL) and adjustment latency (AL) were always greater in early reversed trials than in control trials, and often stayed greater for thousands of trials, though eventually they improved to roughly control values. Evidently subjects slowed some aspects of their movements in unfamiliar conditions, maybe to permit more voluntary control.
Voluntary reversals have been studied by Day and Lyon. Their subjects reached straight ahead for a target which jumped right or left in mid-reach. The subjects were told to react to the jump by moving in the opposite direction, but even after several hundred trials, their first reaction was still in the jump direction, followed by a reversed response. What does this mean for implicit supervision? There are many possibilities, e.g. 1) Day and Lyon's results may have nothing to do with changes in ∂e/∂u. Their study involved no sensory reversal, so there was no change in the relation between any sensory error signal e and motor commands; rather there was a verbal instruction to reverse. Subjects may simply have tried to aim for an imaginary target opposite the real one, in keeping with their instructions. 2) Subjects may have created a new, mental error signal e' equal to -1 times the visual error e, and then learned ∂e'/∂u. They may have had two separate representations of ∂e'/∂u for early and late responses to jumps. Or their early and late responses may have been guided by ∂e/∂u and ∂e'/∂u respectively. 3) Subjects may have had one representation of ∂e'/∂u for reflexive control generally and another for higher-level control, i.e. separate representations for different levels of control rather than for different stages of a movement.
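For possibility 2, the relation between the two sets of sensitivity derivatives follows directly from the definition of e'. Written out, under the assumption stated there that e' is exactly -1 times e:

```latex
e' = -e
\quad\Longrightarrow\quad
\frac{\partial e'}{\partial u} = -\,\frac{\partial e}{\partial u}
```

So every element of ∂e'/∂u has the opposite sign from the corresponding element of ∂e/∂u, which is the same kind of sign reversal that a sensory reversal would produce.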
There may be hints of multilevel control in our results as well, e.g. in Figure 2d, an early reversed trial, the subject launches in an inappropriate direction but then later, something makes them reverse course with a tight U-turn (though the new direction is also inappropriate). If there is a high-level controller that steps in here, it may have a separate estimate of ∂e/∂u, better than the reflexive controller's, but this scheme would be inefficient: learning ∂e/∂u is computationally costly, so there are good reasons to do it just once. Another possibility is that the high-level controller has no good estimate of ∂e/∂u, but adopts some simple, exploratory strategy, e.g. it thinks "my estimate of ∂e/∂u is clearly inaccurate, and my most recent action was counterproductive, so I'll try undoing it or doing something else different". Or maybe high-level controllers can rapidly estimate the current value of ∂e/∂u, i.e. they don't learn the function ∂e/∂u but just estimate its value at the current spot in its domain D, which is easier. This approach would bring advantages if used to supplement (not replace) learning the function ∂e/∂u -- see Fortney and Tweed.
Where in the brain might ∂e/∂u be represented? One possibility is the cerebellum, which is involved in sensorimotor learning and internal models. These models are neural circuits that mimic aspects of the system to be controlled, such as the mechanical properties of an eyeball or limb, and especially the relation between neural commands and motor performance. In particular, so-called forward models mimic the response of the controlled system to neural commands. Therefore an estimate of ∂e/∂u is a kind of forward model, representing the relation between performance error e and command u.
We have shown that people can learn to reverse their online course adjustments, implying that these adjustments are based on revisable estimates of sensitivity derivatives, as predicted by the theory of implicit supervision. And we have argued that the available data are consistent with the simplest version of the theory, that a single, contextual estimate of ∂e/∂u guides motor learning for all stages of a task, including launch and adjustment.
For their comments we thank D. Broussard, L. Chinta, K. Fortney, T. Lillicrap, and W. MacKay. This work was funded by the Canadian Institutes of Health Research and the Natural Sciences and Engineering Research Council.
- Callier F, Desoer C: Linear System Theory. 1991, New York, USA: Springer
- Astrom KJ, Wittenmark B: Adaptive Control. 1995, Reading, MA, USA: Addison-Wesley
- Abdelghani MN, Lillicrap TP, Tweed DB: Sensitivity derivatives for flexible sensorimotor learning. Neural Computation. 2008, 20 (8): 2085-2111. 10.1162/neco.2008.04-07-507.
- Stratton GM: Vision without inversion of the retinal image. Psychol Rev. 1897, 4: 341-360, 463-481. 10.1037/h0075482.
- Ewert P: A study of the effect of inverted retinal stimulation upon spatially coordinated behavior. Genetic Psychology Monographs. 1930, 7: 177-363.
- Sperry R: Effect of 180 degree rotation of the retinal field on visuomotor coordination. J Exp Zool. 1943, 92: 263-279. 10.1002/jez.1400920303.
- Sperry RW: The problem of central nervous reorganization after nerve regeneration and muscle transposition. Quarterly Review of Biology. 1945, 20: 311-369. 10.1086/394990.
- Sperry R: Effect of crossing nerves to antagonistic limb muscles in the monkey. Arch Neurol Psychiatr. 1947, 58: 452-473.
- Leffert R, Meister M: Patterns of neuromuscular activity following tendon transfer in the upper limb: a preliminary study. J Hand Surg. 1976, 1: 181-189.
- Yumiya H, Larsen K, Asanuma H: Motor readjustment and input-output relationship of motor cortex following crossconnection of forearm muscles in cats. Brain Res. 1979, 177: 566-570. 10.1016/0006-8993(79)90474-8.
- Brinkman C, Porter R, Norman J: Plasticity of motor behavior in monkeys with crossed forelimb nerves. Science. 1983, 220: 438-440. 10.1126/science.6836289.
- Scaramella L: Cross-face facial nerve anastomosis: historical notes. Ear Nose Throat J. 1996, 75: 343-354.
- Sugita Y: Global plasticity in adult visual cortex following reversal of visual input. Nature. 1996, 380: 523-526. 10.1038/380523a0.
- Tate J, Tollefson T: Advances in facial reanimation. Curr Opin Otolaryngol Head Neck Surg. 2006, 14: 242-248. 10.1097/01.moo.0000233594.84175.a0.
- von Helmholtz H: Treatise on Physiological Optics. 1962, Phoenix, USA: Dover
- Harris CS: Perceptual adaptation to inverted, reversed, and displaced vision. Psychological Review. 1965, 72: 419-444. 10.1037/h0022616.
- Gritsenko V, Kalaska J: Rapid online correction is selectively suppressed during movement with a visuomotor transformation. Journal of Neurophysiology. 2010
- Mendenhall W, Wackerly D, Schaefer R: Mathematical Statistics with Applications. 2008, Belmont, CA, USA: Thomson Brooks/Cole, 7th ed.
- Missiuro W, Kozlowski S: Investigation on adaptive changes in reciprocal innervation of muscles. Arch Phys Med Rehabil. 1963, 44: 37-41.
- Vera C, Lewin M, Kasa J, Calderon M: Central functional changes after facial-spinal-accessory anastomosis in man and facial-hypoglossal anastomosis in the cat. J Neurosurg. 1975, 43: 181-191. 10.3171/jns.1975.43.2.0181.
- Day BL, Lyon IN: Voluntary modification of automatic arm movements evoked by motion of a visual target. Experimental Brain Research. 2000, 130 (2): 159-168. 10.1007/s002219900218.
- Fortney KP, Tweed DB: Learning without synaptic change: a mechanism for sensorimotor control. Society for Neuroscience: 2006; Atlanta. 2006
- Kawato M: Internal models for motor control and trajectory planning. Current Opinion in Neurobiology. 1999, 9: 718-727. 10.1016/S0959-4388(99)00028-8.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.