Visuo-auditory interactions in the primary visual cortex of the behaving monkey: Electrophysiological evidence
© Wang et al. 2008
Received: 10 January 2008
Accepted: 12 August 2008
Published: 12 August 2008
Visual, tactile and auditory information is processed from the periphery to the cortical level through separate channels that target primary sensory cortices, from which it is further distributed to functionally specialized areas. Multisensory integration is classically assigned to higher hierarchical cortical areas, but there is growing electrophysiological evidence in man and monkey of multimodal interactions in areas thought to be unimodal, interactions that can occur at very short latencies. Such fast timing of multisensory interactions rules out the possibility of an origin in the polymodal areas mediated through back projections, but is rather in favor of heteromodal connections such as the direct projections observed in the monkey, from auditory areas (including the primary auditory cortex AI) directly to the primary visual cortex V1. Based on the existence of such AI to V1 projections, we looked for modulation of neuronal visual responses in V1 by an auditory stimulus in the awake behaving monkey.
Behavioral or electrophysiological data were obtained from two behaving monkeys. One monkey was trained to maintain a passive central fixation while a peripheral visual (V) or visuo-auditory (AV) stimulus was presented. From a population of 45 V1 neurons, there was no difference in the mean latencies or strength of visual responses when comparing V and AV conditions. In a second active task, the monkey was required to orient his gaze toward the visual or visuo-auditory stimulus. From a population of 49 cells recorded during this saccadic task, we observed a significant reduction in response latencies in the visuo-auditory condition compared to the visual condition (mean 61.0 vs. 64.5 ms) only when the visual stimulus was at midlevel contrast. No effect was observed at high contrast.
Our data show that single neurons from a primary sensory cortex such as V1 can integrate sensory information of a different modality, a result that argues against a strict hierarchical model of multisensory integration. Multisensory interaction in V1 is, in our experiment, expressed by a significant reduction in visual response latencies specifically in suboptimal conditions and depending on the task demand. This suggests that neuronal mechanisms of multisensory integration are specific and adapted to the perceptual features of behavior.
The classical view of multisensory integration, based on anatomical grounds, proposes that each sensory modality is processed through separate channels from the sensory receptors to the primary sensory areas and then further integrated into associative unimodal areas converging at the level of cognitive polymodal areas. Indeed, in primates, neuronal responses to more than one sensory modality have been described in areas higher up in the hierarchy, such as the frontal, temporal and parietal lobes [3–9]. While these polysensory areas are the best candidates to support sensory fusion, recent studies in humans have surprisingly revealed that multisensory interactions can take place in early stages of sensory processing, in regions thought to be involved in only one modality [10, 11]. This result has led to a reappraisal of the cortical regions involved in multisensory integration. In humans for example, imaging [13–16] and EEG studies [17, 18] have clearly shown heteromodal responses in sensory areas even at the level of the primary sensory fields. Furthermore, the discovery of heteromodal connections directly linking areas involved in different sensory modalities could be the anatomical support of such interactions [19–21]. For example, in the monkey, the core of the auditory cortex receives direct inputs from both somatosensory and visual areas. It can be inferred that these cortical heteromodal connections, as well as the thalamo-cortical loop [22, 23], could be the anatomical pathway responsible for the visual [24–26], somatosensory [27, 28] or proprioceptive influences observed in the monkey auditory cortex.
In the normal adult cat, some early electrophysiological studies have reported auditory responses in visual areas [31–33], a result which is still controversial. Multisensory integration in the primary visual cortex (V1) of the monkey has not been established, apart from a clear influence of a non-visual eye position signal on visual activity [35, 36]. However, auditory or visuo-auditory responses in area V1 are highly probable since we have demonstrated direct projections from the auditory cortex (including A1) and the polymodal area STP to area V1 in the calcarine sulcus. Furthermore, the auditory system is activated earlier than the visual one: for example, the latencies of auditory responses recorded in areas AI and STP are about 35 and 45 ms respectively [37, 38]. Consequently, it is conceivable that an auditory stimulus can modulate the visual responses in V1, where the mean onset latencies are longer, between 50 and 70 ms [39, 40]. As some authors have reported even shorter latencies in V1 when using high contrast stimuli, one could expect that an auditory stimulus would mostly affect late visual responses such as those obtained using non-optimal stimuli (i.e., low visual contrast).
We thus conducted an electrophysiological study to look for visuo-auditory interactions at the single cell level in primary visual cortex. Because the auditory projections to V1 are denser at the representation of the peripheral visual field, a region of space that encompasses most of the auditory receptive field in AI [42, 43], our electrophysiological recordings targeted visual cells with receptive fields (RFs) located between 10 and 20° of retinal eccentricity.
The present study is based on data obtained from two monkeys (Macaca mulatta) trained to perform a visual or visuo-auditory oculomotor task. A detailed description of the general methods used in the electrophysiological recording has been reported in a previous study. All experimental protocols, including care, surgery, and training of animals, were performed according to the Public Health Service policy on the use of laboratory animals and complied with guidelines of the European Ethics Committee on Use and Care of Animals.
The core of the present study concerns two monkeys (Mk1 and Mk2) trained to perform a visually guided saccadic task during which the visual target could be accompanied by an auditory stimulus (V/VA active task). A trial was initiated by the appearance of a fixation point (FP), 0.2 degree in size, located at the center of the video screen. The monkey had to direct its gaze to this central point and maintain fixation. The duration of presentation of the FP was randomized between trials and lasted between 1500 and 1800 ms. Simultaneously with the extinction of the FP, a peripheral visual target was flashed for 50 ms. The monkey was required to perform a saccade toward the location of the visual target within 250 ms of its appearance. Responses were considered correct when the saccades landed within a window of 4 × 4 degrees centered on the visual target, and in these cases a few drops of fruit juice were delivered to the monkeys as a reward. In half of the trials, presented randomly, a 25 ms sound (a white noise) was delivered from a speaker located at the same eccentricity on the azimuth as the visual stimulus. In such visuo-auditory trials (VA), the visual and the auditory stimuli were presented at exactly the same time. In both conditions (V and VA) the monkey was required to perform a saccade directed toward the visual target and consequently, the auditory stimulus had no behavioral meaning for the animal. Note that we did not train the monkeys to perform a saccade toward the auditory stimulus alone.
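The acceptance criterion described above can be illustrated by a minimal sketch. The names `Saccade` and `is_correct` are hypothetical, and reading the 250 ms limit as a bound on saccade latency relative to target onset is an assumption; the sketch is not taken from the REX protocol code.

```python
from dataclasses import dataclass

WINDOW_DEG = 4.0        # side of the square acceptance window (from the text)
MAX_LATENCY_MS = 250.0  # deadline after target onset (assumed to bound saccade latency)


@dataclass
class Saccade:
    """Hypothetical container for one saccade (positions in degrees)."""
    end_x_deg: float
    end_y_deg: float
    latency_ms: float


def is_correct(saccade: Saccade, target_x_deg: float, target_y_deg: float) -> bool:
    """Return True if the saccade lands in the 4 x 4 degree window in time."""
    half = WINDOW_DEG / 2.0
    in_window = (abs(saccade.end_x_deg - target_x_deg) <= half
                 and abs(saccade.end_y_deg - target_y_deg) <= half)
    return in_window and saccade.latency_ms <= MAX_LATENCY_MS
```

Under these assumptions, a saccade landing 1.5° from a target at (10°, -10°) with a latency of 180 ms would be rewarded, whereas the same saccade with a latency of 260 ms would not.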
The first monkey engaged in the present study (Mk1) was first trained to perform two control tasks before the V/VA active task. In a first stage, the monkey was trained to perform a simple passive fixation task (V/VA passive task). Following the presentation of the FP (of variable duration from 1500 to 1800 ms), a visual or visuo-auditory stimulus was presented for 500 ms together with the FP. To get rewarded, the monkey had to maintain its fixation until the FP was extinguished.
Further, Mk1 was trained in a visual control task (V-only control task), during which the color of the FP informed the animal whether he had to maintain a central fixation (blue FP) or to make a saccade toward a visual peripheral stimulus (Red FP). In this case the visual stimulus was never accompanied by an auditory stimulus. The timing of stimulus presentation was identical to that described for the active task (50 ms).
The monkey Mk1 was engaged successively in each of these different protocols for several months, a period during which electrophysiological recordings were performed in the primary visual cortex (see below). Mk2 was trained from the beginning to do the VA active task.
Protocol control and data acquisition were run under the REX system. To avoid jitter when generating the auditory stimulus through the Windows system, all stimuli were pre-generated and stored in memory, and a silent buffer was added before each auditory stimulus. Triggers corresponding to the beginning of the auditory stimuli (from the buffer) and to the first visual frame (through VRG) were sent to the REX system. The length of the audio buffer was adjusted to synchronize the visual and auditory stimuli at the expected delay.
Aseptic surgery was performed to attach a head-post to the skull and to implant a scleral search coil in both eyes. Single-unit recordings were made in one of the two monkeys (Mk1). Once the monkey had reached a high level of performance, a second surgery was performed to implant a recording chamber above the peripheral visual field representation in V1 located in the calcarine sulcus . The skull was removed within the chamber, and a fixed grid was placed, so that the electrode penetrations were spaced 1 mm apart. Guide tubes were used to help to penetrate the dura. Sterile, tungsten-in-glass electrodes of ~1 MΩ impedance were inserted with a hydraulic microdrive fixed to the recording chamber, perpendicular to the cortical surface. Extracellular recordings were carried out in both hemispheres of the monkey from which the visual responses were previously analyzed for disparity selectivity (see ). Action potential waveforms were sorted online with the help of a spike sorting software (AlphaOmega MSD®) and only single units recorded through complete trials were selected for analysis.
The behavioral analysis was derived from the performance of the two monkeys trained to perform the visually and visuo-auditory guided saccadic task (V/VA active task). For each trial we determined the saccade latency, defined as the first point at which the eye position differed significantly from the average eye position signal during the 300 ms prior to stimulus offset. This corresponded to the time at which the difference between the current position and the mean was 2 times greater than the maximum range observed during the fixation period. We then applied a multi-way ANOVA to the saccade latencies obtained in each monkey to compare the V and VA conditions. Contrast (3 groups in Mk1, 2 groups in Mk2), eccentricity (2 groups), and V or VA stimulation (2 groups) were treated as different factors. We checked both single-factor effects and two-factor interactions. When a p value was too low to be computed (reported as p = 0), indicating very high significance, we also report the F value for reference. A post-hoc test was then applied to compare the saccade latencies between individual pairs of conditions.
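The saccade onset criterion can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the function name, the one-dimensional eye trace, the 1 ms sampling step and the two index parameters are all hypothetical.

```python
def saccade_latency_ms(eye_pos, onset_idx, baseline_end_idx,
                       sample_ms=1.0, baseline_ms=300.0, factor=2.0):
    """Latency (ms after onset_idx) of the first sample whose distance from
    the baseline mean exceeds `factor` times the baseline range.

    eye_pos          -- sequence of eye positions (deg), one per sample
    onset_idx        -- sample index of target onset (assumed reference)
    baseline_end_idx -- sample index where the 300 ms baseline ends
    Returns None if the criterion is never met.
    """
    n_base = int(round(baseline_ms / sample_ms))
    start = baseline_end_idx - n_base
    if start < 0:
        raise ValueError("trace too short for the requested baseline")
    baseline = eye_pos[start:baseline_end_idx]
    mean = sum(baseline) / len(baseline)
    # Maximum range of the eye signal during fixation
    spread = max(baseline) - min(baseline)
    threshold = factor * spread
    for i in range(onset_idx, len(eye_pos)):
        if abs(eye_pos[i] - mean) > threshold:
            return (i - onset_idx) * sample_ms
    return None
```

Because samples inside the baseline can never deviate from the mean by more than the baseline range, starting the search at target onset cannot produce a detection within the baseline window itself.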
For each neuron in each condition, the neuronal activity was recorded for 20–40 correct trials. Two parameters were studied to analyze the effect of visuo-auditory interactions in V1 cells: the amplitude and the latency of the visual responses. To measure the visual response latency, we first computed histograms of neuronal activity aligned on the stimulus onset. As previously described [48, 49], we further smoothed the cumulative histogram by modeling each spike as a small Gaussian function (amplitude = 1; sigma = 4). Thus, within each 1 ms bin, the spike count reflected 40 (number of trials) × 10 ms (Gaussian summation window). We then measured the average baseline spike number per bin in the 200 ms prior to stimulus onset, and used it as the lambda parameter of a Poisson distribution describing spontaneous activity. The threshold of response activity was the smallest number n for which the Poisson cumulative distribution function equaled or exceeded 0.99. Thus, if the firing obeyed the same Poisson distribution as the baseline, the spike number within each bin would not exceed this value at 99% confidence. We calculated this number n using the Matlab Poisson function. We then used a detection window of 4 ms and measured activity starting from visual stimulus onset. If the minimum value inside this window was greater than n, we took the first point of the window as the response latency. Because we could only obtain one latency value for a group of trials, we used bootstrap methods to compare the activity between conditions: shuffled latencies were calculated from the same number of trials in a sample taken randomly from both conditions. This was performed 4000 times to obtain 2000 randomly grouped pairs, from which we calculated the individual difference within pairs. The bootstrap p value is the proportion of pairs for which the differences were no less than the values obtained from the experimental data.
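The threshold and latency detection steps can be sketched in a few lines. This is a minimal illustration, assuming a histogram already smoothed and binned at 1 ms; the function names are hypothetical, the Gaussian smoothing step is left out, and the threshold is computed with an explicit summation rather than with the Matlab function used in the study.

```python
import math


def poisson_threshold(lam, p=0.99):
    """Smallest integer n such that the Poisson CDF at n (mean lam) is >= p.

    Intended for the small baseline rates of a PSTH bin; for lam above
    roughly 700, exp(-lam) underflows and this simple summation fails.
    """
    if lam < 0 or not 0 < p < 1:
        raise ValueError("lam must be >= 0 and p strictly between 0 and 1")
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    n = 0
    while cdf < p:
        n += 1
        term *= lam / n    # P(X = n) from P(X = n - 1)
        cdf += term
    return n


def response_latency_ms(psth, onset_idx, baseline_bins=200, window=4, p=0.99):
    """First 1 ms bin after onset that starts a `window`-bin run whose
    minimum exceeds the Poisson threshold; None if no such run exists."""
    if onset_idx < baseline_bins:
        raise ValueError("not enough pre-stimulus bins for the baseline")
    baseline = psth[onset_idx - baseline_bins:onset_idx]
    lam = sum(baseline) / len(baseline)   # mean spontaneous count per bin
    n = poisson_threshold(lam, p)
    for i in range(onset_idx, len(psth) - window + 1):
        if min(psth[i:i + window]) > n:
            return i - onset_idx
    return None
```

Requiring the minimum of the 4 ms window to exceed the threshold means that a brief excursion of one or two bins above n does not count as a response.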
At the population level, after normalization of the responses, we used an Anova test to analyze the factor effect on response amplitude or latency, and paired t-tests for post hoc comparison.
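The shuffle comparison used for single-cell latencies can be illustrated as follows. This is a sketch under several assumptions that the text does not settle: each shuffle splits the pooled trials into two pseudo-conditions of the original sizes (giving 2000 pairs from 4000 latency estimates), the difference is signed (V minus VA), and `latency_fn` is a hypothetical user-supplied function that returns one latency for a group of trials.

```python
import random


def shuffle_p_value(trials_v, trials_va, latency_fn, n_pairs=2000, seed=0):
    """Proportion of shuffled pairs whose latency difference is no less
    than the observed difference latency_fn(V) - latency_fn(VA).

    latency_fn must return a number for any group of trials (assumption).
    """
    rng = random.Random(seed)
    observed = latency_fn(trials_v) - latency_fn(trials_va)
    pooled = list(trials_v) + list(trials_va)
    n_v = len(trials_v)
    count = 0
    for _ in range(n_pairs):
        rng.shuffle(pooled)
        # Two pseudo-conditions with the same trial numbers as the real ones
        diff = latency_fn(pooled[:n_v]) - latency_fn(pooled[n_v:])
        if diff >= observed:
            count += 1
    return count / n_pairs
```

A small p value then indicates that random regrouping of the trials rarely produces a latency advantage for one group as large as the one measured between the V and VA conditions.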
We next analyzed the effect of bimodal stimulation by comparing the sRT in the V-only and VA conditions (Fig 2). As classically reported and resulting from multisensory integration, we observed a strong reduction in the saccade latencies in the VA condition compared to the V-only situation. This reduction was observed at all eccentricities (Anova, 10°: Mk1 F = 231.17, Mk2 F = 66.67, both p = 0; 20°: Mk1 F = 404.76, Mk2 F = 271.45, both p = 0) and at all contrasts of the visual target (Anova, Mk1, 15% F = 516, 55% F = 329.7, both p = 0; Mk2 30% F = 107.3, p = 0, 55% F = 69.38, p = 3.33E-16, 88% F = 117.45, p = 0). On average, when combining all conditions, the decrease in sRT ranged from 10% (Mk1) to 15% (Mk2) when saccades were made toward the VA stimulus. The rule of inverse effectiveness proposes that the higher benefits resulting from multisensory integration should be obtained in sensory conditions of low saliency. Thus we searched for an effect of visual contrast on the reduction of sRT during visuo-auditory saccades. In Mk2, for which data were obtained at 3 different contrasts (30, 55 and 88%), there was a tendency toward a more pronounced shortening of sRT at low contrast. When saccades were made at 20°, sRT in VA conditions were 19% shorter at low contrast (187 ms in V-only vs. 152 ms in VA, p = 6.38E-21) while the decrease was only 14% at high contrast (174 ms vs. 150 ms, p = 7.29E-23). However, we did not replicate these results in the second monkey or in all conditions. In Mk1, at 20°, the reduction was similar at low (11.1% decrease) and high contrast (11.9% decrease). Thus, we observed a constant decrease in sRT at all the visual contrasts used, data which seem to contradict the rule of inverse effectiveness. However, this could be due to the level of training.
When analyzing the data during the first sessions of the behavioral training of Mk1 (not shown), we found a stronger decrease in sRT at low contrast, but at that time the monkey was not performing at an efficient level and his saccade latencies were much longer. This effect disappeared after extensive training over several weeks.
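The percentage reductions quoted for Mk2 at 20° follow directly from the mean sRT values given in the text, as this short check illustrates (the function name is hypothetical):

```python
def percent_reduction(v_only_ms, va_ms):
    """Relative shortening of the VA latency with respect to V-only, in %."""
    return 100.0 * (v_only_ms - va_ms) / v_only_ms


low = percent_reduction(187, 152)    # low contrast, Mk2, 20 deg
high = percent_reduction(174, 150)   # high contrast, Mk2, 20 deg
print(f"low contrast:  {low:.1f}%")   # 18.7%, reported as 19%
print(f"high contrast: {high:.1f}%")  # 13.8%, reported as 14%
```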
The present study is based on three sets of visually responsive single units (total n = 136) recorded in the primary visual area V1 of one monkey (Mk1). Each set of cells was obtained during a single behavioral condition (V/VA active task n = 49; V/VA passive task n = 45; V-only control task n = 42). All cells were recorded in peripheral V1, and most of them (69%) were located in the upper bank of the calcarine sulcus (Fig 1) and had a receptive field located beyond 10° of eccentricity in the lower visual field. Receptive field sizes ranged between 1 and 4° (see ), characteristic of cells recorded in the peripheral representation of V1.
Table 1. Response rates and latency values (± se) of V1 single units recorded during the V/VA active tasks using three different contrast levels.

Contrast              Response rate (spk/s)        Latency (ms)
                      V             VA             V              VA
Low level (n = 17)    33.9 ± 5.0    35.7 ± 5.1     100.7 ± 6.2    96.8 ± 6.2
Mid-level (n = 39)    52.2 ± 4.0    52.7 ± 4.1     64.5 ± 2.5     61.0 ± 2.3
High level (n = 45)   54.4 ± 3.9    54.1 ± 3.8     49.2 ± 1.8     49.0 ± 2.0
Table 2. Response rates and latency values (± se) of V1 single units recorded during the V/VA passive tasks and the V-only control task using a middle (55%) contrast value.

                        V/VA passive task           Visual control task
                        V             VA            Fixation      Saccade
Response rate (spk/s)   29.5 ± 2.6    29.8 ± 2.8    28.7 ± 2.3    29.4
Latency (ms)            65.0 ± 4.7    65.2 ± 4.3    44.0 ± 1.4    43.5 ± 1.5
However, the cell response latency was globally reduced when the auditory stimulus was delivered simultaneously with the visual target (Table 1). At the population level and at each contrast, cell latencies tended to be shorter. This is illustrated in Fig 4 by a leftward shift toward negative values in the distribution of the relative differences (in ms) when comparing the V and VA conditions. This effect reached a statistically significant level only for the 55% middle visual contrast (paired t-test, p = 0.009). In this condition, the mean latency was 64.5 ms for unimodal visual stimulation against 61.0 ms in the VA condition, corresponding to a global decrease of more than 5%.
At 15% contrast, VA stimulation led to latencies similar to those of the V-only condition (96.8 ms vs 100.7 ms respectively), a difference which was not statistically significant (paired t-test, p = 0.43 ns), probably because of a greater variability in the measured latency due to a strong reduction in the cell discharge (see above) when presenting this low contrast visual stimulus.
Finally, at high contrast (88%) the neurons showed very comparable latencies (49.0 in VA vs 49.2 ms in V-only, paired t-test, p = 0.82 ns).
To conclude, we observed that the presentation of an auditory signal simultaneously with a saccadic visual target induced a reduction of the latency of V1 cells that depended on the contrast of the visual target. Furthermore, the discharge rates of the neurons remained unaffected by bimodal stimulation except during visual conditions that approach the perceptive threshold.
Previous studies in the behaving monkey have shown that neuronal responses in striate and extrastriate cortical areas can be modulated by the behavioral meaning of the stimulus [57, 58]. Consequently, we compared the visual responses of a third set of V1 single units (n = 42) during a dual task, a passive central fixation and an active visually guided saccadic task. As explained in the methods, the type of task was indicated to the animal by the color of the fixation point. In this case (Table 2, Fig 6), we did not observe an effect of the task (passive vs. active) either on the frequency discharge (28.7 and 29.4 spk/s respectively, paired t-test, p = 0.25 ns) or on the visual latency (44.0 and 43.5 ms respectively, paired t-test, p = 0.70 ns). This last part suggests that the visuo-auditory interactions that differentially affected the cells in the previous conditions were probably not due to the oculomotor demands of the task in which the monkey was engaged.
The present results demonstrate that in behaving monkeys visuo-auditory interaction can occur at the single cell level at the first cortical stage of processing of visual information, the primary visual cortex V1. Multisensory interactions in V1 are characterized in our experiment by a modulation of V1 responses corresponding to a reduction of the neuronal onset latency. Moreover, this effect was dependent on the perceptual load of the task in which the animal was engaged.
We show that the simultaneous presentation of a sound during a visually guided saccade, induces a reduction of about 10 to 15% in the saccade latency depending on the animal and on the visual stimulus contrast level. Such behavioral improvement resulting from a bimodal visuo-auditory stimulation has been already reported during similar paradigms of spatially oriented behavior in humans [54, 59–61], monkeys [50, 59], carnivores [62, 63] and even in rodents or birds [64, 65]. Numerous studies have established the beneficial effect of bimodal stimulation  when the experimental sensory conditions respect the rules of spatial and temporal congruencies . In these cases, multisensory integration results in perceptual improvements by reducing ambiguity in various tasks, from simple detections to complex discriminations, memory or learning tasks [67–71]. The decrease in reaction times during a bimodal paradigm has been explained by a co-activation system  that violates the race model of independent sensory channels in which the faster modality initiates the motor response. In our study, we did not train the animals to make a saccade toward an isolated auditory cue, so we cannot conclude on the race model. However, we recently reported evidence that such a converging model can account for a shortening in RT in visuo-auditory detection task in the monkey .
Multisensory integration is supposed to obey the rule of inverse effectiveness that proposes a higher multisensory benefit when the unisensory stimuli are weak [62, 73]. We did not observe such effects and the decrease in sRT was identical when comparing visuo-auditory performances at low or high visual saliencies, a result comparable to that recently reported in a similar behavioral study in the monkey. We cannot rule out the possibility that if we had used a weaker auditory stimulus it would have produced a change in bimodal gains, but it is very likely that this lack of inverse effectiveness is due to the fact that our experiments were performed on highly trained monkeys. It has been shown in the monkey that continuous training strongly decreases the saccade latency, probably reducing the potential range of facilitation induced by the mechanisms of multisensory integration.
The delimitation of the polymodal areas associated with multisensory integration was until recently, generally circumscribed to cortical areas in the parietal, frontal and inferotemporal regions of the monkey [5, 38, 75–77]. However, electrophysiological and functional imaging studies in humans have recently revealed that visual, somatosensory or auditory areas defined originally as unimodal can be the locus of interactions between other non-specific sensory modalities [13, 16–18, 78–81]. In the monkey, electrophysiological recordings have confirmed that unimodal areas, located at the first stages of the sensory processing hierarchy, can integrate information from a different sensory channel .
Until recently, this heteromodal activity had been observed in primates principally in the auditory system. For example, recordings of neuronal activity in the auditory cortex have revealed visual and somatosensory responses in the associative areas of the belt and parabelt [25, 27, 28, 82, 83]. In the primary auditory cortex, electrophysiological recordings (current source density) suggest that non-auditory events have a rather modulatory influence and do not drive activation at the spiking level. For example, proprioceptive information (eye position) can induce changes in the strength of the neuronal discharge in response to a spatially defined sound. Furthermore, it has been proposed that the effect of non-auditory stimuli on AI activity is exerted through a modulation of cortical oscillations to allow either enhancement or depression, depending on the timing of the bimodal stimulation. Our results are in agreement with this notion of a modulatory effect and we did not find any auditory response in the single units we tested. The lack of a pure auditory response in spite of an auditory modulation of the visual latency suggests that in V1, multisensory interaction could be a subthreshold phenomenon, as hypothesized for multisensory interactions in the auditory cortex [24, 83]. Because the auditory system is activated faster than the visual one, the auditory stimulus can depolarize the membrane potential of the visual V1 cells, inducing an earlier spiking response compared to the visual-only condition. Such multisensory interaction on cortical sensitivity has been recently suggested by TMS studies in humans at both perceptual and behavioral levels.
The main visuo-auditory effect we observed, was a shortening of the visual latencies but only in specific behavioral situations. All together these results suggest that in primates, multisensory integration mechanisms differentially affect sensory responses when they occur in primary or secondary sensory areas [11, 24, 87–90].
Most of the neuronal rules of interactions between sensory modalities have been established in the Superior colliculus (SC) which is considered to be the key structure for multisensory integration . In the SC, the convergence of different sensory modalities is reflected mainly by an enhancement in neuronal activity in response to a combined multimodal stimulus when spatial and temporal congruencies are respected [91–94]. A modulation (enhancement or depression) of the strength of the unimodal response by bimodal stimulation has been also reported in higher order polymodal areas of the monkey such as the prefrontal, parietal or inferotemporal areas [75, 77, 95, 96] and even in the primary auditory cortex [24, 84, 97]. However, the proportion of neurons showing enhancement or depression varies strongly across cortical areas. When presenting middle or high contrast visual stimuli, we did not observe such an effect on the response rate in the large sample of visual cells recorded in V1, irrespective of the behavioral paradigm, suggesting that the neuronal mechanisms of multisensory integration are based on rules which are specific to each individual area. However, at low (15%) contrast, the slight increase of the responsiveness of V1 neurons in the AV condition suggests that the rule of inverse effectiveness could apply to V1. We cannot exclude that such effect on the response rate would be more prominent for visual stimuli of even lower perceptive saliency.
In addition, as described in the methods, the visual and auditory stimuli are only spatially congruent in the horizontal azimuth dimension. While the receptive fields of the auditory neurons are large and probably cover the offset that separates the two stimuli, one can speculate that a better spatial congruency between the auditory and visual stimuli would lead to greater effects on V1 cells during bimodal stimulation.
Finally, a rule common to several cerebral loci of multisensory integration is an effect on the response onset latency [75, 99]. We observed that in the active task, the visual latency was reduced by about 5%, a result very similar to that reported in the SC [50, 99]. This decrease in neuronal response onset, which is in line with a shortening of the visuo-auditory BOLD response in human V1 assessed by fMRI, could participate in the speeding up of the behavioral saccadic responses during bimodal presentation (see below, ).
As developed in the introduction, previous anatomical studies have established that sensory fusion was processed through the convergence of the different sensory channels at the level of associative cortical areas [1, 2]. The numerous reports of multisensory interactions at a low level of sensory processing (present data, [24, 27, 84]), acting on early sensory responses, favor a modulatory influence through heteromodal connections directly linking unisensory areas [19–21]. However, such a modulatory effect could also originate from non-specific thalamic nuclei that integrate different sensory processing. A cortico-thalamic loop that bypasses cortico-cortical connections could thus support fast transmission and provide multisensory and sensory-motor information to unimodal areas [22, 101].
In the alert monkey, we have shown that visual neurons in V1 showed a decrease in response onset latency when the visual stimuli were presented simultaneously with a sound. However, our main result is that this effect on the visual responses is dependent on the behavioral context: we did not see any changes in V1 neuron latency in a passive situation when the monkey did not perform an oriented saccade toward the spatial location where the auditory stimulus was presented. It could be argued that this difference simply reflects a process of visual spatial attention due to the oculomotor task and not a modulation specifically due to the integration of the auditory stimulus at the neuronal level. In V1 and extra-striate areas, it has been shown that attentional mechanisms [103, 104] or behavioral relevance [57, 105, 106] can affect the characteristics of the neuronal visual response such as the discharge rate, the latency or the neuronal selectivity. We did not observe a change in the cell firing rate when comparing neuronal activity in a visual passive and active task without any auditory stimuli. While our comparisons are performed on different sets of neurons, this strongly suggests that the shortening in latency depends specifically on the bimodal conditions in a particular behavioral situation, and not on visual attentional processes linked to the oculomotor demand of the task. However, we are aware that the three paradigms differ in terms of attentional load, but in both passive and active AV tasks, the auditory stimulus can involve similar mechanisms of exogenous attention. The distinction between exogenous spatial attention and crossmodal interactions (or integration) is still an open question, as both mechanisms result in an improvement in sensory perception.
Our results are in complete agreement with studies in humans and animals showing different patterns of multisensory integration according to the behavioral context. First, in humans, the detection or discrimination of bimodal objects, as well as the perceptual expertise of subjects, differentially affect both the temporal aspects and the cortical areas at which multisensory interactions occur [18, 108]. Similarly, the index of multisensory integration computed from the activity of neurons in the deep layers of the superior colliculus is also dependent on the oculomotor behavior of the animal. Finally, while heteromodal visual or somatosensory responses can be obtained in the auditory cortex of a passive or anaesthetized monkey [24, 27, 87], some authors have reported that some visual responses can be related to the task in which the animal is engaged.
Altogether, these findings suggest that the neuronal network involved in multisensory integration, as well as its expression at the level of the neuronal activity, is highly dependent on the perceptual task in which the subject is engaged. Thus multisensory interactions can underlie processes ranging from active perception to attentional mechanisms. This hypothesis is supported by the anatomical pattern of heteromodal connections that directly link areas involved in different modalities. In the monkey, such heteromodal connections either link specific sensory representations (retinotopy or somatotopy) of interconnected areas or specific functional regions in each modality [19, 21, 110].
Such an influence of the perceptual context on the neuronal expression of multisensory interaction has further implications for the phenomenon of cross-modal compensation that occurs after sensory deprivation in animals or humans [112, 113]. In blind subjects, the efficiency of somatosensory stimulation in activating the visual cortex is maximal during an active discrimination task (Braille reading). This suggests that the mechanisms of multisensory interaction at early stages of sensory processing and the cross-modal compensatory mechanisms are probably mediated through common neuronal pathways.
The effect of an auditory stimulus on V1 responses is probably supported by the direct projections that originate in the auditory (A1 and belt) and multimodal (STP) areas and target V1 [20, 21]. As discussed previously, the auditory projections to V1 originate mainly from the dorsal auditory stream, specialized in processing spatial information, and reach the peripheral representation of V1. The characteristics of this heteromodal connectivity suggest that this pathway is probably involved in rapidly orienting the gaze toward a sound source located in the peripheral field, for which visual acuity is poor. In situations of spatial and temporal congruency, multisensory integration has been shown to facilitate the responses of neurons of the superior colliculus (SC) [115, 116], both at the sensory and motor levels [50, 59]. Consequently, multisensory integration at the collicular level allows a direct influence on motor output, because the SC is directly involved in the control of oculomotor behavior. A large number of visual areas project directly to the SC, but in the monkey the main input originates from the primary visual cortex, which constitutes about 20 to 30% of the SC cortical afferents. Consequently, the decrease in V1 response latencies during bimodal stimulation can act directly on the response of cells in the SC and speed up the initiation of the saccadic command by the brainstem oculomotor nucleus. However, because the reduction in V1 latencies (5% decrease) does not match the amount of facilitation at the saccadic level (10 to 15% reduction in saccade latency), one should consider other mechanisms, outside V1, that transfer the facilitation from the sensory to the motor level.
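As a rough check of this mismatch, the relative V1 latency reduction can be recomputed from the mean latencies reported for the saccadic task at midlevel contrast (64.5 ms in the visual condition, 61.0 ms in the visuo-auditory condition). The symbols below are introduced here for illustration only and do not appear elsewhere in the paper.

```latex
% L_V  : mean V1 response latency, visual-only condition (64.5 ms)
% L_AV : mean V1 response latency, visuo-auditory condition (61.0 ms)
\[
  \frac{L_{V} - L_{AV}}{L_{V}}
  = \frac{64.5 - 61.0}{64.5}
  = \frac{3.5}{64.5}
  \approx 0.054
  \quad (\text{about } 5\%)
\]
```

This is roughly one half to one third of the 10 to 15% reduction in saccade latency, which is why the V1 effect alone seems unlikely to account for the behavioral facilitation.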
A remaining question is whether the visuo-auditory interactions reported here at the level of V1, and expressed as a reduction in neuronal latency, represent a real multisensory integration or only a sensory combination. In our protocol, the auditory and visual stimuli are not redundant signals, as the sound carries no information needed to perform the task; in this sense, we should refer to bimodal interactions in V1. However, at the behavioral level, the observed shortening of saccade latency in the bimodal conditions is a phenomenon generally attributed to multisensory integration. It is possible that the reduction in latency, especially because it mainly affects the longer latencies, induces a higher temporal coherence of the visual responses across V1. Such processing has been suggested to increase cortical synchronization, which in turn enhances the speed and reliability of the visual responses, and could thus contribute to the reduction of reaction times in bimodal conditions.
To conclude, our results provide further evidence of the various roles of monkey area V1 in visual perception. Area V1 receives feedback projections from a large number of cortical areas. V1 is connected with areas located at higher levels of the visual processing hierarchy [122, 123], with non-visual sensory areas as described above, as well as with the area prostriata, which might constitute a gateway to the motor system. This connectivity pattern could be the anatomical support for the modulation of V1 neuronal responses by higher cognitive processes such as attentional mechanisms [126, 127] or memory tasks [128, 129]. The present results suggest that multisensory integration should be added to the list of cognitive processes performed in V1.
We thank F. Lefevre and S. Aragones for care of the animals, C. Marlot for her valuable work on the bibliographic database, and L. Reddy for corrections to the manuscript. Grant support: the CNRS Atipe program (YW and BP) and the CNRS Robea program (YT, SC, BP).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.