Summary of results
Various left-hemispheric perisylvian structures known to support the perception of spoken language showed, as expected, significant hemodynamic responses to the test materials of this study in both sighted and blind subjects. Furthermore, the left precentral gyrus and the cerebellum displayed significant BOLD signal changes under both forward speech conditions as well as under reversed moderately fast speech. These observations are in line with clinical and functional imaging data pointing to a contribution of these structures – under specific circumstances – to auditory speech perception. For example, the cerebellum has been found to engage in the encoding of specific temporal-linguistic information during word identification tasks .
Similar to a preceding single-case study , individuals with vision loss exhibited significant hemodynamic activation of both right-hemispheric V1 and left-hemispheric (contralateral) FG, covarying with their ultra-fast spoken language comprehension capabilities. More specifically, the BOLD signal changes within these two areas showed a positive correlation with individual ultra-fast speech perception skills and evidently depended upon semantic, syntactic, and/or phonological content, since time-reversed (backward) speech stimuli were associated with reduced hemodynamic activation. Visual inspection of the data, furthermore, suggests a more extensive and more bilateral occipital response pattern in the three early-blind as compared to the late-blind individuals. In addition, a positive correlation between BOLD signal magnitude and ultra-fast speech understanding emerged within Pv on either side.
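To make the brain–behavior covariation concrete, the following minimal Python sketch illustrates an ROI-level correlation between percent BOLD signal change and behavioral performance. All subject values are hypothetical placeholders; the study's actual preprocessing and statistical pipeline is not reproduced here.

```python
# Minimal sketch of an ROI-level brain-behavior correlation of the kind
# reported above. All numbers are hypothetical placeholders; the study's
# actual fMRI preprocessing and statistics are not reproduced here.
import numpy as np
from scipy import stats

# Percent BOLD signal change in a region of interest (e.g., right V1),
# one value per blind participant (hypothetical).
psc_right_v1 = np.array([0.12, 0.45, 0.38, 0.05, 0.51, 0.29, 0.41, 0.18])

# Behavioral score: proportion of correctly repeated words under the
# ultra-fast speech condition (hypothetical).
prop_correct = np.array([0.20, 0.72, 0.65, 0.10, 0.80, 0.48, 0.70, 0.33])

r, p = stats.pearsonr(psc_right_v1, prop_correct)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```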
Interactions between left FG and left perisylvian cortex in blind listeners during ultra-fast speech perception
FG is embedded into the so-called ventral route of the central-visual system, which engages especially in object recognition (e.g., ), but may contribute to phonological operations as well (e.g., ). First, functional imaging studies have repeatedly found this region to support pre-lexical stages of reading tasks and, more specifically, to “house” visual word forms . Among other things, FG has been observed to respond to spoken lexical items even in sighted people (e.g., ). Second, impaired speech sound processing in children with reading difficulties seems to be associated with diminished connectivity between FG and frontal language areas . Conceivably, thus, left FG cooperates with the posterior and anterior perisylvian “language zones” – more specifically, ipsilateral IFG and aSTS as well as bilateral pSTS – during ultra-fast speech comprehension. Clinical and functional imaging data indicate a contribution of left IFG to spoken language perception, at least in cases where more demanding segmentation processes and/or working memory operations are involved . Left pSTS has been found to respond to acoustic signals conveying phonetic-phonological information, irrespective of intelligibility, whereas hemodynamic activation of the anterior part of the same sulcus is restricted to meaningful verbal stimuli [22, 31]. Whereas both phonemic and non-phonemic sound structures elicit BOLD signal changes within bilateral pSTG, responses of left-hemispheric anterior and middle STS are restricted to familiar consonant-vowel syllables . Furthermore, pSTS on either side has been found to be associated with phonological aspects of speech recognition . Against this background, the observed temporal lobe activation pattern might be associated with the conveyance of information into higher-order supramodal cortical structures such as (i) the left temporo-parieto-occipital junction, supporting meaning-based representations (auditory-to-meaning interface), and (ii) left-hemispheric frontal areas, providing access to speech production units and phonological working memory (auditory-motor interface) (see  for a review). Presumably, left FG, acting as a secondary phonological area, expands the phonological network to cope with the higher processing demands of ultra-fast speech perception.
Lateralized, i.e., predominantly right-hemispheric, hemodynamic activation of V1 in blind listeners during ultra-fast speech perception
The present investigation supports the suggestion that visual cortex does indeed contribute in a causal manner to enhanced auditory speech processing skills in blind subjects: first, the capability of ultra-fast spoken language comprehension covaried with the strength of hemodynamic activation of right V1; second, the involvement of this structure was considerably reduced during listening to reversed, i.e., non-meaningful, test materials.
A series of studies indicates that age of blindness onset significantly constrains the capacity for structural/functional reorganization of central-visual areas in humans. More specifically, only individuals suffering from congenital blindness – lacking any stimulus-driven elaboration of the visual system – appear to be able to “mold” occipital cortex in a fundamentally different manner from sighted subjects . It is, furthermore, still controversial to what extent late-onset visual deficits may induce cortical reorganization in terms of functional cross-modal plasticity. For example, neuroimaging studies point to a decline of those capabilities after an age of 14–16 years [21, 36]. Likewise, Wan and colleagues  found early, but not late (≥ 14 years), vision loss to enhance auditory perception during non-speech tasks. The present investigation did not find any significant correlations between the time of onset or the duration of blindness, on the one hand, and the ability to understand ultra-fast speech or the hemodynamic activation of visual cortex, on the other. Thus, the extent of recruitment of the central-visual system appears to correlate primarily with behavioral performance rather than with the age at vision loss (see [38, 39] for similar data). Nevertheless, a significant impact of this clinical parameter upon cross-modal fMRI effects cannot securely be excluded, since both high-performing early-blind participants (performance > 60% correctly repeated words), but only a single skilled late-blind individual (1 out of 5 subjects), displayed bilateral occipital responses. By contrast, a right-lateralized distribution emerged in most late-blind individuals. Similarly, Braille reading has been reported to induce responses of the visual cortex on either side in early-blind subjects, whereas late-blind individuals display an activation pattern restricted to the hemisphere ipsilateral to the reading hand . Although the rather small and heterogeneous sample of blind subjects in the present study precludes any firm conclusions, ultra-fast speech perception does not appear to depend upon major rewiring of visual cortex comparable to the reorganizational processes bound to congenital blindness. Rather, this perceptual capability seems associated with task-dependent cross-modal functional plasticity based, conceivably, on the engagement of existing anatomical structures.
In principle, recruitment of – predominantly right-hemispheric – occipital cortex during ultra-fast speech comprehension could either reflect early, i.e., signal-related, computational operations or be bound to higher-order processing stages succeeding semantic speech encoding. Previous studies found speech- or language-related tasks such as verbal memory or verb generation tests to yield, as a rule, bilateral hemodynamic activation of occipital cortex, with more pronounced left-sided responses [14, 41]. Hemodynamic activation of primary visual areas on either side has also been documented in blind individuals listening to meaningful as well as meaningless sentences . Again, Braille reading yielded bilateral occipital responses, slightly enhanced within the hemisphere contralateral to the “reading” hand (, see also ). By contrast, the hemodynamic activation of V1 observed in blind listeners during ultra-fast speech perception displayed strong lateralization toward the right side. Thus, the distinct informational cues of the acoustic signal that facilitate speech perception under time-critical conditions might be predominantly processed within the non-language-dominant hemisphere. Short spectro-temporal “segments” of the acoustic signal, extending across time intervals of a few tens of milliseconds, encode most of the information related to single speech sound categories such as the various consonants of a language system (e.g., ). Important acoustic features within this domain are, e.g., the formant transitions and the voice onset time of stop consonants. It is well established that the extraction of these segmental aspects of spoken language mainly depends upon left-hemispheric perisylvian “language zones”, including anterior and posterior aspects of the superior temporal lobe and posterior ventro-lateral frontal cortex [30, 44, 45]. Besides these segmental aspects, the acoustic speech signal conveys suprasegmental (prosodic) information such as the intonation of an utterance (“sentence melody”), related to the fundamental frequency contour of the speech signal. In addition, prosodic information encompasses the specification of temporal structures such as rhythmic and metric patterns [46, 47]. In contrast to the left-lateralized encoding of the segmental level of verbal utterances, various sources of evidence indicate a predominantly right-hemispheric representation of suprasegmental/prosodic speech information (e.g., ). In the case of formant-synthesized verbal utterances, such as the test materials used in the present study, the prosody of spoken language is more or less restricted to syllable timing (syllabic rhythm) as reflected in the speech envelope, i.e., the low-pass-filtered intensity contour of the acoustic signal. A recent whole-head magnetoencephalography (MEG) study, including stimulus materials (ultra-fast and moderately fast speech) similar to those of the present fMRI investigation, provides additional evidence for a direct translation of the acoustic correlates of syllable structure into electrophysiological brain activity . Most notably, electrophysiological recordings found the speech envelope to be predominantly processed within the right hemisphere . Conceivably, the observed occipital lateralization effects during ultra-fast speech perception in blind subjects indicate that V1 engages in the analysis of the speech envelope or, more specifically, syllabic rhythm.
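To make the notion of the speech envelope concrete, the following Python sketch extracts a low-pass-filtered intensity contour from a speech waveform. The Hilbert-based rectification, the 20 Hz cutoff, and the file name are illustrative assumptions, not the processing chain of the cited studies.

```python
# Illustrative extraction of the speech envelope, i.e., the low-pass-
# filtered intensity contour of the acoustic signal. Method, cutoff, and
# file name are example choices, not those of the cited studies.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, hilbert

fs, x = wavfile.read("utterance.wav")  # hypothetical mono speech recording
x = x.astype(np.float64)
x /= np.max(np.abs(x))                 # amplitude normalization

# Instantaneous amplitude via the analytic signal (rectification step).
amplitude = np.abs(hilbert(x))

# Low-pass filtering at 20 Hz keeps the syllabic modulation (up to the
# ~16 syl/s ultra-fast rate discussed in the text) while discarding the
# fine spectral detail that carries segmental information.
b, a = butter(4, 20.0 / (fs / 2), btype="low")
envelope = filtfilt(b, a, amplitude)
```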
Against this background, activation of the central-visual system might also be expected in the case of unintelligible reversed speech. However, Ahissar and colleagues  reported a significant correlation between signal-driven syllable-related brain activity of auditory cortex and speech comprehension. This observation could be explained by top-down processes, bound to expectations related to the sound structure of the incoming signal, which interact with the initial processing of the auditory input. Assuming, thus, right-lateralized early prosodic processing, occipital pole responses correlating with ultra-fast speech comprehension might reflect signal-driven rather than higher-order comprehension processes.
Mechanisms of ultra-fast speech perception: facilitated verbal consolidation under time-critical conditions
Blind individuals have been found to outperform sighted subjects in tasks requiring temporal order judgments of backward-masked tone stimuli, particularly in the case of brief intervals (40 ms) between the respective auditory events . This condition resembles, by and large, ultra-fast speech, since each syllable can be expected to act as a potential masker of the preceding one: at a rate of 16 syl/s, successive syllables follow one another at intervals of roughly 60 ms. Stevens and Weaver  attributed the increased temporal resolution of non-speech acoustic events in blind subjects to “perceptual consolidation”, i.e., higher-order processing stages such as auditory working memory, rather than to the analysis of spectro-temporal signal characteristics. These suggestions might provide a basis for explaining the observed mesiofrontal engagement in the perception of accelerated verbal utterances. Besides visual cortex, hemodynamic activation of left SMA was found to covary with the ability to comprehend ultra-fast spoken language. Several studies indicate that this mesiofrontal area engages in the syllabic organization of verbal utterances during speech production [54–56]. On a broader scale, SMA appears to support timing processes across various sensorimotor and cognitive domains [57, 58]. Furthermore, clinical as well as experimental studies point to a contribution of SMA also to speech perception and verbal working memory [59–63]. Since the verbal encoding of longer stretches of speech, such as the test materials of the present study, must be expected to engage short-term memory processes (see ), and since SMA appears to act as a platform for timing operations related, among other things, to verbal working memory functions, right V1 might provide a “fast track” channel conveying temporal information on syllable structure directly from primary auditory areas via left SMA into verbal working memory. More specifically, the cooperation of primary auditory areas, right V1, and left SMA could facilitate a signal-driven timing mechanism for the transformation of the acoustic signal into a stable (consolidated) verbal code under time-critical conditions.
The role of the pulvinar during ultra-fast speech perception: synchronization of central-visual and central-auditory areas
Besides several cortical regions, ultra-fast speech comprehension capabilities also covaried with the hemodynamic responses of Pv on either side. Animal data obtained in tree shrews indicate that these thalamic nuclei project to V1 as well, in addition to higher-order areas of the central-visual system . As concerns primates, at least some Pv subcomponents are embedded into reciprocal connections with both striate and extrastriate areas (e.g., ). In consideration of this network architecture, the respective parts of Pv have been assumed to support attentional processes operating within the visual domain. Furthermore, tract-tracing studies in monkeys found both the ascending auditory pathways and the optic tracts to send convergent collateral fiber tracts to deep layers of the superior colliculus, and the respective target neurons, in turn, project via Pv to auditory as well as visual cortex . Among other things, the pulvinar contributes to the detection of temporo-spatial coincidences of audiovisual signal configurations . In blind subjects, Pv might thus help to synchronize – driven by acoustic input – striate cortex with the central-auditory system during ultra-fast speech perception, based upon cross-modal subcortical pathways that, in sighted individuals, subserve audiovisual coincidence detection and the control of visual attention. Given, furthermore, direct anatomical connections between auditory and visual areas [69–71], early multisensory convergence processes must also be assumed at the cortical level – as demonstrated, e.g., by means of transcranial magnetic stimulation . These considerations suggest that the observed hemodynamic responses within bilateral Pv and primary visual areas reflect early (thalamo-cortical) rather than later (cortico-cortical) stages of ultra-fast speech processing.
Contribution of visual cortex to the perception of time-compressed speech in normal subjects
In principle, speech perception represents an audiovisual process, and under difficult acoustic conditions lip reading may considerably improve spoken language understanding. It must be expected, thus, that the visual system encompasses – to some extent – preconfigured connections with the auditory system, providing a basis for interactions between the two modalities. Indeed, a recent diffusion tensor imaging (DTI) study evaluating white matter parameters in children found inter-subject differences in fractional anisotropy to correlate with the comprehension of time-compressed speech : moderately manipulated signals (40% compression) yielded these effects in white matter areas adjacent to audiovisual association cortex and posterior cingulate gyrus, while a greater degree of compression resulted in changes of tracts adjoining dorsal and ventral prefrontal areas.
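As an aside, time-compressed speech stimuli of the kind used in such experiments are typically generated with a phase vocoder, which shortens duration while preserving pitch. A minimal sketch using librosa follows; the file name and the mapping of “40% compression” to playback at 60% of the original duration are our assumptions, not the cited study's stimulus pipeline.

```python
# Minimal sketch: generating time-compressed speech with a phase vocoder.
# time_stretch shortens duration without altering pitch. Interpreting
# "40% compression" as playback at 60% of the original duration is an
# assumption, as is the input file name.
import librosa
import soundfile as sf

y, sr = librosa.load("sentence.wav", sr=None)  # hypothetical input file

compression = 0.40                  # remove 40% of the original duration
rate = 1.0 / (1.0 - compression)    # ~1.67x faster playback

y_fast = librosa.effects.time_stretch(y, rate=rate)
sf.write("sentence_compressed.wav", y_fast, sr)
```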
A previous fMRI study reported that compressed, as compared to normal, speech elicits a “convex” distribution pattern of hemodynamic responses within IFG, with BOLD signal changes paralleling the extent of this manipulation as long as intelligibility of the verbal utterances was preserved . Similarly, sighted subjects in the present investigation showed reduced IFG activation while listening to ultra-fast speech. A further fMRI experiment revealed that learning to understand time-compressed speech is associated with increased activation of left and right auditory association cortices as well as left ventral premotor cortex, suggesting that speech perception involves the integration of multi-modal data sets, mapping acoustic patterns onto articulatory motor plans . At very high syllable rates, sighted subjects evidently do not recruit the visual system in order to enhance speech comprehension. Furthermore, invasive electrophysiological measurements during application of time-compressed speech revealed the speech envelope, up to frequencies of 15 Hz, to be well represented at the level of auditory cortex, suggesting that the temporal resolution of primary auditory cortex is not the limiting factor for ultra-fast speech comprehension . Similarly, our group found significant MEG phase locking to envelope features of ultra-fast verbal utterances (16 syl/s) [49, 76]. In this latter study, blind individuals showed an additional phase-locked component bound to right visual cortex that was absent in sighted subjects. Although primary auditory cortex should, in principle, be able to track the speech envelope, the extracted information might not suffice to trigger phonological processes during lexical encoding at the level of working memory.
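As a toy illustration of the envelope phase locking measured in such MEG studies, one can quantify the coherence between a neural time series and the stimulus envelope. The sketch below uses purely synthetic signals and scipy; it is not the analysis pipeline of the cited work.

```python
# Toy illustration of envelope tracking: coherence between a simulated
# neural signal and a 16 Hz "syllabic" stimulus envelope. Entirely
# synthetic; not the cited MEG analysis pipeline.
import numpy as np
from scipy.signal import coherence

fs = 1000                           # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)        # 10 s of data

# Stimulus envelope dominated by a 16 Hz syllabic modulation.
envelope = 1.0 + np.sin(2 * np.pi * 16 * t)

# Simulated neural response: a phase-locked copy of the envelope + noise.
rng = np.random.default_rng(0)
neural = 0.5 * envelope + rng.normal(scale=1.0, size=t.size)

f, cxy = coherence(envelope, neural, fs=fs, nperseg=2048)
print(f"coherence near 16 Hz: {cxy[np.argmin(np.abs(f - 16))]:.2f}")
```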
A recent fMRI study delineating the “bottleneck” of time-compressed speech processing found that higher stages of language processing, associated with “buffer regions” within left ventrolateral frontal cortex/anterior insula, precentral gyrus, and mesio-frontal areas, represent the limiting factor of spoken language comprehension . Our data suggest that visual cortex must also be considered an essential prerequisite for enhanced speech encoding at high syllable rates. Altogether, sighted subjects appear unable – or at least do not “attempt” – to recruit the central-visual system in order to speed up comprehension of spoken language. Their occipital cortex does respond to auditory stimulation but, given the negative values of percent signal change, appears to be “actively suppressed” during attempts to understand ultra-fast speech. Against this background, the aforementioned bottleneck within the frontal language network should represent the upper limit of spoken language understanding. Blind subjects might be able to circumvent these constraints, based upon the recruitment of an additional timing mechanism bound to interactions between pulvinar, auditory/visual cortex, and SMA.
Limitations of the study
The present study did not find onset of vision loss to pose major constraints upon ultra-fast speech perception capabilities. However, larger, well-documented subject groups are required to further corroborate these findings and to identify other clinical factors, such as disease duration, with a possible impact upon the recruitment of central-visual structures during auditory language comprehension. Furthermore, intra-individual long-term studies are needed to track the time course of the cerebral reorganization processes associated with the acquisition of ultra-fast speech perception skills and to determine to what extent vision loss represents a necessary precondition for this capacity. In order to further delineate any differential task-dependent cross-modal reorganization patterns in subjects with early and late vision loss, a larger sample of early-blind individuals has to be recruited.