In the current study, we set out to investigate how the intelligibility of connected speech is reflected in behavioral measures and in the concomitant activity of the auditory cortex and surrounding brain areas. By varying the intelligibility of the stimuli while keeping their acoustic features constant, the experimental design allowed us to tentatively identify cortical processing related to speech comprehension. Acoustically distorted sentences, which were initially largely unintelligible (a 30% subjective intelligibility rating), were perceptually changed by presenting intact, undistorted versions of the sentences. Upon a second presentation of the acoustically distorted versions, their intelligibility increased markedly, up to 80%. These perceptual changes were reflected in the transient and sustained activation of the auditory cortex and surrounding brain areas.
In the gradiometer analyses, local activity of the auditory cortex at 100 ms, as indexed by the N1m response, was sensitive to the acoustic structure of speech: the distorted stimuli elicited stronger activation with an earlier peak latency than the undistorted stimuli. An increase in response amplitude and a decrease in latency were also observed at around 200 ms, in the P2m response. The amplitude and latency effects of the P2m were substantially more pronounced in the right hemisphere than in the left. These findings indicate that transient activity of the auditory cortex is sensitive to the acoustic properties of sound during the early (up to 300 ms) processing stages of connected speech, and that the right hemisphere is more sensitive to acoustic variability than the left. The initial transient responses were followed by a sustained response, which arose at around 300 ms and appeared to consist of an early (300-1000 ms) and a late (1000-3000 ms) phase. The early phase was more prominent in the left hemisphere and increased in amplitude when subjects attended to the stimuli. Compared to the preceding transient activity, the sustained activation was less sensitive to acoustic distortion of speech.
In the MNE analyses, the auditory cortex and surrounding areas exhibited divergent, bilateral activity patterns associated with acoustic feature processing and speech intelligibility. During the N1m time range, an increase in cortical activity due to stimulus degradation was observed in regions extending from the superior temporal gyrus (auditory cortex; CST in the current notation) to the inferior parts of the postcentral gyrus (CIP). Interestingly, a number of areas within the temporal cortex were sensitive to speech intelligibility during the P2m time range, with the intelligible stimuli - both distorted and undistorted - resulting in stronger activity than the unintelligible stimuli. This activation encompassed the auditory cortex, the inferior frontal gyri (including Broca’s area; AST), the anterior part of the superior temporal gyrus (AIT), and the posterior part of the inferior temporal gyrus (PIT). During the early phase of the SF (300-1000 ms), the auditory cortex was more active in response to the distorted than the undistorted sentences, regardless of their intelligibility. In contrast, cortical activity in the posterior parts of the superior temporal gyrus (including Wernicke’s area; PST) was stronger only during intelligible speech, regardless of whether the stimulus material was acoustically intact or distorted.
In the present experiment, the stimuli were distorted by using amplitude quantization, which has been shown to substantially decrease the intelligibility of isolated speech sounds (see, e.g. [18, 43]). This was also the case in the current study, as the distorted sentences were initially very difficult to understand. However, after the subject was exposed to the undistorted versions of the sentences, the comprehensibility of the distorted sentences increased considerably. It is unlikely that the intelligibility effect seen in both behavioral and brain measures is due solely to the repetition of the distorted stimuli, given that the gap between repetitions (i.e., between Sessions 1 and 3) was around 20 minutes. This time span makes it improbable that the subject could have been drawing on any echoic or short-term memory resources. Instead, the increase in comprehension was most likely caused by top-down mechanisms utilizing the long-term memory representations which were instantly activated (or primed; e.g. ) during listening to the intact versions of the stimuli. Similar changes in the perception of acoustically identical speech-like stimuli have also been observed using noise-vocoded sentences [5, 22] and sine-wave speech stimuli . However, in these cases the perceptual changes were brought about through extended training sessions, whereas in the current context the effects were immediate and already observable after a single presentation of the undistorted versions of the stimuli. Thus, depending on the experimental setup, it now appears possible to study both the brain mechanisms of perceptual learning occurring over a long time scale and the rapid activation of linguistic memory representations.
The changes in the acoustic structure of the speech stimuli brought about by distortion were reflected in both the transient and sustained activation patterns of the auditory areas. In contrast, the temporal regions anterior and posterior to auditory cortex (area CST) were insensitive to degradation. The observed increase in the amplitude of the transient responses is in line with earlier results employing the same distortion method [17–19]. These studies have demonstrated that the amplitude increase of the N1m and P2m responses is related to an increase in harmonic frequencies in the signal spectrum brought about by quantization. According to this explanation, the additional harmonics activate a larger number of neurons involved in the pitch extraction process. In the current study the latency of the transient responses was also affected by the distortion, with earlier N1m and P2m latencies for the distorted sentences. This finding deviates from our earlier results using isolated speech sounds (~200 ms vowel sounds), for which the response latencies remained unchanged when the stimuli were distorted. One reason for these differences may lie in the experimental design: in previous studies by Miettinen et al. [17–19], short-duration isolated vowels were repeated at a fast rate whereas in the current case long-duration sentences with a complex, continually evolving spectral structure were presented with intervening long silent periods. Similar latency results were recently reported by Obleser and Kotz , who found that the N1m response peaks earlier and is larger in amplitude for distorted sentences than for their undistorted counterparts.
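The link between amplitude quantization and additional harmonic content can be illustrated with a minimal sketch. It assumes a uniform quantizer and uses a 100 Hz pure tone in place of speech; the function names, the 1-bit depth, and the sampling rate are illustrative assumptions and do not reproduce the stimuli or the quantization settings of the study.

```python
import numpy as np

def quantize(signal, n_bits):
    """Uniformly quantize a signal in [-1, 1] to 2**n_bits amplitude levels.

    Assumption: a simple mid-rise uniform quantizer, chosen for illustration.
    """
    levels = 2 ** n_bits
    step = 2.0 / levels
    # Map each sample to the index of its amplitude bin, then to the bin centre.
    idx = np.clip(np.floor((signal + 1.0) / step), 0, levels - 1)
    return (idx + 0.5) * step - 1.0

def energy_outside_fundamental(signal, fs, f0):
    """Fraction of non-DC spectral energy that lies outside the f0 bin."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    fundamental = np.argmin(np.abs(freqs - f0))
    total = spectrum[1:].sum()  # skip the DC bin
    return 1.0 - spectrum[fundamental] / total

fs, f0 = 16000, 100            # illustrative sampling rate and tone frequency (Hz)
t = np.arange(fs) / fs         # one second of samples
tone = 0.9 * np.sin(2 * np.pi * f0 * t)

clean = energy_outside_fundamental(tone, fs, f0)
distorted = energy_outside_fundamental(quantize(tone, 1), fs, f0)
print(f"energy outside fundamental: clean={clean:.3f}, quantized={distorted:.3f}")
```

In this toy case the coarsely quantized tone approaches a square wave, so a sizeable share of its energy moves from the fundamental to higher harmonics, whereas the intact tone has essentially none there.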
In the present experiment, the auditory cortex was highly responsive to distortion of speech, which is consistent with prior hemodynamic studies showing that the core auditory areas are sensitive to acoustic differences in speech stimuli [21–28]. The regions surrounding the auditory cortex, in turn, were sensitive to the intelligibility of speech, with stronger activation elicited by intelligible speech regardless of whether the stimulus material was distorted. These findings are congruent with the above fMRI results, in particular with those by Okada et al. , who observed a bilateral sensitivity of both the anterior and posterior superior temporal regions to speech intelligibility. Importantly, we observed that, already during the P2m time range, areas in the vicinity of the auditory cortex were sensitive to speech intelligibility as well (see Figure 8). This intelligibility effect, observable presumably because of the temporal resolution of the MEG, might reflect the influence of top-down feedback from higher-order cortical areas on the activity of auditory cortex. Similar findings have also been reported by Wild et al.  and Sohoglu et al. , who demonstrated that prior expectations of speech content modulate the activity of auditory cortex during listening to distorted speech.
The novel experimental paradigm introduced here points to several interesting possibilities for future research. Firstly, one should keep in mind that the current intelligibility effects in cortical activity were observed in the passive condition, which always followed the active condition, and it therefore remains to be clarified whether there were carry-over effects from one to the other. This issue, related to the decay time of recognition memory, clearly deserves further study. Secondly, an important question for future investigation is how the number of sentences used in the experiment affects intelligibility and behavioral performance. Assuming the memory system probed with the current paradigm has a capacity limitation, increasing the number of sentences should at some point lead to decreased performance. Indeed, the intelligibility of the sentences in the current study might have been facilitated by the limited number of words and sentence stubs used to construct the stimulus material. Thirdly, in studying the priming of memory representations of speech, a further step, requiring a larger set of sentences than in the current case, would be to average brain responses selectively on the basis of behavioral performance (i.e., separately for unintelligible and intelligible sentences), and to study how this is reflected in the activation of brain areas. We expect that this approach would lead to even more pronounced intelligibility effects in cortical activity than those reported here.
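The selective averaging proposed as the third step could be sketched as follows. This is a minimal illustration that assumes single-trial responses are stored as a trials x channels x samples array with one behavioral label per trial; the function name, the label strings, and the synthetic data are hypothetical and are not part of the analysis pipeline used here.

```python
import numpy as np

def average_by_label(epochs, labels):
    """Average single-trial responses separately for each behavioral label.

    epochs: array of shape (n_trials, n_channels, n_samples)
    labels: sequence of n_trials labels, e.g. "intelligible" / "unintelligible"
    Returns a dict mapping each label to its (n_channels, n_samples) average.
    """
    labels = np.asarray(labels)
    return {
        str(label): epochs[labels == label].mean(axis=0)
        for label in np.unique(labels)
    }

# Synthetic stand-in data: 6 trials, 2 channels, 4 samples (not real MEG data).
rng = np.random.default_rng(0)
epochs = rng.normal(size=(6, 2, 4))
labels = ["intelligible", "unintelligible", "intelligible",
          "intelligible", "unintelligible", "unintelligible"]

evoked = average_by_label(epochs, labels)
# Difference wave: intelligible minus unintelligible average response.
difference = evoked["intelligible"] - evoked["unintelligible"]
```

Averaging by per-sentence behavioral outcome rather than by session is what makes the larger sentence set necessary: each label needs enough trials for a stable average.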