Open Access

Thai lexical tone perception in native speakers of Thai, English and Mandarin Chinese: An event-related potentials training study

  • Edith Kaan1Email author,
  • Christopher M Barkley2,
  • Mingzhen Bao1 and
  • Ratree Wayland1
BMC Neuroscience20089:53

DOI: 10.1186/1471-2202-9-53

Received: 06 November 2007

Accepted: 23 June 2008

Published: 23 June 2008



Tone languages such as Thai and Mandarin Chinese use differences in fundamental frequency (F0, pitch) to distinguish lexical meaning. Previous behavioral studies have shown that native speakers of a non-tone language have difficulty discriminating among tone contrasts and are sensitive to different F0 dimensions than speakers of a tone language. The aim of the present ERP study was to investigate the effect of language background and training on the non-attentive processing of lexical tones. EEG was recorded from 12 adult native speakers of Mandarin Chinese, 12 native speakers of American English, and 11 Thai speakers while they were watching a movie and were presented with multiple tokens of low-falling, mid-level and high-rising Thai lexical tones. High-rising or low-falling tokens were presented as deviants among mid-level standard tokens, and vice versa. EEG data and data from a behavioral discrimination task were collected before and after a two-day perceptual categorization training task.


Behavioral discrimination improved after training in both the Chinese and the English groups. Low-falling tone deviants versus standards elicited a mismatch negativity (MMN) in all language groups. Before, but not after training, the English speakers showed a larger MMN compared to the Chinese, even though English speakers performed worst in the behavioral tasks. The MMN was followed by a late negativity, which became smaller with improved discrimination. The High-rising deviants versus standards elicited a late negativity, which was left-lateralized only in the English and Chinese groups.


Results showed that native speakers of English, Chinese and Thai recruited largely similar mechanisms when non-attentively processing Thai lexical tones. However, native Thai speakers differed from the Chinese and English speakers with respect to the processing of late F0 contour differences (high-rising versus mid-level tones). In addition, native speakers of a non-tone language (English) were initially more sensitive to F0 onset differences (low-falling versus mid-level contrast), which was suppressed as a result of training. This result converges with results from previous behavioral studies and supports the view that attentive as well as non-attentive processing of F0 contrasts is affected by language background, but is malleable even in adult learners.


Variation in voice pitch, an auditory impression of the rate of vocal fold vibration (F0), plays a different linguistic function in tone and non-tone languages. Tone languages, such as Thai and Mandarin Chinese, use differences in either average F0 or F0 contours (or slopes) over strings of otherwise identical phonemes to distinguish between different words in the lexicon from one another. For instance, the Thai syllable [kha:] means something completely different when pronounced with a tone that is low-falling ("galangal root"), low-falling and then rising ("leg"), high-falling ("I, servant"), high-rising ("to do business in") or mid-level ("to be lodged in"). In non-tone languages such as English, on the other hand, pitch variation is not used to differentiate word meaning. However, even though F0 is not used to distinguish meaning between words in English, it can make one syllable more perceptually prominent or more salient than neighboring syllables in multi-syllabic words. For example, the first syllable of the word 'cookie' is stressed, and perceptually more salient than the second syllable. The F0 or pitch (as well as intensity or loudness and vowel duration) of the stressed syllable is typically higher than its neighboring unstressed syllable. In addition, lexical stress can also be used to distinguish a compound word 'a hotdog' from a noun phrase 'a hot dog'. Variation in the linguistic functions of F0 may account for perceptual difficulty typically experienced among adult native speakers of a non-tone language when consciously perceiving and distinguishing among lexical tones differing in pitch level or pitch contours. The aim of the present ERP study was to investigate whether the processing of lexical tones is affected by the listener's native language (tone or non-tone) even when the participants are not paying conscious attention to the stimuli, and whether such non-attentive perception can be altered by laboratory training, even in adults.

Previous behavioral studies have shown that native speakers of a non-tone language (e.g. English) poorly discriminate among lexical tones as compared with native speakers of a tone language (e.g. Mandarin Chinese), even when the latter are unfamiliar with the tones being tested [15]. This perceptual difficulty for speakers of non-tone languages is due in part to differences in the way lexical tones are processed among native and nonnative listeners of tone languages. Native speakers of a non-tone language have been shown to focus more on the average F0, and F0 offset or onset values, whereas speakers of a tone language focus more on F0 contour [68]. Interestingly, previous behavioral studies have also shown that adult native speakers of a non-tone language may improve in their perception of lexical tones after exposure to the tones either in a natural or classroom setting, or during laboratory training [3, 4, 9, 10]. Training also affects the brain areas involved in lexical tone processing. fMRI studies comparing brain activation during lexical tone perception after versus before training showed an increase in activation in the left posterior superior gyrus [11, 12]. In addition, right hemisphere activation was observed [12], especially in poor learners [11]. This suggests that the perceptual and neural systems involved in processing differences in pitch and pitch contours are still malleable, even in adulthood.

The discrimination or identification tasks used in the behavioral and fMRI studies on lexical tone perception involve conscious comparison or categorization. Performance in these experiments may therefore have been affected by factors such as working memory load or attention. In the present study we therefore studied the non-attentive discrimination of lexical tones and the effect of language background and training by using Event-Related brain Potentials (ERPs). ERPs can be recorded while the participant is presented with auditory stimuli, but engaged in an unrelated task such as watching a movie. The mismatch negativity (MMN) is a frontal negative ERP component occurring about 100–300 ms after stimulus onset. It is elicited by infrequent stimuli that deviate from frequently presented (standard) stimuli in pitch, duration, voice onset time, or other acoustic or phonetic properties [13]. Since this component is elicited even while people are asleep or in a coma, this component is regarded as an index of automatic processing of auditory differences, that is, processing that does not require voluntary attention. The MMN has been shown to increase in amplitude and, in some cases, to have a shorter peak latency as behavioral discrimination performance improves. In addition, changes in the MMN have been attested before changes in behavioral discrimination performance [14]. The MMN is therefore a useful tool to study the processing and acquisition of non-native language contrasts [1420]. Since this technique taps into a different level of processing, and does not require overt attention and active comparison by the participant, this method may help us further tease apart the aspects of the stimuli that different language groups are differentially sensitive to at a non-attentive level of processing.

Only a few studies have employed the MMN to investigate the processing of lexical tones. Chandrasekaran et al. [21] investigated the effect of language background on lexical tone perception. Both Mandarin Chinese and untrained English speakers showed a MMN to tone contrasts in Mandarin Chinese. However, only the Chinese participants showed a larger MMN to a distinction that was acoustically more salient, suggesting that language background affects non-attentive processing of lexical tones to some extent. To investigate the effect of both training and language background, Kaan et al. [22] recorded ERPs from native speakers of English, Mandarin Chinese and Thai while they were presented with three Thai tones in an oddball paradigm. ERPs showed no differences between the groups before training. After a two-day perceptual training on the mid-level and low-falling tone, the English showed an increase in MMN amplitude to untrained high-rising deviants, whereas the Chinese showed a decrease in a later negativity in that condition. This suggested that native speakers of tone and non-tone languages were sensitive to different aspects of the stimuli as a result of training. However, no effect of training was observed on the (trained) low-falling tone deviant, to which all groups showed a large MMN before and after training. In addition, behavioral performance at the start of training was close to ceiling for all three subject groups. The differences found in the ERPs may therefore have not been indicative of improved perception of the tones. The ceiling performance may have been due to the use of only one token per tone condition, which did not encourage abstraction of contour categories. In the present experiment we therefore used multiple tokens of three Thai tones, all generated from one naturally produced token (see Methods and Figure 1).
Figure 1

Pitch tracks. Pitch tracks of the three high-rising (red lines), mid-level (black lines) and low-falling (blue lines) tokens.

Three subject groups (Thai, Mandarin Chinese and English speakers) were tested in an ERP oddball task in which they were presented with the stimuli while watching a silent movie. Although this task does not prevent participant from occasionally paying attention to the stimuli, the auditory stimuli are not task-relevant and do not require voluntary attention, in contrast to overt behavioral tasks. High-rising or low-falling tokens were presented as deviants among mid-level standard tokens, and vice versa. In addition a behavioral same/different discrimination task was conducted on the same stimuli. Both the behavioral discrimination and the ERP oddball task were conducted before and after a two-day perceptual categorization training task. We were particularly interested in seeing how the MMN and the later negativity for deviant versus standard stimuli would be affected by language background, training and the degree of behavioral improvement as a result of training. As one can see in Figure 1, the three tone categories differed from each other with respect to their F0 onset values, the steep F0 slope right after the F0 onset, as well as with respect to a later, more gradually developing F0 slope. Given that speakers of a non-tone language (English) have been shown to be sensitive to F0 onset and offset differences, whereas native speakers of a tone language are more sensitive to the later F0 contour, we expected the native English speakers to initially show a larger MMN than the native Chinese and Thai speakers. The Chinese and Thai speakers, on the other hand, were expected to show a more pronounced later negative effect, which may be related to the later contour differences [22]. As the native English speakers become more sensitive to the contour differences, we expected them to pattern more with the Thai and Chinese after training. Moreover, since the stimuli were meaningful words to Thai speakers, but not to Chinese and English speakers, we expected some differences related to the linguistic status of the stimuli. Linguistically perceived stimuli have been shown to involve the left hemisphere more than the right [3, 18, 2326], but see [27, 28]. The Thai were therefore expected to differ from the English and the Chinese participants in terms of the lateralization of the MMN and late negativity, at least, to the extent that the lateralization of scalp-recorded ERPs reflects hemispheric differences in the neural processes involved.


Behavioral discrimination task and categorization training

Performance on the behavioral discrimination task (see Table 1) improved after training [F(1,32) = 20.64, p < 0.001]. This was more so for Chinese and English than for Thai participants, although the LANGUAGE GROUP by TEST TIME interaction did not reach significance [interaction: F(2, 32) = 2.87, p = 0.071; Post versus pre-training: English: t(11) = 3.46, p = 0.005; Chinese: t(11) = 4.13, p = 0.002; Thai: t(10) = 0.66, N.S.]. Before training, the three language groups differed from each other, with the English performing worse than the Chinese and Thai [LANGUAGE GROUP (pre-training): F(2,32) = 7.39, p < 0.001; English versus Chinese: p = 0.007; English versus Thai: p = 0.001]. The Chinese and Thai groups did not differ in their performance [p = 0.42]. After training, the groups did not differ in their ability to discriminate among the tones [LANGUAGE GROUP: F<1, N.S.].
Table 1

Results for the behavioral discrimination task






1.19 (0.66)

1.86 (0.39)

2.06 (0.65)


2.06 (1.28)

2.57 (0.84)

2.21 (0.68)

The mean d' scores pre- and post-training per language group. Standard deviation in parentheses.

Performance in the categorization training (see Table 2) improved between the first and the last training sessions in all three language groups [F(1,31) = 33.92, p < 0.001]. The English showed the largest improvement, the Thai group the smallest [LANGUAGE GROUP by TEST TIME F(2,31) = 4.98, p = 0.013]. Overall, the English performed the worst [LANGUAGE GROUP: F(2,31) = 9.93, p < 0.001; English versus Chinese: p = 0.004; English versus Thai: p = 0.001; Thai versus Chinese: p = 0.26]. After the first training session, the language groups performed significantly different from each other [F(2,31) = 10.18, p < 0.001], with the English performing worse that the Chinese [p = 0.005] and the Thai [p < 0.001]. The Chinese and Thai did not differ significantly from each other [p = 0.19]. A similar statistical pattern was obtained after the last training session [LANGUAGE GROUP: F(2,32) = 7.18, p = 0.003; English versus Chinese: p = 0.007; English versus Thai: p = 0.001; Chinese versus Thai: p = 0.451].
Table 2

Results for the categorization training





First session

24.28 (14.28)

11.67 (7.39)

5.94 (5.59)

Last session

10.67 (7.36)

4.50 (3.65)

2.81 (3.86)

Mean percentage of errors after the first and last 30-minute training session, per language group. Standard deviation in parentheses.

Pre-and post training performance in the behavioral discrimination task correlated strongly with accuracy in the first and last categorization training, respectively [Pre-training: Pearson's ρ = -0.67, p < 0.001; Post-training: ρ = -0.63, p < 0.001]: the fewer errors made in the categorization training, the higher the d' scores in the discrimination task. This indicates that the behavioral discrimination task is a good measure of a participant's pre- and post-training perception ability.

ERP experiment: movie comprehension questions

Mean comprehension accuracy on the movie-related questions in the ERP experiment was 84% (SD 7%), before as well as after training. Before training, the English group scored 87% correct (SD 5%), the Chinese 83% (SD 6%) and the Thai 81% (SD 9%). After training, the accuracy was 85% (SD 6%) for the English group, 85% (SD 9%) for the Chinese, and 84% (SD 7%) for the Thai groups. There were no significant differences in accuracy between pre- and post training sessions and/or among the language groups [ps > 0.2].

ERPs to low-falling tones


The low-falling deviants (minus low-falling standards) showed a MMN at the F3 and F4 electrodes. ERPs for the F3 electrode are displayed in Figure 2. Figure 3 shows the isovoltage maps for the MMN. T-tests of the MMN amplitude at F3 and F4 versus a hypothetical zero showed that the MMN was significant before and after training in the English and Thai speakers [ps <0.004]. In the Chinese, the MMN was weakly present before training [p = 0.067] and significantly after [p < 0.001].
Figure 2

ERPs to Low-falling deviants and standards. ERPs at the left frontal electrode (F3) for the low-falling deviants (dotted line) versus standards (solid line).
Figure 3

Isovoltage maps to Low-falling deviants minus standards: MMN. Isovoltage maps for the 100 ms window surrounding the most negative peak between 100–350 ms, for the low-falling deviants minus standards, defined separately for the language groups and test time.

An ANOVA on the MMN amplitude (deviant minus standard) at the F3 and F4 electrodes showed an interaction of TEST TIME by HEMISPHERE [F(1,32) = 4.33, p = 0.046]: Before training, the MMN was numerically larger at the left hemisphere electrode [F3: -1.64 μV; F4: -1.49 μV ; t(34) = 0.78, N.S.]; after training, the MMN was numerically larger at the right electrode, with the difference almost reaching significance [F3: -1.54 μV; F4: -1.91 μV; t(34) = 1.97, p = 0.057]. Training also weakly affected the differences in the MMN between the groups [TEST TIME by LANGUAGE GROUP: F(2,32) = 2.69, p = 0.084]: The groups weakly differed from each other before training [F(1,32) = 2.89, p = 0.07], with the MMN being larger for the English compared to the Chinese [LSD post hoc comparison, p = 0.025], but not compared to the Thai [p = 0.12]. After training, the groups did not differ [ps > 0.68] (see Appendix).

The difference in MMN latency and amplitude after versus before training correlated weakly with the degree of learning: the greater the improvement in the behavioral discrimination task (d' scores post minus pre-training), the earlier the MMN peak and the smaller (i.e. less negative) the MMN amplitude was after compared to before training [Latency: Pearson's ρ = -0.32, p = 0.063; Amplitude: ρ = 0.32, p = 0.063].

Late negativity

A late negativity was seen for the deviant versus standard in the 350–500 ms window [Midline: F(1,32) = 7.29, p = 0.011; Lateral electrodes: F(1,32) = 6.75, p = 0.014] (see Figure 4.) The negativity was larger over the left hemisphere [F(1,32) = 9.18, p = 0.005], especially before training [CONDITION by TEST TIME by HEMISPHERE: F(1,32) = 6.95, p = 0.013; Pre-training: CONDITION by HEMISPHERE: F(1,32) = 13.94, p = 0.001; CONDITION: Left hemisphere: F(1,32) = 7.76, p = 0.009; Right hemisphere: F(1,32) = 1.32, p = 0.26; Post-training: no effects]. The degree of learning affected the change in the late negativity in the 350–500 ms interval: the larger the increase in d' scores from pre- to post-training, the smaller the late negativity in the 350–500 ms interval post- versus pre-training [Pearson's ρ = 3.68, p = 0.029].
Figure 4

Isovoltage maps to Low-falling deviants minus standards: 350–500 ms. Isovoltage maps for the 350–500 ms window for the low-falling deviants minus standards.

The negativity persisted in the 500–700 ms interval, see Figure 5 [Midline: F(1,32) = 10.11, p = 0.003; Lateral: F(1,32) = 12.70, p = 0.001], and remained larger over the left than the right hemisphere [CONDITION by HEMISPHERE: F(1,32) = 5.203, p = 0.029; Effect of CONDITION: Left hemisphere: F(1,32) = 4.201, p = 0.049; Right: F(1,32) = 2.37, p = 0.13]. Effects involving the factor TEST TIME were not significant in this time window. The later negativity was not affected by language background in either the 350–500 ms or the 500–700 ms time window.
Figure 5

Isovoltage maps to Low-falling deviants minus standards: 500–700 ms. Isovoltage maps for the 500–700 ms window for the low-falling deviants minus standards.

ERPs to high-rising tones

Results for the high-rising deviants versus high-rising standards are displayed in Figures 6 to 9.
Figure 6

ERPs to High-rising deviants and standards. ERPs at the left frontal electrode (F3) for the high-rising deviants (dotted line) versus standards (solid line).
Figure 7

Isovoltage maps to High-rising deviants minus standards:MMN. Isovoltage maps for the 100 ms window surrounding the most negative peak between 100–350 ms, for the high-rising deviants minus standards, defined separately for the language groups and test time.
Figure 8

Isovoltage maps to High-rising deviants minus standards: 350–500 ms. Isovoltage maps for the 350–500 ms window for the high-rising deviants minus standards.
Figure 9

Isovoltage maps to High-rising deviants minus standards: 500–700 ms. Isovoltage maps for the 500–700 ms window for the high-rising deviants minus standards.


Separate T-tests on the MMN amplitude (high-rising deviant minus high-rising standard) at F3 and F4 versus a hypothetical zero showed no significant differences in any of the groups before training [ps >0.18]. After training, the MMN was most robust in the native Thai speakers [Thai: p = 0.014; Chinese and English: ps >0.062]. ANOVAs showed no effects of the experimental manipulations on the MMN (amplitude or latency).

Late negativity

Between 350 and 500 ms, a negativity was elicited by the high-rising deviants versus standards, particularly over the left hemisphere [CONDITION by HEMISPHERE F(1,32) = 10.95, p = 0.002], see Figure 8. This left-lateralized negativity was only seen in the English and in the Chinese, but not in the Thai, leading to a weak interaction of CONDITION by HEMISPHERE by LANGUAGE GROUP [3-way interaction: F(2, 32) = 2.93, p = 0.068; CONDITION by HEMISPHERE, English: [F(1,11) = 6.37, p = 0.028. Chinese: F(1,11) = 9.04, p = 0.012; Thai: F(1,10)< 1, N.S.]. Training had an effect on the anterior-posterior distribution of the negativity [TEST TIME by CONDITION by ANTERIORITY F(4, 128) = 5.18, p = 0.018]: Pre-training, the negativity was numerically largest at frontal sites [CONDITION by ANTERIORITY: F(4, 128) = 3.96, p = 0.038], after training the negativity became broader in distribution and the two-way interaction between CONDITION and ANTERIORITY was no longer significant [F(4, 128) = 1.37, p = 0.26, N.S.]. Figure 8 suggests that this effect was mainly driven by the English group, however the interaction with LANGUAGE GROUP was not significant.

The negativity for the high-rising deviants persisted in the 500–700 ms interval (see Figure 9) [Midline: F(1,32) = 7.25, p = 0.011; Lateral: F(1,32) = 3.24, p = 0.066], with the negativity being larger over the left hemisphere [CONDITION by HEMISPHERE: F(1,32) = 13.619, p = 0.001; Effect of CONDITION: Left hemisphere: F(1,32) = 4.20, p = 0.049; Right hemisphere: F(1,32) = 2.73, p = 0.133]. The negativity was greater left than right at all except parietal regions [CONDITION by HEMISPHERE by ANTERIORITY: F(4,128) = 3.14, p = 0.032; CONDITION by HEMISPHERE was significant in all regions (ps ranging from 0.001 to 0.016) but for the parietal region (p = 0.15)]. The left lateralization was especially seen in the Chinese and English participants compared with the Thai [CONDITION by HEMISPHERE by LANGUAGE GROUP: F(2,64) = 3.43, p = 0.045; CONDITION by HEMISPHERE: Chinese: F(1, 11) = 8.88, p = 0.013 ; English: F(1, 11) = 9.172 p = 0.011; Thai: F <1, N.S. ]. As in the previous interval, the distribution was frontal before training and became broader after training [TEST TIME by CONDITION by ANTERIORITY: Midline: F(4,128) = 5.34, p = 0.014; Lateral: F(4,128) = 9.02, p = 0.002; CONDITION by ANTERIORITY, Pre-training: Lateral: F(4,128) = 4.66, p = 0.024; Post training: Lateral: F(4,128) = 2.44, p = 0.120. Effect of CONDITION Pre-training: Lateral frontal sites F(1,32) = 3.86, p = 0.058; Lateral: fronto-central sites: F(1,32) = 3.06, p = 0.09; Remaining regions: ps>0.18].


All groups showed a MMN before and after training to the low-falling deviants. The MMN was larger over the right hemisphere after training. The English group tended to show a larger MMN before training than the Chinese, even though they performed worse in the behavioral tasks. Both MMN amplitude and latency decreased after training the more the participant improved in the behavioral discrimination task. The MMN was followed by a slow negativity, which was slightly larger over the left than the right hemisphere, and reduced in amplitude as a function of learning. The high-rising deviants elicited no or only a small MMN. The late negativity in this condition was left-lateralized for the English and the Chinese groups. The later negativity was frontal before training, but became broader after training.


The aim of the present ERP study was to investigate the processing of lexical tones when participants are not forced to pay attention to the stimuli, as opposed to previous studies using behavioral techniques only, and to see to what extent such non-attentive processing is affected by training and by native language background. In contrast to previous ERP studies [21, 22], we used multiple tokens per stimulus type to encourage the formation of abstract contour categories and to avoid pre-training ceiling effects. Results from the behavioral discrimination task suggest that this manipulation was successful: performance significantly increased after training in the English and the Chinese groups, who were initially unfamiliar with the Thai stimuli used. Furthermore, behavioral discrimination scores correlated significantly with performance in the categorization training task.

Based on previous experiments showing that native speakers of a non-tone language are more sensitive to F0 onset and offset when discriminating lexical tones [68, 22], we predicted that the English group would show a larger MMN to the deviant categories; the Chinese and Thai on the other hand, previously shown to be more sensitive to F0 contours, were expected to show a more robust later effect. In addition, given that the stimuli were meaningful words in Thai, we predicted a lateralization difference between Thai on the one hand, and English and Chinese on the other.

Our predictions were only partly borne out. We will discuss our findings in turn for the MMN and the late negativity.


All groups showed a MMN to the low-falling tone deviants, before as well as after training; whereas no, or only a smaller MMN was elicited by the high-rising tone deviants. Note that two of the three low-falling tones have an onset frequency falling below the range of the mid-level tones (see Figure 1). The onset frequency of the high-rising tones, on the other hand, falls within the range of that of the mid-level tokens. It is therefore likely that the large MMN found for the low-falling tones reflects differences in F0 onset between the deviant and standard stimuli presented in the same block. These differences were much smaller in the high-rising tones [29, 30].

The MMN was weakly affected by native language background: the English showed a larger MMN to the low-falling tones than the Chinese before training. This supports previous findings [68, 22] that speakers of a non-tone language are more sensitive to differences in onset F0. Our English speaking participants may have been more sensitive to the early F0 differences in the Low-falling conditions, eliciting a larger MMN compared to the Chinese and Thai groups. Note that although the English language group showed the largest MMN before training, they performed worse than the Thai and Chinese in the behavioral discrimination and training. This can also be accounted for by the different sensitivity of tone versus non-tone language speakers. The behavioral tasks probed participant's sensitivity to differences in F0 slope and direction rather than F0 onset, and was therefore harder for non-tone language speakers. The categorization training with multiple tokens per type caused the English speaking participants to become more sensitive to the direction of the pitch contour. This may have induced a modulation of their non-attentive perception, hence a reduction of the MMN amplitude in the English language group after training to the level of the speakers of tone languages.

The MMN became smaller and earlier with behavioral improvement. Typically, the MMN has been found to become larger after training [15, 16, 18]. The decrease in MMN amplitude therefore suggests that the participants, and especially the learners, non-attentively perceived the stimuli in a different way and became less sensitive to the F0 onset differences after training, or at least, as a result of repeated exposure.

For all three language groups, the MMN to the low-falling deviants became more prominent over the right hemisphere after training. This is in contrast to several previous ERP studies that reported an increase in MMN over the left hemisphere after training on linguistic contrasts [18]. To the extent that the lateralization of scalp-recorded ERPs reflects hemispheric differences in the neural processes involved, our findings suggest that even native speakers of Thai employ the right hemisphere more than the left in processing the low-falling versus mid-level tone contrast. This is in spite of the fact that the stimuli are meaningful words for the Thai. A previous study on Mandarin Chinese speakers reports a similar right hemisphere distribution for meaningful lexical tone contrasts [28]. Under an alternative account of hemispheric specialization of speech, the left hemisphere is involved in processing rapid formant transitions, whereas the right hemisphere deals with slower differences in pitch [31]. It may therefore be the case that our participants became more sensitive to the gradual change in F0 contour, focused less on the differences in F0onset values and the abrupt change in F0 at the beginning of the stimuli, and thus involved the right hemisphere more as a result of training.

The later negativity

Second, we were interested in the later negativity. In contrast to our prediction, no difference was seen between English and Chinese speakers. All groups displayed a negativity to both the low-falling and high-rising deviants versus standards. Late negativities reported in the literature have been associated with cognitive, possibly non-attentive processing of sound change [32], or processing at a higher level of abstraction [33, 34] including harmonic integration in music contexts [35, 36]. Alternatively, the late negativity may reflect reorienting of attention after involuntary attention to deviant stimuli [37, 38]. A smaller late negativity may then indicate a more efficient neural processing, or less attentional reorienting. For the low-falling deviants, the late negativity became less left-lateralized after training and smaller in amplitude the more the participant improved on the behavioral task. For the high-rising deviants, the Chinese and English speaking groups showed a left-lateralization of this negativity for the high-rising deviants, regardless of training.

Note that the low-falling stimuli continue to differ from the mid-level stimuli in terms of a falling pitch slope right after the initial sharp fall in F0 (see Figure 1). The high-rising tones, on the other hand, only show a gradual increase in F0 compared to the mid-level tones, starting at around 290 ms after onset. Two of the three high-rising tokens start to exceed the F0 range of mid-level tones even later. The contour deviance is therefore more subtle in the high-rising than low-falling conditions in the current study. Since training focused on contour differences, the processing of the low-falling contour may therefore have required less effort after training in the learners, hence the reduction of the late negativity in this condition, but not in the high-rising condition in this experiment. The left-lateralization of the late negativity in the high-rising condition in the Chinese and English groups suggests that the non-native language groups process the Thai high-rising contour in a manner that is different from native Thai speakers. Comparable to the MMN, and the late negativity in the low-falling condition, this waveform may shift from the left to the right hemisphere when listeners become more proficient in detecting the contours. Apparently, the categorization training was not sufficient to give rise to these effects in the non-native speakers. It remains to be seen if a longer period of training will lead to a shift in hemispheric lateralization to be observed.

The relation between ERPs and behavioral data

Behavioral studies on the perception and acquisition of foreign language contrasts are potentially confounded by the attentional and memory load that is imposed by most discrimination or categorization tasks. Using ERPs overcomes this problem because passive listening tasks can be used which do not require any explicit attention or overt behavioral response from the participant. On the other hand, behavioral and ERP studies may tap into different aspects of processing. ERPs may be more sensitive to differences in physical properties of the stimuli than behavioral tasks. In addition, behavioral studies may encourage participants to actively form abstract perceptual categories, whereas passive listening oddball tasks, as used in the current ERP study, may do so to a lesser extent. It is therefore not surprising that we observed some discrepancies between our behavioral and ERP data. ERPs are therefore a good complementary method to behavioral studies, and are a good tool to help uncover what aspects of the stimuli different language groups are differently sensitive to.

We have already discussed the larger MMN to low-falling deviants seen in the English group pre-training in spite of this group's poor performance on the behavioral tasks. This can be accounted for by the MMN being a reflection of a participant's sensitivity to early differences between the stimuli, whereas the behavioral tasks tapped more into the participant's ability to actively form categories on the basis of the later pitch contour. In contrast to the MMN, the late negativity did not correspond to the behavioral differences observed between the groups before training. However, the amplitude of the late negativity in the low-falling conditions did correlate with behavioral improvement: the late negativity amplitude became smaller the more the participant improved in the discrimination task. In the high-rising condition, the late negativity became more broadly distributed after training. Finally we would like to point out that in spite of differences in language background, all participant groups elicited largely similar ERP components and that, with the exception of the MMN, the effect of training was largely the same among the groups. This suggests that the neural mechanisms involved in non-attentively perceiving tone stimuli and the effects of training thereon may have been largely unaffected by language background.


In sum, native speakers of English, Chinese and Thai recruited largely similar neural mechanisms when non-attentively processing Thai lexical tones. Training induced comparable changes in the language groups. However, and converging with results from behavioral methods using different stimuli and techniques, we found that native speakers of English were initially more sensitive to early F0 differences before training. After training, this language group became more similar to native tone-language speakers. In addition, native speakers of English and of Mandarin Chinese processed the late shallow contour in the high-rising Thai tone differently from native Thai speakers. Future experiments will determine whether this can be affected by a more extended period of training.



Twelve native speakers of American English (8 men), 12 native speakers of Mandarin Chinese (People's Republic of China) (6 men), and 11 native speakers of Thai (5 men) were recruited from the University of Florida community. Informed consent was obtained from each participant according to the procedures of the University of Florida Institutional Review Board. All participants were healthy young adults, aged 19–35, right handed as assessed by the Edinburgh handedness inventory [39], and with no history of neurological disease or language disorders as indicated by a self-report. All had a minimal bilateral hearing range of 500 to 8,000 Hz measured at 25 dB HL. The American English speakers did not have any experience with a tone language; the native Chinese speakers did not have experience with any other tone language, except one who spoke a Chinese dialect in addition to Mandarin Chinese. Participants were paid for participation. Ten additional participants were run, but were omitted from analysis because of incomplete data sets (due to technical difficulties or failure to return for all sessions).


Nine stimuli were synthesized on the basis of one naturally generated instance of the Thai mid-level tone syllable [kha:] produced by a female native speaker of Thai and digitized at 22050 Hz sampling rate with a 16-bit amplitude resolution. Using the Praat speech analysis software, the original mid-level tone was shortened from 610 ms to 450 ms. The pitch contour of this mid-level tone was then manually changed to approximate the pitch contours of the natural tokens of the Thai low-falling and high-rising tones. The entire F0 contour of each of the three resulting stimuli was then shifted down -15 Hz and -30 Hz to simulate three different talkers, thus yielding three tokens for each of the three tone types, see Figure 1. All stimuli were normalized for RMS amplitude (98% of the scale). All 3 tokens of each tone were then presented to two native Thai speakers (one male and one female) and were judged to be acceptable exemplars of each of the three tone categories. Sound files and spectrograms of each token are provided as supplementary materials (Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18).


Participants were tested on these stimuli on four consecutive days. Stimuli were presented binaurally, one at a time over head phones at a comfortable hearing level (65 dB). An ERP oddball task was conducted on Days 1 and 4; two categorization training sessions each were conducted on Days 2 and 3, with a behavioral discrimination task either preceding (Day2) or following (Day 3) the training.

Behavioral discrimination task

In the behavioral discrimination task (Days 2 and 3) the participant heard a sequence of three different stimuli A B C, separated by 575 ms. A and B were always from the same tone category (either low-falling, high-rising or mid-level). The last stimulus, C, was either of the same or of a different contour category, and the participant was asked to indicate whether the contour was same or different by clicking a mouse button (113 trials total: 108 experimental trials and 5 warm-up trials that were not analyzed). The response side for the 'same' and 'different' responses was counterbalanced among participants. If no response was given after 3 seconds, the next trial started. Responses longer than 3 seconds (2.2–3.5% per session and language group) were treated as no-response errors. D' scores were calculated on the percentage of hits (correct 'different' response in case tone C was of a different type than A and B) and false alarms (incorrect 'different' response when A, B and C were of the same category). Null responses were not included in d'score calculation.

Categorization training

In the categorization training sessions (Days 2 and 3), participants heard one stimulus per trial. They were asked to classify a token as being of tone type A, B or C by clicking a box on the screen [4, 5, 22]. During the introduction phase of the training, they heard the three tokens of tone type A (low-falling), followed by the three tokens of tone type B (mid-level), followed by the three tokens of tone type C (high-rising). After this was repeated three times, the tokens were presented in random order for a total of 81 trials (each token presented 9 times) and accuracy was recorded. Participants were allowed to replay the sound. If an incorrect response was given, the frame around the box with the correct answer would blink. The inter-trial interval was 3 seconds. Responses longer than 3 seconds (including replays) were omitted from analysis. This amounted to 0.6–3.2% of the data per session and language group. One session lasted 30 minutes and was repeated on the same day after a short break. Data from one Chinese participant for the first training session on Day 2 were missing due to technical failure. Hence, this participant is omitted in all analyses involving this first session.

ERP oddball experiment

In the ERP oddball task (Days 1 and 4), the stimuli were presented in a continuous stream. Four stimulus blocks were presented, the order counterbalanced across participants: (1) mid-level presented as standard, high-rising as deviant; (2) high-rising as standard, mid-level as deviant; (3) mid-level as standard, low-falling as deviant; (4) low-falling as standard, mid-level as deviant. A total of 1200 stimuli were presented per block: 1080 of the standard category and 40 of each of the three deviant tokens (i.e., 10% deviants). The inter-stimulus (offset-to-onset) interval was randomized between 500–650 ms to prevent interference from regular biological rhythms on the waveforms. The order of the stimuli was pseudo randomized such that two deviants were separated by at least two standards. The length of each block was 17 minutes. While the auditory stimuli were presented over headphones, participants watched a silent movie (Charlie Chaplin's 'The Gold Rush' or Buster Keaton's 'The General'). They were told that they would receive questions about the movie after each of blocks, and were instructed to ignore the sounds. A different movie fragment was played during each session. Each ERP session lasted about 2 hours, including set up and debriefing.

EEG was recorded from 39 Ag/AgCl scalp electrodes mounted in an elastic cap with active shielding (Easy-Cap, Falk Minow, Herrsching-Breitbrunn, Germany) combined with an ANT amplifier (ANT Software b.v., Enschede, The Netherlands). Electrode positions used were: Midline: Fz, FCz, Cz, CPz, Pz; Lateral left/right hemisphere: FP1/2, F7/8, F5/6, F3/4, FT7/8, FC5/6, FC3/4, T7/8, C5/6, C3/4, TP7/8, CP5/6, CP3/4, P7/8, P5/6, P3/4, O1/2. Horizontal and vertical EOG were recorded from the outer canthi, and below and above the right eye, respectively. Additional electrodes were placed on the right and left mastoids. The signal was acquired using the left mastoid as reference, but was arithmetically re-referenced off-line to the mean of the left and right mastoids. Electrode impedance was kept below 5 KOhm. The signal was sampled at a rate of 512 Hz, and was filtered off-line between 0.3 and 30 Hz. We only analyzed low-falling and high-rising stimuli. These were always presented with mid-level stimuli in the presentation blocks. Any differences between the ERPs to the low-falling and high-rising tones can therefore not be due to different alternate stimuli in the presentation blocks. Epochs were defined spanning -100 to 900 ms from the stimulus onset. EEG to low-falling and high-rising tone deviants were averaged separately. We also separately averaged the EEG to 120 low-falling and high-rising tones when these were used as standards. To avoid any potentially confounding effects from preceding deviant tones, we selected 120 standard stimuli that were preceded and followed by a standard stimulus. Trials with eye movements and other artifacts were rejected. The percentage of rejection was on average 28% per condition (SD 15%) in the Chinese group; 20% in the English group (SD 9%), and 26% (SD 13%) per condition in the Thai group.

The mismatch negativity was analyzed using the F3 and F4 electrodes. These were electrodes where the MMN was largest on the lateral sites. First, difference waves (deviant minus standard) were calculated for the high-rising deviants minus standards, and low-falling deviants minus standards. Next, the most negative peak was found between 100 and 350 ms, and the mean amplitude for the windows spanning 100 ms centered around this peak was calculated for every channel, participant, tone type and session. Analyses were conducted on the mean difference in amplitude thus calculated and on the peak latency.

A later negative component was observed as well. Since we had no clear prediction as to the scalp distribution of this component and since the wave did not have a clear peak, analyses were conducted on the mean amplitudes to both the deviant and standard tones, and included a large number of electrodes. Statistical analyses were conducted on the mean amplitudes between 350–500 ms and 500–700 ms, based on visual inspection, using lateral (F3/4, F5/6, F7/8, FC3/4, FC5/6, FT7/8, C3/4, C5/6, T7/8, CP3/4, CP5/6, TP7/8, P3/4, P5/6, P7/8), as well as midline (Fz, FCz, Cz, CPz, Pz) electrodes.

ERP data were analyzed separately for low-falling and high-rising tones, using an (SPSS) General Linear Model multivariate repeated measures procedure with the within-participant factors: TEST TIME (pre/post training), and, when applicable, CONDITION (standard, deviant), HEMISPHERE (2 levels) and/or ANTERIORITY (5 levels). LANGUAGE GROUP was included as a between-participants factor (3 levels). When a two or three-way interaction was significant, separate analyses were conducted to determine the source of the interaction. For the late negativity only effects involving the factor Condition are reported below. When interactions involving factors with more than two levels were significant, F- and p-values were reported after the Greenhouse-Geisser correction to control for violations of sphericity [40].


Tests involving 34 lateral electrodes did not show any significant interaction of location with Language group except for a weak interaction of TESTTIME BY ANTERIORITY BY LANGUAGE for the low-falling deviants minus standards [F(8, 128) = 2.50, p = 0.076]. Numerically, the English group showed a larger frontal negativity than the Thai and Chinese before, but not after training. This effect is the same as the weak LANGUAGE BY TESTTIME interaction found for the F3 and F4 electrodes discussed in the main text.





event-related potential


fundamental frequency


least significant difference


mismatch negativity


not significant


standard deviation.



The authors would like to thank Pedro Alcocer for his help with running participants. EK is currently supported by NIDCD #5R03DC006160-02

Authors’ Affiliations

Linguistics, University of Florida
Department of Linguistics, University of California at San Diego


  1. Bluhme H, Burr R: An audio-visual display of pitch for teaching Chinese tones. Studies in Linguistics. 1971, 22: 51-57.Google Scholar
  2. Kiriloff C: On the auditory discrimination of tones in Mandarin. Phonetica. 1969, 20: 63-69.View ArticleGoogle Scholar
  3. Wang Y, Spence MM, Jongman A, Sereno JA: Training American listeners to perceive Mandarin tone. J Acoust Soc Am. 1999, 106: 3649-3658. 10.1121/1.428217.View ArticlePubMedGoogle Scholar
  4. Wayland R, Guion S: Training native English and native Chinese speakers to perceive Thai tones. Lang Learn. 2004, 54: 681-712. 10.1111/j.1467-9922.2004.00283.x.View ArticleGoogle Scholar
  5. Wayland R, Li B: Effects of two training procedures in cross-language perception of tones. J Phon. 2008, 36: 250-267. 10.1016/j.wocn.2007.06.004.View ArticleGoogle Scholar
  6. Gandour J, Harshman R: Cross-language difference in tone perception: A multidimensional scaling investigation. Lang Speech. 1978, 21: 1-33.PubMedGoogle Scholar
  7. Gandour J: Tone perception in Far Eastern languages. J Phon. 1983, 11: 49-175.Google Scholar
  8. Lee L, Nusbaum HC: Processing interactions between segmental and suprasegmental information in native speakers of English and Mandarin Chinese. Percept Psychophys. 1993, 53: 157-165.View ArticlePubMedGoogle Scholar
  9. Wayland R, Guion S: Perceptual discrimination of Thai tones by naive and experienced learners of Thai. Appl Psycholinguist. 2003, 24: 113-129.View ArticleGoogle Scholar
  10. Wong PC: Learning pitch patterns in lexical identification by native English-speaking adults. Appl Psycholinguist. 2007, 28: 565-585.View ArticleGoogle Scholar
  11. Wong PC, Perrachione TK, Parrish TB: Neural characteristics of successful and less successful speech and word learning in adults. Hum Brain Mapp. 2006, 28: 995-1006. 10.1002/hbm.20330.View ArticleGoogle Scholar
  12. Wang Y, Sereno JA, Jongman A, Hirsch J: fMRI evidence for cortical modification during learning of Mandarin lexical tone. J Cogn Neurosci. 2003, 15: 1019-1027. 10.1162/089892903770007407.View ArticlePubMedGoogle Scholar
  13. Näätänen R: The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology. 2001, 38: 1-21.View ArticlePubMedGoogle Scholar
  14. Tremblay K, Kraus N, McGee T: The time course of auditory perceptual learning: Neurophysiological changes during speech-sound training. Neuroreport. 1998, 9: 3557-3560. 10.1097/00001756-199811160-00003.View ArticlePubMedGoogle Scholar
  15. Kraus N, McGee T, Carrell TD, King C, Tremblay K, Nicol T: Central auditory system plasticity associated with speech discrimination training. J Cogn Neurosci. 1995, 7: 25-32. 10.1162/jocn.1995.7.1.25.View ArticlePubMedGoogle Scholar
  16. Menning H, Imaizumi S, Zwitserlood P, Pantev C: Plasticity of the human auditory cortex induced by discrimination learning of non-native, mora-timed contrasts of the Japanese language. Learn Mem. 2002, 9: 253-267. 10.1101/lm.49402.PubMed CentralView ArticlePubMedGoogle Scholar
  17. Näätänen R, Schröger E, Karakas S, Tervaniemi M, Paavilainen P: Development of a memory trace for a complex sound in the human brain. Neuroreport. 1993, 4: 503-506.View ArticlePubMedGoogle Scholar
  18. Tremblay K, Kraus N, Carrell TD, McGee T: Central auditory system plasticity: Generalization to novel stimuli following listening training. J Acoust Soc Am. 1997, 102: 3762-3773. 10.1121/1.420139.View ArticlePubMedGoogle Scholar
  19. Winkler I, Lehtokoski A, Alku P, Vainio M, Czigler I, Csépe V, Aaltonen O, Raimo I, Alkho K, Lang H, Iivonen A, Näätänen R: Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations. Cogn Brain Res. 1999, 7: 357-369. 10.1016/S0926-6410(98)00039-1.View ArticleGoogle Scholar
  20. Peltola MS, Bosch L, Tuomainen J, Ek M, Aaltonen O, Näätänen R: Native and foreign vowel discrimination as indexed by the mismatch negativity (MMN) response. Neurosci Lett. 2003, 352: 25-28. 10.1016/j.neulet.2003.08.013.View ArticlePubMedGoogle Scholar
  21. Chandrasekaran B, Krishnan A, Gandour J: Mismatch negativity to pitch contours is influenced by language experience. Brain Res. 2007, 1128: 148-156. 10.1016/j.brainres.2006.10.064.PubMed CentralView ArticlePubMedGoogle Scholar
  22. Kaan E, Barkley C, Wayland R, Bao M: Effects of Native Language and Training on Lexical Tone Perception: An ERP study. Brain Res. 2007, 1148: 113-122. 10.1016/j.brainres.2007.02.019.View ArticlePubMedGoogle Scholar
  23. Van Lancker D, Fromkin VA: Hemispheric specialization for pitch and "tone": evidence from Thai. J Phon. 1973, 1: 101-109.Google Scholar
  24. Van Lancker D, Fromkin VA: Cerebral dominance for pitch contrasts in tone language speakers and in musically untrained and trained English speakers. J Phon. 1978, 6: 19-23.Google Scholar
  25. Klein D, Zatorre RJ, Milner B, Zhao V: A cross-linguistic PET study of tone perception in Mandarin Chinese and English speakers. Neuroimage. 2001, 13: 646-653. 10.1006/nimg.2000.0738.View ArticlePubMedGoogle Scholar
  26. Wang Y, Jongman A, Sereno JA: Dichotic perception of Mandarin tones by Chinese and American listeners. Brain Lang. 2001, 78: 332-348. 10.1006/brln.2001.2474.View ArticlePubMedGoogle Scholar
  27. Wong PC: Hemispheric specialization of linguistic pitch patterns. Brain Res Bull. 2002, 59: 83-95.View ArticlePubMedGoogle Scholar
  28. Luo H, Ni-Jing-Tian , Li Z-H, Li X-O, Zhang D-R, Zeng F-G, Chen L: Opposite patterns of hemisphere dominance for early auditory processing of lexical tones and consonants. Proc Natl Acad Sci USA. 2006, 103: 19558-19563. 10.1073/pnas.0607065104.PubMed CentralView ArticlePubMedGoogle Scholar
  29. Näätänen R: Attention and brain function. 1992, Hillsdale, NJ: Lawrence Erlbaum AssociatesGoogle Scholar
  30. Sams M, Paavilainen P, Alho K, Näätänen R: Auditory frequency discrimination and event-related potentials. Electroencephalogr Clin Neurophysiol. 1985, 62: 437-448. 10.1016/0168-5597(85)90054-1.View ArticlePubMedGoogle Scholar
  31. Zatorre RJ, Belin P, Penhune VB: Structure and function of auditory cortex: music and speech. Trends Cogn Sci. 2002, 6: 37-46. 10.1016/S1364-6613(00)01816-7.View ArticlePubMedGoogle Scholar
  32. Ceponienè R, Lepistö T, Soininen M, Aronen E, Alku P, Näätänen R: Event-related potentials associated with sound discrimination versus novelty detection in children. Psychophysiology. 2004, 41: 130-141. 10.1111/j.1469-8986.2003.00138.x.View ArticlePubMedGoogle Scholar
  33. Zachau S, Rinker T, Körner B, Kohls G, Maas V, Henninghausen K, Schecker M: Extracting rules: early and late mismatch negativity to tone patterns. Neuroreport. 2005, 16: 2015-2019. 10.1097/00001756-200512190-00009.View ArticlePubMedGoogle Scholar
  34. Shafer VL, Morr ML, Datta H, Kurzberg D, Schwartz RG: Neurophysiological indexes of speech processing deficits in children with Specific Language Impairment. J Cogn Neurosci. 2005, 17: 1168-1180. 10.1162/0898929054475217.View ArticlePubMedGoogle Scholar
  35. Loui P, Grent 't-Jong T, Torpey D, Woldorff MG: Effects of attention on the neural processing of harmonic syntax in Western music. Brain Res Cogn Brain Res. 2005, 25: 678-687. 10.1016/j.cogbrainres.2005.08.019.View ArticlePubMedGoogle Scholar
  36. Koelsch S, Gunter T, Friederici AD: Brain indices of music processing: "Nonmusicians" are musical. J Cogn Neurosci. 2000, 12: 520-541. 10.1162/089892900562183.View ArticlePubMedGoogle Scholar
  37. Shestakova A, Huotilainen M, Ceponienè R, Cheour M: Event-related potentials associated with second language learning in children. Clin Neurophysiol. 2003, 114: 1507-1512. 10.1016/S1388-2457(03)00134-2.View ArticlePubMedGoogle Scholar
  38. Schröger E, Wolff C: Attentional orienting and reorienting is indicated by human event-related brain potentials. Neuroreport. 1998, 9: 3355-3358.View ArticlePubMedGoogle Scholar
  39. Oldfield RC: The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia. 1971, 9: 97-113. 10.1016/0028-3932(71)90067-4.View ArticlePubMedGoogle Scholar
  40. Greenhouse SW, Geisser S: On methods in the analysis of profile data. Psychometrika. 1959, 24: 95-112. 10.1007/BF02289823.View ArticleGoogle Scholar


© Kaan et al; licensee BioMed Central Ltd. 2008

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.