The main findings were that focused listening during perceptual training, as well as passive stimulus experience during MEG recording, induced sustained increases in evoked activity in anterior auditory cortex 200 ms after stimulus onset. The amplitude gain between pre- and post-training MEG recordings was larger for the speech stimuli than for the noise stimulus, which was not used during training and to which participants were less exposed. Amplitudes of the shorter-latency P1m and N1m responses were not significantly modulated. Multivariate analysis identified three distinct spatio-temporal patterns of brain activity: one related to the increase in P2m across the three recording sessions, one reflecting changes between the pre-training sessions, and one reflecting differences between the responses to the trained stimuli, which were evident only after training. Each result is addressed below.
Trajectory of behavioural performance
All individuals improved in their ability to identify the stimuli over the time course of the study. Notably, the improvement became evident during the last days of training, whereas the group-mean performance did not differ significantly between the first days. This trajectory of learning during the early part of training differs from the time course of early improvement in, for example, pitch discrimination learning. For changes in pitch discrimination performance, it has been reported that the strongest gain occurred at the beginning and that performance approached a maximum asymptotically, with the smallest gain at the end of training [43, 44]. Similar time courses showing large initial improvement have been found when learning to discriminate interaural time and level differences. In contrast, the behavioural improvement in this study occurred during the later sessions. It seems that for the stimulus identification task used here, the participants first had to establish and adjust a categorical boundary, and the effect of training translated into a behavioural consequence only later. Thus, we speculate that some implicit learning took place at the early stage of the training procedure.
Trajectory of brain responses
Previous analysis of N1m and P2m amplitudes showed a significant difference between the trajectories of changes in the N1m and P2m responses. The P2m amplitude was constant during a recording session but increased between the end of the first and the beginning of the second pre-training session on a later day. Consolidation during a night of sleep seems to be important for this P2m gain. In contrast, the N1m amplitude decreased within a session but recovered between sessions. Recent studies corroborated our observations about the time course of P2m changes. The finding that changes in P2m amplitude occurred with a delay of one or more days is consistent with reports by Atienza et al. [23, 47] that the amplitudes of P2 and mismatch negativity (MMN) responses increased over the days after training but not during the EEG recordings. P2 amplitude changes were also not seen when training involved easier VOT contrasts in a brief single-session recording. Collectively, these examples reinforce the notion that N1 and P2 reflect different neural sources that are differentially affected by time and task. Therefore, we propose that the gain in P2m amplitude between sessions reflects the cumulative effect of passive and active listening during the interval from the beginning of the first to the beginning of the second session. Moreover, a change in P2m amplitude between the two pre-training recordings reflects the effect of stimulus experience during the first session but is not affected by sound exposure during the second session. Accordingly, the amplitude change between the second pre-training and the post-training sessions includes the effects of passive listening to the stimuli during the pre-training session and active listening during the training. Only the P2m amplitude recorded in the first session can serve as an estimate of a pre-experimental baseline.
According to our previous studies, when P2 amplitude changes were seen across multiple days of stimulus experience, the retention of these changes was surprisingly long lasting compared to the time interval of acquisition. For example, even one year after the first recording, the P2 amplitude exceeded the initial amplitude. We interpreted this type of response increase as part of learning, in that repeatedly presented sounds become familiar and contribute to an enhanced representation of the implicitly learned stimulus, yet remain below the threshold of learning that has a behavioural consequence. Thus, a certain amount of stimulus experience without performing a specific task seems to contribute to perceptual learning. This point is reinforced by results from a frequency discrimination experiment in which participants improved their frequency discrimination ability through focused listening training with identical stimuli. Repeated exposure and focused attention resulted in perceptual gains, even though discrimination was not possible because the stimuli were identical.
P2m change in relation to the stimulus type
A gain in the P2 amplitude has been reported in several training studies involving different stimuli and tasks; thus, the P2 gain is not unique to identification of a pre-voicing interval. In this study, the P2m amplitude elicited by the noise stimulus increased between the pre-training sessions by a similar amount as the P2m amplitude gain for the speech stimuli. However, the further P2m increase between the second pre-training and the post-training sessions was not significant for the noise stimulus. Keeping in mind that the noise stimuli were used during MEG recordings only, these data suggest that the effect of passive stimulus experience saturated after the first session. Considering the results of previous studies showing that the observed P2 gain is largely independent of the stimulus material, the time course of the P2 changes for the noise stimulus helps to make a reasonable assumption about the contribution of passive stimulus experience during the second MEG session to the P2 increase. The P2m response for the speech stimuli continued to increase between the second pre-training and the post-training sessions with an effect size similar to the effect of listening during the first MEG session. This P2m increment resulted from the cumulative effects of active listening during five days of identification training and passive listening during the second MEG session. Given the small increase for the noise stimulus, which we take as an estimate of the effect of stimulus experience during the second MEG session, we assume that the P2m increase between the second and the third measurement can be attributed largely to the effect of listening during the training. Still, we do not know how the additive effects of continued stimulus experience and active auditory processing related to the perceptual task contributed to the modulation of P2m amplitude during training. It seems that different neural mechanisms contribute to the P2 increase.
Interestingly, the beamformer analysis revealed a spatio-temporal pattern of activity in bilateral anterior auditory cortices that was specific to the change between the two pre-training sessions but was not involved in further change during the training. Further studies are required to identify which property of the training procedure effectively induced the increases in performance and in brain responses.
An argument for enhanced object representation
Auditory evoked P2m responses in the 200-ms latency range were strongly modulated after active and passive listening. To interpret the functional significance of P2m changes, it is important to discuss what happens in the 200-ms latency range during auditory processing. When a sound is heard, the auditory system performs a complex spectro-temporal analysis involving a hierarchy of processing steps within the auditory pathways [50–52]. Sound features like spectral complexity, frequency transitions, and rhythm are already extracted by this time and processed by nuclei in the auditory midbrain. The role of the auditory cortex is to enhance such features and to organize the acoustical elements into an object. Näätänen and Winkler described the initial storage of sensory information as an expression of feature traces. Components of the auditory evoked N1 wave reflect this stage, indicating that the auditory information is present at the level of auditory cortex but not yet accessible for conscious perception. As an example, changes in voice onset time are evident by the time they reach auditory cortex [35, 55] and are reflected in the amplitude and latency of the N1 response. However, perception of the VOT according to categorical boundaries that differentiate syllables depends on further processing and is strongly influenced by experience. Reaction-time studies also reinforce that one- or two-syllable words are accessible about 200 ms after word onset. Therefore, this 200-ms time window includes the time required for bottom-up processing of acoustical information as well as the time required for comparison with contextual information.
Whereas Näätänen and Winkler used the term ‘stimulus representation’ in contrast to a ‘pre-representational’ stage as reflected in the N1 response, we prefer the term ‘auditory object representation’. The ‘auditory object’ initially referred to a construct with a visual equivalent; however, it is now more generally used for auditory sensory information that is susceptible to figure–ground segregation and involves a level of abstraction such that information about the object can be generalized between sensory experiences, even across sensory domains. At 200 ms latency, the neural representation of an auditory object is established and becomes accessible for further conscious processing. Näätänen and Winkler discussed the 200-ms activity in terms of the MMN rather than the P2; the MMN is the difference between the response to an infrequent deviant stimulus and that to a more frequently presented standard stimulus, and it reflects the result of comparing incoming stimuli with the memory trace established by the standard stimulus. We chose a different experimental approach, in which repeated presentations of the same stimuli were used to evoke a P1-N1-P2 response instead of an MMN response. The intention behind our approach was to use an evoked response (e.g., P1-N1-P2) that could more easily be defined in individuals and might one day be clinically applicable in the study of people with communication disorders. Moreover, the stimulus presentation paradigm and the identification task are similar to one another in that neither activates discriminative processes. With that said, Atienza et al. [23, 47] reported similar trajectories for plastic changes in MMN and P2 responses, which supports a possible link between the two types of evoked responses and shared neural mechanisms.
Sources in anterior auditory cortex
Although P2 source localization has been described as difficult [28, 41], we found significant separation between P2m and N1m sources, which is consistent with earlier neuromagnetic findings of P2m sources located approximately 10 mm anterior and 5 mm medial to N1m sources. Neuroimaging studies have linked auditory object representation to the anterior auditory cortex. More specifically, there is evidence of preferred firing patterns for animal calls in the anterolateral part of the monkey superior temporal gyrus, whereas the caudolateral part responds to location cues. Together, these findings helped to establish a dissociation of ‘what’ and ‘where’ pathways in auditory processing [61, 62]. The concept of processing the sound object in anterior and the spatial information in posterior auditory cortex has been reinforced by animal studies and human studies [64, 65]. Specifically, areas in the anterior superior temporal plane have been shown to be responsive to auditory objects.
Specificity for the learned stimulus difference
Perceptual learning changes the way in which the trained object is represented and processed in the brain. Accordingly, a difference in the neural representations of the trained stimuli should emerge after the training. Using an entirely data-driven multivariate analysis, we found a spatio-temporal response component that differentiated the ‘mba’ and ‘ba’ responses after training only. Moreover, this analysis demonstrated that the spatio-temporal patterns of brain activity differed between the contrast of the two pre-training sessions and the contrast of post- versus pre-training. Although the P2m amplitude increased bilaterally and no main effect of hemisphere was significant in the analysis of equivalent dipoles, the difference between responses to the speech stimuli was expressed mostly in the right anterior auditory cortex. This specific activity emerged during the late part of the P2m complex and supports our view of auditory object representation. In the literature on hemispheric specialization, the right hemisphere has been shown to be involved in spectral processing, whereas the left hemisphere predominantly processes temporal fine structure. However, a specialization for fine pitch discrimination requires some integration over time, and an asymmetry of integration times has been proposed, with a longer integration time (150–200 ms) in the right hemisphere and a shorter integration time (20–40 ms) in the left hemisphere. Longer integration times may facilitate object processing in the right hemisphere. Accordingly, a specific sensitivity of the right anterior auditory cortex for object processing has been concluded from a PET study.
Moreover, in a study of gerbils detecting the direction of frequency sweeps in frequency-modulated tones, a hemispheric asymmetry was found and was suggested to be a precursor of the organization of music and language in humans: the left auditory cortex was more involved in local processing of temporal fine structure, whereas more global processing engaged the right auditory cortex.
Sensation versus object representation
The ‘reverse hierarchy theory’ of perception proposes that spectral components of an auditory stimulus are initially received separately and then integrated during auditory processing, and that the auditory object becomes accessible for perception only at a higher level of object representation. In order to distinguish between phonologically similar stimuli, the listener has to scrutinize the sounds carefully, which usually requires stimulus repetitions, to gain access to finely structured details represented at a lower level of the sensory hierarchy. The stimuli used here were spectrally identical but differed in timing. This means perceptual training could have either improved access to such lower-level stimulus features (such as the temporal VOT cue) or established new object representations of each stimulus. Because our findings of brain activity changes in the 200-ms latency range and sources in the anterior auditory cortex support the latter, we suggest that identification training forced participants to attach a label to each stimulus, which in turn generated separate objects. This point is reinforced by our analyses suggesting that each speech stimulus was represented differently after training.
Behavioural improvement was significant in the last half of the training but not within the first days. In contrast, brain changes were already evident between the pre-training sessions. Thus, the trajectories of behavioural performance and brain responses were essentially different, with changes in brain responses seeming to precede changes in behavioural performance. This again is consistent with our concept that learning first builds a strong representation of the auditory objects, which in turn allows the participant to learn to identify the subtle differences between stimuli.
Based on the temporal and spatial information obtained in our study, we propose that perceptual learning and training result in plastic reorganization at the level of object representation. In contrast, we did not find significant indications of plastic changes in the P1m and N1m responses, which are both thought to signal stimulus changes at a sensory level; such changes would have been indicative of early sensory processing of the trained subtle differences in pre-voicing time. In contrast to the absence of detectable changes in the P1m and N1m responses in our study, strong neuroplastic changes in early primary auditory responses have been found during perceptual learning in animal studies [70–73] and in human auditory evoked responses [74–76]. Common to those studies was that the spectro-temporal differences between the stimuli were larger than in our current study and that perceptual learning as well as neurophysiological changes occurred rapidly. The differences between studies indicate that it is important always to discuss experimental findings in the context of the experimental conditions.
Potential effect of attention
Active and passive listening might have altered the way participants attended to the stimuli. Although the MEG recording was performed under passive listening conditions that did not require directed attention, the speech stimuli may have become more salient after learning and may have captured more attention in later MEG sessions than in the first one. The effect of attention on the auditory evoked response in the 200-ms latency range has been described in ERP recordings as a long-lasting negative wave, Nd, or as processing negativity, both with a scalp topography similar to that of the P2 wave. Because this increased negativity overlaps the P2 in latency, the net P2 amplitude decreases (rather than increases) with attention [40, 79, 80]. Although attention might have modulated the effects of stimulus experience and of training, it seems unlikely that changes in attention between blocks of different stimuli and between MEG recording sessions can explain the P2m amplitude increases observed in this study. The Nd wave or the processing negativity is strongest when an active task is involved. For this reason, we chose a passive listening paradigm for the MEG recording to avoid such compromising effects on the P2m amplitude.
The P2 response as an indicator for learning
In this training study, the P2 response showed remarkable neuroplastic modulation. However, multiple stages of learning are involved, and we have to differentiate carefully between them when relating the observed P2 changes to learning and training. It seems that the P2 amplitude does not reflect a straightforward brain–behaviour relationship. Instead, the P2 amplitude seems to indicate a facilitation of implicit memory for the auditory object that precedes any perceptual change. The enhanced object representation is an essential part of learning and allows the listener to access details in the sensory representation, which in turn permits the correct identification of phonetically similar objects and potentially even categorical perception. Interestingly, the amplitudes of brain activity and behaviour follow different trajectories over time. The gain in P2 amplitude was delayed with respect to the time of stimulus experience, suggesting effects of neural consolidation. On the other hand, the gain in P2 amplitude preceded the improvement in performance, again suggesting its role in implicit learning.