In the present study, the pitch perturbation paradigm was used to address the question of whether the extent of disparity between voice F0 output and its auditory feedback modulates motor-induced suppression of auditory neural responses at voice onset. The analysis showed that the N1 ERP component was significantly suppressed during active vocalization compared with passive listening, both for unaltered (0 cents) voice feedback and for feedback that was pitch-shifted by 50, 100 and 200 cents (Figure 1). However, when voice F0 feedback was shifted by 400 cents, the N1 suppression was almost completely eliminated. In addition, the mean normalized N1 suppression was largest (almost 52%) for unaltered voice feedback (0 cents shift) and was lower, at 37%, 41%, 26% and 5%, for pitch shift magnitudes of 50, 100, 200 and 400 cents, respectively (see bar plots in Figure 3).
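For readers less familiar with the cents scale, the short Python sketch below converts the pitch shift magnitudes used here into frequency ratios, using the standard definition that 1200 cents equal one octave. The baseline F0 of 200 Hz and the `normalized_suppression` helper are illustrative assumptions only: this section does not state the formula used to normalize N1 suppression, so the helper shows one plausible definition (percent reduction of the active N1 relative to the passive N1) rather than the method used in the study.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


def normalized_suppression(passive_n1: float, active_n1: float) -> float:
    """ASSUMED definition, not taken from the paper: percent reduction of the
    N1 amplitude during active vocalization relative to passive listening."""
    return 100.0 * (passive_n1 - active_n1) / passive_n1


if __name__ == "__main__":
    base_f0 = 200.0  # hypothetical baseline voice F0 in Hz
    for cents in (0, 50, 100, 200, 400):
        ratio = cents_to_ratio(cents)
        print(f"{cents:>3} cents: ratio {ratio:.4f}, shifted F0 {base_f0 * ratio:.1f} Hz")
```

On this scale, a 400 cents shift corresponds to four equal-tempered semitones (a major third), whereas a 50 cents shift is half a semitone.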
Separate analysis of the active vocalization and passive listening conditions revealed that the maximum N1 suppression for unaltered feedback, and its reduction or elimination for pitch-shifted feedback, arose because the amplitude of N1 responses during vocalization became larger (less suppressed) as the PSS magnitude increased, whereas no such systematic change in N1 responses occurred across PSS magnitudes during passive listening. Our results indicated that 400 cents pitch shifts elicited N1 responses that were significantly larger (more negative, or less suppressed) than those elicited by 0, 50, 100 or 200 cents shifts during vocalization. However, no such difference was observed for N1 responses to different stimulus magnitudes during passive listening. These findings indicate that motor-induced suppression develops for small and moderately large disparities (up to 200 cents shift) between predicted (efference copy) and actual voice F0 feedback, but not for very large shifts (e.g. 400 cents).
In addition, our results revealed a significant PSS magnitude × electrode position interaction only during active vocalization, indicating that the scalp distribution of N1 responses differed across stimulus magnitudes. As can be seen in Figure 2, N1 potentials have a prominent fronto-central distribution for the 400 cents PSS magnitude that differs from the distributions for the other stimuli. This difference arises from larger (less suppressed) neural responses to the largest pitch error in voice feedback (400 cents) compared with smaller stimulus magnitudes during active vocalization, indicating that the motor-induced modulation of the neural generators of N1 is different for larger compared with smaller disparities between vocal pitch output and its auditory feedback.
The findings of the present study are consistent with those of Heinks-Maldonado et al., who showed that the MIS of auditory neural responses at voice onset was greatest for unaltered voice feedback and became smaller when feedback was pitch shifted or modified with an alien voice. In addition, our results expand upon the findings of Heinks-Maldonado et al. by showing that MIS decreases, or is even eliminated (at a 400 cents shift), as the magnitude of pitch perturbation in voice auditory feedback increases.
Other studies have suggested that, in addition to acoustical parameters (e.g. pitch frequency), disparity between the spatial and temporal aspects of self-generated feedback and the efference copies can also modulate the neural processing of sensory input during execution of motor tasks. In the somatosensory system, perturbations in the trajectory and onset time of self-generated tactile stimulation were associated with an increase in the intensity of tickle sensation, indicating that unpredictable feedback was less suppressed by efference copies of motor commands. In the auditory system, MIS of auditory responses was shown to develop only for conditions in which there was no delay between the onset of motor actions, such as button press or vocalization, and the onset of auditory stimuli (temporal predictability). Similarly, auditory neural responses to self-triggered (button press) tones were shown to be maximally suppressed compared with passive listening when the frequency and onset time of the stimuli were predictable, and the suppression was reduced if the frequency or onset time was unpredictable. Consistent with these findings, results of the present study indicate that pitch predictability relative to efference copies for unaltered voice feedback results in greater MIS of auditory responses compared with pitch-shifted feedback.
The MIS effects described above support the notion of an internal forward model for execution and monitoring of self-produced motor tasks. The forward model is suggested to incorporate efference copies of the motor commands, which are compared with the actual sensory feedback. This comparison examines the degree of disparity between spatial (e.g. trajectory), temporal (e.g. time delay) or acoustical (e.g. pitch) features of sensory feedback and the efference-based predictions, and this disparity in turn affects the processing of sensory neural information. This characteristic may enable the sensory-motor mechanisms to identify the source of sensory stimulation by monitoring the degree of feedback predictability via efference copies, and thereby to distinguish between self- and externally-generated inputs. Our results suggest that the onset of unaltered voice feedback elicits N1 responses that are maximally suppressed by the efference copies of motor commands during vocal production. Suppression becomes less pronounced for moderate pitch disparities (e.g. 50, 100 or 200 cents shifts) and is almost completely eliminated for large pitch shifts (400 cents) in voice feedback. These findings suggest that pitch frequency is one of the important voice components for identifying one's own voice during vocalization or speaking.
In addition to the role of MIS in identifying the source of feedback, suppression of auditory neural responses has been suggested to play an important role in enhancing neural sensitivity for detecting unexpected changes (errors) in self-voice feedback. A study by Eliades and Wang showed that while marmoset monkeys vocalized and received their own unaltered voice feedback, some cortical auditory neurons reduced their firing rate (suppression), but then significantly increased their activity in response to pitch-shifted voice feedback. However, other neurons that increased their discharge rate (excitation) in response to unaltered voice feedback during vocal production did not respond to pitch shifts in voice auditory feedback. These data suggest that during vocal production, MIS of some cortical auditory neurons by means of efference copies of motor commands may provide a mechanism to enhance their neural sensitivity for pitch error detection in the feedback of self-produced vocalizations. It has also been demonstrated in humans that when pitch shifts were presented after the onset of self-produced voice [25, 29, 30] or musical sounds, ERP responses were enhanced during active production of the motor task (e.g. vocalization or piano playing) compared with when subjects passively listened to the playback of the same self-produced voices or music.
However, the extent of vocalization-induced enhancement was shown to be greater for 100 and 200 cents compared with 500 cents stimulus magnitude. This latter effect was suggested to occur because the motor act of vocalization increases neural sensitivity to F0 feedback perturbations in order to accurately detect and correct vocal pitch errors during speaking. When feedback pitch was shifted by 500 cents, however, the vocalization-induced enhancement of ERPs was reduced, suggesting that the system may have interpreted the feedback as an external sound and consequently became less sensitive. Therefore, identification of the source of auditory stimulation, and systematic tuning of neural sensitivity based on the degree of disparity between voice F0 and its feedback, may be important for vocal production. If the audio-vocal system were equally sensitive to pitch changes in self- and externally-generated voices, variations in the pitch of environmental sounds or of voices from different speakers could lead to fluctuations in a person's voice during speaking.
Although PSS magnitude has been shown to modulate neural responses to voice feedback, it is still not clearly understood why ERPs (N1 component) are suppressed at voice onset [7, 8] and, in contrast, are enhanced (predominantly the P2 component) when pitch shifts occur in the middle of vocalizations [29, 32]. One possible explanation for this differential effect of stimulus onset time is that the reduction of N1 suppression at voice onset for larger PSS magnitudes, reported previously and in the present study, may reflect mechanisms that enable the system to monitor and maintain an intended vocal output by subtractive comparison between actual voice feedback and internal representations provided by efference copies. On this account, unaltered auditory feedback from self-generated vocalizations, which closely matches the internally-represented feedback, is more strongly suppressed at voice onset because it is fully predicted by the efference copies of motor commands. After voice onset, however, feedback-based monitoring of vocal output may rely on comparing the current state of incoming feedback with a representation that is continuously updated by feedback from previous vocalization states. Therefore, instead of suppression, the audio-vocal system becomes more sensitive and highly responsive when disparities emerge between the parameters of voice and its feedback in the middle of vocalization.
The above explanation suggests that the system performs at least two different functions, which require different mechanisms: one function of monitoring voice auditory feedback is to identify the source of voice feedback, and the second function is to correct for errors in production. The first function takes place at the onset of vocalization, whereas the second function is activated after vocal onset and becomes important during vocalization. While the details of these processes remain unknown, the suppression of cortical neural activity at vocal onset by means of motor-driven mechanisms may contribute to enhancing neural sensitivity for detecting pitch variations during vocal production. The brain may utilize the motor predictions to determine the source of incoming feedback in order to systematically decrease neural sensitivity to variations in the feedback of those voices that are not recognized as being self-generated. This proposal is supported by earlier findings in primates suggesting that there may be a link between neural suppression and enhanced sensitivity to unexpected changes in voice F0 feedback. Because the N1 component in the present study was most sensitive to feedback perturbations at vocal onset, and the P2 component is most sensitive to perturbations during vocalization, it is reasonable to suppose that these components represent the two different functions of the audio-vocal system: distinguishing self-generated from external sounds, and monitoring self-vocalization.
With respect to the neural processes of voice monitoring and control that lead to the MIS of auditory responses discussed above, a question remains as to which brain areas are involved in auditory feedback-based monitoring and control of vocal output during vocal production or speech. The anatomical organization of the audio-vocal mechanism has been widely studied using functional neuroimaging techniques in a variety of speech production and perception tasks. These studies proposed an audio-vocal integration circuitry, including neural areas such as the superior temporal gyrus (STG), superior temporal sulcus (STS), planum temporale (PT), pre-motor cortex (PMC), inferior frontal gyrus (IFG), anterior insula [34, 35] and the anterior cingulate cortex (ACC), that may be involved in online monitoring and control of voice F0. Moreover, a bilateral increase in the activity of the superior temporal areas (mainly STG and STS) was reported in studies in which human subjects received pitch-shifted feedback of their own voice compared with unaltered feedback during vocal production [37, 38]. A similar increase in activity in superior temporal areas was also reported in conditions where feedback disparity was generated by introducing formant shifts or voice-gated noise in the auditory feedback during self-vocalization. Superior temporal activity was also shown to be significantly greater when subjects passively listened to the playback of their own speech than when they actively produced it. The results of these neuroimaging studies are consistent with the electrophysiological findings of the present study, which showed an increase in N1 activity (less suppression) for larger pitch errors (e.g. 400 vs. 0 cents) in voice feedback during active vocalization, leading to diminished MIS for larger pitch errors in voice feedback during vocalization compared with passive listening.
These results suggest that the suppression of cortical auditory areas is likely to arise from neural mechanisms that utilize an internally-predicted representation (efference copies) of intended vocal output to monitor and control for feedback pitch error during active vocal production of speech sounds. Such a characteristic may be an important aspect of sensory-motor integration for distinguishing external (erroneous) from self-generated stimuli for maintaining the acoustical parameters of intended vocal output.