Our sensory systems constantly process a multitude of stimuli. It has been suggested that the sensory system allots more processing resources to unexpected stimuli, which often require an immediate reaction, than to stimuli that are the predicted consequences of our own actions. Previous research has demonstrated that the sensory consequences of self-produced actions are processed differently from externally-produced stimuli [2–4]. For example, responses to self-produced tactile stimuli are suppressed relative to externally-produced stimuli in the somatosensory cortex [2, 3]. This weakening of sensory responses to self-produced stimuli relative to externally-produced stimuli has been interpreted in terms of an internal forward model [5, 6]. The central nervous system generates an efference copy of the motor command as a prediction of the sensory consequences of one’s own action, and compares this prediction with the actual sensory feedback. When an accurate prediction of the actual sensory feedback is available, only a small prediction error between the intended motor action and the actual sensory feedback is generated, which in turn leads to a net cancellation of sensory input. When there is no efference copy, or when the predicted and actual feedback signals do not match, a larger prediction error is generated, which translates to a larger response in the sensory or somatosensory cortex.
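The comparison at the heart of the forward model can be illustrated with a toy calculation. The sketch below is not a model from the cited literature: the function names, the vector representation of feedback, and the numerical values are all hypothetical, and the "response" is simply taken to be the size of the residual left after the prediction is subtracted from the feedback.

```python
import math

def prediction_error(actual, predicted=None):
    """Residual left after subtracting the efference-copy prediction.

    `actual` and `predicted` are equal-length sequences of arbitrary
    feedback features (hypothetical units). If no prediction exists,
    as assumed here for externally produced stimuli, nothing is cancelled.
    """
    if predicted is None:
        return list(actual)
    return [a - p for a, p in zip(actual, predicted)]

def response_magnitude(error):
    """Toy stand-in for the size of the sensory response (Euclidean norm)."""
    return math.sqrt(sum(e * e for e in error))

feedback = [1.0, 0.8, 0.5]  # hypothetical actual sensory feedback

# Accurate prediction: small error, feedback largely cancelled.
matched = response_magnitude(prediction_error(feedback, [0.9, 0.8, 0.5]))
# Inaccurate prediction: larger error.
mismatched = response_magnitude(prediction_error(feedback, [0.2, 0.1, 0.0]))
# No efference copy: the full feedback remains.
external = response_magnitude(prediction_error(feedback))
```

Under these assumed values, the matched case yields a smaller residual than either the mismatched case or the case without an efference copy, mirroring the suppression pattern described above.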
Recently, several electrophysiological and neuromagnetic studies have demonstrated a similar suppression phenomenon in the auditory system. For example, self-triggered tones elicited suppressed event-related potentials (ERPs) or magnetoencephalography (MEG) responses compared with playback of the identical tones triggered by a computer [4, 8–11]. Several other MEG studies in humans have also demonstrated suppressed auditory cortical responses (e.g., M100) to self-produced speech compared with playback of pre-recorded speech [12–17]. Ford and her colleagues conducted a series of ERP studies to investigate the efference copy mechanisms of the auditory system in normal people and patients with schizophrenia [18–20]. The results showed suppressed N1 responses to self-produced speech compared with listening to the speech playback in the normal subjects, whereas this suppression effect was not observed in patients with schizophrenia [18, 20]. In recent ERP studies in which auditory feedback was pitch-shifted during vocalization, N1 suppression was greater for the self-produced, unaltered voice than for an altered or alien voice [21, 22]. These suppression findings suggest that the auditory cortex compares the actual auditory feedback against a prediction of the expected feedback to distinguish self-produced speech from externally-produced sounds.
In previous studies of auditory suppression, the timing of self-triggered stimulation was usually predictable. For example, pure tones were presented immediately after a participant’s button press. The onset of pure tones, however, was unpredictable when they were triggered externally by the computer [9, 13, 17, 21]. This confound leaves open the possibility that the suppressed processing of self-triggered stimulation is due to the fact that humans can precisely predict the timing of auditory stimuli triggered by their own actions, whereas the timing of externally-generated stimuli cannot be predicted in this way. If sensory suppression did not exist when the timing of self-triggered stimulation was as unpredictable as that of externally-triggered stimulation, the suppression effect could be attributed primarily to an accurate prediction of stimulus timing. Alternatively, self-triggered stimulation could result in a general suppression of sensory events that occur with the motor act. That is, sensory suppression would reflect a non-specific suppression of sensory events related to the motor act, and suppression of self-triggered stimulation would be independent of the delay between the motor act and the stimulus onset.
To clarify the effect of temporal predictability on the neural processing of self-triggered stimulation relative to externally-triggered stimulation, several studies have been conducted in the auditory modality. Schafer and Marcus reported a suppression effect for N1 responses to self-triggered click sounds even when the sound onset was delayed relative to the motor act by a fixed time of up to 4 seconds. Bäß et al. examined the cortical responses (N1) to self-triggered tones relative to identical tones that were triggered externally, where the frequency and the onset of the self-triggered tones were either predictable or unpredictable. The results showed that, even when the onset of self-triggered tones was unpredictable, N1 responses were still suppressed relative to externally-triggered tones, although the amount of suppression varied across conditions, with the largest suppression for predictable frequency and predictable onset. These studies suggest that suppression of self-triggered stimulation may be due to a non-specific movement-related suppression of sensory signals.
Contrasting findings, however, were reported in several recent pitch-shifted ERP studies of self-produced vocalization [23–25]. Behroozmand et al. [23, 25] reported enhanced P2 responses during active vocalization relative to passive listening when pitch-shift stimuli (PSS) were triggered by the computer with a random delay after the vocal onset (unpredictable); a suppression effect was present only when the PSS occurred immediately after the vocal onset (predictable). Liu et al. conducted a similar study but compared the neural responses to self-triggered PSS with those to PSS triggered by the computer during vocalization and listening. A random delay between the mouse click and the PSS onset was introduced for the self-triggered task, and the PSS were also randomly triggered by the computer for the externally-triggered task. The results showed that unpredictable self-triggered PSS elicited larger N1/P2 responses than unpredictable externally-triggered PSS, indicating an enhancement rather than a suppression effect of self-triggered stimulation. These studies suggest that enhanced brain activity can be elicited to distinguish unpredictable self-triggered from unexpected externally-triggered stimulation. It should be noted, however, that several factors could confound the conclusions of these studies. For instance, Behroozmand et al. [23, 25] compared the neural responses during active vocalization with those during its playback; the cortical responses could be dampened by differences in the physical qualities of the sounds resulting from bone conduction during vocalization, middle ear muscle contraction, and the response characteristics of the ear. One of the primary limitations of Liu et al. is that the motor responses resulting from the finger movement (i.e., mouse click) were not corrected for, because no motor-only task was included as a control condition, although the authors argued that the motor responses would not affect the neural responses to the PSS that occurred 500–1000 ms after the mouse click. This assumption, however, has not been uniformly validated in previous research. Therefore, the enhancement effect of unpredictable self-triggered stimulation observed in Liu et al.’s study could be due to the effect of the motor act on the neural responses rather than to the random delays between the motor act and the stimulus onset.
Given these contrasting findings, the role of temporal predictability in distinguishing self-triggered from externally-triggered stimulation remains controversial. It is unclear whether sensory suppression is due to an accurate prediction of self-triggered stimulation or is a direct consequence of self-triggered stimulation itself. Therefore, the present ERP study was designed to examine the effect of temporal predictability on the neural processing of self-triggered stimulation relative to unexpected externally-triggered stimulation during self-monitoring of vocal production. The altered auditory feedback protocol [26, 27] was used in this experiment: subjects vocalized a vowel sound and heard PSS in their voice auditory feedback that were either self-triggered or externally triggered. The temporal predictability of self-triggered PSS was manipulated as predictable or unpredictable by introducing fixed or random delays between the motor act and the stimulus onset. For the externally-triggered stimulation, the PSS was triggered by a computer with a random delay after the vocal onset, so that subjects could not predict when the PSS would occur. We expected that the temporal predictability of stimulus delivery relative to the motor act would modulate the neural processing of self-triggered stimulation relative to unpredictable externally-triggered stimulation.
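The delay manipulation described above can be sketched as a simple trial schedule. This is a minimal illustration only, not the stimulus-control code of the experiment: the function name `pss_delays`, the fixed delay of 500 ms, and the 500–1000 ms random range are placeholder assumptions, not parameters of the present study.

```python
import random

def pss_delays(n_trials, predictable,
               fixed_delay_ms=500.0,            # hypothetical fixed delay
               random_range_ms=(500.0, 1000.0), # hypothetical random range
               seed=0):
    """Return one PSS onset delay (ms after the triggering event) per trial.

    predictable=True  -> the same fixed delay on every trial.
    predictable=False -> a delay drawn uniformly at random on each trial.
    """
    if predictable:
        return [fixed_delay_ms] * n_trials
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    low, high = random_range_ms
    return [rng.uniform(low, high) for _ in range(n_trials)]

# Self-triggered, predictable: the delay after the motor act is constant.
self_predictable = pss_delays(100, predictable=True)
# Self-triggered, unpredictable: the delay after the motor act varies.
self_unpredictable = pss_delays(100, predictable=False, seed=1)
# Externally triggered: random delay after vocal onset, set by the computer.
external = pss_delays(100, predictable=False, seed=2)
```

In this sketch the only difference between the predictable and unpredictable self-triggered conditions is the variability of the delay, which is the factor the design aims to isolate.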