The purpose of the present study was to determine whether instructing participants to ‘compensate’ or ‘ignore’ gradual (-2 cent increments per trial down to -100 cents) or constant changes (-100 cents) in auditory feedback could result in the voluntary suppression of compensatory responses and sensorimotor adaptation. Regardless of whether participants received the gradual or constant pitch manipulation, neither singers nor nonsingers could intentionally suppress the compensatory response during FAF trials. The pattern of compensation observed when participants were instructed to ‘ignore’ the FAF was indistinguishable from the compensatory responses observed when they were instructed to ‘compensate’ for the FAF. Additionally, participants’ median 50 ms F0 values suggested that the level of sensorimotor adaptation that occurred during the ‘ignore’ condition was similar to the adaptation observed during the ‘compensate’ condition. Voice F0 values observed throughout the FAF (gradual and constant) trials indicate that both singers and nonsingers updated their internal models by adjusting their F0 so that they initiated their vocal productions at frequencies closer to the intended target.
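For readers less familiar with the cent scale, the two manipulations can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names, the 220 Hz target, and the block of ten constant-shift trials are hypothetical and are not taken from the experimental software; the only values drawn from the study are the -2 cent step and the -100 cent (1 semitone) limit.

```python
def cents_to_ratio(cents: float) -> float:
    """Convert a pitch shift in cents to a frequency ratio (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def gradual_schedule(step_cents: float = -2.0, final_cents: float = -100.0) -> list:
    """Per-trial shifts for the gradual condition: -2, -4, ..., -100 cents."""
    n_steps = int(round(final_cents / step_cents))
    return [step_cents * (i + 1) for i in range(n_steps)]


def constant_schedule(n_trials: int, shift_cents: float = -100.0) -> list:
    """Per-trial shifts for the constant condition (same shift on every trial)."""
    return [shift_cents] * n_trials


def heard_f0(produced_hz: float, shift_cents: float) -> float:
    """F0 the participant hears when the produced F0 is shifted by shift_cents."""
    return produced_hz * cents_to_ratio(shift_cents)


def full_compensation_hz(target_hz: float, shift_cents: float) -> float:
    """Produced F0 that would make the shifted feedback land exactly on the target."""
    return target_hz / cents_to_ratio(shift_cents)


if __name__ == "__main__":
    target = 220.0  # hypothetical target note, not a value from the study
    gradual = gradual_schedule()
    constant = constant_schedule(10)  # hypothetical block length
    print(len(gradual), gradual[0], gradual[-1])
    print(constant[0], round(heard_f0(target, constant[0]), 2))
    print(round(full_compensation_hz(target, -100.0), 2))
```

Under these assumptions the gradual condition reaches the full shift after 50 steps, and complete compensation for a -100 cent shift would require singing roughly 6% above the target frequency.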
When participants were instructed to ‘compensate’ for the gradual presentation of FAF, they correspondingly adjusted their F0 in the opposite direction of the manipulation. That is, within 1500 ms of vocal onset, both singers and nonsingers increased their voice F0 in the opposite direction of the perturbations in order to maintain pitch accuracy with the intended target (see Figure 2). Participants also exhibited sensorimotor adaptation while compensating for the gradual FAF manipulation. Participants’ data within 50 ms of their vocalization onset indicate that as the pitch manipulation progressively decreased, they recalibrated their internal models to initiate vocal productions at increasingly higher frequencies. In other words, participants adjusted their entire vocal production and initiated subsequent utterances at F0 values similar to those produced on previous FAF trials. In doing so they maintained consistency in their vocal-motor plan for how they sang musical notes. This suggests that as participants were gradually compensating for the FAF, by changing their F0 values in the opposite direction of the perturbation, they were also continually updating their internal models to account for the gradual decrease in the F0 of their auditory feedback.
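The onset measure referred to here (the median F0 within the first 50 ms of vocalization) can be illustrated with a short sketch. This is a hypothetical reconstruction rather than the analysis code used in the study: the function names, the sampled F0 track, and the choice to express values in cents relative to a reference frequency are all assumptions made for illustration.

```python
import math
from statistics import median


def hz_to_cents(f0_hz: float, reference_hz: float) -> float:
    """Express an F0 value in cents relative to a reference frequency."""
    return 1200.0 * math.log2(f0_hz / reference_hz)


def onset_median_cents(times_ms, f0_hz, reference_hz, window_ms=50.0):
    """Median F0 (in cents re: reference) over the first window_ms after vocal onset.

    times_ms are sample times relative to vocal onset (0 = onset); samples at or
    beyond window_ms are excluded. The window boundary convention is an assumption.
    """
    onset = [
        hz_to_cents(f, reference_hz)
        for t, f in zip(times_ms, f0_hz)
        if 0.0 <= t < window_ms
    ]
    if not onset:
        raise ValueError("no F0 samples inside the onset window")
    return median(onset)


if __name__ == "__main__":
    reference = 220.0  # hypothetical baseline/target frequency
    times = [0, 10, 20, 30, 40, 50, 60]
    # Hypothetical F0 track: onset samples sit 10-50 cents above the reference.
    track = [reference * 2.0 ** (c / 1200.0) for c in (10, 20, 30, 40, 50, 500, 500)]
    print(round(onset_median_cents(times, track, reference), 3))
```

Because the window closes before feedback-driven corrections can plausibly take effect, a systematic rise in this onset value across FAF trials is read as a change in the motor plan (adaptation) rather than as within-utterance compensation.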
Similarly, when participants were instructed to compensate for the sudden and large change (constant condition) in their auditory feedback, results indicated that participants increased their F0 in the opposite direction of the manipulation. Interestingly, the F0 values, on average, obtained during the first block of FAF trials were not statistically different from the F0 values obtained on any other block of FAF trials. Thus, participants compensated to a similar degree across all FAF trials when presented with a constant shift (1 semitone) in auditory feedback. Furthermore, participants’ median 50 ms F0 data indicated that sensorimotor adaptation occurred. As singers and nonsingers rapidly compensated for the FAF, they also adjusted their internal models to initiate F0 values closer to the intended target. F0 values differed significantly from the baseline F0 values from the second block (trials 6-10) of FAF trials onward. Overall, instructing participants to compensate for FAF resulted in responses similar to those observed previously in our laboratory and by others using the FAF paradigm [2, 3, 6, 7, 17, 18, 35–37].
The finding that compensatory responses are not easily suppressed by instructions to ignore feedback is consistent with previous studies using FAF, formant frequency manipulations, and masking noise. A recent study by Munhall and colleagues found that participants rapidly compensated for formant frequency manipulations even when they were instructed to ignore the modified feedback. Moreover, when the manipulations were removed, participants exhibited aftereffects. Accordingly, Munhall and colleagues suggest that their data do not necessarily provide “evidence of a fixed-response system that cannot be adjusted with practice or strategies”; rather, they argue that compensatory responses to vowel modifications are not strategic responses to the detection of auditory feedback manipulations. This is congruent with the findings from the current study; however, it is uncertain whether repeated exposure (‘practice’) to subtle (2 cents) manipulations in auditory feedback would result in the voluntary suppression of compensatory responses to FAF.
For instance, when similar pitch shift values were presented incrementally across trials in previous studies from our laboratory (+/-2 cents and +/-4 cents), participants stated that they were unaware that their voice was manipulated in pitch. Munhall and colleagues also indicated that participants possessed no particular knowledge of the nature of the manipulation when formants were modified in small increments trial-by-trial. Indeed, it has been reported that an early automatic response to unexpected changes in auditory feedback occurs [2, 3, 29]. If this response assists with small, unexpected perturbations (as opposed to larger, more obvious changes in auditory feedback), then the presentation of gradual shifts in auditory feedback may fall within a certain automatic compensatory range that cannot be suppressed voluntarily, nor may it require the ‘conscious’ detection of the error for the compensatory response to occur. This is consistent with the results of Loui et al., who reported that amusic (‘tone-deaf’) participants were able to reproduce the pitch direction of two successive single tones, although they were at chance when discriminating pitch direction. That is, although amusics have difficulty perceptually identifying pitch changes that are smaller than a semitone, they are capable of producing the correct pitch direction as accurately as controls. This supports Loui et al.’s notion that the auditory pathway responsible for vocal production may be distinct from the pathway responsible for conscious perception. Thus, compensating for altered feedback may occur without a participant’s awareness. Alternatively, it is possible that repeated exposure to large changes (e.g., 100, 200 cents) in auditory feedback may allow compensatory responses to be voluntarily controlled. Regardless, the data presented by Munhall et al. and the results of the current study suggest that motor preparation, initiation, and production of vocal utterances are heavily influenced by auditory feedback. Moreover, instructing participants to ignore changes in feedback does not appear to influence compensatory responding or alter the pattern of sensorimotor adaptation.
Auditory feedback has been shown to be important for accurate F0 control, and it has also been shown to be influential during the acquisition of a novel musical piece. Finney and Palmer found that trained pianists’ performance improved when auditory feedback was provided while learning a novel song. However, when the musicians were required to produce a well-rehearsed piece from memory, the removal of auditory feedback had no effect on performance. Similar to the trained singers in Zarate and Zatorre’s study, who could suppress compensatory responses to +/-200 cents (2 semitone) manipulations, it appears that musical training may allow musicians to perform in the absence or modification of auditory feedback. With regard to singing, one possibility is that presenting the pitch manipulations so they occur later into vocal production (such as 1000-1500 ms after vocal onset, which Zarate and Zatorre [17] used) may result in easier identification of FAF (e.g., efference copy violation), or it may allow singers to rely on alternative components (e.g., muscle memory, kinesthetic feedback) of their internal model to suppress compensatory responding.
Conceptually, internal models are hypothesized to compare sensory feedback with motor acts by means of a comparator examining differences between perception and production. These differences are hypothesized to be computed based on a corollary discharge, such that the output of an internal model maps the motor commands (e.g., efference copy) onto the expected sensory feedback from the actions. When a match exists between perception and production, the result is a net cancellation of the sensory input, which in turn causes a dampened sensory experience. Conversely, when there is a discrepancy between the perception and production of a motor act, the corollary discharge does not dampen the sensory feedback. As a consequence, there is an intensification of the sensory experience that potentially alerts us to environmental events.
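The comparator idea can be made concrete with a deliberately simplified toy model. The sketch below is not a published model and makes several assumptions that should be read as illustrative only: the function names, the Gaussian-shaped gain function, and the 10 cent tolerance are hypothetical choices used to show how a match between predicted and heard feedback could yield suppression while a mismatch leaves the signal undampened.

```python
import math


def prediction_error_cents(heard_hz: float, predicted_hz: float) -> float:
    """Mismatch between heard feedback and the feedback predicted from the
    efference copy, in cents (negative = feedback lower than predicted)."""
    return 1200.0 * math.log2(heard_hz / predicted_hz)


def sensory_gain(error_cents: float, tolerance_cents: float = 10.0) -> float:
    """Toy gain on the sensory response.

    0.0 means the input is fully cancelled (perception matches prediction);
    values approaching 1.0 mean the corollary discharge no longer dampens it.
    The Gaussian form and the tolerance value are assumptions for illustration.
    """
    return 1.0 - math.exp(-((error_cents / tolerance_cents) ** 2))


if __name__ == "__main__":
    predicted = 220.0  # hypothetical predicted feedback for the intended note
    for shift in (0.0, -2.0, -100.0):
        heard = predicted * 2.0 ** (shift / 1200.0)
        err = prediction_error_cents(heard, predicted)
        print(shift, round(err, 2), round(sensory_gain(err), 3))
```

In this toy formulation unaltered feedback produces zero gain (maximal suppression), whereas a 100 cent shift produces a gain near one; whether very small shifts fall inside or outside the suppressed range depends entirely on the assumed tolerance.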
For instance, in a series of event-related potential (ERP) and magnetoencephalographic (MEG) studies using FAF, Heinks-Maldonado and colleagues [42, 43] found that an early sensory detection component (e.g., M100, which occurs approximately 100-150 ms following auditory stimuli) generated in the auditory cortex was maximally suppressed when a participant heard his own unaltered voice. When participants received pitch-shifted feedback, the researchers observed an increase in the amplitude of the M100 relative to when they received unaltered auditory feedback. Participants in the current study may have exhibited similar cortical activity when presented with FAF, as they initiated compensatory responses. However, presenting the pitch manipulations so they coincide with vocal onset may make compensatory responses more difficult to suppress than if the FAF were presented mid-utterance.
When FAF is delivered mid-utterance, the efference copy associated with the motor commands is not initially violated, as the participant initially hears exactly what they are producing. When the FAF occurs, it is possible that the nervous system has already determined that the motor commands are appropriate for the target note produced and that the error perceived is due to something external (e.g., the experimenter). A study from our group directly addressed this issue. FAF was either presented at utterance onset or mid-utterance, and in some cases the mid-utterance FAF was induced by removing ongoing FAF (which was present from utterance onset). The results of this study showed that the compensation response to FAF at utterance onset was much larger than the response to mid-utterance FAF. Furthermore, the amplitude of compensation to removing FAF mid-utterance was identical to initiating FAF mid-utterance, indicating participants viewed the removal of ongoing FAF in a similar way as the introduction of an FAF change. The results of this study were taken as evidence for different mechanisms for vocal control at utterance onset (where the goal is to achieve a target pitch) and during mid-utterance (where the goal is to maintain pitch at a steady level). The mechanism for vocal control at utterance onset likely involves the efference copy, whereas the mid-utterance mechanism can rely more exclusively on ongoing auditory feedback. This is relevant when comparing the results of this study, which introduced FAF at utterance onset, and the results of Zarate and Zatorre’s study, which introduced FAF mid-utterance.
Given the extensive experience trained singers possess with vocal control, participants in Zarate and Zatorre’s study may have relied more on kinesthetic feedback (e.g., vocal-fold positioning) to maintain the pitch of their voice during FAF trials, whereas nonsingers, possibly due to their lack of formal music training, were unable to suppress compensatory responses. On the other hand, when the FAF coincides with vocal onset, at no point does the perceived sensory feedback match the sensory feedback predicted by the participants' internal model, resulting in an intensified sensory experience (efference copy violation), e.g., [42, 43]. Regardless of whether participants were aware of the FAF manipulations, compensatory responses were initiated to subtle and large changes in auditory feedback. This suggests that once the efference copy has been violated, participants’ internal models are automatically adjusted and compensatory responses are initiated in an attempt to offset the deviant auditory feedback.
There is substantial evidence that auditory feedback is influential in achieving precise vocal control. Murbe et al. and Larson et al. have also demonstrated that kinesthesia substantially contributes to singers’ pitch control at the beginning of an utterance (< 100 ms). After 100 ms, auditory feedback participates in F0 control. However, Munhall and colleagues found that instructing participants to rely on kinesthetic properties for F0 control was insufficient to suppress compensatory responding to formant frequency manipulations. Because the participants in Munhall et al. were not musically trained, they may have been unable to use kinesthetic feedback as efficiently as trained singers to suppress compensatory responses.
Our results and those of others [17, 18, 24–26, 29] suggest that the processes involved in comparing the actual sensory consequences with the expected sensory consequences during vocalization are dependent on various forms of sensory feedback (e.g., auditory, kinesthetic). Singers’ ability to ignore FAF may result from relying less on auditory feedback and more on alternative feedback strategies (e.g., use of kinesthetic feedback) once a musical piece has been memorized. Alternatively, a more likely explanation is that participants use the information they receive following vocal onset differently from the information they receive mid-utterance to maintain a stable voice F0. Overall, it appears that vocal training may only be effective in suppressing compensatory responses to FAF in instances where the perturbations are presented mid-utterance.