Ethics statement
The Ethics Committee for Human Research at Macquarie University approved the experimental methods used in this study (approval number: HE23NOV2007-D05579). Written informed consent was obtained from all the participants.
Subjects
Twenty-four paid volunteers participated in the study (13 female; mean age: 22.38 years; SD: 2.42). All subjects were native speakers of English, passed a hearing screening in both ears (hearing thresholds within 20 dB HL at 500 Hz, 1 kHz, 2 kHz and 4 kHz), and were strongly right-handed as measured by the Edinburgh Handedness Inventory [30]. Data from one additional participant were excluded from analysis because of excessive artefact in the EEG (more than 30% of the trials were rejected).
Stimuli
Speech stimuli were 80 pairs of sentences with either an early phrase boundary (EPhB) or a late phrase boundary (LPhB). Some of the sentence pairs were taken from Frazier and Rayner [31], and the remainder were created by a linguist (the third author). The early and late phrase boundary sentences in each pair contained the same words but differed in the position of the phrase boundary. For example (# indicates a phrase boundary):
1. Because John studied # the subject matter is clearer now (EPhB).
2. Because John studied the subject matter # it is clearer now (LPhB).
In these sentences, the noun phrase “the subject matter” could be either the subject of the second phrase “the subject matter is clearer now” (early phrase boundary, example 1) or the object of the verb “studied” (late phrase boundary, example 2), depending on the position of the phrase boundary. Example stimulus waveforms are shown in Figure 1.
The sentences were spoken by an adult male speaker of Australian English. The speaker produced multiple repetitions of each sentence at a normal rate; these were recorded using a unidirectional microphone and digitized at 44100 Hz with a bit depth of 16. The recorded sentences were analysed using Praat software (Version 5.0.31; [32]) for duration, frequency and intensity measurements. From the pool of 80 sentence pairs, 48 pairs were selected as the stimuli for the present study. The selection ensured that the phrase boundary occurred at approximately the same time in all early phrase boundary sentences (M = 1225 ms, SD = 52 ms) and in all late phrase boundary sentences (M = 1986 ms, SD = 112 ms). None of the sentences were acoustically manipulated, since manipulation would have affected the naturalness of the stimuli.
Acoustic measurements of the pre-boundary syllable were compared with those of the same syllable in the non-phrase boundary condition. The comparison revealed that the syllable at a phrase boundary was longer in duration [M = 363.29 ms (SD = 112.83 ms) vs. M = 237.72 ms (SD = 93.42 ms); t(95) = 20.11, p = 0.0001], lower in intensity [M = 77.92 dB (SD = 2.41 dB) vs. M = 79.49 dB (SD = 2.29 dB); t(95) = −7.21, p = 0.0001] and was followed by a pause (M = 110.63 ms, SD = 52.12 ms). The pre-boundary syllable was also characterised by a rise in pitch [M = 33.13 Hz (SD = 24.66 Hz) vs. M = −15.87 Hz (SD = 13.37 Hz); t(95) = 17.15, p = 0.0001] lasting 146 ms (SD = 111 ms) on average. The acoustic analysis confirmed that the phrase boundary was marked by duration, intensity and pitch cues.
Ninety-six filler sentences were created that did not include a phrase boundary. The filler sentences were approximately the same length as the experimental sentences. They were included to prevent any ERP effects resulting from subjects habituating to the same type of sentence structures. The ERP responses to these filler sentences were not analysed. Two examples of filler sentences are given below.
1. The reporters were frustrated by the politician’s answers to their questions
2. The tourists were extremely dispirited before they reached the Himalayas
Order of presentation
The experimental and filler sentences were presented in pseudo-random order, with the constraint that the two versions of the same sentence (one with an early phrase boundary and one with a late phrase boundary) were never presented consecutively. The stimuli were presented diotically via Sennheiser HD 280 Pro headphones (Sennheiser electronic GmbH, Wedemark, Germany). The sentences were presented with an inter-stimulus interval (ISI, stimulus offset to next stimulus onset) of 2.5 seconds. The stimuli were divided into two blocks of 96 sentences each. Shorter stimulus blocks are recommended for obtaining EEG data with fewer movement-related artefacts [11].
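The ordering constraint can be illustrated with a short Python sketch. This is not the authors' presentation script: the identifiers (`make_order`, `pair00`, `filler00`), the reshuffle-until-valid strategy, the seed, and the split of one long order into two halves are all assumptions made for illustration. Only the counts (48 pairs, 96 fillers, two blocks of 96) come from the text.

```python
import random


def make_order(n_pairs=48, n_fillers=96, seed=0, max_tries=10_000):
    """Shuffle all trials until no two versions of one sentence pair are adjacent."""
    rng = random.Random(seed)
    # Each experimental pair contributes an early (EPhB) and a late (LPhB) version.
    trials = [(f"pair{p:02d}", v) for p in range(n_pairs) for v in ("EPhB", "LPhB")]
    # Fillers have unique ids, so they can never violate the constraint.
    trials += [(f"filler{f:02d}", "filler") for f in range(n_fillers)]
    for _ in range(max_tries):
        rng.shuffle(trials)
        # Accept the shuffle only if neighbouring trials never share a sentence id.
        if all(a[0] != b[0] for a, b in zip(trials, trials[1:])):
            return list(trials)
    raise RuntimeError("no valid order found")


order = make_order()
# Hypothetical block assignment: first and second half of the order.
blocks = [order[:96], order[96:]]
```

Simple rejection sampling is adequate here because a random shuffle of 192 trials satisfies the constraint fairly often, so only a few reshuffles are usually needed.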
Recording the electroencephalogram (EEG)
Each participant was seated in a comfortable chair placed 1 m away from a computer screen. They were fitted with an electrode cap that held sintered Ag-AgCl electrodes placed at 30 positions on the scalp in line with the 10–20 system (FP1, FP2, F7, F3, FZ, F4, F8, FT7, FC3, FCZ, FC4, FT8, T7, C3, CZ, C4, T8, TP7, CP3, CPZ, CP4, TP8, P7, P3, PZ, P4, P8, O1, OZ, and O2). The ground electrode was positioned between FPz and Fz. Electrical activity was recorded from both mastoids, with the left mastoid (M1) acting as the online reference. Vertical eye movements (VEOG) were measured with electrodes placed above and below the left eye. Horizontal eye movements (HEOG) were measured with electrodes on the outer canthi of each eye. The electrodes were adjusted until the impedance was below 5 kΩ.
After the electrodes had been fitted, headphones were placed on the participant’s ears, and the experimental sentences were presented diotically through them. The subjects were told to ignore the sounds in the headphones and to focus their attention on a silent video on the computer screen. The video did not include subtitles because there is evidence that commas in written text generate a CPS-like component [13].
While participants were watching the video and ignoring the sentences, we recorded their EEG from each electrode. The signal from the scalp electrodes was amplified 20,000 times (SynAmps 2 amplifier, Compumedics), sampled at 500 Hz, and low-pass filtered at 100 Hz online. This activity was recorded continuously until all stimuli in the two stimulus blocks had been presented.
Within each participant’s EEG, the onset of each sentence was marked with a “trigger”. Additional triggers were placed on the experimental sentences to mark (1) the onset of the second phrase in the sentence (see trigger A in Figure 1), and (2) the onset of the same word in the sentence version where there was no phrase boundary at that point (see trigger B in Figure 1).
Processing the EEG
The offline EEG analysis was performed using the EEGLAB (http://sccn.ucsd.edu/eeglab/; [33]) and ERPLAB (http://erpinfo.org/erplab/; [34]) toolboxes in MATLAB 2012a (MathWorks, Natick, MA, USA). Portions of the EEG that contained large muscle artefacts were removed from the analysis by visual inspection. The data were then re-referenced to the average of the left and right mastoids. The EEG was bandpass filtered using a noncausal Butterworth infinite impulse response (IIR) filter with half-power cut-offs at 0.1 and 30 Hz and a roll-off of 12 dB/octave. Ocular artefact correction was performed using independent component analysis (ICA) as implemented in EEGLAB (‘eeg_runica’ function). Independent components with known features of eye blinks (based on activity power spectrum, scalp topography, and activity over trials) were identified visually for each participant. The contributions of these components were then removed from the continuous EEG.
In line with previous studies [17],[18], the EEG data were processed in two ways. First, we divided the EEG into sections (epochs) that started 200 ms before the onset of the first word of the sentence and ended 4000 ms after that onset. Each epoch was baseline corrected from −200 to 0 ms. To remove additional artefacts, we used the moving window peak-to-peak procedure in ERPLAB, with a 200 ms moving window, a 100 ms window step, and a 100 μV voltage threshold. Epochs were averaged separately to produce an ERP for early phrase boundaries and an ERP for late phrase boundaries. Each participant had at least 80% accepted trials per condition (early phrase boundary M = 95.49%, SD = 4.67; late phrase boundary M = 94.88%, SD = 5.96). The percentage of accepted trials did not differ between experimental conditions (t(23) = 0.52, p = 0.61), indicating no systematic differences in signal-to-noise ratio across conditions. Individual ERP waveforms were averaged to obtain a grand-averaged ERP for each condition. These were our “whole sentence ERPs” for early and late phrase boundaries.
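The moving-window peak-to-peak criterion can be sketched in plain Python as follows. This illustrates the general technique with the parameters reported above (200 ms window, 100 ms step, 100 μV threshold, 500 Hz sampling). It is not ERPLAB's implementation, and the function names and the handling of the final partial window are assumptions.

```python
def exceeds_peak_to_peak(epoch_uv, srate_hz=500, win_ms=200, step_ms=100,
                         thresh_uv=100.0):
    """Return True if any window's max-min amplitude exceeds the threshold.

    epoch_uv is one channel of one epoch, in microvolts.
    """
    win = int(round(win_ms * srate_hz / 1000))    # window length in samples
    step = int(round(step_ms * srate_hz / 1000))  # step size in samples
    last_start = max(len(epoch_uv) - win, 0)
    # Slide the window across the epoch; only full windows are tested here.
    for start in range(0, last_start + 1, step):
        segment = epoch_uv[start:start + win]
        if max(segment) - min(segment) > thresh_uv:
            return True
    return False


def reject_epoch(channels_uv, **kwargs):
    """Flag the epoch if any channel violates the peak-to-peak criterion."""
    return any(exceeds_peak_to_peak(ch, **kwargs) for ch in channels_uv)
```

At 500 Hz, an epoch from −200 to 4000 ms holds 2100 samples, so each channel is checked in 100-sample windows advanced by 50 samples.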
A limitation of this approach is that if a positivity was seen at early or late phrase boundaries, it could reflect (1) a genuine CPS to the phrase boundary, (2) an enhanced N1-P2 response to the onset of the first word in the second phrase, which follows a silent pause, or (3) a combination of both. To disentangle the CPS and N1-P2 responses, a second analysis was performed in which we divided the EEG into epochs that started 500 ms before the onset of the second phrase and ended 1000 ms after it (“phrased ERP”; Trigger A). Similar time intervals were selected for the sentence portion without the phrase boundary (i.e., “unphrased ERP”; Trigger B). This time window allowed us to differentiate the ERP effects that started before the onset of the second phrase (−250 ms to 0 ms), where the acoustic cues to the phrase boundary are available (e.g., pre-boundary lengthening, pitch change), from the ERP effects that started after the onset of the second phrase (0 to 250 ms), where the N1-P2 response to the second phrase onset overlaps with the ERP response to the phrase boundary (if any).
Each epoch was baseline corrected from −200 to 0 ms relative to the onset of the sentence. We took the baseline before the onset of the sentence (detached baseline) because the more commonly used immediate baseline (i.e., immediately before the second phrase) is problematic especially for the late phrase boundary: Since the CPS for early phrase boundary happens in the immediate baseline period for the late phrase boundary, there are systematic differences in ERPs in the immediate baseline period. The detached baseline period also has an advantage in excluding the activity related to the last syllable of the pre-boundary word, which contains frequency, intensity and duration cues for phrase boundary.
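The difference between a detached and an immediate baseline can be made concrete with a minimal sketch. The function below cuts an epoch around a trigger but subtracts the mean of the −200 to 0 ms interval before sentence onset. The function name, the sample-index arguments, and the single-channel list representation are assumptions for illustration only.

```python
def detached_baseline_epoch(signal_uv, sentence_onset, trigger, srate_hz=500,
                            epoch_ms=(-500, 1000), baseline_ms=(-200, 0)):
    """Cut an epoch around `trigger`, baseline-corrected relative to sentence onset.

    signal_uv: continuous single-channel EEG in microvolts.
    sentence_onset, trigger: sample indices into signal_uv.
    """
    def to_samples(ms):
        return int(round(ms * srate_hz / 1000))

    # Baseline interval is anchored to the sentence onset, not to the trigger.
    b0 = sentence_onset + to_samples(baseline_ms[0])
    b1 = sentence_onset + to_samples(baseline_ms[1])
    baseline = sum(signal_uv[b0:b1]) / (b1 - b0)

    # Epoch interval is anchored to the trigger (second-phrase onset).
    e0 = trigger + to_samples(epoch_ms[0])
    e1 = trigger + to_samples(epoch_ms[1])
    return [v - baseline for v in signal_uv[e0:e1]]
```

With an immediate baseline, `b0` and `b1` would instead be computed from `trigger`, so any boundary-related activity just before the second phrase would be subtracted out of the epoch.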
Epochs with peak-to-peak amplitude exceeding 100 μV in a 200-ms moving window with a 100-ms window step were removed from the analysis, and the remaining epochs were averaged separately for early phrase boundary and late phrase boundary sentences. More than 80% of the trials were accepted for every participant (early phrase boundary: Trigger-A M = 97.05, SD = 3.52; Trigger-B M = 96.86, SD = 3.79; late phrase boundary: Trigger-A M = 97.31, SD = 5.37; Trigger-B M = 97.04, SD = 4.05). A 2 × 2 repeated measures analysis of variance (ANOVA) on the percentage of accepted trials with the factors boundary type (early, late) and trigger (A, B) showed no significant effect (all F < 1, all p > .05), indicating no systematic differences in signal-to-noise ratio across conditions. Individual ERP waveforms were averaged to obtain a grand-averaged ERP for each condition. These were our “second phrase ERPs”.
Measuring the ERPs
Whole sentence ERPs were measured by computing the mean amplitude in 200-ms time windows that started at the onset of the phrase boundary. Three consecutive time windows were analysed after each phrase boundary (i.e., 1225–1825 ms after sentence onset for the early phrase boundary, and 1986–2586 ms after sentence onset for the late phrase boundary). Note that from 1225 to 1825 ms, the early phrase boundary condition contained a phrase boundary while the late phrase boundary condition did not. Similarly, from 1986 to 2586 ms, the late phrase boundary condition contained a phrase boundary but the early phrase boundary condition did not. Since early and late phrase boundary sentences contained the same words, the comparison of ERPs in these time windows would reflect the effect of the phrase boundary (i.e., early and late phrase boundary ERPs versus unphrased ERPs).
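The window arithmetic and the mean-amplitude measure can be sketched as below. This assumes a single-channel ERP stored as a list sampled at 500 Hz, with the epoch starting 200 ms before sentence onset. The function names are hypothetical, and window edges that fall between samples are simply rounded.

```python
def boundary_windows(boundary_ms, n_windows=3, width_ms=200):
    """Consecutive analysis windows (start_ms, end_ms) beginning at the boundary."""
    return [(boundary_ms + i * width_ms, boundary_ms + (i + 1) * width_ms)
            for i in range(n_windows)]


def mean_amplitude(erp_uv, window_ms, epoch_start_ms=-200, srate_hz=500):
    """Mean of the ERP samples inside window_ms (edges rounded to whole samples)."""
    start = round((window_ms[0] - epoch_start_ms) * srate_hz / 1000)
    end = round((window_ms[1] - epoch_start_ms) * srate_hz / 1000)
    segment = erp_uv[start:end]
    return sum(segment) / len(segment)


# Mean boundary times reported for the selected stimuli.
early_windows = boundary_windows(1225)  # spans 1225-1825 ms
late_windows = boundary_windows(1986)   # spans 1986-2586 ms
```

Applying `mean_amplitude` to both conditions in each of the six windows gives the phrased versus unphrased values that enter the ANOVAs.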
Second phrase ERPs were measured by computing amplitudes for the time window −250 to 0 ms (preceding the second phrase onset) and the time window 0 to 250 ms (following the second phrase onset). A positive response in both windows would suggest that the CPS at the phrase boundary started before the second phrase onset, and later merged with the N1-P2 response to that phrase onset.
Analysing the ERPs
Separate analyses were done for the whole sentence ERPs and second phrase ERPs. Separate analyses were also done for midline and lateral electrode sites. Most of the data sets (around 85%) were normally distributed and met the assumption of homogeneity of variance. Hence parametric statistics were used to analyse the data. For midline electrodes (Fz, Cz and Pz), a three-way repeated measures ANOVA was performed using the factors boundary type (early, late), condition (phrased, unphrased) and electrode (Fz, Cz, Pz). For lateral sites, electrodes were grouped into four regions of interest (ROIs): right anterior (F4, F8, FC4, FT8), right posterior (P4, P8, CP4, TP8), left anterior (F3, F7, FC3, FT7) and left posterior (P3, P7, CP3, TP7). A four-way repeated measures ANOVA was performed with the factors boundary type (early, late), condition (phrased, unphrased), hemisphere (right, left) and location (anterior, posterior). These ANOVAs were performed separately for each time interval.
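One way to prepare the lateral-site data for such an ANOVA is sketched below. The electrode groupings are those listed above. Averaging the electrodes within each ROI is an assumption (the text states only that electrodes were grouped), and `roi_means` is a hypothetical helper.

```python
# Electrode groupings as listed in the text.
ROIS = {
    "right anterior": ["F4", "F8", "FC4", "FT8"],
    "right posterior": ["P4", "P8", "CP4", "TP8"],
    "left anterior": ["F3", "F7", "FC3", "FT7"],
    "left posterior": ["P3", "P7", "CP3", "TP7"],
}


def roi_means(amplitudes_uv):
    """Average mean amplitudes (channel -> microvolts) within each ROI.

    Assumption: the ROI value is the unweighted mean of its four electrodes.
    """
    return {
        roi: sum(amplitudes_uv[ch] for ch in channels) / len(channels)
        for roi, channels in ROIS.items()
    }
```

Each ROI then maps onto one cell of the hemisphere (right, left) by location (anterior, posterior) design.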
If a significant interaction was found between condition and any other factor, post-hoc one-way ANOVAs were computed to understand the effect of electrode, hemisphere or location for each condition separately. Where the numerator had more than one degree of freedom (df), the Greenhouse-Geisser (G-G) correction was applied to account for potential violations of sphericity. An alpha level of .05 was set as the criterion for statistical significance. Partial η² was computed as a measure of effect size.
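For reference, the standard definition of partial η² is given below in terms of the sums of squares for the effect and its error term, together with the equivalent form based on the F ratio and its degrees of freedom. The symbols are conventional notation and do not appear in the original text.

```latex
\eta_p^2
  = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
  = \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}}
```

The second form follows from $F = (SS_{\text{effect}}/df_{\text{effect}}) / (SS_{\text{error}}/df_{\text{error}})$, and it allows the effect size to be recovered from a reported F value and its degrees of freedom.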