Parametric study of EEG sensitivity to phase noise during face processing
© Rousselet et al. 2008
Received: 24 June 2008
Accepted: 03 October 2008
Published: 03 October 2008
The present paper examines the visual processing speed of complex objects, here faces, by mapping the relationship between object physical properties and single-trial brain responses. Measuring visual processing speed is challenging because uncontrolled physical differences that co-vary with object categories might affect brain measurements, thus biasing our speed estimates. Recently, we demonstrated that early event-related potential (ERP) differences between faces and objects are preserved even when images differ only in phase information, and amplitude spectra are equated across image categories. Here, we use a parametric design to study how early ERPs to faces are shaped by phase information. Subjects performed a two-alternative forced-choice discrimination between two faces (Experiment 1) or two textures (two control experiments). All stimuli had the same amplitude spectrum and were presented at 11 phase noise levels, varying from 0% to 100% in 10% increments, using a linear phase interpolation technique. Single-trial ERP data from each subject were analysed using a multiple linear regression model.
Our results show that sensitivity to phase noise in faces emerges progressively in a short time window between the P1 and the N170 ERP visual components. The sensitivity to phase noise starts at about 120–130 ms after stimulus onset and continues for another 25–40 ms. This result was robust both within and across subjects. A control experiment using pink noise textures, which had the same second-order statistics as the faces used in Experiment 1, demonstrated that the sensitivity to phase noise observed for faces cannot be explained by the presence of global image structure alone. A second control experiment used wavelet textures that were matched to the face stimuli in terms of second- and higher-order image statistics. Results from this experiment suggest that higher-order statistics of faces are necessary but not sufficient to obtain the sensitivity to phase noise function observed in response to faces.
Our results constitute the first quantitative assessment of the time course of phase information processing by the human visual brain. We interpret our results in a framework that focuses on image statistics and single-trial analyses.
In primates, visual object processing unfolds from the retina to higher-order cortical areas through a hierarchy of processing steps. Although, at the neuronal level, lateral and feedback connections are integrated into the feedforward sweep of information [1, 2], at the functional level, neuronal mechanisms can still be conceptualized as performing rapid transformations of the input retinal activation to achieve increasingly refined representations. A fundamental question in vision science is thus how to uncover the mechanisms by which the pattern of retinal activation is progressively transformed into a code that is useful for making behavioural decisions. In recent years there has been an on-going debate as to which stimuli are best for probing visual neuronal mechanisms. This debate stems mostly from the study of neurons in V1, the primary visual cortex, and whether their visual response properties are better understood by using simple, well-controlled stimuli, or natural scene stimuli, the type of stimuli the visual system might have evolved to apprehend best. At the other end of the visual cortical hierarchy, in higher-order visual areas, no such debate exists, since those areas are mostly responsive to complex objects and not to simple patterns [6–10]. In those areas, the emphasis has been put on object categories and their relative specificity. Although interesting in itself, the category-related parcelling of the visual cortex ignores the question of the transformation mechanisms taking place along the visual hierarchy. Because these processes occur very fast, critical information processing events may be observed at the time-scale of EEG (electroencephalography) [3, 12–14]. In humans, EEG (as well as MEG more recently) has revealed a cascade of neuronal activations following stimulus presentation. Within 200 ms, neuronal activity has been reported that dissociates among various object categories, in particular faces and words [16–19].
In particular, the larger ERP component to faces and words, compared to other control categories, peaking at about 170 ms, the N170 [20–23], has been the subject of much debate about its categorical sensitivity. Early activity, in the time window of the P1 component (80–120 ms), has also been discussed as a potential marker of complex object processing [25–27].
On-going controversies about the time-course of object processing are due, in part, to the difficulty associated with controlling the effects of low-level sensory variables on higher-order perceptual operations. In classic categorical designs that are used to assess object-processing speed, uncontrolled physical properties tend to co-vary with the object categories that are contrasted. Such physical properties might introduce biases in our brain measurements that are unrelated to the higher-level object processing that is meant to be measured, but instead reflect the extraction of visual information by lower levels of the visual hierarchy [25, 28].
Recent advances have revealed that the activity in the N170 time window, but not earlier activity, is related to the extraction of task-related information (EEG: [14, 29–31]; MEG: [32, 33]). These advances were made possible by a tight control of the stimulus space that relied on parametric, rather than categorical, designs. Parametric designs are well suited to explore brain dynamics in a systematic fashion because, by varying one or several parameters along a continuum, they can provide a genuine information space and stronger constraints on statistical analyses [14, 34, 35].
Skewness (γ1) and kurtosis (γ2) are, respectively, the third and fourth standardized moments of a distribution: γ1 = (1/N) Σi [(xi − x̄)/σ]^3 and γ2 = (1/N) Σi [(xi − x̄)/σ]^4, where N is the sample size of the distribution, xi the value of the i-th member of the distribution, x̄ the mean of the xi values, and σ^2 the variance (for a good visualisation of skewness and kurtosis in the context of 1/f wavelet textures, see Figure 1 from ). Kurtosis, in particular, seems to be related to the presence of edges and local contours in natural images, and might thus be a better indicator of image structure than global phase coherence per se [46, 50–52]. Therefore, we included skewness and kurtosis as predictors in our linear regression model. We report data showing, in response to face stimuli, sensitivity to phase noise that emerged very rapidly at the transition between the P1 and the N170 components, in the 120–150 ms post-stimulus time window.
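The skewness and kurtosis statistics just described can be computed directly from pixel values. A minimal sketch in Python (the function names are ours; NumPy is used purely for illustration):

```python
import numpy as np

def skewness(x):
    """Third standardized moment: mean of ((x - mean) / std) ** 3."""
    z = (np.ravel(x) - np.mean(x)) / np.std(x)
    return np.mean(z ** 3)

def kurtosis(x):
    """Fourth standardized moment (non-excess): Gaussian data gives ~3."""
    z = (np.ravel(x) - np.mean(x)) / np.std(x)
    return np.mean(z ** 4)

# Gaussian pixel noise has skewness near 0 and kurtosis near 3;
# sparse, edge-rich images push kurtosis well above 3.
rng = np.random.default_rng(0)
noise = rng.standard_normal((256, 256))
print(skewness(noise), kurtosis(noise))
```

Note that this is the non-excess definition of kurtosis, consistent with the pink noise textures discussed later centring on 3 rather than 0.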
A total of 10 subjects participated in one main experiment and two control experiments. Experiment 1 included eight subjects (five males and three females). Four were tested twice (test-retest, on two different days), leading to a total of 12 experimental sessions. Only half of the subjects were tested twice because of the robust replication of the effects we obtained with the first four subjects (see Results). Subjects' mean age was 24 years (min = 21, max = 28, SD = 2.5); seven were right handed. Experiment 2 included four male subjects, and all of them were tested twice (mean age 27, min = 24, max = 29, SD = 2.4; three right handed). Four subjects (one female) participated in Experiment 3, one of whom was tested twice (mean age 25, min = 22, max = 29, SD = 3; three were right handed). All subjects gave written informed consent and had normal or corrected-to-normal vision. Five subjects participated in one experiment only, four participated in two experiments, and only one subject participated in three experiments. This last subject, RXL, was singled out for this reason in the results. Among the 10 different individuals, 4 received $10/hour for their participation; the others were members of the laboratory and were not compensated for participation. The McMaster University Research Ethics Board approved the research protocol.
One pair of female faces and one pair of male faces were selected from a set of 10 faces used in previous experiments [36, 53, 54]. Each subject saw only two faces, from the same gender, and male and female faces were counterbalanced across subjects. These faces were front-view greyscale photographs cropped within a common oval frame and pasted on a uniform 10° × 10° background (Figure 1). Face stimuli all had the same mean amplitude spectrum and thus differed only in terms of phase information, which carries most of the form information [55, 56]. We created four noise textures by randomizing the phase of the four faces. Thus, these patterns, which we refer to as pink noise textures, had the same amplitude spectrum as the faces, but differed from faces in terms of higher-order statistics. We also created four wavelet textures, each matched for the global image statistics of one of the faces. These textures not only matched the skewness and kurtosis of the original face stimuli, they also matched properties such as local and long-distance multiscale phase correlations. The textures were created with the Matlab toolbox provided by Portilla and Simoncelli (http://www.cns.nyu.edu/~lcv/texture/), with the parameters set to four scales, four orientations, a 9 × 9 spatial neighbourhood, and 50 iterations.
This technique takes into account the directional nature of phase, ensuring that phases are uniformly distributed after transformation. In comparison, a strict linear blend would lead to an over-representation of phases around 0°. Thus, WMP has the advantage over a linear blend technique of producing monotonic changes in third-order (skewness) and fourth-order (kurtosis) image statistics, as illustrated in the bottom part of Figure 1. Kurtosis is often used as a measure of image sparseness and is highly correlated with the representation of phase structure, high levels of kurtosis corresponding to local phase-congruent structures such as edges.
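The circular blend underlying this kind of phase interpolation can be sketched as follows. This is an illustrative reconstruction under our own assumptions (function name, normalization step), not the authors' actual stimulus-generation code: phases are combined as weighted unit vectors, so the blend respects the directional nature of phase instead of averaging angles linearly.

```python
import numpy as np

def phase_interpolate(image, coherence, rng):
    """Blend an image's phase spectrum with random phase at a given
    coherence level (1.0 = intact, 0.0 = fully scrambled), keeping the
    amplitude spectrum fixed.  Phases are combined as unit vectors so
    the blend respects their circular nature, avoiding the
    over-representation of phases near 0 that a linear blend produces."""
    F = np.fft.fft2(image)
    amp, phi = np.abs(F), np.angle(F)
    noise_phi = rng.uniform(-np.pi, np.pi, size=phi.shape)
    # Weighted circular mean: sum the weighted unit vectors, take the angle.
    blend = coherence * np.exp(1j * phi) + (1 - coherence) * np.exp(1j * noise_phi)
    out = np.fft.ifft2(amp * np.exp(1j * np.angle(blend))).real
    # Keep mean luminance at 0 and RMS contrast constant across levels.
    out -= out.mean()
    out *= image.std() / out.std()
    return out

rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))
scrambled = phase_interpolate(img, 0.5, rng)   # 50% phase coherence
```

Taking the real part after the inverse FFT is a simplification: the randomized phase spectrum is not exactly Hermitian-symmetric, so the amplitude spectrum is only approximately preserved in this sketch.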
For all stimuli, pixel contrasts ranged between -1 and 1, with a mean of 0. RMS contrast was kept constant across all levels of phase coherence.
EEG data were acquired with a 256-channel Geodesic Sensor Net (Electrical Geodesics Inc., Eugene, Oregon). The analog signal was digitized at 500 Hz and band-pass filtered between 0.1 Hz and 200 Hz. Impedances were kept below 50 kΩ. Subjects were asked to minimize blinking, head movements, and swallowing, and were then given a description of the task. EEG data were referenced on-line to electrode Cz and re-referenced off-line to an average reference. The signal was then low-pass filtered at 30 Hz and bad channels were removed, with no interpolation. The 30 Hz low-pass filter was justified by a previous study in which we showed that the differential activity evoked by faces and objects is contained mostly in a narrow 5–15 Hz band. Baseline correction was performed using the 300 ms of pre-stimulus activity, and data were epoched in the range -300 ms to 400 ms. Trials with abnormal activity were excluded based on the detection of extreme values, abnormal trends, and abnormal distributions, using EEGLAB functions [59, 60]. The threshold for extreme values was ± 100 μV for all channels. An epoch was rejected for abnormal trend if it had a slope larger than 75 μV/epoch and a regression R2 larger than 0.3. An epoch was rejected for abnormal distribution when its kurtosis fell outside five standard deviations of the kurtosis distribution for each single electrode or across all electrodes. All remaining trials were included in the analyses, whether they were associated with correct or incorrect behavioural responses.
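The three rejection criteria can be illustrated with a simplified sketch: a plain NumPy re-implementation of thresholds like those above, not EEGLAB's actual routines. The function name and the single-channel array layout are our own assumptions.

```python
import numpy as np

def reject_epochs(epochs, amp_thresh=100.0, slope_thresh=75.0,
                  slope_r2=0.3, kurt_sd=5.0):
    """Flag bad epochs for one channel using three criteria.

    epochs: array (n_trials, n_samples), amplitudes in microvolts.
    Returns a boolean mask of trials to KEEP.
    """
    n_trials, n_samples = epochs.shape
    t = np.arange(n_samples)

    # 1. Extreme values: any sample beyond +/- amp_thresh microvolts.
    extreme = np.abs(epochs).max(axis=1) > amp_thresh

    # 2. Abnormal trend: a strong, well-fit linear drift across the epoch.
    slopes = np.empty(n_trials)
    r2 = np.empty(n_trials)
    for i, ep in enumerate(epochs):
        b, a = np.polyfit(t, ep, 1)          # slope per sample, intercept
        fit = b * t + a
        ss_res = np.sum((ep - fit) ** 2)
        ss_tot = np.sum((ep - ep.mean()) ** 2)
        slopes[i] = b * n_samples            # drift expressed per epoch
        r2[i] = 1.0 - ss_res / ss_tot
    trend = (np.abs(slopes) > slope_thresh) & (r2 > slope_r2)

    # 3. Abnormal distribution: kurtosis far outside the channel's norm.
    z = (epochs - epochs.mean(axis=1, keepdims=True)) / epochs.std(axis=1, keepdims=True)
    kurt = np.mean(z ** 4, axis=1)
    kurt_out = np.abs(kurt - kurt.mean()) > kurt_sd * kurt.std()

    return ~(extreme | trend | kurt_out)
```

A spike above 100 μV trips the first criterion, a steady drift of more than 75 μV over the epoch (with regression R2 above 0.3) trips the second, and a heavily outlying kurtosis trips the third.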
Using a multiple linear regression approach, the single-trial EEG amplitude in μV was expressed using one model for all three experiments:

EEG = β1 + β2S + β3φ + β4γ2 + β5γ1 + β6φγ2 + β7φγ1 + ε
The fit was performed at each electrode and each time point independently using the glmfit Matlab function, with a normal distribution. Phase (φ), skewness (γ1) and kurtosis (γ2) were coded as continuous regressors, while the regressor for stimulus identity (e.g., Face A vs. Face B) (S) was a categorical factor. Regression coefficients (β) are expressed in an arbitrary unit that reflects the strength of the fit (i.e., the influence of the factor on the EEG signal). The terms φγ2 and φγ1 correspond, respectively, to the phase-kurtosis and phase-skewness interactions. The error term is ε.
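Outside Matlab, the same model can be fit with ordinary least squares, which is what glmfit reduces to for a normal distribution with an identity link. A sketch in Python/NumPy (the function name and array layout are our assumptions):

```python
import numpy as np

def fit_single_trial_model(eeg, S, phase, skew, kurt):
    """Fit the seven-parameter model EEG = b1 + b2*S + b3*phi + b4*g2
    + b5*g1 + b6*phi*g2 + b7*phi*g1 + e at every electrode/time point.

    eeg:   array (n_trials, n_electrodes, n_times), single-trial amplitudes
    S:     (n_trials,) stimulus identity, coded 0/1
    phase, skew, kurt: (n_trials,) continuous regressors
    Returns betas (7, n_electrodes, n_times) and R2 (n_electrodes, n_times).
    """
    n_trials = eeg.shape[0]
    X = np.column_stack([np.ones(n_trials), S, phase, kurt, skew,
                         phase * kurt, phase * skew])
    Y = eeg.reshape(n_trials, -1)            # flatten electrodes x times
    betas, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ betas
    ss_res = np.sum(resid ** 2, axis=0)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2, axis=0)
    r2 = 1.0 - ss_res / ss_tot
    return betas.reshape(7, *eeg.shape[1:]), r2.reshape(eeg.shape[1:])
```

Fitting all electrode/time points as columns of one least-squares problem gives the same coefficients as looping over glmfit calls, since the design matrix is shared across all of them.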
For each subject, we report the electrode at which the model provided the best fit (i.e., where R2 was largest). In general, R2 was largest in a cluster of posterior electrodes that also exhibited large N170 responses to faces. In some cases, the largest R2 was obtained at an electrode that did not produce the strongest N170 to faces. For each of these cases, however, the time-course and relative strength of the best-fitting regression parameters were virtually identical at both sites (i.e., the one producing the largest N170 to faces and the other that produced the largest R2). In addition to the multiple intra-subject analyses, we evaluated the influence of each regressor on the EEG across subjects using semi-partial correlation coefficients.
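Semi-partial correlations quantify each regressor's unique contribution: the predictor is residualized on the remaining predictors, and that residual is correlated with the EEG amplitude. A small sketch under the standard definition (the paper does not spell out its exact computation; the function name is ours):

```python
import numpy as np

def semi_partial_r(y, X, j):
    """Semi-partial correlation of predictor column j with y.

    Regress X[:, j] on the remaining columns (plus an intercept),
    then correlate y with the residual.  Squaring the result gives
    the variance in y uniquely attributable to predictor j.
    """
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, _, _, _ = np.linalg.lstsq(A, X[:, j], rcond=None)
    xj_res = X[:, j] - A @ coef
    return np.corrcoef(y, xj_res)[0, 1]
```

With uncorrelated predictors the semi-partial correlation reduces to the ordinary zero-order correlation; it only diverges from it when predictors share variance.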
Model fit results and 95% confidence intervals in the three experiments (latencies in ms; brackets give 95% confidence intervals)

                                 Experiment 1: Faces   Experiment 2: Pink Noise Textures   Experiment 3: Wavelet Textures
Maximum R2 (%)                   36.4 [29.2 44.6]      5.8 [3.6 8.3]                       21.7 [12.4 30.4]
Peak Latency of R2               145 [142 148]         250 [228 272]                       229 [180.8 296.6]
Phase
  Onset Latency                  127 [121 134]         —                                   218 [161 263]
  Peak Latency                   160 [154 168]         —                                   228 [175 269]
  Onset-Peak Latency Difference  33 [26 42]            —                                   10 [6 14]
Kurtosis
  Onset Latency                  127 [122 132]         —                                   175 [163 199]
  Peak Latency                   143 [137 148]         —                                   204 [187 219]
  Onset-Peak Latency Difference  16 [13 19]            —                                   29 [20 42]
Phase × Kurtosis
  Onset Latency                  150 [133 171]         —                                   —
  Peak Latency                   166 [151 185]         —                                   —
  Onset-Peak Latency Difference  16 [12 20]            —                                   —
Across the four subjects, the mean value of the maximum R2 was 36.2% (median = 35.7%; minimum = 19.8%, maximum = 67.3%).
Our previous work showed that early evoked brain activity to faces and objects, starting at about 130–150 ms after stimulus onset, does not reflect differences in stimulus amplitude spectra, but rather is mainly driven by spatial phase information [28, 36, 37]. The current results reveal for the first time the time course of the brain's sensitivity to global visual phase information. In other words, we determined how phase information shapes early brain responses to complex objects like faces. Our results show that the visual system's sensitivity to global phase information emerges progressively in a short time window between the P1 and the N170 components. This sensitivity to phase noise starts at about 120–130 ms after stimulus onset and continues for another 25–40 ms, as indicated by the time course of the model R2 and the regression fit (Figure 4). During this period, single-trial activity is sensitive not only to phase information, but also to kurtosis, and the two global image descriptors interact significantly with each other (Figures 4 and 5).
Our linear phase manipulation introduced non-linear monotonic changes in higher-order image statistics, as revealed by kurtosis measurements. Kurtosis is a good measure of image sparseness [50, 52, 61]. The monotonic and non-linear increase in kurtosis, from 0% to 100% phase coherence, thus most probably corresponds to the build-up of local elements like edges and contours, which in turn are formed by local phase alignment across different spatial frequency bands [38, 40, 50]. The time course of our model fit might thus reveal the extraction of global image structure, and not only sensitivity to phase information.
We note that our kurtosis measurements were made directly on the pixel contrasts. Thomson warned against measuring kurtosis from non-whitened images because coloured noise (in our case, the pink noise corresponding to the 0% phase coherence condition) contains pixel-wise correlations that might inflate kurtosis artificially, rendering it uninformative as a measure of sparseness in images. However, we were interested in the relative differences in kurtosis across image types, and, more importantly, our pink noise textures were appropriately centred around 3, the value expected from a white noise distribution uncontaminated by 1/f amplitude spectrum information.
Two control experiments (Experiments 2 and 3) showed that higher-order statistics of faces are necessary but not sufficient to obtain the sensitivity to phase noise observed in response to faces (Experiment 1). First, although in Experiment 2 subjects could discriminate two pink noise textures presented under the same conditions as the faces in Experiment 1, the regression model failed to fit the data. Second, wavelet textures matched for both global and some of the local face image statistics did trigger a face-like EEG pattern, but in this case the model provided a poorer overall fit to the data, a delayed timing in the fit, and no phase × kurtosis interaction. This result points to particular local phase arrangements as being responsible for the model fit observed for faces. This is not a trivial point, because it would be conceivable to observe a time-course of phase noise sensitivity for control textures similar to the one observed for faces. This possibility stems from the fact that the linear regression fit used to measure sensitivity to phase noise is independent of the global shape of the ERP, i.e., its mean. Our second control experiment also raises the question of how far we can go in matching image statistics without simply reproducing the stimulus. From our results, it remains unclear precisely how the EEG responses to faces and matched wavelet textures compare within subjects, and this will require further investigation using a paradigm in which the two types of stimuli, as well as control object categories, are tested in the same recording session.
Parametric stimulus phase manipulations and, more generally, parametric noise manipulations have been used in the literature to investigate the spatial and temporal hierarchical encoding of visual information. In the spatial domain, fMRI has led to the discovery of various noise tuning functions in different brain areas of human and non-human primates. The general conclusion from those studies is that sensitivity to noise increases along the ventral pathway, from V1, where, for instance, the signal evoked by natural images does not differ from the one evoked by pink noise [9, 62] (though not all studies agree), to higher-level object processing areas, where noise sensitivity tends to be strongest [7, 9, 63, 64]. The strong noise sensitivity at higher levels of the visual hierarchy observed in fMRI, together with EEG and MEG source analyses of evoked activity in the time range of the N/M170 [16, 65, 66], suggests that the sensitivity to phase noise we recorded in response to faces corresponds to the activity of face or object processing areas integrating information about the global structure of the stimulus. Future studies should investigate the cortical network involved in the effects reported here, as well as the nature of these effects, whether essentially feedforward or reflecting the integration of information from other structures [67, 68].
In the temporal domain, EEG and MEG studies have reported results compatible with our findings. For instance, additive noise has been used to dissociate the stimulus sensitivity of early evoked responses. In EEG, it has been demonstrated that there is an inverse linear relationship between the amount of white noise added to face stimuli and the N170 amplitude. In contrast, the earlier P1 component is not affected by this noise manipulation. In MEG, another type of dissociation has been reported between the M1 and the M170, which are to some extent the magnetic counterparts of the P1 and N170 ERP components. In a parametric design in which faces were masked by narrow band-pass filtered noise, Tanskanen et al. found that the M170 amplitude was modulated by noise spatial frequencies in a manner very similar to recognition performance. When noise patterns were presented in the absence of face stimuli, the spatial frequency noise sensitivity of the M170 disappeared, whereas the earlier M1 component showed frequency tuning similar to the one triggered by face + noise stimuli. It thus seems clear from these two studies that the N170 reflects, at least in part, the activity of a mechanism that begins to respond during the N170 time window and that was not active during earlier time frames.
In our study, we made the explicit assumption that this mechanism might be related to global phase processing. We also provide for the first time a detailed timing of sensitivity to phase noise, using a single-trial model incorporating only parameters related to the global image statistics. Our approach did not depend on the identification of traditional ERP components, and therefore allowed us to track sensitivity to phase noise without limiting our analyses to ERP peaks. This aspect of the analyses is important, because we found that information extraction starts at the transition between the P1 and the N170. This result is in keeping with a series of recent studies, relying on component-free single-trial models, showing onsets of task related information accrual just before the N170, but after the P1 [14, 30, 31, 69].
Our approach was not to map the relationship between behaviour and EEG activity, but rather to focus on global image properties and how they shape early EEG activity. This is why we kept the task simple and constant. This approach is legitimate because early brain activity evoked by some categories of complex objects, like faces and words, is hardly modulated by task factors ([27, 28], but see some recent advances in [14, 69]), and because there is still much to learn about the relationship between image statistics and brain activity. Furthermore, our approach provides a potentially fruitful departure from frameworks that make assumptions about stimulus space and the nature of relevant 'features'. Like natural scenes, faces can receive a statistical description rather than a category label. For instance, contrasting faces and houses can tell us a great deal about when and where in the brain these two kinds of stimuli are differentially represented. However, other properties co-vary with these semantic categories, and it remains unclear precisely what type of information is extracted when a categorical difference is observed. Using parametric designs can circumvent this limitation. In that framework, phase manipulations constitute one way to explore face and object processing by creating an information space. This approach can be understood as an extension of classic categorical designs, in which regular stimuli are contrasted with noise (i.e., the two extreme points of the continuum). Finally, our approach can be applied to a large range of problems that researchers in behavioural neuroscience usually address using categorical designs.
NSERC Discovery Grants 42133 and 105494, and Canada Research Chairs supported PJB and ABS. A CIHR fellowship grant and a British Academy grant supported GAR. We thank Jesse Husk, Sandra Thomson, Melissa Sergi, and Richard Louka for their help collecting data.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.