 Methodology article
 Open Access
 Published:
Multivariate assessment of eventrelated potentials with the tCWT method
BMC Neurosciencevolume 16, Article number: 73 (2015)
Abstract
Background
Eventrelated brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain–computer interface applications are a notable exception to this practice, because they are based on multivariate classification of singletrial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is tCWT, a mathematicalstatistical algorithm based on the continuous wavelet transform (CWT) and Student’s ttest.
Results
This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the tCWT method in particular. Further, it presents for the first time a detailed, stepbystep, formal mathematical description of the tCWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important preprocessing step. The MATLAB and GNU Octave implementation of tCWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the tCWT method in particular are suggested and discussed.
Conclusions
Hopefully, the publication of both the tCWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make tCWT more accessible to both users and developers in the field of neuroscience research.
Background
An eventrelated brain potential (ERP) is extracted from an electroencephalogram (EEG) [1]. Since the EEG is a stochastic process, the ERP is a multivariate statistical object, as well. It is a set of random curves (one curve per EEG channel), and a random curve cannot be simply represented by a single univariate feature (or a curve parameter) without loosing a lot of potentially useful information. Nevertheless, ERPs are usually represented by univariate “components”, which most often are prominent peaks of the curves. An ERP component is usually assessed by its peak value or by the area under the curve in a narrow time window around the peak. This method has the huge advantage that it is both simple and visually and intuitively clear. For instance, peak latency can be interpreted as brain processing time related to an ERPtriggering event. More sophisticated methods for ERP component extraction from single trials exist as well [2]. The statistical assessment of ERP components is usually based on analysis of variance (ANOVA) [3]. Such assessment is sufficient for many applications. Some inherent limitations of this approach have been addressed by “mass univariate analysis” methods based on permutation/randomization tests [4, 5].
There are, however, cases in which univariate assessment is not sufficient. For instance, ERPbased brain–computer interfaces (BCI) require fullfledged multivariate assessment of the ERP curves [6]. The tCWT [7], a feature extraction method based on the continuous wavelet transform (CWT) [8, 9] and Student’s ttest, was first introduced as one possible multivariate solution to the problem of singletrial ERP classification at the International BCI Competition 2003, where it was compared to a variety of other feature extraction and classification methods ranging from simple and weak to advanced and powerful ones like, e.g., Support Vector Machines (SVM) [6, 10]. tCWT was a winner on two of the six ERP datasets provided for the competition [6]; these two were obtained in two different BCI paradigms, “P300 Speller” and “Self regulation of Slow Cortical Potentials (Biofeedback)” [7].
Apart from BCI applications, it has been shown how tCWT can be used for detection and quantification of ERPs in other paradigms as well, e.g., for individual (clinical) diagnostics [11, 12]. In the latter application, tCWT was combined with a multivariate randomization test based on Hotelling’s T\({}^2\)statistic. Similarly to the univariate randomization tests mentioned above [4], which provide an effective correction for multiple univariate comparisons, the multivariate randomization test [11] corrects the accumulation of chance bias arising from performing both tCWT feature extraction and hypothesis testing on the same ERP dataset.
A comparison of tCWT to classical univariate ERP assessment methods based on peak picking and area computation showed a clear advantage of multivariate tCWT features over univariate measures [11]. More recently, tCWT has been systematically evaluated in comparison to several other ERP component detection methods ranging from simple univariate peak picking to \(t_\mathrm {max}\) randomization tests [4] performed on band pass filtered EEG and on tCWT features [13]. This evaluation was performed with both simulated and real ERP data at different levels of signaltonoise ratio. Sensitivity, specificity, positive and negative predictive values were assessed, and, as a result, tCWT showed superior to all other methods, especially at low signaltonoise ratios. It should be noted, however, that, in that study, tCWT was used only as a feature extraction method and only mass univariate analysis (\(t_\mathrm {max}\)), but no multivariate statistical assessment of the obtained features was done. In another recent study, tCWT feature extraction was compared to two other waveletbased ERP component detection methods using the spikelet technique and wavelet asymmetry, respectively, with the result that both tCWT and the novel wavelet asymmetry method showed a marked advantage over the spikelet technique in terms of detection accuracy: 83 % and 91 % vs. 43 %, correspondingly [14]. In the latter study, however, only a single ERP component of interest, the N160, was detected without any multivariate or mass univariate analyses of the whole ERP curves.
Although tCWT features may well be interpreted as ERP components [7, 11, 14], the current article is mainly focused on tCWT applications in which, as in the case of BCI, the whole (multivariate) difference between two ERP curves is of primary importance, while differences in single components are only of secondary interest.
The goals of the this article are: (a) to provide a didactic geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the tCWT method in particular; (b) to present for the first time a detailed, stepbystep, formal description of the tCWT algorithm and to make its MATLAB and GNU Octave [15] implementation publicly available as free and open source code [16, 17] (Additional file 1) released under the GNU General Public License, version 3 (GPLv3, full text available in the tCWT documentation [17]); (c) to demonstrate the tCWT method in the assessment of example ERP data [18] obtained in a passive oddball paradigm [19, 20]; (d) to suggest and discuss conceptually novel applications of the multivariate approach in general and of the tCWT method in particular for the purpose of hypothesis testing rather than for BCI and singletrial classification.
An important new preprocessing step in the revised version of the tCWT algorithm presented here is the novel multivariate outlier rejection procedure based on principal component analysis (PCA, see below). Other changes in the algorithm were aimed at simplification for the sake of easier understanding. For instance, the timedependent filtering module [7] was removed from the current version and the discrete wavelet transform (DWT) [8, 21] was replaced by a discrete Fourier transform (DFT) [22]. Visualization was simplified as well by plotting the onedimensional linear discriminant function (LDF, see below) instead of the twodimensional tvalue scalogram [7].
Multivariate statistics: a geometric primer
Multivariate statistical analysis [23] cannot be substituted by a multitude of univariate analyses, because the latter cannot address adequately the covariance of the data. Even if the effect of the correlations between the variables on multiple univariate tests is taken into account by a proper correction of the overall \(\alpha \)level, e.g., by a permutation test [4], the univariate approach still misses a lot of information contained in the covariance matrix [23, p. 8].
In this section, some basic concepts of multivariate analysis are represented as geometric notions. This approach is meant as an aid to better understanding of these concepts through there association with familiar images from Euclidean space.
ERPs as points in a vector space
Consider a sample of N ERP trials obtained from K EEG channels, in the time interval (epoch) \(0\le t<T\), where t is the time relative to a triggering event. For each channel \(k=1,2,\ldots K\), and each single trial \(n=1,2,\ldots N\), the ERP voltage curve \(v_n^k(t)\) can be represented by a row vector \(\mathbf {v}^k_n=\left( v_n^{k1},v_n^{k2},\ldots v_n^{kL}\right) \) with components \(v_n^{kl} = v_n^k(t^l)\), where \(t^l=(l1)T/L\), \(l=1,2,\ldots L\) are the equidistant sampling time points whose number L is obtained from the original sampling frequency \(R_0\):
The entire Kchannel ERP trial can be represented by the vector
Thus, the ERP is represented by a random vector \(\mathbf {v}\) in a \(K\times L\)dimensional space, and the ERP sample is represented by the Nby\(K\times L\) matrix
Note that this notation is different from the standard notation in which vectors are represented by columns [23, p. 7]. Here, vectors are represented by rows, as in the MATLAB and GNU Octave [15] implementation of tCWT [16, 17].
Mahalanobis distance and Hotelling’s test
Now, consider a random vector \(\mathbf {v}\) and the corresponding singletrial ERP sample \(\mathbf {V}\) which, however, in this case, comprises two subsamples \(\mathbf {V}_{\!\!_\mathcal {A}}\) and \(\mathbf {V}_{\!_\mathcal {B}}\) obtained under two different experimental conditions \(\mathcal {A}\) and \(\mathcal {B}\) (\(\mathcal {A}\)\(\mathcal {B}\) design). The two corresponding ERPs are then represented by \(\mathbf {V}_{\!\!_\mathcal {A}}\) and \(\mathbf {V}_{\!_\mathcal {B}}\) and by the respective random vectors \(\mathbf {v}_{\!\!_\mathcal {A}}\) and \(\mathbf {v}_{\!_\mathcal {B}}\). Assuming multivariate normal distributions [23, pp. 37–59] with equal covariance matrices, we want to compare the two ERPs by testing the hypothesis \(H_0: \mathbf {\overline{v}}_{\!\!_\mathcal {A}}= \mathbf {\overline{v}}_{\!_\mathcal {B}}\), where \(\mathbf {\overline{v}}_{\!\!_\mathcal {A}}\) and \(\mathbf {\overline{v}}_{\!_\mathcal {B}}\) are the mean vectors (representing the average ERP curves). Geometrically, \(\mathbf {\overline{v}}_{\!\!_\mathcal {A}}\) and \(\mathbf {\overline{v}}_{\!_\mathcal {B}}\) represent two points in the \(K\times L\)dimensional space. Hence, we can test \(H_0\) by testing the hypothesis that the distance between these two points is zero. Thus, the problem of multivariate hypothesis testing can be reduced to the geometric problem of defining and measuring distance in this vector space and then performing a univariate test on this (random) distance.
Like in the univariate case, the scale for measuring distance is provided by the variance or, more precisely, by its square root, the standard deviation (like in Student’s ttest). There are, however, two problems in the multivariate case: (a) we have different variances corresponding to the different vector components, and (b) these variables are correlated. The latter problem can be solved by principal component analysis (PCA). The principal component transform (PCT) represented by its corresponding matrix \(\mathbf {T_p}\) is an orthogonal transformation, i.e., a kind of rotation of the coordinate axes, such that, in the new coordinate system, the covariance matrix has a diagonal form [23, pp. 343–344]. The transformation of the vector components is given by
The covariance matrix \(\mathbf {S}\) is diagonalized by this transformation, i.e., the correlations between the new variables \(v^i_\mathrm {p},\ i=1,2,\ldots K\times L\) are all zero. Because \(\mathbf {T_p}\) is an orthogonal transformation, the inverse transformation (i.e., the rotation back to the original axes) is given by the transpose
and, correspondingly, the diagonalized covariance matrix \(\mathbf {S_p}\) is given by
The diagonal elements (the eigenvalues) of \(\mathbf {S_p}\) are the squared standard deviations \(\left( \sigma ^i_\mathrm {p}\right) ^2\) of the uncorrelated variables \(v^i_\mathrm {p}\). Normalizing these variables by the transformation
we land in a familiar \(K\times L\)dimensional Euclidean space where the scale is the same in all directions, i.e., the constant density ellipsoids [23, p. 40] of the multivariate normal distribution of \(\mathbf {x}\) are spheres (the procedure (4–7) is known as PCA “sphering” or “whitening”). In Euclidean space, the squared distance \(D^2\) between two points \(\mathbf {x\,}_{\!\!_\mathcal {A}}\) and \(\mathbf {x\,}_{\!_\mathcal {B}}\) is simply given by the Pythagorean theorem
Substituting (7) into (8), we can compute the distance between \(\mathbf {\overline{v}_{p\,{\!\!_\mathcal {A}}}}\) and \(\mathbf {\overline{v}_{p\,{\!_\mathcal {B}}}}\)
In (9), we use the fact that \(\mathbf {S_p^{1}}\) is a diagonal matrix with eigenvalues \(\left( \sigma ^i_\mathrm {p}\right) ^{2}\). Substituting the right parts of (5) and (6) into (9), we obtain
In (10), we use the property that \(\mathbf {S^{1}}\) is diagonalized by the same transformation as \(\mathbf {S}\). The expression (10) is called a “Mahalanobis distance” [23, p. 22]. When \(\mathbf {S}\) is substituted with an unbiased estimator and the resulting \(D^2\) is multiplied by a proper coefficient (a function of the degrees of freedom), it turns into Hotelling’s T\({}^2\)statistic whose pvalue is obtained from the corresponding (univariate) T\({}^2\)distribution and can be used to test \(H'_0: D^2( \mathbf {\overline{v}}_{\!\!_\mathcal {A}}, \mathbf {\overline{v}}_{\!_\mathcal {B}}) = 0\), which is equivalent to \(H_0: \mathbf {\overline{v}}_{\!\!_\mathcal {A}}= \mathbf {\overline{v}}_{\!_\mathcal {B}}\) [23, pp. 60–120]. Thus, the multivariate problem of comparing two mean vectors (representing two average ERP curves) is reduced to the univariate problem of testing whether the Mahalanobis distance between the mean vectors is different from zero.
Linear discriminant analysis (LDA)
So far, we have shown that Hotelling’s T\({}^2\) is a kind of Mahalanobis distance between two mean vectors defined by the natural metric provided by the covariance. But (10) bears another geometric interpretation as well. Defining the linear discriminant function (LDF) \(\mathbf {d}\) [23, pp. 74] as a column vector by
the Mahalanobis distance (10) is expressed as a LDF value:
The column vector \(\mathbf {d}\) defines a direction in space. The projection of the mean difference \(\mathbf {\overline{v}}_{\!\!_\mathcal {A}} \mathbf {\overline{v}}_{\!_\mathcal {B}}\) onto this particular direction is the maximum of all projections of \(\mathbf {\overline{v}}_{\!\!_\mathcal {A}} \mathbf {\overline{v}}_{\!_\mathcal {B}}\) onto all possible directions [23, pp. 92]. Linear discriminant analysis (LDA) is based on the construction of a separation plane which is perpendicular to \(\mathbf {d}\) and passes through the middle between \(\mathbf {\overline{v}}_{\!\!_\mathcal {A}}\) and \(\mathbf {\overline{v}}_{\!_\mathcal {B}}\), i.e. through the point \((\mathbf {\overline{v}}_{\!\!_\mathcal {A}}+\mathbf {\overline{v}}_{\!_\mathcal {B}})/2\). This separation plane is defined by the equation
According to the optimal LDA classification rule [23, pp. 231], an ERP trial \(\mathbf {v}_{\!x}\) drawn from a new sample \(\mathbf {V\!}_\mathcal {X}\) (comprising an unknown mixture of vectors drawn from both populations \(\mathcal {A}\) and \(\mathcal {B}\)) is assigned to the population \(\mathcal {A}\) if it lies above this plane, i.e.
and it is assigned to the population \(\mathcal {B}\) otherwise. The rule (14) is optimal because it minimizes the classification error rate.
Moreover, the LDF value difference \(\left( \mathbf {v}_{\!x}\! \tfrac{1}{2}(\mathbf {\overline{v}}_{\!\!_\mathcal {A}}\!+\mathbf {\overline{v}}_{\!_\mathcal {B}}) \right) \mathbf {d}\) is proportional to the distance from the point \(\mathbf {v}_{\!x}\) to the separation plane (13) and can be seen as a measure of the affiliation of \(\mathbf {v}_{\!x}\) with \(\mathcal {A}\) or \(\mathcal {B}\).
The optimal LDA classification rule (14) is based on the assumption that the a priori probability \(p_{\!\!_\mathcal {A}}\) that an ERP trial of unknown affiliation belongs to \(\mathcal {A}\) is equal to the a priori probability \(p_{\!_\mathcal {B}}\) that it belongs to \(\mathcal {B}\). If \(p_{\!\!_\mathcal {A}}\!\ne p_{\!_\mathcal {B}}\), the optimal LDA classification rule (14) is generalized as follows [23, pp. 231]. An ERP trial \(\mathbf {v}_{\!x}\) is assigned to \(\mathcal {A}\) if
and to \(\mathcal {B}\) otherwise, i.e., the separation plane (13) is shifted in the direction of \(\mathbf {\overline{v}}_{\!\!_\mathcal {A}}\) if \(p_{\!\!_\mathcal {A}}\!< p_{\!_\mathcal {B}}\), and it is shifted in the direction of \(\mathbf {\overline{v}}_{\!_\mathcal {B}}\) if \(p_{\!\!_\mathcal {A}}\!> p_{\!_\mathcal {B}}\). Note that (15) is a generalization of (14) because the logarithmic term is zero when \(p_{\!\!_\mathcal {A}}\!=p_{\!_\mathcal {B}}\).
The LDF value \(\mathbf {v}_{\!x}\mathbf {d}\) has an interesting property that should be noted, because it could be important for some interesting applications. Since the LDF \(\mathbf {d}\) is obtained from the sample \(\mathbf {V}\) while \(\mathbf {v}_{\!x}\) is drawn from a different sample \(\mathbf {V\!}_\mathcal {X}\), the LDF value \(\mathbf {v}_{\!x}\mathbf {d}\) has univariate normal distribution [23, p. 45]. This property can be used to reduce the multivariate ERP \(\mathbf {v}_{\!x}\) to the univariate random variable \(\mathbf {v}_{\!x}\mathbf {d}\) that exclusively and fully reflects the (multivariate) ERP difference between two particular experimental conditions, \(\mathcal {A}\) and \(\mathcal {B}\). The important point here is that \(\mathbf {v}_{\!x}\mathbf {d}\) can be used for other purposes beside classification. For instance, if the \(\mathcal {A}\)\(\mathcal {B}\) structure of \(\mathbf {V\!}_\mathcal {X}\) is known, the LDF value \(\mathbf {v}_{\!x}\mathbf {d}\) can be subjected to Student’s ttest in order to test the hypothesis about the mean \(\mathcal {A}\)\(\mathcal {B}\) difference within \(\mathbf {V\!}_\mathcal {X}\). Note the difference to Hotelling’s \(T^2\)test: while in the Mahalanobis distance (12), the LDF \(\mathbf {d}\) is computed from the sample that is tested, in \(\mathbf {v}_{\!x}\mathbf {d}\) it is computed from a different sample. This means that the multivariate structure of the \(\mathcal {A}\)\(\mathcal {B}\) difference derived from \(\mathbf {V\!}\) is imposed on \(\mathbf {V\!}_\mathcal {X}\) in order to assess the (univariate) magnitude of the \(\mathcal {A}\)\(\mathcal {B}\) difference within \(\mathbf {V\!}_\mathcal {X}\). This procedure provides a methodologically important alternative to Hotelling’s \(T^2\)test, because it implies a conceptually novel approach to multivariate ERP assessment which could be useful, e.g., in clinical applications (see the “Discussion” section).
The problem of dimensionality
The multivariate ERP model presented above is idealized. It can only work with few EEG channels, low timesampling rates, short epochs and large number of ERP trials. In most applications, however, the dimensionality of the vector space is very large and exceeds the number of trials. For instance, if the number of channels is K = 32, the sampling rate (1) is \(R_0\) = 500 Hz, and the epoch length is T = 1 s, then the number of time points is L = \(R_0 T\) =500, and we have \(K\times L\) = 16,000 dimensions. If the number of trials is, e.g., N = 1,000, the rank of the covariance matrix \(\mathbf {S}\) is \(N1=999\) [23, pp. 9, 406], which means that \(\mathbf {S}\) is singular (i.e. \(\mathbf {S^{1}}\) does not exist, because \(\mathbf {S}\) has at least 15,001 zero eigenvalues), and (9–15) cannot be applied. Even if \(N1>K\times L\) and \(\mathbf {S}\) is not singular, it is still necessary to reduce the dimensionality of the model because of (a) the loss of statistical power caused by the inclusion of noise variables in the \(D^2\)sum (9), and (b) the computational problems associated with too many variables and too small eigenvalues.
A standard solution to the dimensionality problem is given by PCA (see above): the variables \(v^i_\mathrm {p}\) with small variances \(\left( \sigma ^i_\mathrm {p}\right) ^2\) are simply deleted from the model according to a certain criterion, e.g., all eigenvalues greater than the average are retained, or the largest eigenvalues explaining a certain proportion of the total variance are retained [23, pp. 347–348].
The tCWT method provides a further solution to the dimensionality problem. It uses explicitly the special fact that the random vectors represent ERP curves, i.e., continuous functions of time. The continuous wavelet transform (CWT) [8, 9] and Student’s ttest are used to extract certain “features” of the ERP curves, which build the “feature space” whose dimensionality is substantially smaller than that of the original space. All the standard multivariate procedures described above (PCA, Hottelling’s test, LDA) can then be performed in this feature space.
Here, it is interesting to mention an alternative, recently proposed, dimensionality reduction approach based on EffectMatched Spatial (EMS) filtering [24]. While tCWT reduces the dimensionality of the ERP time curves, independently for each single channel, the EMS filtering method reduces the number of channels (to one), independently for each single time point, thus representing the multichannel ERP by a single “surrogate timecourse”. Thus, in a certain sense, EMS filtering can be seen as complementary to tCWT and, for particular purposes, the two methods can be used in combination with each other.
The tCWT method
The tCWT method has already been described in the continuous notation [7]. Here, it is presented in the discrete vector and matrix notation as well, because this discrete representation is the one that is used in the computational algorithm [16, 17].
CWT
The CWT [8, 9] \(w^k_n(s,t)\) of the EEG signal \(v^k_n(t)\) of the kth channel of the nth ERP trial is given by
where \(\psi (t)\) is a wavelet function which is (a) well localized in both time and frequency, and (b) has a zero mean:
The approximate position of \(\psi \big ((\tau t)/s\big )\) in time is determined by the time shift t, while the scale s, which is the inverse frequency, defines the approximate position in the frequency domain (Fig. 1a).
The CWT (16) is a linear transformation, which means that it can be represented by a matrix \(\mathbf {T_w}\) such that
where the random vectors \(\mathbf {v}\) and \(\mathbf {w}\) represent the ERP in the time domain and in the timefrequency domain, respectively, and the matrices \(\mathbf {V}\) and \(\mathbf {W}\) represent the corresponding singletrial samples. The coefficients of \(\mathbf {T_w}\) in (18) are obtained by substituting \(v^k_n(t)\) and \(\psi \big ((\tau t)/s\big )\) in (16) with their respective discrete representations as in (2) and converting the integrals into corresponding sums. Note, however, that the CWT (16) is highly redundant and the wavelet space defined by \(\mathbf {w}\) is much “larger” than the original vector space defined by \(\mathbf {v}\). Note also that \(\mathbf {T_w}\) is actually a block diagonal matrix built from K identical blocks, one per channel. The tCWT computer application [16, 17] uses only one CWT block which is applied to each channel.
tCWT
Now, consider again two experimental conditions \(\mathcal {A}\) and \(\mathcal {B}\) (\(\mathcal {A}\)\(\mathcal {B}\) design), and the corresponding two samples of sizes M and N, respectively, of Kchannel ERPs represented by the random curves \(v^k_{{\!\!_\mathcal {A}}m}(t)\) and \(v^k_{{\!_\mathcal {B}}n}(t)\), where: \(k=1,2,\ldots K\); \(m=1,2,\ldots M\); \(n=1,2,\ldots N\); and \(0\le t<T\). The corresponding CWTs are computed by (16). Then, Student’s twosample tvalue \(\mathtt {t}^k(s,t)\) is computed for each k and each scaletime point (s, t) from the corresponding CWT values \(w^k_{{\!\!_\mathcal {A}}m}(s,t)\) and \(w^k_{{\!_\mathcal {B}}n}(s,t)\):
where \(\overline{w}^k_{\!\!_\mathcal {A}}(s,t)\) and \(\overline{w}^k_{\!_\mathcal {B}}(s,t)\) are the sample means and \(\sigma ^k_{{\!\!_\mathcal {A}}{\!_\mathcal {B}}}(s,t)\) is the pooled standard deviation computed from the corresponding sums of squares (SS).
In the next step, each of the points \((s^{kj},t^{kj})\), at which the functions \(\mathtt {t}^k(s,t)\) reach a local extremum, are detected. In the last step, we define the tCWT vector samples \(\mathbf {W}^\star _{\!\!_\mathcal {A}}\) and \(\mathbf {W}^\star _{\!_\mathcal {B}}\) by their respective components, the tCWT features \(w_{{\!\!_\mathcal {A}}m}^{\star kj}\) and \(w_{{\!_\mathcal {B}}n}^{\star kj}\) defined by
Finding the local extrema of a function of two variables is an analytical operation, but its result can be represented by a simple projection in the wavelet space, i.e. selecting the vector components that correspond to the points \((s^{kj},t^{kj})\) and discarding all other dimensions. This projection can be represented by the matrix \(\mathbf {T_w^\star }\) which is obtained from \(\mathbf {T_w}\) in (18) by deleting the columns corresponding to the discarded space dimensions. The tCWT vectors are obtained by substituting \(\mathbf {T_w}\) in (18) with \(\mathbf {T_w^\star }\)
where \(\mathbf {V}\) is the “total” ERP sample comprising the subsamples \(\mathbf {V}_{\!\!_\mathcal {A}}\) and \(\mathbf {V}_{\!_\mathcal {B}}\), and \(\mathbf {v}\) is the corresponding random vector.
Methods
This section provides a detailed, stepbystep, formal delineation of the tCWT algorithm. In the brief intuitive descriptions published before [7, 11], most of the details were omitted. Here, a rigorous mathematical delineation of all steps is presented for the first time.
Preprocessing
Theoretically, the tCWT could be performed directly in the time domain defined by (2). The CWT (16–18) is, however, highly redundant and computationally demanding. That is why the dimensionality of the vector space must be reduced substantially before computing \(\mathbf {w}\) and \(\mathbf {w^\star }\) (16–21).
Frequency domain representation
The first preprocessing step is based on a frequency domain representation of the ERPs by a discrete Fourier transform (DFT [22]; note the difference to the previous version of tCWT [7] in which timedependent filtering and DWT [8, 21] were performed instead of DFT). The dimensionality of the vector space is reduced by deleting all frequencies larger than \(2f_c\), where \(f_c\) is a cutoff frequency defined by a cutoff scale
This is done as follows. First, we compute the orthogonal (real) DFT matrix \(\mathbf {T_f}\). The ERP vectors (2–3) are then transformed by
Geometrically, the DFT (23) can be seen as a rotation of the axes, analogical to the PCT (4). Note, however, that \(\mathbf {T_f}\) is actually a block diagonal matrix built from zeros and K identical DFT blocks, one per channel. (The tCWT computer application uses only one copy of the DFT block which is applied to each channel.)
As next, we retain only those columns of \(\mathbf {T_f}\) that correspond to frequencies \(f^j\) fulfilling the cutoff condition
A “reduced” matrix \({\hat{{\mathbf {T}}}}_{\mathbf {f}}\) is obtained from \(\mathbf {T_f}\) by deleting all columns corresponding to frequencies \(f^j>2f_c\). The reduced DFT is then given by
As in (5), the “inverse” transform (i.e., the rotation back to the time domain axes) is given by
Note, however, that, since \(\hat{\mathbf {{T}}}_{\mathbf {f}}\) is not a square matrix, \(\hat{\mathbf {{v}}}\) and \(\hat{\mathbf {{V}}}\) are filtered versions of \(\mathbf {v}\) and \(\mathbf {V}\).
In order to smooth the cutoff, the vector components corresponding to the frequencies \(f^j\) of the last octave \(f_c<f^j\le 2f_c\) are attenuated gradually. This is done by multiplication with a diagonal matrix \(\mathbf {R_f}\) whose diagonal elements are given by the values of an envelope function r(f) such that \(r(f^j)=1\), for \(f^j\le f_c\), and \(r(f^j)=2f^j\!/\!f_c\), for \(f_c<f^j\le 2f_c\). Similarly, the vector components in the time domain, \(\mathbf {v}\) and \(\mathbf {V}\), can also be multiplied with an appropriate window function before DFT. This is done by left multiplication of \(\hat{\mathbf {{T}}}_{\mathbf {f}}\) with a diagonal matrix \(\mathbf {R_t}\) whose diagonal elements are given by the values of the corresponding envelope function. The current tCWT implementation uses a modified Tukey window [25] defined by the envelope function
where \(T_\mathrm {in}\) is the fadein time and \(T_\mathrm {out}\) is the fadeout time.
Chaining all transformations together, we obtain
where \(\tilde{\mathbf {{T}}}_{\mathbf {f}}\) is defined by
For some purposes (e.g. visualization), it might be useful to represent the filtered ERP back in the time domain by the “inverse” transform (26):
The number of frequency components per channel is
where T is the length of the time interval. The dimension of the frequency domain space (i.e., the number of rows of \(\hat{\mathbf {{T}}}_{\mathbf {f}}\) or \(\tilde{\mathbf {{T}}}_{\mathbf {f}}\)) is then \(KN\!_F\), where K is the number of EEG channels. Substituting (22) in (31) we obtain the following approximation:
In the tCWT software [16, 17], the timetofrequency domain transformation (28) is implemented by the function tcwt_t2f; the preceding computation of the transformation matrix \(\tilde{\mathbf {{T}}}_{\mathbf {f}}\) (22–29) is implemented by the function tcwt_prm2mat (Table 1).
PCA and outlier detection
The multivariate outlier detection procedure proposed here is heavily based on PCA not only for computational reasons (due to small eigenvalues), but also because a multivariate outlier can strongly influence the dimensionality of the model, producing “fake” dimensions that survive PCA unless the outlier is excluded from the computation of the covariance matrix [23, p. 373].
An important distinction should be made at this point. PCA, as commonly used in ERP applications [26, 27] is performed in the time domain and it usually includes an additional rotation of the axes [28] following the initial one (4). This second rotation is aimed at obtaining more meaningful components, which, however, are not uncorrelated. In tCWT, where PCA is only used in the pre/postprocessing (see below), it does not include any additional rotations and the resulting components remain uncorrelated.
Consider an ERP represented via (28) in the frequency domain by the random vector \(\tilde{\mathbf {{v}}}_{\mathbf {f}}\) and the singletrial sample \(\tilde{\mathbf {{V}}}\!_{\mathbf {f}}\), the latter comprising two subsamples \(\tilde{\mathbf {{V}}}\!_\mathbf {{f\,{\!\!_\mathcal {A}}}}\) and \(\tilde{\mathbf {{V}}}\!_\mathbf {{f\,{\!_\mathcal {B}}}}\) corresponding to two different experimental conditions; \(\tilde{\mathbf {{V}}}\!_\mathbf {{f\,{\!\!_\mathcal {A}}}}\) and \(\tilde{\mathbf {{V}}}\!_\mathbf {{f\,{\!_\mathcal {B}}}}\) may in turn comprise subsamples of trials obtained from different individuals. PCT is performed according to (4), using the total covariance matrix obtained directly from \(\tilde{\mathbf {{V}}}\!_{\mathbf {f}}\) (i.e. ignoring the subsample structure of \(\tilde{\mathbf {{V}}}\!_{\mathbf {f}}\))
Then, components (represented by columns of \(\mathbf {V\!_p}\) and \(\mathbf {T_p}\)) corresponding to eigenvalues smaller than a certain cutoff value (defined by one of the criteria mentioned above) are temporarily removed. The remaining \(Q_\mathrm {p}\) variables \(v_\mathrm {p}^i\), \(i=1,2,\ldots Q_\mathrm {p}\), are normalized by (7). From the normalized variables \(x^i\), for each n, the Mahalanobis distance \(D_n\) from the nth singletrial ERP \({\mathbf {x}}_n\) to the total mean \(\mathbf {\overline{x}}\) is computed according to (8)
With growing number of trials, each of the terms of the sum in (34) rapidly converges to the square of a standard normally distributed random variable. Hence, \(D^2\) is approximately \(\chi ^2\)distributed. With growing \(Q_\mathrm {p}\), the square root of a \(\chi ^2\)distributed random variable converges rapidly to a normal distribution as well, which means that \(D=\sqrt{D^2}\) is approximately normally distributed.
The nth ERP trial is temporarily marked as an outlier if
where \(\overline{D}\) is the mean, \(\sigma \!_{\mathbf {\tiny D}}\) is the standard deviation of D, and C is a heuristically chosen coefficient, (usually \(C\ge 2.5\)).
The steps described above are repeated iteratively. Trials marked as outliers (represented by rows of \(\mathbf {V\!_p}\)) are excluded from the PCA and the computation of \(\overline{D}\) and \(\sigma \!_{\mathbf {\tiny D}}\) in the next iteration, but then they are tested again by (35) together with all other trials. Principal components are also excluded for one iteration only until their number remains unchanged through two consecutive iterations. After that, the PCA criterion is not applied any more and PCT is performed further with a fixed number of components. This is done in order to facilitate convergence. Also, in order to prevent oscillatory behavior, if the number of marked outliers does not increase after the current iteration, the outliers detected in the previous iteration are marked again together with those detected in the current iteration. The procedure ends when the set of detected outliers does not change any more (i.e. the same trials are marked in two consecutive iterations).
If \(\tilde{\mathbf {{V}}}\!_{\mathbf {f}}\) comprises different individual datasets, a whole dataset is marked as an outlier (with all its trials) if a certain percentage of its trials are already marked. This criterion is applied, however, only if the number of singletrial outliers does not increase at the end of the current iteration. Note that, like single trials, whole data sets excluded at a certain step, can nevertheless be included again later.
As a result of the procedure described above, both rows of \(\tilde{\mathbf {{V}}}\!_{\mathbf {f}}\) representing singletrial outliers and columns of \(\mathbf {T\!_p}\) representing “noise components” or “outlier components” are deleted. The reduction of dimensionality is thus represented by the reduced PCT \(\hat{\mathbf {{T}}}_{\mathbf {p}}\) and the corresponding “reduced” ERP matrices \(\hat{\mathbf {{V}}}\!_{\mathbf {p}}\) and \(\hat{\mathbf {{v}}}_{\mathbf {p}}\) where
We use the “inverse” PCT \(\hat{\mathbf {{T}}}_{\mathbf {p}}^{\mathsf {T}}\) to represent the dimensionality reduction in the frequency domain
From (36) and (37), we finally obtain
Note that in (38) we assume that all rows of \(\tilde{\mathbf {{V}}}\!_{\mathbf {f}}\) corresponding to outliers have already been deleted.
It is important to emphasize that, the principal components obtained by this procedure are identical with those which would be obtained if PCA were performed in the time domain using the filtered ERP, \(\tilde{\mathbf {{v}}}\) (30). This is so, because the principal component axes that diagonalize the covariance matrix are unique (although their representation and the corresponding PCT depend on the representation of the input sample). The frequency domain representation is solely a matter of computational convenience due to dimensionality reduction by frequency filtering. Note also that although the dimensionality of the model is further reduced by the statistical “PCA filtering” (38), the dimensionality of the frequency domain representation remains unchanged.
The above procedure can be additionally applied to each of the two subsamples \(\tilde{\mathbf {{V}}}\!_{\mathbf {f\,{\!\!_\mathcal {A}}}}\) and \(\tilde{\mathbf {{V}}}\!_{\mathbf {f\,{\!_\mathcal {B}}}}\) separately, using the principal components obtained from the whole sample \(\tilde{\mathbf {{V}}}\!_{\mathbf {f}}\). The only difference is that no PCA is done any more (because the components have already been fixed).
It is important to note that the preprocessing procedures described above can be used independently from the tCWT feature extraction (described below). For instance, the ERP sample \(\tilde{\mathbf {{V}}}_{\mathbf {f}}^{\mathbf {p}}\) (filtered and free from outliers) can be represented back in the time domain by (26):
and then used as input for other assessment procedures. In this article, the representation (39) is used solely for visualization purposes (see Fig. 2).
In the tCWT software [16, 17], the PCAbased multivariate outlier detection procedure (33–38) is implemented by the function tcwt_f2pc; outlier detection for each experimental condition separately with fixed PCT obtained by tcwt_f2pc is implemented by the function tcwt_pc2cnd2ri (Table 1).
tCWT
Loggrid sampling
In (18), the number of rows of the CWT matrix \(\mathbf {T_w}\) is equal to the number of components of \(\mathbf {w}\), which is equal to the number of sampling points in the stplane of the wavelet \(\psi \big ((\tau t)/s\big )\) in (16). This number can be significantly reduced by using the loggrid introduced in [7] instead of a regular sampling grid. The vertices \((s^g,t^{g,h})\) of the loggrid (Fig. 1b) are defined by
where \(S_0\) is some unit scale, R is the scaleinvariant sampling rate measured in points per scale (pps), and g and h take integer values (including negative and zero). The scale invariance can be expressed by the two properties: \(s^{g+R}=2s^g\) (i.e., we have R grid lines per octave), and \(t^{g,h+R}=t^{g,h}+s^g\) (i.e., on each grid line, we have R sampling points per time interval of length \(s^g\)). The special case \(R=1\) yields the DWT with its dyadic structure [8, 21]. The tCWT application uses only the part of the infinite loggrid (40) confined by the rectangle
In (41), the minimal scale \(S_c/2\) corresponds to the maximal frequency \(2f_c\) in (24). For a large number of loggrid vertices \(N\!_G\), a good approximation is given by
Thus, the number of CWT sampling points is significantly reduced, compared to the number of vertices of a rectilinear grid with a regular spacing defined in the same rectangle (41), e.g., by the original sampling frequency \(R_0\) applied to both axes as in (1). Note that the sampling frequency R of the loggrid can be chosen to correspond to the original sampling rate \(R_0\) by setting \(R\!=\!S_c R_0\), but this is not necessary and R can as well take an independent value \(R\!\ne \!S_c R_0\).
In the tCWT software [16, 17], the loggrid \((s^g,t^{g,h})\) (40–41) is implemented by a matrix of vertices computed by the function tcwt_prm2mat (Table 1).
CWT and tCWT from the frequency domain
As in (18), we define the CWT \({\hat{\mathbf {T}}}_{\mathbf {w}}\) of the filtered ERP defined by (26) as
Substituting (26) in (43), we obtain
where
The rows of the inverse DFT \(\mathbf {T_f^\mathsf {T}}\) are the discrete vector representations of the basic EEG oscillations \(\sin (2\pi f^j t)\) and \(\cos (2\pi f^j t)\) with frequencies \(f^j\!=\!j/T\), where \(j=0,1,2...\) such that \(f^j\le 2f_c\) according to (24). Hence, \(\tilde{\mathbf {{T}}}_{\mathbf {w}}\) is computed by (16) using the loggrid sampling (40–42). The convolution integrals are represented by the corresponding sums with the original sampling rate \(R_0\).
Like \(\mathbf {T_w}\) (18), \(\tilde{\mathbf {{T}}}_{\mathbf {w}}\) is a block diagonal matrix built from K identical blocks (one per channel). The size of each block is \(N\!_F\!\times \!N\!_G\) where \(N\!_F\) and \(N\!_G\) are the number of frequency components (31, 32) and the number of loggrid vertices (42), respectively. From (32) and (42) we obtain the following approximation:
The current implementation of tCWT uses a Mexican Hat wavelet defined by
Note that (47) differs from the standard definition of the Mexican Hat, \(\psi (t)=(1t^2)\exp (t^2/2)\). The unity scale defined by (47) is four times larger than the standard. This is done for convenience: defined in this way, the scale corresponds better to the durations of the ERP waves matched by the wavelet and to the periods of the oscillations \(\sin (2\pi f^j t)\) and \(\cos (2\pi f^j t)\) in (45).
In order to exclude the outliers detected above, we apply the obtained CWT \(\tilde{\mathbf {{T}}}_{\mathbf {w}}\) to the “reduced” matrices \({\tilde{\mathbf {v}}}_{\mathbf {f}}^{{\mathbf {p}}}\) and \(\tilde{\mathbf {V}}_{\mathbf {f}}^{{\mathbf {p}}}\) defined by (38)
The tCWT features are computed by (19, 20) and the tCWT matrix \(\tilde{\mathbf {T}}^\star _{\mathbf {w}}\) is obtained from \(\tilde{\mathbf {T}}_{\mathbf {w}}\) by retaining only the columns that represent the tCWT features (20). Substituting \(\tilde{\mathbf {T}}_{\mathbf {w}}\) in (48) with \(\tilde{\mathbf {T}}^\star _{\mathbf {w}}\), we obtain
In the tCWT software [16, 17], the computation of the CWT matrix \(\tilde{\mathbf {{T}}}_{\mathbf {w}}\) (16–18, 43–45) is implemented by the function tcwt_prm2mat; the computations of the tCWT scalogram \(\mathtt {t}^k(s,t)\) (19), the tCWT extrema and the tCWT matrix \(\tilde{\mathbf {{T}}}^\star _{\mathbf {w}}\) (49) are implemented by the functions tcwt_f2cwss and tcwt_f2x (Table 1).
Postprocessing in the feature space
The tCWT features \(\mathbf {w^\star }\) are still strongly correlated, because one and the same ERP component is found in more than one EEG channel represented by at least one extremum in each channel’s subscalogram. Furthermore, not all such sets of extrema represent significant ERP components. For these reasons, the dimensionality of the feature space is reduced further by PCA and stepdown selection of principal components. Finally, the LDF is computed in the reduced feature space.
PCA and stepdown test
PCT is performed in the feature space according to (4)
and the set of components is reduced according to one of the PCA criteria mentioned above. Then, a subset of components, “selected principal components” (SPC), is selected by a stepdown test [23, pp. 111, 177, 217] based on the natural ordering of the components (sorted by eigenvalue). The “reduced” matrices \({\hat{\mathbf {T}}}^{\mathbf {\star }}_{\mathbf {p}}\), \({\hat{\mathbf {W}}}^ {\mathbf {\star }}_{\mathbf {p}}\) and \({\hat{\mathbf {w}}}^{\mathbf {\star }}_{\mathbf {p}}\) are obtained by deleting the columns corresponding to the eliminated components in \(\mathbf {T^\star _p}\), \(\mathbf {W^\star _p}\) and \(\mathbf {w^\star _p}\), respectively:
LDA
The LDF \(\mathbf {d_w^\star }\) is computed in the reduced feature space as in (11):
where \({\hat{\mathbf {\overline{w}}}^{\mathbf {\star }}_{\mathbf {p\,{\!\!_\mathcal {A}}}}}\) and \({\hat{\mathbf {\overline{w}}}^{\mathbf {\star }}_{\mathbf {p\,{\!_\mathcal {B}}}}}\) are the respective means of the two subsamples \({\hat{\mathbf {W}}^{\mathbf {\star }}_{\mathbf {{p\,{\!\!_\mathcal {A}}}}}}\) and \({\hat{\mathbf {W}}^{\mathbf {\star }}_{{\mathbf {p}}\,{\!_\mathcal {B}}}}\), and \({\hat{\mathbf {S}}^{\mathbf {\star }}_{{\mathbf {p}}\,{\!\!_\mathcal {A}}{\!_\mathcal {B}}}}\) is the pooled covariance matrix. The LDA separation plane is defined as in (13–15) by
Now, the transformations (28), (38), (49), and (51) can be chained together in order to represent the LDF \(\mathbf {d_w^\star }\) and the LDF value \({\hat{\mathbf {w}}^{\mathbf {\star }}_{\mathbf {p}}\,\mathbf {d_w^\star }}\) back in the frequency domain and in the time domain:
where
and
Finally, it might be useful to mention also the continuous time domain representation of the LDF value:
where K is the number of channels, T is the length of the time window, and \(v^k(t)\) and \(d^{\star k}(t)\) are the continuous representations of \(\mathbf {v}\) and \(\mathbf {d^\star }\) respectively (see Fig. 2).
In the tCWT software [16, 17], the PCAbased stepdown reduction of the tCWT features (50–51) and the computation of the LDF \(\mathbf {d_w^\star }\) (52) and \(\mathbf {d_f^\star }\) (55) are implemented by the function tcwt_x2ld (Table 1).
Algorithm summary with links to the tCWT software
In Table 1, the processing steps delineated above are summarized and linked to the corresponding functions and input/output data files defined by the tCWT software [16] (for a detailed description of these files and functions, see the tCWT software documentation [17]). Output file names displayed in the last column of Table 1 are provided with references to corresponding mathematical variables and equations defined in this article. The wildcard symbol “ * ” denotes an ERP dataset name and indicates that the corresponding tCWT functions accept a whole list of such names as an argument and then iterate over the list, thus processing multiple ERP datasets in a single call (see the pseudocode at the bottom of Table 1).
Computational demands
The most computationally demanding procedures are the PCA and the tCWT including CWT and extremum detection from a scalogram.
PCA
The number of matrix elements of the PCT is \(K^2 N\!_F^{\,2}\). The covariance matrix has the same number of elements. Using (32) we obtain the approximate total number of elements of both matrices:
Both the memory and the processing time required for PCA are approximately proportional to \(N\!_P\). Each doubleprecision matrix element needs eight bytes of memory. The processing time depends on the central processing unit (CPU), but, as a rule of thumb, one microsecond per matrix element can be used for rough estimation of the time needed for one iteration of the PCAbased multivariate outlier detection procedure (see above).
tCWT
The approximate number of (nonzero) CWT matrix elements per channel is given by (46). The amount of memory consumed by tCWT is proportional only to this number and does not depend on the number of channels K, because the tCWT application [16, 17] performs CWT on one channel at a time. Furthermore, if the processed dataset is too big, tCWT does not transform all trials at once, but processes smaller blocks of trials (the block size is controlled by an input parameter), thus limiting the memory demand.
The time tCWT needs to process one scalogram (comprising K subscalograms) is approximately proportional to the total number of nonzero CWT matrix elements \(N\!_W\!=\!K N\!_F N\!_G\). From (46) we obtain the following approximation:
Again, as a rule of thumb, one microsecond per matrix element can be used for rough estimation of the time tCWT needs to process one scalogram (with K channels).
The tCWT software package [16, 17] includes the function tcwt_prm2info that evaluates both the exact values of \(N\!_P\) and \(N\!_W\) and their approximations computed by (58, 59) for a given set of input parameters. Further, tcwt_prm2info makes a rough estimate of the computational demands based on the simple assumption of eight bytes of working memory and one microsecond of processing time per matrix element. All this is done in real time, without actually creating the corresponding matrices and can be very useful in the planning stage of a tCWT project when the available computational resources must be taken into account.
Example
Consider, e.g., the following settings: K = 64, T = 1 s, \(S_c\) = 50 ms (\(f_c\) = 20 Hz), and R = 15 pps. Then we would have the following (exact) numbers obtained with tcwt_prm2info: \(N\!_F\) = 81, \(N\!_G\) = 13,276, \(N\!_P\) = 53,747,712, and \(N\!_W\) = 68,822,784. The memory used for PCA would be about 430 MB and for CWT (transforming 1,000 trials at ones), about 116 MB. The estimated processing time for one PCA iteration would be about 54 s, and for one CWT scalogram, about 69 s. On a powerful hardware (see below), however, these computational times can be significantly shorter (see Table 2).
Example: oddball paradigm
In this section, the tCWT is demonstrated on example ERP data [18] obtained in a passive oddball paradigm [19, 20]. Since these datasets have already been described in detail elsewhere [11], only the most important information about the experiment is provided here.
Datasets
ERP datasets were obtained from 36 healthy participants (righthanded, mean age = 27 years, 20 females) in a passive oddball task [19, 20], in which 255 standard and 45 deviant stimuli were presented at a constant rate in a randomized sequence. The standard and the deviant stimuli were 50mslong, 75dBloud sine tones, with a frequency of 1.3 and 0.8 kHz, respectively; the interstimulus interval was 0.8 s. The participants were instructed just to listen attentively to all tones (passive task). Digitized EEG [time resolution: 2 ms/step (500 Hz), voltage resolution: 0.1678 microvolts/step] was continuously recorded from nine scalp positions according to the 10–20 system: Fz, Cz, Pz, F3, F4, C3, C4, P3, and P4. All electrodes were referenced to the linked mastoids. Electrical eye activity was recorded by bipolar acquisition from the following sites: the two lateral orbital rims for horizontal eye movements, and FP2 and a site below the right eye for vertical eye movements and eye blinks. The first nine datasets (in alphabetical order) were excluded for technical reasons (in order to reduce the whole dataset archive to 100 MB for online storage and download purposes). The remaining 27 datasets were processed with tCWT.
Data processing
Before feeding the data into the tCWT processor, the raw EEG curves were segmented and corrected for eye blink and eye movement artifacts, by a standard procedure [29, 30]. The epoch length was 1 s, starting 100 ms before stimulus onset. Then, the datasets were checked for series of more than one deviant trials. Only the first trials of such series were retained, the following deviant trials as well as the first subsequent standard trial were deleted. The first ten trials of each dataset were also deleted. As a result of this reduction, in each dataset, remained 242 standard trials and 38 deviant trials. Thus, the a priori probabilities (15, 53) were \(p_s=86.4\) % for standard trials and \(p_d=13.6\) % for deviant trials. The datasets were converted from ASCII format to the internal tCWT format by the function tcwt_ascii2tmat [16, 17] and the epochs were reduced to windows of interest starting at stimulus onset and ending 600 ms later. The signals in these windows were then referenced to the 100 ms prestimulus baseline. These signals were processed by tCWT.
In its current implementation [16, 17], tCWT starts with a call to the function tcwt_prm2info which gives a rough estimate of the memory demands and the computational time (see above). The next call is to the function tcwt_prm2mat that creates and saves to a file those transformation matrices which do not depend on the data, but are functions of global input parameters only (see Table 1). These matrices are the DFT \(\tilde{\mathbf {T}}_{\mathbf {f}}\) (29) and the CWT \(\tilde{\mathbf {T}}_{\mathbf {w}}\) (45), and they depend on the original sampling rate \(R_0\) (1), the length of the time window T (1), the fadein and the fadeout times \(T_\mathrm {in}\) and \(T_\mathrm {out}\) of the window function (27), the loggrid sampling rate R (40), and the cutoff scale \(S_c\) (22). While \(R_0\) is defined by the time resolution setting of the EEG amplifier (in this case, \(R_0\) = 500 Hz), the other parameters can be varied to achieve best LDA classification results. In the current example, T, \(T_\mathrm {in}\) and \(T_\mathrm {out}\) were kept fixed: T = 600 ms, \(T_\mathrm {in}\) = 20 ms, \(T_\mathrm {out}\) = 200 ms, while R and \(S_c\) took the following values: \(R=\) 10, 15, 20, and 25 pps; \(S_c=\) 250, 100, 50, 40, and 30 ms. Note that these values of \(S_c\) correspond to cutoff frequencies \(f_c=\) 4, 10, 20, 25, and 33.3 Hz, respectively (22).
After computing \(\tilde{\mathbf {T}}_{\mathbf {f}}\) and \({\tilde{\mathbf {T}}}_{\mathbf {w}}\), the data were represented in the frequency domain by (29) using the function tcwt_t2f (Table 1). The outlier rejection procedure described above (33–38) was performed on each dataset, first on the whole dataset (by the function tcwt_f2pc), then on each of the two subsets, standard and deviant (by the function tcwt_pc2cnd2ri, see Table 1). The greatest eigenvalues explaining a certain percentage \(P\!_v\) of the variance were retained after PCT. Different values of \(P\!_v\) were tried in order to minimize the LDA classification error rates (see below): \(P\!_v=\) 95, 97, and 99 %. The outlier criterion was defined by setting \(C=2.7\) in (35). (Other values of C ranging from 2.5 to 9.9 were tried as well, but without any substantial effect on the results; see the Limitations section in the “Discussion” section.)
As a next step, the tCWT features \(\mathbf {w^\star }\) were computed (for each dataset) as described in (43–49) using the function tcwt_f2x (Table 1). The obtained set of tCWT features was reduced further by PCAs (50) and stepdown tests (51), implemented by the function tcwt_x2ld (Table 1). The same PCA criterion with the same value of \(P\!_v\) as in the outlier rejection procedure was applied. The overall \(\alpha \)level for the stepdown test was set to \(\alpha _\text {sd}=0.3\). Finally, the LDFs \(\mathbf {d^\star }\) of the individual datasets were obtained by (52), also implemented by the function tcwt_x2ld (Table 1).
LDA classification was performed according to (13–15, 53) and classification error rates were computed using the holdout method and the splithalf method [23, p. 244] for both unknown (i.e., equal: \(p_s\!=\!p_d\) = 50 %) and known (i.e., oddball: \(p_s\) = 86.4 %, \(p_d\) = 13.6 %, see above) a priori probabilities.
In the “individual holdout” method, all of the above steps but the first one (outlier rejection from the whole dataset) were performed on a dataset obtained from the original one by excluding one single trial. The LDF thus obtained was used to classify the excluded trial as standard or deviant by (14) and (15). The error rates were obtained by repeating the procedure for each single trial and each dataset and comparing the result with the true category of each trial (standard or deviant). The holdout method is very efficient, because it is almost unbiased and it uses the whole available statistical power, but it is also a computationally demanding procedure. In the tCWT software [16, 17], the individual holdout method is implemented by the function tcwt_f2holdout.
In the “individual splithalf” method, all steps were performed on the first half of the trials of each dataset (extracted by the function tcwt_f2split [16, 17]). The LDFs obtained from these “training datasets” were then applied to the second halves of the datasets, the “test datasets”. This method is quick and simple, but it has considerable loss of statistical power as a major disadvantage.
“Individual biased” error rates were also computed in order to demonstrate the bias resulting from using the same dataset (without excluding any trials) for both training and testing.
In order to demonstrate multivariate hypothesis testing, only the last 9 deviant trials and 50 standard trials of each dataset were used as training dataset, while the first half of the trials were used as test dataset. The SPCs obtained from each training dataset by the stepdown method were applied to each test dataset and subjected to Hotelling’s T\({}^2\)test. Both the individual splithalf error rates and the biased error rates, as well as Hotelling’s T\({}^2\)tests were performed with the function tcwt_f2stats [16, 17].
Finally, all individual datasets were pooled together into one large group dataset (using the function tcwt_f2pool [16, 17]). Outlier rejection (using tcwt_f2pc and tcwt_ pc2cnd2ri) was performed with the average eigenvalue as a PCA criterion in order to emphasize group features and to suppress individual and/or oscillatory features. The singletrial outlier criterion was defined by setting \(C=2.5\) in (35). (Again, experimenting with other values of C had virtually no effect on the results; see the Limitations section in the “Discussion” section.) The dataset outlier criterion was defined by setting the minimum number of trials retained to 50 % of the trials in the dataset. tCWT, PCA (again, with the average eigenvalue criterion), stepdown test, and LDA were performed as above (using tcwt_f2x and tcwt_x2ld). A variation of the holdout method, the “group holdout” was used to obtain classification error rates for the individual datasets: instead of excluding one single trial at a time, one whole individual dataset was excluded at each iteration. Each error rate obtained in this way is based on a group LDF applied to the respective excluded dataset.
The tCWT software [16, 17], provides the function tcwt_ri2ri1out that creates systematically holdout indices of the pooled group dataset which are specially designed for the implementation of the group holdout method. In a group holdout index corresponding to a given individual dataset, all single trials belonging to this dataset are marked as “outliers”. The functions tcwt_pc2cnd2ri, tcwt_f2x, tcwt_x2ld, and tcwt_f2stats can use these index files to compute the corresponding group holdout tCWT features, group holdout LDFs, and group holdout error rates. The holdout mode of operation of these functions is very similar to their normal mode outlined in the pseudocode at the bottom of Table 1 (i.e., they iterate through the list of individual datasets, loading in each iteration the corresponding group holdout index file and other group holdout input files, performing the respective transformations and computations, and saving the results in corresponding group holdout output files).
For all obtained error rates (individual holdout, individual splithalf, individual biased, and group holdout), binomial distribution pvalues were computed (by tcwt_f2stats and tcwt_f2holdout) to test the hypotheses whether these error rates were better (i.e., smaller) than the chance classification error rates defined by the a priori probabilities \(p_d=50\) % or \(p_d=13.6\) % (see above).
All tCWT computations were performed with the tCWT software [16, 17] in 64bit MATLAB 8.1.0.604 (R2013a), under GNU Linux (Scientific Linux 6), on the high performance computing cluster bwGRiD [31] using a single quadcore CPU (Intel\({}^{\circledR }\) Xeon\({}^{\circledR }\) 5150 @ 2.66 GHz) per job (i.e. for a particular combination of input parameters). The tCWT program was also tested with 64bit MATLAB 8.3.0.532 (R2014a) and 64bit GNU Octave 3.8.2 [15], under GNU Linux (openSUSE 12.3) and Windows 7, on less powerful desktop and laptop computers (equipped with dualcore CPUs).
Results and discussion
In this section, some example results are presented and discussed. These results were obtained with the tCWT method from the example datasets [18] described above.
Results from the example oddball data
Group averages of individual holdout classification error rates for different values of the cutoff scale \(S_c\), the loggrid sampling rate R, and the percentage of variance \(P\!_v\) explained by PCA are displayed in Table 2. Most of these results are visualized by bar plots in Fig. 3. Both total errors (tot) and errors for each category of trials, standard (std) and deviant (dev), are displayed. The total error rates computed by taking into account the a priori oddball probabilities \(p_s\) = 86.4 % and \(p_d\) = 13.6 % (15) were, as expected, much smaller than those computed for \(p_s\!=\!p_d\) =50 % (14). Note, however, that the error rates for deviant trials increased when the a priori probabilities were taken into account. It is also interesting to point out the following observation: while for \(p_s\) = 86.4 % and \(p_d\) = 13.6 %, the total error (Fig. 3f) was decreased by the error reduction in the classification of deviant trials (Fig. 3e), the corresponding total decrease in the case of equal a priori probabilities \(p_s\!=\!p_d\) =50 % (Fig. 3c), was caused by the decline of error of classification of standard trials (Fig. 3a).
Both the numbers in Table 2 and the plots in Fig. 3 demonstrate how the quality of the LDF \(\mathbf {d^\star }\) (52–56) can be optimized by minimizing the errors of classification through systematic variation of the values of the different input parameters \(S_c\), \(P\!_v\), R, etc.. This optimization can be very important for different applications of the tCWT method. While this is obvious for cases in which classification is the ultimate goal of the application (as, e.g., in BCI), other applications aimed at multivariate hypothesis testing may also use optimized LDFs for their purposes (see the “Discussion” section).
In Table 2, \(N\!_F\) denotes the number of frequency components per channel (31, 32), and \(N\!_G\) is the number of loggrid vertices (42). The number of nonzero CWT matrix elements (ME) is given by \(K N\!_F N\!_G\), where K is the number of channels. Table 2 shows that the ME ratio \((N\!_F N\!_G S_c^2)/(R^2 T^2)\) (see 46) is approximately constant. Table 2 also shows how the computational time depends on the number of nonzero CWT matrix elements \(K N\!_F N\!_G\) which is a function of \(S_c\), R, \(P\!_v\), and the number of channels K. This function is defined by (59) which was confirmed by the results displayed in Table 2. The computational demands as functions of \(S_c\), \(f_c\) and R are presented in graphical form in Fig. 3gi.
The total processing time for the holdout method is the product of the processing time per nonzero CWT element, the number of such elements, and the number of scalograms. For the holdout method, as applied to the example data, the latter was equal to the total number of trials = 27 datasets \(\times \) 280 trials per dataset = 7,560. As the last column of Table 2 shows, the processing time per matrix element is approximately constant. Note, however, that this value of less than a half microsecond was obtained with MATLAB and certain hardware (see above), and that the corresponding value for GNU Octave and/or other hardware may be larger.
The results for \(S_c\) = 40 ms, R = 15 pps, and \(P\!_v\) = 99 % are presented in more detail. The latter values appeared to be optimal, because, as Table 2 and Fig. 3i show, larger values of R increased substantially the computational demands with little improvement of the results.
The average ERP curves are displayed in Fig. 2a for the whole group (grand average) and in Fig. 2b for one individual participant (ID=‘GIM’). Student’s ttest results for the difference between the ERP responses to deviant vs. standard tones are displayed in Fig. 2c for the whole group and in Fig. 2d for the participant ‘GIM’. Note that, since these tvalues were not corrected for multiple comparisons, they cannot be used for inference about the statistical significance of the difference between the two ERP curves at each point in time [4, 5]. It is interesting, however, to compare the forms of these tvalue curves with the forms of the corresponding LDFs. The LDF obtained with the tCWT method is displayed in Fig. 2e for the whole group and in Fig. 2f for the participant ‘GIM’.
The grand average plots show that the ERP response to deviant stimuli relative to the response to standards was dominated by the Mismatch Negativity (peaking about 200 ms poststimulus) [32], the P3 (300 ms) [19, 20], and the Negative Slow Wave (400 ms) [33]. The individual LDF displayed in Fig. 2f shows, however, some additional oscillatory features which are not quite discernible in the time domain (Fig. 2b, d) and which, nevertheless, improve substantially the LDA classification results obtained with this LDF for the participant ‘GIM’ compared to those computed for the same participant with the group LDF displayed in Fig. 2e (see Table 3).
The different LDA error rates as well as the pvalues obtained from Hotelling’s T\({}^2\)test (with 9 deviant and 50 standard trials) for each participant, for \(S_c\) = 40 ms, R = 15 pps, and \(P\!_v\) = 99 % are displayed in Table 3. Total error rates (tot) were tested by the binomial cumulative distribution function whether they were significantly smaller than the a priori probability for deviant (dev) trials \(p_d=\) 13.6 %. The comparison between the different methods shows an advantage of the classifications based on individual tCWT features over those based on group features. Furthermore, the individual holdout method provided lower error rates than the splithalf method. This can be explained by the splithalf method’s loss of statistical power (less that 20 deviant trials remained after splitting a dataset).
Table 3 also shows the number of tCWT extrema \(N\!_X\) obtained with each method for each data set and the number of SPCs used for classification. While the error rates obtained with the individual biased method are incorrect, the numbers of tCWT features \(N\!_X\) and the numbers of SPCs \(N\!_P\) are the correct numbers obtained from the whole datasets. They can also be seen as good approximations for the corresponding values obtained with the holdoutmethod, in which, of course, they vary with each iteration (and are not displayed in Table 3 for this reason). These numbers show that the originally very large dimensionality of the data was reduced to less than 200 (correlated) tCWT extrema which in turn were reduced further to a few (12 or less, uncorrelated) SPCs. The results from Hotelling’s T\({}^2\)test (Table 3, last column) show that, for the most individual datasets, 9 deviants and 50 standards were enough for the ERP difference to reach statistical significance.
Discussion
Table 3 shows that all error rates vary largely among the different individual datasets. While the inherent EEG noise in the data is responsible for one portion of the error, participants’ inattention to the stimuli provides an additional source of systematic error. Thus, error rates can be used as inverse measures of attention.
A much more straightforward approach, however, would be to use directly the LDF value \(\mathbf {vd^\star }\) (52–57) to measure the ERP difference between the two experimental conditions and to interpret this difference as a measure of participants’ attention to the stimuli. This could be done in different ways as described in the following three cases.
In the first case, both \(\mathbf {v}\) and \(\mathbf {d^\star }\) are obtained from the same ERP sample \(\mathbf {V}\), comprising two subsamples \(\mathbf {V}_{\!\!_\mathcal {A}}\) and \(\mathbf {V}_{\!_\mathcal {B}}\) obtained under two different experimental conditions \(\mathcal {A}\) and \(\mathcal {B}\) (\(\mathcal {A}\)\(\mathcal {B}\) design). Then, the mean difference LDF value \(\left( \mathbf {\overline{v}}_{\!\!_\mathcal {A}}\!\mathbf {\overline{v}}_{\!_\mathcal {B}}\right) \mathbf {d^\star }\) is exactly the mean Mahalanobis distance (10–12) computed in the feature space. This same Mahalanobis distance is used in Hotelling’s T\({}^2\)test, which, however, should not be used directly in this case, because the tCWT features are extracted from the very sample that is tested. As already mentioned in the Background section, a multivariate randomization test based on Hotelling’s T\({}^2\)statistic can be used instead [11].
In the second case, the LDF \(\mathbf {d^\star }\) is computed from the same sample as \(\mathbf {v}\), but, this time, using tCWT features obtained from a training dataset, \(\mathbf {V}_{\!\!_\mathcal {Z}}\) (by an \(\mathcal {A}\)\(\mathcal {B}\) design within \(\mathcal {Z}\)). In this case, Hotelling’s T\({}^2\)test may be used, as this was done above in the assessment of the example data.
The third case, in which both the tCWT features and the LDF \(\mathbf {d^\star {\!}_{\!\!_\mathcal {Z}}}\) are computed from a training dataset \(\mathbf {V}_{\!\!_\mathcal {Z}}\), while \(\mathbf {v}\) is drawn from a test dataset \(\mathbf {V}\), is the most important and will be discussed in more detail. As already mentioned above, this is the usual scenario in a typical BCI application, which was also demonstrated above by the splithalf method in the assessment of classification error rates. But the representation of the Mahalanobis distance (12) suggests another usage of the LDF apart from single trial classification. Namely, we can construct a new estimator \(D^2_{\!\!_\mathcal {Z}}\) of the Mahalanobis distance by using \(\mathbf {d^\star {\!}_{\!\!_\mathcal {Z}}}\) instead of \(\mathbf {d}\):
Since \(\mathbf {d^\star {\!}_{\!\!_\mathcal {Z}}}\) is computed from the training dataset \(\mathbf {V}_{\!\!_\mathcal {Z}}\), it can be treated as constant in all tests and analyses concerning the test dataset \(\mathbf {V}\). Consequently, \(\mathbf {v}_{\!\!_\mathcal {A}}\mathbf {d^\star {\!}_{\!\!_\mathcal {Z}}}\), \(\mathbf {v}_{\!_\mathcal {B}}\mathbf {d^\star {\!}_{\!\!_\mathcal {Z}}}\), and \(D^2_{\!\!_\mathcal {Z}}(\mathbf {v}_{\!\!_\mathcal {A}},\mathbf {v}_{\!_\mathcal {B}})\) are all (univariate) normally distributed and can be subjected to standard univariate tests. It should also be noted that in (60) a whole complex pattern recognition scheme derived from \(\mathbf {V}_{\!\!_\mathcal {Z}}\) (PCA, tCWT, etc.) is imposed on \(\mathbf {V}\) by simple multiplication.
The significance of (60) reaches, however, beyond mere computational convenience, because it conveys a whole concept of multivariate ERP assessment. While, in most BCI applications, this concept is clearly the most effective for single trial classification, and, therefore, also the standard one, it has not been used in other ERP applications for testing hypotheses about mean ERP differences. (Note that mass univariate analyses [4, 5], which have become popular recently, represent a different concept.) In the following, some general ideas about possible applications of this multivariate concept are presented.
The two kinds of tCWT features, group and individual, that can be extracted from the data suggest two different kinds of diagnostic applications of the tCWT method: withinsubject and betweensubject. Consider an ERP paradigm testing a certain cognitive function in a group of individuals under three different conditions: \(\mathcal {X}\), experimental condition (e.g. under the influence of a drug); \(\mathcal {Y}\), control condition (e.g. placebo); and \(\mathcal {Z}\), standard condition (no substances administered). Assume that the cognitive function of interest is reflected by the ERP difference between two subconditions, \(\mathcal {A}\) and \(\mathcal {B}\), (\(\mathcal {A}\)\(\mathcal {B}\) design, as above). Now, the individual tCWT features, SPCs, and LDFs obtained from \(\mathcal {Z}\) can be used to assess the difference between \(\mathcal {X}\) and \(\mathcal {Y}\) by means of Student’s twosample (\(\mathcal {X}\)\(\mathcal {Y}\)) ttest of the Mahalanobis distance (60) and/or by comparison of classification error rates. This is an example of a withinsubject application of the tCWT method. Note that the tCWT method described above for the case of an \(\mathcal {A}\)\(\mathcal {B}\) design can be easily extended to the case of only one subcondition, \(\mathcal {A}\), when the ERP \(\mathbf {v}\) is compared to 0 (\(\mathcal {A}\)0 design).
As an example of a betweensubject application, consider an ERP paradigm testing a certain cognitive function in three different samples of individuals: \(\mathcal {X}\), a sample of patients suffering a certain disorder which has the impairment of this cognitive function as a symptom; \(\mathcal {Y}\), a sample of healthy individuals; and \(\mathcal {Z}\), a mixed sample of patients and healthy individuals. In this case, the group tCWT features, SPCs, and LDFs obtained from \(\mathcal {Z}\) can be used to assess the difference between \(\mathcal {X}\) and \(\mathcal {Y}\). Further, the tCWT method can be applied directly to the ERP difference between \(\mathcal {X}\) and \(\mathcal {Y}\) and the resulting tCWT features, SPCs, and LDFs can be used for classification and diagnostics of individuals from the general population (i.e. to determine whether an individual suffers from this particular symptom or not). The greatest problem in the case of a betweensubject comparison would be the substantial increase in variance due to individual differences.
Finally, in both withinsubject and betweensubject applications, the LDF value (54–57) can be used as a measure of the magnitude of the (differential) ERP response in each of the conditions \(\mathcal {X}\), \(\mathcal {Y}\), and \(\mathcal {Z}\) in order to investigate its relationship with other behavioral measures or with the amplitudes of single (classical) ERP components.
Limitations
All of the above ideas about possible applications of the tCWT method assume the existence of a cleverly designed ERP paradigm that tests exclusively (changes in) a particular (set of) cognitive function(s) of interest without the resulting ERP differences being affected by (changes in) other cognitive functions. In many cases, however, this assumption might not be true. This is a conceptual limitation, not only of the tCWT method, but of any multivariate ERP assessment method. In certain cases, the amplitudes of single ERP components might be better measures of particular cognitive functions of interest.
A purely technical limitation of the tCWT method is imposed by the computational demands of its current software implementation [16, 17]. For instance, the usage of dense electrode arrays combined with long time windows and small cutoff scales, might result in practical unusability of the application, if no access to adequate computational resources (e.g. a high performance computing cluster [31]) is provided. In spite of the good scalability of the application (e.g. the computationally demanding CWT is performed one channel at a time, the maximum number of single trials processed by CWT is controlled by an input parameter, thus limiting working memory usage), the increase of computational time can be handled only by parallel execution of tCWT jobs on several powerful CPUs.
A notable limitation of the current study is the lack of evidence for (or against) the usefulness of the novel PCAbased multivariate outlier detection procedure introduced above. The results of the (withinsubject) assessment of the example oddball data were not sensitive to variations in the number of outliers detected and excluded by the procedure, depending on the value of the input parameter C (35). Whether zero or as much as 20 % of the trials were rejected, the statistical assessments were practically not affected by these changes and the corresponding results remained virtually the same. It should be mentioned, however, that some unpublished evidence already exists, suggesting that the outlier detection procedure could be crucial in a betweensubject design. This evidence comes from the (unfinished) tCWT assessment of ERP data obtained in a previous study [34].
Conclusions
In the present article, some basic concepts of multivariate statistics were introduced as geometric notions. ERPs were defined as random vectors (points) in a metric space, in which the distance between two points was derived in a natural way from the covariance of the data. PCT and DFT were introduced as rotations in this space. LDA classification was described as computing a LDF vector, building a separation plane perpendicular to this vector, and assigning singletrial ERP points to two different categories according to their position relative to this dividing plane. CWT and tCWT were also defined as linear transformations represented by their respective matrices. All these mathematical constructs were used to provide for the first time a detailed, stepbystep, formal description of the tCWT algorithm. Its MATLAB and GNU Octave implementation was also made publicly available as free and open source code released for the first time under GPLv3 [16, 17].
A new multivariate outlier rejection procedure based on PCA in the frequency domain was introduced. The timedependent filtering and the DWT used in the previous version of tCWT [7] were replaced in the current version by simple uniform filtering via DFT. This was done solely for simplification for the sake of easier understanding and not for improvement of the method. In fact, a new version of tCWT is planned including a more flexible procedure for timedependent filtering based on PCA and DWT.
It should be noted that tCWT is essentially a feature extraction method and tCWT based classification does not necessarily imply LDA as a postprocessing procedure. Classification can be performed using other methods as well, e.g., SVM [10]. Hypotheses can be tested by both massunivariate [13] and multivariate [11] permutation/randomization tests. Moreover, as already mentioned above, tCWT can also be used in combination with other dimensionality reduction methods, e.g. EMS filtering [24]. On the other hand, the PCAbased multivariate outlier detection introduced here, can be used independently from tCWT as a preprocessing procedure in other assessment algorithms. It is also important to emphasize that although tCWT feature extraction can be computationally demanding, taking several seconds or even minutes for large scalograms (with many channels), the tCWT features, once computed, can be applied practically instantly in real time applications (e.g., BCI) via LDA or other proper classification method.
The tCWT method was demonstrated on example ERP data [18] obtained in a passive oddball paradigm. Both group and individual tCWT features were extracted from the data and were used for LDA classification of single trials and for testing mean ERP differences for each individual dataset via Hotelling’s T\({}^2\)test. Different methods for estimation of classification errors were introduced and compared with each other.
Finally, new ideas for further applications of the multivariate approach in general and of tCWT method in particular were introduced on a conceptual level in the Discussion. Some of these ideas will be tested soon in a randomized clinical trial where ERPs are used for assessment of the sustained mindful attention developed by training in a course of mindfulnessbased cognitive therapy for recurrent depression [34].
Availability of supporting data
The example datasets [18] supporting the results of this article are available at http://tcwt.de/ or http://bioinformatics.org/tcwt/ as well as in the LabArchives repository at http://doi.org/10.6070/H4MP518T.
Abbreviations
 BCI:

brain–computer interface
 CPU:

central processing unit
 CWT:

continuous wavelet transform
 DFT:

discrete fourier transform
 DWT:

discrete wavelet transform
 EEG:

electroencephalogram
 EMS:

effectmatched spatial (filtering)
 ERP:

eventrelated potential
 GPLv3:

GNU general public license, version 3
 LDA:

linear discriminant analysis
 LDF:

linear discriminant function
 PCA:

principal component analysis
 PCT:

principal component transform
 SCP:

selected principal component (by step down selection)
 SS:

sums of squares (for computation of sample variance)
 SVM:

support vector machines
References
 1.
Picton TW, Hillyard SA (1988) Endogenous eventrelated potentials. In: Picton TW (ed) Handbook of electroencephalographic clinical neurophysiology, vol 3. Elsevier, Amsterdam, pp 361–426
 2.
Smulders FTY, Kenemans JL, Kok A (1994) A comparison of different methods for estimating singletrial P300 latencies. Electroencephalogr Clin Neurophysiol/Evoked Potentials Section 92(2):107–114
 3.
Dien J, Santuzzi AM (2005) Application of repeated measures ANOVA to highdensity ERP datasets: a review and tutorial. In: Handy TC (ed) Eventrelated potentials: a methods handbook. MIT Press, Cambridge, pp 57–82
 4.
Groppe DM, Urbach TP, Kutas M (2011) Mass univariate analysis of eventrelated brain potentials/fields I: a critical tutorial review. Psychophysiology 48(12):1711–1725
 5.
Pernet C, Latinus M, Nichols TE, Rousselet GA (2015) Clusterbased computational methods for mass univariate analyses of eventrelated brain potentials/fields: a simulation study. J Neurosci Methods 250:85–93. doi:10.1016/j.jneumeth.2014.08.003
 6.
Blankertz B, Muller K, Curio G, Vaughan TM, Schalk G, Wolpaw JR, Schlogl A, Neuper C, Pfurtscheller G, Hinterberger T et al (2004) The BCI competition 2003: progress and perspectives in detection and discrimination of EEG single trials. IEEE Trans Biomed Eng 51(6):1044–1051
 7.
Bostanov V (2004) BCI Competition 2003–data sets Ib and IIb: feature extraction from eventrelated brain potentials with the continuous wavelet transform and the tvalue scalogram. IEEE Trans Biomed Eng 51(6):1057–1061
 8.
Samar VJ, Bopardikar A, Rao R, Swartz K (1999) Wavelet analysis of neuroelectric waveforms: a conceptual tutorial. Brain Lang 66:7–60
 9.
Ende M, Louis AK, Maass P, MayerKress G (1998) EEG signal analysis by continuous wavelet transform techniques. In: Kantz H, Kurths J, MayerKress G (eds) Nonlinear analysis of physiological data. Springer, Berlin, Heidelberg, pp 213–219
 10.
Kaper M, Meinicke P, Grossekathoefer U, Lingner T, Ritter H (2004) BCI competition 2003data set IIb: support vector machines for the P300 speller paradigm. IEEE Trans Biomed Eng 51(6):1073–1076
 11.
Bostanov V, Kotchoubey B (2006) The tCWT: a new ERP detection and quantification method based on the continuous wavelet transform and Student’s tstatistics. Clin Neurophysiol 117:2627–2644
 12.
Kotchoubey B (2015) Eventrelated potentials in disorders of consciousness. In: Rossetti AO, Laureys S (eds) Clinical neurophysiology in disorders of consciousness. Springer, Vienna, pp 107–123
 13.
Real RG, Kotchoubey B, Kübler A (2014) Studentized continuous wavelet transform (tCWT) in the analysis of individual ERPs: real and simulated EEG data. Front Neurosci 8:279. doi:10.3389/fnins.2014.00279
 14.
Aniyan AK, Philip NS, Samar VJ, Desjardins JA, Segalowitz SJ (2014) A wavelet based algorithm for the identification of oscillatory eventrelated potential components. J Neurosci Methods 233:63–72
 15.
Eaton JW, Bateman D, Hauberg S, Wehbring R (2014) GNU Octave Version 3.8.1 Manual: a highlevel interactive language for numerical computations. CreateSpace Independent Publishing Platform. http://www.gnu.org/software/octave/doc/interpreter/ (ISBN 1441413006)
 16.
Bostanov V (2015) tCWT 2.01: a software implementation of the tCWT method for multivariate assessment of eventrelated potentials. Free and Open Source Code for MATLAB and GNU Octave. http://tcwt.de/ or http://bioinformatics.org/tcwt/ (Also available as a Additional file 1 to the present article)
 17.
Bostanov V (2015) tCWT 2.01 Software Documentation. http://tcwt.de/ or http://bioinformatics.org/tcwt/
 18.
Bostanov V (2015) Example datasets for testing the tCWT method for multivariate assessment of eventrelated potentials. http://bioinformatics.org/tcwt/ or doi:10.6070/H4MP518T
 19.
Polich J (1987) Comparison of P300 from a passive tone sequence paradigm and an active discrimination task. Psychophysiology 24:312–320
 20.
Polich J (1989) P300 from a passive auditory paradigm. Electroencephalogr Clin Neurophysiol 74:41–46
 21.
Demiralp T, Yordanova J, Kolev V, Ademoglu A, Devrim M, Samar VJ (1999) Timefrequency analysis of singlesweep eventrelated potentials by means of fast wavelet transform. Brain Lang 66:129–145
 22.
Proakis JG, Manolakis DG (1996) Digital signal processing: principles algorithms and applications, 3rd edn. Prentice Hall, Upper Saddle River
 23.
Rencher AC (1998) Multivariate statistical inference and applications. Wiley, New York
 24.
Schurger A, Marti S, Dehaene S (2013) Reducing multisensor data to a single time course that reveals experimental effects. BMC Neurosci 14:122
 25.
Tukey JW (1967) An introduction to the calculations of numerical spectrum analysis. In: Harris B (ed) Spectral analysis of time series. Wiley, New York, pp 25–46
 26.
Dien J, Frishkoff GA (2005) Principal components analysis of eventrelated potential datasets. In: Handy TC (ed) Eventrelated potentials: a methods handbook. MIT Press, Cambridge, pp 189–208
 27.
Dien J (2010) The ERP PCA Toolkit: an open source program for advanced statistical analysis of eventrelated potential data. J Neurosci Methods 187(1):138–145
 28.
Dien J (2010) Evaluating twostep PCA of ERP data with Geomin, Infomax, Oblimin, Promax, and Varimax rotations. Psychophysiology 47(1):170–183
 29.
Gratton G, Coles MG, Donchin E (1983) A new method for offline removal of ocular artifacts. Electroencephalogr Clin Neurophysiol 55:468–484
 30.
Miller GA, Gratton G, Yee CM (1988) Generalized implementation of an eye movement correction procedure. Psychophysiology 25:241–243
 31.
bwGRiD http://www.bwgrid.de. Member of the German DGrid initiative, funded by the Ministry for Education and Research (Bundesministerium für Bildung und Forschung) and the Ministry for Science, Research and Arts BadenWürttemberg (Ministerium für Wissenschaft, Forschung und Kunst BadenWürttemberg)
 32.
Näätänen R, Paavilainen P, Rinne T, Alho K (2007) The mismatch negativity (MMN) in basic research of central auditory processing: a review. Clin Neurophysiol 118(12):2544–2590
 33.
Squires NK, Squires KC, Hillyard SA (1975) Two varieties of longlatency positive waves evoked by unpredictable auditory stimuli in man. Electroencephalogr Clin Neurophysiol 38(4):387–401
 34.
Bostanov V, Keune PM, Kotchoubey B, Hautzinger M (2012) Eventrelated brain potentials reflect increased concentration ability after mindfulnessbased cognitive therapy for depression: a randomized clinical trial. Psychiatry Res 199:174–180
Acknowledgements
This research was partially supported by Research Grants HA 1399/181 and KO 1753/111 from the German Research Society (DFG). The author also gratefully thanks the bwGRiD project [31] for the provided computational resources. The publication of this article was partially supported by the DFG and the Open Access Publishing Fund of the University of Tübingen.
Compliance with ethical guidelines
Competing interests The author declares that he has no competing interests.
Author information
Additional file
12868_2015_185_MOESM1_ESM.zip
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Eventrelated brain potentials
 ERP
 Continuous wavelet transform
 CWT
 tCWT
 Principal component analysis
 PCA
 Multivariate statistics