A toolbox for the fast information analysis of multiplesite LFP, EEG and spike train recordings
 Cesare Magri^{1},
 Kevin Whittingstall^{2},
 Vanessa Singh^{2},
 Nikos K Logothetis^{2, 3} and
 Stefano Panzeri^{1, 2}Email author
DOI: 10.1186/147122021081
© Magri et al; licensee BioMed Central Ltd. 2009
Received: 14 May 2009
Accepted: 16 July 2009
Published: 16 July 2009
Abstract
Background
Information theory is an increasingly popular framework for studying how the brain encodes sensory information. Despite its widespread use for the analysis of spike trains of single neurons and of small neural populations, its application to the analysis of other types of neurophysiological signals (EEGs, LFPs, BOLD) has remained relatively limited so far. This is due to the limitedsampling bias which affects calculation of information, to the complexity of the techniques to eliminate the bias, and to the lack of publicly available fast routines for the information analysis of multidimensional responses.
Results
Here we introduce a new C and Matlabbased information theoretic toolbox, specifically developed for neuroscience data. This toolbox implements a novel computationallyoptimized algorithm for estimating many of the main information theoretic quantities and bias correction techniques used in neuroscience applications. We illustrate and test the toolbox in several ways. First, we verify that these algorithms provide accurate and unbiased estimates of the information carried by analog brain signals (i.e. LFPs, EEGs, or BOLD) even when using limited amounts of experimental data. This test is important since existing algorithms were so far tested primarily on spike trains. Second, we apply the toolbox to the analysis of EEGs recorded from a subject watching natural movies, and we characterize the electrodes locations, frequencies and signal features carrying the most visual information. Third, we explain how the toolbox can be used to break down the information carried by different features of the neural signal into distinct components reflecting different ways in which correlations between parts of the neural signal contribute to coding. We illustrate this breakdown by analyzing LFPs recorded from primary visual cortex during presentation of naturalistic movies.
Conclusion
The new toolbox presented here implements fast and datarobust computations of the most relevant quantities used in information theoretic analysis of neural data. The toolbox can be easily used within Matlab, the environment used by most neuroscience laboratories for the acquisition, preprocessing and plotting of neural data. It can therefore significantly enlarge the domain of application of information theory to neuroscience, and lead to new discoveries about the neural code.
Background
Information theory [1, 2], the mathematical theory of communication, has been successfully used to address a number of important questions about sensory coding [3–5]. For example, information theoretic tools have been used to characterize the stimulus selectivity of neurons, by revealing the precise sensory features (or combinations of them) which modulate most reliably the responses of neurons or of neural populations [6–8]. Information theory has been used to investigate whether the fine temporal structure of neural activity contains important information which is lost when neural signals are averaged over longer time scales. These studies have shown that the precise timing of spikes measured with respect to the stimulus onset [5, 9–12], or the relative time of spikes with respect to the ongoing network fluctuations [13, 14] provides important information that cannot be extracted from the spike count. Information theory has also been used to study population codes: several studies have developed measures to quantify the impact of crossneuronal correlations in population activity [15–19] and have applied them to the study of population coding across various sensory modalities [20–22]. Another application of information theory to neuroscience is the measure of causal information transfer [23] to quantify the amount of interactions between neural populations [24].
Information theory has been used widely for the analysis of spike trains from single neurons or from small populations [3], and is now beginning to be applied systematically to other important domains of neuroscience data analysis, such as the information analysis of analog brain signals like BOLD fMRI responses [25–27] and Local Field Potentials (LFPs) [28–30]. However, the application of information theory to analog brain signals has remained relatively limited compared to the potentials it offers. One reason that has limited a wider use of the information analysis to analog signals is that estimates of information from real neurophysiological data are prone to a systematic error (bias) due to limited sampling [31]. The bias problem can be alleviated by the use of several advanced techniques [5, 32–36] or by combinations of them [31]. However, a detailed implementation and testing of these (often complicated) techniques is in many cases beyond the time resources available to experimental laboratories. Moreover, the performance of bias correction techniques was so far tested thoroughly primarily on spike trains, and much less on analog brain signals. Since the performance of bias correction techniques depends on the statistics of the data [31], it is crucial that tests and comparisons of the performance of the various techniques are also carried out on analog brain signals. Clearly, the availability of a toolbox that implements several of these accurate information calculation techniques after having validated them on analog brain signals would greatly increase the size of the neuroscience community that uses information theoretic analysis.
Another problem, which is particularly prominent when computing information quantities for multiple parallel recordings of neural activity, is the speed of computation. Multielectrode recordings now allow the simultaneous measurement of the activity of tens to hundreds of neurons, and fMRI experiments allow a broad coverage of the cerebral cortex and recording from a large number of voxels. Therefore speed of calculation is paramount, especially in cases when information theory is used to shed light on the interactions between pairs or small groups of recorded regions. In fact, the number of these subgroups (and thus the time needed to compute their information or interaction) increases fast with the number of recorded regions.
This article aims at meeting the demand for fast and publicly available routines for the information theoretic analysis of several types of brain signals, by accompanying and documenting the first release of the Information Breakdown ToolBox (ibTB). This toolbox can be downloaded from the URL http://www.ibtb.org or from the Additional Files provided with this Article (Additional file 1: ibtb.zip). The toolbox has several key features that make it useful to the neuroscience community and that will widen the domain of application of information theory to neuroscience. In particular (i) it can be used within Matlab (The Mathworks, Natick, MA), one of the most widely used environments for the collection, preprocessing and analysis of neural data, (ii) it is algorithmically optimized for speed of calculation (iii) it implements several uptodate finite sampling correction procedures (iv) it has been thoroughly tested for the analysis of analog neural signals such as EEGs and LFPs (v) it implements the information breakdown formalisms [17, 18, 37] that are often used to understand how different groups of neurons participate in the encoding of sensory stimuli.
Definitions and meaning of neural Entropies and Information
Before proceeding to describe the implementation of the Toolbox and to discuss its use, in the following we will briefly define the basic information quantities and describe their meaning from a neuroscientific perspective.
Consider an experiment in which the experimental subject is presented with N_{ s }stimuli s_{1},... and the corresponding neural response r is recorded and quantified in a given poststimulus timewindow. The neural response can be quantified in a number of ways depending on the experimental questions to be addressed and on the experimenter's hypotheses and intuition. Here, we will assume that the neural response is quantified as an array r= [r_{1},..., r_{ L }] of dimension L. For example, the experimenter may be recording the spiking activity of L different neurons and may be interested in spike count codes. In this case r_{ i }would be the number of spikes emitted by cell i on a given trial in the response window. Alternatively, if the experimenter is recording the spiking activity of one neuron and is interested in spike timing codes, the response could be quantified by dividing the poststimulus response window into L bins of width Δt, so that r_{ i }is the number of spikes fired in the ith time bin [33]. Or, the experimenter may be recording LFPs from L channels and be interested in how their power carries information. In this case r_{ i }would be quantified as the LFP power in each recording channel [29].
Unless otherwise stated, we will assume that the neural response in each element of the response array is discrete. If the signal is analogue in nature – such as for LFP or EEG recordings – we assume it has been discretized into a sufficient number of levels to capture the most significant stimulusrelated variations. For example, the power at L distinct frequencies extracted from a singletrial LFP recording will be quantized, for each frequency, into a finite number of levels [29]. This assumption is (unless otherwise stated) necessary for the analysis and the correct functioning of the algorithms in the toolbox. The reason why it is convenient to quantify the neural response as a discrete variable is that it makes it easier to quantify the probabilities necessary for information calculation (see below). For analog signals, the discretization can be circumvented only when there are suitable analytical models for the probability distribution (e.g. a gaussian distribution), in which case particular algorithms can be applied (see e.g. the Gaussian Method detailed below).
where P(rs) is the probability of observing response r when stimulus s is presented; P(s) is the probability of presentation of stimulus s, and is defined as , N_{ tr }(s) being the number of trials available for stimulus s and is the total number of trials to all stimuli. Mutual information quantifies how much of the information capacity provided by stimulusevoked differences in neural activity is robust to the presence of trialbytrial response variability. Alternatively, it quantifies the reduction of uncertainty about the stimulus that can be gained from observation of a single trial of the neural response.
The mutual information has a number of important qualities that make it well suited to characterizing how a response is modulated by the stimulus. These advantages have been reviewed extensively [3–5, 25, 27, 38]. Here we briefly summarize a few key advantages. First, as outlined above, it quantifies the stimulus discriminability achieved from a single observation of the response, rather than from averaging responses over many observations. Second, I(S; R) takes into account the full stimulusresponse probabilities, which include all possible effects of stimulusinduced responses and noise. Thus, its computation does not require the signal to be modeled as a set of response functions plus noise and can be performed even when such decomposition is difficult. Third, because information theory projects all types of neural signals onto a common scale that is meaningful in terms of stimulus knowledge, it is possible to analyze and combine the information given by different measures of neural activity (for example: spike trains and LFPs) which can have very different signal to noise ratios.
The contribution of correlations between different neural responses to the transmitted information
Neural signals recorded from different sites are often found to be correlated. For example, spikes emitted by nearby neurons are often synchronous: the probability of observing nearsimultaneous spikes from two nearby neurons is often significantly higher than the product of the probabilities of observing the individual spikes from each neuron [39–41]. LFPs recorded from different electrodes are typically correlated over considerable spatial scales [42, 43]. Moreover, neural signals can be correlated in time as well as in space. In fact, different temporal aspects of the neural activity from the same location are often correlated. For example, spike counts and latencies covary in some systems [44], and the powers of different LFPs bands recorded from the same location can also exhibit significant correlations [29].
The ubiquitous presence of correlations of neural activity across both space and time has raised the question of what is the impact of this correlation upon the information about sensory stimuli carried by a combination of distributed sources of neural activity (see [16, 45] for recent reviews). Theoretical studies have suggested that correlations can profoundly affect the information transmitted by neural populations [16, 46]. It is therefore of great interest to quantify the impact of correlations on the information carried by a population of simultaneously recorded neurons. Mutual information, Eq. (1), can tell us about the total information that can be gained by simultaneously observing the L considered neural responses, but not about the specific contribution of correlations to this total value. However, a number of informationtheoretic quantities have been developed to quantify how correlations affect the information carried by different neural signals (see below). In our information toolbox we have implemented several of such approaches, which will be briefly define and summarize in the rest of this section.
Different types of correlations affecting information
Before we describe the information theoretic tools for quantifying the impact of correlations on coding, it is useful to briefly define the types of correlations usually considered in the studies of neural population activity.
The conditional probability P_{ ind }(rs) can be computed by taking the product of the marginal probabilities of individual elements of the response array, as in Eq. (4), or alternatively by the empirical "shuffling" procedure described as follows. One generates a new set of shuffled responses to stimulus s by randomly permuting, for each element of the response array, the order of trials collected in response to the stimulus s considered, and then joining together the shuffled responses into a shuffled response array. This shuffling operation leaves each marginal probability P(r_{ j }s) unchanged, while destroying any withintrial noise correlations. The distribution of shuffled responses to a given stimulus s is indicated by P_{ sh }(rs).
Many authors further distinguish noise correlations (which, as explained above, exclude sources of correlations due to shared stimulation) from "signal correlations" [47], which are correlations entirely attributable to common or related stimulus preferences. Signal correlations manifest themselves in similarities across stimuli between the response profiles of the individual elements of the response array. For example, neurons in different channels may all have a very similar mean response to the stimuli. There are several ways to quantify the amount of signal correlations [17, 37, 47]. We will not report them in this Article, but we will only focus on quantifying their impact on the information carried by the response array (see next section).
The importance of separating noise from signal is, as revealed by theoretical studies, that signal and noise correlations have a radically different impact on the sensory information carried by neural populations (see below). In particular, signal correlations always reduce the information, whereas noise correlations can decrease it, increase it or leave it unchanged, depending on certain conditions [46, 48].
Information Breakdown
We next describe and define briefly several mathematical techniques to quantify the impact of correlations of information. Several different approaches have been proposed (see e.g. [17, 18, 21, 49, 50]). Here we present one, called the "information breakdown" [17], which takes the total mutual information I(S; R) and decomposes it into a number of components, each reflecting a different way into which signal and noise correlations contribute to information transmission. We decided to focus on the information breakdown formalism partly because it was developed by one of the authors of this article, and partly because it naturally includes several of the quantities proposed by other investigators [18, 21].
The linear term I_{ lin }is the sum of the information provided by each element of the response array. This is a useful reference term because if all the elements of the array were totally independent (i.e. with both null noise and signal correlation) then the total information transmitted by the response array would be equal to I_{ lin }.
The amount of synergistic information Syn. The difference between I(S; R) and I_{ lin }is called synergy. Positive values of synergy denote the presence of synergistic interaction between elements of the response array, which make the total information greater than the sum of that provided by each element of the response array. Negative values of synergy (called "redundancy") indicate that the elements of the response array carry similar messages, and as a consequence information from the response array is less than the sum of the information provided by each individual element. The synergy can be further broken into the contributions from signal and from noise correlations, as follows.
The signal similarity component I_{sigsim}is negative or zero and quantifies the amount of redundancy specifically due to signals correlation. We note that the negative of I_{sigsim}equals the quantity named ΔI_{ signal }which was defined in Ref. [18].
The stimulus independent correlational term, I_{corind}, reflects the contribution of stimulusindependent correlations. In general, if noise and signal correlations have opposite signs, I_{corind}is positive. In this case, stimulusindependent noise correlations increase stimulus discriminability compared to what it would be if noise correlations were absent [17, 48]. If, instead, noise and signal correlations have the same sign, I_{corind}is negative and stimuli are less discriminable than the zero noise correlation case. In the absence of signal correlation, I_{corind}is zero, whatever the strength of noise correlation.
The stimulus dependent correlational term I_{cordep}is a term describing the impact of stimulus modulation of noise correlation strength. I_{cordep}is nonnegative, and is greater than zero if and only if the strength of noise correlation is modulated by the stimulus. I_{cordep}was first introduced in Ref. [21] with the name ΔI. I_{cordep}is an upper bound to the information lost by a downstream system interpreting the neural responses without taking into account the presence of correlations [19].
Implementation
Computing environment
Our Information Breakdown Toolbox (ibTB) has been implemented in Matlab (The Mathworks, Natick, MA) and C taking advantage of Matlab's MEX technology. It has been tested on several platforms (Mac OS X, Windows 32 and 64 bits, Ubuntu Linux) and it can be downloaded, together with a documentation for its installation and its use, from our website [52].
Data input/output
As shown in Figure 2, two routines are available for the preprocessing of the input to entropy.m. buildr.m allows to build the responsematrix, to be fed to entropy.m, starting from L + 1 long arrays: the first array stores a list specifying, for each trial, which stimulus was presented to the subject while the L remaining arrays specify the L corresponding recorded responses. binr.m allows to discretize continuous responsematrices – prior to calls to entropy.m with the the Direct Method – according to a binning method chosen among a list of available binning options. This list includes the equipopulated binning (which returns responses whose stimulusindependent probabilities are approximately uniform) and different types of equispaced discretizations. Users are also given the opportunity to define their own binning strategies by easily linking their binning routine to binr.m (we refer the reader to the documentation to this function for more information on how to define custom binning methods).
Finally, information.m is a wrapper around entropy.m which directly computes the breakdown terms by combining the outputs from this main function. Its input is identical to that of entropy.m except for the list of possible outputs options which can be any or several of the following: I(S; R), I_{ sh }(S; R), syn, syn_{ sh }, I_{ lin }, I_{sigsim}, I_{ cor }, I_{corsh}, I_{corind}, I_{cordep}and I_{cordepsh}.
Direct Method
The Direct Method [33] for computing information and entropies consists in estimating the probabilities of the discrete neural responses by simply computing the fraction of trials in which each response value is observed, and then by simply plugging into the information and entropy equations the response probabilities estimated in this way. In this subsection, we describe a novel, computationally optimized algorithm for computing the empirical estimates of the probabilities, which is at the core of our toolbox and is the principal reason for its speed.
The steps required for computing (R) according to (7) are the following. First the routine has to run through all of the available trials and compute the C(r) values. The program must then loop through all the N_{ r }possible r responses, normalize each C(r) by and sum the C(r) log C(r) values together:
for trial from 1 to
read r in current trial
C(r) → C(r) + 1
end loop
for r from 1 to N_{ r }
end loop
where ℋ(R) = ∑_{ r }f(C(r)) and f (·) denotes the function f(x) = x · log x.
First of all, Equation (8) suggests that, instead of normalizing each count C(r) and then summing over r, we can instead first compute ℋ(R) and then normalize. This has the advantage of reducing the number of division operations from N_{ r }to just one.
This observation suggests that, instead of computing the final value of C(r) we can update ℋ(R) directly at each trial, inside the first loop, therefore skipping the second loop over the N_{ r }responses. The procedure for computing (R) thus becomes:
for trial from 1 to
read r in current trial
ℋ(R) → ℋ(R) + f(C(r) + 1)  f(C(r))
C(r) → C(r) + 1
end loop
normalize ℋ(R)
where the length of the loop is determined only by the number of available trials.
The previous procedure can be extended to the direct computation of (RS), (R) and (RS). Since the argument of f(·) is always an integer, we can store the values computed for f(·) and use them for the computation of all four entropic quantities. In the current implementation of the toolbox, the values of f(·) persist in memory as long as does not change. Calls to the toolbox with matrices with the same number of trials perform increasingly faster and the maximum computation speed is achieved when all values of f(x) for x = 1,..., have been computed.
The number of products required to compute P_{ ind }(s) and, consequently, also the time required to compute P_{ ind }(r), (R) and χ^{(d)}(R), increases rapidly together with the number, L, of responses.
Bias Correction for the Direct Method: plugin vs biascorrected procedures
The Direct Method relies on the empirical measure of the response probabilities as histograms of the fraction of trials in which each discrete response value was observed. Naturally, this procedure gives a perfect estimate of the information and entropies only if the empirical estimates of the probabilities equal the true probabilities. However, any real experiment only yields a finite number of trials from which these probabilities have to be estimated. The estimated probabilities are thus subject to statistical error and necessarily fluctuate around their true values. If we just plug the empirical probabilities into the information equations (a procedure often called the "plugin" procedure in the literature), then the finite sampling fluctuations in the probabilities will lead to a systematic error (bias) in the estimates of entropies and information [31]. In some cases, the bias of the plugin information estimate can be as big as the information value we wish to estimate. It is therefore crucial to remove this bias effectively in order to avoid serious misinterpretations of neural coding data [31].
Next, we describe and compare four bias correction procedures that we implemented in our toolbox. These procedures, which are among those most widely used in the literature, were selected for inclusion in our toolbox because they are applicable to any type of discretized neural response (whatever its statistics), because they are (in our experience) among the most effective, and because they are guaranteed to converge to the true value of information (or entropy) as the number of trials increases to infinity.
Quadratic Extrapolation (QE)
where a and b are free parameters that depend on the stimulusresponse probabilities, and are estimated by recomputing the information from fractions of the trials as follows. The dataset is first broken into two random partitions and the information quantities are computed for each subpartition individually: the average of the two values obtained (for each quantity) from the two partitions provides an estimate corresponding to half of the trials. Similarly, by breaking the data into four random partitions, it is possible to obtain estimates corresponding to a fourth of the trials. Finally, a and b are extrapolated as parameters of the parabolic function passing through the /2 and /4 estimates.
Panzeri & Treves (PT) Bias Correction
This correction technique computes the linear term a in the expansion (11) through an analytical approximation rather than from the scaling of the data of the QE procedure. This approximation depends on the number of response bins with nonzero probability of being observed which is estimated through a bayesianlike procedure. The implementations of this algorithm in our toolbox follows closely the one originally described in Ref. [32].
The Shuffling (sh) Procedure
where H_{ sh }(RS) is the shuffle noise entropy, i.e., the noise entropy computed after randomly permuting, independently for each response, the order of trials collected in response to a stimulus (in other words, it is the noise entropy of the distribution p_{ sh }(rs)). I_{ sh }(S; R) has the same value of I(S; R) for infinite number of trials but has a much smaller bias for finite , owing to the bias cancelation created by the entropy terms in the right hand side of Eq. (12).
This three shuffledcorrected quantities, syn_{ sh }, I_{corsh}and I_{cordepsh}, converge to the same values of their uncorrected counterparts syn, I_{ cor }and I_{cordep}, respectively, for infinite number of trials. However the bias of the shufflecorrected quantities is much smaller when the number of trials is finite. This is especially critical for the computation of I_{cordep}which is by far the most biased term of the information breakdown [36].
Bootstrap Correction
The bootstrap procedure [53, 54] consists of pairing stimuli and responses at random in order to destroy all the information that the responses carry about the stimulus. Because of finite data sampling, the information computed using the bootstrapped responses may still be positive. The distribution of bootstrapped information values (over several instances of random stimulusresponse pairings) can be used to build a nonparametric test of whether the information computed using the original responses is significantly different from zero. Moreover, the average of the bootstrapped values instances can be used to estimate the residual bias of the information calculation, which can be then subtracted out. The bootstrap evaluation and subtraction of the residual error can be applied to any method to compute information (such as the Direct Method explained above and the Gaussian Method which will be explained below), with or without one of the bias correction procedures described above. In our toolbox, bootstrap estimates can be computed for the quantities H(RS), H_{ lin }(RS), H_{ ind }(R), H_{ ind }(RS), χ(R) and H_{ sh }(RS) from which bootstrapped estimates of I(S; R) and I_{ sh }(S; R) are easily obtained. The remaining quantities, H(R) and H_{ sh }(R), are not affected by the bootstrapping.
Gaussian Method
The Direct Method, being based on empirically computing the probability histograms of discrete or discretized neural responses, does not make any assumption on the shape of the probability distributions. This is a characteristic which makes the Direct Method widely applicable to many different types of data.
An alternative approach to the Direct estimation of information is to use analytical models of the probability distributions; fit these distributions to the data; and then compute the information from these probability models. This method has been applied so far relatively rarely in Neuroscience (e.g. [55]). In fact this approach may prove difficult to apply to distributions of spike patterns since in this case appropriate analytical forms of probability distributions are usually not available. However, several situations exist for which it is possible to fit the response distribution to Gaussian functions, especially when dealing with analog brain responses and their transformations, such as Fourier transforms (see the Results section). The Gaussian Method for computing information and entropies is the one based on fitting response probabilities to Gaussian functions.
where σ^{2}(R) and  (R) are the determinants of the matrices of covariance computed across trials and stimuli, and across trials to stimulus s, respectively.
Note that the Gaussian Method – which we implemented using a straight computation of variances which are then fed into the above equations – does not necessarily require data discretization prior to the information calculation.
The advantage of the Gaussian Method is that it depends only on a few parameters that characterize the neural response (i.e., the variances and covariances of the responses), and is thus more datarobust, and less prone to sampling bias than the Direct calculation. The potential danger with this approach is that the estimates provided by Eq. (15) may be inaccurate if the underlying distributions are not close enough to Gaussians.
and ψ(·) is the polygamma function (implemented in Matlab as the psi function).
Our toolbox allows the computation of Gaussian estimates without bias correction, with the analytical Gaussian bias correction, Eqs. (16,17), and also with the quadratic extrapolation correction QE. However, using simulated data, we found that quadratic extrapolation did not correct well for bias for the Gaussian Method (results not shown). This can be understood by noting that Eqs. (16,17) indicate that a quadratic data scaling may not necessarily describe well the bias in the Gaussian case.
The Gaussian Method can also be used to compute the terms I_{ lin }, I_{sigsim}and I_{ cor }of the information breakdown [17] by approximating, H_{ ind }(R) with H_{ sh }(R), the response entropy computed after shuffling the neural responses at fixed stimulus. Note that in this case also H_{ ind }(R S) needs to be replaced by H_{ sh }(R S). The calculations of all these quantities is implemented in the Toolbox. Note that the quantity χ(R) cannot be easily computed with the Gaussian Method, thereby preventing the separation of I_{ cor }into I_{corind}and I_{cordep}.
Results and discussion
In the following we present several case studies and tests of the performance of our ibTB Toolbox on analog neural signals, such as EEGs and LFPs. We emphasize that the toolbox can be effectively applied to spike trains as well as to EEGs and LFPs. The reason why we focus our presentation on EEGs and LFPs is that the very same algorithms implemented in our toolbox have been already illustrated and tested heavily on spike trains; therefore the illustration and test on EEGs and LFPs is more interesting. We however report that we have thoroughly tested our toolbox on spike trains. In particular, we have used our Toolbox to successfully replicate a number of previously published spike train information theoretic studies from our group, reported in Refs. [8, 13, 14, 17, 31, 36, 59].
Finite sampling bias corrections of information measures from analog neurophysiological signals
We start by testing the performance of bias correction procedures on simulated data. These procedures have been previously tested on simulated spike trains [31], but not yet on analog neural signals. We therefore tested the bias corrections on realistically simulated LFPs whose statistical properties closely matched those of real LFPs recorded from V1 of an anaesthetized macaque in responses to Hollywood color movies presented binocularly to the animal (data from [29]). Details of the simulations procedure are reported in Appendix A. Our goal is to estimate the information (about which of 102 different 2.048 s long movie scenes was presented) carried by the twodimensional response array made of the power of the simulated LFP at frequencies 4 and 75 Hz. The power in each frequency was discretized into 6 equipopulated bins. Choosing the boundaries of the discretization bins so that each bin is equipopulated is a simple but effective way to obtain high information values even when discretizing an analog signal into a relatively small number of bins. This is because equipopulated binning maximizes the response entropy H(R) that can be obtained with a given number of response bins [2].
Figure 3A also shows that, in contrast to H(RS), estimates of H(R) are essentially unbiased. It is a very common finding that H(R) is less biased than H(RS) [31] because the former depends on P(r) which, being computed from data collected across all 102 stimuli, is better sampled than P(rs). However, in this specific case, the fact that H(R) is unbiased stems from the chosen response discretization method: by discretizing each analog LFP power response at the two considered frequencies into 6 equipopulated bins we obtain 36 equiprobable bidimensional responses which provide H(R) = log_{2}(36) ~ 5.17.
Figure 3C shows that the plugin estimate of information I(S; R) is biased upward: it tends to be higher than the true value for small number of trials, and then it converges to the true value as the number of trials grows. This upward sampling bias, which originates from the downward sampling bias of H(RS) (see Eq. (1)), can be as large as the true information value if the number of available trials is low (Figure 3C). Previous research [32] showed that the most crucial parameter in determining the size of the bias for the plugin estimator is the ratio between the number of trials per stimulus, N_{ tr }(s), and the cardinality of the response space, which we denote by N_{ r }. (In this particular example, the cardinality N_{ r }of the two dimensional response space equals 36). The ratio N_{ tr }(s)/N_{ r }tells us how wellsampled are the responses.Figure 3C shows that the plugin estimator requires N_{ tr }(s) to be at least two orders of magnitude larger than N_{ r }for it to become essentially unbiased, a characteristic that makes this estimator of limited experimental utility. This is why bias correction procedures are needed.
The performance of two such procedures implemented in our Toolbox (PT and QE) is reported in Figure 3C. Both corrections substantially improve the estimates of I(S; R), which, in this simulation became accurate for N_{ tr }(s) ≈ 2^{7} = 128 (i.e. N_{ tr }(s)/N_{ r }≈ 3, to be compared with N_{ tr }(s)/N_{ r }≈ 100 for pure plugin).
Figure 3D reports the effect of subtracting bias corrections based on data bootstrapping [53, 54]. Computing the bootstrap value of the plugin estimate and then subtracting it from the plugin information value is not very effective: it leads to a large overestimation of the bias and thus to a big underestimation of the information. The reason why this is the case can be understood by comparing the sampling behavior of the plugin estimator of H(RS) before (Figure 3A) and after (Figure 3B) the data bootstrapping. For large numbers of trials, bootstrapping makes H(RS) equal to H(R). However, the bias of H(RS) is much greater after bootstrapping. This is because bootstrapping enhances the support and spread of the stimulusresponse distributions P(rs), which makes these probabilities more difficult to sample [32]. As a consequence the bootstrap estimate of the bias of I(S; R) is exaggerated and its subtraction from the plugin estimate of I(S; R) leads to a severe underestimation of the information. However, if we apply a PT or QE bias correction first, and then estimate and subtract out the remaining residual error by bootstrap, we obtain a far more effective bias removal (Figure 3D). This shows that (consistent with the analytic arguments in [32]), the data bootstrapping gives a good evaluation of the residual bias error after correction, though it overestimates the actual value of the bias per se.
Finally, we considered the effect of computing information through I_{ sh }(S; R) (Eq. (12) rather than through I(S; R)). Let us consider first the sampling behavior of the plugin estimate of the four entropies that make up I_{ sh }(S; R). Because H_{ ind }(RS) depends only on the marginal probabilities of the response array, it typically has very small bias (Figure 3A). H_{ sh }(RS) has the same value of H_{ ind }(RS) for infinite number of trials, but it has a much higher bias than H_{ ind }(RS) for finite number of trials. In fact, Figure 3A shows that the bias of H_{ sh }(RS) is approximately of the same order of magnitude as the bias of H(RS). Intuitively, this is expected because P_{ sh }(rs) is sampled with the same number of trials as P(rs) and from responses with the same dimensionality [21, 36]. In this simulation, the biases of H_{ sh }(RS) and H(RS) were not only similar in magnitude but actually almost identical (Figure 3A). This, as explained in [36], reflects the fact that for the data simulated here (which represent a low and a high LFP frequency from the same electrode) the correlations among elements of the response array were relatively weak [29] and thus H_{ sh }(RS) and H(RS) were very close both in value and sampling properties. Because the biases of H_{ sh }(RS) and H(RS) almost cancel each other and since H(R) is unbiased, the bias of I_{ sh }(S; R) is almost identical to that of H_{ ind }(RS). This in turn implies that the plugin estimator of I_{ sh }(S; R) must have a much smaller bias than I(S; R), a fact clearly demonstrated by the results in Figure 3E.
Due to its intrinsically better sampling properties, I_{ sh }(S; R) has an advantage over I(S; R) not only when using a plugin estimation but also when using bias subtraction methods. Figure 3E shows that when using PT or QE corrections, I_{ sh }(S; R) can be computed accurately even when using only 2^{6} = 64 trials per stimulus (corresponding to N_{ tr }(s)/N_{ r }≈ 2). When using an additional bootstrapping procedure to subtract the residual bias (Figure 3F), the estimate of I_{ sh }(S; R) becomes almost completely unbiased, independently of the bias correction used, even with as little as 2^{5} = 32 trials (corresponding to N_{ tr }(s)/N_{ r }≈ 1).
It should be noted that this behavior applies to cases in which (like the one we simulated) correlations among elements of the response array are relatively weak. In conditions when the correlations among elements of the response array are very strong (as it is often case with both LFPs and spikes recorded from the nearby electrodes), then the sampling behavior of I_{ sh }(S; R) is still qualitatively similar to that reported here, with the main difference that in cases of stronger correlation I_{ sh }(S; R) tends to have a small downward (rather than upward) bias. This stems from the fact that in the presence of stronger correlations the bias of H_{ sh }(RS) tends to be more negative than that of H(RS) [36], and was verified extensively on simulated spike trains in previous reports [36] and by increasing the level of correlations in these simulated LFPs (results not shown).
In summary, we presented the first detailed test of bias corrections procedures (originally develop for spike trains) on simulated analog neural signals. These simulations (i) confirm that these procedures are effective also on data with statistics close to that of LFPs; (ii) show that in such case it is highly advisable to use I_{ sh }(S; R) as method to compute information; (iii) indicate that evaluating and subtracting the residual bootstrap errors of I_{ sh }(S; R) (as done in [29]) is particularly effective. We recommend this procedure to Toolbox users interested in computing information form multidimensional LFP or EEG responses.
Correlations between different frequency bands of Local Field Potentials in Primary Visual Cortex
In this section we illustrate the Information Breakdown formalism [17] to study whether signal or noise correlations between the LFP powers at different frequencies made them to convey synergistic or redundant information about visual stimuli with naturalistic characteristics. For this study, we analyzed LFPs recorded from the primary visual cortex of anesthetized macaques in response to a binocularly presented naturalistic color movie [29]. Each recording site (51 in total) corresponded to a welldefined V1 visual receptive field within the field of movie projection. From each electrode, we measured LFPs as the 1–250 Hz bandpassed neural signal. Each movie was 5 min long and was repeated 40 times in order to sample the probability distribution over the neural responses to each scene. Full details on the experimental procedures are reported in Ref. [29]. The correlation between the LFP activity in different frequency bands on this dataset was studied in Ref. [29] using only linear (Pearson) correlation. Here, we extend these previous results by using the information breakdown, which takes into account both linear and nonlinear correlations at all orders [17].
We used the informationtheoretic procedure to compute how the power of LFPs at these different frequencies reflects changes in the visual features appearing in the movie. We divided each movie into nonoverlapping time windows of length T = 2.048 s. Each window was considered as a different "stimulus", s, and the corresponding power spectra (obtained in each trial and in each window by means of the multitaper technique [60, 61]) were considered as the neural response. From the distribution across trials of the power at each frequency and stimulus window, we computed the mutual information between the stimulus (i.e. the considered movie fragment) and the joint power of the LFP at two selected frequencies f_{1} and f_{2}. (Thus, the response r was a two dimensional array [r_{1}, r_{2}] containing the power at frequency f_{1} and f_{2}, respectively.) To compute information, we used the Direct Method together with the shuffled information estimator corrected with the Quadratic Extrapolation bias correction and the bootstrap subtraction (as done in Figure 3F for simulated data).
The first maximum I(S; R) occurs when f_{1} and f_{2} are in the low (below 12 Hz) frequency range. A second, broader maximum is present for f_{1} and f_{2} in the high gamma range. The highest maximum, however, is obtained when combining a low power response with an highgamma one. An interesting question is whether the LFP powers belonging to the highly informative lowfrequency range and the highgamma range carry independent or redundant information about the stimuli, and whether any potential redundancy is due to shared sources of variability (i.e. noise correlation) or to similarities in the tuning to different scenes of the movie (i.e. signal correlation). In the following, we will use the information breakdown to address this question.
Let's first consider any two frequencies f_{1} and f_{2} belonging to the low frequency range. A comparison of I(S; R) (Figure 4A) and I_{ lin }(Figure 4B) shows that I(S; R) is only slightly less than I_{ lin }. Thus, as made explicit in Figure 4C, there is only a very small negative synergy (i.e. a positive redundancy) between low LFP frequencies. To understand the origin of this small redundancy, we used the information breakdown [17]. This formalism shows that low LFP frequencies have little redundancy not because they are independent, but because they share noise correlations whose effect cancel out. In particular, there is a negative stimulusindependent correlation component I_{corind}(Figure 4E) which is almost exactly compensated by a stimulusdependent correlation component I_{cordep}(Figure 4F). Unlike noise correlations, signal correlations have a very little specific impact on the information carried by the two frequencies (because I_{sigsim}is zero; Figure 4D). These results suggest that the lowfrequency LFP bands share a strong common source of variability and thus do not originate from entirely distinct processing pathways, even though they add up independent information about the external correlates.
We then considered the case in which f_{1} belongs to the the low frequency range while f_{2} is in the highgamma range. In this case, the powers at f_{1} and f_{2} added up independent information about the external correlates, because the joint information I(S; R) (Figure 4A) was equal to I_{ lin }(Figure 4B), and as a consequence there is zero synergy between (Figure 4C) between the low and highgamma LFP frequencies. The information breakdown [17] shows that signal correlations have no impact on the information carried by the two frequencies (because I_{sigsim}is zero; Figure 4D), and that the same applies to both stimulusdependent and stimulusindependent noise correlations (Figure 4F and 4E). Thus, low LFP frequencies and highgamma LFPs generated under naturalistic visual stimulation share neither noise nor signal correlations. They appear to be uncorrelated under naturalistic visual stimulation, and are thus likely to arise from fully decoupled neural phenomena.
We finally examined the case in which both f_{1} and f_{2} belong to the highgamma frequency range. In this case, there is considerable negative synergy (i.e. positive redundancy) between such frequencies. This redundancy can be attributed to signal correlation (because I_{sigsim}is strongly negative), which means that high gamma frequencies have a similar response profile to the movie scenes. The redundancy between high gamma frequencies is further enhanced by a negative effect of stimulusindependent noise correlation I_{corind}, Figure 4E. This can be explained by the results of a previously reported linear correlation analysis [29] which suggested a presence of a small amount of positive noise correlation between high gamma frequencies that accompanies the positive signal correlation, and with the fact that a combination of signal and noise correlation with the same sign leads to a negative I_{corind}.
It is interesting to note that the results obtained with the information breakdown are compatible with those obtained on the same dataset using a simpler linear signal and noise correlation [29]. Since the information breakdown, unlike the linear correlation theory, is able to capture the impact of nonlinear signal and noise correlations if they were present, the equivalence between linear correlation theory and information breakdown can be taken as strong evidence that the linear correlations individuated in [29] are a sufficient description of the functional relationship between LFP responses at different frequencies.
Performance of the Gaussian Method
In this section, we illustrate the accuracy and performance on real LFPs responses of the Gaussian Method. We consider again the calculation of how the LFP power encodes information about naturalistic movies, and we use again the same set of LFPs recorded from the primary visual cortex of an anesthetized macaque in response to a binocularly presented naturalistic color movie [29], which were analyzed in the previous section. We computed the information I(S; R) – about which of the 2.048 s long movie scene in which we divided the movie was being presented – carried by the LFP power at a given frequency f. The response r was a scalar, r_{ f }, containing the power at frequency f. We estimated the information I(S; R) either with the Gaussian Method or with the Direct Method.
When using the Gaussian Method, we first estimated the power in each stimulus window and trial using the multitaper technique. We then took the cubic root of this power; we fitted the distribution of this response to each stimulus to a Gaussian; and we finally computed the information through Eqs. (13) and (14) subtracting the analytic gaussian bias correction, Eqs. (16) and (17). The reason for applying the cubic root transformation is that multitaper power estimates are asymptotically chisquare distributed [60] thus their cubic root is approximately Gaussian [62]. The cubic root operation, being monotonic, does not affect the underlying information values of the power, but it makes response probabilities much more Gaussian and thus facilitates information estimation with the Gaussian Method. When using the Direct Method, we simply discretized these transformed power values into M equipopulated bins and computed information through Eq. (1) and corrected it for bias using QE. The number of response bins M was varied in the range 4–8 (see below).
A comparison between Gaussian and direct estimates may be useful to evaluate the effect of refining the discretization of neural responses. For this dataset, we found that when using more bins to discretize responses for the Direct Method (e.g. M = 10 or M = 12 bins), the direct information values I^{(d)}do not change appreciably (data not shown). However, when decreasing the number of bins to M = 4 and M = 6 the resulting scatterplot of I^{(g)}versus I^{(d)}was again distributed along a line (like in Figure 5A) but with slopes of I^{(g)}= 1.1 · I^{(d)}and I^{(g)}= 1.2 · I^{(d)}, respectively. These findings suggest several conclusions. First, differences that could be observed between gaussian and direct estimates were due to loss of information caused by poor discretization when using low number of bins for the Direct estimate. Second, using 8 response bins is sufficient to capture of the information of the LFP power and using less bins leads to very moderate information losses (of the order of 10–20%). Third, this suggests that knowledge of the second order statistics of the roottransformed powervalues is sufficient for extracting the bulk of the information from the LFP power fluctuations.
To demonstrate the sampling properties and data robustness of the Gaussian information estimates, we proceed as we did previously for the Direct Method, generating realistically simulated LFPs whose statistical properties closely matched those of real V1 LFPs (see Appendix A for a description of how data were generated). This time we considered the information about the 102 presented movie sequences carried by the power of either one, two or three different simulated LFP frequencies (i.e., the response was, respectively, one, two or three dimensional). Results are reported in Figure 5B. We found that if no bias correction was used, the Gaussian information values were all upward biased, and the bias grew with the dimensionality of the response space (Figure 5B). However, using the analytic bias correction in Eqs. (16) and (17) eliminated the bias completely, with essentially identical accuracy for all considered dimensions.
Taken together, these results indicate that the Gaussian Method can be an extremely accurate and useful tool for studying the information content of analog neural signals. Because of its great data robustness, we strongly recommend its use on any neural signal whose response probabilities are consistent with Gaussian distributions.
EEGs frequencies encoding visual features in naturalistic movies
We next demonstrate the applicability of our toolbox to the analysis of singletrial EEGs. We considered EEGs recorded from a male volunteer with a 64channel electrode cap while the subject was fixating the screen during the repeated presentation of a 5 slong naturalistic color movie presented binocularly. Full details on experimental procedures are reported in Appendix B. We then used our Toolbox to investigate which frequency bands, which signal features (phase or amplitude), and which electrode locations better encoded the visual features present in movies with naturalistic dynamics.
To understand which frequency bands were more effective in encoding the movie, we used a causal bandpass filter (see Appendix B for details) to separate out the range of EEG fluctuations at each electrode into distinct frequency bands (delta: 0.1–4 Hz; theta: 4–8 Hz; alpha: 8–12 Hz; beta: 12–20 Hz). We then extracted, by means of Hilbert transforms of the bandpassed signal, the instantaneous phase and power of the EEG fluctuations in each electrode, frequency band, and trial and examined the time course of amplitude and phase during the movie.
To compare the reliability of phase and power of the deltarange fluctuations at different points of the movie, we discretized the power of the delta band EEG from electrode PO8 at each time point into four equipopulated bins. We found that power was much less reliable across trials than phase (Figure 6D). As a consequence, we also found that power carried only 0.05 bits of information about the movie, and was thus much less informative than the delta phase from the same electrode.
Having illustrated the encoding of the movie by EEGs with an example recording channel and a selected EEG frequency range, we next characterized the behavior across all electrodes and over a wider range of EEG frequencies. Results are plotted in Figure 6E (phase information) and 6F (power information). We found that only low frequency ranges (delta) were informative, and that phase was far more informative than power at all electrodes. This is consistent with the attenuation properties of the skull, which is more likely to attenuate power but with relatively little introduction of phase shifts. The most informative regions were found in the right occipital parietal lobe covering the right visual cortices. It has been hypothesized that such informative low frequency fluctuations observed in visual cortex may reflect the entrainment to slowly changing informative features in the sensory signal [63, 64]. To gain some insights on why EEGs recorded from the right hemisphere were far more informative about the movie than those recorded from the left hemisphere, we calculated the mean pixel luminance of the movie clip over the 5 s stimulation window separately for left and right hemifields. We found that the mean luminance was greatest in the left visual field as compared to the right. This provides one potential reason to explain the lateralization of information about the movie observed in this subject.
This example demonstrates the capabilities of the information analysis to extract the most informative components of EEG signals even when using complex dynamic stimulation paradigms and illustrates the potentials of this toolbox for singletrial EEG analysis.
In order to allow users to familiarize with the Toolbox, we have included (as Additional File to this Article) the entire dataset of EEG Delta Phases for all 64 channels and all trials, together with a commented script that loads the data and computes information through the appropriate calls to the Toolbox (Additional file 2: eegtest.zip). Running the script contained in the afore mentioned file outputs the results plotted in top left plot of Fig 6E.
Comparison with other available toolboxes
Other groups have developed, or are currently developing, toolboxes for the information analysis of neural responses. Here we briefly discuss some of the relative features of current releases of other information theoretic toolboxes, and their complementariness.
Ince and colleagues [65] recently released an information theoretic toolbox for neuroscience data called pyentropy based on the Python programming language [66]. This toolbox has the so far unique feature of including an advanced and memoryefficient algorithm for the computation of entropies which are maximal under given constraints, thereby allowing an easy calculation of many of the "maximum entropy" quantities which have received substantial attention in recent years for the study of neuronal interactions [67–69]. The choice of Python as a programming language comes with several advantages provided by its open source nature, its flexibility, and its very efficient use of memory. The use of Python could however be problem for most experimental neuroscience laboratories, which currently make use of Matlab for preprocessing, analysing and plotting the data.
Another available information theoretic toolbox for spike train analysis is the Spike Train Analysis Toolkit (STAToolkit) [70, 71]. STAToolkit is, like our toolbox, based on CMEX technology and like ours can be easily used in Matlab by experimental neuroscience laboratories. A unique and important feature of STAToolkit is the large number of estimation methods for the information carried by spike trains, including techniques such as the binless estimation [72] and the metric space approach [73].
With respect to the two above toolboxes, our new toolbox presents two distinctive features. First, it is the only package which has been tested heavily non only on spike trains but also on analog brain recordings such as LFPs, and EEGs. It also includes algorithms which are specific for these signals, such as the Gaussian Method information calculation and its bias correction. The second distinctive features of our toolbox is the speed of computation. This speed advantage is not only due to the C implementation, but also to the new algorithm for fast entropy calculation that we presented here. By comparing systematically the speed of our toolbox on simulated data with the speed of pyentropy [65], we found that our toolbox has a speed advantage of typically an order of magnitude to the one of pyentropy. We found similar speed advantage in comparison to STAToolkit.
Future Directions
This paper accompanies the first release of ibTB, which we will continue to be developed over the coming years. Some features that we are working to implement in future releases include:
• Additional bias corrections procedures. Currently, we implemented some of the best known and most useful bias correction procedures for the computation of mutual information. Other important corrections exist (e.g. [34, 35]), however, which we plan to implement and include in the toolbox in the near future. Additionally, starting with the next release of ibTB, users will be given the opportunity to plugin their own custom bias correction routines linking them very easily to the main routines in the toolbox.
• Additional methods. The Gaussian Method is one of the many analytical procedures existing for the computation of entropy and mutual information: actually, several other methods are available which take into account other probability distributions [74]. The modular structure of the toolbox allows to very easily add new methods to the toolbox: these will be gradually introduced with future releases.
• fMRI analysis. We are currently in the process of testing and adapting our toolbox to its use with BOLD fMRI data. Although we developed [32] the bias procedure used successfully in recent information analysis of fMRI data some papers [26, 75], more work is needed to understand the specific problems caused by the statistics of fMRI data, and how best to use information theory to detect voxels significantly tuned to the stimuli [25]. We plan to report thorough studies of this issues on the toolbox website as soon as possible.
Conclusion
Neuroscientists can now record, simultaneously and from the same brain region, several types of complementary neural signals, such as spike, LFP, EEG or BOLD responses, each reflecting different and complementary aspects of neural activity at different spatial and temporal scales of organization. A current important challenge of computational neuroscience is to provide techniques to analyze and interpret these data [27, 76–81]. We believe that the new fast information theoretic Matlab Toolbox presented here offers a useful technology tool to analyze these complementary brain signals and understand how the brain may combine together the information carried by aspects of neural activity at these different levels of its organization.
Availability and requirements
• Project name: Information Breakdown ToolBox
• Project home page: http://www.ibtb.org
• Operating system: tested on Mac OS X, Windows 32 and 64 bits, Linux
• Programming language: Matlab (toolbox tested on R2008a and successive releases) and C
• Other requirements: Microsoft Visual C++ 2008 Redistributable Package x86 (or x64) for use on Windows 32 bit (or 64 bit) machine. The package is freely downloadable from Microsoft's website and is only required if Visual C++ is not installed.
• Licence: ibTB is distributed free under the condition that (1) it shall not be incorporated in software that is subsequently sold; (2) the authorship of the software shall be acknowledged and the present article shall be properly cited in any publication that uses results generated by the software; (3) this notice shall remain in place in each source file.
• Any restriction to use by nonacademics: none.
Appendix A – Simulation of LFP responses
We simulated the LFP power of a recording site in primary visual cortex (V1) in response to many different movie scenes. In brief, data were simulated as follows. We selected from the dataset of [29] a given example recording channel (channel 7 from animal D04), and we computed multitaper estimates of the power at three chosen frequencies (4, 25 and 75 Hz) in response to approximately 2slong scenes of Hollywood color movies presented binocularly to the animal. The multitaper technique allows to reduce the variance of the spectral estimates while keeping the bias under control: this is achieved by means of taking the average of different direct spectra computed using tapers which are orthogonal to each other (see [60] for more details). The maximum number of averaged spectra is a free parameter (named K) which is set by the user. Here we chose K = 3, thereby providing power estimates which are distributed approximately as a chisquare with 6 degrees of freedom. We then applied Wilson and Hilferty's cuberoot transformation [62]: this transformation, being monotonic does not affect the information content of the responses while making the responsedistributions to a fixed movie scene approximately gaussian (a fact that we also verified empirically). We use the same approach for simulation of multidimensional responses, by assuming that the joint distribution of the roottransformed power at two or three different frequencies during each fixed movie scene was a multivariate Gaussian. We generated many instances of this Gaussian powerresponses by means of Matlab's mvnrnd function using mean and standard deviation values which were computed, for each scene, from the real data. For entropy estimates computed using the Direct Method, the data simulated in this way have been further discretized into 6 equipopulated response bins.
Appendix B – Methods of EEG recording during presentations of short naturalistic movies
The EEG was acquired using a 64 channel electrode cap (BrainAmp MR, BrainProducts). Electrode placement followed the International 10–20 System and electrodes were all referenced to a frontal central electrode (FCz). Electrode impedances were kept below 15 KOhms. Horizontal and vertical eye movements were recorded using an electrooculogram (EOG) with electrodes placed over the outer canthus of the left eye as well as below the right eye. Subjects were comfortably seated in a dimly lit room. EEG recordings were digitally recorded at 1000 Hz with a bandpass of 0.1–250 Hz and stored for offline analysis. A small fixation cross on black background was shown in order to indicate the beginning of the trial. After 2 seconds of fixation, a 5 second movie segment (full field) was presented, followed by 2 seconds of continued fixation, resulting in trials totaling 9 seconds of fixation. A movie clip, consisting of fast moving and colorful scenes from a commercially available movie, was presented 50 times. All data analysis procedures were implemented with the Matlab programming language in combination with the EEGlab analysis toolbox [82] as described below. Postprocessing was performed using the EEGlab analysis software (Neuroscan). EEG epochs (1000 to 5000 ms temporal range) were created based on the onset of triggers recorded during the recording session. An EOG artifact correction algorithm was used to remove all trials with amplitudes that exceed ± 75 mV. After artifact rejection, 30 movie presentation trials remained.
To obtain bandpassed EEGs from each electrode, we bandpassed the raw EEG signal sampled at 1 KHz with a zerophaseshift Kaiser filter with sharp transition bandwidth (1 Hz), very small passband ripple (0.01 dB), high stopband attenuation (60 dB), and bandwidth corresponding to the considered band (e.g. 0.1–4 Hz for delta; 4 – 8 Hz for theta, etc. – see main text). These filters were exactly equal to those used for LFPs in Refs [13, 14]; we refer to these references for more details.
Abbreviations
 fMRI:

functional magnetic resonance imaging
 LFP:

Local Field Potential
 ibTB:

Information Breakdown ToolBox
 EEG:

Electroencephalogram
 BOLD:

Bloodoxygenationleveldependent
Declarations
Acknowledgements
We thank C. Kayser, A. Belistki, R. Ince, N. Ludtke, A. Mazzoni, F. Montani, M. Montemurro and G. Notaro for many useful discussions and for testing the code, and E. Molinari for useful discussions. This research was supported by the BMI project of the Department of Robotics, Brain and Cognitive Sciences at the Italian Institute of Technology, and by the Max Planck Society.
Authors’ Affiliations
References
 Shannon C: A mathematical theory of communcation. Bell Systems Technical Journal. 1948, 27: 379423.View Article
 Cover TM, Thomas JA: Elements of information theory. 2006, New York: John Wiley
 Rieke F, Warland D, van Steveninck R, Bialek W: Spikes: exploring the neural code. 1997, Cambridge: MIT Press
 Borst A, Theunissen FE: Information theory and neural coding. Nature Neuroscience. 1999, 2: 947957. 10.1038/14731.View ArticlePubMed
 Victor JD: Approaches to informationtheoretic analysis of neural activity. Biological Theory. 2006, 1 (3): 302316. 10.1162/biot.2006.1.3.302.PubMed CentralView ArticlePubMed
 Reich DS, Mechler F, Victor JD: Formal and attributespecific information in primary visual cortex. Journal of Neurophysiology. 2001, 85: 305318.PubMed
 Adelman TL, Bialek W, Olberg RM: The Information Content of Receptive Fields. Neuron. 2003, 40: 823833. 10.1016/S08966273(03)006809.View ArticlePubMed
 Arabzadeh E, Panzeri S, Diamond ME: Whisker vibration information carried by rat barrel cortex neurons. Journal of Neuroscience. 2004, 24 (26): 60116020. 10.1523/JNEUROSCI.138904.2004.View ArticlePubMed
 Optican LM, Richmond BJ: Temporal encoding of twodimensional patterns by single units in primate inferior temporal cortex: III. Information theoretic analysis. J Neurophysiol. 1987, 57: 162178.PubMed
 Schnupp JWH, Hall TM, Kokelaar RF, Ahmed B: Plasticity of temporal pattern codes for vocalization stimuli in primary auditory cortex. J Neurosci. 2006, 26: 47854795. 10.1523/JNEUROSCI.433005.2006.View ArticlePubMed
 Foffani G, Tutunculer B, Moxon KA: Role of Spike Timing in the Forelimb Somatosensory Cortex of the Rat. J Neurosci. 2004, 24: 72667271. 10.1523/JNEUROSCI.252304.2004.View ArticlePubMed
 Panzeri S, Petersen RS, Schultz SR, Lebedev M, Diamond ME: The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron. 2001, 29: 769777. 10.1016/S08966273(01)002513.View ArticlePubMed
 Montemurro MA, Rasch MJ, Murayama Y, Logothetis NK, Panzeri S: Phaseoffiring coding of natural visual stimuli in primary visual cortex. Current Biology. 2008, 18: 375380. 10.1016/j.cub.2008.02.023.View ArticlePubMed
 Kayser C, Montemurro MA, Logothetis N, Panzeri S: Spikephase coding boosts and stabilizes information carried by spatial and temporal spike patterns. Neuron. 2009, 61: 597608. 10.1016/j.neuron.2009.01.008.View ArticlePubMed
 Panzeri S, Schultz SR, Treves A, Rolls ET: Correlations and the encoding of information in the nervous system. Proc Biol Sci. . 1999, 266 (1423): 10011012. 10.1098/rspb.1999.0736.PubMed CentralView ArticlePubMed
 Averbeck BB, Latham PE, Pouget A: Neural correlations, population coding and computation. Nat Rev Neurosci. 2006, 7 (5): 358366. 10.1038/nrn1888.View ArticlePubMed
 Pola G, Thiele A, Hoffmann KP, Panzeri S: An exact method to quantify the information transmitted by different mechanisms of correlational coding. Network. 2003, 14: 3560.View ArticlePubMed
 Schneidman E, Bialek W, Berry MJ: Synergy, redundancy, and independence in population codes. J Neurosci. 2003, 23 (37): 1153953.PubMed
 Latham PE, Nirenberg S: Synergy, redundancy, and independence in population codes, revisited. J Neurosci. 2005, 25 (21): 5195206. 10.1523/JNEUROSCI.531904.2005.View ArticlePubMed
 Petersen RS, Panzeri S, Diamond ME: Population coding of stimulus location in rat somatosensory cortex. Neuron. 2001, 32 (3): 50314. 10.1016/S08966273(01)004810.View ArticlePubMed
 Nirenberg S, Carcieri SM, Jacobs AL, Latham PE: Retinal ganglion cells act largely as independent encoders. Nature. 2001, 411 (6838): 698701. 10.1038/35079612.View ArticlePubMed
 Montani F, Kohn A, Smith MA, Schultz SR: The role of correlations in direction and contrast coding in the primary visual cortex. J Neurosci. 2007, 27 (9): 233848. 10.1523/JNEUROSCI.341706.2007.View ArticlePubMed
 Schreiber T: Measuring Information Transfer. Phys Rev Lett. 2000, 85: 461464. 10.1103/PhysRevLett.85.461.View ArticlePubMed
 Honey CJ, Kötter R, Breakspear M, Sporns O: Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proc Natl Acad Sci USA. 2007, 104 (24): 1024010245. 10.1073/pnas.0701519104.PubMed CentralView ArticlePubMed
 FuhrmannAlpert G, Sun FT, Handwerker D, D'Esposito M, Knight RT: Spatiotemporal information analysis of eventrelated BOLD responses. Neuroimage. 2007, 34 (4): 154561. 10.1016/j.neuroimage.2006.10.020.View ArticlePubMed
 Pessoa L, Padmala S: Decoding nearthreshold perception of fear from distributed singletrial brain activation. Cereb Cortex. 2007, 17 (3): 691701. 10.1093/cercor/bhk020.View ArticlePubMed
 Panzeri S, Magri C, Logothetis N: On the use of information theory for the analysis of the relationship between neural and imaging signals. Magnetic Resonance Imaging. 2008, 26: 10151025. 10.1016/j.mri.2008.02.019.View ArticlePubMed
 Rubino D, Robbins KA, Hatsopoulos NG: Propagating waves mediate information transfer in the motor cortex. Nat Neurosci. 2006, 9 (12): 15491557. 10.1038/nn1802.View ArticlePubMed
 Belitski A, Gretton A, Magri C, Murayama Y, Montemurro MA, Logothetis NK, Panzeri S: Lowfrequency local field potentials and spikes in primary visual cortex convey independent visual information. J Neurosci. 2008, 28 (22): 56965709. 10.1523/JNEUROSCI.000908.2008.View ArticlePubMed
 Waldert S, Preissl H, Demandt E, Braun C, Birbaumer N, Aertsen A, Mehring C: Hand movement direction decoded from MEG and EEG. J Neurosci. 2008, 28 (4): 10001008. 10.1523/JNEUROSCI.517107.2008.View ArticlePubMed
 Panzeri S, Senatore R, Montemurro MA, Petersen RS: Correcting for the sampling bias problem in spike train information measures. J Neurophysiol. 2007, 98: 10641072. 10.1152/jn.00559.2007.View ArticlePubMed
 Panzeri S, Treves A: Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems. 1996, 7: 87107. 10.1088/0954898X/7/1/006.View Article
 Strong S, Koberle R, de Ruyter van Steveninck R, Bialek W: Entropy and Information in Neural Spike Trains. Physical Review Letters. 1998, 80: 197200. 10.1103/PhysRevLett.80.197.View Article
 Paninski L: Estimation of Entropy and Mutual Information. Neural Computation. 2003, 15: 11911253. 10.1162/089976603321780272.View Article
 Nemenman I, Bialek W, de Ruyter van Steveninck R: Entropy and information in neural spike trains: progress on the sampling problem. Physical review E, Statistical, nonlinear, and soft matter physics. 2004, 69 (5 Pt 2): 056111.View ArticlePubMed
 Montemurro MA, Senatore R, Panzeri S: Tight datarobust bounds to mutual information combining shuffling and model selection techniques. Neural Computation. 2007, 19 (11): 291357. 10.1162/neco.2007.19.11.2913.View ArticlePubMed
 Nirenberg S, Latham PE: Decoding neuronal spike trains: how important are correlations?. Proc Natl Acad Sci USA. 2003, 100 (12): 734853. 10.1073/pnas.1131895100.PubMed CentralView ArticlePubMed
 Quian Quiroga R, Panzeri S: Extracting information from neural populations: Information theory and decoding approaches. Nat Rev Neurosci. 2009, 10 (3): 173185. 10.1038/nrn2578.View ArticlePubMed
 Li CL: Synchronization of unit activity in the cerebral cortex. Science. 1959, 129: 783784. 10.1126/science.129.3354.969a.View ArticlePubMed
 Perkel DH, Gerstein GL, Moore GP: Neuronal spikes trains and stochastic point processes II. Simultaneous spikes trains. Biophys J. 1967, 7: 419440. 10.1016/S00063495(67)865974.PubMed CentralView ArticlePubMed
 Mastronarde DN: Correlated firing of cat retinal ganglion cells: Spontaneously active input to x and ycells. J Neurophysiol. 1983, 49: 303324.PubMed
 Koenig P, Engel AK, Singer W: Relation between oscillatory activity and longrange synchronization in cat visual cortex. Proc Natl Acad Sci USA. 1995, 92: 290294. 10.1073/pnas.92.1.290.View Article
 Goense JBM, Logothetis NK: Neurophysiology of the BOLD fMRI Signal in Awake Monkeys. Current Biology. 2008, 18 (9): 631640. 10.1016/j.cub.2008.03.054.View ArticlePubMed
 Oram MW, Xiao D, Dritschel B, Payne KR: The temporal resolution of neural codes: does response latency have a unique role?. Philos Trans R Soc Lond, B, Biol Sci. 2002, 357 (1424): 9871001. 10.1098/rstb.2002.1113.PubMed CentralView ArticlePubMed
 Salinas E, Sejnowski TJ: Correlated neuronal activity and the flow of neural information. Nat Rev Neurosci. 2001, 2 (8): 53950. 10.1038/35086012.PubMed CentralView ArticlePubMed
 Abbott LF, Dayan P: The effect of correlated variability on the accuracy of a population code. Neural Comput. 1999, 11: 91101. 10.1162/089976699300016827.View ArticlePubMed
 Gawne TJ, Richmond BJ: How independent are the messages carried by adjacent inferior temporal cortical neurons?. J Neurosci. 1993, 13: 27582771.PubMed
 Oram MW, Foldiak P, Perrett DI, Sengpiel F: The 'Ideal Homunculus': decoding neural population signals. Trends Neurosci. 1998, 21 (6): 259265. 10.1016/S01662236(97)012162.View ArticlePubMed
 Nakahara S, Amari S: Information geometric measures for neural spikes. Neural Computation. 2002, 14: 22692316. 10.1162/08997660260293238.View ArticlePubMed
 Scaglione A, Foffani G, Scannella G, Cerutti S, Moxon KA: Mutual Information Expansion for Studying the Role of Correlations in Population Codes: How Important Are Autocorrelations?. Neural Computation. 2008, 20: 26622695. 10.1162/neco.2008.0807595.View ArticlePubMed
 Hatsopoulos NG, Ojakangas C, Paninski L, Donoghue JP: Information about movement direction obtained from synchronous activity of motor cortical neurons. Proc Natl Acad Sci. 1998, 95 (26): 1570615711. 10.1073/pnas.95.26.15706.PubMed CentralView ArticlePubMed
 Information Breakdown ToolBox. [http://www.ibtb.org]
 Optican LM, Gawne TJ, Richmond BJ, Joseph PJ: Unbiased measures of transmitted information and channel capacity from multivariate neuronal data. Biological cybernetics. 1991, 65 (5): 305310. 10.1007/BF00216963.View ArticlePubMed
 Tovee MJ, Rolls E, Treves A, Bellis R: Information encoding and the responses of single neurons in the primate temporal visual cortex. Journal of Neurophysiology. 1993, 70 (2): 640654.PubMed
 Maynard EM, Hatsopoulos NG, Ojakangas CL, Acuna BD, Sanes JN, Normann RA, Donoghue JP: Neuronal interactions improve cortical population coding of movement direction. J Neurosci. 1999, 19 (18): 80838093.PubMed
 Goodman NR: The distribution of the determinant of a complex Wishart distributed matrix. Annals of Mathematical Statistics. 1963, 34: 178180. 10.1214/aoms/1177704251.View Article
 Misra N, Singh H, Demchuk E: Estimation of the entropy of a multivariate normal distribution. Journal of Multivariate Analysis. 2005, 92 (2): 324342.View Article
 Oyman O, Nabar RU, Bolcskei H, Paulraj AJ: Characterizing the statistical properties of mutual information in MIMO channels. IEEE Transactions on Signal Processing. 2003, 51 (11): 27842795. 10.1109/TSP.2003.818153.View Article
 Golledge HDR, Panzeri S, Zheng F, Pola G, Scannell JW, Giannikopoulos DV, Mason RJ, Tovée MJ, Young MP: Correlations, featurebinding and population coding in primary visual cortex. NeuroReport. 2003, 14 (7): 10451050. 10.1097/0000175620030523000028.PubMed
 Percival DB, Walden AT: Spectral Analysis for Physical Applications: Multitaper and Conventional Univeriate Techniques. 1993, New York: Cambridge University PressView Article
 Jarvis MR, Mitra P: Sampling Properties of the Spectrum and Coherency of Sequences of Action Potentials. Neural Computation. 2001, 13 (4): 717749. 10.1162/089976601300014312.View ArticlePubMed
 Wilson E, Hilferty M: The distribution of chisquare. Proceedings of the National Academy of Sciences. 1931, 17 (12): 684688. 10.1073/pnas.17.12.684.View Article
 Mazzoni A, Panzeri S, Logothetis NK, Brunel N: Encoding of naturalistic stimuli by local field potential spectra in networks of excitatory and inhibitory neurons. PLoS Computational Biology. 2008, 4 (12): e100023910.1371/journal.pcbi.1000239.PubMed CentralView ArticlePubMed
 Schroeder CE, Lakatos P: Lowfrequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences. 2009, 32: 918. 10.1016/j.tins.2008.09.012.View ArticlePubMed
 Ince RAA, Petersen RS, Swan DC, Panzeri S: Python for information theoretic analysis of neural data. Frontiers in Neuroinformatics. 2009, 3 (4):
 Pyentropy. [http://code.google.com/p/pyentropy/]
 Schneidman E, Berry MR, Bialek W: Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006, 440 (7087): 10071012. 10.1038/nature04701.PubMed CentralView ArticlePubMed
 Shlens J, Field G, Gauthier J, Grivich M, Petrusca D, Sher A, Litke A, Chichilnisky E: The structure of multineuron firing patterns in primate retina. Journal of Neuroscience. 2006, 26 (32): 825410.1523/JNEUROSCI.128206.2006.View ArticlePubMed
 Tang A, Jackson D, Hobbs J, Chen W, Smith J, Patel H, Prieto A, Petrusca D, Grivich M, Sher A: A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. Journal of Neuroscience. 2008, 28 (2): 505518. 10.1523/JNEUROSCI.335907.2008.View ArticlePubMed
 Spike Train Analysis Toolkit. [http://neuroanalysis.org]
 Goldberg DH, Victor JD, Gardner EP, Gardner D: Spike Train Analysis Toolkit: Enabling Wider Application of InformationTheoretic Techniques to Neurophysiology. Neuroinformatics. 2009.
 Victor J: Binless strategies for estimation of information from neural data. Phys Rev E Stat Nonlin Soft Matter Phys. 2002, 66 (5 Pt 1): 05190310.1103/PhysRevE.66.051903.View ArticlePubMed
 Victor JD, Purpura KP: Nature and precision of temporal coding in visual cortex: a metricspace analysis. Journal of Neurophysiology. 1996, 76 (2): 13101326.PubMed
 Lazo A, Rathie P: On the entropy of continuous probability distributions (Corresp.). IEEE Transactions on Information Theory. 1978, 24: 120122. 10.1109/TIT.1978.1055832.View Article
 Rolls E, Grabenhorst F, Franco L: Prediction of Subjective Affective State from Brain Activations. Journal of Neurophysiology. 2009, 101: 12941308. 10.1152/jn.91049.2008.View ArticlePubMed
 Brown EN, Kass RE, Mitra PP: Multiple neural spike train data analysis: stateoftheart and future challenges. Nature Neuroscience. 2004, 7 (5): 456461. 10.1038/nn1228.View ArticlePubMed
 Buzsáki G: Largescale recording of neuronal ensembles. Nature Neuroscience. 2004, 7 (5): 446451. 10.1038/nn1233.View ArticlePubMed
 Csicsvari J, Henze DA, Jamieson B, Harris KD, Sirota A, Bartho P, Wise KD, Buzsaki G: Massively parallel recording of unit and local field potentials with siliconbased electrodes. Journal of neurophysiology. 2003, 90 (2): 13141323. 10.1152/jn.00116.2003.View ArticlePubMed
 Logothetis NK: What we can do and what we cannot do with fMRI. Nature. 2008, 453 (7197): 869878. 10.1038/nature06976.View ArticlePubMed
 Kraskov A, Quiroga RQ, Reddy L, Fried I, Koch C: Local field potentials and spikes in the human medial temporal lobe are selective to image category. Journal of cognitive neuroscience. 2007, 19 (3): 479492. 10.1162/jocn.2007.19.3.479.View ArticlePubMed
 Kreiman G, Hung CP, Kraskov A, Quiroga RQ, Poggio T, DiCarlo JJ: Object selectivity of local field potentials and spikes in the macaque inferior temporal cortex. Neuron. 2006, 49 (3): 433445. 10.1016/j.neuron.2005.12.019.View ArticlePubMed
 Delorme A, Makeig S: EEGLAB: an open source toolbox for analysis of singletrial EEG dynamics including independent component analysis. Journal of Neuroscience Methods. 2004, 134: 921. 10.1016/j.jneumeth.2003.10.009.View ArticlePubMed
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.