Higher-Order Correlations in Non-Stationary Parallel Spike Trains: Statistical Modeling and Inference

The extent to which groups of neurons exhibit higher-order correlations in their spiking activity is a controversial issue in current brain research. A major difficulty is that currently available tools for the analysis of massively parallel spike trains (N >10) for higher-order correlations typically require vast sample sizes. While multiple single-cell recordings become increasingly available, experimental approaches to investigate the role of higher-order correlations suffer from the limitations of available analysis techniques. We have recently presented a novel method for cumulant-based inference of higher-order correlations (CuBIC) that detects correlations of higher order even from relatively short data stretches of length T = 10–100 s. CuBIC employs the compound Poisson process (CPP) as a statistical model for the population spike counts, and assumes spike trains to be stationary in the analyzed data stretch. In the present study, we describe a non-stationary version of the CPP by decoupling the correlation structure from the spiking intensity of the population. This allows us to adapt CuBIC to time-varying firing rates. Numerical simulations reveal that the adaptation corrects for false positive inference of correlations in data with pure rate co-variation, while allowing for temporal variations of the firing rates has a surprisingly small effect on CuBICs sensitivity for correlations.

. In particular, whether or not coincident spikes of pairs of neurons participate in synchronized "cluster-events" cannot be decided on measurements of pairwise correlation alone; this can only be achieved by the systematic assessment of higher-order correlations, i.e., statistical couplings among triplets, quadruplets, and larger groups (Martignon et al., 1995;Staude et al., 2010). Importantly, the nonlinear dynamics of spike generation makes neurons extremely sensitive for synchrony in their input pools (Softky, 1995;König et al., 1996). Ignoring these higher-order correlations in the statistical description of spiking populations is therefore hardly advisable (Bohte et al., 2000;Kuhn et al., 2003).
Initially, the main obstacle for assessing the higher-order structure of neuronal populations were limitations in experimental methodology, as until recently state-of-the-art electrophysiological setups allowed to record only few neurons simultaneously. The advent of multi-electrode arrays and optical imaging techniques, however, now reveals fundamental shortcomings of available analysis tools (Brown et al., 2004). Mathematical frameworks to model and estimate higher-order correlations typically assign one "interaction parameter" for every subgroup of the population, leading to a 2 N − 1 dimensional model for a population comprising N neurons (Martignon et al., 1995(Martignon et al., , 2000. The associated estimation problem greatly suffers from this combinatorial explosion: the number of parameters to be estimated from the available sample size (a population of N = 100 neurons implies ∼10 30 parameters while 100 s of data provide only ∼10 6 samples) illustrates the principal infeasibility of this approach. In fact, the estimation of such higher-order

IntroductIon
It has long been suggested that fundamental insight into the nature of neuronal computation requires the understanding of the cooperative dynamics of populations of neurons (Hebb, 1949). A controversial issue in this debate is the role of correlations among nerve cells. On the one hand, an increasing body of both experimental (e.g., Gray and Singer, 1989;Vaadia et al., 1995;Riehle et al., 1997;Bair et al., 2001;Kohn and Smith, 2005;Shlens et al., 2006;Fujisawa et al., 2008;Pillow et al., 2008) and theoretical (Abeles, 1991;Diesmann et al., 1999;Kuhn et al., 2003) literature supports the concept of cooperative computation on various temporal and spatial scales. On the other hand, the mostly detrimental effect of correlations on rate-based information transmission and processing (Abbott and Dayan, 1999;Averbeck and Lee, 2006;Josic´ et al., 2009) has generated a strong opposition toward correlation-based concepts of cortical coding (Shadlen and Newsome, 1998;Schneidman et al., 2006;Ecker et al., 2010). Evidently, a thorough description of the correlation structure of neuronal populations is an indispensable prerequisite to resolve these opposing theoretical viewpoints (Brown et al., 2004).
Experimental reports on coordinated activity at the level of spike trains resort almost exclusively to correlations between pairs of nerve cells (e.g., Eggermont, 1990;Vaadia et al., 1995;Kreiter and Singer, 1996;Riehle et al., 1997;Kohn and Smith, 2005;Sakurai and Takahashi, 2006;Fujisawa et al., 2008;Ecker et al., 2010). Such pairwise correlations cannot, as a matter of principle, resolve the cooperative activity of neuronal populations to the extent required for rigorous hypothesis testing ; Martignon correlations runs into severe practical problems even for populations of N ∼ 10 neurons (Martignon et al., 1995(Martignon et al., , 2000Del Prete et al., 2004;Shlens et al., 2006;Montani et al., 2009). The severeness of this limitation is further underscored by the fact that the significance of higher-order correlations computed from small populations (N ∼ 10; Schneidman et al., 2006;Shlens et al., 2006) can generally not be extrapolated to large populations (Roudi et al., 2009). Taken together, while recent progress in experimental technique allows for the simultaneous recording of the spiking activity of tens to hundreds nerve cells, a faithful statistical description of the resulting activity that includes correlations of higher order is greatly hampered by the limitations of available data analysis techniques.
We have recently presented a novel method for a cumulant-based inference for the presence of higher-order correlations (CuBIC) that avoids the need for extensive sample sizes ). Instead of directly estimating correlation parameters from all subgroups, CuBIC aims only at population-average correlations, estimated via the cumulants of the pooled and discretely sampled spiking activity of all recorded neurons (population spike counts). The presence of higher-order correlations is then inferred from measured cumulants of low order by exploiting certain constraining relations among correlations of different orders in a statistical model of correlated spiking. CuBIC avoids the direct estimation of higher-order correlations, but decides whether or not lower order cumulants require the presence of higher-order correlations. Focusing on such less specific questions drastically reduces the requirements with respect to the sample size: when applied to artificial data, CuBIC reliably infers higher-order correlations from large (N  100), even weakly correlated populations (pairwise correlation coefficient c ∼ 0.01) that were generated with reasonable sample sizes (T < 100 s, Staude et al., 2009). As a statistical model, CuBIC employs the compound Poisson process (CPP), where correlations are induced by the insertion of coincident events in continuous time, i.e., before binning is applied (Ehm et al., 2007;Johnson and Goodman, 2007;Brette, 2009;Staude et al., 2010). Interestingly, this model of correlation fits perfectly to measuring (higher-order) correlations via connected cumulants of the binned spike trains (Staude et al., 2010), a common framework for (higher-order) correlation measures. The simple relationship of the unknown model parameters, i.e., the orders of correlation present in the data, and the observable cumulants of the population spike count allows to devise null-hypotheses concerning the orders of correlation in the data (the details of the CPP and CuBIC are explained in Section "The Stationary Case"). Combining tests against different null-hypothesis yields a lower bound ξ  for the maximal order of correlation in the data.
A central assumption in the original presentation of CuBIC  was that the statistics of spiking in the population does not change over time (stationarity). As both experimental cues and/or internal processes often induce transients or fluctuations of firing rates, this central assumption is frequently violated in electrophysiological data.
In the present study, we describe a non-stationary version of the CPP by decoupling the correlation structure from the spike intensity of the population (see Section "The Non-stationary Case"). Using the "law of total cumulance" we are able to incorporate nonstationarities in firing rate into the computation of the cumulants of the population spike counts. These rate-adjusted cumulants are then used to adapt CuBIC to infer higher-order correlations also from non-stationary data. This adaptation requires a specification of the kind of non-stationarity in terms of a parametric family of distributions for the bin-wise mean firing rates (the "carrier distribution"). Allowing for uniform rate fluctuations, for instance, yields as a result that the data must have correlations of some minimal order ξ  even if firing rates fluctuated uniformly from bin to bin. In this sense, the choice of a family for the carrier distribution implies a demarcation line between "genuine" correlation and "artifacts" due to rate (co-) variation (Staude et al., 2008). Numerical simulations reveal that the adaptation corrects for false positive inference of correlations in data with pure rate co-variation, while allowing for potential variations in firing rates has a surprisingly small effect on CuBIC's sensitivity for correlations (see Case Studies). Furthermore, we find that a perfect match between the true carrier family and the family allowed in the adapted CuBIC does not seem to be fundamentally important to guarantee reliable test performance.

Population spike count
The basic observable of this study is the pattern vector X(s): = (X 1 (s),…,X N (s)), where X i (s) is the discretized spike count of the ith neuron in the bin [sh,(s + 1)h) of width h (a complete list of symbols is provided in Section "List of Symbols" in Appendix). Given X(s), we define the population spike count Z(s) as the total number of spikes in the population in the sth bin (Figure 1) In the case where the X i are binary ("1" for one or more spikes in the bin, "0" for no spike), Z(s) is simply the number of neurons that spike in the sth bin. As opposed to other frameworks for correlation analysis (e.g., Aertsen et al., 1989;Martignon et al., 1995;Grün et al., 2002a;Nakahara and Amari, 2002;Shlens et al., 2006), however, the method presented in this study does not assume binary variables.
We here assume that Z(s) and Z(s + k) are independent for k ≠ 0 (zero memory). Furthermore, let us for now assume that the distribution of Z(s) does not depend on the time bin s (stationarity). This critical assumption will be relaxed in Section "The Non-stationary Case".

Correlations and cumulants
In the present framework, correlations among the variables X i are measured by mixed or "connected" cumulants. Like the more familiar (raw) moments E[Z m ] of a random variable Z, the univariate cumulants κ m [Z] characterize the shape of its distribution (see, e.g., Stratonovich, 1967;Gardiner, 2003). For the first two cumulants, the expectation value and the variance, the latter can be expressed in terms of the former by the well-known expressions κ 1 . Similar equalities for higher cumulants are exceedingly complicated, but algorithms for their computations are available (see Stuart and Ord, 1987 for explicit expressions for m ≤ 10, Section "Cumulants of the Non-stationary CPP" for a straightforward, and Di Nardo et al., 2008 for a more advanced algorithm). For notational consistency, we will from now on use the cumulant notation, e.g., use the terms "first/second cumulant" instead of the more familiar "mean/variance".  Staude et al., 2009), correcting higher cumulants for the influence of correlations of lower order is exceedingly complicated. We therefore employ a parametric model for Z, the CPP (see next section), the parameters of which can be interpreted straightforwardly in terms of higher-order correlations among the X i .
Before a discussion of our model, we wish to stress that the cumulant correlations presented here do not comply with the interaction parameters of the more familiar log-linear model. In particular, data sets can have higher-order log-linear interactions without having higher-order cumulant correlations and vice versa (see Staude et al., 2010 for concrete examples, and e.g., Darroch and Speed, 1983;Streitberg, 1990;Staude et al., 2009 for more general discussions).

The compound poisson process
As opposed to the discretized, binned population spike count Z(s) of the previous section, the proposed model operates in continuous, i.e., unbinned time. That is, we model the process z t denotes the ith unbinned, continuous-time spike train with spike event times t j i (i = 1,…,N, j ∈ N). The model we propose for z(t) is that of a CPP Multivariate, or "connected", cumulants arise when the variable under consideration is a sum of correlated variables. For m = 2 and Z X i N i = = Σ 1 , for instance, we have the well-known formula Hence, the second-order cumulant correlations Cov[X i ,X j ] = κ 1,1 [X i ,X j ] measure the degree of additive linearity of κ 2 [Z]. Higher-order cumulant correlations are generalizations of the covariance in exactly this sense, and mth order correlations arise when κ m [Z] is decomposed into expressions involving the individual X i . The following definition fixes the notation used in the remainder of this study, for a precise definition we refer to the literature (e.g., Stuart and Ord, 1987;Gardiner, 2003;Staude et al., 2009; for details on cumulant correlations see also Streitberg, 1990;Staude et al., 2010).  [X]. We say that X has correlations of order k if and only if at least one kth order connected cumulant of X is non-zero.
The following generalization of Eq. 1 is a straightforward consequence of the construction of connected cumulants (Staude et al., 2010). Poisson fit, corresponding to an independent population with the same firing rates). To construct a population of correlated spike trains, amplitudes a j are drawn for all events t j in the carrier process i.i.d from f A . Individual processes x i (t) are constructed by assigning subsequent events of the carrier process z(t) into a j "child" processes x i (t) (here, events are assigned randomly to the specific process IDs). Correlations of order ξ are induced, whenever events in the carrier process are copied into more than ξ processes, i.e., if the amplitude distribution assigns non-zero probabilities for amplitudes ≥ ξ.
Non-stationary higher-order correlations cuBIc This section summarizes the stationary version of CuBIC to the extent that is needed to understand its adaptation to non-stationary populations (see Staude et al., 2009 for details). In brief, CuBIC quantifies the following thought experiment. Consider the situation of four simultaneously recorded neurons, where all neuron pairs have a correlation coefficient of c = 1. As c = 1 implies identity for all pairs of spike trains, all four spike trains must in fact be identical. In the framework of the CPP, this translates to the existence of events of amplitude a j = 4, and hence correlation of order 4 (Theorem 1). This illustrates that it is possible, in principle, to infer the existence of fourth order correlations from estimated pairwise correlations. In Staude et al. (2009), this inference was generalized by a hierarchy of statistical hypothesis tests H m 0 ,ξ , labeled by the order m of the correlation estimated from the population spike count, and the test parameter ξ, which indicates the maximal order of correlation allowed in the null hypothesis. For given m and ξ, the rejection of a hypothesis H m 1 as a lower bound on the order of correlation in the data. In the thought experiment above, we estimated pairwise correlation, hence m = 2, and rejected tests with ξ = 1,2,3, such that ξ  2 4 = . In principle, the order of the estimated correlation m is a free parameter. However, as shown in Staude et al. (2009), tests with m = 3 are already extremely sensitive, such that we will present both the stationary CuBIC and the non-stationary adaptation only for the case m = 3.
Assume one is given the first three cumulants of a population spike count variable Z′. Then, for a fixed value of the test parameter ξ, consider the following constrained maximization problem κ κ * ν 3 3 , , : where Z is the population spike count of a model with parameters ν and f A . The model that solves Eq. 5 has the maximal third cumulant, i.e., triplet correlations, among all models that do not have any correlations beyond order ξ, and have the same populationaveraged first-and second-order properties, i.e., firing rates and pairwise correlations, as the given spike count variable Z ′. As a consequence, κ κ ′ 3 3

> [ ]
Z implies that the third order correlations in Z ′ cannot be realized with correlations of orders ≤ξ. Thus Z ′ must have correlations of order ≥ξ + 1.
To solve Eq. 5, we use Eq. 4 and obtain the equivalent problem In Eq. 6, the objective function and the constraints depend linearly on the model parameters  ν ξ . Problems of this type, so-called Linear Programming Problems, are uniquely solvable, e.g., by the Simplex where the event times t j constitute a Poisson process, and the marks a j are i.i.d. integer-valued random variables, drawn independently for all t j (Figure 1, left). The marks a j determine the number of neurons that fire at time t j , and will be referred to as the "amplitude" of the event at time t j . The probability that an event has a specific amplitude is determined by the amplitude distribution f A , i.e., f A (ξ) = Pr{a j = ξ} for all j ∈ N (Figure 1, top right). The Poisson process that generates the events t j is called the "carrier process" of the model and its rate ν is the "carrier rate". Processes of this type are also referred to as generalized, or marked, Poisson processes (see e.g., Snyder andMiller, 1991 for a general definition, andEhm et al., 2007 for an alternative application to spike train analysis).
With the above model, the generation of a population of spike trains proceeds in two steps. First, realize a Poissonian carrier processes m(t) = Σ j δ(t = t j ) and draw for each of its events t j an i.i.d. amplitude a j from the amplitude distribution f A . In the second step, assign the spike at t j to a j individual processes, where the process IDs are determined from a separate "assignment distribution". The simplest scenario assumes uniform assignment, where the a j neuron IDs that receive the spike at t j are drawn randomly from {1,…,N}, resulting in a homogeneous population. As CuBIC only aims for a lower bound on the order of correlation, irrespective of the neuron IDs that realize these correlations, we here ignore the assignment distribution, and focus on the amplitude distribution only. The following theorem clarifies the relationship between cumulant correlations and the amplitude distribution in the framework of the CPP (see Staude et al., 2009 for a proof).
Note that correlations in the above theorem are measured strictly on the basis of the discretized counting variables X i . As a consequence, they do not resolve (and do not depend on) the perfect temporal precision of the coincident events in the CPP. That is, if the events of z(t) were assigned to the individual processes with a temporal jitter that is small with respect to the bin size h, the effect of the jitter on the correlations is negligible. Now let ξ ≤ N be the maximal order of correlation in the model, i.e., f A (k) = 0 for k > ξ, and denote by ν k (k = 1,…ξ) the compound rates of events of amplitude k, i.e., ν k = ν·f A (k). Then Eq. 3 can be written as is the vector dot product.
Non-stationary higher-order correlations be included into the CPP by allowing a time-dependent carrier rate, amplitude distribution, assignment distribution, or combinations thereof. Rather than presenting a general model for non-stationary populations, the focus of this study is to adapt the analysis technique CuBIC for potential variations in the firing rates. CuBIC, however, aims only at the population-average correlation structure f A , and inference is based on the population spike count Z. Time dependencies in the assignment distribution only change the neuron IDs that realize correlations over time, but do no alter the order of correlations. As a consequence, the population spike count Z is not influenced by potential temporal variations of the assignment distribution (see Figure 2A). Furthermore, CuBIC aims only at a lower bound for the maximal order of correlation, not on the precise values of correlations of different orders. As such, it aims for the largest entry with non-zero probability in the amplitude distribution, irrespective of whether this entry was present in the whole data stretch or if it occurred only within a short period. Taken together, CuBIC is blind for non-stationarities in the amplitude distribution and the assignment distribution. We therefore assume both these objects to be constant over time and consider only time-varying carrier rates ν(t) (see top panels in Figures 2B,C).

cumulants of the non-statIonary cpp
To relate the model parameters to the cumulants of Z for time varying ν(t), observe that the value of Z(s) in the window [sh,(s + 1)h) does not depend on the precise time course of the carrier rate ν(t) in this window, but only on the integral R h t dt s s h s h : / ( ) .
( ) = ∫ + 1 1 ν Substituting the carrier rate ν(t) with a piecewise constant function whose value in the interval [sh,(s + 1)h) is R s thus results in an identical population spike count Z. Furthermore, CuBIC ignores the temporal order of Z(1),…,Z(L) and assumes subsequent values of Z to be i.i.d. variables. As a consequence, CuBIC is also blind for the temporal order of the rate values R s , and we therefore assume them to be i.i.d. with a common "carrier distribution" f R (compare panels B and C of Figure 2).
The above setting characterizes the population spike count Z as a parameter-dependent random variable, where the outcome of Z in the sth bin depends not only on the outcome of the CPP realization, but also on the (random) value of the rate variable R s . For such "doubly stochastic" variables, the raw moments are given as where the inner expectation is the expectation value of Z m for a given value of the rate R, and the outer expectation is with respect to the distribution f R . Now recall the definition of the moments as the coefficients of the Taylor series expansion of the characteristic function Method or any of its variants (Press et al., 1992, Chapter 10.8). The solution yields an upper bound for the third cumulant κ * 3,ξ and the corresponding parameter vector  ν * ξ . As it turns out, the only non-zero components of the solver  ν ν ν * * * ξ ξ = … ( , , ) 1 are ν * 1 and ν * ξ , i.e., ν * k = 0 for all k ∉{1,ξ} (see The Solution of the Maximization in Appendix). The carrier rate and amplitude distribution of the CPP that maximizes Eq. 6 are then given by ν ν ν * With the solution of Eq. 5, the null hypothesis H 0 3,ξ is against H 0 3,ξ , we estimate its cumulants by the so-called k-statistics k m (Stuart and Ord, 1987; the well-known sample mean and unbiased sample variance are the first two k-statistics). To derive the required distribution of the test statistic k 3 under H 0 3,ξ , we assume that Z ′ is the population spike count of the model with parameters ν * and f A * , i.e., the solution of Eq. 6 after the unknown cumulants κ 1 [Z ′] and κ 2 [Z ′] have been replaced by their estimates k 1 and k 2 . Thus, under H 0 3,ξ the distribution of k 3 has expectation value κ * 3,ξ , and its variance is given as (e.g., Stuart and Ord, 1987) Var where κ i [Z ′] are the cumulants of Z ′, obtained by inserting ν * and f A * into Eq. 3. Finally, with sample sizes of L > 10000 (corresponding to a data set of 10 s duration, sampled with a bin width of h = 1 ms), the distribution of k 3 is well approximated by a normal distribution, such that the p-value of H 0 3,ξ is given by As mentioned above, the rejection of a specific hypothesis H 0 3,ξ implies that the data have correlations beyond order ξ or, in other words, that ξ + 1 is a lower bound for the order of correlation. The final result of CuBIC is the maximum of these lower bounds where α is a predefined test level (see Staude et al., 2009 for details).

the non-statIonary case the model
In the previous section, the CPP was presented as a model for populations with constant firing rates. However, (co-)variations in firing rate are a common feature of neuronal populations. To incorporate potential non-stationarities into the CPP, recall its central ingredients: the intensity of the population is described by the carrier rate ν, the population-averaged correlation structure is determined by the amplitude distribution f A , and the precise composition of spikes and correlations within the population is determined by the assignment distribution. Given this parametrization, non-stationarities can in principle With these maps at hand, the mth cumulant of the non-stationary CPP can be computed by the following procedure.
1. For i = 1,…,m express the conditional moment μ i [Z | R] in terms of the first i cumulants, i.e., write The results are summarized as the "law of total cumulance". The first three orders read A similar parameter transformation that leads from Eq. 3 to Eq. 4 simplifies Eqs. 18-20. In slight abuse of notation, we use the same symbol for the expected compound rates as for the constant compound rates of Eq. (4), i.e., write ν k : = f A (k)·κ 1 [R]. Using the such that the moments can be obtained from Analogously, cumulants are the coefficients of the logarithm of the characteristic function such that A straightforward strategy to relate cumulants to moments is to insert Eq. 11 into the right hand side of Eq. 12, writing the logarithm as a power series, and collecting coefficients for identical powers of s. This procedure illustrates in particular that the mth cumulant is a function of the first m raw moments only, and, reversely, that the mth moment can be expressed as a function of the first m cumulants. We denote the maps relating cumulants to moments and moments to cumulants by F m and G m , respectively, such that in Hz (top panels), the distributions of the bin-wise mean carrier rates ("carrier distributions" f R , small panels on the right; bin size h = 20 ms), the event-times of the carrier process z(t) (second panels from top), raster plots (third panels from top) and population spike counts Z(s) (bottom panels) for three different data sets. (A) Time varying assignment distribution produces non-stationary single processes, although the carrier process z(t) is stationary (constant carrier rate ν(t) = 50 Hz; all carrier events with t j < 1 s are assigned randomly, all carrier events with t j ≥ 1 and amplitude a j = 1 are assigned to neuron 3).

Gamma-distributed carrier rate
If the carrier variable R follows a Gamma-distribution, the third cumulant can be expressed in terms of the first two as κ κ κ For the normalized third cumulant β 3 we thus have β β As for symmetric carrier distributions, the objective function in Eq. 25 is linear in the expected component rates ν k (k = 1,…,ξ) and quadratic in β 2 . Thus, the resulting problem is also a convex quadratic programming problem and can be solved with the same solvers.
We wish to stress once again that the choice made here concerns only the family of the carrier distribution, not its particular shape. That is, if we chose e.g., a uniform distribution, we only determine that κ 3 [R] = 0 but do not fix the support of the distribution. Finally, we note that the only non-zero components of the solution to the maximization problem are ν * 1 and ν * ξ , i.e., ν * k = 0 for all k ∉{1,ξ}, as in the stationary case (see The Solution of the Maximization in Appendix).

Variance of test statistic
As a final ingredient, CuBIC requires the variance of the test statistic k 3 in order to be able to compute the p-values. Assuming that the solution of Eq. 25 is available, we compute the second, fourth and sixth cumulant of the solution by inserting the solving  ν * ξ and β 2 * into the algorithm leading to Eq. 17. Plugging these cumulants into Eq. 7 yields Var[k 3 ], the variance of the test statistic k 3 under the non-stationary nullhypothesis. Insertion into Eq. 8 yields the corresponding p-value.

case studIes
To illustrate the application of the adapted CuBIC, we consider here two typical experimental situations. In the first situation, data are recorded in early sensory systems, where characteristic firing rate profiles closely follow the stimulus. In such a scenario, information about the rate distribution can be obtained from the type of the stimulus. In the second situation, there is no direct relation between the experimental paradigm and non-stationarities in firing rates. In this case, ad hoc assumptions of the family of carrier rate are the only option. We illustrate both scenarios and the steps required when applying the nonstationary version of CuBIC by analyzing simulated spike trains.

Pure non-stationarity
Our first example mimics a recording in visual cortex with a oriented sinusoidal grating as stimulus. We model the population response in such an experiment by a CPP model with sinusoidal carrier rate ν(t), i.e., where B is the offset, C ≤ B is the amplitude of the modulation, ω is the temporal frequency of the stimulus and d is the phase, i.e., the sum of the stimulus phase and the delay it takes for the stimulus-driven activity to propagate to the recording electrodes cuBIc for non-statIonary data To adapt CuBIC to non-stationary populations, we need to formulate the general maximization problem (Eq. 5) for the case of timedependent carrier rates. Using Eqs. 21-23 instead of Eq. 4, we obtain As opposed to the Linear Programming Problem of the stationary case (Eq. 6), the constraints in Eq. 25 do not apply to all free variables: the third standardized cumulant of the rate variable β 3 can, in principle, be arbitrarily large. As a consequence, also the objective function in Eq. 25 is unbounded. We therefore have to impose additional constraints on the carrier distribution f R in order to ensure convergence of the maximization. The approach taken here is to prescribe a two-dimensional parametric family for the distribution of the rate variable, such that its third (standardized) cumulant can be expressed in terms of its first two cumulants. The choice for the family of carrier distributions determines the form of the objective function. Let us present two specific cases (see Section "Discussion" for more details on the role of this choice).

Symmetric carrier distributions
If the carrier distribution is symmetric about its mean, like e.g., the uniform distribution, we can exploit the fact that the third cumulant κ 3 [R] of symmetric distributions vanishes 1 . As a consequence, also β 3 = κ 3 [R]/κ 1 [R] 3 = 0. The objective function of the maximization problem (Eq. 25) for symmetric carrier distributions thus becomes This objective function is linear in the (expected) compound rates ν k (k = 1,…,ξ) and quadratic in β 2 . As the constraints are also linear, using Eq. 26 as the objective function in Eq. 25 yields July 2010 | Volume 4 | Article 16 | 8 Staude et al.
Non-stationary higher-order correlations panel). We found that only the p-value with ξ = 1 fell below the chosen significance level of α = 0.05 (Figure 3, green arrowhead in bottom left panel). Hence, H 0 3 1 , was rejected while all hypotheses with ξ > 1 were retained. As a consequence, the stationary CuBIC yields ξ ξ α ξ  = < + = max{ | } , p 3 1 2 (Eq. 9) as a lower bound for the maximal order of correlation. As the data do not contain correlations beyond those induced by the population-wide non-stationarity, the stationary CuBIC thus leads to false positive inference.
Application of the adapted CuBIC requires the specification of a parametric family of carrier distributions f R ("carrier family"). We put ourselves in the position of an experimenter, to whom the carrier rate is unknown and cannot be estimated directly; the only observable quantity is the population activity, which is a combination of the carrier rate and potential events of high amplitude. However, as the stimulus was a cosine, we assume also the carrier rate to be a cosine as in Eq. 28, only that the parameters B, C and d are unknown (ω can be estimated from the stimulus frequency  (Figure 3, top panels of left and right columns). The first data set of this example models the case of pure rate non-stationarity (Figure 3, left column). The amplitude distribution is a delta-peak at ξ = 1, such that all events of the marked process z(t) have amplitude a j = 1. As a consequence, z(t) is a simple Poisson process with rate ν(t).
Setting B = C = 500 and ω = 500 ms, d = 0, and assigning the carrier spikes uniformly to a population of N = 50 spike trains, we obtain a homogeneous population where each neuron oscillates with a temporal frequency of 2 Hz, assuming firing rates between 0 and 20 Hz (second panel from top of Figure 3, left). Note that assigning the events of z(t) to individual spike trains was done only for illustrative purposes, as it does not influence the results of CuBIC.
To analyze the data, we chose a bin size of h = 5 ms, computed the population spike count Z(s), and estimated its distribution f Z (k) = Pr{Z = k} from a total simulation time of T = 100 s (Figure 3, third and fourth panels from top, respectively). Note that the population spike count Z does not provide unambiguous information about the carrier rate. Next, we applied the stationary version of CuBIC with m = 3. That is, we ignored the apparent non-stationarities that are present in the data and computed p-values p 3,ξ for ξ = 1,…,7 as described in Section "CuBIC" (Figure 3 Outlined bars and arrowheads in bottom panel show results of data where interspike intervals below 2 ms were removed before the analysis. Data with pure rate non-stationarity (left column) have a sinusoidal carrier rate ν(t) (top panel) and an amplitude distribution with mass concentrated at ξ = 1 (f A (k) = 0 for k >1; see text for details). Pure correlation (middle column) is modeled with a stationary carrier rate and correlation up to order 7 (ν(t) = const., f A (k) = 0 for k∉{1,7}). The probability for the high-amplitude events results in a pairwise correlation coefficient of c = 0.01 if the events of the carrier process are distributed uniformly among the processes N = 50. The combined data set with non-stationarity and correlation (right column) has the same correlation structure as the data in the middle column, and the same carrier rate as in the first column.

Non-stationary higher-order correlations
In this data set, the stationary CuBIC rejects all hypotheses up to a value of ξ = 6. Hence, CuBIC performs optimally in this data set, as the resulting lower bound ξ  = 7 corresponds to the maximal order of correlation ξ syn = 7 in this data set.
To our big surprise and satisfaction, p-values for the adapted version of CuBIC were very similar to those of the stationary CuBIC (Figure 3, bottom panel in middle column, red and green bars respectively). In particular, the adapted CuBIC rejected the same hypotheses, and, as a consequence, also yielded the same lower bound ξ  = 7. Contrary to our assumption, the proposed adaptation thus did not compromise CuBIC's sensitivity in this stationary data set.

Combined effects
Finally, we generated a third data set that combined the properties of the first two examples (Figure 3, right column). The amplitude distribution was the same as in the example of pure correlation, i.e., with an additional entry at ξ syn = 7, while the carrier rate was the same cosine as in the data with pure non-stationarity (Figure 3, top panels of right and left columns, respectively).
For this data set, we expected the stationary CuBIC to overestimate the order of correlation, i.e., to yield ξ ξ  > = syn 7, as the considerable rate co-variation produces correlation among the X i in addition to the events of high amplitude (Staude et al., 2008). To the contrary, however, p-values for the stationary CuBIC fell below the significance level only up to ξ = 4 (Figure 3, green arrowheads in the bottom panel of the left column), yielding ξ  = 5. Allowing for cosine-like non-stationarities in the null-hypotheses reduces the lower bound to ξ  = 3 (Figure 3, red arrowheads in the bottom panel of the left column). Thus, rather than a false positive inference of correlation, the additional non-stationarity resulted in a reduction of the inferred order of correlation as compared to the stationary scenario.

Different carrier family
To investigate the robustness of the proposed adaptation with respect to faulty choices for the carrier family, we also analyzed the data under the assumption that the carrier rate switches between a state of low rate, ν min , and of high rate, ν max . This can be described by a bimodal carrier distribution f R (ν;ν min ,ν max ,η) = (1 − η)δ(ν − ν min ) + ηδ(ν − ν max ), where δ(ν) is the Dirac-delta function and η∈[0,1] parametrizes the relative proportion of bins where ν(t) is in the low (high) rate state. Inspecting the time course of Z(s), we chose the mass of the two peaks identical (η = 0.5), which leaves a two-parameter family for f R (its cumulants are derived in Section "Cumulants of Carrier Distributions" in Appendix). For the data shown in Figure 3, allowing for such drastic rate dynamics hardly changed p-values as compared to the cosine carrier rate (compare red with blue bars in bottom panel of Figure 3). Most importantly, the bimodal adaptation resulted in the same lower bounds ξ  as the cosine adaptation.

Gamma-distributed carrier rate
In the examples of the previous section, we inferred the type of non-stationarity from the experimental paradigm, i.e., from the properties of the stimulus. In many experimental situations, such samples the cosine faithfully, such that both the phase offset d and the oscillation frequency ω do not influence f R . In this case, the moments (and cumulants) of all orders can be computed from the parameters B and C in a straightforward manner (see Cumulants of Carrier Distributions in Appendix). In our example, f R is symmetric, which implies β 3 = 0 (see Symmetric Carrier Distributions). Furthermore, as the carrier rate ν(t) has to be non-negative, the model parameters must satisfy B ≥ C, which implies β 2 ≤ 1/2. The red bars of the lower left panel in Figure 3 show the p-values computed from Eq. 8, after (a) the solution of stationary problem, κ * 3,ξ , has been substituted with the solution of Eq. 25 with objective function Eq. 26 and additional constraint β 2 ≤ 1/2, and (b) Var[k 3 ] was computed with the algorithm explained in Section "Variance of Test Statistic" with the cumulants of the rate variable R given in Section "Cumulants of Carrier Distributions " in Appendix. We find that p 3,ξ > 0.05 for all ξ = 1,…,7, hence no hypothesis is rejected and the lower bound is ξ  = 1. Thus, the adapted CuBIC does not infer correlation in this data set, and the adaptation successfully corrects for the faulty inference of correlation of the stationary version.

Reduced sensitivity in stationary data?
In the first data set, the stationary CuBIC assigned the rate-generated correlation among the counting variables X i to events of amplitude ≥2. Allowing for potential (co-)variations of firing rates in the adapted CuBIC corrected for this faulty inference of correlations. Mathematically, allowing for rate (co-)variations allows non-zero values of the parameter β 2 in the maximization of the third cumulant (Eq. 25). Maximizing over a larger parameter space may then increase κ * 3,ξ ,ns as compared to the stationary maximum κ * 3,ξ . Thus, hypotheses that are rejected by the stationary CuBIC may be retained by the adapted version, as the latter can produce larger maximal third cumulants for a given value of ξ. Consequently, we expected the adapted version to be generally less sensitive for existing events of high amplitude. To investigate this issue, we generated a data set of pure correlation by choosing a constant carrier rate, but allowing for events of amplitude ξ syn = 7 on top of the "background spikes" with ξ = 1 (Figure 3, middle column). The probability of these events was set to f A (7) = 0.0125, which resulted in a average pairwise correlation coefficient of c = 0.01 if the events of z(t) were assigned homogeneously to a population of N = 50 spike trains. The value for ν = 500 Hz was chosen to match the average carrier rate of the first example. Note that the additional events of amplitude ξ = 7 resulted in a slight increase of the population spike count from κ 1 [Z] = κ 1 [R]hμ 1 = 500·0.005·1 = 2.5 in the first example to κ 1 [Z] ≈ 500·0.005·1.0753 ≈ 2.7 in this example, and thereby also increased the firing rates of the N = 50 spike trains. The remaining parameters in this and all following examples were: bin width h = 5 ms, simulation time T = 100 s.
Compared to the size of the population (N = 50), the rate of the high-amplitude events (ν 7 = κ 1 [I]·f A (7) ≈ 6 Hz) is relatively small. As a consequence, these are hardly visible in the raster displays and population spike counts (Figure 3, second and third panels from top in the middle column). In the distributions of the population spike count, they lead to a slight increase for the frequency of patterns of size k ≈ 10, visible only on a logarithmic scale (Figure 3, middle column, green solid line in fourth panel from top).
the three examples shown in Figure 4 are as in those of Figure 3. In data with pure rate non-stationarity (Figure 4, left column) the amplitude distribution has an isolated peak at ξ = 1. For pure correlation (Figure 4, middle column), the carrier rate is constant [ν(t) = 500 Hz] and the amplitude distribution has an additional peak at ξ syn = 7 that resulted in a pairwise correlation coefficient of c = 0.01 among the N = 50 spike trains. Finally, the combined data set has the same time-varying carrier rate as the purely nonstationary data, and the same amplitude distribution as the data with pure correlation (Figure 4, right column).
The gamma-distributed rate variable generates strongly fluctuating rate profiles, with peak values of the carrier rate above 3000 Hz. This leads to strong fluctuations in the population spike count even in the case of pure rate non-stationarity (Figure 4, third panel from top in left column), such that its distribution has a fairly elongated tail (Figure 4, fourth panel from top in left column).

Results of the stationary CuBIC
Despite quantitative differences, stationary CuBIC performs qualitatively similarly for gamma-and cosine-like carrier rate. For pure rate non-stationarity, it wrongly interprets the rate-induced correlations among the X i as events of high amplitudes. The nullhypotheses are rejected up to ξ = 3, yielding a lower bound of ξ  = 4 (Figure 4, green arrowheads in the bottom panel of the left column). Similarly, adding gamma-like non-stationarities to a data set with correlation decreases the inferred lower bound, here from ξ  = 7 in the stationary data (Figure 4, green arrowheads a priori assumptions on the rate variable cannot be justified, and changes in firing rates can have diverse statistical properties. General excitability of the tissue can change firing rates either in the form of slow drifts or abrupt transitions between so called up-and down-states. Furthermore, both local computations and additional cortical or sub-cortical inputs may change firing rates of the recorded population. A further feature of the previous examples was that the carrier rate ν(t) changed rather slowly. As mentioned above, however, the temporal dynamics of the carrier rate does not influence the statistics required for CuBIC, i.e., the population spike count Z, as long as the distribution of the rate values per bin, f R , is not altered.  Figure 2C). Here, f R is a gamma distribution, i.e., f r k If p-values changed, they increased, which makes CuBIC more conservative but does not generate false positives. From all simulation, the increased p-value reduced the lower bound ξ  only in a single instance (cosine rate modulation with correlation, analyzed with stationary CuBIC at ξ = 4; green arrowhead in bottom right panel of Figure 3); in all remaining cases the refractory period changed p-values where they were above the significance level of α = 0.05. We thus conclude that CuBIC is reasonably robust if spike deviate from Poisson processes in terms of short refractory periods (here 2 ms).

dIscussIon
The analysis of massively parallel spike trains for higher-order correlations is a prerequisite for understanding the mechanisms of cooperative neuronal computation in the brain. However, analysis techniques to estimate higher-order correlations typically require enormous sample sizes, rendering the analysis of large (N > 10) populations for higher-order effects extremely difficult. In Staude et al. (2009), we have presented a novel method (CuBIC) that avoids the need for extensive sample sizes. Numerical simulations illustrated that CuBIC reliably infers correlations of very high order (>10) from recordings of large (N ∼ 100), even weakly correlated neuronal populations (average pairwise correlation coefficient c < 0.01) with sample sizes corresponding to 10-100 s recording time. As a statistical model, CuBIC assumes the CPP, where correlation among the spike trains is modeled by the insertion of additional coincident events (Ehm et al., 2007;Johnson and Goodman, 2007;Brette, 2009;Staude et al., 2010). The linear relation between the model parameters, i.e., the orders of correlation present in the data, and the cumulants κ m [Z] of the population spike count Z allows the computation of the maximal value of κ m [Z] under the assumption that the orders of correlation in the data do not exceed a predefined value ξ. Comparing the estimated cumulants to these upper bounds then yields a collection of statistical hypothesis tests H m 0 ,ξ , labeled by m, the order of the estimated cumulant, and ξ, the maximal order of correlation allowed in the null-hypothesis. In this paper, we chose a fixed value of m = 3, and for given ξ, the rejection of H 0 3,ξ implies that the data must have correlations of order at least ξ + 1 (for a discussion on the choice of m see below). A combination of tests with different values for ξ finally yields a lower bound for the maximal order of correlation, denoted by ξ  . For a discussion of the properties and limitations of the CPP (e.g., positivity of correlations), general issues concerning CuBIC, and the relationship between cumulant-correlations and the higherorder parameters of the log-linear model used by, e.g., Martignon et al. (1995), Schneidman et al. (2006), Shlens et al. (2006), we refer to the extensive discussion of Staude et al. (2009). The latter point is discussed in more detail also in Staude et al. (2010). In this section, we focus on issues that relate directly to the novelties of the present study.
Before going into detail, we need to make a general remark. CuBIC is a parametric procedure, and assumes that the data, i.e., the population spike count, can be described sufficiently well by a discretized, potentially non-stationary CPP. If this model does not fit, results of CuBIC may be unreliable. The extent to which results are reliable then depends on the mismatch between the CPP and the data. In practice, where single spike trains deviate from Poisson in the bottom panel of the middle column) to ξ  = 6 in the non-stationary data (Figure 4, green arrowheads in the bottom panel of the right column).

The adapted CuBIC
As opposed to Section "Stimulus-driven Non-stationarity" where properties of the carrier rate could be inferred from the stimulus, we here cannot make qualitative guesses about the type of nonstationarity. The fact that firing rates fluctuate on a bin-to-bin basis makes it very difficult to infer the type of non-stationarity from the raster plots, the population spike counts Z or its distribution f Z . As a consequence, we can only make ad hoc assumptions on f R . We consider two cases. First, we allow f R to be a uniform distribution (Figure 4, red bars and arrowheads in bottom panels). As the uniform distribution is symmetric, we use Eq. 26 as the objective function with the additional constraint β 2 ≤ 1/3 imposed by the non-negativity of the carrier rate. The cumulants of the uniform distribution (see Cumulants of carrier distributions in Appendix) are then used for the computation of Var[k 3 ]. Second, we allow for gamma-distributed non-stationarities, where we use Eq. 27 as the objective function and the cumulants of the gamma-distribution for the computation of Var[k 3 ] (Figure 4, blue bars and arrowheads in bottom panels).
For the data with pure rate non-stationarity (Figure 4, left column), allowing a uniform rate variable rejects hypotheses up to ξ = 3 and thus yields a lower bound of ξ  = 4 . Allowing f R to be gamma-distributed, on the other hand, produces p-values above 0.05 for all ξ = 1,…,7, thereby retaining all hypotheses. Thus, while the uniform null-hypotheses fail to reduce the lower bound as compared to the stationary version, allowing for the true, gammadistributed non-stationarities corrects for false positive inference of correlation.
For data with correlation, the choice of the carrier distribution has only little influence on the resulting p-values (Figure 4, bottom panels in middle and left column). For pure correlation, the lower bounds are identical for all three rate models (ξ ξ  = = 7 syn ). In the combined data set (Figure 4, right column), the additional nonstationarity reduces the lower bound as compared to the stationary data with correlation to ξ  = 6, irrespective of the non-stationarity allowed in the null-hypotheses.

refractory perIods
Besides the stationarity, CuBIC's second central assumption is that spike trains of single neurons can be described as Poisson processes, i.e., have exponential interspike interval (ISI) distributions. While tails of ISI distributions can often be relatively well described by exponentials, the high probability for short intervals of the Poisson process contradicts the absolute refractory period of a few milliseconds found in most nerve cells. We investigated the extent to which refractoriness influences test results of CuBIC by re-analyzing the data of the previous sections after short ISIs were removed. Specifically, for each data set we assigned the events of the carrier process randomly to the N = 50 spike trains, removed spikes of all spike trains that occurred with a temporal difference of τ ≤ 2 ms, and constructed the population spike counts of these thinned spike trains. The analysis of the refractory data showed that introducing refractoriness has a very limited effect on test results (outlined bars and arrowheads in lower panels of Figures 3 and 4).
Non-stationary higher-order correlations correctIng for rate effects Classically, correlations are measured on a small time scale, and subsequently corrected for effects from the firing rates. The estimation of the firing rate, in turn, proceeds along one of the following two lines. Either, rate variations are identified with artifacts that are locked to the stimulus or some other external cue. Then, firing rates are estimated by averaging over repeated presentation of the stimulus, or trials (e.g., Perkel et al., 1967a,b;Aertsen et al., 1989). Alternatively, changes in firing rates are assumed to fluctuate only on slow time scales; they are then estimated by averaging over time.
Although combinations, refinements and optimizations of both strategies have been developed (e.g., Grün et al., 2002b;Ventura et al., 2005;Shimazaki et al., 2009), we wish to stress that all such corrections make strong a priori assumptions on the differences between "genuine" correlations and "artifacts" induced by nonstationary firing rates (see also Staude et al., 2008). The correction of CuBIC, i.e. the choice of a carrier family, follows a fundamentally different strategy. Rather than assuming that firing rate profiles are identical over trials (first strategy) or that rate-fluctuations are band-limited (second strategy), the carrier family limits the extent to which correlations among the counting variables X i are assigned to rate-variations. As discussed above, this choice ensures boundedness of the third standardized cumulant β 3 = κ 3 [R]/κ 1 [R] 3 of the rate variable R. As κ 3 [R] measures the skewness of the carrier distribution, large values of β 3 imply a long right tail. As a consequence, a population with large β 3 has bins with very large values of the carrier rate R. If binned firing rates can assume arbitrary high values, however, the difference between global rate variations and precise spike coordination vanishes. Thus, the choice of the carrier family determines the extent to which one assigns patterns with high spike counts to (co-)variations of firing rates. In other words: the carrier family sets CuBICs demarcation line between rate co-variation and genuine correlation.
In Section "Case Studies", we illustrated how CuBIC operates in two alternative scenarios. In Section "Stimulus-driven Nonstationarity", we assumed that properties of the firing rates can be inferred from the stimulus. Here, the stimulus was a cosine, but the reasoning there can be generalized to a broad class of stimuli. The effect of oriented bars, for instance, might be modeled by a bimodal carrier distribution as in Section "Different Carrier Family", where the mixing parameter η determines the relative duration of the respective stimulus phases (light/dark). The carrier family of Eq. 29 also describes experiments with a well-defined stimulus onset, as e.g., odor presentation or whisker stimulation. In such a scenario, η has to be chosen as the relative duration of the stimulus-off/stimulus-on epoch. Furthermore, properties of the carrier family in "free viewing" conditions might be estimated from the statistical properties of the visual scene.
In the second scenario (see Internally Generated Nonstationarities), the experimental paradigm did not provide information about the carrier family. Here, we argued that ad hoc assumptions are the only option. However, if the activity of individual neurons is available, their statistics can provide additional information that can be exploited. If, for instance, individual spike processes, for example due to refractory properties, this mismatch may be evaluated by analyzing surrogate data (e.g., Grün, 2009). If, for instance, CuBIC returns ξ  = 10 in a data set of non-Poissonian spike trains, and the analysis of surrogate data with the same single process properties but without correlation yields ξ  = 1, we can conclude that the value of ξ  = 10 is really due to existing correlations. This kind of analysis, however, has to be performed specifically for a given data set. As a first step, we have here investigated the effect of a spike train's most obvious deviation from the Poisson process: absolute refractory periods (here 2 ms, Figures 3 and 4). Its relatively small effect on p-values makes us confident that CuBIC is a promising analysis method even if single spike trains deviate from the Poisson assumption. The detailed analysis of CuBIC's robustness with respect to these violation is a central aspect of our current research (e.g., Staude et al., 2007;Reimer et al., 2009). We wish to stress that in the case of small bin sizes and hard refractory periods, assuming single processes to have Poisson statistics is essentially identical to the popular assumption of independence of subsequent bins in Bernoulli models (as in e.g., Martignon et al., 1995;Shlens et al., 2006). A more detailed discussion of this issue, together with an analysis of the parameter dependence of CuBIC can be found in Staude et al. (2009).
A central assumption in the original presentation of CuBIC  is that the statistics of the population does not change over time (stationarity). In the present study, we have adapted CuBIC to be able to analyze also non-stationary data. The basis of this adaptation is a non-stationary version of the CPP, where the intensity of the population, parametrized by the ν, is decoupled from the correlation structure, parametrized by the amplitude distribution f A . Describing the population spike count as a doubly stochastic CPP, potential non-stationarities in the data can by integrated into the quantification of the null-hypotheses of CuBIC. We wish to stress once again, however, that non-stationarities in single neurons do not necessarily imply time varying carrier rates (see, e.g., Figure 2A), such that not every non-stationary data set requires the application of the adapted CuBIC.
In this study, we presented the adaptation only for the third cumulant, i.e., m = 3. Although the derivation of the mathematical equations is straightforward also for higher values of m, the resulting expressions become increasingly complicated. This may result in particular in strong non-linearities in the maximization problem, such that additional care is necessary to ensure convergence of numerical solvers at the global maximum. Furthermore, the estimation of cumulants of order >3 becomes unreliable and their estimators have large variance, which may compromise test power. As CuBIC proved to be very sensitive even for m = 3 ), we currently have not developed the theory for higher m. Nevertheless, this might be a fruitful direction of future research.
The main difference between the stationary CuBIC and its non-stationary adaptation lies in the maximization of the third cumulant of Z. Here, the adapted version requires that the third standardized cumulant β 3 = κ 3 [R]/κ 1 [R] 3 of the binned carrier rate R does not assume arbitrary large values. In this study, this is achieved by prescribing a two-parameter family for the carrier distribution f R (the "carrier family"). The remainder of this section is therefore primarily concerned with elucidating the role of this particular choice.
July 2010 | Volume 4 | Article 16 | 13 Staude et al. Non-stationary higher-order correlations variations, especially if the family assigns smaller values of β 3 (e.g., allowing a uniform carrier distribution if the data had a gamma-distributed carrier rate as in Figure 4). 3. For stationary data with correlation, allowing for nonstationarities in the null-hypotheses has no effect on the inferred lower bound. This result holds irrespective of the type of non-stationarity allowed in the null-hypotheses. 4. Non-stationarities in data with correlation reduce the inferred lower bound as compared to data with the same correlation structure but constant firing rates. The degree of reduction depends mostly on the kind of non-stationarity in the data. The family allowed for the carrier distribution did not affect the lower bound.
It is well-known that co-variations in firing rates induce correlations between spike counts. Thus, it comes as no surprise that analyzing non-stationary data with the stationary CuBIC generates false positives (point 1). In the adaptation, parts of the correlation can be assigned to (co)-variations in firing rates. Allowing for nonstationarities therefore corrects for this faulty inference (point 2).
To understand why the adaptation did not generally reduce the lower bound (point 3), we investigated the interplay between the constraints and the objective function in the maximization of the third cumulant in further detail. Due to the increased number of degrees of freedom, our intuition was that allowing for non-stationarities additional to the correlations of order ξ would simply increase the maximal value of the third cumulant κ * 3,ξ as compared to the stationary version. This, however, seems not to be the case. In the data sets with pure correlation (Figures 3 and 4, middle panels), for example, the values of κ * 3,ξ and κ * 3, , ξ ns are identical for ξ ≥ 4 ( Figure 5B shows results for stationary, uniform and gamma-distributed rate variables, results of combined data sets are very similar, data not shown). The reason is that for a given value of the first estimated cumulant k 1 , the constraint k h k the component rates enter via ξ ξ   3 ⋅ ν , while the β 2 -dependence is a parabola with negative curvature. As especially for large values of ξ, the objective function profits more from high component rates than the constraint penalizes these. As a consequence, the maximization favors high component rates over strong rate fluctuations, especially for large ξ. The results of the maximization procedure for the data with pure correlation supports this interpretation, as the standardized variance of the model that maximizes κ 3 [Z], β 2 * , decreases with ξ ( Figure 5A). For ξ ≥ 4, we have β * 2 0 = , and, as a consequence, the maximizing model of the adapted maximization problem is the same as that of the stationary maximization. Consequently, also the solutions κ * 3,ξ and the p-values trains do not show evidence of time varying firing rates, or provide upper bounds for the cumulants of their firing rates, this information may be extrapolated to the level of the carrier rate. Together with general moment inequalities (e.g., Kumar, 2002), such information may help to dispose of the explicit choice of the carrier family by providing an explicit upper bound for β 3 . Although the absence of a precise parametric model for the carrier distribution impedes the faithful computation of Var[k 3 ] under H 0 3,ξ , the close similarity of the error-bars in Figure 5A for the different methods makes us confident that Var[k 3 ] can be reasonably approximated by any carrier distribution as long as the upper bound κ * 3, , ξ ns is faithful (see also Section 'What is the "true" Carrier Family?' for conceptual issues concerning the choice for f R ).

dIscussIon of case studIes
The analysis of artificial data (Figures 3 and 4) can be summarized by four major points: 1. For data with pure firing rate non-stationarity but no higherorder correlations, the stationary CuBIC misinterprets the common rate variations as events of amplitude ≥2, i.e., as correlations. The order of the inferred correlation depends on the kind of non-stationarity in the data (ξ  = 2 for cosine carrier rate, ξ  = 4 for gamma-distributed rate variable). 2. Allowing for potential non-stationarities in the null-hypotheses can reduce this lower bound. Using the correct family for the carrier distribution, i.e., a cosine-like model for the data in Figure 3 and a gamma-distribution for the data in Figure 4, corrects entirely for the false positive inference of the stationary CuBIC, such that the lower bound becomes ξ  = 1. Choosing the wrong family, however, may not account properly for rate * that solves Eq. 25 for ξ = 1,…,7 for different types of allowed non-stationarities (green: stationary CuBIC; red: non-stationary CuBIC allowing uniform carrier distribution; blue: non-stationary CuBIC allowing gamma carrier distribution).  Figure 4).
In either case, the term "correlations beyond the rate" is biologically meaningful only after an explicit interpretation of the term "rate" by the experimenter. Finally, we stress that the different choices for the carrier family affected the test results only weakly, especially if the data had genuine spike correlations (Figures 3 and 4, left and right columns). Thus, a perfect match between the carrier distribution underlying a Monte Carlo simulation and the family assigned in CuBIC does not seem to be of great importance for a reliable interpretation of test results. This Appendix shows that the solution ( , , ) ν ν * * 1 … ξ of the stationary (Eq. 6) and the non-stationary maximization problem (Eq. 25) fulfills ν * k = 0 for k = 2,…,ξ − 1. The non-stationary problem (Eq. 25) has the objective function

appendIx lIst of symBols
are identical. Contrary to our initial intuition, the maximization thus does not generally favor strong rate variations. Evidently, however, the extent to which the inclusion of non-stationarities in the null-hypothesis alter test results depends crucially on the parameters of the data.

parameter-dependence of test results
The results of the case studies summarized above suggest that including potential non-stationarities in the null-hypothesis is always a safe bet: it corrects for false positive inference if correlation originates from rate effects, but does not alter p-values if the stationary CuBIC did not overestimate the order of correlation.
To sketch the parameter range where including non-stationarities reduces p-values only if necessary, recall that we have identified the reason for the unchanged maximal third cumulant in the interplay between the constraint and the objective function. Considering the general objective function (Eq. 25) however, we find that the influence of the non-stationarity (via β 2 and β 3 ) depends on the value of the first sample cumulant k 1 . For k 1 < 1, non-stationarities (positive β 2 and β 3 ) hardly influence the objective function, hence the maximization can be assumed to favor high component rates over non-stationarities, yielding identical test results for the stationary and the adapted CuBIC. For k 1 >> 1 a small increase in β 3 has a strong effect on the objective function, which may in turn favor strong non-stationarities, thereby producing different test results for the stationary and the adapted method. Thus, a crucial parameter for the performance of the adaptation is the first sample cumulant k 1 . Now k 1 is the estimator of κ κ Σ Σ λ where λ i is the (average) firing rate of the ith neuron. Thus, given the summed firing rate Σ i N i =1 λ of the recorded population, we may chose the bin size h in order to keep k 1 in a range where the adaptation can be expected to have reasonable test power. As all simulations of Section "Case Studies" had k 1 ∼ 2.5, achieving a value of k 1 < 1 is not always necessary.

What Is the "true" carrIer famIly?
In Section "Correcting for Rate Effects" we have presented a few guidelines for the choice of the family of carrier rate distributions. There are cases, however, where this choice is not easily justifiable by resorting to observable quantities. To discuss the status of this problem in more depth, we wish to stress that measures of correlation and their various corrections are purely statistical in nature. To be of scientific value, statistical results have to be put in context and must be interpreted, e.g., in terms of biophysical mechanisms. The "firing rate" of a neuron, for example, is not a biophysical entity as such. The term arises only if one describes the variable behavior of a neuron using point processes. As a consequence, whether or not the choice of a chosen carrier family f R is "valid" depends entirely on the intended biological interpretation. If, for example, the choice of f R is guided by the properties of the stimulus, the natural interpretation of rejected null-hypothesis is that the dynamic properties of the neuronal network under investigation generates correlations beyond direct stimulus effects. In an alternative situation, f R may be chosen to reflect the slow ongoing dynamics observed in a simultaneously recorded mass signal (as e.g., described in Tsodyks et al., 1999). Significant higher-order correlations are then interpreted as coordinated activity that is not covered by such large-scale phenomena.

Uniform carrier rates
If R is uniformly distributed between a and b, the raw moments of R are given as

Gamma carrier rates
The moments of the gamma distribution f R k R e k The gradient of the objective function therefore points to negative values of ν k for k = 2,…,ξ − 1. Because ν k ≥ 0, the maximum is achieved for ν k = 0. In the stationary maximization problem, insertion of the constraints into the objective function yields the same F as Eq. 35, only with H = k 2 (ξ + 1). Consequently, Eqs. 36-41 hold also in the stationary scenario.

cumulants of carrIer dIstrIButIons
For the computation of Var[k 3 ] (Eq. 7) from the solutions of the maximization procedure, i.e., the parameters  ν * ξ and β 2 * , we require explicit expressions for the cumulants of the carrier distribution f R up to order m = 6. Recall that we here considered only two-dimensional parametric families for f R , which implies that these parameters can be computed from the known mean value E[ ] R k k = = Σ 1 ξ ν * and the normalized variance β 2 * . In the following, we provide explicit expressions that relate the raw moments to these parameters. Expressions for the cumulants are then computed by applying the conversion map G constructed in Section "Cumulants of the Non-stationary CPP".