# Mutual information against correlations in binary communication channels

Agnieszka Pregowska^{1}, Janusz Szczepanski^{1} (corresponding author) and Eligiusz Wajnryb^{1}

**16**:32

https://doi.org/10.1186/s12868-015-0168-0

© Pregowska et al.; licensee BioMed Central. 2015

**Received:** 22 November 2014

**Accepted:** 21 April 2015

**Published:** 19 May 2015

## Abstract

### Background

Explaining how brain processing is so fast remains an open problem (van Hemmen JL, Sejnowski T., 2004). Thus, the analysis of neural transmission processes (Shannon CE, Weaver W., 1963) basically focuses on searching for effective encoding and decoding schemes. According to Shannon's fundamental theorem, mutual information plays a crucial role in characterizing the efficiency of communication channels. It is well known that this efficiency is determined by the channel capacity, which is the maximal mutual information between input and output signals. On the other hand, intuitively speaking, when input and output signals are more correlated, the transmission should be more efficient. A natural question arises about the relation between mutual information and correlation. We analyze the relation between these quantities using the binary representation of signals, which is the most common approach taken in studying neuronal processes of the brain.

### Results

We present binary communication channels for which mutual information and correlation coefficients behave differently both quantitatively and qualitatively. Despite this difference in behavior, we show that the noncorrelation of binary signals implies their independence, in contrast to the case for general types of signals.

### Conclusions

Our research shows that mutual information cannot be replaced by sheer correlations. Our results indicate that neuronal encoding has a more complicated nature, which cannot be captured by straightforward correlations between input and output signals, because the mutual information takes into account the structure and patterns of the signals.

### Keywords

Shannon information, Communication channel, Entropy, Mutual information, Correlation, Neuronal encoding

## Background

Huge effort has been undertaken to analyze neuronal coding, its high efficiency and the mechanisms governing it [1]. Claude Shannon published his famous paper on communication theory in 1948 [2,3]. In that paper, he formulated in a rigorous mathematical way intuitive concepts concerning the transmission of information in communication channels. The occurrences of inputs transmitted via a channel and of output symbols are described by random variables *X* (input) and *Y* (output). An important practical task is the determination of an efficient decoding scheme; i.e., a procedure that allows a decision to be made about the sequence (message) input to the channel from the output sequence of symbols. This is the essence of the fundamental Shannon theorem, in which a crucial role is played by the capacity of the channel, which is given by the maximum of mutual information over all possible probability distributions of input random variables. The theorem states that the efficiency of a channel is better when the mutual information is higher [4,5]. When analyzing relations between data, in particular between the input and response of any system, experimentalists apply the most natural tools; i.e., different types of correlations [6-14]. Correlation analysis has been used to infer the connectivity between signals. The standard correlation measure is the Pearson correlation coefficient commonly exploited in data analysis [15,16]. However, there are a number of correlation-like coefficients dedicated to specific biological and experimental phenomena [6]. Therefore, besides the Pearson correlation coefficient, in this paper we also consider the correlation coefficient based on the spike train, which is strongly related to the firing activity of neurons transmitting information. A natural question arises about the role of correlation coefficients in the description of communication channels, especially in effective decoding schemes [17,18].
Recently, an interesting result concerning the effects of correlations between neurons in an encoding population has been shown [19], analytically and numerically. It turned out that decorrelation does not imply an increase in information. In [20] it was observed that the spike trains of retinal ganglion cells were indeed decorrelated in comparison with the visual input. The authors conjecture that this decorrelation would enhance coding efficiency in optic nerve fibers of limited capacity. We begin a conversation about whether mutual information can be replaced in some sense by a correlation coefficient. In this paper we consider binary communication channels. It seems that the straightforward idea holds true: there is a high correlation between output and input; i.e., in the language of neuroscience, by observing a spike in the output we guess with high probability that there is also a spike in the input. This suggests that the mutual information and correlation coefficients behave in a similar way. In fact, we show that this is not always true and that it often happens that the mutual information and correlation coefficients behave in completely different ways.

## Methods

*Δτ*, so that in each time slice (bin) a spike is either present or absent. If we think of a spike as representing a “1” and no spike as representing a “0”, then, if we look at some time interval of length *T*, each possible spike train is equivalent to a \(\frac{T}{\Delta \tau}\)-digit binary number. In [26] it was shown that transient responses in auditory cortex can be described as a binary process, rather than as a highly variable Poisson process. Thus, in this paper, we analyze binary information sources and binary channels [25]. Such channels are described by a 2 × 2 matrix:

\[ C = \begin{pmatrix} p_{0|0} & p_{0|1} \\ p_{1|0} & p_{1|1} \end{pmatrix} \tag{1} \]

The symbol \(p_{j|i}\) denotes the conditional probability of transition from state *i* to state *j*, where *i*=0,1 and *j*=0,1. Observe that *i* and *j* are states of “different” neurons. Input symbols 0 and 1 (coming from the information source governed, in fact, by a random variable *X*) arrive with probabilities \({p_{0}^{X}}\) and \({p_{1}^{X}}\), respectively.
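The binning step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' procedure: the function names, the spike times and the 1 ms bin size are hypothetical, and the firing probability is estimated simply as the fraction of occupied bins.

```python
def binarize_spike_train(spike_times, t_start, t_stop, bin_size):
    """Return a list of 0/1 symbols, one per bin of width bin_size.

    A bin is 1 if at least one spike falls inside it, otherwise 0.
    """
    n_bins = int(round((t_stop - t_start) / bin_size))
    word = [0] * n_bins
    for t in spike_times:
        k = int((t - t_start) // bin_size)
        if 0 <= k < n_bins:
            word[k] = 1
    return word


def firing_probability(word):
    """Empirical estimate of p_1: the fraction of bins that contain a spike."""
    return sum(word) / len(word)


# Hypothetical spike times in milliseconds, T = 10 ms, bin size 1 ms.
spikes_ms = [0.4, 2.1, 2.7, 7.9]
word = binarize_spike_train(spikes_ms, 0.0, 10.0, 1.0)
p1 = firing_probability(word)
print(word, p1)
```

With *T* = 10 ms and a 1 ms bin the spike train becomes a 10-digit binary word, and two spikes falling into the same bin are counted once, which is the information lost by the binary representation.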

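*C*, one can find a relation between these random variables; i.e., by applying the standard formula \(p(Y=j|X=i):=\frac {p(X=i \wedge Y=j)}{p(X=i)}\), one can find the joint probability matrix *M* (2 × 2), which in general is of the form

\[ M = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix}, \qquad p_{ji} := p(X=i \wedge Y=j). \tag{2} \]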

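*X* and *Y* are given by

\[ p_{i}^{X} := p(X=i) = p_{0i} + p_{1i}, \qquad p_{j}^{Y} := p(Y=j) = p_{j0} + p_{j1}, \qquad i,j = 0,1, \]

where \(p_{ji}\) denotes \(p(X=i \wedge Y=j)\).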

*X* (input signal to the channel) and *Y* (output from the channel) both assuming only two values 0 and 1, formally both defined on the same probability space. It is well known that the correlation coefficient for any independent random variables *X* and *Y* is zero [14], but in general it is not true that *ρ*(*X*,*Y*)=0 implies independence of random variables. However, for our specific random variables *X* and *Y*, which are of binary type, most common in communication systems, we show the equivalence of independence and noncorrelation (see Appendix). The basic idea of introducing the concept of mutual information is to determine the reduction of uncertainty (measured by entropy) of random variable *X* provided that we know the values of discrete random variable *Y*. The mutual information (*MI*) is defined as

\[ MI(X,Y) := H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y). \]

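*H*(*X*) is the entropy of *X*, *H*(*Y*) is the entropy of *Y*, *H*(*X*,*Y*) is the joint entropy of *X* and *Y*, and *H*(*X*|*Y*) is the conditional entropy [4,17,21,27-29]. These entropies are defined as

\[ \begin{aligned}
H(X) &:= -\sum_{i \in I_s} p(X=i) \log p(X=i), \\
H(Y) &:= -\sum_{j \in O_s} p(Y=j) \log p(Y=j), \\
H(X,Y) &:= -\sum_{i \in I_s} \sum_{j \in O_s} p(X=i \wedge Y=j) \log p(X=i \wedge Y=j), \\
H(X|Y) &:= -\sum_{i \in I_s} \sum_{j \in O_s} p(X=i \wedge Y=j) \log \frac{p(X=i \wedge Y=j)}{p(Y=j)}.
\end{aligned} \]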

\(I_s\) and \(O_s\) are, in general, sets of input and output symbols, *p*(*X*=*i*) and *p*(*Y*=*j*) are probability distributions of random variables *X* and *Y*, and *p*(*X*=*i*∧*Y*=*j*) is the joint probability distribution of *X* and *Y*. Estimation of mutual information requires knowledge of the probability distributions, which may be easily estimated for two-dimensional binary distributions, but in real applications it poses multiple problems [30]. Since, in practice, the knowledge about probability distributions is often restricted, more advanced tools must be applied, such as effective entropy estimators [24,30-33].
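For a binary channel these quantities can be computed directly. The sketch below is illustrative only: it assumes entropies measured in bits (base-2 logarithm), stores tables as `table[j][i]` with the output symbol *j* first to mirror the notation \(p_{j|i}\), and uses a hypothetical symmetric channel with a 20% error rate and a uniform source.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def joint_from_channel(channel, p_x):
    """joint[j][i] = p(Y=j | X=i) * p(X=i), with channel[j][i] = p_{j|i}."""
    return [[channel[j][i] * p_x[i] for i in range(2)] for j in range(2)]


def mutual_information(joint):
    """MI(X,Y) = H(X) + H(Y) - H(X,Y) for a table joint[j][i]."""
    p_y = [sum(row) for row in joint]          # sum over inputs i
    p_x = [sum(col) for col in zip(*joint)]    # sum over outputs j
    p_xy = [p for row in joint for p in row]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)


# Hypothetical symmetric channel with 20% error and a uniform source.
channel = [[0.8, 0.2],
           [0.2, 0.8]]
joint = joint_from_channel(channel, [0.5, 0.5])
mi = mutual_information(joint)
print(f"MI = {mi:.4f} bits")
```

For this example the mutual information is about 0.278 bits per symbol, well below the 1 bit carried by the uniform binary source.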

The relative mutual information *RMI*(*X*,*Y*) [34] between random variables *X* and *Y* is defined as the ratio of *MI*(*X*,*Y*) and the average of information transmitted by variables *X* and *Y*:

\[ RMI(X,Y) := \frac{H(X) + H(Y) - H(X,Y)}{\left[H(X) + H(Y)\right]/2}. \]

*RMI*(*X*,*Y*) measures the reduction in uncertainty of *X*, provided we have knowledge about the realization of *Y*, relative to the average uncertainty of *X* and *Y*.

1. 0 ≤ *RMI*(*X*,*Y*) ≤ 1;
2. *RMI*(*X*,*Y*) = 0 if and only if *X* and *Y* are independent;
3. *RMI*(*X*,*Y*) = 1 if and only if there exists a deterministic relation between *X* and *Y*.
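The three properties can be checked numerically on small examples. The following sketch assumes that the relative mutual information is the mutual information divided by the mean of the two marginal entropies, as described in the text; the three joint tables are hypothetical and the function name is illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def relative_mutual_information(joint):
    """RMI = MI / ((H(X) + H(Y)) / 2) for a 2 x 2 joint probability table."""
    h_rows = entropy([sum(row) for row in joint])
    h_cols = entropy([sum(col) for col in zip(*joint)])
    h_joint = entropy([p for row in joint for p in row])
    return (h_rows + h_cols - h_joint) / ((h_rows + h_cols) / 2)


# Hypothetical joint tables.
independent = [[0.12, 0.28], [0.18, 0.42]]   # product of (0.4, 0.6) and (0.3, 0.7)
deterministic = [[0.3, 0.0], [0.0, 0.7]]     # the output is a copy of the input
noisy = [[0.4, 0.1], [0.1, 0.4]]             # symmetric channel with 20% error

rmi_independent = relative_mutual_information(independent)
rmi_deterministic = relative_mutual_information(deterministic)
rmi_noisy = relative_mutual_information(noisy)
print(rmi_independent, rmi_deterministic, rmi_noisy)
```

The independent table gives a value that is zero up to rounding error, the deterministic one gives 1, and the noisy channel falls strictly in between. The function is undefined when both marginal entropies vanish, a case this sketch does not handle.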

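The Pearson correlation coefficient *ρ*(*X*,*Y*) of random variables *X* and *Y* is

\[ \rho(X,Y) := \frac{E\left[(X - EX)(Y - EY)\right]}{\sqrt{V(X)}\,\sqrt{V(Y)}}. \tag{10} \]

*E* is the average over the ensemble of elementary events, and *V*(*X*) and *V*(*Y*) are the variances of *X* and *Y*. Adopting the communication channels notation, we get

\[ \rho(X,Y) = \frac{p_{11} - p_{1}^{X} p_{1}^{Y}}{\sqrt{p_{0}^{X} p_{1}^{X} p_{0}^{Y} p_{1}^{Y}}}. \]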

It follows that the Pearson correlation coefficient *ρ*(*X*,*Y*) is by no means a general measure of dependence between two random variables *X* and *Y*. *ρ*(*X*,*Y*) is connected with the linear dependence of *X* and *Y*. That is, the well-known theorem [15] states that the value of this coefficient is always between -1 and 1 and assumes -1 or 1 if and only if there exists a linear relation between *X* and *Y*.
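A standard textbook example, not taken from the paper, makes the limitation concrete: let *X* be uniform on {-1, 0, 1} and *Y* = *X*². The variables are fully dependent, yet their Pearson coefficient vanishes, while the mutual information (computed here in bits, an assumption of this sketch) is clearly positive. The function names are illustrative.

```python
import math


def marginals(joint):
    """Marginal distributions of X and Y from {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y


def pearson(joint):
    """Pearson rho for a joint distribution given as {(x, y): probability}."""
    e_x = sum(p * x for (x, _), p in joint.items())
    e_y = sum(p * y for (_, y), p in joint.items())
    cov = sum(p * (x - e_x) * (y - e_y) for (x, y), p in joint.items())
    v_x = sum(p * (x - e_x) ** 2 for (x, _), p in joint.items())
    v_y = sum(p * (y - e_y) ** 2 for (_, y), p in joint.items())
    return cov / math.sqrt(v_x * v_y)


def mutual_information(joint):
    """MI in bits, summed over the cells with non-zero probability."""
    p_x, p_y = marginals(joint)
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)


# X uniform on {-1, 0, 1} and Y = X**2: dependent but uncorrelated.
three_valued = {(-1, 1): 1 / 3, (0, 0): 1 / 3, (1, 1): 1 / 3}
rho = pearson(three_valued)
mi = mutual_information(three_valued)
print(rho, mi)
```

Here *X* takes three values, so the example does not contradict the result for binary variables shown in the Appendix.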

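We also consider the neuroscience spike-train correlation (*NSTC*) coefficient:

\[ NSTC(X,Y) := \frac{p_{11} - p_{1}^{X} p_{1}^{Y}}{p_{1}^{X} p_{1}^{Y}}. \tag{12} \]

The *NSTC* coefficient has an important property: once we know the firing rates \({p_{1}^{X}}\) and \({p_{1}^{Y}}\) of individual neurons and the coefficient, we can determine the joint probabilities of firing: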

Since \(p_{11} \ge 0\), by formula (12) we have the lower bound *NSTC* ≥ −1. The upper bound is unlimited for the general class (2) of joint probabilities. In the important special case when the communication channel is effective enough, i.e., \(p_{11}\) is large enough that input spikes pass through the channel with high probability, one has the following practical upper bound: \(NSTC<\frac {1}{p_{11}}-1\).
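As a worked illustration of these statements, the following sketch assumes that the *NSTC* coefficient is the covariance of the spike indicators normalized by the product of the firing rates, \(NSTC = (p_{11} - p_{1}^{X} p_{1}^{Y})/(p_{1}^{X} p_{1}^{Y})\), with \(p_{11}\) the probability that both neurons fire. This form is an assumption, chosen because it is consistent with the bounds quoted in the text. Under it, the joint probabilities follow from the two firing rates and the coefficient:

\[ \begin{aligned}
p(X=1 \wedge Y=1) &= (1 + NSTC)\, p_{1}^{X} p_{1}^{Y}, \\
p(X=1 \wedge Y=0) &= p_{1}^{X} - p(X=1 \wedge Y=1), \\
p(X=0 \wedge Y=1) &= p_{1}^{Y} - p(X=1 \wedge Y=1), \\
p(X=0 \wedge Y=0) &= 1 - p_{1}^{X} - p_{1}^{Y} + p(X=1 \wedge Y=1).
\end{aligned} \]

The lower bound is immediate from \(p_{11} \ge 0\). For the upper bound, note that \(p_{11} \le p_{1}^{X}\) and \(p_{11} \le p_{1}^{Y}\), so \(p_{1}^{X} p_{1}^{Y} \ge p_{11}^{2}\) and therefore

\[ NSTC = \frac{p_{11}}{p_{1}^{X} p_{1}^{Y}} - 1 \le \frac{1}{p_{11}} - 1, \]

with equality only in the degenerate case \(p_{1}^{X} = p_{1}^{Y} = p_{11}\), in which the two neurons always fire together.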

We present realizations of a few communication channels that show that the relative mutual information, the Pearson correlation coefficient and the neuroscience spike-train correlation coefficient may behave in different ways, both qualitatively and quantitatively. Each of these realizations constitutes a family of communication channels parameterized in a continuous way by a parameter *α* from some interval. For each *α*, we propose, assuming some relation between the neurons' activities, the joint probability matrix of input and output signals and the information source distributions. These communication channels are determined by 2 × 2 matrices of conditional probabilities (1). Next, the joint probability is used to evaluate both the relative mutual information and the correlation coefficients. Finally, we plot the values of the relative mutual information and both correlation coefficients against *α* to illustrate their different behaviors.
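The procedure just described can be sketched as a small parameter sweep. Everything specific in this sketch is an assumption: the family `hypothetical_family` is invented for illustration and is not one of the paper's channels, entropies are in bits, and the *NSTC* coefficient is taken to be the covariance of the spike indicators divided by the product of the firing rates.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def measures(joint):
    """Return (RMI, rho, NSTC) for a binary joint table {(x, y): probability}."""
    p1x = joint[(1, 0)] + joint[(1, 1)]   # firing rate of the input neuron
    p1y = joint[(0, 1)] + joint[(1, 1)]   # firing rate of the output neuron
    p11 = joint[(1, 1)]
    h_x = entropy([1 - p1x, p1x])
    h_y = entropy([1 - p1y, p1y])
    h_xy = entropy(joint.values())
    rmi = (h_x + h_y - h_xy) / ((h_x + h_y) / 2)
    cov = p11 - p1x * p1y
    rho = cov / math.sqrt(p1x * (1 - p1x) * p1y * (1 - p1y))
    nstc = cov / (p1x * p1y)              # assumed normalization
    return rmi, rho, nstc


def hypothetical_family(alpha):
    """Illustrative joint table, valid for 0 <= alpha <= 0.3 (not from the paper)."""
    return {(0, 0): 0.4 - alpha, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4 + alpha}


alphas = [k * 0.05 for k in range(7)]     # 0.00, 0.05, ..., 0.30
table = [(a,) + measures(hypothetical_family(a)) for a in alphas]
for a, rmi, rho, nstc in table:
    print(f"alpha={a:.2f}  RMI={rmi:.3f}  rho={rho:.3f}  NSTC={nstc:.3f}")
```

Plotting the three columns against *α* gives the kind of comparison shown in the figures; replacing `hypothetical_family` with any other valid joint table reuses the same pipeline.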

## Results and discussion

*α* while the *NSTC* and Pearson correlation coefficients are practically constant. Moreover, *RMI* has large values which, according to the fundamental Shannon theorem, result in high transmission efficiency, while the Pearson correlation coefficient *ρ* is small. To realize these effects, we consider the situation described by the joint probability matrix (14) where the first neuron becomes more active (i.e., the probability of firing increases) with an increase in the parameter *α* while simultaneously the activity of the second neuron is unaffected by *α*. Thus, the joint probability matrix *M*(*α*) reads

*C*(*α*):

*X* with probability distribution \({p_{0}^{X}}=\frac {3}{5}-2\alpha \) and \({p_{1}^{X}}=\frac {2}{5}+2\alpha \). The behaviors of *RMI*, *ρ* and the *NSTC* coefficient are presented in Figure 1.

*α* while the second neuron behaves in the opposite way. The joint probability matrix *M*(*α*) we propose is

*C*(*α*) are of the form

*NSTC* coefficient strongly decreases from positive to negative values, while *ρ* and *RMI* vary non-monotonically around zero. Moreover, *ρ* exhibits one extremum and *RMI* two extrema. Additionally, for *α*=0.35, the *RMI* is close to zero while the *NSTC* coefficient is approximately -0.32 (Figure 2). We point out these values to stress that, according to the fundamental Shannon theorem, the transmission is not efficient (*RMI* is small), although at the same time, the activity of neurons described by the *NSTC* coefficient is relatively well correlated. Figure 2 shows the behaviors of *RMI*, *ρ* and the *NSTC* coefficient. Finally, we present the situation (18) in which one neuron does not change its activity with *α* and the activity of the other neuron increases with *α*. Additionally, in contrast to the first case, the second neuron changes its activity only when the first neuron is active.

*C*(*α*) is given by

*NSTC* coefficient increases linearly from large negative values below -0.4 to a positive value of 0.1. Simultaneously, *ρ* is practically zero and *RMI* is small (below 0.1) but varies in a non-monotonic way, having a noticeable minimum (Figure 3). Moreover, observe that for small *α* the *RMI* (equal to 0.1) is visibly larger than zero, which suggests that the communication efficiency is relatively good, while at the same time the Pearson correlation coefficient *ρ* (equal to -0.03) is very close to zero, indicating that the input and output signals are almost uncorrelated (independent for binary channels). This suggests that these measures describe different qualitative properties. Figure 3 shows the behaviors of *RMI*, *ρ* and the *NSTC* coefficient.

## Conclusions

1. for uncorrelated binary variables (*ρ*(*X*,*Y*)=0), which are shown in the Appendix to be independent, one has *RMI*=0,
2. for fully correlated random variables (|*ρ*(*X*,*Y*)|=1), which are linearly dependent, one has *RMI*=1.

We introduce a few communication channels for which the correlation coefficients behave completely differently from the mutual information, which shows that this intuition is erroneous.

In particular, we present the realizations of channels characterized by high mutual information for input and output signals but at the same time featuring very low correlation between these signals. On the other hand, we find channels featuring quite the opposite behavior; i.e., having very high correlation between input and output signals while the mutual information turns out to be very low. This is because the mutual information, which in fact is a crucial parameter characterizing neuronal encoding, takes into account structures (patterns) of the signals and not only their statistical properties, described by firing rates. Our research shows that neuronal encoding has a much more complicated nature that cannot be captured by straightforward correlations between input and output signals.

## Appendix

The theorem states that independence and noncorrelation are equivalent for random variables that take only two values.

### Theorem 1.

Let *X* and *Y* be random variables, which take only two real values \(a_x, b_x\) and \(a_y, b_y\), respectively. Let *M* be the joint probability matrix

*X* and *Y* are given by

Adopting this notation, the condition *ρ*(*X*,*Y*)=0 implies that random variables *X* and *Y* are independent.

To prove this Theorem 1, we first show the following particular case for binary random variables.

### Lemma 1.

Let \(X_1\) and \(Y_1\) be two random variables, which take two values 0,1 only. Let \(M_1\) be the joint probability matrix

Adopting this notation, \(\rho(X_1,Y_1)=0\) implies that \(X_1\) and \(Y_1\) are independent.

### Proof.

\(p_{11}-(p_{01}+p_{11})(p_{10}+p_{11})=0\); i.e., \(p_{11}\) is factorized: \(p_{11}=p_{1}^{X_{1}} \cdot p_{1}^{Y_{1}}\). To prove the independence of \(X_1\) and \(Y_1\), we have to show that

We prove the first and second equality, and the third equality can be proven analogously.
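One way to write out the remaining factorizations is sketched below. To stay independent of any index-ordering convention, it uses the hypothetical shorthand \(q_{xy} := p(X_1 = x \wedge Y_1 = y)\), so that \(q_{11} = p_{11}\). Starting from \(p_{11} = p_{1}^{X_1} p_{1}^{Y_1}\) and using only the marginal sums, one gets

\[ \begin{aligned}
q_{10} &= p_{1}^{X_1} - p_{11} = p_{1}^{X_1}\,(1 - p_{1}^{Y_1}) = p_{1}^{X_1} p_{0}^{Y_1}, \\
q_{01} &= p_{1}^{Y_1} - p_{11} = (1 - p_{1}^{X_1})\, p_{1}^{Y_1} = p_{0}^{X_1} p_{1}^{Y_1}, \\
q_{00} &= 1 - p_{1}^{X_1} - p_{1}^{Y_1} + p_{11} = (1 - p_{1}^{X_1})(1 - p_{1}^{Y_1}) = p_{0}^{X_1} p_{0}^{Y_1}.
\end{aligned} \]

Every cell of the joint distribution is therefore the product of the corresponding marginals, which is the definition of independence.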

To generalize this Lemma 1, we consider the following. □

### Lemma 2.

Assuming the notation as in Lemma 1, let us define the random variables \(X:=(b_x-a_x)X_1+a_x\) and \(Y:=(b_y-a_y)Y_1+a_y\).

Under these assumptions, *ρ*(*X*,*Y*)=0 implies that *X* and *Y* are independent. In other words, divalent, uncorrelated random variables have to be independent.

### Proof.

The proof is straightforward and follows directly (by the linearity of the average value) from the definition of the correlation coefficient (10) and from the fact that the joint probability matrices \(M_1\) for \(X_1\) and \(Y_1\) and *M* for *X* and *Y* are formally the same. Since by Lemma 1 the random variables \(X_1\) and \(Y_1\) are independent, the random variables *X* and *Y* must also be independent.

Finally, observe that *X* takes the values \(a_x, b_x\) and *Y* takes the values \(a_y, b_y\) only. Therefore, Theorem 1 follows immediately from Lemma 2. □
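The statement can also be checked numerically. The sketch below is an illustration rather than part of the proof: the values \(a_x, b_x, a_y, b_y\) and the probabilities are hypothetical, and the joint distribution is parameterized by the probability `p11` that both underlying binary variables equal 1 together with the two marginal probabilities.

```python
import math


def joint_cells(p11, p1x, p1y):
    """Joint probabilities of two binary variables, keyed by (X1 state, Y1 state)."""
    return {(1, 1): p11, (1, 0): p1x - p11,
            (0, 1): p1y - p11, (0, 0): 1 - p1x - p1y + p11}


def rho_two_valued(p11, p1x, p1y, ax, bx, ay, by):
    """Pearson rho of X = (bx - ax) * X1 + ax and Y = (by - ay) * Y1 + ay."""
    cells = joint_cells(p11, p1x, p1y)
    val_x = {0: ax, 1: bx}
    val_y = {0: ay, 1: by}
    e_x = sum(p * val_x[i] for (i, _), p in cells.items())
    e_y = sum(p * val_y[j] for (_, j), p in cells.items())
    cov = sum(p * (val_x[i] - e_x) * (val_y[j] - e_y)
              for (i, j), p in cells.items())
    v_x = sum(p * (val_x[i] - e_x) ** 2 for (i, _), p in cells.items())
    v_y = sum(p * (val_y[j] - e_y) ** 2 for (_, j), p in cells.items())
    return cov / math.sqrt(v_x * v_y)


def is_independent(p11, p1x, p1y, tol=1e-12):
    """True if every joint cell equals the product of its marginals."""
    p_x = {0: 1 - p1x, 1: p1x}
    p_y = {0: 1 - p1y, 1: p1y}
    return all(abs(p - p_x[i] * p_y[j]) < tol
               for (i, j), p in joint_cells(p11, p1x, p1y).items())


values = dict(ax=-2.0, bx=5.0, ay=10.0, by=3.0)              # hypothetical values
rho_uncorrelated = rho_two_valued(0.18, 0.3, 0.6, **values)  # p11 = p1x * p1y
rho_correlated = rho_two_valued(0.25, 0.3, 0.6, **values)    # p11 != p1x * p1y
print(rho_uncorrelated, is_independent(0.18, 0.3, 0.6))
print(rho_correlated, is_independent(0.25, 0.3, 0.6))
```

In the first case the correlation vanishes (up to rounding) and the joint table factorizes; in the second neither holds. Because \(b_y < a_y\) in this example, the sign of *ρ* is flipped relative to the underlying binary variables, but its magnitude is unchanged, as expected for an affine change of values.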

## Declarations

### Acknowledgements

We gratefully acknowledge financial support from the Polish National Science Centre under grant no. 2012/05/B/ST8/03010.

## References

1. van Hemmen JL, Sejnowski T. 23 Problems in Systems Neurosciences. UK: Oxford University Press; 2006.
2. Shannon CE, Weaver W. The Mathematical Theory of Communication. United States of America: University of Illinois Press, Urbana; 1963.
3. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948; 27:379–423, 623–656.
4. Borst JL, Theunissen FE. Information theory and neural coding. Nat Neurosci. 1999; 2:947–57.
5. Paprocki B, Szczepanski J. Transmission efficiency in ring, brain inspired neuronal networks. information and energetic aspects. Brain Res. 2013; 1536:135–43.
6. Cohen MR, Kohn A. Measuring and interpreting neuronal correlations. Nat Neurosci. 2011; 14:811–9.
7. Arnold M, Szczepanski J, Montejo N, Wajnryb E, Sanchez-Vives MV. Information content in cortical spike trains during brain state transitions. J Sleep Res. 2013; 2:13–21.
8. Abbott LF, Dayan P. The effect of correlated variability on the accuracy of a population code. Neural Comput. 1999; 11:91–101.
9. Nirenberg S, Latham PE. Decoding neuronal spike trains: how important are correlations? In: Proceedings of National Academy of Science USA, 10 June 2003. National Academy of Science USA: 2003. p. 7348–53.
10. de la Rocha J, Doiron B, Shea-Brown E, Josic K, Reyes A. Correlation between neural spike trains increases with firing rate. Nature. 2007; 448:802–6.
11. Pillow JW, Shlens J, Paninski L, Sher A, Litke AM, Chicgilnisky J, et al. Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature. 2008; 454:995–9.
12. Amari S. Measure of correlation orthogonal to change in firing rate. Neural Comput. 2009; 21:960–72.
13. Ecker AS, Berens P, Keliris GA, Bethge M, Logothetis NK, Tolias AS. Decorrelated neuronal firing in cortical microcircuits. Science. 2010; 327:584–7.
14. Nienborg H, Cumming B. Stimulus correlations between the activity of sensory neurons and behavior: how much do they tell us about a neuron’s causality? Curr Opin Neurobiology. 2010; 20:376–81.
15. Feller W. An Introduction to Probability Theory and Its Applications. United States of America: A Wiley Publications in Statistics, New York; 1958.
16. Kohn A, Smith MA. Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. J Neurosci. 2005; 25:3661–73.
17. Ash RB. The Mathematical Theory of Communication. United States of America: John Wiley and Sons, New York, London, Sydney; 1965.
18. Eguia MC, Rabinovich MI, Abarbanel HDI. Information transmission and recovery in neural communications channels. Phys Rev E. 2000; 65(5):7111–22.
19. Moreno-Bote R, Beck J, Kanitscheider I, Pitkow X, Latham P, Pouget A. Information-limiting correlations. Nat Neurosci. 2014; 17:1410–7.
20. Pitkow X, Meister M. Decorrelation and efficient coding by retinal ganglion cells. Nat Neurosci. 2012; 15:628–35.
21. Cover TM, Thomas JA. Elements of Information Theory. United States of America: A Wiley-Interscience Publication, New York; 1991.
22. Rolls ET, Aggelopoulos NC, Franco L, Treves A. Information encoding in the inferior temporal visual cortex: contributions of the firing rates and the correlations between the firing of neurons. Biol Cybernetics. 2004; 90:19–32.
23. Levin JE, Miller JP. Broadband neural encoding in the cricket cercal sensory system enhanced by stochastic resonance. Nature. 2004; 380:165–8.
24. Amigo JM, Szczepanski J, Wajnryb E, Sanchez-Vives MV. Estimating the entropy rate of spike trains via lempel-ziv complexity. Neural Comput. 2004; 16:717–36.
25. Chapeau–Blondeau F, Rousseau D, Delahaines A. Renyi entropy measure of noise-aided information transmission in a binary channel. Phys Rev E. 2010; 81(051112):1–10.
26. DeWesse MR, Wehr M, Zador A. Binary spiking in auditory cortex. J Neurosci. 2003; 27(23/21):7940–9.
27. Paninski L. Estimation of entropy and mutual information. Neural Comput. 2003; 15(6):1191–253.
28. London M, Larkum ME, Hausser M. Predicting the synaptic information efficacy in cortical layer 5 pyramidal neurons using a minimal integrate-and-fire model. Biol Cybernetics. 2008; 99:393–401.
29. Kraskov A, Stogbauer H, Grassberger P. Estimating mutual information. Phys Rev E. 2004; 69(6):066138.
30. Panzeri S, Schultz SR, Treves A, Rolls ET. Correlation and the encoding of information in the nervous system. Proc R Soc London. 1999; B:1001–12.
31. Lempel A, Ziv J. On the complexity of finite sequences. IEEE Trans Inf Theory. 1976; 22(1):75–81.
32. Lempel A, Ziv J. On the complexity of individual sequences. IEEE Trans Inf Theory. 1976; IT-22:75.
33. Strong SP, Koberle R, de Ruyter van Steveninck RR, Bialek W. Entropy and information in neural spike trains. Phys Rev Lett. 1998; 80(1):197–200.
34. Szczepanski J, Arnold M, Wajnryb E, Amigo JM, Sanchez-Vives MV. Mutual information and redundancy in spontaneous communication between cortical neurons. Biol Cybernetics. 2011; 104:161–74.
35. Bair W, Zohary E, Newsome WT. Correlated firing in macaque visual area mt: time scales and relationship to behavior. J Neurosci. 2001; 21(5):1676–97.

## Copyright

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.