The discrimination of interaural level difference sensitivity functions: development of a taxonomic data template for modelling
 Balemir Uragun†^{1}Email author and
 Ramesh Rajan†^{1}
DOI: 10.1186/1471220214114
© Uragun and Rajan; licensee BioMed Central Ltd. 2013
Received: 6 January 2013
Accepted: 30 September 2013
Published: 7 October 2013
Abstract
Background
A major cue for the position of a highfrequency sound source in azimuth is the difference in sound pressure levels in the two ears, Interaural Level Differences (ILDs), as a sound is presented from different positions around the head. This study aims to use data classification techniques to build a descriptive model of electrophysiologically determined neuronal sensitivity functions for ILDs. The ILDs were recorded from neurons in the central nucleus of the Inferior Colliculus (ICc), an obligatory midbrain auditory relay nucleus. The majority of ICc neurons (~ 85%) show sensitivity to ILDs but with a variety of different forms that are often difficult to unambiguously separate into different informationbearing types. Thus, this division is often based on laboratoryspecific and relatively subjective criteria. Given the subjectivity and nonuniformity of ILD classification methods in use, we examined if objective data classification techniques for this purpose. Our key objectives were to determine if we could find an analytical method (A) to validate the presence of four typical ILD sensitivity functions as is commonly assumed in the field, and (B) whether this method produced classifications that mapped on to the physiologically observed results.
Methods
The threestep data classification procedure forms the basic methodology of this manuscript. In this threestep procedure, several data normalization techniques were first tested to select a suitable normalization technique to our data. This was then followed by PCA to reduce data dimensionality without losing the core characteristics of the data. Finally Cluster Analysis technique was applied to determine the number of clustered data with the aid of the CCC and Inconsistency Coefficient values.
Results
The outcome of a threestep analytical data classification process was the identification of seven distinctive forms of ILD functions. These seven ILD function classes were found to map to the four “known” ideal ILD sensitivity function types, namely: SigmoidalEI, SigmoidalIE, Peaked, and Insensitive, ILD functions, and variations within these classes. This indicates that these seven templates can be utilized in future modelling studies.
Conclusions
We developed a taxonomy of ILD sensitivity functions using a methodological data classification approach. The number and types of generic ILD function patterns found with this method mapped well on to our electrophysiologically determined ILD sensitivity functions. While a larger data set of the latter functions may bring a more robust outcome, this good mapping is encouraging in providing a principled method for classifying such data sets, and could be well extended to other such neuronal sensitivity functions, such as contrast tuning in vision.
Background
The ability to identify the location of a sound source is a core auditory ability for many daily purposes [1]. Our ability to accurately localize sounds depends on coding, by neurons in the Central Nervous System, of various cues to the location of the sounds. For ongoing high frequency sounds, the major cue for azimuthal location of the sound source is the difference in intensity/level (formerly Interaural Intensity Differences, now Interaural Level Differences; IIDs/ILDs) [2]. ILDs are the difference in sound levels at the two ears as a sound source moves about an animal and are created by head and body shadowing effects which affect high frequency sounds more than low frequency sounds [3]. There is a vast literature on the importance of ILDs and how neurons at various brain levels respond to ILDs that cover a wide azimuthal range across frontal space, from opposite one ear across to opposite the other. In mammals this cue is first functionally coded by neurons in the auditory brainstem, and then relayed to the Inferior Colliculus (IC), but it is clear that in some species at least (including the rat studied here), ILD sensitivity is also created de novo in many IC neurons [4].
Different IC neurons appear to use different combinations of interactions between excitatory and inhibitory inputs to code ILDs (a set of neuronal operations that also appears to be used in auditory cortex), [5] producing a diversity of forms of ILD sensitivity in neurons in the one auditory structure; this diversity argues against using a single network model to describe all the different forms of ILD sensitivity.
Introduction to data normalization
Data normalization is a scaling process for numbers in a data array and is used where a great heterogeneity in the numbers renders difficult any standard statistical analysis. The data is often normalized before any application process and therefore data normalization is usually termed as data preprocessing. Many different data normalization techniques have been developed in diverse scientific fields, e.g. in statistical analysis for applications such as in diagnostic circuits in electronics [6], temporal coding in vision [7], predictive control systems in seismic activities [8], modeling Auditory Nerve stochastic properties [9], modeling labor market activity [10], pattern recognition [11], and most extensively in microarray data analysis in genetics, [12–20].
The need for data normalization is determined by the user and depends on the application. Thus the purpose of data normalization depends on the proposed application, and includes use of linear scaling to compress a large dynamic range [6], scaling of values to correct for variation in laser intensity [18], handling obscure variation [12] or removing systematic errors in data [11, 15, 17, 20], or efficiently removing redundancy in a nonlinear model as an optimal transformation for temporal processing [7]. Although the benefits of data normalization depend on data type, data size and normalization method (which can vary between different fields), generally the advantages of data normalization are (a) to give a more meaningful range of scaled numbers for use, (b) to rearrange the data array in a more regular distribution, (c) to enhance the correctness of the subsequent calculations, and (d) to increase the significance or importance of the most descriptive numbers in a nonnormally distributed data set.
Introduction to data dimension reduction technique
Principal component analysis (PCA) is a statistical tool to reduce the dimensions of a large data set for the purpose of data classification when a data set can be described in a number of different ways, or described by a number of different variables (such as slope steepness, cutoff position, peak location, maximum firing rate etc.), and is therefore said to possess many dimensions. Such data becomes difficult to classify because it is often not known which of these dimensions are the most important or, indeed, if only one of them is the most important. In such a case, some means has to be devised in order to reduce the dimensions in the data set to a single dimension across all the data. This single dimension can then be used to differentiate between subgroups within the overall data set. PCA is a powerful statistical tool that does precisely this.
The PCA is used as an independent statistical method for data classification to handle both metric and multivariable types of data [21]. In the PCA, the data variables are largely dependent on one another; in fact, if data were not correlated then principal components would not be suitable for data dimension reduction. Barlett’s Sphericity Test can be used to verify the appropriate conditions for the data [22], but the details of this test are beyond the scope of this manuscript.
Introduction to cluster analysis
Data classification is a way of segregating similar types of data groups into homogenous clusters. Each of these compact data groups contains a number of dataelements with comparable characteristics. In data classification studies, two methods are generally used to distinguish the classified data, namely: Supervised (discriminant analysis) and Unsupervised (data clustering) classification [23].
Data characterization can be planned as a twostep procedure consisting of the combination of PCA for reduction of data dimensions followed by Cluster Analysis for grouping similar types of data objects. This technique has been widely used in several different types of applications in a diverse range of scientific fields including in crime analysis [24], in finding the relationship between retention parameters and physiochemical parameters of barbiturates [25], in chemometric methods in characterizing steel alloy samples [26], in drug design [27], in isolating single unit activities for data acquisition [28], and in microarray based gene identification [29, 30]. This combined technique has been reviewed by [31] for several clustering algorithms, and they have emphasized the importance of applying PCA prior to Cluster Analysis for high dimensional data.
Results
Data normalization: results
In general terms, normalization is simply signal intensity divided by a reference value, and serves to reduce systematic errors in data variables [32]. Data normalization also maximizes variance [22], which is especially important before applying the data dimension reduction technique for ILD type data of this study. Data normalization is sometimes a prerequisite for data analysis in statistics, and finding a suitable scaling technique for the data is therefore an important task.
Since the appropriate normalization technique for ILD data is unknown, we chose to test seven different techniques against our library of nine ideal ILD function variants. These seven datanormalization techniques have been widely used in many different applications but also in similar types (multivariate) of data. These are namely data normalization by mean correction, by a maximum value, by each vector’s maximum value, by each vector’s standard deviation, data standardization, Logarithmic normalization and, unit total probability mass (UTPM) as detailed in [33].
The nine prototypical ILD functions
Seven data normalization methods  The nine prototypical ILD functions (min: 0.0 and max: 100, with ±6 %)  

(A)  (B)  (C)  (D)  (E)  (F)  (G)  (H)  (I)  
(1) Vn (i, j) = X_{ n }(i, j) − μ_{ n }  46/49  67/68  69/48  37/59  23/73  43/71  64/60  42/46  5.7/8 
(2) Vn (i, j) = X_{ n }(i, j)/max{max{X_{ n }(i, j)}}  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1 
(3) Vn (i, j) = X_{ n }(i, j)/max{X_{ n }(i, j)}  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1  0.01/1  0.02/1  0.24/1 
(4) Vn (i, j) = X_{ n }(i, j)/σ_{ n }  0.04/2  0.02/2  0.04/2  0.06/2  0.04/3  0.03/3  0.02/2  0.1/23  0.6/51 
(5) Vn (i, j) = log_{2}(X_{n}(i, j)) − log_{2}(μ_{n})  4.6/1  6/1.7  5.3/1  4/1.3  4.3/2  4.6/2  4.8/1  4.4/1  1/0.7 
(6) Vn (i, j) = $\frac{{\mathrm{X}}_{\mathrm{n}}\left(\mathrm{i},\mathrm{j}\right)}{{{\displaystyle \sum}}_{\mathrm{n}}{\mathrm{X}}_{\mathrm{n}}}\xb7{\mathrm{\mu}}_{\mathrm{n}}$  0.1/7  0.1/7  0.08/7  0.09/7  0.08/7  0.09/7  0.09/7  0.23/7  0.1/7. 
(7) Vn (i, j) = $\frac{{\mathrm{X}}_{\mathrm{n}}\left(\mathrm{i},\mathrm{j}\right){\mathrm{\mu}}_{\mathrm{n}}}{{\mathrm{\sigma}}_{\mathrm{n}}}$  1.1/1  1.1/1  1.8/1  1/1.7  0.7/2  1.3/2  1.5/1  1.8/1  1.4/1 
PCA result
PCA for ILD data
Figure 3 PCA confirms that the UTPM data normalization is among the normalization techniques that can be represented by “sufficiently” few principal components (Figures 3A, C, E, G versus Figure 3B, D, F, H). It was not the normalization technique that needed the lowest number of principal components: for the normalization technique using division by each vector’s standard deviation, the first principal component (PC1) was sufficient to explain 96.77% of the variance (Figure 3E). For three other normalization techniques (which included the UTPM data normalization technique), the first three principal components (PC1, PC2, and PC3) were needed to account for a significant amount of the variance and explained 84.44%, 10.14% and 3.03% of the variance respectively (Figures 3A, C and G). The remaining four normalization techniques (Figure 3B, D, H and F) required more than three principal components to represent the variance in the data.
Although the technique of data normalization by each vector’s standard deviation was the most parsimonious in the sense that the PCA can be represented by a single principal component, this normalization technique was not used for our data. Our normalization testbench had shown (see Table 1) that this normalization technique when applied to our ILD data did not markedly affect the variance in the neuronal spike counts across the ILD functions. Note that this also illustrates the importance of conducting the normalization worktestbench exercise.
Result: Finding the right number of principal components
Selection of the correct number of principal components is important for reducing data dimensionality in PCA. The selection of the number of principal components is not an arbitrary task, despite the fact that it is general practice to select the very first few principal components, often only the first two. The first principal component (PC1) is the projection of the given points and it has the maximum variance among all possible linear coordinates. The second Principal Component (PC2) has maximum variance along an axis orthogonal to the first principal component [34]. In usual practice with twodimensional PCA, the first two principal components allow efficient visual representation of data and there are certainly some specific examples where a small number of principal components appear appropriate to visualize cluttered data distribution. These include the linear combination of gene expression levels on the first three principal components represented in a threedimensional plot [32], or rotating threedimensional Principal Components representation for the analysis of tumor karyotypes in [35] or in threedimensional object recognition application [36], or first two principal components utilized in the neural activity data analysis for the data classification [28]. Thus the PCA technique offers the least information loss when the first few principal components can account for the greatest variance in the data [37].
It must also be noted that if a large number of principal components is needed to represent a data set, then datanormalization is not efficiently applied [38]. This was also observed for our data with several of the normalization techniques we tested (see Figures 3B, D, F and H).
 (i)
The Screeplot gives Eigenvalues versus number of principal components. The point of change (the elbow of the curvature) in the figures (Figure 3), which distinguishes the number of principal components, is the highest percentage to be retained.
 (ii)
Kaiser’s rule retains all components with Eigenvalues greater than one [40], and is a way of measuring the common variance of variables.
 (iii)
Horn’s procedure is similar to Kaiser’s Rule; it gives fewer principal components than would Kaiser’s Rule.
 (iv)
Explained variance is a way of looking for the variance explained by the first few principal components. This may be a sufficient way to decide whether more principal components are required.
Thirteen principal components result
PC1: 84.447%  PC2: 10.149%  PC3: 3.033%  ➩ ∑ = 97.63 % 
PC4: 0.846%  PC5: 0.681%  PC6: 0.249%  ➩ ∑ = 1.776 % 
PC7: 0.188%  PC8: 0.122%  PC9: 0.087%  ➩ ∑ = 0.397 % 
PC10: 0.060%  PC11: 0.055%  PC12: 0.036%  ➩ ∑ = 0.151 % 
PC13: 0.033%  ➩ ∑ = 0.033 % 
Using PCA to represent the data in a reduced dimensionality form
These first three principal components accounted for 97.629% of the total variances (Table 2). The other principal components (from PC4 through PC13) were sufficiently low that they could be discarded. Thus the first three principal components can be treated as the new, reduced dimension, form of the ILD data.
 (i)
−6 ≤ PC1 (first principal component) ≤ +39
 (ii)
 −12 ≤ PC2 (second principal component) ≤ +12
 (iii)
−6 ≤ PC3 (third principal component) ≤ +7
The values of the first three principal components were projected onto the three axes of a threedimensional plot in Figure 4A. However, despite accounting for a large part of the variance in the ILD data, the reduced dimension data representation with three principal components was still not sufficient to distinguish the number of classified data. To address this issue, we turned to the third step of our process, Cluster Analysis, to classify the data.
Cluster analysis result
The assessment of the cluster dissimilarities
Cophenetic correlation  Linkage algorithms  

Coefficients (CCC)  Single  Average  Complete  Ward  
Pairwise  distance algorithms  
Euclidian  0.69391  0.77691  0.6625  0.57017  
Seuclidian  0.75833  0.80679  0.64322  0.48974  
Minkowski  0.69391  0.77691  0.6625  0.57017  
Mahalanobis  0.75833  0.80679  0.64322  0.48974  
Cityblock  0.72678  0.79032  0.57675  0.54419  
Cosine  0.34928  0.81656  0.73456  0.83168 
Discussion
Data normalization conclusion and discussion
We developed prototypical ILD functions to test several normalization techniques. This type of test bench was used for the first time in this field to investigate appropriate normalization techniques. This testing showed that the best method for normalization of ILD functions was the UTPM normalization method.
Generally, in data preprocessing, the normalization procedure is selected to be specific for the application under study, even if it is necessary to improvise by slight adaptation of existing normalization techniques (e.g., multiplication of the mean value of data for our UTPM type of normalization technique for each ILD function). It is true that we did not test all possible normalization methods, e.g., other data normalization methods which are variants of existing ones, such as dividing by a sum of all signals or by the standard error signals after correcting mean values [19]. We also recognize that slight variations of some of the methods used here could give a normalization procedure that would result in a nonlinear feature for the data [17]. Finally, in statistics it is a common practice to devise new normalization technique, as has been used to design new normalization technique for microarray data analysis [13], or sometimes different datasets are applied to find the normalization technique that most reduces variations by a comparison with the original data set [12].
We also note that in addition to visually comparing the raw data against the result of applying a selected normalization technique to the data, other methods are also available. Selection of the correct normalization technique can be quantified by examining the quality of the normalization technique and this can be estimated by; (a) calculating the sum of squares of differences between the model and normalization histogram, (b) using Pearson correlation coefficients between the values before and after normalization of data [42]. Such a quantification method for normalization selection is worth investigation, but is beyond the scope of this study.
Despite these constraints, we believe that we have identified an appropriate normalization technique that can be successfully applied to electrophysiologicallyrecorded neuronal sensitivity functions for ILD, the major binaural cue for azimuthal location of highfrequency sounds.
PCA discussion
There are several data dimension reduction techniques but to our knowledge, none of them have been applied to the study of ILD sensitivity functions. It is therefore not possible to evaluate the other types of data reduction techniques against the PCA we applied for data reduction of our ILD functions. Instead of comparing other data reduction techniques to PCA usage for the ILD data, we will therefore briefly explain the other types of data reductions techniques and their suitability for use for analysis of ILD sensitivity data.
The PCA is a good linear data analysis tool despite some limitations for data classification studies. There have been some strategies applied to overcome the shortcomings of PCA by implementing higherorder statistics, such as in nonlinear PCA. These include techniques such as Independent Component Analysis[43, 44], an extension of PCA; Vector Quantization PCA[45] or a Kernel PCA, a nonlinear generalization of PCA cited in [46]. There are even a few new dimensionality reduction procedures such as Linear Discriminate Analysis[47], or a combination of PCA and Discriminate Analysis as an efficient dimension reduction technique [34] or more specific applications such as in a PCA mixture model discussed in [48].
In summary, although PCA is limited for use for nonlinear data applications, it is actually helpful to discriminate the linear variations of data from the nonlinear ones.
Cluster analysis discussion
Cluster Analysis is a broad area in the field of data classification and many clustering algorithms have been developed for many classification applications in diverse scientific areas. These algorithms have some advantages and disadvantages. The following paragraphs will discuss a few Cluster Analysis algorithms and the data to which these have been applied for categorization. This scheme will not tell us which algorithm would be better than the other but may help divulge which algorithm is more applicable to a specific problem since clustering algorithms are application orientated statistical tools.
What is important in cluster analysis?
Cluster Analysis is achieved by a specific algorithm designed for a specific application [23]. There have been over 100 clustering algorithms available for Cluster Analysis [49]. Unfortunately, there is no generic Cluster Analysis algorithm that can be used to give the best solution for all types of data [50]. It is also not practical to design a clustering algorithm for each new application. Thus it is best to choose an algorithm that has been used for a similar type of application and is also a less timeconsuming approach, and is a process that is widely accepted in the field of study of Cluster Analysis. In the end, three things define the importance of Cluster Analysis: (a) Speed, (b) reliability and (c) consistency [49].
Conclusions
In this study we found that the UTPM normalization method was the best data normalization method applicable for ILD sensitivity functions. PCA was used to reduce the dimension of the large number of multivariable data that made up the 208 ILD functions we recorded from the midbrain auditory nucleus, the ICc, of the rat. The transformations used variances of highly correlated variables, and it was found that the first three principal components (i.e., variances) were good enough to represent our normalized data. The variances are explained in terms of percentage accounted for, as well as indicated in the Screeplot and both showed that more than 84% of the transformed data are accounted for by the first three principal components (Total variance explained > 97%). In the process our transformed data were converted from a 13x208 matrix into a form of 3x208 matrix.
Hierarchical Cluster Analysis with the agglomerative technique was used to determine the number of clusters of homogenous data objects in our data. For this analysis, we combined visual and automated Cluster Analysis of the full lot of ILD functions we investigated. We then applied common dendrogram and data cluster techniques that were available at MATLAB version 6.5, Statistic Toolbox version 4.1.
Several pairwisedistance and linkage algorithms were applied to our 3x208 transformed data. The best combination of these was determined as the one generating the highest CCC and this was the Cosine pairwisedistance algorithm combined with the Ward linkage algorithm. This pair of algorithms was used to plot dendrograms to visually present the distribution of the clusters of homogenous data.
To determine the cutoff for the number of clusters the Inconsistency Coefficient from the linkage algorithm application was applied to determine a depth value of three, and resulting in a maximum of seven cluster types. Averages were determined from all ILD functions in each of the seven clustered data (from visual and automated analysis) to identify the prototypical ILD function in that cluster. Then, statistical data analysis methods were used to differentiate between the ILD functions. The result showed seven different prototypical ILD functions, obtained from the three broad categories of ILD functions, namely peak, sigmoidal (two types “EI” and “IE” of them) and insensitive ILD functions, as in Figure 8. More than 80% of the electrophysiological data were of the peak and sigmoidal type of ILD functions. These analyses were completely congruent with the Cluster Analysis and the seven ILD function types from statistical analyses corresponded very well with the seven ILD function types determined by Cluster Analysis.
 (i)
Cluster Analysis was used to determine the number of data groups after PCA.
 (ii)
Cluster Analysis is a way of segregating data into meaningful subgroups or clusters.
 (iii)
Clustered data can be obtained in two ways, supervised and unsupervised data clustering.
 (iv)
There are several clustering algorithms assist for several different types of data clustering methods (KMeans, Hierarchical, Latent Class Analysis, Latent Profile Analysis, and Factor Mixture Model). Hierarchical and Agglomerative types of Cluster Analysis are the most common techniques that applied, where the number of subgroups and their contents (number of data to be formed) are unknown.
 (v)The hierarchical agglomerative technique is commonly used and is the most suited to our data. This method involves four steps [51]:
 a)
Sequentially merge the most similar cases in the N×N similarity matrix, where "N" is the number of objects.
 b)
This visual representation of the sequence of merged clusters can be illustrated in a tree type structure called a "dendrogram".
 c)
“N1” steps are required as numbered of clustered nodes.
 d)
All cases are merged into one group.
 a)
 (vi)
Once the number of clusters and their number of objects are defined, then the result can either be illustrated or tabulated to finalize the data classification solution.
Methods
Data normalization method
Generating prototypical ILD functions
Our own database and an extensive literature review showed that there are four prototypical ILD functions in [different] neurons at all levels of the brain beyond the brainstem [52–55]. These four prototypical functions (Figure 8) consist of two Sigmoid response functions where neuronal responses vary in a sigmoid function with variations in ILDs but with the plateau of responses in one ILD range (favouring one ear) or the other (favouring the other ear), a Peaked response function where neuronal responses are peaked at some ILD that would arise from frontal space, and Insensitive response function where neuronal responses vary little with ILDs. Each of these four broad response categories encompasses functions that can vary in metrics defining the features of that ILD function type, e.g. position of peak or the slope along ILD axis, the steepness of the slope along ILD axis – all of which are features that have been variously discussed to be defining informationbearing elements for deriving azimuthal location of a sound source [56].
In the simulated ILD sensitivity functions, neuronal responses were represented on a scale from ‘0’ to ‘100’ (“m”, maximum response count). This normalized scale allowed us to simulate ILD functions in absolute values. These minimum ‘0’ and maximum ‘100’ values (spikes/stimulus) are also selected for all normalization test bench to give a good comparison in the result.
 (i)
First generate 13 random numbers varying between −0.06 and +0.06 for every unit (i.e. ±6 for the maximum of 100 spike counts, viz., 6% variation),
 (ii)
then add the 13 random numbers arbitrarily to each ILD pattern (each ILD pattern contains 13 numbers and each number presents the number of spike counts), and
 (iii)
finally, monitor the previous steps to verify that perturbed data validly fit in the range from minimum to maximum (e.g., log normalization may produce errors for some values if divided by zero).The ±6% perturbation was applied to the nine ILD functions to produce the final test bench.
Data dimension reduction method: PCA
The PCA algorithm is based on threestep procedure. The objective of this algorithm is to find the principal components with the highest variances, Equation 1.
STEP 1: Finding a covariance matrix from the ILD patterns of an input matrix,
STEP 2: Using covariance matrix to find eigenvectors, and
STEP 3: Using eigenvectors to find principal components.
The PCA algorithm, where vector mean: “μ”, number of sample: “N”, largest Eigenvalues (covariance matrix) of “A”: “λ”, eigenvectors “υ”, principal components: “y” and set of centered input vectors: “x”, and the unit matrix: “I”.
where, total variability can be explained by each principal component in percentage, with the highest variances “υ”.
In practice, ILD sensitivity functions are statistically multivariate data, which can be exhibited as a single matrix. The dimension of this matrix form can be reduced with the aid of PCA. The PCA actually reconstructs the data on an orthogonal basis such that the columns represent the principal components of the projected values. The correlation between the first column and the other column represents the variance. The values of variances are greater between the first column and other columns and the variance values are reduced between the second column and other columns and then further reduced between the third column and other column, and so on. It is therefore an elegant way to represent the multivariable data with only few variables, which corresponds to just the first few columns.
Cluster analysis method
Dendrogram
The representation of clustering derived from the Cluster Analysis can be visualized in a treeshaped graphical representation termed a dendrogram[51]. The vertical axes represent the clustered data groups in pairwisedistances, and horizontal axes represent the predefined number of clusters.
The application of the Inconsistency Coefficient with different number of levels (depth) of the cluster tree is shown in Figure 5. This helps to comprehend the cluster tree distribution in the dendrogram, (Figure 12). The denser the distribution’ more likely that less similar objects are linked to each other; for example; the depth of seven (i.e. seven levels of cluster tree) explains how dissimilar objects are linked to each other, on the other hand three levels of cluster tree in shows the objects are began spreading sparsely around the median value. Note that the median values of Inconsistency Coefficients are always the same due to the fact that the same clustering algorithms (Cosine and Ward) were used for the linkage and pairwisedistance distribution and then applied here to these different numbers of clustering trees.
Materials and method: source of data
In this study, 208 extracellular ILD sensitivity functions were recorded from the left of ICc (Central nucleus of the Inferior Colliculus) cells of male rats. Data collection was carried out in a series of experiments conducted by the second author prior to this present study. These data have not been published and formed the data set used for the present modelling study. Only a brief description is given of the procedures used for that data collection.
Animal preparation and surgery
Ethics statement
All animal experiments were approved by the Monash University Department of Animal Ethics Committee (PHYS/1997/03 and PHYS2000/22) and conducted in accordance with protocols designed by the National Health and Medical Research Council of Australia for the ethical care and welfare of animals under experimentation.
Animal preparation
Adult male Sprague–Dawley and LongEvans SPF (Specific Pathogen Free) male rats (weight range, 250 to 450 grams) were obtained from the Monash University Animal Services for all experiments. Each rat was initially anesthetized with an intraperitoneal injection of 60mg/ml pentobarbital sodium (Nembutal; 0.1 millilitres/ 100 grams of body weight). Then XylazeSaline solution (0.1 millilitres of 1:1) was administered intramuscularly as a muscle relaxant and as an analgesic. Thereafter, throughout the experiment, hourly doses of 0.1 millilitres Nembutal and 0.05 millilitres XylazeSaline solutions were injected subcutaneously or intraperitoneally to keeping the rat in deep anesthesia. Body temperature was maintained at 37.5±0.5°C by a rectal probe feedbackcontrolled electrically heated blanket.
Once deep anaesthesia was established (as evidenced by the absence of withdrawal reflexes to strong noxious pinching of the forepaw as well as absent palpebral reflexes), the rat was transferred to a sound proof room and tracheotomized (a cannula surgically inserted into the trachea) to facilitate mechanical ventilation. Artificial ventilation with room air was adjusted according to the body weight of the rat, with a respiratory rate of 80 ~ 90 breaths/minute and a tidal volume of 3 ~ 4.5 millilitres.
Throughout the experiment, anaesthesia status was monitored through continuous recording of the electrocardiogram (ECG), and electromyogram (EMG) activity from forearm muscles on an oscilloscope as well as through a speaker. Depth of anesthesia was also checked at regular hourly intervals by checking for the presence of withdrawal reflexes to noxious stimuli by pinching of the forepaw and the presence of pupillary dilatation.
Electrophysiological recording
Under deep anesthesia a midline incision was made in the skin from top of the rat's skull then cleared of any connective tissues to expose the skull. The pinnae were removed and the ear canals were transected close to the tympanic membrane. A small hole was drilled in the skull at a point over the frontal lobes to allow a metal bar to be affixed by a screw through the bar and into the skull hole. The screwmetal bar system was fortified by a dental acrylic. The position of stabilising metal bar could easily be oriented to give the rat’s head any desired angle. A second hole, approximately 3 mm in diameter, was then drilled over the left occipital lobe of the rat’s skull to allow for an insertion of the recording electrode which would be advanced through the overlying cortex to the ICc. Siliconeoil was applied to the exposed surface of the cortex to prevent it drying out.
Parylene coated tungsten tip microelectrodes (AM Systems, Inc., WA, USA) with impedance of 2 MΩ were mounted on a micromanipulator mounted on a remotelycontrolled steepingmotor drive assembly on a series of translators and goniometers, and the micromanipulator was controlled electronically from outside the soundproof room. The remotely controlled microelectrode was placed to contact the left cortical surface and then advanced through the cortex to the left IC. Microelectrode penetrations were made into the cortex around positions ~1.1 mm anterior and ~1.7 mm lateral of lambda.
Data collection
Action potentials recording
The remotelycontrolled microelectrode was slowly advanced from outside the sound proof room in 5~10 micrometer steps through the cortex to locate a wellisolated cell in the ICc. Identification of the recording locus in the ICc was facilitated (a) by the observation of the expected tonotopic organization as the electrode was advanced through the putative ICc, and (b) the pattern of short latency robust responses to tone stimuli at different frequencies and intensities both binaurally and monaurally.
Action Potentials (APs) were recorded only from wellisolated single cells in ICc, with a signaltonoise ratio of at least 4:1 between the wellisolated APs and other activity. The output of the microelectrode was first passed amplifiers (preamplifier with the gain of 10, and amplifier with the gain of 100) to the bandpass filter (cutoff frequencies from 100 Hz and 10 kHz) then through the graphic equaliser for shaping the pulse of the APs, and the APs were also observed by an oscilloscope. These APs were digitized by a Schmitt trigger based level detection circuit for a realtime recording. The realtime data with timestamp information were both saved to the files on to a Personal Computer. In all recordings, the AP waveform was monitored continuously online to ensure recording fidelity and that there was no contamination by activity from other cells.
Acoustic stimuli and determination of the characteristic frequency of a cell
Acoustic stimuli were generated by a computer controlled two channel digital synthesiser systems (TDT System II), which were cascaded with digital attenuators. Outputs from the digital attenuators were separately routed to two input channels of HD535 Sennheiser speaker in homemade couplers. The speakers were connected to two sounddelivery tubes, which were placed in the rat’s external auditory meatus of both ears.
Once a cell was sufficiently well isolated, the characteristic frequency (CF; frequency of greatest sensitivity) and the threshold at CF were identified from audiovisual criteria with manual control of the tonal stimuli. This was confirmed by recording responses across a very wide frequencyintensity matrix using gated tone bursts, shaped with a 4 ms (milliseconds) risefall time, with variable duration between 50–200 ms depending on the test cells response profile. Cells with only onset components were tested with 50 ms tone bursts, cells with sustained components were tested with 100 or 150 ms tone bursts, and cells with late components were tested with 200 ms tone bursts. (Onset component were classed as responses occurring in the first 50 ms of tone burst, Sustained components were responses from 100 ~ 150 ms, and Late components were responses from 200 ms.)
Determining ILD sensitivity functions
Electrophysiological recordings of ILD sensitivity were obtained from a total of 208 cells from the ICc, (see Additional file 1). The stimuli were always CF stimuli gated as described above, with variable duration between 50–200 ms depending on the test cells response profile. The duration of each tone was equal to 50 ms for cells with only onset components, 100 or 150 ms for cells with sustained components, or 200 ms for cells with late components.
ILD sensitivity was tested using the Average Binaural Intensityconstant method [60–62]. In this method the average binaural level is maintained constant at some base level and the sound levels in the two ears are systematically varied around this base level to mimic the origin of a sound source from different positions around the head [63]. In this study ILDs varied from being 30 dB louder in one ear (i.e., Ear 1 = ABI +15 dB, Ear 2 = ABI15 dB), through 0 dB ILD (both ears = ABI) through to being 30 dB louder in the other ear (i.e., Ear 1 = ABI15 dB, Ear 2 = ABI+15 dB), in 5 dB ILD steps. These ILDs are designated as ranging from +30dB SPL to 30dB with 5dB intervals, and are calculated by the difference between contralateral and ipsilateral levels. Thus, positive ILDs indicate that the sound was louder in the contralateral ear and negative ILDs indicate that the sound was louder in the ipsilateral ear. In each block, the stimuli were alternated so that a larger contralateral intensity was followed by a larger ipsilateral intensity to prevent the cell from fatiguing.
The ABI constant method has been used in similar studies in different brain regions [60–62]. An alternative to ABI constant method, excitatory monaural intensity (EMI) – constant method, has been discussed in the context of recordings from primary auditory cortex, and may have some advantages for interpreting peaked ILD function data but not for nonmonotonic functions [64].
Notes
Abbreviations
 ABI:

Average binaural intensity
 APs:

Action potentials
 CCC:

Cophenetic correlation coefficient
 CF:

Characteristic frequency
 EI:

Excitatory to the ipsilateral ear and inhibitory to the contralateral ear
 IE:

Inhibitory to the ipsilateral ear and excitatory to the contralateral ear
 ILDs:

Interaural level differences
 PC:

Principal component
 PCA:

Principal component analysis
 UTPM:

Unit total probability mass.
Declarations
Acknowledgements
Funding for the study was obtained from the National Health and Medical Research Council of Australia and from scholarships to BU from Monash University. We acknowledge assistance in data collection from Mark Ford and Kate Worland.
Authors’ Affiliations
References
 Buonomano DV: The biology of time across different scales. Nat Chem Biol. 2007, 3 (10): 594597. 10.1038/nchembio1007594.View ArticlePubMedGoogle Scholar
 Irvine DR: Interaural intensity differences in the cat: changes in sound pressure level at the two ears associated with azimuthal displacements in the frontal horizontal plane. Hear Res. 1987, 26 (3): 267286. 10.1016/03785955(87)900633.View ArticlePubMedGoogle Scholar
 Hartmann WM, Rakerd B: Interaural level differences: Diffraction and localization by human listeners. J Acoust Soc Am. 2011, 129 (4): 2622.View ArticleGoogle Scholar
 Park TJ, Pollak GD: GABA shapes sensitivity to interaural intensity disparities in the mustache bat's inferior colliculus: implications for encoding sound location. J Neurosci. 1993, 13 (5): 20502067.PubMedGoogle Scholar
 Park TJ, Klug A, Holinstat M, Grothe B: Interaural level difference processing in the lateral superior olive and the inferior colliculus. Journal Of Neurophysiology. 2004, 92 (1): 289301. 10.1152/jn.00961.2003.View ArticlePubMedGoogle Scholar
 Aminian M, Aminian F: Neuralnetwork based analogcircuit fault diagnosis using wavelet transform as preprocessor. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on [see also Circuits and Systems II: Express Briefs, IEEE Transactions on]. 2000, 47 (2): 151156.Google Scholar
 Buiatti M, van Vreeswijk C: Variance normalisation: a key mechanism for temporal adaptation in natural vision?. Vision Res. 2003, 43 (17): 18951906. 10.1016/S00426989(03)003122.View ArticlePubMedGoogle Scholar
 Kosugi Y, Sase M, Kuwatani H, Kinoshita N, Momose T, Nishikawa J, Watanabe T: Neural network mapping for nonlinear stereotactic normalization of brain MR images. J Comput Assist Tomogr. 1993, 17 (3): 455460. 10.1097/0000472819930500000023.View ArticlePubMedGoogle Scholar
 Ahmad Z, Balsamo LM, Sachs BC, Xu B, Gaillard WD: Auditory comprehension of language in young children: neural networks identified with fMRI. Neurology. 2003, 60 (10): 15981605. 10.1212/01.WNL.0000059865.32155.86.View ArticlePubMedGoogle Scholar
 Skoog G, Ciecka J: Probability mass functions for additional years of labor market activity induced by the Markov (incrementdecrement) model. Econ Lett. 2002, 77 (3): 425431. 10.1016/S01651765(02)001593.View ArticleGoogle Scholar
 Kim HC, Kim D, Yang Bang S: Face recognition using the mixtureofeigenfaces method. Pattern Recogn Lett. 2002, 23 (13): 15491558. 10.1016/S01678655(02)001198.View ArticleGoogle Scholar
 Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (Oxford University Press, England). 2003, 19 (2): 185193. 10.1093/bioinformatics/19.2.185.View ArticleGoogle Scholar
 Dougherty ER, Barrera J, Brun M, Kim S, Cesar RM, Chen Y, Bittner M, Trent JM: Inference from clustering with application to geneexpression microarrays. J Comput Biol. 2002, 9 (1): 105126. 10.1089/10665270252833217.View ArticlePubMedGoogle Scholar
 Kasturi J, Acharya R, Ramanathan M: An information theoretic approach for analyzing temporal patterns of gene expression. Bioinformatics (Oxford, England). 2003, 19 (4): 449458. 10.1093/bioinformatics/btg020.View ArticleGoogle Scholar
 Leung YF, Cavalieri D: Fundamentals of cDNA microarray data analysis. Trends Genet. 2003, 19 (11): 649659. 10.1016/j.tig.2003.09.015.View ArticlePubMedGoogle Scholar
 Quackenbush J: Microarray data normalization and transformation. Nat Genet. 2002, 32 (Suppl): 496501.View ArticlePubMedGoogle Scholar
 Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH: Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001, 29 (12): 25492557. 10.1093/nar/29.12.2549.PubMed CentralView ArticlePubMedGoogle Scholar
 Venet D: MatArray: a Matlab toolbox for microarray data. Bioinformatics. 2003, 19 (5): 659660. 10.1093/bioinformatics/btg046.View ArticlePubMedGoogle Scholar
 Weiner J, Zimmerman CU, Gohlmann HWH, Herrmann R: Transcription profiles of the bacterium Mycoplasma pneumoniae grown at different temperatures. Nucleic Acids Res. 2003, 31 (21): 63066320. 10.1093/nar/gkg841.PubMed CentralView ArticlePubMedGoogle Scholar
 Workman C, Jensen LJ, Jarmer H, Berka R, Gautier L, Nielser HB, Saxild HH, Nielsen C, Brunak S, Knudsen S: A new nonlinear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 2002, 3 (9): 116.View ArticleGoogle Scholar
 Sharma S: Applied Multivariate Techniques. 1996, New York: J. WileyGoogle Scholar
 Lattin JM, Green PE, Carroll JD: Analyzing Multivariate Data. 2003, Thomson Brooks/Cole: Pacific Grove, CAGoogle Scholar
 Jain AK, Murty MN, Flynn PJ: Data Clustering: a review. ACM Comput Surv. 1999, 31 (3): 264323. 10.1145/331499.331504.View ArticleGoogle Scholar
 Hida M, Satoh H, Mitsui T: Comparative study of a cluster analysis and a principalcomponent analysis using a polarized imaging technique for discriminating adhesive cloth tapes. Analytical Sciences: The International Journal Of The Japan Society For Analytical Chemistry. 2002, 18 (6): 717722. 10.2116/analsci.18.717.View ArticleGoogle Scholar
 Forgacs E, Cserhati T: Use of cluster and principal component analysis in quantitative structureretention relationship study. Anal Chim Acta. 1997, 348 (1–3): 481487.View ArticleGoogle Scholar
 Zheng P, Harrington PB, Craig A, Fleming R: Variable alignment of high resolution data by cluster analysis. Anal Chim Acta. 1995, 310 (3): 485492. 10.1016/00032670(95)00149T.View ArticleGoogle Scholar
 Kirew DB, Chretien JR, Bernard P, Ros F: Application of Kohonen Neural Networks in classification of biologically active compounds. SAR QSAR Environ Res. 1998, 8 (1–2): 93107.View ArticlePubMedGoogle Scholar
 Horn CC, Friedman MI: Detection of single unit activity from the rat vagus using cluster analysis of principal components. J Neurosci Methods. 2003, 122 (2): 141147. 10.1016/S01650270(02)003047.View ArticlePubMedGoogle Scholar
 Peterson LE: Partitioning largesample microarraybased gene expression profiles using principal components analysis. Comput Methods Programs Biomed. 2003, 70 (2): 107119. 10.1016/S01692607(02)000093.View ArticlePubMedGoogle Scholar
 Knudsen S, Workman C, SicheritzPonten T, Friis C: Automated analysis of DNA microarray data. Nucleic Acids Res. 2003, 31 (13): 34713476. 10.1093/nar/gkg629.PubMed CentralView ArticlePubMedGoogle Scholar
 Parsons L, Haque E, Liu H: Subspace clustering for high dimensional data: a review SIGKDD Explor Newsl. 2004, 6 (1): 90105.Google Scholar
 Geschwind DH, Gregg JP: Microarrays for the Neurosciences: An Essential Guide. 2002, Cambridge, Mass: MIT PressGoogle Scholar
 Uragun B, Rajan R: Tenth International Conference on Machine Learning and Applications (ICMLA): 18–21 December 2011 2011. Developing An Appropriate Data Normalization Method. 2011, Honolulu, Hawaii: IEEEView ArticleGoogle Scholar
 MicheliTzanakou E: Supervised and Unsupervised Pattern Recognition: Feature Extraction and Computational Intelligence. 2000, Boca Raton, FL: CRC PressGoogle Scholar
 Hoglund M, Gisselsson D, Sall T, Mitelman F: Coping with complexity: multivariate analysis of tumor karyotypes. Cancer Genet Cytogenet. 2002, 135 (2): 103109. 10.1016/S01654608(01)006458.View ArticlePubMedGoogle Scholar
 Sahambi HS, Khorasani K: A neuralnetwork appearancebased 3D object recognition using independent component analysis. Neural Networks, IEEE Transactions on. 2003, 14 (1045–9227): 138149.View ArticleGoogle Scholar
 Gallagher M, Downs T: Visualization of learning in multilayer perceptron networks using principal component analysis. Systems, Man and Cybernetics, Part B, IEEE Transactions on. 2003, 33 (1083–4419): 2834.View ArticleGoogle Scholar
 Manly BFJ: Multivariate Statistical Methods: A Primer. 1994, London: Chapman and Hall, 2Google Scholar
 Jackson JE: A User's Guide to Principal Components. 1991, New York: WileyView ArticleGoogle Scholar
 Franklin SB, Gibson DJ, Robertson PA, Pohlmann JT, Fralish JS: Parallel Analysis: a method for determining significant principal components. J Veg Sci. 1995, 6: 99106. 10.2307/3236261.View ArticleGoogle Scholar
 Johnson RA, Wichern DW: Applied Multivariate Statistical Analysis. 2002, Prentice Hall: Upper Saddle River, NJ, 5Google Scholar
 Sidorov IA, Hosack DA, Gee D, Yang J, Cam MC, Lempicki RA, Dimitrov DS: Oligonucleotide microarray data distribution and normalization. Inform Sci. 2002, 146 (1–4): 67.View ArticleGoogle Scholar
 Karhunen J, Oja E, Wang L, Vigario R, Joutsensalo J: A class of neural networks for independent component analysis. Neural Networks, IEEE Transactions on. 1997, 8 (1045–9227): 486504.View ArticleGoogle Scholar
 Laubach M, Shuler M, Nicolelis MAL: Independent component analyses for quantifying neuronal ensemble interactions. J Neurosci Methods. 1999, 94 (1): 141154. 10.1016/S01650270(99)001314.View ArticlePubMedGoogle Scholar
 Kerschen G, Golinval JC: Nonlinear generalization of principal component analysis: from a global to a local approach. J Sound Vib. 2002, 254 (5): 867876. 10.1006/jsvi.2001.4129.View ArticleGoogle Scholar
 Cao LJ, Chua KS, Chong WK, Lee HP, Gu QM: A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing. 2003, 55 (1–2): 321336.Google Scholar
 Jankowski CR, Vo HDH, Lippmann RP: A comparison of signal processing front ends for automatic word recognition. Speech and Audio Processing, IEEE Transactions on. 1995, 3 (1063–6676): 286293.View ArticleGoogle Scholar
 Kim HC, Kim D, Bang SY: An efficient model order selection for PCA mixture model. Pattern Recogn Lett. 2003, 24 (9–10): 13851393.View ArticleGoogle Scholar
 Jain AK, Duin RPW, Mao J: Statistical pattern recognition: a review. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 2000, 22 (1): 437. 10.1109/34.824819.View ArticleGoogle Scholar
 Xu R, Wunsch D: Survey of clustering algorithms. Neural Networks. IEEE Transactions on. 2005, 16 (3): 645.Google Scholar
 Aldenderfer MS, Blashfield RK: Cluster Analysis. 1984, Beverly Hills: Sage PublicationsGoogle Scholar
 Aitkin L: The Auditory Midbrain: Structure and Function in the Central Auditory Pathway. 1986, Clifton, N.J.: Humana PressView ArticleGoogle Scholar
 Aitkin LM, Irvine DR, Nelson JE, Merzenich MM, Clarey JC: Frequency representation in the auditory midbrain and forebrain of a marsupial, the northern native cat (Dasyurus hallucatus). Brain Behav Evol. 1986, 29 (1–2): 1728.View ArticlePubMedGoogle Scholar
 Aitkin L: The Auditory Cortex: Structural And Functional Bases of Auditory Perception. 1990, London: Chapman & Hall, 1Google Scholar
 Phillips DP, Irvine DR: Responses of single neurons in physiologically defined area AI of cat cerebral cortex: sensitivity to interaural intensity differences. Hear Res. 1981, 4 (3–4): 299307.View ArticlePubMedGoogle Scholar
 Grothe B, Pecka M, McAlpine D: Mechanisms of sound localization in mammals. Physiol Rev. 2010, 90 (3): 9831012. 10.1152/physrev.00026.2009.View ArticlePubMedGoogle Scholar
 Cliff N: Analyzing Multivariate Data. 1987, Harcourt Brace Jovanovich: San DiegoGoogle Scholar
 Wang X, Paliwal KK: Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition. Pattern Recogn. 2003, 36 (10): 24292439. 10.1016/S00313203(03)00044X.View ArticleGoogle Scholar
 Giri NC: Multivariate statistical analysis, 2nd, rev. and expand edn. 2004, New York: Marcel DekkerGoogle Scholar
 Semple MN, Kitzes LM: Binaural processing of sound pressure level in the inferior colliculus. J Neurophysiol. 1987, 57 (4): 11301147.PubMedGoogle Scholar
 Irvine DR, Gago G: Binaural interaction in highfrequency neurons in inferior colliculus of the cat: effects of variations in sound pressure level on sensitivity to interaural intensity differences. J Neurophysiol. 1990, 63 (3): 570591.PubMedGoogle Scholar
 Irvine DR, Rajan R, Aitkin LM: Sensitivity to interaural intensity differences of neurons in primary auditory cortex of the cat I. types of sensitivity and effects of variations in sound pressure level. J Neurophysiol. 1996, 75 (1): 7596.PubMedGoogle Scholar
 Irvine DR: Progress in Sensory Physiology, vol. 7. 1981, Berli; New York: SpringerVerlagGoogle Scholar
 Irvine DR: A comparison of two methods for the measurement of neural sensitivity to interaural intensity differences. Hear Res. 1987, 30 (2–3): 169179.View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.