Data and statistical analysis
The natural images were 1008 gray-scale images of size 1024 × 1536 pixels from van Hateren's database, available at http://hlab.phys.rug.nl/imlib/index.html (category "deblurred") [8]. We manually chose natural images in the narrower sense, i.e. only wildlife scenes. From the source images, 50,000 image patches of size 24 × 24 pixels were randomly extracted. The mean grey value of each image patch was subtracted and the pixel values were rescaled to unit variance. The resulting image patch will be denoted by I(x, y).
The complex-cell model was similar to our previous work [14]. The filter bank consisted of a number of complex cells arranged on a 6 × 6 grid. Complex-cell responses x
k
to natural images were modelled with a classical energy model:
where
and
are even- and odd-symmetric Gabor receptive fields whose energies are pooled together in the complex cell. The complex cells had 6 × 6 = 36 different spatial locations, and at each location, four different preferred orientations and three different frequency bands. The aspect ratio was fixed to 1.5 and frequency bandwidth to 1.5 octaves, which implied an orientation bandwidth of 37°, according to the definitions in [8]. The frequency tiling of the Gabor filters is shown in Figure 1, in which all the filters W were normalized to unit norm for visualization purposes. The actual normalization we used in the experiments consisted of standardizing the variances of the complex cell outputs so that they were equal to unity for natural image input. The number of complex cells totalled K = 36 × 4 × 3 = 432. Note, however, that in Experiment 1 we only used a single frequency band.
Independent component analysis (ICA) was performed on the vector x = (x1,...,x
K
) using the FastICA algorithm [15]. The orthogonalization approach was symmetric. Different nonlinearities g were used, see Table 1. Thus we learned (estimated) a linear decomposition of the form
or in vector form
where the vector a
i
= (a1i,...,a
ki
) gives a higher-order basis vector. The s
i
define the values of the higher-order features in the third cortical processing stage.
Note that the signs of the basis vectors are not defined by the ICA model [4], i.e. the model does not distinguish between a
i
and -a
i
because any change in sign of the basis vector can be cancelled by changing the sign of s
i
accordingly. Here, we defined the sign for each vector a
i
so that the sign of the element with the maximal absolute value was positive.
To obtain a baseline with which to compare our results, and to show which part of the results is due to the statistical properties of natural images instead of some intrinsic properties of our filterbank and analysis methods, we did exactly the same kind of analysis for 24 × 24 image patches that consisted of white Gaussian noise, i.e. the gray-scale value in each pixel was randomly and independently drawn from a Gaussian distribution of zero mean and unit variance. The white Gaussian noise input provides a "chance level" for any quantities computed from the ICA results.
Analysis of the ICA results
We visualized the resulting higher-order basis vectors a
i
following [14] by plotting an ellipse at each centrepoint of complex cells. The orientation of the ellipse is the orientation of the complex cell k, and the brightness of the ellipse is proportional to the a
ki
coefficient of the basis vector a
i
, using a gray-scale coding of coefficient values. In Experiment 1, i.e. the case with a single frequency band, we used this method directly to visualize each higher-order basis vector in a single display. In Experiment 2, i.e. the multifrequency case, we visualized each frequency band separately.
In Experiment 2, we are interested in the frequency pooling of complex cells in different higher-order features. We quantified the pooling over frequencies using a simple measure defined as follows. Let us denote by a
i
(x, y, θ, f
n
) the coefficient in the higher-order basis vector a
i
that corresponds to the complex cell with spatial location (x, y), orientation θ and preferred frequency f
n
. We computed a quantity which is similar to the sums of correlations of the coefficients over the three frequency bands, but normalized in a slightly different way. This measure P
i
was defined as follows:
where the normalization constant C
m
is defined as
and likewise for C
n
.
For further analysis of the estimated basis vectors, we defined the preferred orientation of a higher-order feature. First, let us define for a higher-order feature of index i the hot-spot (x
i
, y
i
)* as the centre location (x, y) of complex cells where the higher-order component s
i
generates the maximum amount of activity. That is, we sum the elements of a
i
that correspond to a single spatial location, and choose the largest sum. This allows us to define the tuning to a given orientation of a higher-order feature i by summing over the elements of a
i
that correspond to the spatial hotspot and a given orientation; the preferred orientation is the orientation for which this sum is maximized. We also computed the length of a higher-order feature as described in [14].
It is also possible to perform an image synthesis from a higher-order basis vector. However, the mapping from image to complex-cell outputs is not one-to-one. This means that the generation of the image is not uniquely defined given the activities of higher-order features alone. A unique definition can be achieved by constraining the phases of the complex cells. We assume that only odd-symmetric Gabor filters are active. Furthermore, we make the simplifying assumptions that the receptive fields W in simple cells are equal to the corresponding basis vectors, and that all the elements in the higher-order basis vector are non-negative (or small enough to be ignored). Then, the synthesized image
for higher-order basis vector a
i
is given by
where the square root cancels the squaring operation in the computation of complex-cell responses, and H denotes the set of indices that correspond to complex cells of the preferred orientation at the hotspot. Negative values of a
ki
were set to zero in this synthesis formula.