Illustration of the learned representations for speech and natural sounds. At level L0, FastICA is first trained on patches of 80 ms x 32 cochlear channels taken from the envelopes of a 128-channel cochleagram. At level L1, patches of 160 ms x 64 cochlear channels are formed by concatenating the learned L0 features across time and space (cochlear channels), and FastICA is then applied to these larger patches to produce the L1 representations. The same procedure is repeated for level L2.
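The patch geometry of this hierarchy can be sketched as follows. This is only a bookkeeping sketch, not the training code: it assumes that each level doubles the patch extent along both axes, as the step from L0 (80 ms x 32 channels) to L1 (160 ms x 64 channels) suggests. The L2 size it prints (320 ms x 128 channels) is an extrapolation under that assumption and is not stated in the caption. The names `PatchSize` and `level_patch_sizes` are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PatchSize:
    duration_ms: int  # extent along the time axis
    channels: int     # extent along the cochlear-channel axis


def level_patch_sizes(base: PatchSize, n_levels: int, growth: int = 2) -> List[PatchSize]:
    """Return the patch size at each level.

    Assumption: both axes grow by the same factor `growth` from one level
    to the next (the caption only confirms this for L0 -> L1).
    """
    sizes = [base]
    for _ in range(1, n_levels):
        prev = sizes[-1]
        sizes.append(PatchSize(prev.duration_ms * growth, prev.channels * growth))
    return sizes


# L0 patch size is taken from the caption; three levels (L0, L1, L2).
sizes = level_patch_sizes(PatchSize(duration_ms=80, channels=32), n_levels=3)

for level, size in enumerate(sizes):
    # How many L0-sized tiles fit inside one patch at this level.
    tiles = (size.duration_ms // sizes[0].duration_ms) * (size.channels // sizes[0].channels)
    print(f"L{level}: {size.duration_ms} ms x {size.channels} channels "
          f"({tiles} L0-sized tiles)")
```

Under the doubling assumption, an L2 patch would span all 128 channels of the cochleagram, so the hierarchy could not grow further along the channel axis without a larger cochleagram.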