Saliency maps of attention models with different numbers of bases. (a) Source images; (b) human eye tracking; (c) saliency maps of Itti’s attention model; (d) saliency maps obtained by the attention model with 100 bases (16 feature maps); (e) with 392 bases (50 feature maps); (f) with 576 bases (64 feature maps); (g) saliency maps of a fully connected network (392 bases, 1 feature map); (h) saliency maps of a randomly connected network (392 bases, 50 feature maps). Because the attention models in this paper simulate only bottom-up saliency detection, their results do not always match human eye tracking, which sometimes reflects top-down attention. In the last row, our model with 576 filters (64 invariant features) acts as a contour extractor that suppresses textures: it detects most contours of the target despite the strongly cluttered background. All filters are learned from the same training dataset.