The representation of visual depth perception based on the plenoptic function in the retina and its neural computation in visual cortex V1
© Songnian et al.; licensee BioMed Central Ltd. 2014
Received: 12 April 2013
Accepted: 25 March 2014
Published: 23 April 2014
How it is possible to “faithfully” represent a three-dimensional stereoscopic scene using Cartesian coordinates on a plane, and how three-dimensional perceptions differ between an actual scene and an image of the same scene are questions that have not yet been explored in depth. They seem like commonplace phenomena, but in fact, they are important and difficult issues for visual information processing, neural computation, physics, psychology, cognitive psychology, and neuroscience.
The results of this study show that the use of plenoptic (or all-optical) functions and their dual plane parameterizations can not only explain the nature of information processing from the retina to the primary visual cortex and, in particular, the characteristics of the visual pathway’s optical system and its affine transformation, but they can also clarify the reason why the vanishing point and line exist in a visual image. In addition, they can better explain the reasons why a three-dimensional Cartesian coordinate system can be introduced into the two-dimensional plane to express a real three-dimensional scene.
1. We introduce two different mathematical expressions of the plenoptic function, Pw and Pv, that can describe the objective world. We also analyze the differences between these two functions when describing visual depth perception, that is, the difference between how these two functions obtain the depth information of an external scene.
2. The main results include a basic method for introducing a three-dimensional Cartesian coordinate system into a two-dimensional plane to express the depth of a scene, its constraints, and algorithmic implementation. In particular, we include a method to separate the plenoptic function and proceed with the corresponding transformation in the retina and visual cortex.
3. We propose that size constancy, the vanishing point, and the vanishing line form the basis of visual perception of the outside world, and that the introduction of a three-dimensional Cartesian coordinate system into a two-dimensional plane reveals a corresponding mapping between a retinal image and the vanishing point and line.
How a three-dimensional scene can be “faithfully” expressed in a (two-dimensional) plane (e.g., TV), that is to say, how it can be “faithfully” represented using a planar Cartesian coordinate system, and what the differences are between the stereoscopic perception of an actual scene and its two-dimensional image are important issues in visual information processing research, neural computation, psychophysics, and neuroscience.
At the cellular level, previous studies have shown that in the V1 cortex, only complex cells are able to respond to absolute parallax. In the V2 cortex, there are some cortical neurons that respond to relative parallax, and parallax-sensitive neurons can be described by specific and generalized energy models [3, 4]. Studies have been carried out both in the ventral and dorsal streams of the visual cortex, mainly to detect neurons that can respond to depth perception through specific signal stimuli. The binocular visual system is able to perceive depth information using binocular disparity, and one of the founders of the computational theory of vision, Marr, proposed a classic reconstruction algorithm for three-dimensional images. Julesz’s experiments on random dot stereograms (RDSs) led to a psychophysical study on the binocular disparity that forms stereoscopic vision. Its purpose was to show how the human brain deals with depth information [6, 7]. In other words, the task was to explore how human vision extracted stereoscopic information from a visual scene contained in a Cartesian coordinate system and depicted on a two-dimensional imaging plane.
A three-dimensional scene “faithfully” represented in a plane seems to be a commonplace phenomenon, yet the mechanism for this has never been explored. It is, however, a basic theoretical problem and is worthy of study in depth, not only because it concerns the geometric and physical properties of planes and space and is closely related to the three-dimensional perception of human vision, but also because it is closely related to the problem of stereoscopic perception in computer vision, robotics navigation, and visual cognitive psychology.
In fact, there are many similar phenomena, such as optical illusions generated using optics, geometry, physiology, psychology, and other means. Optical illusions are largely due to the uncertainty caused by the bimodal graphics in a two-dimensional plane and uncertainty during visual information processing in the brain. Illusions such as bimodal images (vase and face, girl and grandmother, Escher’s “waterfall” picture, and so on) and Additional file 1 disappear when the images are placed in a real three-dimensional space. Additional files 2, 3, 4, and 5 show the lifelike effect of three-dimensional perception and more intuitively reflect the meaning of this article.
Marr pointed out that the essence of visual information processing is to discover what and where objects are in space. F. Crick also stated that visual information processing is a construction process. In their book Seeing, Frisby and Stone explained why “seeing” is a particularly difficult task. They analyzed research from computational vision, psychophysics, neurobiology, neuroanatomy, brain imaging, modeling methods, image statistics, multiple representations, active vision, Bayesian theory, and the philosophy of visual information processing. The understanding of “seeing” among these fields is not the same: each focuses on different aspects of “seeing”, and each has its own understanding of “the essence of seeing” (for details, see Chapter 23 of their book).
As is known, any point in space can be represented by a Cartesian coordinate system (x, y, z) at a certain moment, t, and an object at this point can be expressed using light intensity Vx, Vy, Vz and color-related wavelength λ. In this way, one can define a function Pw = Pw(x, y, z; λ, Vx, Vy, Vz; t) that completely represents an object and is also a good description of the objective external world. When human vision processes an object, the optical axis of the eyeball is consistent with the z axis (the depth axis), such that the visual imaging plane is perpendicular to the optical axis. This removes one variable from the function Pw and leaves only seven variables, which form the plenoptic function proposed by Adelson and Bergen in the study of human primary visual information processing.
The intensity of each ray can be described as a function of the spatial viewing angle, the wavelength, time, and the light intensity at the observation position. This function (Pv = Pv(θ, ϕ, λ, t, Vox, Voy, Voz) in spherical coordinates and Pv = Pv(x, y; λ, Vox, Voy, Voz; t) in Cartesian coordinates) captures all that the human eye or an optical device may “see”, including ambient light. Therefore, the plenoptic function and full holographic representation of the visible world are equivalent. The different definitions of the plenoptic function and their mathematical expressions are discussed in some detail in the discussion [10, 11].
We should note that the plenoptic function not only reveals how humans “see” the external world, but also intuitively and concisely describes the information processing that occurs between the retina and the primary visual cortex. Marr pointed out that the true nature of information processing in “seeing” is to discover where and what is in space. “Where” in space can be located by a Cartesian rectangular coordinate system (i.e., x, y, and z). “What” is in this position may be perceived through the structure of the light rays emitted or reflected from the “object” to the viewer’s eyes. These correspond to the intensity Vx, Vy, Vz and wavelength λ of light at that location, which carry information about the contour, shape, and color of the object. Thus, it can be seen that the plenoptic function is a good description of the external world. When Adelson and Bergen proposed the plenoptic function, their intention was to solve the problem of corresponding points in computer vision. It was not expected that the study would promote the birth and development of the new discipline of computational photography [12–16]. To adapt to the needs of different disciplines, there are two basic formulae for the plenoptic function: one describes an object, Pw = P(x, y, z; Vx, Vy, Vz, λ, t), and the other describes the viewer’s perception of the object. In the latter case, the optical axis (or possibly the visual axis) of human vision and the coordinate axis z are consistent, thereby eliminating the need for coordinate axis z, namely: Pv = P(x, y; Vx, Vy, Vz, λ, t). “Seeing” is the association between the observer and the object, where the coordinates of an observer’s position are x, y, and z, and the light intensities that an object emits or reflects to the observer’s eye are Vox, Voy, Voz, representing the light intensity information of the object itself.
The intensity of light is related to the number of excited photo-sensitive cells in the retina and their activity levels. As long as the angles of the incident light θ and φ are recorded, the plenoptic function can be simplified as Pv = P(x, y; θ, φ, λ, t) such that a dual-plane (x, y) and (θ, φ) parameterization becomes possible. This parameterization is used in this paper and is important for processing the visual information of an image to reveal its deep meaning.
In formula (3), z is the distance along the Z axis and reflects depth information. It is thus clear that the coincidence of the optical axis with the coordinate axis Z is a very effective constraint. It is not imposed artificially, but is determined by the optics of the visual system.
When human eyes look into the distance, the fixation point can change in position, and this forms a horizontal vanishing line, as shown in Figure 2. This line is known as the infinity line and is composed of countless vanishing points [20–23]. Similarly, it is also an objective phenomenon that occurs in the visual perception of the external world. It occurs at the intersection of the sky and ground, and provides a broader perspective.
Mapping between the scene and visual image
The above brief description of previous research aims to introduce the problem of how a three-dimensional Cartesian coordinate system converted into a two-dimensional plane is able to express a real three-dimensional scene. This also explains why visual images in the retina can provide three-dimensional scene information to an observer. However, how the Cartesian coordinate system in a two-dimensional plane can “faithfully” represent a three-dimensional scene is not known, even though the problem seems trivial. The difference between the stereoscopic perception of actual scenes and a scene in a two-dimensional plane is an important issue in visual information processing, neural computation, psychophysics, and neuroscience, and is also a main research topic in image processing, three-dimensional display methods, and computer vision.
The actual loss of depth information along the z-axis, or the information loss of visual depth perception, is zloss = z - zp = z - 0.866z = z(1 - 0.866) = 0.134z. Naturally, α can have different values and indicate different depths. This is consistent with our experience of visual perception, although we usually pay no attention to it.
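The arithmetic above can be checked with a short sketch. It assumes that the perceived depth is the projection zp = z cos α, which is suggested by the factor 0.866 (the cosine of 30°) but is not stated explicitly in this passage; the function name `depth_loss` and the sampled angles are illustrative only.

```python
import math


def depth_loss(z, alpha_deg):
    """Return (z_p, z_loss) for a true depth z and a gaze angle alpha in degrees.

    Assumption (not stated explicitly in the text): z_p = z * cos(alpha).
    """
    z_p = z * math.cos(math.radians(alpha_deg))
    return z_p, z - z_p


# Tabulate the loss for a unit depth at a few illustrative angles.
for alpha in (0, 15, 30, 45):
    z_p, z_loss = depth_loss(1.0, alpha)
    print(f"alpha = {alpha:2d} deg   z_p = {z_p:.3f} z   z_loss = {z_loss:.3f} z")
```

Under this assumption, the case α = 30° reproduces the values 0.866z and 0.134z quoted above, and the loss vanishes when the gaze direction coincides with the z-axis.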
That is, through the plenoptic function Pw(x, y, z; Vx, Vy, Vz; λ, t), Iwr forms a visual image on the retina. Conversely, the visual image matches the external world through the plenoptic function Pv(x, y, z′; Vx, Vy, Vz; λ, t), and the loss of image information between Iwr and is approximately z cos α.
Of course, this is largely a proof of principle, but this discussion demonstrates that it can be used for studies in visual information processing.
It has been confirmed in many eye tracker tests, including psychophysical experiments, that the visual system can adjust with eye movements to find a suitable viewing angle and orientation so that the loss of information is minimal [24–26]. This is a fundamental property of the visual system and means that forming visual images on the retina and in the V1 cortex does not require inversion and reconstruction, possibly because the computational cost of solving the inverse, an ill-posed problem without a unique solution, is too high.
Loss of information due to the introduction of a three-dimensional Cartesian coordinate system in the plane
Role of the vanishing point in stereoscopic visual perception
The existence of the vanishing point is the fundamental reason why a Cartesian three-dimensional rectangular coordinate system can be drawn in a two-dimensional plane. As mentioned above, it can be easily seen that the formation of vanishing points underlies the optical system of human vision (in principle, see Figure 2). It is also the basis of an affine transformation by which the human visual system is able to perceive the three-dimensional external world, as illustrated in the case of railroad tracks that converge to a single point, forming a vanishing point (again, in principle, see Figure 2).
Dual-plane parameterization of the plenoptic function for neural computation of early vision
We know that each pixel of a two-dimensional digital image is a record of the intensity of all light that reaches this point, but does not distinguish between the directions of the light rays. It is just a projection of the light field of the three-dimensional structure, with lost information about phase and direction. Unlike this, the light field refers to the collection of light from any point in space in an arbitrary direction. It comprises all light from different angles that makes a contribution to each pixel. If it takes into account how the angle of light changes with time (t), it is a dynamic light field. The plenoptic function is a good mathematical description of the dynamic light field. However, questions remain regarding how the human visual system perceives and processes the structural information of the dynamic light field as well as how it receives three-dimensional information from the image on the retina.
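The distinction between a pixel and the light field can be made concrete with a minimal sketch. The directional intensities below are hypothetical values chosen for illustration, and `render_pixel` is an illustrative helper rather than anything defined in this paper; it simply sums the light arriving at one image point over all sampled directions.

```python
def render_pixel(directional_intensities):
    """Collapse the light arriving from all sampled directions into one pixel value."""
    return sum(directional_intensities)


# Two hypothetical light fields at the same image point (x, y).
# Each list holds intensities (arbitrary units) for four sampled directions.
light_mostly_from_left = [8, 2, 0, 0]
light_mostly_from_right = [0, 0, 2, 8]

pixel_a = render_pixel(light_mostly_from_left)
pixel_b = render_pixel(light_mostly_from_right)

# Both pixels store the same number although the directional structure differs.
print(pixel_a, pixel_b)
```

The two light fields differ in direction but yield the same pixel value, which is the sense in which a conventional image loses directional information that the plenoptic function retains.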
Studies by Zeki, Livingstone et al. have indicated that in the human visual system color information is transmitted in a separate channel in the cerebral cortex [27–29]. Therefore, wavelength λ can be separated from the plenoptic function. In addition, position, direction, and orientation information can also be separated. In this way, without considering time variation and separating dimensions, the seven-dimensional plenoptic function Pv = Pv(θ, ϕ, λ, t, Vx, Vy, Vz) can describe and reconstruct plenoptic images, or visual information of the objective world with different combinations of variables.
Three-dimensional visual perceptions of images in a two-dimensional plane
where S is the height of the object on the fundamental plane (i.e., on the x–z plane in Figure 2), δ is the viewing angle of the camera, D is the distance (i.e., distance along the z-axis) between the photographer and the object, or the depth information, and A is the scaling factor of the retina. Formula (8) is used to reconstruct a three-dimensional scene from an image in a two-dimensional plane. Figure 2 is an optical model of the affine transformation of the retina.
The main purpose of the calculation example is to show that we can use the vanishing point, size constancy and affine transformation model in Figure 2 to calculate the depth value in a picture taken of an actual scene. A comparison of the calculation results with actual measurements reveals that the vanishing point reflecting the basic characteristics of the optical system of human vision and size constancy reflecting cognitive psychological characteristics are important in accessing depth information in a two-dimensional picture.
Specific calculations are carried out employing two methods. The first method employs psychological methods based on formulae (8) and (9), and the second method employs an affine transformation based on an optical model of vision (Figure 2). Known parameters required for the calculation are the height of the camera from the ground (0.87 m) and the horizontal distance between the photographer and first white line on the ground (see Figure 7) (D = 6.40 m). The camera is a Nikon-E3700CCD, and the image size is 2048 × 1536 pixels. The calculation includes the vanishing point, the vanishing line, the height of the tree, and the line whose change in depth value is fastest on the ground portion of the image plane. Specific calculations are found in the literature [36, 37]. Naturally, algorithms of computer vision can also be used [38–42].
The results of both calculation methods are consistent with actual measurement results, showing that the calculation methods are reasonable and reflect the consistency between visual psychology and the optical system of visual pathways in the depth perception of an actual scene. More importantly, the results show that a two-dimensional image can contain rich three-dimensional information that is perceived by the visual system itself.
where α = (θ - 90°) is the included angle between the z-axis and z′-axis (namely the optical axis or gaze direction, see Figure 3) when looking at the image. Hence, we use formula (9) to correct the result of the depth information given by method 2, and these corrected values are also given in Figure 8. After taking into consideration information loss, the corrected value roughly reflects the visual depth perception obtained from the image (or two-dimensional plane).
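To illustrate how a vanishing line and one known reference distance can yield depth values from a single picture, the following sketch uses standard pinhole geometry for a level camera above flat ground. It is not formula (8) or (9) of this paper, whose exact form is not reproduced here. Only the reference distance of 6.40 m comes from the text; the pixel rows for the horizon and for the first white line are hypothetical, and `ground_depth` is an illustrative name.

```python
def ground_depth(row, horizon_row, ref_row, ref_depth):
    """Depth of a ground point imaged at `row` (pixel rows increase downward).

    For a level pinhole camera over flat ground, the offset of a ground point
    below the horizon row is inversely proportional to its distance, so one
    reference point of known depth fixes the scale.
    """
    if row <= horizon_row:
        raise ValueError("ground points must lie below the vanishing line")
    return ref_depth * (ref_row - horizon_row) / (row - horizon_row)


HORIZON_ROW = 600     # hypothetical row of the vanishing line
WHITE_LINE_ROW = 1200  # hypothetical row of the first white line
WHITE_LINE_DEPTH = 6.40  # metres, from the text

for row in (1200, 900, 750):
    depth = ground_depth(row, HORIZON_ROW, WHITE_LINE_ROW, WHITE_LINE_DEPTH)
    print(f"row {row}: depth {depth:.2f} m")
```

In this sketch, halving the pixel offset from the vanishing line doubles the estimated depth, which is the perspective scaling that makes the vanishing line useful as a depth reference.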
The proposed method is completely different from three-dimensional image reconstruction that uses binocular disparity and corresponding points in the field of visual computational theory, or three-dimensional reconstruction using corresponding points in two images taken by two cameras in the field of computer vision. The processing method of visual perception has advantages [36, 43] such as efficiency, robustness, and low computational complexity. It is therefore worthy of study by researchers in the fields of computer vision and visual neural computational theory.
In Appendix 1, on the basis of Figures 2 and 3 and formulae (7), (8), and (9), we make some predictions about stereoscopic perception of an image on a two-dimensional plane, including: (1) a picture in which there is no vanishing point; (2) the alternating use of a Cartesian coordinate system and an affine coordinate system; (3) the Moon Illusion; and (4) the inverse reconstruction of the visual image.
This article explores how the human vision system extracts depth information from an image of a scene in a Cartesian rectangular coordinate system on a two-dimensional plane. We introduced the concept of the plenoptic function in the optical system of the visual pathway. In the methods section “Computational approach in visual cortex V1”, we proposed a coincidence-test algorithm, in which an image primitive rU,V(a), transferred by ganglion cells from the retina to visual cortex V1, is matched against the neurons’ receptive fields [Bθ,φ(g)]Θ × Φ in the cortical columns.
Note that all of the neurons in the columns carry out the coincidence test simultaneously and in parallel. The neuron of [Bθ,φ(g)]Θ × Φ that is most consistent with the image primitive rU,V(a) is activated and its firing rate is the strongest, so that each image primitive rU,V(a) can be detected. Because the processing is distributed and parallel (see equation 12), the mathematical operation of the coincidence test is very simple, robust, and fast, and is completely consistent with the stimulus → firing → response pattern of neurons.
Based on the biological function and structure of the visual pathway and the primary visual cortex, we proposed the dual-parameterized method, which can be expressed as P(u, v) ⊗ P(θ, φ), and is mathematically equivalent to the formula Pv(u, v; θ, φ) = [Ru,v(a)]U × V ⊗ [Bθ,φ(g)]Θ × Φ, or to formula 12, as described below.
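The coincidence test described above can be sketched as a winner-take-all match between an image primitive and a small bank of oriented receptive fields. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3 × 3 binary bar templates, the dot-product response, and the names `RECEPTIVE_FIELDS`, `response`, and `coincidence_test` are all hypothetical, and a sequential loop stands in for the parallel operation of the neurons.

```python
# Hypothetical bank of bar-shaped receptive fields, keyed by orientation in degrees.
RECEPTIVE_FIELDS = {
    0:   [[0, 0, 0], [1, 1, 1], [0, 0, 0]],  # horizontal bar
    45:  [[0, 0, 1], [0, 1, 0], [1, 0, 0]],  # rising diagonal
    90:  [[0, 1, 0], [0, 1, 0], [0, 1, 0]],  # vertical bar
    135: [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # falling diagonal
}


def response(patch, field):
    """Overlap between an image primitive and one receptive field (dot product)."""
    return sum(p * f
               for patch_row, field_row in zip(patch, field)
               for p, f in zip(patch_row, field_row))


def coincidence_test(patch):
    """Return the orientation of the most strongly responding field and all responses."""
    responses = {theta: response(patch, field)
                 for theta, field in RECEPTIVE_FIELDS.items()}
    winner = max(responses, key=responses.get)
    return winner, responses


vertical_primitive = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
winner, responses = coincidence_test(vertical_primitive)
print(winner, responses)
```

Each response depends only on its own receptive field and the shared image primitive, so all responses could be computed independently, in keeping with the distributed and parallel character of the test described in the text.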
In this paper, we have raised the question of why the three-dimensional structure of a picture can be expressed in a two-dimensional plane by adopting a Cartesian coordinate system. Its importance lies in the study of information processing from the 2D retinal image to three-dimensional visual perception. Our approach is based on neural computation in visual cortex V1 and takes into account the affine transformation of visual image information, the size constancy of visual perception, and the findings of psychophysics. Formula (8) and Figure 2 show that the psychology of visual perception can explain how human vision perceives a three-dimensional scene from a two-dimensional retinal image. Because of a structured light field that densely fills the surroundings, human vision processes information according to formulae (6) and (7). The information loss from the three-dimensional scene in the external world to a visual image in the two-dimensional retina is small, and hence the visual image on the retina contains the rich information of the three-dimensional scene. Therefore, we may consider the visual system as a causal system, meaning that the scene has a one-to-one correspondence with the visual image. The scene produces a visual image in the retina, and conversely, if a visual image is formed in the retina, then a viewer perceives the external scene that produced that visual image in the retina.
The reconstruction of a visual image is a hard inverse problem and a major research topic in computer vision, where the concern is how to use binocular disparity information (i.e., corresponding points in the images of two cameras) to find stable and efficient reconstruction algorithms. It is also an issue for current 3D display technology, where the focus is on providing an effective method for better 3D displays. It is likewise a hard problem for research on biological vision, which starts from a unified view of the biological function and structure of the visual system and then explores how the human visual system achieves the following information processing: from retinal images of three-dimensional scenes → 2D visual image → 3D visual perception. In the first section of this paper, “Mapping between the scene and visual image”, this issue was discussed in more detail, and formulas (6) and (7) showed that there is no specific reconstruction algorithm from 2D retinal images to the three-dimensional scene. The time the brain needs to process an image has been measured, using rapid serial visual presentation of image series and methods of cognitive psychology, to be just 13 ms. Such a fast processing speed suggests that human vision may not obtain three-dimensional depth perception by a reconstruction method based on corresponding points, because this method and its related algorithms are too complicated and their computational cost is too high to be implemented by neurons, neural circuits, and local networks.
This paper studies how stereoscopic visual perception is obtained when viewing pictures on a plane. This issue is clearly significant for visual information processing, and equally so for computer vision.
1. The picture in which there is no vanishing point;
2. Alternating process of the Cartesian coordinate system and affine coordinate system;
3. The Moon Illusion (see Appendix 1 for details).
We have reason to believe that the rough outline of a theory of three-dimensional visual perception in the visual pathway is now generally clear.
We know that there are many monocular depth cues (e.g., perspective scaling, linear perspective, texture gradient, atmospheric perspective, occlusion, light and shade, color, and image hierarchy structure) that can also form depth perception. However, in this paper, we study how to express stereoscopic visual perception in a two-dimensional plane and only use the parameterized method of a dual plane of the plenoptic function to process the visual information of an image.
According to the principle of graceful degradation proposed by Marr, if the visual system calculates a rough two-dimensional description from an image, it will be able to calculate a rough three-dimensional description represented by this image. In other words, human vision can perceive the real three-dimensional description from stereoscopic images on a two-dimensional plane. Marr posed the problem in this way: “The contours of the image are two-dimensional, but we often come to understand these contours from the perspective of three dimensions. Therefore, the key question is how do we make a three-dimensional interpretation of the two-dimensional contour? Why can we make this explanation?”
We have studied this issue, and to answer Marr’s question, this paper presents a preliminary explanation. The main results are as follows:
1. Two different plenoptic functions to describe the objective world were introduced. The differences between these two functions, Pw and Pv, regarding the external scene obtained by visual perception were analyzed, and their specific applications in visual perception were discussed.
2. The main results were how the processing of visual depth information perceived in stereoscopic scenes can be displayed in a two-dimensional plane. Constraints for the coordinates and an algorithm implementation were also provided, in particular, a method used to separate the plenoptic function and a transformation from the retina to the visual cortex. A dual-plane parameterized method and its features in neural computing from the visual pathway to visual cortex V1 were discussed. Numerical experiments showed that the advantages of this method are efficiency, simplicity, and robustness.
3. Size constancy, a vanishing point, and vanishing line form the psychophysiological basis for visual perception of the external world, as well as the introduction of the three-dimensional Cartesian rectangular coordinate system into a two-dimensional plane. This study revealed the corresponding relationship between perceptual constancy, the optical system of vision, and the mapping of the vanishing point and line in the visual image on the retina.
The main results of this paper are a preliminary explanation as to why and how the Cartesian rectangular coordinate system can be introduced into a two-dimensional plane, and how a three-dimensional scene can be perceived in a two-dimensional plane. The results of this study are of significance in visual depth perception and possibly in applications of computational vision.
Computational approach in visual cortex V1
Ganglion cells transmit a neural firing spike train to the LGN. Then, similarly, magnocellular and parvocellular cells in the LGN transmit information about the image patches into 4Cα (magnocellular layer) and 4Cβ (parvocellular layer) of the fourth layer in the V1 cortex. Naturally, these coded neural firing spike trains need to be decoded and information about their image primitives needs to be restored. A neural decoding circuit with 40 Hz synchronous oscillation accomplishes this task.
In cortex V1, the receptive fields of the simple and complex cells are bar-shaped patterns with orientation and bandwidth selectivity. The sizes of the receptive fields of the simple and complex cells are about 20–50 μm. Their orientation and maximum resolutions are about 10° and 0.25°, respectively. Hence, their line resolution is between 5.0 and 100 μm.
The neurobiological significance of the Kronecker product ⊗ between the two matrices [Ru,v(a)]U × V and [Bθ,φ(g)]Θ × Φ lies in the assumption that these functional columns have the same information processing function and each functional column consists of many receptive fields with different directions and frequencies [46, 49]. The processing of the visual image in the retina and the corresponding points in the V1 cortex, in essence, is a process in which all receptive fields with different orientations in the cortical columns select suitable image patches. Those that correspond to the most active neurons are selected. This assumption is in accordance with the experimental results of the function and structure of the V1 cortex.
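For readers unfamiliar with the Kronecker product, the following sketch shows the operation on two small matrices. The matrix entries are hypothetical and are not taken from this paper: `patches` stands in for a 2 × 2 grid of image-primitive strengths in the role of [Ru,v(a)], and `bank` stands in for a 2 × 2 set of receptive-field weights in the role of [Bθ,φ(g)].

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Every entry a[i][j] is replaced by the block a[i][j] * b, so an
    (m x n) matrix and a (p x q) matrix give an (m*p x n*q) matrix.
    """
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a
            for b_row in b]


patches = [[1, 2],
           [3, 4]]  # hypothetical image-primitive strengths
bank = [[0, 1],
        [1, 0]]     # hypothetical receptive-field weights

for row in kronecker(patches, bank):
    print(row)
```

Each block of the result pairs one image primitive with the complete bank, which matches the assumption above that every functional column applies the same set of receptive fields to its own image patch.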
According to Figure 10 and formula (10), the total orientation range of 180° is divided into 18 intervals, so the nominal orientation resolution of human vision is only 10°. In fact, the resolution is much finer than 10° and is actually down to 0.25°. This is because the brain applies an interpolation method between the adjacent optimal orientations. In other words, when the preferred orientation of the receptive field of a cortical simple cell is close to the optimal orientation, a weighted average value based on the number of activated simple cells is calculated [9, 46, 48, 51]. When performing a numerical simulation based on formula (10), the azimuth angle in Figure 10 may be divided more finely, at the same time increasing the type and number of the receptive fields in formula (10). According to the complexity of the visual image, for example, the number of image features (such as lines, corners, and curves) and their distribution density, the total number of blocks (primitives) can be determined (first-level division), and then the number of sub-blocks is determined (second-level division). If necessary, the sub-blocks can also be divided. The purpose of doing this is to simulate the multi-scale properties of the visual system. In addition, it could make the results of numerical simulations more accurate, as the error between the source image (visual image) and the results of the numerical simulation would be smaller.
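The interpolation between adjacent preferred orientations can be illustrated with a weighted average over 18 orientation channels spaced 10° apart. This is a sketch under stated assumptions rather than the procedure of the cited studies: the responses are hypothetical activation counts, and the average is taken on doubled angles (a common way to average orientations, assumed here) so that 0° and 180° are treated as the same orientation.

```python
import math

# Preferred orientations of the 18 channels: 0, 10, ..., 170 degrees.
PREFERRED = [10 * k for k in range(18)]


def decode_orientation(responses):
    """Weighted average orientation (degrees in [0, 180)) from channel responses.

    Angles are doubled before averaging because orientation has a period of
    180 degrees; the result is halved again afterwards.
    """
    x = sum(r * math.cos(math.radians(2 * theta))
            for r, theta in zip(responses, PREFERRED))
    y = sum(r * math.sin(math.radians(2 * theta))
            for r, theta in zip(responses, PREFERRED))
    return (math.degrees(math.atan2(y, x)) / 2) % 180


# Hypothetical activation counts: the 40 and 50 degree channels respond equally.
responses = [0] * 18
responses[4] = 5
responses[5] = 5
print(decode_orientation(responses))
```

With equal activation of two neighbouring channels, the decoded orientation falls midway between them, and unequal activation shifts it toward the more active channel. This is how a bank with 10° spacing can in principle report orientations on a finer scale.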
Appendix 1 (to Section 5 of the text)
Image without a vanishing point
Alternating use of a Cartesian coordinate system and affine coordinate system
There seems to be no vanishing point in Figure 13. In fact, each of the four parallel sides extends to infinity in the left and right, up and down, and forward and backward directions. The parallel sides converge together and inevitably form vanishing points, all of which form a closed circle. This closed circle is the vanishing line. The circular vanishing line is the fundamental reason why the human’s visual perception can invert opposite sides for the front and back in Figure 13. In Figure 13, the Necker cube is consistent with the representation in Figure 3. As this representation can generate three-dimensional perception, also in line with the representation in Figures 2 and 5, it is not repeated.
Because of the combined effects of the constancy of visual perception and the optical property of vision that far objects appear smaller and near objects larger, the Moon (or Sun) in the sky is perceived to be further from the observer, and the area of the Moon is thus perceived to be smaller. Existing experimental and calculation results are that the Moon on the horizon is visually perceived to be 1.5 to 1.7 times as large as that in the sky [1, 2].
This research was supported by the Natural Science Foundation of China (No.: 61271425). The authors would like to thank Dr. Wu Aimin for citing his research work from Ref , as shown in Figure 7 and Figure 8. The authors wish also to thank Li Shuzhong and Song Guangyu for providing two photos (Additional files 2 and 3). The authors also wish to thank the two anonymous reviewers for their comments that have helped improve the quality of the manuscript.
- Cumming BG, Parker AJ: Binocular neurons in V1 of awake monkeys are selective for absolute, not relative, disparity. J Neurosci. 1999, 19 (13): 5602-5618.
- Neri P, Bridge H, Heeger DJ: Stereoscopic processing of absolute and relative disparity in human visual cortex. J Neurophysiol. 2004, 92 (3): 1880-1891. 10.1152/jn.01042.2003.
- Ohzawa I, DeAngelis GC, Freeman RD: Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science. 1990, 249 (4972): 1037-1041. 10.1126/science.2396096.
- Haefner RM, Cumming BG: Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron. 2008, 57 (1): 147-158. 10.1016/j.neuron.2007.10.042.
- Marr D: Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. 1982, New York: Freeman.
- Julesz B: Binocular depth perception of computer generated patterns. Bell Syst Tech J. 1960, 39: 1125-1162. 10.1002/j.1538-7305.1960.tb03954.x.
- Julesz B: Stereoscopic vision. Vis Res. 1986, 26: 1601-1612. 10.1016/0042-6989(86)90178-1.
- Crick F: The Astonishing Hypothesis: The Scientific Search for the Soul. 1995, New York: Touchstone Rockefeller Center, 23-71.
- Frisby JP, Stone JV: Seeing, The Computational Approach to Biological Vision. 2010, England: The MIT Press London, 539-551. 2
- Adelson E, Bergen J: The Plenoptic Function and the Elements of Early Vision. Computational Models of Visual Processing. 1991, Cambridge, MA: The MIT Press, 385-394.
- Adelson EH, Wang John YA: Single lens stereo with a plenoptic camera. IEEE Trans Pattern Anal Mach Intell. 1992, 14 (2): 99-106. 10.1109/34.121783.
- Ren N, Levoy M, Bredif M, Duval G, Horowitz M, Hanrahan P: Light Field Photography With a Handheld Plenoptic Camera. 2005, California: Stanford University Computer Science Tech Report CSTR.
- Levoy M, Ren N, Andrew A, Footer M, Horowitz M: Light field microscopy. ACM Trans Graphics (TOG). 2006, 25 (3): 924-934. 10.1145/1141911.1141976.
- McMillan L, Bishop G: Plenoptic modeling: An image-based rendering system. Computer Graphics of Proceedings ACM SIGGRAPH'95. 1995, Los Angeles: SIGGRAPH Press, 899-903.
- Wenger A, Gardner A, Tchou C, Unger J, Hawkins T, Debevec P: Performance relighting and reflectance transformation with time-multiplexed illumination. ACM Trans Graph. 2005, 24 (3): 756-764. 10.1145/1073204.1073258.
- Schreer O, Kauff P, Sikora T: 3D Videocommunication: Algorithms, Concepts and Real-Time Systems in Human Centred Communication. 2005, New York: John Wiley & Sons, Inc, 110-150.
- Yan Z: A course on techniques of photography. 2013, Shanghai: Fudan University Press, Seven
- Sonka M, Hlavac V, Boyle R: Image Processing, Analysis, and Machine Vision. 1999, New Jersey: Thomson Learning and PT Press, 310-321. Second
- Mallot HA: Computational Vision: Information Processing in Perception and Visual Behavior. 2000, Cambridge, London, England: The MIT Press, 23-46.
- Koenderink JJ, van Doorn AJ: Representation of local geometry in the visual system. Biol Cybern. 1987, 55 (6): 367-375. 10.1007/BF00318371.
- Faugeras O: Three-Dimensional Computer Vision: A Geometric Viewpoint. 1993, Cambridge, London, England: The MIT Press.
- Faugeras O, Luong QT, Papadopoulo T: The Geometry of Multiple Images: The Laws That Govern the Formation of Multiple Images of a Scene and Some of Their Applications. 2001, Cambridge, London, England: The MIT Press.
- Schwartz SH: Geometrical and Visual Optics. 2013, New York: McGraw-Hill Medical.
- Rybak IA, Gusakova VI, Golovan AV, Podladchikova LN, Shevtsova NA: A model of attention-guided visual perception and recognition. Vis Res. 1998, 38: 2387-2400. 10.1016/S0042-6989(98)00020-0.
- Chua HF, Boland JE, Nisbett RE: Cultural variation in eye movements during scene perception. PNAS. 2005, 102 (35): 12629-12633. 10.1073/pnas.0506162102.
- Zou Q, Zhao S, Wang Z, Huang Y: A neural computational model for bottom-up attention with invariant and overcomplete representation. BMC Neurosci. 2012, 13: 145. 10.1186/1471-2202-13-145.
- Zeki S: A Vision of the Brain. 1993, Oxford: Blackwell Scientific Pub.
- Livingstone MS, Hubel DH: Anatomy and physiology of a color system in the primate visual cortex. J Neurosci. 1984, 4: 309-356.
- Livingstone MS, Hubel DH: Psychophysical evidence for separate channels for the perception of form, color, movement, and depth. J Neurosci. 1987, 7: 3416-3468.
- Hubel DH, Wiesel TN: Ferrier lecture, functional architecture of macaque monkey visual cortex. Proc R Soc Lond Biol Sci. 1977, 198: 1-59. 10.1098/rspb.1977.0085.
- Hubel DH: Exploration of the primary visual cortex: 1955–1978. Nature. 1982, 299: 515-524. 10.1038/299515a0.
- Nicholls JG, Martin AR, Wallace BG, Fuchs PA: From Neuron to Brain. 2001, Massachusetts: Sinauer Associates, Inc, Four
- Regan D: Human Perception of Objects. 2000, Sunderland, Mass: Sinauer Associates, Inc, 116-120.
- Rock I: The Logic of Perception. 1983, Cambridge, MA: MIT Press.
- Hershenson M: Visual Space Perception, a Primer. 2000, Cambridge, MA: The MIT Press, 78-91.
- Aimin W, De X, Wang H, Wu J: Objects size constancy computation based on visual psychology. Acta Electronica Sin. 2006, 34 (6): 1096-1103.
- Wu A: Application of Visual Psychology in Computer Vision. 2006, Beijing: Press of Beijing Jiaotong University, 105-106.
- Shufelt JA: Performance evaluation and analysis of vanishing point detection techniques. IEEE Trans Pattern Anal Mach Intell. 2002, 21 (3): 282-288.
- Almansa A, Desolneux A, Vamech S: Vanishing point detection without any a priori information. IEEE Trans Pattern Anal Mach Intell. 2003, 25 (4): 502-507. 10.1109/TPAMI.2003.1190575.
- Kalantari M, Jung F, Guedon J: Precise: Automatic and fast method for vanishing point detection. Photogramm Rec. 2009, 24 (127): 246-263. 10.1111/j.1477-9730.2009.00542.x.
- Schaffalitzky F, Zisserman A: Planar grouping for automatic detection of vanishing lines and points. Image Vis Comput. 2000, 18 (9): 647-658. 10.1016/S0262-8856(99)00069-4.
- Tardif JP: Non-Iterative Approach for Fast and Accurate Vanishing Point Detection. Proceedings of the 12th IEEE International Conference on Computer Vision. 2009, Kyoto, Japan: IEEE, 1250-1257.
- Palmer SE: Vision Science. 1999, London, Cambridge, Mass: MIT Press, 120-280.
- Potter MC, Wyble B, Hagmann CE, McCourt ES: Detecting meaning in RSVP at 13 ms per picture. Atten Percept Psychophys. 2013, 12: 1-10.
- Murray SO, Boyaci H, Kersten D: The representation of perceived angular size in human primary visual cortex. Nat Neurosci. 2006, 9: 429-434. 10.1038/nn1641.
- Songnian Z, Qi Z, Zhen J, Guozheng Y, Li Y: Neural computation of visual imaging based on Kronecker product in the primary visual cortex. BMC Neurosci. 2010, 11 (43): 1-14.
- SongNian ZHAO, Li YAO, Zhen JIN, XiaoYun XIONG, Xia WU, Qi ZOU, GuoZheng YAO, XiaoHong CAI, YiJun LIU: Sparse representation of global features of visual images in human primary visual cortex: Evidence from fMRI. Chin Sci Bull. 2008, 14 (53): 2165-2174.
- Songnian ZHAO, Qi ZOU, Zhen JIN, GuoZheng YAO, Li YAO: A computational model of early vision based on synchronized response and inner product operation. Neurocomputing. 2010, 73: 3229-3241. 10.1016/j.neucom.2010.05.021.
- Zhao S, Xiong X, Yao G, Fu Z: A computational model as neurodecoder based on synchronous oscillation in the visual cortex. Neural Comput. 2003, 15: 2399-2418. 10.1162/089976603322362419.
- Zhao S, Zou Q, Jin Z, Xiong X, Yao G, Yao L, Liu Y: A Computational Model that Realizes a Sparse Representation of the Primary Visual Cortex V1. Software Engineering 2009, WCSE '09. WRI World Congress on Software Engineering, IEEE Computer Society, Xplore. 2009, Los Alamitos, CA: Publications office of the IEEE Computer Society, 54-62. Issue Date: 19–21 May.
- Jackson AJ, Bailey IL: Visual acuity. Optom Pract. 2004, 5: 53-70.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.