(A) Network architecture. R = camera rotation (global 3D vector), F = optic flow (2D vector map), V = pixel positions projected onto a sphere at the focal point (constant 3D vector map), E = time derivative of light intensity (scalar map), G = spatial gradient of light intensity (2D vector map), I = light intensity (scalar map). Each of these is a map of values covering the visual space, except for R, which is a single global value. Note that our sensory input is E, not I. (B) Example of I, G, E, and F while the camera is rotating around its viewing axis. The circular legend shows the color coding of direction in G and F.
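The quantities in (A) can be made concrete with a minimal Python sketch. It is an illustration of the variables and their shapes, not the network shown in the figure. It assumes a pinhole camera, a static scene, and the standard brightness-constancy relation E = -G · F, none of which the caption states. The grid size, focal length, intensity ramp, sign convention, and all function names are hypothetical.

```python
import math

# Hypothetical illustration values (not from the source).
H, W, FOCAL = 5, 5, 2.0
GRAD = (0.3, 0.1)          # slope of a linear intensity ramp, so G is exact
R = (0.0, 0.0, 0.2)        # R: one global 3D vector, here about the viewing (z) axis

def pixel_xy(i, j):
    """Image-plane coordinates of pixel (row i, column j), origin at the centre."""
    return (j - (W - 1) / 2.0, i - (H - 1) / 2.0)

def sphere_point(x, y):
    """V: pixel position projected onto the unit sphere at the focal point."""
    n = math.sqrt(x * x + y * y + FOCAL * FOCAL)
    return (x / n, y / n, FOCAL / n)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotational_flow(rot, v):
    """F at one pixel: assumed image motion of a static scene under camera rotation."""
    c = cross(rot, v)
    d = (-c[0], -c[1], -c[2])            # viewing direction moves as dv/dt = -R x v
    # Differentiate the pinhole projection x = f*vx/vz, y = f*vy/vz.
    fx = FOCAL * (d[0] * v[2] - v[0] * d[2]) / (v[2] ** 2)
    fy = FOCAL * (d[1] * v[2] - v[1] * d[2]) / (v[2] ** 2)
    return (fx, fy)

def make_map(fn):
    """Build an H x W map by evaluating fn(x, y) at every pixel."""
    return [[fn(*pixel_xy(i, j)) for j in range(W)] for i in range(H)]

I = make_map(lambda x, y: GRAD[0] * x + GRAD[1] * y)   # scalar map
G = make_map(lambda x, y: GRAD)                        # 2D vector map
V = make_map(sphere_point)                             # constant 3D vector map
F = [[rotational_flow(R, V[i][j]) for j in range(W)] for i in range(H)]  # 2D vector map

# E: scalar map, from the assumed brightness-constancy relation E = -G . F.
E = [[-(G[i][j][0] * F[i][j][0] + G[i][j][1] * F[i][j][1]) for j in range(W)]
     for i in range(H)]
```

Under these assumptions, a rotation of rate ω about the viewing axis gives F = (ωy, -ωx): the flow vanishes at the image centre and circulates around it, and E is zero wherever the flow runs perpendicular to the intensity gradient.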