Visual signals for action. A, Visual scenes follow a “grammar” defined by building blocks arranged in a hierarchical structure: phrases (top), global objects (middle), and associated local objects (bottom) (Võ, 2021). B, When reaching to stationary objects, the target object is commonly fixated throughout the reach, which allows foveal vision of the object to be integrated with peripheral vision of the hand. C, When intercepting moving objects, the eyes either track the moving object with smooth pursuit eye movements (SPEMs) or fixate on the expected interception location to guide the hand toward the object.