Funded Grants

Beyond receptive fields in visual cortex

When we look at images, neural signals are elicited in a series of brain regions, from the retina on to the cerebral cortex, where cognitive elaboration is mostly thought to occur. Their place of entry in the cortex is the primary visual area (V1). Because visual stimuli have to pass through V1 to reach higher cortical areas, and because responses in V1 are already seen to reflect attributes of the perceptual experience of the visual stimuli, V1 is considered a key region in the processing of visual information.

The most common way to study responses of V1 neurons is by showing simple stimuli such as bars and crosses. Using these stimuli, it has been possible to characterize the main features that make V1 neurons fire by mapping each neuron’s receptive field: the region of visual space where light elicits responses. The classical model of V1 neurons is that they perform weighted sums (and subtractions) of the stimulus intensities, with weights given by the receptive field.
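The classical model can be sketched in a few lines of code. This is an illustrative toy, not our actual analysis software: the Gabor shape is a common textbook description of V1 receptive fields, and all parameter values here are arbitrary.

```python
import numpy as np

# Minimal sketch of the classical receptive-field model: a V1 neuron's
# response is a weighted sum of stimulus intensities, with weights given
# by the receptive field. Shapes and parameters are illustrative only.

def gabor_rf(size=21, sigma=3.0, freq=0.2, theta=0.0):
    """A toy Gabor-shaped receptive field (a standard description of V1 weights)."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def linear_response(stimulus, rf):
    """Weighted sum (and subtraction) of stimulus intensities, rectified
    because firing rates cannot be negative."""
    return max(float(np.sum(stimulus * rf)), 0.0)

rf = gabor_rf()
preferred = linear_response(rf, rf)   # stimulus matched to the receptive field
anti = linear_response(-rf, rf)       # sign-flipped stimulus: rectified to zero
```

A stimulus that matches the receptive field drives the model neuron strongly; one that matches its sign-flipped version produces no response at all, which is one way the model's purely linear core already needs a rectifying nonlinearity.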

Research in the last 15 years, however, has revealed a number of phenomena that are not explained by receptive fields alone. For example, V1 neurons receive one kind of suppression from a visual region roughly overlapping with the receptive field, and another kind of suppression from a visual region extending beyond the confines of the receptive field. Both forms of suppression can be described as divisive: suppressing stimuli divide the effective strength of stimuli seen by the neuron. The opposite effects appear when neuronal responses are enhanced by visual attention, suggesting that suppression and attention might engage the same mechanisms, one to obtain division, the other to obtain multiplication.

A new generation of models shows great promise in explaining these effects. In addition to a receptive field, these models include a suppressive field with distinct visual preferences. The suppressive field estimates the overall local strength of the stimulus, and divides the output of the receptive field. Promising as they are, however, these divisive models have not yet replaced the classical receptive-field models as the standard employed by the community.
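The divisive idea can be made concrete with a short sketch. Here the pooling rule (stimulus energy under a broad suppressive field) and the constant `sigma` are assumptions chosen for illustration, not fitted components of any published model:

```python
import numpy as np

# Sketch of a divisive model: the receptive-field drive is divided by a
# measure of overall local stimulus strength pooled by a suppressive
# field. Pooling rule and sigma are illustrative assumptions.

def divisive_response(stimulus, rf, suppressive_field, sigma=0.5):
    drive = max(float(np.sum(stimulus * rf)), 0.0)           # receptive-field output
    energy = float(np.sum(suppressive_field * stimulus**2))  # pooled local strength
    return drive / (sigma + np.sqrt(energy))                 # division by suppression

size = 21
rf = np.zeros((size, size))
rf[8:13, 8:13] = 1.0                       # toy center-only receptive field
pool = np.ones((size, size)) / size**2     # broad suppressive field
center_only = np.zeros((size, size))
center_only[8:13, 8:13] = 1.0              # stimulus confined to the RF
with_surround = np.ones((size, size))      # same center plus a surround

r_center = divisive_response(center_only, rf, pool)
r_surround = divisive_response(with_surround, rf, pool)
# The surround leaves the receptive-field drive unchanged but raises the
# pooled energy, so the response shrinks: surround suppression.
```

In this scheme, attention could in principle enter as a multiplicative gain on the numerator, the mirror image of the division performed by the suppressive field.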

What is not known is whether a single divisive model can explain all the phenomena that have been demonstrated to challenge the classical model. Moreover, it is not known whether such a model can go beyond predicting V1 responses to simplified visual stimuli such as bars and crosses. Can it predict responses to the more complex visual scenes that occur outside the laboratory? Earlier efforts to predict responses to complex stimuli have relied on minor variations of the receptive field model in spite of this model’s clear inadequacy.

We propose a more promising line of attack. First, rely on the divisive model, which goes beyond the receptive field and promises to explain a large number of phenomena. Second, employ simple stimuli such as bars and crosses to constrain model parameters. Third, test the model on responses to complex stimuli such as artificial scenes from a cartoon and natural scenes filmed from the head of a cat roaming a forest. We would then like to investigate the effects of cognitive factors such as visual attention. Ultimately, our goal is to include these effects in our model, by expressing them in terms of their action on the model components.

Our approach entails a combination of experiment and modeling, and requires multidisciplinary tools and skills. In addition to a neuroscientist (the PI), our team includes associates trained in Physics and Computer Science. In contrast to more traditional neurophysiology practice, we devote many hours of our experiments to the careful measurement of responses of individual neurons. The resulting large body of data requires months of analysis. To facilitate this analysis we have developed a number of tools. One of these tools allows us to simulate many hours of measurements obtained from single neurons. We can thus develop increasingly refined models, and easily test those models on the very same sequences of visual stimuli that were presented during the experiment.

Once we have achieved our goal of predicting V1 responses to a broad range of visual stimuli, we would refine and polish our software and database so that they can serve the community for scientific and educational purposes. Users would be able to upload still images or video sequences of their choice to our web site, which would then output the predicted response of any one of our V1 neurons. Visual scientists might find this service useful to predict or interpret the outcome of their experiments, and students and teachers could use it to gain a better understanding of the operation of the early visual system.

In summary, our goal is an intuitive model of V1 responses that is constrained by a limited set of measurements, and predicts responses to complex video sequences. Such a succinct description of the computations performed in V1 could serve as a standard model, which is highly desirable to illuminate the relation between neural responses and perceptual effects. By bridging the gulf between simplified laboratory stimuli and arbitrary visual scenes, neurophysiology can illuminate the results of measurements of visual perception.

The proposed research merits the support of the James S. McDonnell Foundation for three main reasons. First, it is important, with direct consequences for our interpretation of cognitive effects. Second, it is novel: although the individual components of the model we propose have been the object of decades of work, they have never before been put together in a single, cohesive model. Third, it is ambitious: our goal to predict responses of V1 neurons to complex, arbitrary stimuli has been pursued by few others, and has been achieved by none.