Funded Grants


Neural basis of Bayesian inference and decision making: theory and experiments

The last few years have seen the emergence of a general theory of neural computation that provides a unifying framework for understanding human behavior in a wide variety of seemingly unrelated domains, such as visual perception [1-3], cue integration [4, 5], multisensory integration [6], decision making [7, 8], language acquisition [9], concept acquisition [10] and motor control [11, 12]. This theory is known as the theory of statistical inference, or Bayesian inference. While the mathematical foundations of Bayesian inference, and their connections to behavior, are being extensively studied, little is known about their neural basis. The goal of the project is to understand how Bayesian inferences are implemented in neural circuits, as this will give us a deep understanding of how the brain carries out complex, real-world computations. One of our main challenges will be to explain how such near-optimal computations can be performed given the remarkably high level of variability in the responses of cortical neurons.

1. Why Bayesian inferences?
There are two main approaches to neural computation. In the standard approach, the one found in most models of the nervous system, neural circuits compute the value of variables given their input. For instance, in models of face recognition, neurons are thought to encode the identity of the person whose image appears on the retina. The key concept in this approach is that neural circuits return one answer in response to their input. The alternative, known as the Bayesian approach, is that neural circuits return many answers along with their respective probabilities. In other words, according to this view, neural circuits compute probability distributions over the variables of interest given the input. In the case of face recognition, a Bayesian inference would yield a probability distribution over all possible names given the image. If a decision is needed, one can simply pick the name with the highest probability (other options are also available depending on the cost of mistakes). This might appear to be a minor modification of the standard approach, but it turns out to be a fundamentally different, and more efficient, approach.
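The decision step described above can be sketched in a few lines. This is only an illustration: the names, probabilities and cost table are hypothetical values chosen for the example, not data from the project. It shows that the most probable name and the least costly guess need not coincide.

```python
# Hypothetical posterior over identities given one image (illustrative numbers).
posterior = {"Alice": 0.5, "Bob": 0.3, "Carol": 0.2}

# Option 1: pick the name with the highest probability.
map_name = max(posterior, key=posterior.get)

# Option 2: take the cost of mistakes into account.
# cost[guess][truth] is a hypothetical penalty; missing Bob is assumed to be expensive.
cost = {
    "Alice": {"Alice": 0.0, "Bob": 10.0, "Carol": 1.0},
    "Bob":   {"Alice": 1.0, "Bob": 0.0,  "Carol": 1.0},
    "Carol": {"Alice": 1.0, "Bob": 10.0, "Carol": 0.0},
}


def expected_cost(guess):
    """Average penalty of a guess, weighted by the posterior over the true name."""
    return sum(p * cost[guess][truth] for truth, p in posterior.items())


best_name = min(posterior, key=expected_cost)

print("most probable name:", map_name)
print("least costly guess:", best_name)
```

A system that returned only a single answer could not make the second kind of decision, because the probabilities of the alternatives would already have been discarded.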

For instance, the Bayesian framework provides a natural way to combine multiple sources of information. Imagine locating an object that can be seen and heard simultaneously (for instance, a mosquito). Using the standard approach, one would first compute two estimates: one based on vision alone and one based on audition alone. The next step would be to combine those estimates, for instance by taking the average of the two unimodal estimates. This may sound like the best way to proceed, but it is not: the average treats the two estimates on an equal footing, which is a poor choice because vision and audition are not equally informative. In daylight, vision is much more reliable than audition (close your eyes and try locating a mosquito based on audition alone). It would therefore make sense to favor vision when combining the unimodal estimates. However, favoring vision at all times is not optimal either, because there are contexts in which audition is more reliable (e.g., at night). The standard approach does not naturally deal with this problem.

By contrast, the Bayesian approach takes into account the reliability, or certainty, of all cues automatically. It does so because the computations operate over probability distributions, which explicitly encode certainty. Hence, if a visual cue is reliable, its probability distribution will tend to be narrowly peaked around a particular value. If the reliability decreases (as it would for a visual stimulus at night), the probability distribution becomes wider.

In the case of visuo-auditory integration, the goal of the Bayesian inference is to compute the probability distribution over position given the visual and auditory inputs. This is obtained by simply multiplying the unimodal distributions together, i.e., the distribution over position given vision and the distribution over position given audition. Such a product is dominated by the narrower of the two distributions, thus favoring the more reliable cue.
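To make the product rule concrete, here is a minimal sketch that assumes both unimodal distributions are Gaussian (an assumption made for this example only). In that case the product is again Gaussian, with a mean equal to the average of the two cues weighted by their inverse variances. The function name and the numbers are hypothetical.

```python
def combine_gaussian_cues(mu_v, sigma_v, mu_a, sigma_a):
    """Multiply two Gaussian distributions over position and return the
    mean and standard deviation of the (Gaussian) product."""
    w_v = 1.0 / sigma_v ** 2   # reliability (inverse variance) of vision
    w_a = 1.0 / sigma_a ** 2   # reliability (inverse variance) of audition
    mu = (w_v * mu_v + w_a * mu_a) / (w_v + w_a)
    sigma = (1.0 / (w_v + w_a)) ** 0.5
    return mu, sigma


# Vision says the object is at 0, audition says it is at 10 (arbitrary units).
day = combine_gaussian_cues(mu_v=0.0, sigma_v=1.0, mu_a=10.0, sigma_a=5.0)
night = combine_gaussian_cues(mu_v=0.0, sigma_v=5.0, mu_a=10.0, sigma_a=1.0)

print("daylight estimate (mean, sd):", day)    # close to the visual cue
print("night estimate (mean, sd):   ", night)  # close to the auditory cue
```

The same function handles both contexts without any change: the weighting follows from the widths of the distributions, and the combined estimate is always at least as precise as the better of the two cues.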

The Bayesian approach also provides a natural framework for decision making. Indeed, most decisions involve integrating a variety of factors, or evidence, for or against multiple alternatives, a process which is formally equivalent to the visuo-auditory integration problem we have just described. Another major advantage of Bayesian inference is that it provides a natural way to incorporate prior knowledge. If, for instance, an object has a tendency to appear around the same location over time, this knowledge can be encoded in a prior probability distribution centered on that location. This prior can then be combined with the unimodal probability distribution by simply taking the product of the distributions.
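The effect of a prior can be illustrated on a discrete grid of positions. In this sketch the Gaussian shapes, widths and locations are all assumptions made for the example: the object usually appears near position 0, and the current sensory evidence points to position 6.

```python
import math


def bump(x, centre, sigma):
    """Unnormalised Gaussian-shaped function of position."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


positions = [i * 0.5 for i in range(-40, 41)]           # grid from -20 to 20

prior = [bump(x, 0.0, 3.0) for x in positions]          # where the object usually is
likelihood = [bump(x, 6.0, 3.0) for x in positions]     # what the senses report now

# Posterior is proportional to prior times likelihood; normalise so it sums to 1.
unnormalised = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

peak = positions[max(range(len(positions)), key=posterior.__getitem__)]
print("most probable position:", peak)
```

Because the prior and the sensory evidence are equally wide here, the most probable position falls halfway between them; a narrower prior would pull the estimate further toward the usual location.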

All of these remarkable computational properties of Bayesian inference have led several investigators to ask whether some aspects of neural computation are akin to Bayesian inferences. As mentioned in the introduction, there is now a host of behavioral data indicating that humans use a Bayesian approach to solve a wide variety of problems [see for instance 2].

2. Probabilistic population codes and inference
Our goal is to understand how neural circuits in the cortex perform Bayesian inference. Our first challenge will be to characterize the neural code used for representing probability distributions.

At first sight, it would seem that cortical neurons are not particularly well suited to the encoding of probability distributions because their responses appear to be very unreliable. For instance, some neurons in the visual cortex respond selectively to the image of a moving object. Their response, however, can vary greatly from trial to trial, even if the stimulus in the image moves with the very same speed and direction. This high level of variability is difficult to reconcile with the notion that the nervous system is often Bayes optimal. We believe, however, that variability is exactly what is expected from a system representing and manipulating probability distributions, or, to put it slightly differently, we believe that neurons are variable because they encode probability distributions.

To see what led us to this idea, consider the following example. Imagine that we want to estimate the probability of getting heads when flipping a coin. To estimate the probability, we can throw the coin, say, 100 times, count the number of heads, and divide by 100 to get the probability of heads. For example, if we get 55 heads, we would conclude that the probability is 0.55. If we repeat this experiment a second time, we are very likely to get a different answer because of the random nature of coin tossing. We may get, for instance, 47 heads, in which case we would now conclude that the probability is 0.47. If neurons encode probability distributions, they must go through a similar process. They get to observe a limited amount of data (the sensory signals) and they must infer the probability distribution that led to this data. Just as our estimate of the probability of heads varies from trial to trial in our example, the response of neurons is bound to vary from trial to trial.
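The coin example is easy to simulate. The sketch below assumes a fair coin and uses a seeded pseudo-random generator so that runs are repeatable; the function name and the number of repetitions are choices made for this illustration.

```python
import random


def estimate_heads_probability(n_flips, p_heads, rng):
    """Flip a coin n_flips times and return the observed fraction of heads."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads / n_flips


def spread(values):
    """Difference between the largest and smallest estimate."""
    return max(values) - min(values)


rng = random.Random(0)  # seeded for repeatability

# Repeat the 100-flip experiment 20 times: the estimate changes from run to run.
estimates_100 = [estimate_heads_probability(100, 0.5, rng) for _ in range(20)]

# With far more data per experiment, the estimates vary much less.
estimates_10000 = [estimate_heads_probability(10000, 0.5, rng) for _ in range(20)]

print("100 flips per experiment:  ", estimates_100)
print("spread with 100 flips:     ", spread(estimates_100))
print("spread with 10000 flips:   ", spread(estimates_10000))
```

The run-to-run spread shrinks as the amount of data grows, so the size of the variability itself carries information about how certain the estimate is.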

This is the essence of our idea: the variability in neuronal responses is due to the fact that neurons represent probability distributions. Furthermore, we will show that by measuring this variability, we can decode the probability distribution encoded in a pattern of neural activity across a population of neurons. We call this kind of neural code a probabilistic population code.

The next step in our project will be to determine how to combine probabilistic population codes to perform Bayesian inference. Our preliminary work indicates that the solution could be remarkably simple given the form of neuronal variability in the brain: inference simply requires adding probabilistic population codes together. This is remarkable because summing is the simplest form of computation and it can be readily implemented in neural hardware.
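The following sketch illustrates why summation can be sufficient. It rests on assumptions that are not spelled out in the text and are introduced here only for illustration: spike counts are independent and Poisson distributed, both populations share the same Gaussian tuning curves and differ only in gain, the tuning curves tile the stimulus range densely, and the prior is flat. Under these assumptions the log posterior is linear in the spike counts, so the posterior decoded from the summed activity equals the product of the two unimodal posteriors. All names and parameter values are hypothetical.

```python
import math
import random

WIDTH = 4.0                                        # tuning-curve width (assumed)
PREFERRED = [-20.0 + 2.0 * i for i in range(21)]   # preferred positions, dense tiling
GRID = [-10.0 + 0.1 * i for i in range(201)]       # candidate positions for decoding


def mean_rate(s, centre, gain):
    """Gaussian tuning curve: expected spike count for a stimulus at s."""
    return gain * math.exp(-((s - centre) ** 2) / (2.0 * WIDTH ** 2))


def poisson(lam, rng):
    """Draw one Poisson sample (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def population_response(s, gain, rng):
    """One noisy population response; the gain sets the reliability of the cue."""
    return [poisson(mean_rate(s, c, gain), rng) for c in PREFERRED]


def decode(counts):
    """Posterior over GRID given spike counts (flat prior, Poisson noise).
    The sum of tuning curves is treated as constant, so only the
    term that is linear in the counts is kept."""
    log_p = [sum(-r * (s - c) ** 2 / (2.0 * WIDTH ** 2)
                 for r, c in zip(counts, PREFERRED)) for s in GRID]
    top = max(log_p)
    weights = [math.exp(v - top) for v in log_p]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(1)
r_vision = population_response(2.0, gain=20.0, rng=rng)    # reliable cue
r_audition = population_response(2.0, gain=5.0, rng=rng)   # less reliable cue
r_sum = [v + a for v, a in zip(r_vision, r_audition)]      # neural "addition"

post_sum = decode(r_sum)

# Product of the two unimodal posteriors, renormalised.
product = [pv * pa for pv, pa in zip(decode(r_vision), decode(r_audition))]
norm = sum(product)
post_product = [p / norm for p in product]

max_diff = max(abs(a - b) for a, b in zip(post_sum, post_product))
print("largest difference between the two posteriors:", max_diff)
```

The two posteriors agree up to floating-point rounding, which is the sense in which adding population activity implements the multiplication of probability distributions under the stated assumptions.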

Once the theoretical foundations of our framework are laid out, we will test our theory by recording the activity of neurons in animals trained to make simple perceptual decisions. We think that this is a particularly interesting domain to examine given that decision making is one of the most critical functions of the human brain, impacting all aspects of our life on time scales extending from tenths of seconds to days.

In summary, we believe that our project can provide a neural theory of Bayesian inference and simple decision making, while resolving one of the most puzzling paradoxes in neuroscience: how can behavior be near Bayesian optimal with seemingly unreliable neurons? This theory will have a wide range of applications, from sensory processing to motor control, and will be readily testable with the existing technologies for measuring neural activity.

References
1. Kersten, D., High level vision as statistical inference, in The New Cognitive Neurosciences, M.S. Gazzaniga, Editor. 1999, MIT Press: Cambridge.
2. Knill, D.C. and W. Richards, Perception as Bayesian Inference. 1996, New York: Cambridge University Press.
3. Knill, D.C., Surface orientation from texture: Ideal observers, generic observers and the information content of texture cues. Vision Research, 1998. 38: p. 1655-1682.
4. Jacobs, R.A. and I. Fine, Experience-dependent integration of texture and motion cues to depth. Vision Research, 1999. 39(24): p. 4062-75.
5. Landy, M.S., et al., Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Research, 1995. 35(3): p. 389-412.
6. Ernst, M.O. and M.S. Banks, Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 2002. 415(6870): p. 429-33.
7. Newsome, W.T., K.H. Britten, and J.A. Movshon, Neuronal correlates of a perceptual decision. Nature, 1989. 341(6237): p. 52-4.
8. Gold, J.I. and M.N. Shadlen, Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Sciences, 2001. 5: p. 10-16.
9. Saffran, J.R., R.N. Aslin, and E.L. Newport, Statistical learning by 8-month-old infants. Science, 1996. 274(5294): p. 1926-8.
10. Tenenbaum, J.B. and T.L. Griffiths, Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 2001. 24(4): p. 629-40; discussion 652-791.
11. Kording, K.P. and D.M. Wolpert, Bayesian integration in sensorimotor learning. Nature, 2004. 427(6971): p. 244-7.
12. Todorov, E. and M.I. Jordan, Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 2002. 5(11): p. 1110-1.