Grantee: Stanford University, Stanford, CA, USA
Researcher: Daniel Yamins, Ph.D.
Grant Title: Building richer computational models of visual cognition
https://doi.org/10.37717/220020469
Program Area: Understanding Human Cognition
Grant Type: Scholar Award
Amount: $600,000
Year Awarded: 2016
Duration: 6 years
In the course of everyday functioning, humans rapidly and effortlessly reformat the "blooming, buzzing confusion" of unstructured sensory datastreams into powerful abstractions that better serve our behavioral needs. But converting retinal input into rich, object-centric scene descriptions, or sound waves into words and sentences, poses major computational challenges. The crux of the problem is that the natural axes of low-level sensory input spaces (e.g. retinal photoreceptor or auditory hair-cell potentials) are misaligned with the axes along which behaviorally-relevant constructs vary. In vision, for example, object translation, rotation, motion in depth, deformation, re-lighting, etc. cause complex non-linear changes in image space. Conversely, images of two objects that are functionally quite distinct, e.g. different individuals' faces, can be very close together as pixel maps. Behaviorally-relevant dimensions are thus "tangled up" in this input space, and brains must accomplish the computationally demanding task of untangling them.
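To make the "tangling" point concrete, here is a toy numerical sketch of my own (not part of the proposal): in raw pixel space, a simple translation of an object can move its image farther from itself than from the image of an entirely different object at the same location. The patterns and sizes below are arbitrary illustrative choices.

    # Toy illustration of "tangling": identity-preserving transformations can
    # produce larger pixel-space distances than identity-changing ones.
    import numpy as np

    def make_image(offset, pattern):
        """Place a small 1-D 'object' pattern at a given offset in a blank image."""
        img = np.zeros(32)
        img[offset:offset + len(pattern)] = pattern
        return img

    object_a = np.array([1.0, 2.0, 3.0, 2.0, 1.0])   # hypothetical object A
    object_b = np.array([1.1, 2.1, 2.9, 2.1, 0.9])   # a distinct but similar-looking object B

    a_here    = make_image(5,  object_a)   # object A at one position
    a_shifted = make_image(20, object_a)   # the same object A, translated
    b_here    = make_image(5,  object_b)   # object B at the first position

    dist = lambda x, y: np.linalg.norm(x - y)
    print("A vs. translated A  :", dist(a_here, a_shifted))  # large: same identity, very different pixels
    print("A vs. B (same spot) :", dist(a_here, b_here))     # small: different identity, similar pixels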
My goal as a computational neuroscientist is to reverse engineer the brain's sensory systems with enough fidelity to build formal computational models that:
1. behave like humans, meeting the artificial intelligence (AI) goal of matching both the absolute performance and the error patterns of humans on a multiplicity of tasks in a realistic environment,

Underlying these goals is the hypothesis that advances in psychologically-informed AI will lead to better neural models, and conversely, that a better understanding of the brain will push the boundaries of AI. My belief in this fruitful reciprocity is inspired by significant recent progress in understanding neural response patterns in higher visual and auditory cortex. In this work, optimizing deep neural networks for performance on simple but ecologically relevant sensory recognition tasks (e.g. visual object categorization or auditory speaker identification) has led to quantitatively accurate models of neural response patterns in higher cortical areas that were previously inaccessible to quantitative understanding. Conversely, the observed hierarchical neuroanatomical structure and physiological retinotopy of the visual system (and related findings in audition) helped inspire the class of deep convolutional neural network algorithms currently revolutionizing computer vision and other areas of artificial intelligence.
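To illustrate the kind of model-to-brain comparison this work relies on, the following sketch (my own, using made-up array names, shapes, and synthetic data) fits a regularized linear mapping from a task-optimized model layer's features to recorded neural responses and scores how well held-out responses are predicted. It is a schematic of the general approach, not the actual analysis pipeline.

    # Sketch of neural predictivity: regress recorded responses on model features,
    # then evaluate per-neuron correlation on held-out images. Synthetic data stand
    # in for real model features and recordings.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_images, n_features, n_neurons = 500, 256, 40

    # Stand-ins: model_features would come from a network layer evaluated on the
    # stimulus images; neural_responses from recordings (e.g. spike rates).
    model_features   = rng.normal(size=(n_images, n_features))
    true_weights     = rng.normal(size=(n_features, n_neurons)) * 0.1
    neural_responses = model_features @ true_weights + rng.normal(size=(n_images, n_neurons))

    X_train, X_test, y_train, y_test = train_test_split(
        model_features, neural_responses, test_size=0.25, random_state=0)

    mapping = Ridge(alpha=1.0).fit(X_train, y_train)
    predicted = mapping.predict(X_test)

    # "Neural predictivity": correlation between predicted and observed responses,
    # computed per neuron on held-out images, then summarized across neurons.
    per_neuron_r = [np.corrcoef(predicted[:, i], y_test[:, i])[0, 1]
                    for i in range(n_neurons)]
    print(f"median held-out correlation: {np.median(per_neuron_r):.2f}")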
My research program focuses on two related problems that build on this work, capturing what I feel are key open directions in creating behaviorally richer and more neurally accurate models of visual cognition. One focus is on discovering more psychologically realistic learning rules: understanding the dynamic principles by which sensory systems flexibly incorporate data from the environment to improve their functioning. A second focus is on building models of scene understanding that go beyond simple categorization and recognition tasks, toward understanding complex inter-object relationships in multi-object scenes, with the eventual goal of building a bridge between deep sensory models and other areas of higher cognitive function.