Tracking engagement in human communication from wearable sensors in real-world environments

Face-to-face communication plays a fundamental role in human behavior. We seek help, share information, and maintain social bonds through interpersonal, often face-to-face, communication. Real-world communication is rich, nuanced, and shaped moment by moment by subtle linguistic and nonlinguistic cues. Yet our research has tended to carve communication at its joints. Researchers investigating speech perception, voice recognition, face recognition, speech production, and gesture have made largely parallel progress with surprisingly little cross-talk and virtually no attempts to build ‘universal’ approaches to human communication. Moreover, most research has relied on carefully controlled laboratory studies, the very nature of which influences the theoretical constructs we identify as significant, the questions we pursue, and the answers we can discover.

The time is right to develop new tools and approaches that break human communication research out of the laboratory and pursue it in real-world contexts. The wide and ever-growing array of wearable sensors available to track behavior (accelerometers, GPS, heart rate, motion, portable neuroimaging, etc.) offers the potential to collect highly detailed, time-resolved records of multiple facets of real-world behavior. At the same time, powerful new computational approaches are being developed for automatic speech recognition, object/event recognition, behavior recognition, and video/image analysis; these approaches can monitor detailed sensor data in real time and categorize it according to the emotional valence of speech, the engagement of a listener, the amusement of a conversation partner, or a host of other dimensions. As colleagues at Carnegie Mellon University, we see a unique opportunity to blend our cognitive science and computer science research programs to break new empirical and analytical ground in understanding human communication in the wild.

Our central hypothesis is that understanding real-world communication will require a new research focus on engagement. Face-to-face communication is more than an exchange of continuous speech input modestly supported by a view of the talker. We aim to understand the dynamic signals (under both conscious and unconscious control) that meet communicative demands, how we pick up on (or miss) these signals, and the impact of these signals on what gets communicated.

In this two-year project, we will study the ultimate communication innovators, teenage girls, who have been documented across hundreds of years to be on the leading edge of communication change and thus provide an excellent initial testing ground. We will collect acoustic, video, and eye-gaze data via sensor-embedded glasses during freely flowing coffee-shop conversations between pairs of teens who are familiar with each other, and between the same teens shuffled into stranger pairs. Using machine learning algorithms, we will extract and quantify signals such as laughter, reaction times and silences, gaze aversion and engagement, speech convergence, and facial expressions of emotion beyond laughter, and then compute an engagement value from these behavioral metrics (sketched below). This rich dataset, with an experimental manipulation of dyad familiarity, will allow us to rigorously test the feasibility of these (and other) real-world indices of communication and engagement.
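
To make the final step concrete, here is a minimal sketch of how such an engagement value could be computed, assuming per-window behavioral metrics have already been extracted and standardized. The feature names, signs, weights, and weighted-sum model below are illustrative assumptions of ours, not the project's finalized measure:

    import numpy as np

    # Hypothetical per-window behavioral metrics produced by the ML pipeline.
    # Feature names, signs, and weights are illustrative assumptions, not the
    # project's finalized engagement measure.
    FEATURE_WEIGHTS = {
        "laughter_rate":      0.25,  # laughs per minute (z-scored)
        "response_latency":  -0.20,  # mean gap before replying; longer = less engaged
        "silence_fraction":  -0.15,  # share of the window that is mutual silence
        "gaze_on_partner":    0.20,  # share of the window spent gazing at the partner
        "speech_convergence": 0.10,  # drift of acoustic/lexical features toward partner
        "positive_affect":    0.10,  # smile / positive facial-expression score
    }

    def engagement_score(metrics):
        """Combine z-scored behavioral metrics into one engagement value in (0, 1)."""
        z = np.array([metrics[name] for name in FEATURE_WEIGHTS])
        w = np.array(list(FEATURE_WEIGHTS.values()))
        raw = float(z @ w)                   # weighted sum of standardized features
        return 1.0 / (1.0 + np.exp(-raw))    # squash to a 0-1 engagement value

    # Example: one 60-second window from a familiar dyad (z-scored inputs).
    window = {
        "laughter_rate": 1.2, "response_latency": -0.8, "silence_fraction": -0.5,
        "gaze_on_partner": 0.9, "speech_convergence": 0.4, "positive_affect": 0.7,
    }
    print(f"engagement = {engagement_score(window):.2f}")

A simple, interpretable combination such as this serves as a natural baseline; learned weightings or nonlinear models could later be validated against it using the familiar-versus-stranger manipulation.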