Predicting the Speaker in Three-Party Conversations from Eye Contact Patterns
Imagine a future where robots can speak face-to-face with humans in a way that is entirely…well, human. Accomplishing this requires a detailed understanding of behavior we take for granted.
“People are very skilled at identifying the speaker in a conversation just from observing their eye contact behavior,” said Zhigang Deng, professor of computer science at the University of Houston. “Quantifying these patterns, however, is a challenge.”
Computer-Human Interaction for Efficient Design of Technology
Deng’s research group works in a field known as computer-human interaction, which studies how we interact with computers. The field spans everything from app design to voice recognition software. Understanding this interaction can lead to the design of technology that feels intuitive to users.
At the forefront of this research is the challenge of designing robots that can interact with humans in a way that feels entirely natural. Accomplishing this, however, requires quantifying behavior, such as eye contact patterns, that we take for granted.
“We want robots to have the capability of identifying the speaker in a conversation,” said Yu Ding, a postdoctoral researcher in Deng’s group and first author of a paper describing the research. “A robot, which will be programmed by computer scientists, needs to know who is the speaker and who is the listener.”
Predicting the Speaker in a Three-Party Conversation
Deng, along with his research group, developed a strategy to predict the speaker in a three-party conversation.
“We are the first group to use eye contact patterns and head orientation to identify the speaker in a three-party conversation,” said Deng, who is part of the College of Natural Sciences and Mathematics.
The results will be presented at the ACM CHI Conference on Human Factors in Computing Systems, the top conference in the field. A copy of the manuscript describing these results is available on the Computer Graphics and Interactive Media Lab website. Also included as co-authors on the paper are Yuting Zhang, a UH Ph.D. student in computer science, and Meihua Xiao, a visiting scholar.
“Eye contact is constantly changing during a conversation. It’s very dynamic data,” Ding said. “The challenge is to extract meaningful behavior patterns.”
Understanding Dynamic Data With Time-Series Analysis
To create this method, Deng’s group obtained a dataset of a lengthy conversation between three people, recorded using a high-quality optical motion capture system. These recordings, as well as the data analysis and modeling methodology developed by Deng’s group, made these predictions possible.
“Our methodology came down to time-series analysis,” Ding said. “When we speak, our gaze changes all the time. That means we can’t detect the speaker by just looking at a single frame. Instead, we have to consider a sequence of frames in which the gaze of all three people is changing.”
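To make that idea concrete, here is a minimal, hypothetical sketch in Python of speaker identification framed as time-series classification: instead of looking at a single frame, a window of frames holding gaze and head-orientation features for all three participants is stacked into one feature vector and passed to an off-the-shelf classifier. The feature layout, window size, and choice of a random-forest classifier are illustrative assumptions, not the group’s actual model.

```python
# Hypothetical sketch: classify the current speaker from a WINDOW of frames,
# not a single frame. Feature names, window size, and the classifier are
# assumptions for illustration only; this is not the paper's method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_FRAMES = 3000        # frames in a recorded conversation (assumed)
N_PEOPLE = 3           # three-party conversation
FEATS_PER_PERSON = 3   # e.g., head yaw, head pitch, gaze-target id (assumed)
WINDOW = 30            # classify over a sequence of frames

rng = np.random.default_rng(0)
# Per-frame features for all three people, one row per frame (placeholder data).
frames = rng.normal(size=(N_FRAMES, N_PEOPLE * FEATS_PER_PERSON))
# Per-frame ground-truth speaker label (0, 1, or 2), e.g., taken from audio.
speaker = rng.integers(0, N_PEOPLE, size=N_FRAMES)

def make_windows(frames, labels, window):
    """Stack each window of frames into one feature vector and label it
    with the speaker at the window's final frame."""
    X, y = [], []
    for end in range(window, len(frames)):
        X.append(frames[end - window:end].ravel())
        y.append(labels[end])
    return np.array(X), np.array(y)

X, y = make_windows(frames, speaker, WINDOW)
split = int(0.8 * len(X))
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:split], y[:split])
print("held-out accuracy:", clf.score(X[split:], y[split:]))
```

On real motion-capture features the windowed representation lets the classifier pick up on how everyone’s gaze shifts over time, which a single frame cannot capture.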
Although Deng is quantifying eye contact patterns during conversations for computer-human interactions, this knowledge is applicable to other disciplines.
For example, autism spectrum disorders are characterized by differences in eye contact behaviors. Although current diagnostic techniques are lengthy and subjective, a quantitative understanding of eye contact behaviors could offer doctors an additional tool for accurate diagnosis.
“These research findings can be used in multiple fields,” Deng said.
This research was supported by the National Science Foundation and the National Natural Science Foundation of China.
- Rachel Fairbank, College of Natural Sciences and Mathematics