Context-Aware Head-and-Eye Motion Generation with Diffusion Model

Published in [IEEEVR24] IEEE Conference on Virtual Reality and 3D User Interfaces, 2024

In humanity’s ongoing quest to craft natural and realistic avatars within virtual environments, generating authentic eye gaze behaviors is paramount. Eye gaze not only serves as a primary non-verbal communication cue but also reflects cognitive processes, intent, and attentiveness, making it a crucial element of immersive interactions. However, automatically generating these intricate gaze behaviors remains challenging: traditional methods are time-consuming and lack the precision to align gaze behaviors with the nuances of the environment the avatar inhabits. To overcome these challenges, we introduce a novel two-stage approach to generate context-aware head-and-eye motions across diverse scenes. By harnessing the capabilities of diffusion models, our approach first produces contextually appropriate eye gaze points and then generates natural head-and-eye movements from them. Using Head-Mounted Display eye-tracking technology, we also present a comprehensive dataset that captures human eye gaze behaviors together with the associated scene features. We show that our approach consistently delivers intuitive and lifelike head-and-eye motions and demonstrates superior performance in motion fluidity, alignment with contextual cues, and overall user satisfaction.
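To make the two-stage idea concrete, below is a minimal, self-contained sketch of what such a pipeline could look like: a small DDPM-style sampler that draws 3D gaze points conditioned on scene features (stage 1), followed by toy kinematics that split each sampled gaze direction into head yaw/pitch and residual eye-in-head angles (stage 2). All names (`GazePointDenoiser`, `sample_gaze_points`, `gaze_to_head_and_eye`), network sizes, the noise schedule, and the fixed head/eye split ratio are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not the authors' code) of a two-stage gaze pipeline:
# stage 1 samples 3D gaze points with a conditional diffusion model,
# stage 2 converts gaze targets into head and eye rotations.
import torch
import torch.nn as nn


class GazePointDenoiser(nn.Module):
    """Predicts the noise added to a 3D gaze point, conditioned on scene features."""

    def __init__(self, scene_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + scene_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, noisy_gaze, scene_feat, t):
        # t is the normalized diffusion timestep, one scalar per sample.
        return self.net(torch.cat([noisy_gaze, scene_feat, t], dim=-1))


@torch.no_grad()
def sample_gaze_points(model, scene_feat, steps: int = 50):
    """Stage 1: DDPM-style ancestral sampling of gaze points given scene features."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(scene_feat.shape[0], 3)          # start from Gaussian noise
    for i in reversed(range(steps)):
        t = torch.full((scene_feat.shape[0], 1), i / steps)
        eps = model(x, scene_feat, t)
        # Standard DDPM posterior mean; noise is re-added except at the final step.
        x = (x - betas[i] / torch.sqrt(1 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x


def gaze_to_head_and_eye(gaze_point, head_pos, head_ratio: float = 0.6):
    """Stage 2 (toy kinematics): split the gaze direction into head yaw/pitch
    and residual eye-in-head angles using a fixed head contribution ratio."""
    d = gaze_point - head_pos
    yaw = torch.atan2(d[..., 0], d[..., 2])
    pitch = torch.atan2(d[..., 1], torch.linalg.norm(d[..., [0, 2]], dim=-1))
    gaze_angles = torch.stack([yaw, pitch], dim=-1)
    head = gaze_angles * head_ratio
    eye = gaze_angles - head
    return head, eye


if __name__ == "__main__":
    model = GazePointDenoiser()
    scene_feat = torch.randn(4, 64)                  # placeholder scene features
    gaze = sample_gaze_points(model, scene_feat)
    head, eye = gaze_to_head_and_eye(gaze, head_pos=torch.zeros(4, 3))
    print(gaze.shape, head.shape, eye.shape)         # (4, 3), (4, 2), (4, 2)
```

In this sketch the denoiser would be trained on gaze points recorded with the HMD eye tracker, paired with features of the surrounding scene; the fixed head/eye split in stage 2 stands in for whatever learned or data-driven head-eye coordination the full method uses.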