Predicting where we look in videos

Ryo Yonetani, Hiroaki Kawashima, Takashi Matsuyama: "Predicting Where We Look from Spatiotemporal Gaps", International Conference on Multimodal Interaction (ICMI 2013), Sydney, Australia, Dec. 2013


When we watch videos, spatiotemporal gaps arise between where we look and what we focus on, caused by temporally delayed responses and anticipation in eye movements. We focus on the underlying structure of these gaps and propose a novel method to predict points of gaze from video data. In the proposed method, we model the spatiotemporal patterns of salient regions that tend to attract focus, and statistically learn which types of pattern appear strongly around the points of gaze for each type of eye movement. This allows us to exploit the structure of the gaps, shaped by eye movements and salient motions, for gaze-point prediction. The effectiveness of the proposed method is confirmed on several public datasets.
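To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of extracting a spatiotemporal saliency pattern around each gaze point. The window looks back a few frames to capture the temporal delay between a salient event and the eye landing on it; the patch size, temporal span, saliency volume, and eye-movement labels are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patch(saliency, t, x, y, half=8, span=3):
    """Crop a spatiotemporal saliency patch centred on a gaze point.

    saliency: (T, H, W) array of per-frame saliency maps (hypothetical input).
    (t, x, y): gaze sample (frame index, column, row).
    The crop covers `span` frames ending at t, modelling the temporal
    gap between a salient motion and the corresponding eye movement.
    """
    T, H, W = saliency.shape
    t0 = max(0, t - span + 1)
    y0, y1 = max(0, y - half), min(H, y + half)
    x0, x1 = max(0, x - half), min(W, x + half)
    # Zero-padded patch so every gaze sample yields a fixed-length feature.
    patch = np.zeros((span, 2 * half, 2 * half))
    crop = saliency[t0:t + 1, y0:y1, x0:x1]
    patch[span - crop.shape[0]:, :crop.shape[1], :crop.shape[2]] = crop
    return patch.ravel()

# Toy data: a random saliency volume and gaze samples labelled by
# eye-movement type (0 = fixation, 1 = pursuit) -- purely illustrative.
saliency = rng.random((30, 64, 64))
gaze = [(10, 32, 32, 0), (20, 40, 16, 1)]
features = np.stack([extract_patch(saliency, t, x, y) for t, x, y, _ in gaze])
labels = np.array([m for *_, m in gaze])
```

A separate model could then be trained per eye-movement type on such features, reflecting the paper's observation that the gap structure differs between movement types.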