Video and Motion Analysis

Video understanding has drawn considerable interest in recent years, driven by the availability of huge amounts of video data and the success of image-based models on visual tasks. However, motion blur and compression artifacts cause apparently consistent video signals to produce highly varying frame-level outputs for tasks such as object detection and semantic segmentation. In this project, we study efficient early- and high-level visual processing algorithms for video that leverage consistency over time and space.


  • Object Detection in Video We introduce a framework for refining object detection in video. Our approach extracts contextual information from neighboring frames, generating predictions that achieve state-of-the-art accuracy while remaining temporally consistent. Importantly, our model benefits from context frames even when they lack ground-truth annotations.
  • Semantic Segmentation in Video We explore the efficiency of Conditional Random Field (CRF) based inference for semantic segmentation in videos. The key idea is to combine the best of both worlds – semantic co-labeling and more expressive models. We follow the mean-field updates for higher-order clique potentials and extend the spatial smoothness and appearance kernels to address ...
  • Supervoxel Segmentation We propose a significant extension and improvement to streamGBH, the state-of-the-art supervoxel segmentation method, improving segmentation quality through dense optical flow. We use optical flow both as a feature and as a guide for the temporal connections in the initial spatio-temporal graph. We perform experimental analysis on ...
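The detection-refinement idea above – aggregating evidence from neighboring frames to stabilize per-frame predictions – can be illustrated with a minimal sketch. This is not our actual model: a simple moving average over per-object confidence scores stands in for the learned context aggregation, and the function name is hypothetical.

```python
import numpy as np

def smooth_scores(scores, window=3):
    """Temporally smooth per-frame detection confidences for one tracked object.

    scores: (T,) array of raw per-frame confidences.
    A moving average over a small temporal window is a stand-in for the
    learned aggregation of contextual information from neighboring frames.
    """
    T = len(scores)
    half = window // 2
    out = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out[t] = scores[lo:hi].mean()
    return out

# A flickering detection (frames 1 and 4 nearly miss) becomes more consistent:
raw = np.array([0.9, 0.2, 0.85, 0.8, 0.1, 0.9])
smoothed = smooth_scores(raw)
```

Note that this toy smoother treats all context frames equally; the point of a learned model is to weight context frames by their reliability, even when they carry no ground-truth annotations.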
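The mean-field updates mentioned in the segmentation bullet can be sketched for a fully connected pairwise CRF with a Potts compatibility. This is a naive O(N²) illustration under assumed inputs (unary energies and per-pixel features), not the efficient filtering-based inference or the higher-order extension we study; all names here are hypothetical.

```python
import numpy as np

def mean_field_step(Q, unary, feats, w=1.0, bw=1.0):
    """One mean-field update for a fully connected pairwise CRF (naive sketch).

    Q     : (N, L) current label marginals per pixel.
    unary : (N, L) unary energies (negative log class scores).
    feats : (N, D) per-pixel features (e.g. position + color); a single
            Gaussian kernel on them stands in for the spatial smoothness
            and appearance kernels.
    """
    # Pairwise kernel k(i, j) = exp(-||f_i - f_j||^2 / (2 * bw^2))
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * bw * bw))
    np.fill_diagonal(K, 0.0)          # no message from a pixel to itself
    msg = K @ Q                        # aggregate neighbor marginals
    # Potts compatibility: labels disagreeing with neighbors pay a penalty,
    # i.e. for label l the pairwise energy is the mass on all other labels.
    energy = unary + w * (msg.sum(1, keepdims=True) - msg)
    Q_new = np.exp(-energy)
    return Q_new / Q_new.sum(1, keepdims=True)

# Two nearby pixels with conflicting unaries pull toward agreement:
unary = np.array([[0.1, 2.0], [2.0, 0.1]])   # pixel 0 prefers label 0, pixel 1 label 1
feats = np.array([[0.0], [0.1]])             # almost identical features
Q0 = np.full((2, 2), 0.5)
Q1 = mean_field_step(Q0, unary, feats)
```

The practical appeal of mean-field inference here is that each update is parallel over pixels, which is what makes extending it with temporal kernels across frames tractable.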
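The use of optical flow as a guide for temporal connections can be sketched as follows: each pixel in frame t is linked to the pixel its flow vector points to in frame t+1, rather than to the pixel at the same location. This is a minimal illustration of the graph-construction step under an assumed dense flow field; it omits the flow-as-feature edge weights and the hierarchical merging that streamGBH performs.

```python
import numpy as np

def temporal_edges(flow, h, w):
    """Build flow-guided temporal edges between two consecutive frames.

    flow: (h, w, 2) dense optical flow (dx, dy) for frame t.
    Returns (i, j) pairs over row-major flattened pixel indices, linking
    pixel i in frame t to its flow-displaced pixel j in frame t+1.
    Pixels whose flow leaves the frame get no temporal edge.
    """
    edges = []
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y, x]
            x2, y2 = int(round(x + dx)), int(round(y + dy))
            if 0 <= x2 < w and 0 <= y2 < h:
                edges.append((y * w + x, y2 * w + x2))
    return edges

# With zero flow, every pixel links to its own location in the next frame:
static = temporal_edges(np.zeros((2, 2, 2)), 2, 2)
```

Following motion rather than pixel location keeps moving objects inside one supervoxel instead of splitting them at every frame boundary.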