Object Detection, Recognition, and Tracking


As mobile devices and social networks grow faster than ever, online image and video content has become ubiquitous. Vision, the understanding of such images and videos, is one of the primary ways human beings perceive the world. Computer vision, the study of enabling machines to see and understand the visual world, is fundamental to advancing artificial intelligence.

Object detection, the task of locating and recognizing object categories in images and videos, is a major research field in computer vision. Recent research in object detection has achieved significant improvements by combining larger labelled datasets (e.g. ImageNet) with deep neural-network architectures (e.g. Convolutional Neural Networks, Restricted Boltzmann Machines).

However, object detection research using deep architectures has focused mainly on images. Little has been done on video, one of the fastest-growing types of multimedia content. Video understanding, especially large-scale object detection in video, has applications in brand awareness, autonomous cars, augmented reality, and other areas.


Our research focuses on object detection in video. Specifically, we employ a deep learning architecture that takes advantage of spatial-temporal properties to fold localization and recognition into a single feed-forward pass through the network, quickly extracting the location and class label of each object in a video. This builds upon our prior work, “Highly Accurate Video Object Identification Utilizing Hint Information”.



Explore the distribution of category-dependent object properties (location, aspect ratio, shape) and the spatial-temporal properties of video (temporal coherence and correlations across a sequence of frames)
Figure: Aspect ratio distribution of object bounding boxes (panels: bird, car, cat)
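
As an illustration of how a per-category aspect ratio distribution could be computed, the following is a minimal sketch rather than the method used in this work. It assumes bounding boxes are given as (category, x_min, y_min, x_max, y_max) tuples; the function name `aspect_ratio_histograms`, the log2 binning, and the bin range are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def aspect_ratio_histograms(boxes, num_bins=8, log_min=-2.0, log_max=2.0):
    """Histogram log2(width / height) per category.

    boxes: iterable of (category, x_min, y_min, x_max, y_max).
    Returns {category: [count per bin]}. Ratios outside the range
    [2**log_min, 2**log_max] are clamped into the first or last bin.
    The box format and binning scheme are assumptions for this sketch.
    """
    bin_width = (log_max - log_min) / num_bins
    hists = defaultdict(lambda: [0] * num_bins)
    for category, x0, y0, x1, y1 in boxes:
        w, h = x1 - x0, y1 - y0
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        log_ratio = math.log2(w / h)
        idx = int((log_ratio - log_min) / bin_width)
        idx = min(max(idx, 0), num_bins - 1)
        hists[category][idx] += 1
    return dict(hists)


# Made-up annotations, purely for demonstration.
example_boxes = [
    ("car", 0, 0, 200, 100),    # twice as wide as tall
    ("car", 10, 10, 210, 110),
    ("bird", 0, 0, 50, 50),     # square
    ("cat", 0, 0, 60, 120),     # twice as tall as wide
]
example_hists = aspect_ratio_histograms(example_boxes)
```

Binning on a log scale treats a box that is twice as wide as tall symmetrically with one that is twice as tall as wide, which tends to make category differences easier to compare.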

Deep learning on a large-scale dataset, with sliding windows over a dramatically reduced search space
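
One way to picture a reduced sliding-window search space is to generate windows only at the aspect ratios that are plausible for a category. The sketch below is an assumption-laden example, not the project's implementation: the function `sliding_windows`, the chosen ratios, window heights, and stride are all hypothetical.

```python
def sliding_windows(img_w, img_h, aspect_ratios, heights, stride):
    """Yield (x0, y0, x1, y1) windows inside an img_w x img_h image.

    aspect_ratios are width / height values; heights are window heights
    in pixels. Windows that do not fit in the image are skipped.
    """
    for h in heights:
        for ar in aspect_ratios:
            w = int(round(h * ar))
            if w < 1 or w > img_w or h > img_h:
                continue
            for y0 in range(0, img_h - h + 1, stride):
                for x0 in range(0, img_w - w + 1, stride):
                    yield (x0, y0, x0 + w, y0 + h)


# Exhaustive search over a generic set of ratios (hypothetical values).
full_search = list(
    sliding_windows(640, 480, [0.25, 0.5, 1.0, 2.0, 4.0], [64, 128, 256], 32)
)

# Search restricted to ratios assumed typical for one category, e.g. cars.
reduced_search = list(
    sliding_windows(640, 480, [1.5, 2.0], [64, 128, 256], 32)
)
```

Comparing `len(reduced_search)` with `len(full_search)` shows how much a category-dependent prior on aspect ratio can shrink the number of windows a classifier has to score.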


Incorporate tracking and key frame extraction
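
The text does not specify how tracking or key frame extraction is performed, so the following sketch swaps in two simple, generic techniques purely for illustration: key frames chosen by mean absolute pixel difference from the last key frame, and detections linked across frames by greedy intersection-over-union (IoU) matching. Frames are assumed to be flat lists of grayscale values and boxes (x0, y0, x1, y1) tuples; all function names and thresholds are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two equal-length frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_key_frames(frames, threshold):
    """Return indices of frames that differ from the last key frame by more
    than threshold. The first frame is always a key frame."""
    if not frames:
        return []
    key_indices = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[key_indices[-1]], frames[i]) > threshold:
            key_indices.append(i)
    return key_indices


def iou(box_a, box_b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def link_detections(prev_boxes, curr_boxes, min_iou=0.5):
    """Greedily match each previous box to the unmatched current box with
    the highest IoU, provided it reaches min_iou.

    Returns a list of (prev_index, curr_index) pairs.
    """
    links = []
    used = set()
    for i, prev_box in enumerate(prev_boxes):
        best_j, best_score = None, min_iou
        for j, curr_box in enumerate(curr_boxes):
            if j in used:
                continue
            score = iou(prev_box, curr_box)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            links.append((i, best_j))
    return links
```

In a pipeline of this kind, the detector would run only on the selected key frames, and the IoU links would carry object identities between consecutive detections. Temporal coherence is what makes both shortcuts reasonable: objects rarely move far between adjacent frames.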