Ongoing Projects

Incremental Activity Modeling.

Human activity recognition in videos is a difficult but widely studied problem in computer vision, owing to its numerous practical applications. Most state-of-the-art approaches to human activity recognition need an intensive training stage and assume that all of the training examples are labeled and available beforehand. These assumptions are unrealistic for many applications that deal with streaming videos. In continuous streaming videos, newly observed activities can be leveraged to improve the current activity recognition model. In this work, we aim to develop an incremental activity learning framework that continuously updates the existing activity models and learns new ones as more videos are seen. Our proposed approach builds on state-of-the-art machine learning tools, most notably active learning systems, and leads to an online activity recognition framework for streaming videos. It does not require tedious manual labeling of every incoming example of each activity class. We perform rigorous experiments on challenging human activity datasets, which demonstrate the robustness of our incremental activity modeling framework.
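To make the idea of incremental updates with selective labeling concrete, here is a minimal sketch. It is not the framework described above: the nearest-class-mean classifier, the names IncrementalNearestMean and process_stream, the oracle callback standing in for a human annotator, and both thresholds are assumptions chosen for illustration.

```python
import math


class IncrementalNearestMean:
    """Toy incremental classifier: one running mean per activity class."""

    def __init__(self):
        self.means = {}   # label -> running mean feature vector
        self.counts = {}  # label -> number of examples folded in

    def update(self, x, label):
        """Fold one example into the model; an unseen label creates a new class."""
        if label not in self.means:
            self.means[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        self.means[label] = [m + (xi - m) / n for m, xi in zip(self.means[label], x)]

    def predict(self, x):
        """Return (label, margin, nearest_distance) for feature vector x."""
        dists = sorted((math.dist(x, m), lbl) for lbl, m in self.means.items())
        if not dists:
            return None, 0.0, float("inf")
        if len(dists) == 1:
            return dists[0][1], float("inf"), dists[0][0]
        return dists[0][1], dists[1][0] - dists[0][0], dists[0][0]


def process_stream(model, stream, oracle, margin_threshold=1.0, novelty_radius=5.0):
    """Update the model on a stream, asking the oracle only for uncertain examples.

    An example is sent to the oracle when the model is empty, when the two
    closest classes are nearly tied, or when it is far from every known class.
    Otherwise the model's own prediction is used as the label (an assumption
    of this sketch). Returns the number of oracle queries.
    """
    queries = 0
    for x in stream:
        label, margin, nearest = model.predict(x)
        if label is None or margin < margin_threshold or nearest > novelty_radius:
            model.update(x, oracle(x))
            queries += 1
        else:
            model.update(x, label)
    return queries
```

On a stream whose examples form two well-separated clusters, this sketch would query the oracle once per cluster and label the remaining examples itself. A real system would replace the class means with a proper activity model and a principled uncertainty measure.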

Future Projects

Utilizing Context in Activity Recognition.

Context plays a significant role in the human visual system. As there is no formal definition of context in computer vision, we consider all the detected objects and motion regions as providing contextual information about each other. Activities in natural scenes rarely happen independently. The spatial layout of activities and their sequential patterns provide useful cues for understanding them.

Utilizing Sparse Coding and Deep Learning in Activity Recognition.

Sparse modeling is also called feature selection in pattern recognition. It aims to find, from a larger set of candidate features, the subset that solves the targeted problem optimally. In many applications of pattern recognition, extracted features are often redundant or highly correlated. Feature selection has clear advantages: because the model uses only a sparse set of features, its parameters can be estimated more efficiently and effectively. The direct approach to feature selection is to find a sparse subset of features that optimizes an objective function measuring the quality of a model built from that subset.
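The direct approach can be sketched as an exhaustive search over subsets of a fixed size. The objective used here, relevance to the target minus redundancy among the chosen features (both measured by absolute Pearson correlation), is an assumption made for illustration, as are the function names and the redundancy_weight parameter. The text above does not commit to any particular objective.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def subset_score(features, target, subset, redundancy_weight=1.0):
    """Hypothetical objective: reward relevance, penalize pairwise redundancy."""
    relevance = sum(abs(pearson(features[j], target)) for j in subset)
    redundancy = sum(
        abs(pearson(features[i], features[j])) for i, j in combinations(subset, 2)
    )
    return relevance - redundancy_weight * redundancy


def best_subset(features, target, k):
    """Return the indices of the size-k feature subset with the highest score.

    `features` is a list of feature columns. The search is exhaustive, so it
    is only practical for a small number of candidate features.
    """
    candidates = combinations(range(len(features)), k)
    return max(candidates, key=lambda s: subset_score(features, target, s))
```

Because the number of subsets grows combinatorially, exhaustive search does not scale. That is one reason greedy selection or sparsity-inducing penalties are commonly used in practice.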

Completed Projects

Aerial Video Tracking.

The analysis of videos from aerial platforms remains a challenging and important problem. The most fundamental task in this regard is to detect and track objects reliably from a moving platform. In this paper, we address the problem of multi-target detection and tracking in unconstrained aerial videos. Aerial videos are generally very unstable due to air turbulence, and the targets of interest have few discriminating features; both factors pose strong challenges for tracking objects such as humans and vehicles. In our proposed approach, we stabilize an unstable aerial video using a homography transformation. We estimate the homography between two frames of the video by exploiting the geometric constraint of the ground plane. To detect targets in a stabilized video frame, we first detect motion regions and then identify targets of interest around those regions using pre-trained appearance-based classifiers. We devise a finite state machine (FSM) that incorporates both motion detection and target classification into a Kalman filter (KF) based tracking-by-detection framework for robustly tracking humans and vehicles across the aerial video frames. Finally, we associate the tracklets using overlap- and appearance-based bipartite graph matching and homography projection of the tracklets. We conduct extensive experiments on challenging aerial video datasets, which demonstrate the robustness of our approach compared to other state-of-the-art tracking approaches.
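The tracklet association step can be illustrated with a small sketch of overlap-based bipartite matching. It is a simplification under stated assumptions: boxes are taken to be already projected into a common stabilized frame, the appearance cue is left out, and the optimal matching is found by brute force over permutations rather than by a dedicated assignment solver. The function names and the min_iou threshold are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(ends, starts, min_iou=0.1):
    """Match tracklet end boxes to tracklet start boxes, maximizing total IoU.

    Assumes len(ends) <= len(starts); otherwise no matching is returned.
    Pairs whose overlap falls below min_iou are dropped from the result.
    """
    best_total, best_pairs = -1.0, []
    for perm in permutations(range(len(starts)), len(ends)):
        total = sum(iou(ends[i], starts[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return [(i, j) for i, j in best_pairs if iou(ends[i], starts[j]) >= min_iou]
```

Brute force is only workable for a handful of tracklets. With more candidates, a polynomial-time assignment algorithm such as the Hungarian method would be the usual choice.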

Sample tracking video.

Human Activity Recognition in Surveillance Videos.

We detect the seven activities defined by the TRECVID SED task: CellToEar, Embrace, ObjectPut, PeopleMeet, PeopleSplitUp, PersonRuns, and Pointing. We employ two different strategies to detect these activities, chosen according to their characteristics. Activities like CellToEar, Embrace, ObjectPut, and Pointing result from the articulated motion of human body parts. Therefore, we employ a bag-of-words strategy based on local spatio-temporal interest point (STIP) features for these activities. Visual vocabularies are constructed from the STIP features, and each activity is described by a histogram of visual words. We also construct an activity probability map for each camera-activity pair that reflects the spatial distribution of an activity in a camera view. We train a discriminative SVM classifier with a Gaussian kernel for each camera-activity pair. During evaluation, we employ a sliding-window technique: we slide spatio-temporal cuboids along both the spatial and temporal directions to find likely activities. Each cuboid is also described by a histogram of visual words, and the final decision is made using the SVM classifier and the activity probability map. For activities like PeopleMeet, PeopleSplitUp, and PersonRuns, the trajectories of the persons involved are discriminative. For instance, the trajectories in PeopleMeet converge over time, while those in PeopleSplitUp diverge. Therefore, we use a track-based string of feature graphs (SFG) to recognize these activities. The results of our experimental runs on the evaluation videos are comparable with those of other participants. Our performance on every activity ranks among the top five teams.
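The bag-of-words description of a cuboid can be sketched as follows. This is a generic illustration rather than the exact pipeline used in the evaluation: the vocabulary is assumed to be a list of cluster centers learned beforehand, descriptors are plain tuples standing in for STIP features, and the function names are hypothetical.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(descriptor, vocabulary[k]))


def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of visual words for one spatio-temporal cuboid."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    # An empty cuboid keeps an all-zero histogram instead of dividing by zero.
    return [h / total for h in hist] if total else hist
```

The resulting histogram is the fixed-length vector that a classifier such as an SVM would take as input, whatever the number of interest points found in the cuboid.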

A CellToEar event.

Projects Prior to Ph.D.

Bengali License Plate Recognition.

Automatic license plate detection and recognition has numerous applications. A large number of schemes have already been proposed to make the detection and recognition process efficient. However, very little work has been done on Bengali license plate recognition. Wide variation among license plate patterns, complex backgrounds, and the difficulty of segmenting the Bengali characters on Bangladeshi license plates make it inefficient to use the existing algorithms. In this paper, we propose a solution for Bengali license plate detection and recognition. We follow the three stages of a conventional license plate recognition system, but propose a new algorithm for each stage that is effective for Bengali license plate detection and recognition. We tested our algorithms on over 250 images taken on the road and achieved over 95% success in Bengali license plate recognition.

Traffic Sign Detection and Recognition.

Automatic detection of road signs is a challenging but much-needed task. In this paper, we propose a new approach, automatic detection and recognition of traffic signs (ADRTS), which combines color segmentation, moment invariants, and neural networks. Experimental results demonstrate its superior performance in the detection and recognition of road signs. Its computational time is also quite low, which makes it applicable to real-time systems.
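As an illustration of what a moment invariant is, the sketch below computes the first of Hu's seven moment invariants for a segmented binary shape. That ADRTS uses Hu's invariants specifically is an assumption of this example, and the function name hu_phi1 is hypothetical.

```python
def hu_phi1(image):
    """First Hu moment invariant of a non-empty image given as a list of rows.

    phi1 = eta20 + eta02, where eta_pq = mu_pq / m00 ** (1 + (p + q) / 2)
    and mu_pq are central moments taken about the shape's centroid.
    """
    m00 = sum(sum(row) for row in image)
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    mu20 = sum((x - cx) ** 2 * v for row in image for x, v in enumerate(row))
    mu02 = sum((y - cy) ** 2 * v for y, row in enumerate(image) for v in row)
    # For second-order moments (p + q = 2) the normalizing factor is m00 ** 2.
    return (mu20 + mu02) / m00 ** 2
```

The value does not change when the shape is shifted within the image, since the moments are taken about the centroid. This kind of stability is what makes such quantities useful as inputs to a classifier.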

Object Segmentation.

Segmenting homogeneous regions or objects in an image is much in demand but challenging. Pattern-based object segmentation using split and merge (PSM) was proposed to overcome the problems of the basic split and merge (SM) algorithm, which cannot properly segment all types of objects in an image because of the huge variation among objects in size, shape, intensity, and orientation. Although the PSM algorithm performs better than some other image segmentation algorithms, it is unable to segment connected regions in an image and also has a higher rate of shape distortion. To address these issues, this paper proposes a new algorithm, object segmentation using block-based patterns (OSP), which uses a multi-stage merging technique. Experimental results show that the OSP algorithm is not only capable of segmenting connected regions in an image but also yields quite low shape distortion of the regions.
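For readers unfamiliar with the basic SM algorithm that PSM and OSP build on, its split phase can be sketched as a recursive quadtree decomposition. This sketch covers neither the merge phase nor the pattern-based steps of PSM and OSP. The homogeneity test (intensity range within a threshold), the function name, and the assumption of a square block whose side is a power of two are illustrative choices.

```python
def split(image, x, y, size, threshold, min_size=1):
    """Recursively split a square block until every block is homogeneous.

    A block counts as homogeneous when its intensity range is at most
    `threshold`. Returns a list of (x, y, size) blocks covering the input.
    """
    pixels = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size <= min_size or max(pixels) - min(pixels) <= threshold:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += split(image, x + dx, y + dy, half, threshold, min_size)
    return blocks
```

A merge phase would then join adjacent blocks with similar intensity into regions. The merge step is where the approaches described above differ from the basic algorithm.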