|Title||:||Expressive Part Interactions in Human Pose Estimation|
|Speaker||:||Anoop R Katti (IITM)|
|Details||:||Fri, 20 Nov, 2015 11:00 AM @ BSB 361|
|Abstract||:||Human Pose Estimation is the task of automatically locating the body parts in a given image of a human. One of the most successful approaches to human pose estimation is the Pictorial Structure model. Here, the human body is modeled as an articulated collection of deformable body parts, flexibly connected to each other via spring-like connections. The connections between the parts are typically restricted to a tree structure so that pose estimation can be performed efficiently using Dynamic Programming.
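As an illustration of why the tree restriction makes inference efficient, the following is a minimal sketch of max-score dynamic programming over a tree of parts. It is not the speaker's implementation: the function name `best_pose`, the discrete candidate locations, the quadratic spring penalty, and the `offsets` and `spring_weight` parameters are all assumptions made for this example.

```python
def best_pose(parents, candidates, unary, offsets, spring_weight=1.0):
    """Best-scoring pose for a tree-structured part model (toy sketch).

    parents[i]    : index of part i's parent; part 0 is the root, and every
                    parent has a smaller index than its children.
    candidates[i] : list of (x, y) candidate locations for part i.
    unary[i][k]   : appearance score of part i at candidates[i][k].
    offsets[i]    : ideal (dx, dy) displacement of part i from its parent.
    """
    n = len(parents)
    # score[i][k]: best score of the subtree rooted at i when i is at candidate k.
    score = [list(u) for u in unary]
    # argbest[i][kp]: best candidate for part i when its parent is at candidate kp.
    argbest = [None] * n

    # Leaves-to-root pass. Children have larger indices than their parents,
    # so score[i] is complete by the time part i sends its message upward.
    for i in range(n - 1, 0, -1):
        p = parents[i]
        ox, oy = offsets[i]
        argbest[i] = []
        for kp, (px, py) in enumerate(candidates[p]):
            best_k, best_val = 0, float("-inf")
            for k, (x, y) in enumerate(candidates[i]):
                # Quadratic "spring" penalty for deviating from the ideal offset.
                dx, dy = x - px - ox, y - py - oy
                val = score[i][k] - spring_weight * (dx * dx + dy * dy)
                if val > best_val:
                    best_k, best_val = k, val
            argbest[i].append(best_k)
            score[p][kp] += best_val

    # Root-to-leaves pass: read off the optimal candidate for each part.
    choice = [0] * n
    choice[0] = max(range(len(score[0])), key=lambda k: score[0][k])
    for i in range(1, n):
        choice[i] = argbest[i][choice[parents[i]]]
    pose = [candidates[i][choice[i]] for i in range(n)]
    return score[0][choice[0]], pose
```

With at most K candidates per part, each tree edge costs O(K^2) in this sketch, so the whole pose is found in time linear in the number of parts. Adding an extra connection between non-adjacent parts would create a cycle and break this two-pass scheme.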
The downside of such a restriction is that it leaves many part interactions unhandled. Two of the most common and strongest manifestations of such unhandled interactions are self-occlusion among the parts and confusion in the localization of non-adjacent symmetric parts. By handling self-occlusion in a data-efficient manner, we improve the performance of the basic Mixture of Parts model by a large margin, especially on uncommon poses. By addressing the confusion in symmetric limb localization using a combination of two complementary trees, we improve the performance on all the parts while at most doubling the running time. Finally, we show that combining the two solutions improves the results further. We report results that are comparable to the state of the art on two standard datasets. Because we retain the tree-structured interactions and the purely part-level modeling of the base Mixture of Parts model, this is achieved in much less time than the best-performing part-based model requires.
In the second part of the thesis, we explore solving pose estimation with a different paradigm, Convolutional Neural Networks (CNNs). A CNN is a Deep Learning model containing many alternating layers of convolution and spatial pooling, ending in a layer specific to the task. CNNs have gained popularity in recent years for their remarkable improvements, initially in the Image Classification problem and subsequently in many other Computer Vision tasks. Despite their wide success, CNNs are notorious for being highly sensitive to many factors, such as the architecture of the CNN, the hyperparameters (learning rate, momentum), and the training procedure. Here, we begin with a part-based CNN approach (similar to Pictorial Structure). We explore the caveats of implementing a part-based model in a CNN and learning it end to end. Following this, we explore incorporating a more sophisticated spatial model into the base method and show the results obtained.
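The alternating convolution and spatial-pooling structure mentioned above can be sketched in a few lines of plain Python. This is only an illustrative forward pass, not the architecture used in the thesis: the ReLU nonlinearity, the single-channel feature maps, the 2x2 pooling window, and the linear scoring head `head_weights` are assumptions made for this example.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNNs call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Element-wise rectifier (an assumed choice of nonlinearity)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


def forward(image, kernels, head_weights):
    """Alternate convolution and pooling, then apply a task-specific linear head."""
    x = image
    for kernel in kernels:
        x = max_pool2d(relu(conv2d(x, kernel)))
    flat = [v for row in x for v in row]
    # Hypothetical task layer: a single linear score over the final feature map.
    return sum(w * v for w, v in zip(head_weights, flat))
```

For example, a 10x10 input passed through two 3x3 kernels shrinks to 8x8, 4x4, 2x2, and finally 1x1 before the head is applied. In a pose estimation setting the final layer would instead produce part locations or part score maps.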