Title: Topic Labeled Text Classification: A Weakly Supervised Approach
Speaker: Hingmire Swapnil Vishveshwar (IITM)
Details: Tue, 4 Apr, 2017 2:00 PM @ BSB 361
Abstract: Text classification helps in organizing and using the knowledge hidden in a large collection of documents such as the World Wide Web. The effectiveness of supervised text classification depends critically on the number and nature of labeled documents. Unfortunately, labeling a large number of documents is a labor-intensive and time-consuming process that involves significant human intervention. It is, therefore, important from a practical standpoint to explore text classification approaches that reduce the time, effort and cost involved in creating labeled corpora.
Towards the goal of making the best use of available human expertise and reducing the cognitive load on annotators, we discuss a weakly supervised text classification algorithm based on the labeling of Latent Dirichlet Allocation (LDA) topics. Our algorithm exploits the generative property of LDA. We ask an annotator to assign one or more class labels to each topic, based on its most probable words. We then classify a document based on its posterior topic proportions and the class labels of the topics. We further enhance our approach by incorporating domain knowledge in the form of labeled words. We evaluate our approach on four real-world text classification datasets. The results show that our approach is more accurate than previously proposed semi-supervised techniques. A central contribution of this work is an approach that delivers effectiveness comparable to state-of-the-art supervised techniques in hard-to-classify domains, with very low overheads in terms of manual knowledge engineering.
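The classification step described above can be sketched roughly as follows. This is a hypothetical illustration, not the speaker's implementation: the topic labels and topic proportions are made up, whereas in practice the proportions would come from posterior inference in a fitted LDA model (e.g. via gensim).

```python
# Sketch of classification via labeled LDA topics (hypothetical data).
# An annotator assigns one or more class labels to each topic after
# inspecting its most probable words.
topic_labels = {
    0: {"sports"},
    1: {"politics"},
    2: {"sports", "politics"},  # a topic may receive multiple labels
}

def classify(doc_topic_props):
    """Score each class by summing the posterior proportions of the
    topics carrying that class label; return the top-scoring class."""
    scores = {}
    for topic, prop in enumerate(doc_topic_props):
        for label in topic_labels[topic]:
            scores[label] = scores.get(label, 0.0) + prop
    return max(scores, key=scores.get)

# A document dominated by topic 0 is assigned that topic's label.
print(classify([0.7, 0.2, 0.1]))  # -> sports
```

The key practical point is that the annotator labels only a handful of topics rather than thousands of documents, which is where the reduction in labeling effort comes from.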