Associate Professor
Computer Science and Engineering
Indian Institute of Technology Madras
Chennai - 600 036
My research focuses on bringing AI technologies for Indian languages on par with those for English, through open-source contributions in tools, datasets, neural models, and reference applications. My specific areas of interest are pretrained multilingual models for natural language and speech, neural machine translation, efficient models for automatic speech recognition, and evaluation metrics for natural language generation. Please visit the Nilekani Centre at AI4Bharat to learn more about my work.
[J10] Jay Gala, Pranjal A Chitale, A K Raghavan, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar M, Janki Atul Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M Khapra, Raj Dabre, Anoop Kunchukuttan. IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages. Transactions on Machine Learning Research (TMLR), 2023.
[J09] Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra. Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages. Transactions of the Association for Computational Linguistics (TACL), 2022.
[J08] Ananya B. Sai, M Akash Kumar, Mitesh M. Khapra. A Survey of Evaluation Metrics Used for NLG Systems. ACM Computing Surveys (ACM CSUR), 2021.
[J07] Ananya B. Sai, M Akash Kumar, Siddhartha Arora, Mitesh M. Khapra. Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining. Transactions of the Association for Computational Linguistics (TACL), 2020.
[J06] Suman Banerjee, Mitesh M. Khapra. Graph Convolutional Network with Sequential Attention for Goal-oriented Dialogue Systems. Transactions of the Association for Computational Linguistics (TACL), 2019.
[J05] Deepak Mittal, Shweta Bhardwaj, Mitesh M. Khapra, Balaraman Ravindran. Studying the Plasticity in Deep Convolutional Neural Networks using Random Pruning. To appear in the Journal of Machine Vision and Applications (MVA). Springer.
[J04] Rudra Murthy, Mitesh M. Khapra, Pushpak Bhattacharyya. Improving NER Tagging Performance in Low-Resource Languages via Multilingual Learning. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2018.
[J03] Anoop Kunchukuttan, Mitesh Khapra, Gurneet Singh, Pushpak Bhattacharyya. Leveraging Orthographic Similarity for Multilingual Neural Transliteration. Transactions of the Association for Computational Linguistics (TACL), 2018.
[J02] Sarath Chandar, Mitesh M. Khapra, Hugo Larochelle, Balaraman Ravindran, Correlational Neural Networks, Neural Computation, February 2016.
[J01] A Kumaran, Mitesh M. Khapra and Pushpak Bhattacharyya, Compositional Machine Transliteration, accepted for publication in Transactions on Asian Language Information Processing (TALIP Journal), December 2010.
[C52] Yash Madhani, Sushane Parthan, Priyanka Bedekar, Gokul NC, Ruchi Khapra, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Khapra. Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users. In Proceedings of Empirical Methods in Natural Language Processing Findings (EMNLP Findings 2023), Singapore, December 2023.
[C51] Gokul Karthik Kumar*, Praveen S V*, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar. Towards Building Text-To-Speech Systems for the Next Billion Users. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June, 2023.
[C51] Kaushal Bhogale, Abhigyan Raman, Tahir Javed, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra. Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages. In 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June, 2023.
[C50] Tahir Javed, Kaushal Santosh Bhogale, Abhigyan Raman, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra. IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023), Washington, DC, USA, February, 2023.
[C49] Aman Kumar, Himani Shrotriya, Prachi Sahu, Amogh Mishra, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M. Khapra, Pratyush Kumar: IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages. In Proceedings of Empirical Methods in Natural Language Processing Findings (EMNLP Findings 2022), Abu Dhabi, December 2022.
[C48] Emil Biju, Anirudh Sriram, Pratyush Kumar, Mitesh M. Khapra: Input-specific Attention Subnetworks for Adversarial Detection. Findings of the Association for Computational Linguistics (ACL Findings 2022), Dublin, Ireland, May 2022.
[C47] Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh Khapra, and Pratyush Kumar. IndicBART: A Pre-trained Model for Indic Natural Language Generation. In Findings of the Association for Computational Linguistics (ACL Findings 2022), Dublin, Ireland, May 2022.
[C46] Prem Selvaraj, Gokul Nc, Pratyush Kumar, Mitesh M. Khapra: OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages. In Proceedings of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland, May 2022.
[C45] Akash Kumar Mohankumar, Mitesh M. Khapra: Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons. In Proceedings of the Association for Computational Linguistics (ACL 2022 - Outstanding Paper Award), Dublin, Ireland, May 2022.
[C44] Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra. Towards Building ASR Systems for the Next Billion Users. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022), February, 2022.
[C43] Ananya B. Sai, Tanay Dixit, Dev Sheth, Sreyas Mohan and Mitesh M. Khapra: Perturbation CheckLists for Evaluating NLG Evaluation Metrics. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, November, 2021.
[C42] Dev Yashpal Sheth, Sreyas Mohan, Joshua Vincent, Ramon Manzorro, Peter A. Crozier, Mitesh M. Khapra, Eero P. Simoncelli and Carlos Fernandez-Granda: Unsupervised Deep Video Denoising. IEEE/CVF International Conference on Computer Vision (ICCV 2021), October, 2021.
[C41] Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra: The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), February, 2021.
[C40] Pritha Ganguly, Nitesh Methani, Mitesh M. Khapra, Pratyush Kumar: A Systematic Evaluation of Object Detection Networks for Scientific Plots. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), February, 2021.
[C39] Emil Biju, Anirudh Sriram, Mitesh M. Khapra, Pratyush Kumar: Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Nov-Dec 2020.
[C38] Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, Gokul N. C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar: iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. In Proceedings of Empirical Methods in Natural Language Processing Findings (EMNLP Findings 2020), November 2020.
[C36] M Akash Kumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran: Towards Transparent and Explainable Attention Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), July, 2020.
[C35] Nitesh Methani, Pritha Ganguly, Mitesh M. Khapra, Pratyush Kumar: PlotQA: Reasoning over Scientific Plots. In the Proceedings of the Eighteenth IEEE Winter Conference on Applications of Computer Vision (WACV 2020), Aspen, Colorado, USA, March 2020.
[C34] Preksha Nema, M Akash Kumar, Mitesh M. Khapra, Balaji V Srinivasan, Balaraman Ravindran: Let's Ask Again: Refine Network for Automatic Question Generation. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, November 2019.
[C33] Shweta Bhardwaj, Mitesh M. Khapra and Mukundhan Srinivasan: Efficient Video Classification Using Fewer Frames. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, California, USA, June, 2019.
[C32] Siddhartha Arora, Mitesh M. Khapra and Harish G. Ramaswamy: On Knowledge distillation from complex networks for response prediction. In Proceedings of 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019), Minneapolis, USA, June 2–7, 2019.
[C31] Ananya Sai, Mithun Das Gupta, Mukundhan Srinivasan, Mitesh M. Khapra: Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), Honolulu, Hawaii, USA, January - February, 2019.
[C30] Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh M. Khapra, Karthik Sankaranarayanan, Harish Ramaswamy: On Controllable Sparse Alternatives to Softmax, Neural Information Processing Systems (NeurIPS 2018), Montreal, December 2018.
[C29] Preksha Nema and Mitesh M. Khapra: Towards a Better Metric for Evaluating Question Generation Systems. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, November 2018.
[C28] Nikita Moghe, Siddhartha Arora, Suman Banerjee and Mitesh M. Khapra: Towards Exploiting Background Knowledge for Building Conversation Systems. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, November 2018.
[C27] Suman Banerjee, Nikita Moghe, Siddhartha Arora, Mitesh M. Khapra: A Dataset for Building Code-Mixed Goal Oriented Conversation Systems. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, New Mexico, USA, August 2018.
[C26] Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra and Karthik Sankaranarayanan: DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, July, 2018.
[C25] Soham Parikh, Ananya Sai, Preksha Nema, Mitesh M Khapra: ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018), Stockholm, Sweden, July, 2018.
[C24] Preksha Nema, Shreyas Shetty M, Parag Jain, Anirban Laha, Karthik Sankaranarayanan and Mitesh M. Khapra: Generating Descriptions from Structured Data Using a Bifocal Attention Mechanism and Gated Orthogonalization. In Proceedings of 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2018), New Orleans, June, 2018.
[C23] Amrita Saha, Megha Nawhal, Mitesh M. Khapra, Vikas Raykar: Learning Disentangled Multimodal Representations for the Fashion Domain. In the Proceedings of the Eighteenth IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, NV/CA, USA, March 2018.
[C22] Deepak Mittal, Shweta Bhardwaj, Mitesh M. Khapra, Balaraman Ravindran: Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks. In the Proceedings of the Eighteenth IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, NV/CA, USA, March 2018.
[C21] Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik Sankaranarayanan, Sarath Chandar: Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, Louisiana, USA, February 2018.
[C20] Amrita Saha, Mitesh M. Khapra, Karthik Sankaranarayanan: Towards Building Large Scale Multimodal Domain-Aware Conversation Systems. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, Louisiana, USA, February 2018.
[C19] Preksha Nema, Mitesh M. Khapra, Anirban Laha, Balaraman Ravindran: Diversity driven attention model for query-based abstractive summarization. In the Proceedings of the Fifty-Fifth Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, July 2017.
[C18] Sathish Reddy, Dinesh Raghu, Mitesh M. Khapra, Sachindra Joshi: Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain, April, 2017.
[C17] Amrita Saha, Mitesh M. Khapra, Sarath Chandar, Janarthanan Rajendran, Kyunghyun Cho: A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation. Computational Linguistics Conference (COLING 2016), Osaka, Japan, December 2016
[C16] Janarthanan Rajendran, Mitesh M. Khapra, Sarath Chandar, Balaraman Ravindran: Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning. In North American Association of Computational Linguistics (NAACL 2016), San Diego, USA, June 2016, pp. 171–181.
[C15] Ruty Rinott, Lena Dankin, Carlos Alzate Perez, Mitesh M. Khapra, Ehud Aharoni, Noam Slonim: Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2015), Portugal, September 2015, pp. 440-450.
[C14] A. P. Sarath Chandar, Stanislas Lauly, Hugo Larochelle, Mitesh M. Khapra, Balaraman Ravindran, Vikas C. Raykar, Amrita Saha, An Autoencoder Approach to Learning Bilingual Word Representations, Neural Information Processing Systems (NeurIPS 2014), Montreal, December 2014, pp. 1853-1861.
[C13] Mitesh M. Khapra, Ananthakrishnan Ramanathan, Anoop Kunchukuttan, Karthik Visweswariah, Pushpak Bhattacharyya, When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland, May 2014, pp. 196-202.
[C12] Mitesh M. Khapra, Ananthakrishnan Ramanathan, Karthik Visweswariah, Improving reordering performance using higher order and structural features, in North American Association of Computational Linguistics (NAACL 2013), Atlanta, USA, June 2013, pp. 315-324.
[C11] Karthik Visweswariah, Mitesh M. Khapra, Ananthakrishnan Ramanathan, Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation, in Annual Meeting of the Association of Computational Linguistics (ACL 2013), Bulgaria, August 2013, pp. 1275-1284.
[C10] Mitesh M. Khapra, Salil Joshi, Arindam Chatterjee and Pushpak Bhattacharyya, Together We Can: Bilingual Bootstrapping for WSD, Annual Meeting of the Association of Computational Linguistics (ACL 2011), Oregon, USA, June 2011, pp. 561-569.
[C09] Mitesh M. Khapra, Salil Joshi and Pushpak Bhattacharyya, It Takes Two to Tango: A Bilingual Unsupervised Approach for Estimating Sense Distributions using Expectation Maximization, 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand, November 2011, pp. 695-704.
[C08] Mitesh M. Khapra, Raghavendra Udupa, A. Kumaran, and Pushpak Bhattacharyya, PR + RQ ≈ PQ: Transliteration Mining Using Bridge Language, in American Association for Artificial Intelligence (AAAI 2010), July 2010.
[C07] Mitesh Khapra, Anup Kulkarni, Saurabh Sohoney and Pushpak Bhattacharyya, All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision, Conference of Association of Computational Linguistics (ACL 2010), Uppsala, Sweden, July 2010, pp. 1532-1541.
[C06] Harshada Gune, Mugdha Bapat, Mitesh Khapra and Pushpak Bhattacharyya, Verbs are where all the Action Lies: Experiences of Shallow Parsing of a Morphologically Rich Language, Computational Linguistics Conference (COLING 2010), Beijing, China, August 2010, pp. 347-355.
[C05] Mitesh M. Khapra, Saurabh Sohoney, Anup Kulkarni and Pushpak Bhattacharyya, Value for Money: Balancing Annotation Effort, Lexicon Building and Accuracy for Multilingual WSD, Computational Linguistics Conference (COLING 2010), Beijing, China, August 2010, pp. 555-563.
[C04] Raghavendra Udupa and Mitesh M. Khapra, Transliteration Equivalence using Canonical Correlation Analysis, in European Conference on Information Retrieval (ECIR 2010), March 2010, UK, pp. 75-86.
[C03] Mitesh M. Khapra, A Kumaran and Pushpak Bhattacharyya. Everybody loves a rich cousin: An empirical study of transliteration through bridge languages, in North American Association of Computational Linguistics (NAACL 2010), June 2010, Los Angeles, USA, pp. 420-428.
[C02] Raghavendra Udupa and Mitesh M. Khapra. Improving the Multilingual User Experience of Wikipedia Using Cross-Language Name Search, in North American Association of Computational Linguistics (NAACL 2010), June 2010, Los Angeles, USA, pp. 420-428.
[C01] Mitesh M. Khapra, Sapan Shah, Piyush Kedia and Pushpak Bhattacharyya, Projecting Parameters for Multilingual Word Sense Disambiguation, Empirical Methods in Natural Language Processing (EMNLP 2009), Singapore, August, 2009, pp. 459-467.