  1. A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
    Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L.A., Shalabh Bhatnagar
    [arxiv], 2023.

  2. Online Estimation and Optimization of Utility-Based Shortfall Risk
    Vishwajit Hegde, Arvind S. Menon, Prashanth L.A., Krishna Jagannathan
    [arxiv], 2023.

  3. A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization
    Akash Mondal, Prashanth L.A., Shalabh Bhatnagar
    [arxiv], 2022.

Books, Surveys, PhD Thesis

  1. Risk-Sensitive Reinforcement Learning via Policy Gradient Search
    Prashanth L.A. and Michael Fu
    Foundations and Trends in Machine Learning, 2022. [arxiv]

  2. A Survey of Risk-Aware Multi-Armed Bandits
    Vincent Y. F. Tan, Prashanth L.A., and Krishna Jagannathan
    International Joint Conference on Artificial Intelligence (IJCAI) (Survey Track), 2022. [longer version]

  3. Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods
    S.Bhatnagar, H.L.Prasad and Prashanth L.A.
    Lecture Notes in Control and Information Sciences Series, Vol. 434, Springer, ISBN 978-1-4471-4284-3, Edition: 2013, 302 pages.

  4. Resource Allocation for Sequential Decision Making under Uncertainty: Studies in Vehicular Traffic Control, Service Systems, Sensor Networks and Mechanism Design
    Prashanth L.A.
    Indian Institute of Science, 2012 (IEEE ITSS Best Ph.D. Dissertation 2014 - Third Prize). [pdf] [slides for the defense] [slides for plenary talk at IEEE ITSC 2014]

  5. Adaptive feature pursuit: Online adaptation of features in reinforcement learning
    S.Bhatnagar, V.S.Borkar and Prashanth L.A.
    Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (Ed. F. Lewis and D. Liu), IEEE Press Computational Intelligence Series, pp. 517-534, 2012. [pdf]

Journal Papers

  1. A Wasserstein distance approach for concentration of empirical risk estimates
    Prashanth L.A. and Sanjay P. Bhat
    Journal of Machine Learning Research, 2022. [pdf]

  2. Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles
    Nirav Bhavsar and Prashanth L.A.
    IEEE Transactions on Automatic Control, 2022. [arxiv]

  3. Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint
    N. Vijayan and Prashanth L.A.
    Systems & Control Letters, vol. 155, 2021.

  4. Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling
    Prashanth L.A., Nathaniel Korda and Remi Munos
    Machine Learning, doi:10.1007/s10994-020-05912-5, 2021. [arxiv:1306.2557v5] [Code]

  5. Random directions stochastic approximation with deterministic perturbations
    Prashanth L.A., S.Bhatnagar, Nirav Bhavsar, Michael Fu and Steve Marcus
    IEEE Transactions on Automatic Control, vol. 65, no. 6, pp. 2450-2465, June 2020. [arxiv:1808.02871]

  6. Concentration bounds for empirical conditional value-at-risk: The unbounded case
    Ravi Kumar Kolla, Prashanth L.A., Sanjay P. Bhat, Krishna Jagannathan
    Operations Research Letters, Vol. 47, Issue 1, pp. 16-20, 2019. [arxiv]

  7. Stochastic optimization in a cumulative prospect theory framework
    Jie Cheng, Prashanth L.A., Michael Fu, Steve Marcus and Csaba Szepesvari
    IEEE Transactions on Automatic Control, Vol. 63, No. 9, pp. 2867-2882, 2018. [pdf]

  8. Adaptive system optimization using random directions stochastic approximation
    Prashanth L.A., S.Bhatnagar, Michael Fu and Steve Marcus
    IEEE Transactions on Automatic Control, Vol. 62, Issue 5, pp. 2223-2238, 2017. [arXiv (slightly old)]

  9. Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
    Prashanth L.A. and Mohammad Ghavamzadeh
    Machine Learning, Vol. 105, No. 3, pp. 367-417, 2016. [arXiv (slightly old)]

  10. A constrained optimization perspective on actor critic algorithms and application to network routing
    Prashanth L.A., H.L.Prasad, S.Bhatnagar and Prakash Chandra
    Systems & Control Letters, Vol. 92, pp. 46-51, 2016. [arXiv]

  11. Simultaneous Perturbation Methods for Adaptive Labor Staffing in Service Systems
    Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta
    Simulation, Vol. 91, Issue 5, pp. 432-455, 2015. [arxiv]

  12. Simultaneous Perturbation Newton Algorithms for Simulation Optimization
    S.Bhatnagar and Prashanth L.A.
    Journal of Optimization Theory and Applications, Vol. 164, Issue 2, pp. 621-643, 2015. [pdf]

  13. Two Timescale Convergent Q-learning for Sleep–Scheduling in Wireless Sensor Networks
    Prashanth L.A., A. Chatterjee and S.Bhatnagar
    Wireless Networks, Vol. 20, Issue 8, pp. 2589-2604, 2014. [pdf]

  14. Adaptive Smoothed Functional Algorithms for Optimal Staffing Levels in Service Systems
    H.L.Prasad, L.A.Prashanth, S.Bhatnagar and N.Desai
    Service Science (INFORMS), Vol. 5, No. 1, pp. 29-55, 2013. [pdf]

  15. Threshold Tuning using Stochastic Optimization for Graded Signal Control
    Prashanth L.A. and S.Bhatnagar
    IEEE Transactions on Vehicular Technology, Vol. 61, No. 9, pp. 3865-3880, 2012. [pdf]

  16. Reinforcement learning with function approximation for traffic signal control
    Prashanth L.A. and S.Bhatnagar
    IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 2, pp. 412-421, 2011. [pdf]

Proceedings of International Conferences

  1. Adaptive Estimation of Random Vectors with Bandit Feedback
    Dipayan Sen, Prashanth L.A., Aditya Gopalan
    ICC, 2023. [pdf]

  2. A policy gradient approach for optimization of smooth risk measures
    N. Vijayan and Prashanth L.A.
    UAI, 2023. [arxiv]

  3. Generalized Simultaneous Perturbation Stochastic Approximation with Reduced Estimator Bias
    S.Bhatnagar and Prashanth L.A.
    CISS, 2023. [arxiv]

  4. Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
    Gandharv Patil, Prashanth L.A., Dheeraj Nagaraj, Doina Precup
    AISTATS, 2023. [arxiv]

  5. Estimation of Spectral Risk Measures
    Ajay Kumar Pandey, Prashanth L.A. and Sanjay P. Bhat
    AAAI, 2021. [arxiv]

  6. Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions
    Prashanth L.A., Krishna Jagannathan and Ravi Kumar Kolla
    ICML, 2020. [arxiv:1901.00997]

  7. Concentration of risk measures: A Wasserstein distance approach
    Sanjay P. Bhat and Prashanth L.A.
    NeurIPS, 2019. [arxiv] [slides]

  8. Correlated bandits or: How to minimize mean-squared error online
    V.P. Boda and Prashanth L.A.
    ICML, 2019. [arxiv]

  9. Weighted bandits or: How bandits learn distorted values that are not expected
    Aditya Gopalan, Prashanth L.A., Michael Fu and Steve Marcus
    AAAI, 2017. [pdf]

  10. Improved Hessian estimation for adaptive random directions stochastic approximation
    D. Sai Koti Reddy, Prashanth L.A. and S.Bhatnagar
    CDC, 2016.

  11. Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
    Prashanth L.A., Jie Cheng, Michael Fu, Steve Marcus and Csaba Szepesvari
    ICML, 2016. [pdf] [slides] [longer-version]

  12. (Bandit) Convex Optimization with Biased Noisy Gradient Oracles
    Xiaowei Hu, Prashanth L.A., Andras Gyorgy and Csaba Szepesvari
    AISTATS, 2016. [pdf]

  13. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence
    Nathaniel Korda and Prashanth L.A.
    ICML, 2015.
    [The proof has a bug, rendering the bounds invalid. A fix will come later rather than sooner.]

  14. Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games
    H.L.Prasad, Prashanth L.A. and S.Bhatnagar
    AAMAS, 2015. [pdf] [slides]

  15. Fast gradient descent for drifting least squares regression, with application to bandits
    Nathaniel Korda, Prashanth L.A. and Remi Munos
    AAAI, 2015. [pdf] [slides] [Code+Readme]

  16. Simultaneous Perturbation Algorithms for Batch Off-Policy Search
    Raphael Fonteneau and Prashanth L.A.
    CDC, 2014. [pdf] [arXiv]

  17. Policy Gradients for CVaR-Constrained MDPs
    Prashanth L.A.
    ALT, 2014. [pdf] [slides]

  18. Fast LSTD using stochastic approximation: Finite time analysis and application to traffic control
    Prashanth L.A., Nathaniel Korda and Remi Munos
    ECML, 2014. [pdf] [slides]

  19. Actor-Critic Algorithms for Risk-Sensitive MDPs
    Prashanth L.A. and Mohammad Ghavamzadeh
    NIPS (Full oral presentation), 2013. [pdf] [slides]

  20. Mechanisms for Hostile Agents with Capacity Constraints
    Prashanth L.A., H.L.Prasad, N.Desai and S.Bhatnagar
    AAMAS, 2013. [pdf]

  21. Stochastic optimization for adaptive labor staffing in service systems
    Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta
    ICSOC, 2011. [pdf]

  22. Reinforcement Learning with Average Cost for Adaptive Control of Traffic Lights at Intersections
    Prashanth L.A. and S.Bhatnagar
    IEEE ITSC, 2011. [pdf]

Copyright Notice: Since most of these papers are published, the copyright has been transferred to the respective publishers. The following is IEEE's copyright notice; other publishers have similar ones.

IEEE Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of the works published in IEEE publications in other works must be obtained from the IEEE.