Distorted bandits or How I learned to be risk-seeking without regretting it

NCC 2025

Prashanth L.A.

Mar 8, 2025

Risk and uncertainty

By “uncertain” knowledge, let me explain, I do not mean merely to distinguish what is known for certain from what is only probable.

The sense in which I am using the term is that in which the prospect of a European war is uncertain….

 

There is no scientific basis to form any calculable probability whatever.

We simply do not know.

Talk outline

Part 0: Introduction to Distortion Riskmetrics (DRMs)

| Risk Measures | Deviation Measures |
|---|---|
| Value at Risk (VaR) | Mean-median deviation |
| Conditional VaR (CVaR) | Inter-quantile range |
| L-functionals (statistics) | Wang’s right-tail deviation |
| Distortion risk measures | Inter-expected shortfall |
| | Gini deviation |
| | Cumulative Tsallis past entropy |
| | Gini shortfall |
Rank-based decision-making in decision theory

Distortion Riskmetric: definition

Risk maximization

Quote from Cover (1991), “Universal portfolios”

In general, volatile uncorrelated stocks lead to great gains for the rate at which a portfolio grows...

 

Quote from V. Anantharam and V. S. Borkar (2017), “A variational formula for risk-sensitive reward.”

Work on *risk-sensitive reward maximization* has been relatively uncommon; see, e.g., [24]. Unlike in the case of the classical discounted or ergodic costs, the two risk-sensitive control problems are not trivially equivalent by treating cost as a negative reward. In fact, risk-sensitive reward maximization is the natural set-up in portfolio optimization...

Part I: Distortion riskmetrics + bandits

 

Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms
M. Tatli, A. Mukherjee, P. L.A., K. Shanmugam and A. Tajer
AISTATS 2025 (To appear)

Summary

  • Estimation: \(K\) continuous-valued mixing coefficients
  • Tracking: optimal mixture

Multi-armed Bandits: A Sequential Experimental Design Framework

 

Risk-neutral bandit problem setup

Risk-neutral Objective: Regret minimization (Exploration-Exploitation trade-off)

Minimize cumulative regret:

\[R_T\triangleq T\mu_{a^\star} - \sum\limits_{s=1}^T \mathbb{E}[X_{A_s}]\]
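
As a hedged illustration of the regret definition above, the sketch below evaluates \(R_T\) for a fixed sequence of pulls when the arm means are known. The function name `cumulative_regret` and the example means are assumptions made for illustration and do not come from the talk. For a fixed pull sequence, \(\mathbb{E}[X_{A_s}]\) reduces to \(\mu_{A_s}\).

```python
def cumulative_regret(means, pulls):
    """R_T = T * mu_star - sum_s mu_{A_s} for a fixed sequence of pulled arms."""
    mu_star = max(means)  # mean of the best arm
    return len(pulls) * mu_star - sum(means[a] for a in pulls)


# Hypothetical 3-armed instance; arm 2 is optimal.
means = [0.2, 0.5, 0.9]
pulls = [0, 2, 2, 1]  # arms chosen in rounds 1..4
print(cumulative_regret(means, pulls))  # about 1.1 (up to floating-point error)
```

Pulling only the optimal arm gives zero regret, so every suboptimal pull contributes its gap \(\mu_{a^\star} - \mu_a\).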

Bandit Settings and Applications

Bandit Settings and Applications – Focus

Risk-Sensitive Decision Making

Another Motivating Example

Which one would you go for?

Main message

Human preferences can be explained using distorted probabilities

People usually overweight extreme/unlikely events

How to distort the probabilities? Distortion riskmetrics (DRMs)

Probabilistic distortions: the basis of the Nobel Prize-winning Prospect Theory work of Tversky and Kahneman

Risk Nomenclature

Gini Deviation

  • Distortion function: \(h(p) = p(1-p)\)

  • For Bernoulli CDF \(\mathbb{F} = {\sf Bern}(p)\), we have \(U(\mathbb{F}) = p(1-p)\)

  • Downweights lower and higher probabilities
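
The Bernoulli identity above can be checked numerically. The sketch below assumes the distortion riskmetric of a nonnegative reward is \(U_h(\mathbb{F}) = \int_0^\infty h(1-\mathbb{F}(x))\,dx\); that formula is an assumption here, since the definition slide did not survive in this text, and the function name `distortion_riskmetric` is hypothetical. It handles only finitely supported, nonnegative distributions.

```python
def distortion_riskmetric(values, probs, h):
    """Integrate h(P(X > x)) over x >= 0 for a discrete, nonnegative X.

    Assumed form: U_h(F) = int_0^inf h(1 - F(x)) dx. The survival function
    is piecewise constant, so the integral is a finite sum over the gaps
    between consecutive support points.
    """
    total, prev, survival = 0.0, 0.0, 1.0  # survival = P(X > x) on [prev, v)
    for v, p in sorted(zip(values, probs)):
        total += h(survival) * (v - prev)
        survival -= p
        prev = v
    return total


def gini(p):
    return p * (1.0 - p)  # distortion function h(p) = p(1 - p)


p = 0.3
print(distortion_riskmetric([0, 1], [1 - p, p], gini))  # close to p * (1 - p)
```

With the identity distortion \(h(p) = p\), the same routine returns the mean, which is a quick sanity check on the integration.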

Linking Risk-Sensitivity and Experimental Design

  • Human-in-the-loop decision making is sensitive to decision risks

  • Example bandit applications: clinical trials / investment portfolios

  • Average reward is risk-neutral – not suitable

  • Question: How to sequentially control risk?

  • Use Risk-Sensitive Utilities: Functions of arm distributions (not just the first moment)

  • Examples: Variance, CVaR, Gini deviation, Sharpe ratio, many others

Risk-Sensitive Bandits: Existing Literature

Sporadic investigations on monotone distortion functions:

Quantile-based measures

  • (Szorenyi et al. 2015) (regret minimization)
  • (David et al. 2018) (best arm identification)
  • (Zhang et al. 2021) (best arm identification)

CVaR

  • (Baudry et al. 2018) (regret minimization)
  • (Agrawal et al. 2021) (best arm identification)

Focus: Towards a unifying approach

A Unified Framework for Risk Measures (Cassel et al. 2018)

Gaps in the Literature...

  • Convexity does not hold for various riskmetrics!

  • Concave + non-monotone distortion function \(\implies\) the optimal solution can be a mixture of arms rather than a single arm!

  • Counter-example: Gini deviation, \(K=2\) arms

    \[U(\alpha p_1 + (1-\alpha)p_2) > \max\{U(p_1),U(p_2)\}\]
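
The counter-example can be made concrete with two Bernoulli arms. The sketch below reads \(p_1, p_2\) as Bernoulli parameters (an assumption), uses the fact that an \(\alpha\)-mixture of \({\sf Bern}(p_1)\) and \({\sf Bern}(p_2)\) is \({\sf Bern}(\alpha p_1 + (1-\alpha)p_2)\), and grid-searches over \(\alpha\). The arm parameters 0.2 and 0.9 and the helper names are hypothetical.

```python
def gini_bernoulli(p):
    """Gini deviation of Bern(p) under h(p) = p(1 - p)."""
    return p * (1.0 - p)


def best_two_arm_mixture(p1, p2, grid=1001):
    """Grid-search the weight alpha on arm 1 that maximizes the mixture's utility."""
    best_alpha, best_u = 0.0, gini_bernoulli(p2)
    for k in range(grid):
        alpha = k / (grid - 1)
        u = gini_bernoulli(alpha * p1 + (1 - alpha) * p2)
        if u > best_u:
            best_alpha, best_u = alpha, u
    return best_alpha, best_u


p1, p2 = 0.2, 0.9
alpha, u_mix = best_two_arm_mixture(p1, p2)
print(alpha, u_mix, max(gini_bernoulli(p1), gini_bernoulli(p2)))
```

Here the single-arm utilities are 0.16 and 0.09, while a mixture that puts the mixed Bernoulli parameter near 1/2 (weight close to 4/7 on the first arm) reaches a utility close to 0.25, so no single arm is optimal.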

Question: Can we construct regret-efficient algorithms for riskmetrics which have optimal mixtures?

Key Challenge: how to track mixtures?

Revised Objective: Regret w.r.t. Infinite Horizon Oracle Policy

\[ \mathfrak{R}_{\mathbf{\nu}}^\pi(T)\;\triangleq\; U\left ( \sum\limits_{i\in[K]}\alpha_{\mathbf{\nu}}^\star(i)\mathbb{F}_i\right ) - \mathbb{E}_{\mathbf{\nu}}^\pi\Bigg [ U\Bigg ( \sum\limits_{i\in[K]}\frac{\tau^\pi_T(i)}{T}\mathbb{F}_i\Bigg )\Bigg] \]
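
To see what this objective measures, the sketch below evaluates the gap inside the definition for one realized allocation of pulls, dropping the outer expectation. It assumes Bernoulli arms and the Gini deviation, so a mixture of arms is again Bernoulli; the function names and the example instance are hypothetical.

```python
def gini_bernoulli(p):
    return p * (1.0 - p)  # Gini deviation of Bern(p)


def mixture_utility(arm_probs, weights):
    """Utility of the mixture sum_i weights[i] * Bern(arm_probs[i])."""
    mixed_p = sum(w * p for w, p in zip(weights, arm_probs))
    return gini_bernoulli(mixed_p)


def realized_mixture_regret(arm_probs, alpha_star, pull_counts):
    """U(optimal mixture) - U(mixture with weights tau_T(i) / T)."""
    horizon = sum(pull_counts)
    empirical = [n / horizon for n in pull_counts]
    return mixture_utility(arm_probs, alpha_star) - mixture_utility(arm_probs, empirical)


arm_probs = [0.2, 0.9]
alpha_star = [4 / 7, 3 / 7]  # puts the mixed parameter at 1/2
print(realized_mixture_regret(arm_probs, alpha_star, [100, 0]))  # commit to arm 1 only
print(realized_mixture_regret(arm_probs, alpha_star, [50, 50]))  # uniform allocation
```

Committing to the best single arm leaves a gap of about 0.09 that does not shrink with \(T\), whereas allocations whose pull fractions approach \(\alpha^\star\) drive the gap toward zero.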

Algorithm Design – Challenges

Algorithm Design Components

Risk-Sensitive Explore Then Commit for Mixtures (RS-ETC-M)

Component 1: Estimating mixtures...

Component 2: Tracking the estimated mixtures...

Drawback: Assumes knowledge of instance-dependent parameters (through \(N(\varepsilon)\))
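
The two components can be sketched for a toy case. This is not the RS-ETC-M algorithm from the paper, whose details are not reproduced in these slides; it is a minimal explore-then-track illustration assuming \(K=2\) Bernoulli arms, the Gini deviation, a grid search for the plug-in mixture, and a fixed exploration budget `n_explore` standing in for \(N(\varepsilon)\). All names and parameter values are hypothetical.

```python
import random


def gini_bernoulli(p):
    return p * (1.0 - p)  # Gini deviation of Bern(p)


def estimate_mixture(p_hat, grid=1001):
    """Component 1 (sketch): grid-search the mixture maximizing the plug-in utility."""
    best_alpha, best_u = 0.0, gini_bernoulli(p_hat[1])
    for k in range(grid):
        alpha = k / (grid - 1)
        u = gini_bernoulli(alpha * p_hat[0] + (1 - alpha) * p_hat[1])
        if u > best_u:
            best_alpha, best_u = alpha, u
    return [best_alpha, 1.0 - best_alpha]


def etc_mixture(arm_probs, horizon, n_explore, seed=0):
    rng = random.Random(seed)
    counts, successes = [0, 0], [0, 0]
    # Explore: sample each arm n_explore times and estimate its parameter.
    for arm in range(2):
        for _ in range(n_explore):
            successes[arm] += rng.random() < arm_probs[arm]
            counts[arm] += 1
    p_hat = [successes[i] / counts[i] for i in range(2)]
    alpha_hat = estimate_mixture(p_hat)
    # Component 2 (sketch): track alpha_hat by pulling the most under-sampled arm.
    # Rewards in this phase are not used, so they are not simulated.
    for t in range(2 * n_explore + 1, horizon + 1):
        deficits = [alpha_hat[i] * t - counts[i] for i in range(2)]
        counts[deficits.index(max(deficits))] += 1
    return alpha_hat, counts


alpha_hat, counts = etc_mixture([0.2, 0.9], horizon=5000, n_explore=200)
print(alpha_hat, [c / 5000 for c in counts])
```

In this sketch the exploration budget has to be chosen by hand, which mirrors the drawback noted above: a good choice depends on the unknown instance.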

Risk-Sensitive Upper Confidence Bound for Mixture (RS-UCB-M)

Component 1: Estimating mixtures...

Open question: Can we design a regret-efficient algorithm that implicitly explores arms in a linear order?

Component 2: Tracking the estimated mixtures...

How I learned to stop regretting…

Performance Guarantees (Takeaways)

In contrast, vanilla ETC and UCB have the same performance guarantees in the risk-neutral case!

Risk and happiness: A matter of perspective