Advance warning: This is a very boring post.

In my last post I outlined a ludicrously over-simplified model for why you might want to consider high variance strategies.

I’ve been thinking over some of the modelling assumptions and wondering whether it could be made a bit less over-simplified. The only ones that are obviously easy to weaken are the assumptions on the distribution shape.

Here’s an example that shows you need *some* assumptions on the distribution shape. Consider a standard distribution \(Z\) with \(P(Z = 1) = P(Z = -1) = \frac{1}{2}\) and suppose we can choose strategies of the form \(\mu + \sigma Z\). Note that \(E(Z) = 0\) and \(\mathrm{Var}(Z) = 1\) so these really are the mean and standard deviation of our distributions.

But \(E(\max\limits_{1 \leq k \leq n} Z_k) = 1 \cdot (1 - 2^{-n}) - 2^{-n} = 1 - 2^{1 - n}\) (because the maximum takes the value \(-1\) only if all of the individual values are \(-1\), which happens with probability \(2^{-n}\)). So \(E(\max\limits_{1 \leq k \leq n} (\mu + \sigma Z_k)) = \mu + (1 - 2^{1 - n}) \sigma\). Since \(1 - 2^{1 - n} < 1\), you’re always better off raising \(\mu\) rather than \(\sigma\).
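If you don’t trust the arithmetic, here is a minimal sketch that checks the closed form \(1 - 2^{1 - n}\) by brute-force enumeration of all \(2^n\) equally likely sign patterns. The function name is mine, and it only covers the two-point \(Z\) above.

```python
from fractions import Fraction
from itertools import product


def expected_max_two_point(n):
    """Exact E(max(Z_1..Z_n)) for iid Z_k with P(Z=1) = P(Z=-1) = 1/2."""
    # Every sign pattern has probability 2^-n, so the expectation is a plain average.
    outcomes = list(product((-1, 1), repeat=n))
    return Fraction(sum(max(o) for o in outcomes), len(outcomes))


# Compare against the closed form 1 - 2^(1 - n) for a few small group sizes.
for n in range(1, 8):
    assert expected_max_two_point(n) == 1 - Fraction(2, 2**n)
```

Exact fractions are used so that the comparison is an equality rather than a floating point approximation.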

The interesting feature of this example is that \(P(X_k \leq \mu + \sigma) = 1\). If this happens then it will always be the case that \(E(\mathrm{max}(X_k) ) \leq \mu + \sigma\) so there’s no real benefit to raising \(\sigma\) instead of \(\mu\) (note: It’s conceivable that there’s some complicated dependency on \(\mu\) as a parameter, but I’m just going to assume that \(\mu\) is purely positional and not worry about that).

You only need to go slightly beyond that to show that for some sufficiently large group you’ll always eventually be better off raising \(\sigma\) rather than \(\mu\).

Suppose all our strategies are drawn from some distribution \(X = \mu + Z^\sigma\) with \(E(Z^\sigma) = 0\). The only dependency on \(\sigma\) that we care about is that \(P(Z^\sigma \geq (1 + \epsilon)\sigma) \geq p\) for fixed \(\epsilon > 0\), \(0 < p < 1\) and all \(\sigma > 0\) (this is trivially satisfied by the normal distribution for example).

Then we have \(E(\max\limits_{1 \leq k \leq n} X_k) = \mu + E(\max\limits_{1 \leq k \leq n} Z^\sigma_k)\).

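So we now just want to find some lower bounds on \(E(T_n)\), where \(T_n = \max\limits_{1 \leq k \leq n} Z^\sigma_k\). We’ll split this up as three variables. Let \(T_n = U_n + V_n + W_n\) where \(U_n = T_n \mathbb{1}_{T_n \leq 0}\), \(V_n = T_n \mathbb{1}_{0 < T_n < (1 + \epsilon) \sigma }\) and \(W_n = T_n \mathbb{1}_{(1 + \epsilon) \sigma \leq T_n }\).

Because \(V_n \geq 0\) and \(W_n \geq (1 + \epsilon) \sigma \mathbb{1}_{(1 + \epsilon) \sigma \leq T_n}\) this gives us the lower bound \(E(T_n) \geq E(U_n) + (1 + \epsilon)\sigma P(T_n \geq (1 + \epsilon) \sigma) \geq E(U_n) + (1 + \epsilon)\sigma (1 - (1 - p)^n)\). The last step holds because \(T_n < (1 + \epsilon)\sigma\) only if every \(Z^\sigma_k\) is below \((1 + \epsilon)\sigma\), which happens with probability at most \((1 - p)^n\).

We now just need to bound \(E(U_n)\) below. On the event \(T_n \leq 0\) every \(Z^\sigma_k \leq 0\) and \(T_n \geq Z^\sigma_1\), so \(U_n \geq U_1 \mathbb{1}_{Z^\sigma_k \leq 0, k \geq 2}\). These two random variables are independent so \(E(U_n) \geq E(U_1) P(Z^\sigma \leq 0)^{n - 1}\). Therefore \(E(T_n) \geq E(U_1) P(Z^\sigma \leq 0)^{n - 1} + (1 + \epsilon)\sigma (1 - (1 - p)^n)\). Since \(P(Z^\sigma \leq 0) \leq 1 - p < 1\), the first term tends to \(0\) and the bound tends to \((1 + \epsilon)\sigma\) as \(n \to \infty\).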
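As a sanity check, here is a rough sketch that compares the exact \(E(T_n)\) with the bound \(E(U_1) P(Z^\sigma \leq 0)^{n - 1} + (1 + \epsilon)\sigma (1 - (1 - p)^n)\), taking the tail term as \(P(T_n \geq (1 + \epsilon)\sigma) \geq 1 - (1 - p)^n\). The four-point distribution, the `threshold` argument (standing in for \((1 + \epsilon)\sigma\)) and the function names are all made up for illustration.

```python
from fractions import Fraction


def expected_max(dist, n):
    """Exact E(max of n iid draws) for a finite distribution {value: probability}."""
    total = Fraction(0)
    cdf_prev = Fraction(0)
    for v in sorted(dist):
        cdf = cdf_prev + dist[v]
        # P(max = v) = P(max <= v) - P(max < v) = F(v)^n - F(v-)^n
        total += v * (cdf**n - cdf_prev**n)
        cdf_prev = cdf
    return total


def lower_bound(dist, n, threshold):
    """E(U_1) P(Z <= 0)^(n-1) + threshold * (1 - (1 - p)^n), p = P(Z >= threshold)."""
    e_u1 = sum(v * q for v, q in dist.items() if v <= 0)
    p_nonpos = sum(q for v, q in dist.items() if v <= 0)
    p = sum(q for v, q in dist.items() if v >= threshold)
    return e_u1 * p_nonpos ** (n - 1) + threshold * (1 - (1 - p) ** n)


# Hypothetical zero-mean example: four equally likely values.
quarter = Fraction(1, 4)
DIST = {-3: quarter, 0: quarter, 1: quarter, 2: quarter}
THRESHOLD = 2  # plays the role of (1 + epsilon) * sigma

for n in range(1, 21):
    assert expected_max(DIST, n) >= lower_bound(DIST, n, THRESHOLD)
```

For this particular example both the exact value and the bound climb towards the threshold as \(n\) grows, which is all the argument needs.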

This lower bound lets us show a much less pretty version of our last result:

Given a strategy \(\mu, \sigma\) being employed by \(n\) people, and given some increase \(a\) which could go to either \(\mu\) or \(\sigma\) there exists some sufficiently large \(m\) such that for \(m\) people, changing the strategy to \(\mu, \sigma + a\) would beat changing the strategy to \(\mu + a, \sigma\).

Yeah, that phrasing is kinda gross to me too.

Note though that if we go back to the previous case where \(\sigma\) is just a scaling parameter and are just dropping the normality assumption, we can use our lower bound on \(E(T_n)\) to find some \(n\) for which \(E(T_n|\sigma = 1) > 1\), and for all \(m \geq n\) it will be beneficial to increase \(\sigma\) rather than \(\mu\).
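To make that concrete with a non-normal example of my own choosing: take \(Z\) uniform on \([-\sqrt{3}, \sqrt{3}]\), which has mean \(0\) and variance \(1\). The maximum of \(n\) uniforms on \([0, 1]\) has mean \(\frac{n}{n + 1}\), so rescaling gives \(E(T_n|\sigma = 1) = \sqrt{3} \frac{n - 1}{n + 1}\). The sketch below searches for the first group size where this exceeds \(1\), and contrasts it with the two-point distribution from earlier, where it never does. The helper names and the search limit are assumptions.

```python
import math


def expected_max_uniform(n):
    """E(max of n iid draws) from the uniform distribution on [-sqrt(3), sqrt(3)]."""
    return math.sqrt(3) * (n - 1) / (n + 1)


def expected_max_two_point(n):
    """E(max of n iid draws) when P(Z=1) = P(Z=-1) = 1/2."""
    return 1 - 2.0 ** (1 - n)


def smallest_group_size(expected_max, limit=1000):
    """Smallest n <= limit with expected_max(n) > 1, or None if there is none."""
    for n in range(1, limit + 1):
        if expected_max(n) > 1:
            return n
    return None


uniform_n = smallest_group_size(expected_max_uniform)
two_point_n = smallest_group_size(expected_max_two_point)
```

Because \(E(T_n)\) is non-decreasing in \(n\), the first \(n\) found this way works for every larger group too.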

Note by the way the crucial role of \(\epsilon\). I think if you consider a distribution that takes with equal probability the values \(\sigma, -\sigma, \sigma + 2^{-\sigma}, -\sigma - 2^{-\sigma}\) (note that \(\sigma\) is not the standard deviation here) then it’s not actually helpful to raise \(\sigma\) instead of \(\mu\), even though \(P(Z^\sigma > \sigma) = \frac{1}{4}\). I have not bothered to work out the details.
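A quick numerical check is at least consistent with that guess. The sketch below assumes \(Z^\sigma\) is exactly the four-point distribution above and computes how much \(E(T_n)\) gains when \(\sigma\) goes up by \(a\); raising \(\mu\) by \(a\) always gains exactly \(a\). It is a spot check over a handful of hypothetical parameter values, not a proof, and the function names are mine.

```python
def expected_max(values, n):
    """E(max of n iid draws), each uniform over the given distinct values."""
    vs = sorted(values)
    m = len(vs)
    # P(max = vs[i]) = ((i + 1) / m)^n - (i / m)^n
    return sum(v * (((i + 1) / m) ** n - (i / m) ** n) for i, v in enumerate(vs))


def four_point(sigma):
    """The four equally likely values of Z^sigma from the text."""
    d = 2.0 ** (-sigma)
    return [-sigma - d, -sigma, sigma, sigma + d]


def gain_from_sigma(sigma, a, n):
    """Increase in E(T_n) when sigma is raised to sigma + a."""
    return expected_max(four_point(sigma + a), n) - expected_max(four_point(sigma), n)


# Raising mu by a gains exactly a, so sigma wins only if its gain exceeds a.
for sigma in (0.5, 1.0, 2.0):
    for a in (0.1, 1.0):
        for n in range(1, 51):
            assert gain_from_sigma(sigma, a, n) < a
```

The reason it works out this way is that the top value \(\sigma + 2^{-\sigma}\) grows more slowly than \(\sigma\) itself, so no fixed \(\epsilon > 0\) satisfies the tail condition for all \(\sigma\).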