 # Gauss, Berry and Esseen went to Monte Carlo

Normal (or Gaussian) probability distributions are very convenient to work with, thanks to their regular properties. They are widely used in statistical applications such as hypothesis testing and prediction. They become even more useful thanks to the Central Limit Theorem, which, translated into the language of slot machines and somewhat simplified, states that if we hit the spin button sufficiently many times, the distribution of our average win can be approximated by a Normal distribution. All that is needed to find the best approximating Normal distribution for a given number of game rounds is the mean value (RTP) and the variance (volatility).

For instance, many people have seen expressions like

P( m-d < X < m+d ) = 1-a.

It states that, with probability 1-a, the variable X falls within the distance d of its mean value m. Here 1-a is the confidence level, and a is typically 0.1 (10%) or 0.05 (5%). If X is the RTP, this formula gives an idea of what to expect regarding the player return after a certain number of played game rounds. In the formula above, m is the theoretical RTP, and in view of the Central Limit Theorem, d can be approximated by the corresponding quantile of a Normal distribution, which is easy to find.
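As a rough illustration of how d can be found, here is a minimal sketch using the Normal approximation. The function name `rtp_half_width`, the RTP of 0.96 and the per-round standard deviation of 5.0 are made-up values for the example, not figures from any real game.

```python
from math import sqrt
from statistics import NormalDist


def rtp_half_width(sigma_round: float, n_rounds: int, a: float) -> float:
    """Half-width d so that P(m-d < X < m+d) is roughly 1-a.

    Assumes the average payout over n_rounds is approximately Normal
    with standard deviation sigma_round / sqrt(n_rounds).
    """
    z = NormalDist().inv_cdf(1 - a / 2)  # two-sided Normal quantile
    return z * sigma_round / sqrt(n_rounds)


# Hypothetical game: RTP 96%, standard deviation of 5 bets per round.
m = 0.96
sigma_round = 5.0
d = rtp_half_width(sigma_round, n_rounds=1_000_000, a=0.05)
print(f"{m - d:.4f} < RTP < {m + d:.4f}")
```

Since d shrinks with the square root of the number of rounds, playing four times as many rounds only halves the width of the interval.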

The approximation by Gaussian distributions is also heavily relied upon in the testing phase of a game. A simple statistical method for finding bugs is to calculate the standard deviation for one game round and use expressions similar to the above formula to make predictions regarding the RTP after, say, a million game rounds. If too few or too many groups of a million game rounds have an RTP outside the interval (m-d,m+d), then it’s a good idea to start looking for mistakes in the code or the configuration. This is also useful for finding situations with several bugs whose effects on the RTP cancel each other out to some extent – if the bugs affect the variance, it should show when grouping the game rounds and comparing the partial results.
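The grouping test could be sketched roughly as follows. The paytable, the helper name `count_outliers` and the group sizes are all assumptions made for the example, and the groups are kept far smaller than a million rounds so that the script runs quickly.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def count_outliers(group_rtps, m, sigma_round, rounds_per_group, a=0.05):
    """Count groups whose RTP lies outside (m-d, m+d) under the Normal approximation."""
    z = NormalDist().inv_cdf(1 - a / 2)
    d = z * sigma_round / sqrt(rounds_per_group)
    return sum(1 for rtp in group_rtps if abs(rtp - m) >= d)


# Hypothetical paytable: payout per unit bet and its probability.
payouts = [0, 1, 5, 100]
probs = [0.745, 0.2, 0.05, 0.005]
m = sum(p * x for x, p in zip(payouts, probs))  # theoretical RTP
sigma_round = sqrt(sum(p * x * x for x, p in zip(payouts, probs)) - m * m)

rng = random.Random(1)  # fixed seed so the run is repeatable
n_groups, rounds_per_group, a = 200, 2_000, 0.05
group_rtps = [
    fmean(rng.choices(payouts, weights=probs, k=rounds_per_group))
    for _ in range(n_groups)
]

outliers = count_outliers(group_rtps, m, sigma_round, rounds_per_group, a)
expected = n_groups * a                  # mean of a Binomial(n_groups, a) count
spread = sqrt(n_groups * a * (1 - a))    # its standard deviation
print(f"{outliers} of {n_groups} groups outside the interval; "
      f"expected about {expected:.0f} +/- {spread:.1f}")
```

A count far from the expected value in either direction is the signal to investigate, though with groups this small part of any deviation may come from the Normal approximation itself rather than from a bug.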

So far, so good. But what is “sufficiently many”? The Berry-Esseen Theorem extends the Central Limit Theorem and says, again rather simplified, that the more skewed the distribution is, the more game rounds we may need to play for the Normal distribution to be a good enough approximation. And the probability distributions of games tend to be rather skewed.
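To put a number on "sufficiently many", one can evaluate the Berry-Esseen bound, which limits the largest gap between the true distribution function of the standardized average and the Normal one by C·ρ/(σ³·√n), where ρ is the third absolute central moment of a single round. The sketch below does this for a payout table; the constant 0.4748 is a published upper bound for identically distributed rounds and is taken here as an assumption, and both paytables are invented for the example.

```python
from math import ceil, sqrt

# Assumed value: a published upper bound on the Berry-Esseen constant (i.i.d. case).
BERRY_ESSEEN_C = 0.4748


def _moment_ratio(payouts, probs):
    """Return C * rho / sigma^3 for a single game round."""
    m = sum(p * x for x, p in zip(payouts, probs))
    var = sum(p * (x - m) ** 2 for x, p in zip(payouts, probs))
    rho = sum(p * abs(x - m) ** 3 for x, p in zip(payouts, probs))
    return BERRY_ESSEEN_C * rho / var ** 1.5


def berry_esseen_bound(payouts, probs, n_rounds):
    """Upper bound on the maximal CDF error of the Normal approximation."""
    return _moment_ratio(payouts, probs) / sqrt(n_rounds)


def rounds_needed(payouts, probs, max_error):
    """Rounds after which the bound itself drops to max_error."""
    return ceil((_moment_ratio(payouts, probs) / max_error) ** 2)


# Two hypothetical games: a fair double-or-nothing bet and a skewed paytable.
coin = ([0, 2], [0.5, 0.5])
skewed = ([0, 1, 5, 100], [0.745, 0.2, 0.05, 0.005])

for name, (payouts, probs) in (("coin", coin), ("skewed", skewed)):
    print(name, rounds_needed(payouts, probs, max_error=0.01))
```

The bound is a worst-case guarantee, so the Normal approximation is often better in practice than these numbers suggest, but the comparison between the two games shows how quickly rare, large wins push the requirement up.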

As an example, we consider a scratch ticket game with a virtual batch of 200’000 tickets. We repeatedly simulate one million game rounds, which thus corresponds to five full cycles of the game, and compare the result with the best possible Normal distribution. In the diagram below, the Monte Carlo method has been applied 10’001 times on 1’000’000 game rounds, so just over ten billion game rounds have been simulated. The curves show the inverse of the cumulative distribution function for the respective methods. The Monte Carlo curve is of course more jagged than the Normal distribution curve, but they seem to match pretty well. Zooming in at 95%, however, shows a discrepancy of more than one percent. And we all know the importance of one percent!

For a multi-line slot machine, the situation is a bit more complicated. It is usually easy to calculate the variance for the base game when playing only one payline. It gets a bit more tricky to find the variance for the entire game if bonus features are involved. Most importantly, since two separate paylines are not statistically independent, we cannot calculate the variance for play on all lines by just dividing the one-line variance by the number of paylines.
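To see why the dependence between paylines matters, here is a small sketch that computes the variance of the payout per unit of total bet from a covariance matrix of the individual lines. It assumes the total bet is split evenly across the lines; the one-line variance of 50, the 20 lines and the common correlation of 0.1 are invented numbers, and real paylines will not all share the same correlation.

```python
def variance_of_average(cov):
    """Variance of the mean of the line payouts, given their covariance matrix.

    Var(mean) = (sum of all variances and covariances) / n^2.
    """
    n = len(cov)
    return sum(sum(row) for row in cov) / n ** 2


def equicorrelated_cov(var_one_line, n_lines, correlation):
    """Hypothetical covariance matrix: same variance and same pairwise correlation."""
    return [
        [var_one_line if i == j else correlation * var_one_line
         for j in range(n_lines)]
        for i in range(n_lines)
    ]


var_one_line, n_lines = 50.0, 20

independent = variance_of_average(equicorrelated_cov(var_one_line, n_lines, 0.0))
correlated = variance_of_average(equicorrelated_cov(var_one_line, n_lines, 0.1))

print(f"independent lines: {independent:.3f}")  # equals var_one_line / n_lines
print(f"correlated lines:  {correlated:.3f}")   # noticeably larger
```

Only in the independent case does the result reduce to the one-line variance divided by the number of lines; any positive correlation between lines makes the true variance larger than that shortcut suggests.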

The Central Limit Theorem is certainly very useful for those of us working with games too, but I would recommend using it with a bit of care!