# Sandbox

## The Bulgarian Toto 6 of 42 lottery

The Bulgarian Toto 6 of 42 lottery was the subject of an investigation after the same set of six numbers, {4, 15, 23, 24, 35, 42}, was drawn in two successive draws on September 6 and September 10, 2009 [1]. The article cites a mathematician as stating that the odds of picking the same six numbers twice in a row are 4,200,000:1. We wondered how he arrived at this number. What is the probability that a specified set of six numbers will repeat consecutively?

There are <math>{42 \choose 6} = 5245786</math> different sets of six numbers, so the probability that a SPECIFIED set will occur in the next two consecutive draws is <math>1/5245786^2</math>. The probability that SOME set will occur in the next two consecutive draws is <math>5245786 \times 1/5245786^2 = 1/5245786</math>.
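As a quick check of the counting, here is a minimal Python sketch (standard library only; the variable names are ours) that computes the number of 6-of-42 sets and both probabilities exactly with rational arithmetic.

```python
from fractions import Fraction
from math import comb

# Number of distinct 6-number sets in a 6-of-42 lottery.
N = comb(42, 6)

# A SPECIFIED set is drawn in two given consecutive draws.
p_specified = Fraction(1, N) ** 2

# SOME set is drawn twice: add the specified-set probability over all N sets.
p_some = N * p_specified

print(N)                         # 5245786
print(float(p_specified))
print(p_some == Fraction(1, N))  # True
```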

But now, suppose the lottery has been running continuously for <math>m</math> draws and we ask what the chance is that during this period there were consecutive draws of the same set. As before, first consider a fixed set of six numbers.

There are <math>m-1</math> opportunities for this set to be drawn twice in succession (beginning with the second drawing). The probability that this happens is the probability of the union <math>P(A) = P(\cup_{i=1}^{m-1} A_i A_{i+1})</math>, where <math>A_i</math> is the event that this set of numbers is drawn on the <math>i</math>th draw.

Bonferroni's inequality gives the upper bound <math>P(A) \le \sum_{i=1}^{m-1} P(A_i A_{i+1})</math>, and Hunter's inequality, taking the chain of consecutive pairs as its spanning tree, sharpens this to the upper bound <math>P(A) \le \sum_{i=1}^{m-1} P(A_i A_{i+1}) - \sum_{i=1}^{m-2} P(A_i A_{i+1} A_{i+2}).</math>

We assume (!) that the events <math>A_i</math> are independent, each with probability <math>p = 1/5245786</math>, leading to <math>P(A) \le (m-1) p^2 - (m-2) p^3 \le (m-1) p^2</math>. Since <math>p</math> is very small, the <math>p^3</math> term, and the still smaller terms that a matching lower bound would subtract, can be ignored, giving <math>P(A) \approx (m-1)/5245786^2.</math>
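Under the same independence assumption the closed-form expressions can be compared with an exact value. Whether a fixed set has already been drawn twice in a row depends only on whether the previous draw was that set, so a two-state recursion gives <math>P(A)</math> exactly, up to floating-point rounding. This is our own illustrative sketch, not part of the original analysis; the function name `prob_fixed_set_repeats` is hypothetical.

```python
def prob_fixed_set_repeats(m: int, p: float) -> float:
    """Probability that a fixed set, drawn with probability p on each of m
    independent draws, appears on two consecutive draws at least once."""
    # State after the first draw (no repeat is possible yet).
    last_hit = p          # no repeat so far, and the last draw was the set
    last_miss = 1.0 - p   # no repeat so far, and the last draw was not the set
    repeated = 0.0        # probability mass absorbed by a consecutive repeat
    for _ in range(m - 1):
        repeated += last_hit * p  # the set is drawn again right after a hit
        # Right-hand side uses the old values of both states.
        last_hit, last_miss = last_miss * p, (last_hit + last_miss) * (1.0 - p)
    return repeated


N = 5245786
p = 1.0 / N
for m in (104, 5408):
    exact = prob_fixed_set_repeats(m, p)
    first_order = (m - 1) * p**2
    second_order = (m - 1) * p**2 - (m - 2) * p**3
    print(m, exact, first_order, second_order)
```

For small cases the recursion can be checked by hand: with <math>m = 3</math> it returns <math>2p^2 - p^3</math>, which is <math>P(A_1A_2 \cup A_2A_3)</math>.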

It appears that the draws are held twice per week, so for one year <math>m = 104</math>, giving probability <math>3.74 \times 10^{-12}</math> that a specified set of numbers will be drawn twice in succession. According to a spokeswoman, the lottery has been running for 52 years. Using <math>m = 104 \times 52 = 5408</math>, the probability that a specified set of numbers will be drawn twice in succession over this period is <math>1.96 \times 10^{-10}</math>, still very small.
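The two figures follow from the first-order approximation <math>(m-1)/5245786^2</math>. A short sketch to reproduce them (the helper name `p_fixed_pair` is hypothetical; twice-weekly draws over 52 years are taken from the text above):

```python
N = 5245786          # number of 6-of-42 sets
draws_per_year = 104  # twice-weekly draws


def p_fixed_pair(m: int) -> float:
    """First-order approximation (m - 1) / N**2 for a fixed set."""
    return (m - 1) / N**2


one_year = p_fixed_pair(draws_per_year)
fifty_two_years = p_fixed_pair(draws_per_year * 52)  # m = 5408
print(f"{one_year:.3g}")         # 3.74e-12
print(f"{fifty_two_years:.3g}")  # 1.96e-10
```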

But now let's ask the question not for a fixed set of numbers but for some set of numbers. After all, in discussing this coincidence, the repeated set arose by chance alone and was not specified in advance.

In <math>m</math> drawings, what is the probability that SOME set of six numbers will be repeated in consecutive draws?
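The Bulgarian probabilities are far too small to observe in a simulation, but the question can be explored on a scaled-down lottery. The sketch below uses a hypothetical 6-of-12 game (<math>{12 \choose 6} = 924</math> sets) with <math>m = 50</math> draws, both chosen only for illustration, and estimates by Monte Carlo the chance that some set is drawn on two consecutive occasions, next to the first-order estimate <math>(m-1)/{n \choose 6}</math>.

```python
import random
from math import comb


def simulate_consecutive_repeat(n: int, k: int, m: int, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of P(some k-of-n set is drawn on two consecutive
    draws at least once in m draws)."""
    rng = random.Random(seed)
    numbers = range(1, n + 1)
    hits = 0
    for _ in range(trials):
        previous = None
        for _ in range(m):
            current = frozenset(rng.sample(numbers, k))  # one draw as an unordered set
            if current == previous:
                hits += 1
                break
            previous = current
    return hits / trials


n, k, m = 12, 6, 50  # hypothetical small lottery, not the Bulgarian one
estimate = simulate_consecutive_repeat(n, k, m, trials=10_000)
first_order = (m - 1) / comb(n, k)
print(estimate, first_order)
```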

There are 5245786 possible sets of numbers that could be repeated. Enumerate the sets by integers <math>1 \le k \le 5245786</math>, and let <math>E_k</math> be the event that set <math>k</math> repeats consecutively sometime during these <math>m</math> drawings. We need the probability of the union <math>P(\cup_k E_k)</math>. Each of the 5245786 events <math>E_k</math> has probability approximately <math>(m-1)/5245786^2</math>, and if they were independent we could evaluate the probability using complements as <math>P(\cup_k E_k) = 1 - (1 - (m-1)/5245786^2)^{5245786} \approx 1 - e^{-(m-1)/5245786}</math>. They are dependent, however; but as long as <math>m</math> is small relative to 5245786, Bonferroni's and Hunter's bounds can once again be used to estimate <math>P(\cup_k E_k) \approx (m-1)/5245786.</math> For <math>m = 5408</math> this is 0.0010307. (Note that assuming independence gives 0.0010302.)
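Both numbers can be reproduced in a few lines (a sketch; variable names are ours). As an extra check that is not part of the original argument, it also evaluates <math>1 - (1 - 1/5245786)^{m-1}</math>: if draws are independent and uniform over the sets, the events "draw <math>i+1</math> repeats draw <math>i</math>" are themselves independent with probability <math>1/5245786</math> each, so this expression is the exact probability that some set repeats consecutively. `math.expm1` and `math.log1p` are used to avoid cancellation.

```python
import math

N = 5245786
m = 5408

union_estimate = (m - 1) / N                 # first-order estimate (m - 1) / N
independent = 1 - math.exp(-(m - 1) / N)     # treating the E_k as independent
# Exact under independent uniform draws: 1 - (1 - 1/N)**(m - 1).
exact = -math.expm1((m - 1) * math.log1p(-1 / N))

print(f"{union_estimate:.7f}")  # 0.0010307
print(f"{independent:.7f}")     # 0.0010302
print(f"{exact:.7f}")           # 0.0010302
```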

This probability relates to one lottery. Suppose we consider all lotteries worldwide and ask for the probability that in some lottery, somewhere, some set of numbers will be repeated consecutively. All such lotteries are variants of Toto with different numbers involved, and each will have had its own cumulative number of drawings. To gauge the magnitude of the probability wanted, assume that there are <math>x</math> lotteries, each sharing the same numerical characteristics as the Bulgarian one.

This time we can use independence, since different lotteries are drawn independently. The probability that some set will be repeated is 1 minus the probability that in no lottery is a set of numbers selected on two consecutive drawings, namely <math>1 - (1 - (m-1)/5245786)^x</math>. For <math>x = 50</math> this is 0.0503, while for <math>x = 100</math> it is 0.0980. (For this range of values of <math>x</math>, <math>x(m-1)/5245786</math> approximates the probability to one significant digit.)
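A short sketch of this calculation, assuming as above that each of the <math>x</math> lotteries has <math>m = 5408</math> draws and the same 5245786 sets (the helper name `p_some_lottery` is hypothetical). Formatting with one significant digit shows the agreement with <math>x(m-1)/5245786</math>.

```python
N = 5245786
m = 5408
p_one = (m - 1) / N  # one lottery, first-order estimate


def p_some_lottery(x: int) -> float:
    """Probability that at least one of x independent, identical lotteries
    sees some set drawn twice in a row."""
    return 1 - (1 - p_one) ** x


for x in (50, 100):
    exact_x = p_some_lottery(x)
    linear = x * p_one
    print(x, f"{exact_x:.4f}", f"{exact_x:.1g}", f"{linear:.1g}")
```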

For a different problem that discusses "very big numbers", see the article about a double lottery winner.

Questions.

1. What lower bound does the second Bonferroni inequality give, and how close is it to the bounds above?

2. How many years would the Bulgarian lottery need to be running in order to have the same probability (about 0.001) that some set of numbers will appear three times in succession?

3. Instead of demanding that the same set of numbers appear twice in succession, what is the probability that some set of numbers will repeat during <math>m</math> drawings? (This is simpler; it is the famous birthday problem.)

4. The second application of Hunter's bound requires estimating <math>\sum_k P(E_k E_{k+1})</math>, which involves terms of the form <math>P(A_i A_{i+1} B_j B_{j+1})</math>, where <math>A_i</math> is the event that set <math>k</math> occurs on draw <math>i</math> and <math>B_j</math> is the event that set <math>k+1</math> occurs on draw <math>j</math>. Each nonzero term has probability <math>1/5245786^4</math>. Count the number of terms to validate the claim that <math>P(\cup_k E_k) \approx (m-1)/5245786</math>.

Submitted by Fred Hoppe