25 June, 2009

The Monty (Python) Hall program

Brendan at Siris has highlighted and discussed an interesting fallacy, and his discussion of it reminded me of the Monty Hall problem, which has always gotten my goat. (Pardon the pun.)

I left a comment, and while thinking about it a few minutes later, it occurred to me to try the problem out in a computer program. This way, I could simulate several hundred or thousand games in less time than it takes to blink an eye. Ain't computers grand?

For this I turned to Sage, a computer algebra system that uses the programming language Python as its interface with the user. It would be very, very easy to write a Sage program that would run the Monty Hall problem many, many times, building a sufficiently large sample space that the correct answer would be obvious. Moreover, Sage is free software that can be run from a web interface found at the website linked above, so anyone can try it, mess with the code, and see it in action. Thus was born the Monty (Python) Hall program. (Pardon the pun.)

While writing the code and thinking about how to implement the problem, I (finally) saw something that I had never really grasped before. (To see the code, click on "Read more".)

def monty_python_hall_program(n=6000):
switch_wins = 0
stay_wins = 0
doors = Set([1,2,3])
for each in range(n):
winning_pick = randint(1,3)
initial_pick = randint(1,3)
if winning_pick == initial_pick:
stay_wins += 1
else:
switch_wins += 1
print "Probability that switching wins:" , round(switch_wins*100/n,1)
print "Probability that staying wins:", round(stay_wins*100/n,1)
What I I realized is that the information given by the host is essentially useless. It doesn't affect the end result at all, because to decide whether the player wins by switching, one simply tests whether his initial choice was the winning door! You see this in the following if statement:
if winning_pick == initial_pick:
This line does not depend on the host's action at all.

The probability of the initial choice being the winning choice is 1/3, period, full stop. The probability of switching to the correct door is 2/3. The host's revelation of a wrong door doesn't change anything.

Here are the results of the program:
sage: monty_python_hall_program()
Probability that switching wins: 66.0
Probability that staying wins: 33.0
sage: monty_python_hall_program()
Probability that switching wins: 66.0
Probability that staying wins: 33.0
sage: monty_python_hall_program()
Probability that switching wins: 67.0
Probability that staying wins: 32.0
That may look convincing, but it shouldn't because I designed the program to produce those numbers. The real question is whether the program is designed correctly, which depends on whether my "insight" is correct.

Wikipedia does a pretty good job of explaining it, and I had read it a couple of times before. I "understood" it, but was too thick to grasp the explanation. I think (perhaps wrongly) that now I understand it.

I could be wrong, though! Perhaps the host's additional information does need to be implemented in a way that I didn't see. It's late at night, and I'm not a probabilist, so I beg your forbearance in that case. In any case, the code is here (again, click on "Read more"); feel free to leave a comment indicating any error. If you want to suggest a way to implement the host's information, first think carefully about whether it can be optimized out, especially since the player wins by staying if and only if the following boolean expression line evaluates to true:
if winning_pick == initial_pick:
and that expression does not depend in any way on which door is opened by the host!

4 comments:

Tom L said...

Your program is correct because it does use the information provided by the host (albeit implicitly). The only reason switch always wins in the else case is that the host has necessarily eliminated the other wrong choice (the one you didn't initially pick). The host's action is implicit in the statement switch_wins += 1.

jack perry said...

Thanks!

Something like that actually occurred to me as I was walking from my car to the building this morning: the probability of a win is 1/3, and the probability of a loss is 2/3. The host has opened a door, so the probability of a loss remains the same. There's only one door to switch to, so the probability of winning with that door is the probability of a loss with the initial pick: 2/3. So I agree with you that it uses the host's information implicitly; I'm just amazed that it comes out of a program that doesn't even bother to check what the host did.

By contrast, I found some programs online which engaged in a quite complicated analysis of what the host did. In at least one case, the analysis was even wrong, but fortuitously arrived at the wrong answer nevertheless!

I shouldn't do these things late at night; I really shouldn't— but the programming exercise does help I think.

Now let me go see if the department's resident statistician can explain the problem…

jack perry said...

...er I mean, that program fortuitously arrived at the correct answer nevertheless!

Clemens said...

Dang! Until I read your correction I thought you were being unusually droll.

It was the only part I thought I understood.