The sisters “paradox” – counter-intuitive probability (blog.engora.com)

82 points by Vermin2000 2 days ago | 130 comments

StopDisinfo910 1 days ago [-]

There is no sisters paradox. The trick is how the question is weirdly framed and has to be interpreted. What people think about when they hear the question would effectively lead to a probability of 0.5: if you see a family in the street with a girl and know they have two kids, the probability of the other kid being a girl is indeed 0.5.

The trick of the so-called "paradox" is turning the question into the Monty Hall problem, but with an ambitious enough formulation that you might be confused into thinking it’s not.

6gvONxR4sf7o 1 days ago [-]

The way to see this is Bayes' rule. p(answer | data) = p(data | answer) * p(answer) / (sum_{all possible answers'} p(data | answer') * p(answer')). So for this question, that expands to:

    p(both are girls | you're told at least one is a girl)
     = p(you're told at least one is a girl | both are girls) * p(both are girls) / (
            p(you're told at least one is a girl | both are girls) * p(both are girls)
            +
            p(you're told at least one is a girl | they aren't both girls) * p(they aren't both girls)
        )

The problem is that we don't know p(you're told at least one is a girl | they aren't both girls). Clearly if both are boys, then you won't be told at least one is a girl (or at least it's implied that you're told the truth). But that still leaves us p(you're told at least one is a girl | one boy and one girl).

This is the crux of the thing. Different readings of the setup imply different answers to p(what you're told | the unknowns).

It's also a great case of where Bayes' rule shorthands can be slippery. You'll usually abbreviate it (hell, it was tedious to write this way even with copy-paste). But if you abbreviate "you're told there's at least one girl" to "there's at least one girl", then you've stopped modeling a crucial part of the setup. p(there's at least one girl | they aren't both girls) has an unambiguous answer.
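To make the dependence on the unknown term concrete, here is a minimal sketch of the formula above. The function name and the parameter `q`, standing for p(you're told at least one is a girl | one boy and one girl), are introduced here for illustration only. It assumes a truthful narrator and 50/50 independent births.

```python
from fractions import Fraction

def p_both_girls_given_told(q):
    """Posterior p(both are girls | you're told at least one is a girl).

    q is the assumed p(told 'at least one is a girl' | one boy and one girl).
    Assumption: the narrator is truthful, so p(told | both boys) = 0
    and p(told | both girls) = 1.
    """
    q = Fraction(q)
    prior = {"GG": Fraction(1, 4), "mixed": Fraction(1, 2), "BB": Fraction(1, 4)}
    likelihood = {"GG": Fraction(1), "mixed": q, "BB": Fraction(0)}
    # Denominator of Bayes' rule: sum over all possible answers.
    evidence = sum(likelihood[k] * prior[k] for k in prior)
    return likelihood["GG"] * prior["GG"] / evidence

# Narrator always mentions a girl whenever there is one.
print(p_both_girls_given_told(1))
# Narrator describes one child picked at random.
print(p_both_girls_given_told(Fraction(1, 2)))
```

With `q = 1` this gives 1/3 and with `q = 1/2` it gives 1/2, so the two popular answers correspond to two different guesses about `q`.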

lqet 1 days ago [-]

This. A less confusing way to ask the question with the 1/3 answer would be:

  What is the probability that a family with 2 children has exactly 2 daughters *if you know that the family does not have 2 boys*?

The reason why the original problem is so confusing is the same reason why the Monty Hall problem is so confusing: people have different understandings of the question, and don't realize it in discussions. As I wrote a few years ago [0]:

Because most people don't talk formal probabilities, your explanations will be so vague that the other person will not realize your different understanding. You will discuss forever, you will both be right, and you will part ways with the strange feeling that maybe the other person was right, when all along you were talking about different problems. This is why this problem is so notorious.

[0] https://news.ycombinator.com/item?id=24707305

edanm 20 hours ago [-]

> The reason why the original problem is so confusing is the same reason why the Monty Hall problem is so confusing: people have different understandings of the question, and don't realize it in discussions.

I think this is true of the "children" question, but I actually disagree that this is what makes the Monty Hall question so confusing.

For one thing, I vaguely recall this being asked directly, and even after people agree on all the definitions explicitly, they still consider the answer wrong. (See e.g. some mathematicians like Erdos refusing to believe the correct answer without actually running simulations on computers... by that point you clearly have a real definition.)

For another, when I personally talk to people about Monty Hall, even after I explain the correct answer, and explain all the nuances, people tend to still have a hard time accepting the correct solution and claim to find it counterintuitive (as did I!).

chatmasta 6 hours ago [-]

The typical Monty Hall formulation has similar ambiguity because it’s not clear whether or not the host knows the right door. Just like in this question, it’s not clear if the “narrator” knows the first child is a girl.

(Also, this problem has an additional layer of ambiguity where “birth order is irrelevant,” but MF and FM are treated as distinct items in the probability set. Is the order irrelevant to the probability, or is it impossible to distinguish the age of the two children? It would be clearer to simply say “each birth is an independent event.” One of the comments on the blog explains this better than I can.)

js2 1 days ago [-]

So it's "what is the probability both are girls?" vs. "what is the probability the other is a girl?" and most people will hear the latter and answer 1/2 whereas the question is the former and its answer is 1/3. Do I have that right?

AnotherGoodName 1 days ago [-]

"The question writer took all sets of two child families and ruled out the bb case. Then they asked the exact question above." This is a 1/3 chance: select gg from [gg, bg, gb].

"The question writer came across a girl from a two child family, then they asked the exact question above". This is 1/2 chance - select gg from [gg, gg, bg, gb] with gg listed twice since there's two ways to select a girl from that set; ie. coming across a girl is twice as likely to occur from the gg case than it is either gb or bg.

I think that's the clearest wording to get the message across. Either way it's the exact same question but it reasonably has a completely different answer. There's no way to resolve this ambiguity with the question as written.
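The two readings can be enumerated exactly. This is only a sketch under the usual puzzle assumptions (50/50 independent births, and in the second reading the child you come across is equally likely to be either sibling). The variable names are made up for the example.

```python
from fractions import Fraction
from itertools import product

# The four equally likely ordered families: gg, gb, bg, bb.
families = list(product("gb", repeat=2))

# Reading 1: rule out the bb case, then ask the question.
kept = [f for f in families if "g" in f]
p_filter = Fraction(sum(f == ("g", "g") for f in kept), len(kept))

# Reading 2: come across one child at random, and she is a girl.
# Every (family, index of the child met) pair is equally likely.
meetings = [(f, i) for f in families for i in (0, 1) if f[i] == "g"]
p_meet = Fraction(sum(f == ("g", "g") for f, _ in meetings), len(meetings))

print(p_filter, p_meet)
```

The `meetings` list contains gg twice, once per girl, which is exactly the double counting described above.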

LegionMammal978 1 days ago [-]

That's a good framing. It's similar to the fact that the chance of a given star being in a multiple system (~47% in our vicinity [0]) is significantly higher than the chance of a given system having multiple stars (~30%), because counting by individual stars gives more weight to the multiple systems.

[0] https://astronomy.stackexchange.com/a/55505

SkiFire13 15 hours ago [-]

The issue with asking whether the other is a girl is how you choose the first one.

If you look at one random child, see it's a boy and exclude the family, even though the other child may be a girl, then you get the 1/2 probability. If, however, in that case you also look at the second child, see that it's a girl, and consider the family anyway, then you get 1/3.

LudwigNagasena 1 days ago [-]

Those questions are equivalent. What is important is the conditional “… given that I looked at a random child and it was a girl” / “… given that I looked at both children and at least one of them was a girl”.

lIl-IIIl 19 hours ago [-]

The other interpretation that leads to the 1/3 probability is also pretty intuitive. The fun part of this question is that it leaves crucial information unspecified.

I think this is a reasonable interpretation:

You meet a family at a party. They say "We have two children". You ask "Do you have any girls"? They say "yes!"

This will give you 1/3 probability that the other child is also a girl.

I think this interpretation is more intuitive because it doesn't make any assumptions about how you get your information. Usually in probability questions you assume any information you have is given to you from on high. For example, you just "know" that the family has two children, you don't somehow deduce it. Therefore I assume the same for "one child is a girl" information.

kgwgk 19 hours ago [-]

> I think this interpretation is more intuitive because it doesn't make any assumptions about how you get your information.

Do you mean “interpretation” or “alternative problem”?

Because if it’s an “interpretation” of the original problem you’re indeed making assumptions to fill the unspecified information.

If you mean that it’s an alternative problem which has a definite solution I agree. It’s a different problem and its relevance to the original one is to illustrate that additional assumptions were required.

lIl-IIIl 18 hours ago [-]

I mean the original problem can be interpreted in both ways. You have to ask the questioner for more information to remove the ambiguity.

kgwgk 18 hours ago [-]

I think we agree then.

The original problem cannot be answered without making additional assumptions about how you get your information. Different interpretations may reach different answers by making different assumptions.

taeric 1 days ago [-]

I'd hazard that people also typically hear "what are the odds of this from the outset?" Effectively, "you flip two quarters, and see one land heads; what were your odds to flip two tails?"

zahlman 1 days ago [-]

>ambitious

ambiguous?

tocs3 2 days ago [-]

"This might seem abstract, but I've seen variations of this problem pop up in business and I've had difficult conversations with non-technical people as a result."

Does anyone have some real-life examples? I cannot think of any offhand but would like to be able to cite a couple if someone says "So, what is this good for?".

nostrademons 10 hours ago [-]

"We noticed that our app reviews on TV, Watch, and VR suck, why could that be? Are the OSes on those platforms terrible? Are user expectations just super high? Can we ever make money on those platforms?"

"When you're on TV, you need to use a DPad to laboriously type in a review. Same with the watch and its tiny screen. When you're on VR, you're using a laser pointer. That makes the barrier to leaving a review very high. As a result, people only bother to leave reviews if they have particularly strong feelings. The strongest motivating emotion is anger. Ergo, we are getting a disproportionate number of reviews from people who are angry, and everyone that is happy is not going to bother leaving a review."

Same goes for many kinds of user feedback situations. The feedback you're measuring is P(user holds that opinion | user bothers to tell you that opinion). The denominator needs to be understood in order to evaluate that probability, but by definition, you don't have any data on people who don't bother to engage with you, unless you can measure their interaction in some other way (eg. usage data). This is why online reviews of many offline businesses skew negative; why businesses will often nag you to leave a review if you're satisfied; why corporate executives often overestimate employee satisfaction, particularly in prestigious businesses (anyone who is dissatisfied will leave and get a job elsewhere), and why Congress has such a low approval rating (by definition, you can't escape the laws enacted by Congress, so there is no "exit" option, and you get an accurate appraisal of what people really think).
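A toy calculation shows how strongly the filter can skew what you see. All three input numbers below are hypothetical, chosen only to illustrate the effect, and are not taken from any real platform.

```python
from fractions import Fraction

# Hypothetical inputs, for illustration only.
p_angry = Fraction(1, 5)                 # share of all users who are angry
p_review_given_angry = Fraction(1, 10)   # angry users who bother to review
p_review_given_happy = Fraction(1, 100)  # happy users who bother to review

# Total probability that a random user leaves a review.
p_review = (p_angry * p_review_given_angry
            + (1 - p_angry) * p_review_given_happy)

# Bayes' rule: share of reviews that come from angry users.
p_angry_given_review = p_angry * p_review_given_angry / p_review
print(p_angry_given_review, float(p_angry_given_review))
```

Under these made-up numbers only 20% of users are angry, yet 5/7 of the reviews come from them.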

JHonaker 1 days ago [-]

Any time you start conditioning on something, i.e. selecting subsets of data to analyze. You can fool yourself quite often if you do something seemingly innocuous like select "everyone with at least one X" and compare expectations to what's true unconditionally (meaning not conditioning on anything, not "in all cases") with conditional computations.

jmount 2 days ago [-]

Peter Winkler shares some great variations of this: "Boy Born on Tuesday" (p. xix) and "Men with Sisters" (p. xxii) in "Mathematical Puzzles".

"Mrs. Chance has two children of different ages. At least one of them is a boy born on Tuesday. What is the probability that both of them are boys?"

(note: it is a puzzle, not a biology or data demography problem. so there are 50/50 independence assumptions on gender and uniform day of week assumptions prior to adding the conditioning.)

layer8 1 days ago [-]

Here “on Tuesday” is ambiguous, in my opinion. I first thought it meant “on a Tuesday” and that it was just a diversion. But it is likely intended to mean “last Tuesday” or “this Tuesday” (which excludes the boy-then-girl case). Wording it more clearly would likely reduce the ratio of wrong answers.

Furthermore, “of different ages” is likely intended to exclude the case of twins. However, even with twins, one is generally nominally older than the other. (Not to mention that it’s possible for two non-twin siblings to be the same age in years, at certain points in time.) Why not just say “that aren’t twins”?

I loathe when logic puzzles are obscured by ambiguous language, turning them more into “gotcha” text interpretation riddles than logic puzzles.

jmount 1 days ago [-]

Puzzles are definitely odd birds. I myself have gotten into a literal screaming match trying to push my belief that they should never be used in interviews. The bulk of that was that an interviewer said the interviewee was "clearly confused when they were asked a puzzle" yet refused to agree that this may be evidence that the presentation of the puzzle was itself confusing (and not measuring anything).

I can't speak for Winkler, but both he and Jaynes implicitly separate the reading of the puzzle from the work. Winkler starts his book with a few awful "reading trick" ones, but in the explanations gives a few reading directions to try to avoid that going forward. I happen to know he meant "on a Tuesday." But a correct solution to a different read would be a correct solution even if it doesn't match the book text. I don't think he was trying to set a text trap; it is just hard to be clear, concise, and unambiguous at the same time. (Even "on a Tuesday" isn't completely clear if it means "all I am telling you was the day of week was Tuesday" versus "it was a very specific Tuesday, that I am not telling.")

exmadscientist 1 days ago [-]

The value of puzzles in interviewing is never about reaching the solution. It is about seeing how candidates deal with tricky situations that stretch them a bit, because that happens all the time on the job. It should almost always be done interactively, so you can see what clarifications and extra information they ask for, when and how they give up, and if they're dumb enough to say HR violations out loud (it does happen).

This does require a rather skilled interviewer, so the benefits may well not be worth it. But it can be very interesting information to have.

teekert 1 days ago [-]

Or whether you watch Veritasium and just know you can jump out of that blender because weight scales with the cube of height and muscle power scales quadratically.

zeroonetwothree 1 days ago [-]

I think it clearly means “on a Tuesday”, anything else wouldn’t make sense as a puzzle. We are meant to assume each day of the week is equally likely.

Excluding twins is so that we can assume the probability of each day of the week is independent.

stronglikedan 1 days ago [-]

I agree with the first one, but age is measured in days at a minimum, so twins are always the same age. (I'm sure there are cases where they are born farther apart due to some issue with the pregnancy, but that is statistically insignificant here.)

layer8 1 days ago [-]

Even if you take days, one can be born at 23:58 and the other at 00:03 the next day. (And it could be New Year’s day — in some cultures that would even imply different ages in years.) Regardless of days, it’s not uncommon to talk about who is the older twin.

Of course colloquially twins are the same age, but we are talking about a mathematical puzzle about probabilities here, where precision is paramount.

jimmaswell 1 days ago [-]

Why can't you just disregard the existing boy and reframe the question as the probability that the other child is a boy, and the space of all possible answers is BG and BB, equally probable (1/2)? Not really following explanations I find online.

joshuaissac 1 days ago [-]

Because GB is also a possibility. You are not told that the existing boy is the elder child.

jimmaswell 1 days ago [-]

https://www.online-python.com/Q5leTWuvb6

https://www.online-python.com/RueVd2514m

No matter how I frame or interpret this question, the birthdays and birth order appear completely irrelevant - the results are still ~0.50 as expected. Whatever the author was trying to say, they didn't communicate it well. I'm really curious exactly what word or phrase the author thought I was supposed to take to mean something else. If someone could edit one of these simulations to show what the author intended then that would probably be the clearest way.

LudwigNagasena 1 days ago [-]

https://www.online-python.com/6LDtZz7lMh

* Take all families with two children

* Take a subset where at least one child is a boy born on Tuesday

* Take a subset of the previous subset where all children are boys

* The share of the 2nd subset relative to the 1st subset is around 48%
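The same steps can also be carried out by exact enumeration instead of sampling. This is a sketch, not the linked script. It assumes 50/50 sexes, uniform and independent weekdays, and an arbitrary index for Tuesday.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)   # weekdays as 0..6
TUESDAY = 1       # arbitrary label; any fixed day gives the same count

# 14 equally likely (sex, weekday) combinations per child.
children = list(product("BG", DAYS))
# All 196 equally likely ordered two-child families.
families = list(product(children, repeat=2))

# Subset 1: at least one child is a boy born on Tuesday.
subset1 = [f for f in families if ("B", TUESDAY) in f]
# Subset 2: within subset 1, both children are boys.
subset2 = [f for f in subset1 if all(sex == "B" for sex, _ in f)]

share = Fraction(len(subset2), len(subset1))
print(len(subset1), len(subset2), share, float(share))
```

This counts 27 families in the first subset and 13 in the second, so the share is 13/27, or about 48%.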

zeroonetwothree 1 days ago [-]

So we would write down the possibilities as (B-Tue, B-Tue), (B-Tue, G-Any), (B-Tue, B-NotTue) and the inverse for the latter two. This results in 27 cases. Of those 13 have two boys so the answer is 13/27.

zeroonetwothree 1 days ago [-]

Similarly if you had the knowledge “at least one is a boy born on May 11” then it would be very close to but slightly less than 50%.

So we can see in the limit as the information becomes more and more specific it turns into the unconditional probability. That is, the case of “the first is a boy, what is the probability both are boys” (50%).

I think this clarifies the situation in the OP pretty well.
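A small sketch of that limit: replace "born on Tuesday" with any attribute that each child has with probability 1/n, and watch the answer approach 1/2 as n grows. The function name and the choice n = 365 (uniform birthdays, ignoring leap years) are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def p_two_boys(n):
    """p(two boys | at least one is a boy with a specific 1-in-n attribute).

    Assumes 50/50 sexes and an attribute uniform over n values,
    independent between the two children.
    """
    children = list(product("BG", range(n)))
    families = product(children, repeat=2)
    # Condition on at least one boy having attribute value 0.
    hits = [f for f in families if ("B", 0) in f]
    both = [f for f in hits if f[0][0] == "B" and f[1][0] == "B"]
    return Fraction(len(both), len(hits))

for n in (1, 7, 365):
    p = p_two_boys(n)
    print(n, p, float(p))
```

With n = 1 the attribute carries no information and the result is 1/3. With n = 7 it is 13/27, and with n = 365 it is 729/1459, just under 1/2.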

Rickasaurus 1 days ago [-]

Great book, I highly recommend it too.

JeffJor 1 days ago [-]

Q1: "A family has two children. You're told that at least one of them is a girl. What's the probability both are girls?"

Q2: "A family has two children. You're told that at least one of them is a boy. What's the probability both are boys?"

Note that these are symmetric problems, and must have the same answer.

Q3: "A family has two children. You're told that a gender, that applies to at least one, is written inside a sealed envelope. What's the probability both have that gender?"

In Q3, we have no information. So the answer is the proportion of two-child families that are single gendered. That is, 1/2.

But if we open the envelope, and read what is written inside, the problem becomes either Q1 or Q2, which have the same answer. So we don't have to open it; whatever the answer to Q1 and Q2 is, opening the envelope in Q3 makes its answer the same. If that answer is 1/3, we have a paradox. The answer has to be 1/2 if we don't look.

This is what is known as "Bertrand's Box Paradox." Well, if we add a fourth box to his problem, with one gold and one silver coin. I realize that in modern times the problem itself is called the paradox, but what Bertrand actually wrote (edited to this problem) was "How can it be that opening the envelope suffices to change the probability from 1/2 to 1/3?"

The resolution is that probability must be based on the full set of possibilities, not the possibilities that _could_ result from the full set of _states._ These are the possibilities for this problem:

   1) BB and you are told that there is at least one boy.
   2A) BG and you are told that there is at least one boy.
   2B) BG and you are told that there is at least one girl.
   3A) GB and you are told that there is at least one boy.
   3B) GB and you are told that there is at least one girl.
   4) GG and you are told that there is at least one girl.

Each numbered case has a prior probability of 1/4. Let's say the "A" subcases have a probability of Q/4, so the "B" subcases have a probability of (1-Q)/4.

The answer to the boy question (Q2) is the probability of case 1, which is 1/4, divided by the total probability of cases 1, 2A, and 3A, which is (1+2Q)/4. That's 1/(1+2Q).

The answer to the girl question (Q1) is the probability of case 4, which is also 1/4, divided by the total probability of cases 4, 2B, and 3B, which is (3-2Q)/4. That's 1/(3-2Q).

Bertrand's paradox, stated another way, is that these must be equal, but can only be equal if Q=1/2 and both answers are 1/2.
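The case table can be turned into a short calculation. This is a sketch with made-up names. `q` plays the role of Q, the assumed probability that a mixed family is described by mentioning the boy.

```python
from fractions import Fraction

def answers(q):
    """Return (p(BB | told 'at least one boy'), p(GG | told 'at least one girl')).

    q is the assumed probability that a BG or GB family is described
    by the statement about the boy (the "A" subcases).
    """
    q = Fraction(q)
    quarter = Fraction(1, 4)
    # Joint probabilities of (family, statement) for each statement.
    told_boy = {"BB": quarter, "BG": q * quarter, "GB": q * quarter}
    told_girl = {"GG": quarter, "BG": (1 - q) * quarter, "GB": (1 - q) * quarter}
    return (told_boy["BB"] / sum(told_boy.values()),
            told_girl["GG"] / sum(told_girl.values()))

for q in (Fraction(1, 2), 1, 0):
    print(q, answers(q))
```

Only `q = 1/2` gives the same answer to both questions, and that answer is 1/2. Any other `q` breaks the symmetry, for example `q = 1` gives 1/3 for the boy question and 1 for the girl question.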

Majromax 1 days ago [-]

In all of these questions, you're making an assumption about the data-generating process. In Q1 and Q2, you're assuming that you had a 0% chance (a priori) of hearing that 'neither is a (girl/boy)', and in Q3 you're assuming that there's a 0% chance of hearing that the envelope doesn't match the family.

Take a look at this problem beginning with no assumptions. We have two kids, and an envelope that contains 'B' or 'G'. Our probability space is (B,G)^3, with each having probability of 1/8.

Now, we add information about the match as conditioning. Conditional on being told that the envelope matches the family, we can exclude the BBG and GGB cases. That brings us down to 6, of which we have BBB, GGG, and (BG,GB)(B,G). With this additional information, the probability of matching genders becomes 1/3. This probability is still 1/3 if we open the envelope to find B or G, since we exclude all three cases where the envelope doesn't match our observation of it.
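This model is small enough to enumerate directly. The sketch below assumes, as stated, that all eight (child, child, envelope) triples are equally likely before conditioning. The helper names are invented for the example.

```python
from fractions import Fraction
from itertools import product

# (child 1, child 2, envelope): 8 equally likely triples.
space = list(product("BG", repeat=3))

def prob(event, given):
    """Conditional probability of event given a condition, by counting."""
    kept = [s for s in space if given(s)]
    return Fraction(sum(event(s) for s in kept), len(kept))

def same_gender(s):
    return s[0] == s[1]

def envelope_matches(s):
    # The envelope names a gender that at least one child has.
    return s[2] in s[:2]

def matches_and_says_girl(s):
    return envelope_matches(s) and s[2] == "G"

print(prob(same_gender, envelope_matches))
print(prob(same_gender, matches_and_says_girl))
```

Both conditional probabilities come out to 1/3 in this model, before and after opening the envelope.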

In my view, this is related to the Monty Hall problem; we have to realize that we're given additional information with the statement/envelope.

maest 24 hours ago [-]

> in Q3 you're assuming that there's a 0% chance of hearing that the envelope doesn't match the family.

This is equivalent to the host never opening the door with the car in the Monty Hall scenario

meatmanek 1 days ago [-]

In Q3, you've got 8 possibilities, expressed as (gender of 1st child, gender of 2nd child, which child's gender is written inside the sealed envelope?), each with presumably equal probability:

   1. B B 1
   2. B B 2
   3. B G 1
   4. B G 2
   5. G B 1
   6. G B 2
   7. G G 1
   8. G G 2

in which case 4 of 8 possibilities satisfy the condition (the first two and the last two).

Once you open the sealed envelope and it says "girl", it does not become Q1, it becomes a different question:

Q4: "A family has two children. I randomly sampled one of the children and it was a girl. What's the probability both are girls?"

In which case, we're looking at possibilities 4, 5, 7, and 8, and in only 2 of those 4 possibilities are both children girls.

In Q1, you're actually told "A family has two children. I looked at both children and can tell you that at least one of them is a girl. What's the probability that both are girls?". In which case, possibilities 3, 4, 5, 6, 7, 8 are all valid. Only in 2 of those 6 possibilities are both children girls.
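The three counts can be checked by enumerating the eight possibilities. This is a sketch under the equal-probability assumption stated above. Child indices are 0 and 1 here rather than 1 and 2, and the helper names are invented.

```python
from fractions import Fraction
from itertools import product

# (sex of child 1, sex of child 2, index of the child written in the envelope)
space = list(product("BG", "BG", (0, 1)))

def prob(event, given=lambda s: True):
    """Conditional probability by counting equally likely possibilities."""
    kept = [s for s in space if given(s)]
    return Fraction(sum(event(s) for s in kept), len(kept))

def same_sex(s):
    return s[0] == s[1]

def both_girls(s):
    return s[0] == "G" and s[1] == "G"

def envelope_says_girl(s):
    return s[s[2]] == "G"

def at_least_one_girl(s):
    return "G" in s[:2]

q3 = prob(same_sex)                        # sealed envelope
q4 = prob(both_girls, envelope_says_girl)  # one sampled child is a girl
q1 = prob(both_girls, at_least_one_girl)   # both inspected, at least one girl
print(q3, q4, q1)
```

The results are 1/2, 1/2 and 1/3, matching the 4 of 8, 2 of 4 and 2 of 6 counts.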

So as in_cahoots said in https://news.ycombinator.com/item?id=45053187, it matters whether the person asking looked at both children or just a single one.

D13Fd 1 days ago [-]

The "paradox" problem is in the setup. It's easy to mistake it as "a couple has one girl, what is the probability that their next child will be a girl," in which case the answer is 50%.

tantalor 1 days ago [-]

If you allow misunderstanding the question, then any answer is allowed.

in_cahoots 1 days ago [-]

But it's a valid point, the question is not well-posed. If you said, "I looked at both children and saw that at least one was a girl" more people would get the right answer. Many people will assume that the author looked at only one child, not both. And there's nothing in the wording to indicate either way.

As others are pointing out, this is just the Monty Hall problem. But the way the question is posed there is much clearer.

tantalor 1 days ago [-]

I don't know how this could be made more clear:

"You're told that at least one of them is a girl"

> Many people will assume that the author looked at only one child

There is no mention of "looking".

in_cahoots 1 days ago [-]

How did you determine at least one is a girl? Presumably you looked in some way. But did you look at one child or both? That's the crux of the ambiguity.

tantalor 1 days ago [-]

I think you are asking "how did the person who told you there is at least one girl learn that".

The answer is: it doesn't matter how because that is an unambiguous statement.

It means "you can assume the family does not have two boys".

I think people are actually getting hung up on "you are told" as if that could be a lie, or some kind of trick, when it is really just supposed to mean "here is some more information that you can rely on".

kgwgk 1 days ago [-]

> It means "you can assume the family does not have two boys".

But it does not mean that you can assume that p(you're told at least one is a girl | both are girls) = p(you're told at least one is a girl | they aren't both girls) as explained by 6gvONxR4sf7o.

If you allow assuming whatever you want, then many answers are allowed!

That’s what it means that the problem is not “well-posed” as mentioned by in_cahoots. You need additional assumptions to get a definite answer - and the answer will depend on the assumptions.

As JeffJor noted it seems much more natural to have assumptions that keep the symmetry of the problem (because why not?) and the answer 1/2 is not just possible but arguably “better”.

simonh 1 days ago [-]

These two questions are not equivalent.

Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability the other I didn’t see is a girl?

Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability the other is also a girl?

zeroonetwothree 1 days ago [-]

There is no “the other” in the original question. Introducing that completely changes the meaning.

in_cahoots 1 days ago [-]

The other part doesn't change anything at all. Here you go:

Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability there are two girls?

Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability there are two girls?

kbelder 1 days ago [-]

I agree, it's perfectly clear. In my humble opinion, people are bringing their incorrect assumptions to the question, and because they're wrong, are trying to blame the framing of the question. That happens a lot with the Monty Hall paradox, as well.

And, of course, neither are paradoxes. They're just math that can seem paradoxical if you don't look closely at it.

kgwgk 1 days ago [-]

The "it's perfectly clear" crowd are also bringing their own assumptions into the answer.

"Different readings of the setup imply different answers to p(what you're told | the unknowns)." See https://news.ycombinator.com/item?id=45056790

Do you think that it's perfectly clear that the answer to all the questions here is 2/3? https://news.ycombinator.com/item?id=45057514

sixo 2 days ago [-]

It's not even worth mentioning this problem unless you talk about how the result depends on the data generating process. If you take it to be something like "you randomly sample from families with two children, discarding any without at least one girl", you get the 1/3 result, but there are various other ways to read a sampling process from the problem statement which lead to other results.

pontus 1 days ago [-]

Just to pile on here, there's also ambiguity around how the observed girl is selected. Consider the following framing:

I go to a random house on a random street and knock on the door. A young girl opens the door. I ask how many siblings they have and they say one. What's the probability that they have a sister?

Now it's 50% even though cosmetically it seems like it'd be fair to say that the family has at least one daughter. The reason is that once I see a girl at the door, I'm slightly more confident in that it's a GG household since a GB or BG household would sometimes show a boy opening the door (assuming the two kids are equally likely to open the door).

P(GG | G at door) = P(G at door | GG) P(GG) / P(G at door)

P(G at door) = 1/2 (by symmetry)

So, P(GG | G at door) = 1 * 1/4 * 2 = 1/2
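A quick simulation of the door scenario agrees with the calculation. It is a sketch that assumes 50/50 independent births and that either child is equally likely to open the door. The function name, trial count and seed are arbitrary choices.

```python
import random

def simulate_door(trials=200_000, seed=0):
    """Estimate p(sibling is a girl | a girl opens the door)."""
    rng = random.Random(seed)
    girl_at_door = 0
    sister_too = 0
    for _ in range(trials):
        kids = [rng.choice("GB"), rng.choice("GB")]
        # Assumption: either child is equally likely to answer the door.
        opener = rng.randrange(2)
        if kids[opener] == "G":
            girl_at_door += 1
            if kids[1 - opener] == "G":
                sister_too += 1
    return sister_too / girl_at_door

print(simulate_door())
```

The estimate should land close to 0.5, not 1/3, because houses where a boy opens the door are dropped in proportion to how often that happens.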

MontyCarloHall 1 days ago [-]

This is the crux of the "paradox," which is really just an interpretation problem. Most people assume that the question asks exactly your scenario, i.e. if a specific child is selected and it's a girl, what's the probability that the sibling is also a girl? In that case, the event space is just GB or GG, and p(GG)/(p(GB) + p(GG)) = 0.5. (BG is not in the event space because we are conditioning on a specific child being a girl.)

However, if the question is interpreted as "what's the probability of having two girls if we know there aren't two boys," then the event space is GB, BG, GG, and p(GG)/(p(GB) + p(BG) + p(GG)) = 1/3. Both GB and BG are in the event space because we are not conditioning on the sex of one specific child.

the_gipsy 2 days ago [-]

Why can you not frame it as: "a random family has been sampled, the sample family has two children, one of them is a girl"?

I.e. without "discarding", just giving some additional, but not complete, information on the random sample. Is adding information about the picked sample the same as discarding all contrarian samples? Why is this relevant?

AnotherGoodName 1 days ago [-]

If there were two possible statements they asked

"a random family has been sampled, the sample family has two children, one of them is a girl"

and

"a random family has been sampled, the sample family has two children, one of them is a boy"

and they selected each statement based on randomly picking a child from a random family then the probability actually becomes 50% boy/girl for the next child since the boy/boy or girl/girl has twice the chance of generating the above statement for the respective gender compared to the mixed gender children family.

I.e. if they say one is a girl, that statement had a 50% chance of being generated by a girl/girl family (since we pick the statement based on a random selection of one of the two children's genders and there are 2 girls, doubling the chance of a statement that one's a girl coming from a girl/girl family), there's a 25% chance the statement was generated from a girl/boy family, and a 25% chance the statement was generated from a boy/girl family.

If you take 50% chance girl/girl, 25% chance boy/girl and 25% girl/boy you'll see there's a 50/50 chance of the next child being either gender.

All this due to changing how we sampled.

florbnit 1 days ago [-]

> "a random family has been sampled, the sample family has two children, one of them is a girl"

It’s not a random family if it must have at least one girl. If you want to talk about a random family, you can only make statements of the kind “one of the children is <gender>”, where the gender depends on the specific family, or “the family has between 0 and 2 girls”.

ndr 2 days ago [-]

I took this to mean exactly that:

> Assume the family is selected at random because they have at least one girl.

And then again, if they sampled all families with 2 children the posterior would not change, would it?

Still assuming boys vs. girls are completely iid and equally probable.

two_handfuls 2 days ago [-]

That's how I read it. What other ways were you thinking about?

bloak 1 days ago [-]

Well, one way of getting families with two children, at least one of which is a girl, would be to go to a girls' school and ask the children to raise their hand if they have exactly one sibling.

aidenn0 1 days ago [-]

I would expect that would yield a 50% chance of the other being a girl, right?

renewiltord 1 days ago [-]

Indeed. One thing they haven't mentioned is that the mother wasn't Zharata The Man Hater, who would kill any boy child. Therefore, in the Zharata case the answer is 1, and we're missing the probability of Zharata's family being considered, which could be one of pure certainty since she always puts her family forward for any puzzle question - killing any philosopher who would pose one not relating to her own family.

nicknow 22 hours ago [-]

Why do we treat boy-girl and girl-boy as different outcomes, since they are, per the question, equal outcomes and thus represent one possible outcome?

We don't care about which came first or second, only what gender each child is.

Thus the answer, to the question given the information you have, is 50%. The only possible outcomes are girl-girl or girl-boy (where order is irrelevant.)

And this is absolutely NOT the Monty Hall Problem. I don't know why some people are making that reference. The Monty Hall Problem contains three possible choices and one is eliminated by the host, this is what makes the statistical math interesting in that problem. None of that is happening here.

Let's look at the exact wording of the question:

> a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?

We have a family with two children. Assume we don't know their gender. We'll represent them as XX.

We are told one of them is a female. So now they are represented as GX (remember GX = XG, since order doesn't matter.)

You are left with the question: what is the probability that X is female? Well, there are only two choices, F and M, and we are told elsewhere that the probability of having a girl is 50/50.

> Assume that the probability of having a girl or boy is 50% and that the birth order has no effect on the probability.

So the chance of X being female is 50%. Thus the answer is 50%.

You can't say birth order doesn't matter and then use birth order to say the FM and MF are different results. The only possible results are FM and FF (since birth order is irrelevant.)

lIl-IIIl 19 hours ago [-]

It's not the birth order that is relevant.

It's the fact that there are twice as many boy-girl families as there are girl-girl families in the world.

jihadjihad 1 days ago [-]

From the Wikipedia article linked in TFA:

> Following classical probability arguments, we consider a large urn containing two children.

I like how they modified a classic from probability texts, drawing items from an urn, and made sure it would be big enough in this example to accommodate two kids.

bitwize 1 days ago [-]

This is better than an "assume a spherical cow" joke in the wild.

tromp 2 days ago [-]

Perhaps people would be more likely to give the correct answer if "at least one of them is a girl" is rephrased as the equivalent "the youngest is a girl or the oldest is a girl".

justonceokay 1 days ago [-]

Well constructing a different question to remove the “trick” of the problem is one approach. Kind of the “no child left behind” approach to riddles.

kgwgk 1 days ago [-]

Maybe people would be more likely to give the correct answer if the problem had _one_ correct answer. :-)

renewiltord 1 days ago [-]

Another way is to say: the probability is 1/3, what is the probability? In this way, more people can answer it correctly, though perhaps not all.

rml 1 days ago [-]

For something this small we can enumerate all the cases (this is a Scheme version of the Mathematica `Tuples` function)

    > (list-tuples '(B G) 2)
    ((B B) (B G) (G B) (G G))

3 cases have at least one girl

of those 3 cases, 1/3 are both girls
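
For anyone without a Scheme REPL handy, here is a minimal Python sketch of the same enumeration. The variable names are illustrative, and it assumes the four ordered families are equally likely, as the problem statement says.

```python
from fractions import Fraction
from itertools import product

# All ordered (older, younger) families; equally likely under the 50/50 assumption.
families = list(product("BG", repeat=2))

# Keep only the families with at least one girl.
with_girl = [f for f in families if "G" in f]

# Share of the remaining families that are two girls.
p_both_girls = Fraction(sum(f == ("G", "G") for f in with_girl), len(with_girl))

print(families)
print(len(with_girl), p_both_girls)
```

This only covers the reading where "at least one is a girl" acts as a filter on equally likely families; other readings of the question change the count.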

florbnit 1 days ago [-]

A coworker approaches you and goes “hi, I have two children and one is a boy…” and is promptly vaporized because he doesn’t fit the selection criteria of the problem statement. Another approaches and goes “err hi, I have two children and one is a girl…”, looks nervously at the vaporizer, but is left standing. What is the chance their other child is a girl, who has not been vaporized?

If you phrase the question as “someone with two children tells you the gender of a random one, what is the chance the other is the same gender?”, the chance is 50/50 because 50% will have BB or GG and the vaporizer isn’t active.

Ekaros 1 days ago [-]

This reminds me of Monty Hall, which doesn't explicitly state that the host never reveals the car. For a lot of people that would be a sensible option in a game show: just screw them over instantly. Well, at least the winner gets some mutton.

taeric 1 days ago [-]

So, my problem with how this is modeled is it assumes order doesn't matter in one aspect, but that it does in another.

Simply stated, if you allow the possibility space of "boy-girl" and "girl-boy", you have to also have two "girl-girl" states. Since you don't know which of the kids is known. Why is that not correct?

State it with coins, if I know that you flipped a quarter and a dime and one turned up heads, what are the odds that both are heads?

quectophoton 11 hours ago [-]

> So, my problem with how this is modeled is it assumes order doesn't matter in one aspect, but that it does in another.

Yeah, I'm with you here. Assuming heads=girl and tails=boy:

* If order does NOT matter (GB=BG), then it means Alice-Albert (GB, Heads-Tails) is the same as Albert-Alice (BG, Tails-Heads).

* If order DOES matter (GB!=BG), then it means Alice-Barbara (GG, Heads-Heads) is different from Barbara-Alice (GG, Heads-Heads). Therefore, GG!=GG.

Either way, the stated problem seems badly defined.

sokoloff 1 days ago [-]

There aren't two "girl-girl" states, because of the stated assumption in the problem:

> Assume the family is selected at random because they have at least one girl.

Given that plus "a family has two children" and "Assume that the probability of having a girl or boy is 50%"

That means you're starting from the set of all two child families: BB, BG, GB, and GG, being told that you do not have the BB case, leaving 3 ways in which the family could be composed and being asked about "the one which is not a G".

That's different from the dime and quarter case, and would also be different if you were told "the oldest child is a girl", because being told "the oldest child is a girl" eliminates both BB and BG.

Being told "[at least] one of the coins is heads" or "[at least] one of the kids is a girl" only eliminates one of the four cases, while being told "the quarter is heads"/"the oldest is a girl" eliminates two cases.
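
A rough Python sketch of that difference, counting which of the four equally likely families survive each statement. The helper `p_two_girls` and the other names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Ordered (oldest, youngest) families, all equally likely under the 50/50 assumption.
families = ["".join(f) for f in product("BG", repeat=2)]  # BB, BG, GB, GG

def p_two_girls(condition):
    """P(GG | condition), by counting the equally likely families that survive."""
    kept = [f for f in families if condition(f)]
    return Fraction(kept.count("GG"), len(kept))

p_at_least_one = p_two_girls(lambda f: "G" in f)    # removes BB only
p_oldest_girl = p_two_girls(lambda f: f[0] == "G")  # removes BB and BG

print(p_at_least_one, p_oldest_girl)
```

The first condition leaves three families and the second leaves two, which is where 1/3 versus 1/2 comes from.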

taeric 12 hours ago [-]

I think where this fails for me, is it seems to change the odds of one of the flips? (And again, to be clear, I fully expect that I am the one that has this modeled incorrectly...)

Consider. You flip a dime and a quarter, one of them is heads. What are the odds that the dime is heads? What are the odds that both are heads?

If you do as stated in this model, you would say the possible states are HT, TH, HH. Doing that, you conclude that the odds of any single one being heads is 2/3. But, that does not make any sense, as you did nothing to change the expected 1/2 odds of either coin. And if the odds of either being H is 2/3, then the odds of both would be?

So, in the original question, what would the expectation be on the question of "what are the odds that the first born is a girl, knowing that at least 1 is a girl?" My intuition would be that it should be 1/2? As the two children are independent events. Just as the two coin flips are independent of each other. Not knowing anything about either coin, the best you can do is the original odds. (Things change if talking about after you observe one, of course. If you see a heads, and I say that at least one is heads, you are then modeling if I am disclosing the one you already know or not. Basically, we have moved into liars dice.)

So, how can you model this such that you keep the 50% odds per coin, but also have it so that the "at least one of" could apply to either coin?

Conversely, why does knowing "at least 1 is heads" change the odds of either single coin being heads?

zeroonetwothree 1 days ago [-]

The answer is the same in your coin version. There are four possible outcomes: (Q, D) = HH, HT, TH, TT. Given that one turned up heads that eliminates TT so we see that HH has 1/3 probability.

As you can see there aren’t two HH states just as there aren’t two GG states in the original question.
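
The coin version is small enough to check by enumeration. The sketch below (names are illustrative) also reports the chance that the dime alone is heads, before and after learning that at least one coin is heads, assuming that statement is a plain filter on the four outcomes.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (quarter, dime); all four are equally likely for fair coins.
outcomes = list(product("HT", repeat=2))
at_least_one_head = [o for o in outcomes if "H" in o]

def prob(event, space):
    """Fraction of the equally likely outcomes in `space` where `event` holds."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

p_dime_prior = prob(lambda o: o[1] == "H", outcomes)
p_dime_given = prob(lambda o: o[1] == "H", at_least_one_head)
p_both_given = prob(lambda o: o == ("H", "H"), at_least_one_head)

print(p_dime_prior, p_dime_given, p_both_given)
```

Under that filter reading, each coin moves from 1/2 to 2/3 while both-heads sits at 1/3. The coins are still physically independent; it is the conditioning that couples them.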

taeric 1 days ago [-]

Well, sorta? You were either in the world where dime was heads, in which case the space has TH, HH, or the Quarter was Heads, in which case you have HT, or again HH. Both with 50% probability, no? You were never in a scenario where HT and TH have the same probability as each other. (Well, again, sorta. Point being that you know one of them is possible.)

Edit: I want to be clear that my initial thinking was exactly what you said. I was trying to "steelman" the bad intuition and I think I've trapped myself. :D

pinoy420 1 days ago [-]

[dead]

hkkwritesgmail 16 hours ago [-]

It's all in the wording. Such probability questions have deflated and dumbfounded generations of students and the general public trying to learn the subject.

hdgvhicv 1 days ago [-]

> Select only the families that have at least one girl.

That’s not what the first question said. The first question was: select a family (bb, bg, gb, gg)

Then that they happen to have a girl.

tantalor 1 days ago [-]

I fail to see the distinction.

hnbad 1 days ago [-]

It's explained in the Wikipedia article linked in TFA.

If the family was picked at random, a GG family is twice as likely to have resulted in us finding out one of the children is a girl as a BG or GB family (and a BB family would be ruled out by observation) so you end up with the intuitive but in this case also entirely correct 1/2 chance.

If the family was picked from only those that have at least one girl, a GG family is equally likely as a BG or GB family (and a BB family would be ruled out again but in this case by definition) you end up with the very unintuitive 1/3 chance the article describes.

So the initial filtering is necessary to create this "paradox" (it's actually not a "paradox" but a "problem", as others have mentioned). Without it, the intuitive answer is actually the correct answer.
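
Both cases can be written as one Bayes-rule calculation where only the likelihood changes. This is a sketch under my own modelling assumptions: `observe_random_child` stands for "pick a family at random, then see one of its children at random", and `filter_on_girl` stands for "pick only from families with at least one girl".

```python
from fractions import Fraction

# Prior over ordered families under the 50/50 assumption.
prior = {fam: Fraction(1, 4) for fam in ("BB", "BG", "GB", "GG")}

def posterior_gg(likelihood):
    """P(GG | we learn there is a girl), given P(we learn it | family)."""
    joint = {fam: prior[fam] * likelihood[fam] for fam in prior}
    return joint["GG"] / sum(joint.values())

# Random family, then one of its children is seen at random and is a girl.
observe_random_child = {"BB": Fraction(0), "BG": Fraction(1, 2),
                        "GB": Fraction(1, 2), "GG": Fraction(1)}

# Family drawn only from those with at least one girl.
filter_on_girl = {"BB": Fraction(0), "BG": Fraction(1),
                  "GB": Fraction(1), "GG": Fraction(1)}

print(posterior_gg(observe_random_child), posterior_gg(filter_on_girl))
```

The first likelihood gives 1/2 and the second gives 1/3, so the whole disagreement lives in that one dictionary.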

Vermin2000 2 days ago [-]

The sisters paradox is a maddening example of counter-intuitive probability. The resolution is straightforward, but it's really easy to get tied up in knots.

EMM_386 1 days ago [-]

I don't understand at all how this is maddening or counter-intuitive.

When you have a child, the odds are ~50% ... so the chance the next child is a boy or girl is almost equal. Is it because of the way it's framed that makes people think harder than they need to be?

This is like when I (very rarely) play something like "pick six".

I play 1, 2, 3, 4, 5, and 6. People think I'm crazy. They don't realize I have the same odds as any ticket they purchase.

n4r9 1 days ago [-]

If you have to share the prize money with other people that guessed the same numbers, then it seems sensible to avoid obvious patterns.

bell-cot 2 days ago [-]

> maddening example of counter-intuitive probability.

Not how I'd describe it. The setup is mundane enough for people to just assume that their intuition will work fine. The difference between the naive and correct answers is too small to spot in a small-n dataset. And ~0% of the population is actually familiar with analyzing such situations, for their "intuition" to be applicable.

It's a bit like Gell-Mann amnesia - people are too quick to apply an easy cognitive strategy, when (in theory) they know enough to rule that strategy out.

spadros 1 days ago [-]

Yes, I found this one easy. Was surprised my data management intuition came back after all these years since school. There’s really only three options:

- boy - boy

- boy - girl

- girl - girl

So it must be 1/3 chance. If you’re looking at permutations in order, that’s a different question.

AIPedant 1 days ago [-]

This intuition is wrong even if it turned out to get the right answer. The three unordered options do not have equal probabilities: boy+girl is twice as likely to occur as either boy+boy or girl+girl.

To get the right answer you must be careful about conditional probabilities (or draw out the sample space explicitly). The crux of the issue is that you are told extra information, which changes your estimate of the probability.

(This question as written is very easy to misinterpret. The Monty Hall problem, which illustrates the same thing, is better since the sample selection is much more carefully explained.)
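
To make the unequal weights concrete, here is a small Python sketch that collapses the four ordered families into the three unordered ones. Names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Four equally likely ordered families.
ordered = list(product("BG", repeat=2))

# Forget birth order by sorting each pair, then count how often each label appears.
unordered = Counter("".join(sorted(f)) for f in ordered)
probs = {label: Fraction(count, len(ordered)) for label, count in unordered.items()}

for label, p in sorted(probs.items()):
    print(label, p)
```

The mixed family carries half of the probability mass, so treating the three unordered options as equally likely is not justified.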

taeric 1 days ago [-]

Oddly, this is a part I'm sticking with on this problem.

Specifically, if you know that one is a girl, then the unordered options seem like they are back on equal footing? That is, it isn't twice as likely if you know that one ordering can't happen? (Or, stated differently, you don't know which version of two girls you are looking at.)

So, for this one, you know that either the youngest is a girl (so, girl-boy is not possible) or that the oldest is a girl (so boy-girl is out). That puts you back to the rest of the possibilities. Boy-boy is out, sure, as you have a girl. But every other path remains? So, you have one of (boy-girl(known), girl-girl(known), girl(known)-boy, girl(known)-girl). Which drops you back to 50/50?

AIPedant 1 days ago [-]

Like others in the thread have said, the question could have been phrased more precisely. Technically you are misreading it but in an annoying and trivial way.

What the problem is really saying is this:

1) You have a large collection of families with two kids of varying genders.

2) You draw one of them at random. At this point, your only estimate of P(2 girls) is 0.25.

3) Someone tells you that the family you drew has at least one girl.

4) This extra information changes your probability estimate because the possibility of two boys has been ruled out; the naive 1/4 estimate is refined to 1/3.

The way you are interpreting it is this:

1) You have a large collection of families with two kids, at least one of whom is a girl.

2) Then the probability that the other child is a girl is clearly 50%.

As a reminder this is how the original post phrased the question:

  Here's the problem: a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?

This is just too vague and admits both interpretations, they needed to be more specific about where the family "came from." That's why Monty Hall is a better illustration: it starts with you explicitly choosing a door at random. Here the family has been chosen at random from the pool of families with two children, but that's totally unclear.

taeric 1 days ago [-]

The annoying thing is this sits with my teaching fine, it is more my intuition that is failing to withstand trying to break it. :(

So, in the original: "a family has two children. You're told at least one of them is a girl." What are the possible states? Well, assume first born is the girl, then you have 50% that the next is a girl. Then, assume that the first born was a boy, then there is no chance and the second born is the girl that you know of. So, at 50/50 on those chances, you have 50% chance of having a 50% chance, or a 50% chance of it being 0. I can't see how to combine those to get 1/3. :(

And the Monty Hall explicitly covers the case that a decision is made on which door is shown to you. I don't see any similar framing to this problem. Yes, the total states are GB, BG, GG, but only if you treat GG in such a way that either BG or GB was not a possible state. (That is, using G for girl that you know of, and g for unknown, then possible states are GB, Gg, gG, BG. There is no version of Bg or gB that is possible, so to treat those as equal strikes me as problematic.)

AIPedant 15 hours ago [-]

Where you're getting confused is by trying to combine state space determination and probability determination at the same time (this is also why the problem is so similar to Monty Hall). The state space is shifting when you say "assume X, then the probability of Y." You are going back and forth between using and not using the information to decide arbitrarily that some probabilities are 50% and others are 0%, which leads to an invalid conclusion.

Specifically: it is not true that the firstborn has a 50-50 chance of being a girl, given you were told that the family has at least one girl. The firstborn has a 2/3rds chance of being a girl. This is the heart of your confusion.

In a broader sense there is an entire class of confusing conditional probability problems like this. Events which are causally independent in reality (e.g. gender of a child, which door Monty Hall hid the car behind) fail to be probabilistically independent when you have extra information. Yet these probability games are contrived in a way that our intuition takes over and we use our causal understanding even when a better probabilistic understanding gives you a better answer.

taeric 12 hours ago [-]

But in the monty hall, the "discloser" is limited by my choice/observation. It is literally part of the framing. In this framing, there is no limit based on my choice.

Consider, if you tell me that the Smiths have at least 1 girl, and I meet their daughter but you haven't met either, I have no way of knowing if I met the 1 girl or not. I could ask you, but you would just say, "I don't know, could be her. Could be the other kid. I just know they have at least 1 girl."

This is very different from the monty hall case, where the announcer knows what is behind all doors.

Similar. But different. I was using coins as an example elsewhere. If I flip a dime and a quarter, and tell you that one of them is heads, do you have increased chance of knowing if either particular one is heads? This is more liar's dice than it is monty hall.

AIPedant 11 hours ago [-]

I have no idea what point you're trying to make. I am aware the selection process is different. If you understand the argument now but you're just splitting hairs about whether it's similar to Monty Hall, fine let's agree to disagree. If you don't understand the argument and are trying to use the dissimilarity to Monty Hall as a counterargument then I don't think I can help you any further.

This is not coherent as written:

  If I flip a dime and a quarter, and tell you that one of them is heads, do you have increased chance of knowing if either particular one is heads? This is more liar's dice than it is monty hall.

"Increased chance of knowing" is nonsense. What you mean is "increased chance of being correct if I guess the dime came up heads instead of tails" and this is obviously true. Given at least one of the two coins came up as heads, the probability that the dime is heads is 2/3rds, not 1/2.

kgwgk 1 days ago [-]

> 4) This extra information changes your probability estimate because the possibility of two boys has been ruled out; the naive 1/4 estimate is refined to 1/3.

That’s not correct in general.

It’s only correct if you assume that “3) Someone tells you that the family you drew has at least one girl.” was equally likely to happen whether or not there were two girls.

That’s a quite strong assumption.

One can make different assumptions and get answers different from 1/3. For example, 1/2.

AIPedant 15 hours ago [-]

Unlike the other misreading, I think you are splitting hairs about something boring and irritating. Rephrase the problem instead to "you have a reliable device that can tell whether the family has at least one daughter but doesn't tell you how many."

kgwgk 12 hours ago [-]

> boring and irritating

Ok, AIPedant.

I understand that you may find irritating that someone points out that the original problem is ill-posed and “the answer” depends on how we decide to “rephrase” it.

However, it doesn’t seem boring in the context of a discussion of how the problem is not well-posed and additional assumptions are required to get an answer.

puppycodes 1 days ago [-]

And then there's that intersex kid that makes all the numbers wrong

a3w 1 days ago [-]

Becomes intuitive if you ask "nth one is a girl. What is nth+1 one likely to be?", for any arbitrary number

pmg101 2 days ago [-]

GG / BG GB GG is 1 / 3.

What's the paradox?

AnotherGoodName 1 days ago [-]

Because it's entirely dependent on sampling assumptions. Go to a random house where there's two children, one of which randomly opens the door. Each bb, gg, bg, gb is equal probability and a random child opens the door.

Now if you see a boy disregard that since you can't make the statement that one is a girl.

If you see a girl, go ahead and make the statement "a family has two children. You're told that at least one of them is a girl."

What is the probability now?

You have twice the chance of making that statement if you encounter a gg family as you do with a bg or gb family, since either of two girls could answer the door in a gg family.

So there's a 50% chance of that statement coming from a gg family, a 25% chance of it coming from a bg family, and a 25% chance of it coming from a gb family. Which means a 50% chance the other child's a girl and a 50% chance the other child's a boy.

The probabilities here are entirely dependent on details of the sampling which is not made explicit here.
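
A quick seeded simulation of the door-opening setup, as a sketch only. The function name, trial count, and seed are arbitrary choices, and it assumes each child is equally likely to answer the door.

```python
import random

def simulate_door(trials=200_000, seed=0):
    """Estimate P(both girls | a girl answers the door) for random two-child families."""
    rng = random.Random(seed)
    girl_at_door = 0
    both_girls = 0
    for _ in range(trials):
        kids = [rng.choice("BG"), rng.choice("BG")]
        opener = rng.choice(kids)  # a random child opens the door
        if opener == "G":          # visits where a boy opens are disregarded
            girl_at_door += 1
            both_girls += kids == ["G", "G"]
    return both_girls / girl_at_door

print(simulate_door())
```

The estimate should land close to 0.5 rather than 1/3, because gg families produce a girl at the door twice as often as mixed families do.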

teekert 1 days ago [-]

Paradoxes don't exist in reality (they do in hypothetical situations), so there is indeed no paradox as you correctly observe. Instead, most people answer this wrongly, for some reason. And for some reason we call situations where this happens "a paradox". Though I agree that we shouldn't.

Edit, ok, there are things like "This statement is false.", but we should perhaps stick to "self-referential problems" with those.

I think paradoxes just exist in our theories, languages, and formal systems when we make flawed assumptions or create inconsistent frameworks. But physical reality itself just is what it is - no contradictions, just phenomena we sometimes struggle to describe accurately.

If contradictions (paradoxes) can exist, then anything becomes possible through the principle of "explosion in logic". From a contradiction, any statement can be "proven" true. The whole foundation of rational thought would be undermined. Right?

luxcem 1 days ago [-]

The Medical Test Paradox, or whatever it's called, does exist in the sense that when a test is positive for a rare disease we always run a second one.

teekert 1 days ago [-]

To me that is not a paradox, just logical, with very low incidence you just find much more false positives than real positives. What is paradoxical about that?

It's why we don't screen for just any condition in the general population. I.e. we just do it for 65+ y/o's, 3 packs/day smokers because there we may actually find it worth the cost of the program.

There's no contradiction anywhere in this scenario, just people's incorrect intuitions meeting (mathematical) reality.
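
A worked example with made-up numbers, just to show the scale of the effect. The prevalence, sensitivity, and specificity below are hypothetical and not taken from any real test.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
prevalence = Fraction(1, 1000)    # 1 in 1000 people has the disease
sensitivity = Fraction(99, 100)   # P(test positive | disease)
specificity = Fraction(99, 100)   # P(test negative | no disease)

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * (1 - specificity)

# P(disease | positive test), by Bayes' rule.
ppv = true_pos / (true_pos + false_pos)
print(ppv, float(ppv))
```

With these assumed numbers only about 9% of positives are real, which is why a second test is the sensible next step.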

kgwgk 1 days ago [-]

The “paradox” may become apparent if you think the answer to all the questions below is the same (2/3). And if it’s not the same, why not?

——

You meet three people:

Alice has two children. You're told that at least one of them is a girl.

Bob has two children. You're told that at least one of them is a boy.

Csilla has two children. You're told that at least one of them is a lány. That clearly meant boy or girl because of the context, but you don’t know enough Hungarian to know what it is.

For each of them, what's the probability that they have a girl and a boy?

—-

You meet all the parents with two children in your neighborhood. Say there are 60 such families.

For 30 of them you’re told that at least one of them is a boy. What's the probability that they have a girl and a boy?

For the other 30 you’re told that at least one of them is a girl. What's the probability that they have a girl and a boy?

hammock 1 days ago [-]

Confusing permutations and combinations

pmg101 1 days ago [-]

Oh yes. Strange.

hammock 1 days ago [-]

This is Monty hall problem, is it not?

teekert 1 days ago [-]

Is it? Would it change your intuition when I tell you "A couple has 100 kids, at least 99 are girls, what is the probability 100 are girls?"

I'm a bit at a loss I have to admit.

meatmanek 1 days ago [-]

That would change my intuition quite a bit. Even assuming 100 kids were feasible in a human lifespan, the odds of 99 girls or 100 girls happening is (1 [arrangement where all the kids are girls] + 100 [arrangements where there are 99 girls and 1 boy]) / 2^100, or about 8e-29. Practically zero.

If we assume that each child really did have a 50/50 chance of being boy or girl, then the result would be that there's a 1/101 chance that it's 100 girls.

Given what I know about the world and genetics and such, I think it's much more likely that there's some predisposition by the couple to have girls.

If we think it's, say, 90/10, then the prior probability of the 100-girls case would be 0.9^100, and the prior probability of each of the 99-girls cases would be 0.9^99 * 0.1 -- i.e. the all-girls case is 9x more likely. 0.9^100 / (100 * 0.9^99 * 0.1 + 0.9^100) = 0.9 / (100 * 0.1 + 0.9) = 0.9/10.9 = about 8% probability of having 100 girls, 92% probability of having 99 girls and 1 boy.

If we think the couple has 99:1 odds of girl:boy on each birth, then it's 0.99^100 / (100 * 0.99^99 * 0.01 + 0.99^100) or about 50/50 on whether they have 99 girls or 100 girls.

If we think the odds are 999:1, then it's 0.999^100 / (100 * 0.999^99 * 0.001 + 0.999^100) = around 90% chance they have 100 girls.

Someone else can do the math assuming an uninformative prior on the couple's girl:boy odds and calculating the posterior distribution given that we know there are 99 girls.
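
The fixed-bias cases can be checked with a few lines of Python. This sketch only handles a known per-birth probability `p_girl` and assumes independent births; it does not attempt the uninformative-prior version. The function name is illustrative.

```python
def p_all_girls(p_girl, n=100):
    """P(n girls | at least n-1 girls), with independent births and P(girl) = p_girl."""
    all_girls = p_girl ** n
    exactly_one_boy = n * p_girl ** (n - 1) * (1 - p_girl)
    return all_girls / (all_girls + exactly_one_boy)

for p in (0.5, 0.9, 0.99, 0.999):
    print(p, round(p_all_girls(p), 3))
```

The ratio simplifies to p / (p + n * (1 - p)), so the answer is driven almost entirely by how n * (1 - p) compares with p.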

hammock 2 hours ago [-]

The answer is 1/2 of 1%, right?

luxcem 1 days ago [-]

The 'sample space' reduction method is indeed also used to solve the Monty Hall problem.

flappyeagle 1 days ago [-]

It is exactly the same

bitwize 1 days ago [-]

Related to the Monty Hall "paradox". Spoiler: You'll get the car if you switch doors with 2/3 probability.

https://en.m.wikipedia.org/wiki/Monty_Hall_problem

JeffJor 1 days ago [-]

Bertrand's Box Paradox, which I wrote about in my own comment, applies to it. The upshot is that probability is not based on which prize placements _could_ lead to the current game state; it is based on the set of all possible game states. Let's assume that the contestant starts off with door #3.

Case 1: The prize is behind door #1, and the host must open door #2. Probability 1/3.

Case 2: The prize is behind door #2, and the host must open door #1. Probability 1/3.

Case 3: The prize is behind door #3, and the host has a choice. Case 3A: The host opens door #1. Probability Q/3. Case 3B: The host opens door #2. Probability (1-Q)/3.

If the host actually opens door #1, the probability that door #2 has the prize is (Case 2)/(Case 2 + Case 3A) = (1/3)/(1/3+Q/3) = 1/(1+Q).

If the host actually opens door #2, the probability that door #1 has the prize is (Case 1)/(Case 1 + Case 3B) = (1/3)/(1/3+(1-Q)/3) = 1/(2-Q).

My point is that, since you get to see which door is opened, 2/3 is correct only if you assume Q=1/2. We aren't told what Q is, but we must assume it is 1/2 because otherwise the answer is different depending on which door is chosen.
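
The case analysis translates directly into a short exact calculation. In this sketch `q` plays the role of Q above (the chance the host opens door #1 when the prize is behind the contestant's door #3); the function name is illustrative.

```python
from fractions import Fraction

def p_switch_wins(q, opened):
    """P(switching wins | contestant holds door 3 and the host opened `opened`).

    q is P(host opens door 1 | prize is behind door 3); opened is 1 or 2.
    """
    third = Fraction(1, 3)
    forced = third                 # Case 1 or Case 2: the host had no choice
    if opened == 1:
        free = q * third           # Case 3A
    else:
        free = (1 - q) * third     # Case 3B
    return forced / (forced + free)

half = Fraction(1, 2)
print(p_switch_wins(half, 1), p_switch_wins(half, 2))
print(p_switch_wins(Fraction(1), 1), p_switch_wins(Fraction(1), 2))
```

With q = 1/2 both doors give 2/3. With q = 1 the answer is 1/2 if door #1 is opened and 1 if door #2 is opened, which matches 1/(1+Q) and 1/(2-Q).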

zeroonetwothree 1 days ago [-]

Well if we frame the question as “what is the probability of winning by always switching” then this doesn’t play into it and the answer is indeed 2/3. Hence as a question about general strategy the standard answer is correct.

You’re right if we are asking about a specific case though.

wpollock 1 days ago [-]

This always bugged me. It isn't switching your choice that gives the 2/3 probability, it is exercising a choice once one of the doors has been eliminated. The odds are the same as flipping a fair coin to decide on which of the two doors remaining to pick.

Am I wrong?

bitwize 14 hours ago [-]

Yes.

You pick a door. Probability 1/3 it has the car behind it. The host then picks another door, revealing the booby prize. That door now has probability 0 of having the car. But you still picked a door with probability 1/3 of having the car, which means there's a 2/3 chance the other door you didn't pick has the car. Write a program to run a Monte Carlo simulation... of the Monty Hall problem... and you will see this is the case.

The host has given you information about where the car is by revealing where it is not. Whether you can effectively use that information is another matter.
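
Something like the following would do as that simulation. It is a sketch with an arbitrary seed and trial count, it assumes the host always opens a goat door other than your pick, and it also tracks the flip-a-coin strategy from the parent comment for comparison.

```python
import random

def monty_hall(trials=100_000, seed=0):
    """Return win rates for (stay, switch, coin flip between the two closed doors)."""
    rng = random.Random(seed)
    stay_wins = switch_wins = coin_wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a door that is neither the contestant's pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        # The one remaining closed door is what switching would get you.
        switched = next(d for d in range(3) if d != pick and d != opened)
        stay_wins += pick == car
        switch_wins += switched == car
        coin_wins += rng.choice([pick, switched]) == car
    return stay_wins / trials, switch_wins / trials, coin_wins / trials

print(monty_hall())
```

Staying should come out near 1/3, switching near 2/3, and the coin flip near 1/2, so it is the switch itself, not merely choosing again, that earns the 2/3.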

AdrianB1 1 days ago [-]

I think the explanation is wrong. It is based on the probability of having combinations of boys and girls and then counting the combinations that have at least one girl, but this is not the situation: there is no probability in question for one of the kids, it is a confirmed, past event and the only other probability is the sex of the other kid.

Otherwise you can derive any probability as a branch of a probability tree that contains it and calculate the probabilities of the tree and then the one of the branch. This makes no sense.

For example, a family has a kid and the kid is a girl. The family wants another kid; what is the probability to be a girl? Is it 1/4 because having 2 girls is 1/4? No, it is 1/2 as it is for any new kid.

simonh 1 days ago [-]

Your example is different in a critical way. These two questions are not equivalent.

Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability the other I didn’t see is a girl?

Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability the other is also a girl?

The question in the article is the second question, not the first. The fact that the observer looked at both children and not just one of them is crucial. As is often the case in these puzzles the exact information available is the critical issue.

flappyeagle 1 days ago [-]

This is exactly the Monty hall problem

lightvector 1 days ago [-]

One of the challenges with puzzles like this is that it gives you "at least one of them is a girl" as a mathematical assertion where you're not supposed to further introspect the context of how/why you're being given that fact.

But that's unrealistic. In real life, the context for how and why there would be a speaker telling you such a thing in the first place can be relevant and affect the probability!

How is this possible? Suppose among all the math-riddle-loving parents of two children who would ask such a puzzle in the first place there are an equal number of parents of B-B, B-G, G-B, G-G, and that each is equally likely to ask you such a riddle when you meet them.

Suppose when asking such a riddle the B-B parents tell you "at least one of them is a boy" (they don't have any girls, so that's the only way they can ask this kind of riddle), the G-G parents tell you "at least one of them is a girl" (same thing but in reverse), while the B-G and G-B parents say one of "at least one of them is a boy" and "at least one of them is a girl" equally at random.

Then, conditioned on being told that "at least one of them is a girl", the probability of another girl is actually 1/2, not 1/3 like the paradox answer claims. To see this, imagine 40 instances of the above puzzle being asked. You get 10 B-B parents saying "at least one of them is a boy", 10 G-G parents saying "at least one of them is a girl", and among the 20 (B-G and G-B) parents, since they choose randomly, you have 10 saying "at least one of them is a boy" and 10 saying "at least one of them is a girl".

So out of the 20 times where "at least one of them is a girl" is said, there are 10 cases where it's a G-G family and 10 cases where it's a B-G or G-B family, therefore conditioned on being told "at least one of them is a girl", the probability of two girls is actually 1/2.

If there were some gender bias in how the B-G and G-B families might ask the question, or other differences that affect how likely different of these people would be posing the puzzle to you, then the probability could be yet different than either of 1/3 or 1/2.
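
As a sketch of how that bias moves the answer, here is the same 40-parent count with the mixed families' habit made a parameter. The name `r` and the function are hypothetical: `r` is the chance that a B-G or G-B parent says "girl" rather than "boy".

```python
from fractions import Fraction

def p_two_girls_given_told_girl(r):
    """P(G-G | told 'at least one is a girl'), out of 40 equally split parents.

    r = P(a mixed B-G or G-B parent says 'girl' rather than 'boy').
    """
    gg_say_girl = 10                   # G-G parents can only say "girl"
    mixed_say_girl = 20 * Fraction(r)  # expected count among B-G and G-B parents
    return gg_say_girl / (gg_say_girl + mixed_say_girl)

for r in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    print(r, p_two_girls_given_told_girl(r))
```

An even split (r = 1/2) gives 1/2, mixed parents who always say "girl" (r = 1) give the textbook 1/3, and r = 1/4 gives 2/3.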

So there's a difference between being presented with something as a flat mathematical assertion that you're supposed to take at face value and not question further (where the probability is 1/3, as the article claims) and being told something in real life, where you always need to take into account the context and situation of the speaker, and where the probability could be different.

There are real life implications of this too - the big classic one being publication bias / newsworthiness bias. As most people intuitively know by now, it is also often wrong to take the statistical analysis or claims of a particular research study or paper entirely at face value, because there is a bias in the fact that "positive" and "exciting" results are more likely to be reported in the first place, and so statistical outliers that aren't actually replicable are disproportionately likely to be reported (see also https://xkcd.com/882/). And publication bias still occurs with respect to the reporting of results, amplification or not in the media etc, even when the authors themselves are trustworthy and have done their analysis within the paper in a statistically proper way. So conditioned on you hearing about the result in the first place, it is often less likely to be true (and less likely to replicate in the future, etc) than you would think if you just took the statistical analysis in the paper at face value, even when that analysis was done correctly. The situation in the "sisters paradox" of computing a probability taking a statement entirely at logical face value is rare in real life.

zeroonetwothree 1 days ago [-]

It’s not meant to be a real life problem but a math one. Would you like it better if it was phrased instead as: X and Y are random Bernoulli variables with p=0.5. What is P(X+Y=2| X=1 or Y=1)?
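
For reference, the purely mathematical version can be settled by enumeration. A minimal sketch, assuming X and Y are independent and that the condition "X=1 or Y=1" is taken at face value as an event; the variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for two independent fair Bernoulli variables (1 = girl, 0 = boy).
outcomes = list(product((0, 1), repeat=2))

# Condition on the event "X=1 or Y=1", taken at face value.
at_least_one = [(x, y) for x, y in outcomes if x == 1 or y == 1]

# Within that event, count the outcomes where X+Y=2.
both = [(x, y) for x, y in at_least_one if x + y == 2]

p = Fraction(len(both), len(at_least_one))
print(p)  # 1/3
```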

kgwgk 20 hours ago [-]

If you want people to solve that math problem then state a problem equivalent to that one!

Otherwise the context of “how/why you're being given that fact” is relevant, because the problem being discussed asks for P(X+Y=2| being told that “X=1 or Y=1”) and that depends on P(being told that “X=1 or Y=1”|X=1,Y=0), P(being told that “X=1 or Y=1”|X=0,Y=1) and P(being told that “X=1 or Y=1”|X=1,Y=1).

You cannot fault people for noticing the relevance of the missing information and pointing out that the problem you posed is not the one that you think and the answer may not be the one that you propose.

onecommentman 1 days ago [-]

When I first heard of the Monty Hall problem, I assumed the naive answer was true, then I thought more and drew up the decision tree and thought the “correct” answer was right. Now I think it is an underspecified problem and literally any probability can be assigned to the result. There are no bad answers because it is a bad(ly stated) problem.

“Here's the problem: a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?”

— This is the complete statement of the problem! Everything else is an assumption that may or may not be correct. And is certainly not necessarily a complete set of underlying assumptions relevant to the problem statement.

“Assume that the probability of having a girl or boy is 50% and that the birth order has no effect on the probability. Assume the family is selected at random because they have at least one girl.”

— This is not a part of the statement of the problem! These are a subset of assumptions that you can choose to accept, or not. As a modeler or decision analyst you have to make that distinction. Eh, let’s accept them, for the time being. We’ll even assume the narrator is honest, which isn’t a stated assumption.

But let’s add to that list of assumptions. The narrator telling you that one of them is a girl gets all winnings from bets on the outcome of the unknown gender of the “other child” and wants those winnings. The narrator knows that a probability tree analysis of the problem, with perhaps unwarranted assumptions of independence and prior probabilities, will lead to an assignment of 1/3 probability for the other sibling being a girl, and knows you know that result and believes you want to win. [A valid credible interpretation, not misinterpretation, of the original problem statement.]

“What do you think the probability is that both children are girls?”

— Let’s make this question more actionable. “Should you take the even-odds $100 bet on both children being girls offered by the narrator? If they are both girls, the narrator wins $100 and you lose $100; if they aren’t, you win $100 and the narrator loses $100. The narrator and you want to win the money.”

The answer to this question, which would seem from the probabilities to be “yes”, is in fact “no” under the additional valid, and quite credible, assumptions made, because you will only be presented two-girl pairs by the narrator. Let’s assume the “assumptions” are actually correct, and the families will indeed be selected at random, and in the general population there is a 50-50 mix of boys and girls. There is nothing, even in the “assumptions”, that precludes the narrator from preselecting and only presenting two-girl pairs to you, thus always winning when you believe and follow the 1/3 two-girl result.
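
The gap between the two readings shows up directly in the expected value of the bet. A rough sketch, assuming the $100 even-odds bet described above; `expected_winnings` is a hypothetical helper, and the second case models a narrator who only ever presents two-girl families.

```python
from fractions import Fraction

def expected_winnings(p_two_girls, stake=100):
    """Your expected winnings: you lose `stake` if both are girls, and win it otherwise."""
    return (1 - p_two_girls) * stake - p_two_girls * stake

face_value = expected_winnings(Fraction(1, 3))  # trusting the 1/3 answer
preselected = expected_winnings(Fraction(1))    # narrator only shows two-girl families

print(face_value, preselected)  # 100/3 -100
```

Under the face-value reading the bet looks favorable to you (about +$33 per bet); under preselection you lose $100 every time.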

The statement of the problem, and only the statement of the problem, underspecified as it is, leads to a whole suite of possibly correct answers. The problem is the territory, the problem statement and assumptions are the map.

None of these maps are the territory, necessarily. The probability tree answer is just as sloppy, from a decision analyst perspective, as the naive answer.

smohare 1 days ago [-]

[dead]

fkyoureadthedoc 2 days ago [-]

> a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?

    +-----------+-----------+-----------+-----------+
    |   Boy     |           |   Girl    |           |
    |  (0.5)    |           |  (0.5)    |           |
    +-----------+-----------+-----------+-----------+
    | Boy-Boy   | Boy-Girl  | Girl-Boy  | Girl-Girl |
    |  (0.25)   |  (0.25)   |  (0.25)   |  (0.25)   |
    +-----------+-----------+-----------+-----------+

Does this mean the probability that they are both boys is also 25% lol

randallsquared 1 days ago [-]

That one is ruled out by the "at least one is a girl".

fkyoureadthedoc 1 days ago [-]

Well brother I'm looking at the table and it clearly says 25%. You may have been lied to about the at least one girl it seems.

furyofantares 1 days ago [-]

The text directly above that table:

> A simpler question

> Let's imagine you're asked a simpler question.

> A family has two children. What's the probability both are girls?

fkyoureadthedoc 1 days ago [-]

Under the 1 child policy, probably slim 😔