
Algorithmic / probability exercise


I'm trying to solve an exercise from the Rosalind project, but I apparently keep making a mistake somewhere. The full text is available here; my shorter, abstracted interpretation and attempt are as follows. Please help me find what I'm doing wrong:

We have 3 groups of items: AA, Aa and aa. We start with one item in Aa and run k iterations of generating new items. In every iteration, every item in group:

After each iteration we count the expected number of items in each group, assuming every item from the previous iteration generates 2 new items. So we end up with:

The sum of the expected values (the population) in iteration k is 2^k, and the probability of an item being in group Aa is always 50%.
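A minimal sketch of these expected counts, assuming (as in the Rosalind problem) that every item is paired with an Aa item and produces 2 new items per iteration. The transition probabilities in `TRANSITIONS` are the standard ones for a cross with Aa and are an assumption of this sketch; `expected_counts` is a hypothetical helper name. Exact fractions are used so the 2^k total and the 50% share of Aa show up without rounding noise.

```python
from fractions import Fraction

# Assumption: every item is paired with an Aa item. These are the offspring
# group probabilities for each parent group under that assumption.
TRANSITIONS = {
    "AA": {"AA": Fraction(1, 2), "Aa": Fraction(1, 2), "aa": Fraction(0)},
    "Aa": {"AA": Fraction(1, 4), "Aa": Fraction(1, 2), "aa": Fraction(1, 4)},
    "aa": {"AA": Fraction(0), "Aa": Fraction(1, 2), "aa": Fraction(1, 2)},
}


def expected_counts(k, children_per_item=2):
    """Expected number of items in each group after k iterations, starting from one Aa."""
    counts = {"AA": Fraction(0), "Aa": Fraction(1), "aa": Fraction(0)}
    for _ in range(k):
        new_counts = {group: Fraction(0) for group in counts}
        for parent, n in counts.items():
            # Each of the n parents produces children_per_item children,
            # distributed over the groups according to TRANSITIONS.
            for child, p in TRANSITIONS[parent].items():
                new_counts[child] += n * children_per_item * p
        counts = new_counts
    return counts


for k in range(1, 4):
    c = expected_counts(k)
    total = sum(c.values())
    print(f"k={k}: AA={c['AA']}, Aa={c['Aa']}, aa={c['aa']}, "
          f"total={total}, P(Aa)={c['Aa'] / total}")
```

For k=2 this prints `k=2: AA=1, Aa=2, aa=1, total=4, P(Aa)=1/2`, which matches the population of 4 used in the scipy session below.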

So far I hope I'm right. What we are actually after, however, is the probability of having at least N items in group Aa both times if we repeat the experiment twice. (This should be equivalent to the probability of having at least N items in group AaBb if we extend the list of groups to AABB, AABb, ... as in the original question.)

So the probability of an item being in Aa is 50% and the population (the sum of the expected values) in iteration k is 2^k. Throwing that at scipy with the test data (k=2, N=1), we get this for at least one item in group Aa:

In [75]: b = scipy.stats.binom(4, .5)  # 2^k = 4 items, P(Aa) = 0.5
In [76]: sum(b.pmf(x) for x in range(1, 4+1))
Out[76]: 0.93750000000000022

and this for at least one item when we have two sets of groups, i.e. AaBb:

In [77]: sum(b.pmf(x) for x in range(1, 4+1))**2
Out[77]: 0.87890625000000044

This is completely different from the answer given in the original problem: 0.684.
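The two figures from the session can be reproduced without scipy by summing the binomial pmf directly, which makes the calculation explicit: P(X >= 1) for X ~ Binomial(4, 0.5) is 1 - 0.5^4, and the squared value corresponds to requiring at least one Aa in one run and, independently, at least one Aa in the other run. A minimal sketch; `prob_at_least` is a hypothetical helper, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def prob_at_least(n_min, trials, p):
    """P(X >= n_min) for X ~ Binomial(trials, p), summing the pmf term by term."""
    return sum(comb(trials, x) * p**x * (1 - p) ** (trials - x)
               for x in range(n_min, trials + 1))


p_one_run = prob_at_least(1, 4, 0.5)  # same quantity as Out[76]
print(p_one_run, p_one_run ** 2)      # prints: 0.9375 0.87890625
```

All the terms here are exact binary fractions, so the output has none of the trailing-digit noise seen in the scipy session.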

Where did I make a mistake? (If possible, please only point out the mistake rather than give a solution, so that there are no spoilers left for people trying to solve it on their own.)


Solution

  • I first followed your example and thought it seemed to make sense, but after a while I found where the problem was.

    Here is a pointer to your mistake:

    You have calculated the probability of getting at least one Aa-- and the probability of getting at least one --Bb in the second generation. But that is not enough to tell whether there is at least one AaBb in the second generation: the Aa-- and the --Bb have to occur in the same individual.

    Consider, for example, the following second generation: aaBb, AABb, Aabb, AaBB. Every individual is either Aa-- or --Bb, but there is no AaBb in the generation.
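The counterexample can be checked mechanically by testing the two separate events and then the joint one. A minimal sketch, assuming genotypes are written as four-character strings with the A locus first (a hypothetical encoding chosen for illustration); `first_locus_is_Aa` and `second_locus_is_Bb` are hypothetical helper names.

```python
# The example second generation from the answer above.
generation = ["aaBb", "AABb", "Aabb", "AaBB"]


def first_locus_is_Aa(genotype):
    """True if the first two characters (the A locus) are 'Aa'."""
    return genotype[:2] == "Aa"


def second_locus_is_Bb(genotype):
    """True if the last two characters (the B locus) are 'Bb'."""
    return genotype[2:] == "Bb"


print(any(first_locus_is_Aa(g) for g in generation))   # True: at least one Aa--
print(any(second_locus_is_Bb(g) for g in generation))  # True: at least one --Bb
print(all(first_locus_is_Aa(g) or second_locus_is_Bb(g) for g in generation))   # True
print(any(first_locus_is_Aa(g) and second_locus_is_Bb(g) for g in generation))  # False: no AaBb
```

Both "at least one" events hold, yet the joint event does not, so multiplying the two per-locus probabilities answers a different question from "at least one AaBb".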