Select a random item, without knowing the total number of items


I have a case where I need to select a random item, but I don't know the total number of items and I don't want to build a huge array then pick an item out. For example, this is what I have right now:

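List<string> items = new List<string>();
while (true)
{
    string item = GetNextItem();
    if (item == null)
        break;
    items.Add(item);  // every item is stored just so one can be picked later
}
int index = random.Next(0, items.Count);  // upper bound is exclusive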

As you can see, I'm building a gigantic collection that I really don't need; I just need a random number between 0 and the number of items. Here is what I am thinking of doing instead. It works, but I'd like to know if any of the experts out there can find a fault with it:

int index = -1;
int total = 0;
string selectedItem = null;
while (true)
{
    string item = GetNextItem();
    if (item == null)
        break;

    ++total;
    // Next(0, total) returns 0..total-1, so the test below passes with probability 1/total.
    int rnd = random.Next(0, total);
    if (rnd == total - 1)
    {
        index = total - 1;
        selectedItem = item;
    }
}

This gives me my index number and the randomly selected item. My thinking is that when there are 3 total items, for example, I pick a random number between 0 and 2 (inclusive); if it's equal to 2, I use the new item as the selected item, and if not, I just ignore it. As the total number of items increases, each new item's chance of being selected decreases accordingly.

Is this method "good"? Is it as "random" as building the array and picking an item out later? Is it as fast as it can be? Please guide me.


Solution

  • What you're doing will work.

    Here's a restatement that might make the algorithm slightly clearer:

    1. Select the first item; there is a 100% chance it will be the current selection
    2. If there is a second item, there is a 1/2 chance it will replace the current selection (if you do the math, there is a 50% chance the selection is the first item and a 50% chance it is the second)
    3. If there is a third item, there is a 1/3 chance it will replace the current selection (again, the math works out to a probability of 1/3 for each item)
    4. If there is a fourth item, there is a 1/4 chance it will replace the current selection
    5. ... etc ...
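
    To spell out "the math" in the steps above, here is a short derivation. The symbols n (total number of items) and k (position of a given item) are introduced only for this illustration. Item k ends up as the final selection if it is picked at step k and then survives every later step j, which happens with probability (j - 1)/j:

    ```latex
    P(\text{item } k \text{ is the final selection})
      = \frac{1}{k} \prod_{j=k+1}^{n} \frac{j-1}{j}
      = \frac{1}{k} \cdot \frac{k}{k+1} \cdot \frac{k+1}{k+2} \cdots \frac{n-1}{n}
      = \frac{1}{n}
    ```

    The product telescopes, so every item has the same 1/n chance no matter where it appears in the stream.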

    Note that you can get a 1/x chance by testing rand.Next(0, x) == 0 (or by comparing against any other fixed integer between 0 and x - 1 inclusive); you don't have to bother using total - 1.

    It's actually a pretty neat approach; initially I thought there wasn't going to be any good way of doing what you were asking!
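
For reference, here is a minimal sketch of the approach as a reusable method. The names RandomPick and PickOne are made up for this example, and it assumes the items can be exposed as an IEnumerable<T> rather than through the GetNextItem() call in the question. Throwing on an empty sequence is also just one possible choice.

```csharp
using System;
using System.Collections.Generic;

public static class RandomPick
{
    // Returns a uniformly random element from a sequence of unknown length,
    // reading it once and keeping only the current candidate in memory.
    // (Hypothetical helper, not part of any library.)
    public static T PickOne<T>(IEnumerable<T> source, Random random)
    {
        T selected = default(T);
        int seen = 0;

        foreach (T item in source)
        {
            seen++;
            // Random.Next(n) returns a value in [0, n), so this test
            // passes with probability 1/seen. For the first item it always passes.
            if (random.Next(seen) == 0)
                selected = item;
        }

        if (seen == 0)
            throw new InvalidOperationException("The sequence is empty.");

        return selected;
    }
}
```

A call might look like `string choice = RandomPick.PickOne(new[] { "a", "b", "c" }, new Random());`. It makes one pass and one random draw per item, so it does the same amount of work as the loop in the question without tracking the index.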