Tags: sql, sql-server-2008, performance

Check for winning tickets in lottery using SQL


I have a SQL efficiency question. This is concerning the Norwegian national lottery. They draw seven numbers and three bonus balls.

I have a database with all the drawings and a lot of tickets. What is the most efficient table structure, and the most efficient way of getting all the winning tickets in a draw?

These are my two main tables:

LotteryDraw
   DrawId (int, PK)
   DrawDate (datetime)
   MainNumbers (varchar)
   BonusNumbers (varchar)
   Main1 (smallint)
   Main2 (smallint)
   Main3 (smallint)
   Main4 (smallint)
   Main5 (smallint)
   Main6 (smallint)
   Main7 (smallint)
   Bonus1 (smallint)
   Bonus2 (smallint)
   Bonus3 (smallint)

I store the main and bonus numbers both as separate columns and as a comma-separated string in sorted order.

Similarly, I've got:

LotteryTicket
   TicketId (int, PK)
   UserId (int, FK)
   ValidTill (datetime)
   MainNumbers (varchar)
   Main1 (smallint)
   Main2 (smallint)
   Main3 (smallint)
   Main4 (smallint)
   Main5 (smallint)
   Main6 (smallint)
   Main7 (smallint)

You get prizes for 4+1, 5, 6, 6+1 and 7 correct numbers (correct main numbers + bonus numbers). Does anyone have any great ideas on how to write efficient SQL that will return all LotteryTickets with a prize for a given draw date? ValidTill is the last draw date on which a ticket is valid.

My current attempt is using Linq2Sql in C# and has the speed of a hippo on ice so I really need some SQL expertise.

Server is Microsoft SQL Server 2008 R2 if that matters.

Update: After tweaking the answer from Mark B., I ended up with the following query. I needed to normalize the database a bit by adding a new table, LotteryTicketNumber (TicketId, Number).

SELECT LotteryTicket.TicketId,
       COUNT(LotteryTicketNumber.Number) AS MainBalls,  -- number of matched main numbers
       (
           -- non-NULL when the ticket holds at least one of the bonus numbers
           SELECT TOP 1 ltn.Number
           FROM LotteryTicketNumber ltn
           WHERE ltn.Number IN (2,4,6)
             AND ltn.TicketId = LotteryTicket.TicketId
       ) AS BonusBall
FROM LotteryTicket
INNER JOIN LotteryTicketNumber ON LotteryTicket.TicketId = LotteryTicketNumber.TicketId
WHERE LotteryTicketNumber.Number IN (13,14,16,23,26,27,30)
GROUP BY LotteryTicket.TicketId
HAVING COUNT(LotteryTicketNumber.Number) >= 4

The query above returns all tickets with at least four correct main numbers. In addition, the BonusBall field is non-NULL if the same ticket contains one or more bonus numbers. This is sufficient for me.
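
As a follow-up, here is a minimal sketch of how the same idea could report the prize tier (4+1, 5, 6, 6+1, 7) directly. It uses table variables with made-up tickets so it can run on its own. The names @TicketNumber, @Main and @Bonus are hypothetical stand-ins for the real tables, and it has not been measured against real data volumes.

```sql
-- Hypothetical sample data; table variables stand in for the real tables.
DECLARE @TicketNumber TABLE (TicketId int NOT NULL, Number smallint NOT NULL,
                             PRIMARY KEY (TicketId, Number));
DECLARE @Main  TABLE (Number smallint PRIMARY KEY);  -- the 7 main numbers of the draw
DECLARE @Bonus TABLE (Number smallint PRIMARY KEY);  -- the 3 bonus numbers of the draw

INSERT INTO @Main  (Number) VALUES (13),(14),(16),(23),(26),(27),(30);
INSERT INTO @Bonus (Number) VALUES (2),(4),(6);

INSERT INTO @TicketNumber (TicketId, Number) VALUES
    (1,13),(1,14),(1,16),(1,23),(1,26),(1,27),(1,30),  -- 7 main numbers
    (2,13),(2,14),(2,16),(2,23),(2,2),(2,5),(2,7),     -- 4 main + 1 bonus
    (3,13),(3,14),(3,16),(3,1),(3,5),(3,7),(3,8);      -- 3 main, no prize

SELECT m.TicketId, m.MainBalls, m.BonusBalls,
       CASE
           WHEN m.MainBalls = 7 THEN '7'
           WHEN m.MainBalls = 6 AND m.BonusBalls >= 1 THEN '6+1'
           WHEN m.MainBalls = 6 THEN '6'
           WHEN m.MainBalls = 5 THEN '5'
           ELSE '4+1'
       END AS Prize
FROM (
    -- COUNT ignores NULLs, so each count only includes numbers that matched.
    SELECT tn.TicketId,
           COUNT(mn.Number) AS MainBalls,
           COUNT(bn.Number) AS BonusBalls
    FROM @TicketNumber tn
    LEFT JOIN @Main  mn ON mn.Number = tn.Number
    LEFT JOIN @Bonus bn ON bn.Number = tn.Number
    GROUP BY tn.TicketId
) m
WHERE m.MainBalls >= 5
   OR (m.MainBalls = 4 AND m.BonusBalls >= 1);
```

With this sample data, ticket 1 comes back as '7' and ticket 2 as '4+1', while ticket 3 is filtered out. Keeping the draw numbers in small tables instead of literal IN lists is only one possible design; it makes it easier to join against a draw row later.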

Thanks for the help


Solution

  • If you're willing to normalize the data by splitting the list of numbers into a sub-table, then you could trivially determine winners with something like:

    SELECT LotteryTicket.TicketID, GROUP_CONCAT(LotteryTicketNumbers.number), COUNT(LotteryTicketNumbers.number) AS cnt
    FROM LotteryTicket
    LEFT JOIN LotteryTicketNumbers
        ON LotteryTicketNumbers.TicketID = LotteryTicket.TicketID
       AND LotteryTicketNumbers.number IN (winning, numbers, here)
    GROUP BY LotteryTicket.TicketID
    HAVING cnt >= 3;
    

    where the '3' represents the minimum number of matched numbers required to win any prize. This won't handle "bonus" numbers, if there are any, though you could repeat the same query and flag any tickets where a bonus number is present with a derived field.

    Note that this isn't tested, just going off the top of my head, so probably has some syntax errors.


    comment followup:

    GROUP_CONCAT is a MySQL-specific SQL extension. You can rip that out, since it would seem you're on SQL Server.

    The 'LottoTicketNumbers' table is what you'd use to normalize your tables. Instead of a single monolithic "ticket" record, you split it into two tables:

    LottoTicket:  ticketID, drawDate
    LottoTicketNumbers: ticketID, drawNumber
    

    So let's say you had a ticket for the Apr 1/2011 draw, with numbers 1,12,23,44,55, you'd end up with something like:

    LottoTicket: ticketID = 1, drawDate = Apr 1/2011
    LottoTicketNumbers: (1,1), (1,12), (1,23), (1,44), (1,55)
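
    As a rough sketch, the two tables could be declared as follows. The column types, the foreign key, and the index name IX_LottoTicketNumbers_drawNumber are assumptions for illustration, not something taken from the question's schema.

    ```sql
    -- Sketch of the normalized tables; column types are assumptions.
    CREATE TABLE LottoTicket (
        ticketID int      NOT NULL PRIMARY KEY,
        drawDate datetime NOT NULL
    );

    CREATE TABLE LottoTicketNumbers (
        ticketID   int      NOT NULL REFERENCES LottoTicket (ticketID),
        drawNumber smallint NOT NULL,
        PRIMARY KEY (ticketID, drawNumber)  -- a number appears once per ticket
    );

    -- Intended to let the server seek on the drawn numbers
    -- rather than scan every ticket row.
    CREATE INDEX IX_LottoTicketNumbers_drawNumber
        ON LottoTicketNumbers (drawNumber, ticketID);

    -- The example ticket described above.
    INSERT INTO LottoTicket (ticketID, drawDate) VALUES (1, '20110401');
    INSERT INTO LottoTicketNumbers (ticketID, drawNumber)
    VALUES (1, 1), (1, 12), (1, 23), (1, 44), (1, 55);
    ```

    Whether the extra index actually helps depends on the data volume and the query plan, so it is worth checking before relying on it.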
    

    Structuring your tables like this makes the query work, using some basic set theory and the power of a relational database. The original table structure makes it nearly impossible to do the comparisons necessary to figure out all the possible permutations of winning numbers. You'd end up with some hideous construct like

    select ...
    where (number1 in (winning, numbers, here) or number2 in (winning, numbers, here) or number3 in (winning, numbers, here) or ...)
    

    and wouldn't tell you exactly which prize you'd won (matched 3, matched 5 + bonus, etc...).
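
    If the wide Main1..Main7 columns already exist, one way to fill the normalized table is to unpivot them. The sketch below assumes SQL Server 2008 or later (it relies on a table value constructor inside CROSS APPLY) and uses a hypothetical table variable @WideTicket in place of the real ticket table.

    ```sql
    -- Hypothetical wide ticket table, mirroring the Main1..Main7 layout in the question.
    DECLARE @WideTicket TABLE (
        TicketId int PRIMARY KEY,
        Main1 smallint, Main2 smallint, Main3 smallint, Main4 smallint,
        Main5 smallint, Main6 smallint, Main7 smallint
    );
    INSERT INTO @WideTicket VALUES (1, 13, 14, 16, 23, 2, 5, 7);

    -- One output row per (ticket, number): the shape the normalized table needs.
    SELECT t.TicketId, v.Number
    FROM @WideTicket t
    CROSS APPLY (VALUES (t.Main1), (t.Main2), (t.Main3), (t.Main4),
                        (t.Main5), (t.Main6), (t.Main7)) AS v (Number);
    ```

    The SELECT can be turned into an INSERT ... SELECT against the real normalized table once the names are adjusted.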

    Example query results:

    Let's say the draw numbers are 10,20,30,40,50, and you've got a ticket with 10,20,30,42,53. You've matched 3 of the 5 draw numbers, and win $10. Using the normalized table structure above, you'd have tables like:

    LottoTicket: id #203, drawDate: Apr 1/2011
    LottoTicketNumbers: (203, 10), (203, 20), (203, 30), (203, 42), (203, 53)
    

    And the query would be

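    SELECT LottoTicket.ticketID, COUNT(LottoTicketNumbers.drawNumber) AS cnt
    FROM LottoTicket
    LEFT JOIN LottoTicketNumbers
        ON LottoTicketNumbers.ticketID = LottoTicket.ticketID
       AND LottoTicketNumbers.drawNumber IN (10,20,30,40,50)
    GROUP BY LottoTicket.ticketID
    HAVING COUNT(LottoTicketNumbers.drawNumber) >= 3  -- SQL Server does not accept the alias here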
    

    You'd get (ungrouped) results of

    203, 10
    203, 20
    203, 30
    

    and with the grouping/aggregate functions:

    203, 3   // ticket #203 matched 3 numbers.
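
    The worked example above can be reproduced as a self-contained script. This is only a sketch: the table variables, the extra losing ticket #204, and the @Draw table (used instead of a literal IN list) are assumptions added for illustration.

    ```sql
    DECLARE @LottoTicket TABLE (ticketID int PRIMARY KEY, drawDate datetime);
    DECLARE @LottoTicketNumbers TABLE (ticketID int, drawNumber smallint,
                                       PRIMARY KEY (ticketID, drawNumber));
    DECLARE @Draw TABLE (drawNumber smallint PRIMARY KEY);  -- hypothetical: the drawn numbers

    INSERT INTO @Draw VALUES (10), (20), (30), (40), (50);

    INSERT INTO @LottoTicket VALUES (203, '20110401'), (204, '20110401');
    INSERT INTO @LottoTicketNumbers VALUES
        (203, 10), (203, 20), (203, 30), (203, 42), (203, 53),  -- 3 matches
        (204, 10), (204, 11), (204, 12), (204, 13), (204, 14);  -- 1 match, no prize

    SELECT t.ticketID, COUNT(*) AS cnt
    FROM @LottoTicket t
    INNER JOIN @LottoTicketNumbers n ON n.ticketID = t.ticketID
    INNER JOIN @Draw d ON d.drawNumber = n.drawNumber  -- keeps only matched numbers
    GROUP BY t.ticketID
    HAVING COUNT(*) >= 3;
    -- returns one row: 203, 3
    ```

    Joining to a table of drawn numbers rather than hard-coding them is a design choice; it would let the same query be pointed at any draw.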