sqloracle-databaseanalytical

Oracle SQL - Most purchased item in past 6 months for each row


I have data in the following format:

date        fruit
======================
1-jan-14    orange
3-jan-14    orange
1-may-14    orange
2-may-14    apple
3-may-14    apple
1-aug-14    pineapple
2-aug-14    apple

I want to add a column which will represent: most purchased fruit in past 6 months. So, for the above example:

date        fruit       most_purchased_6_months
=======================================
1-jan-14    orange      orange
3-jan-14    orange      orange
1-may-14    orange      orange
2-may-14    apple       orange
3-may-14    apple       orange
1-aug-14    pineapple   apple
2-aug-14    apple       apple

I have tried using analytical functions but the best I have reached is showing the number of times the same fruit (from the same row) has been purchased in past 6 months using "count(*) over (partition by fruit order by date range 180 preceding)".

My main issue is that using a self-join would easily solve my issue but as this is a table with about 20million rows of data it takes a few hours to run on our database. Would appreciate any direction and/or help!


Solution

  • This is your basic query:

    select date, fruit,
           count(*) over (partition by fruit order by date range 180 preceding) as NumberPurchased
    from data;
    

    I think you can do this with a subquery and first_value():

    select date, fruit,
           first_value(fruit) over (order by NumberPurchased desc range 180 preceding)
    from (select date, fruit,
                 count(*) over (partition by fruit order by date range 180 preceding) as NumberPurchased
          from data d
         ) d;
    

    I'm leaving the answer but the above doesn't work because the counts change on each row for the preceding six months.

    EDIT:

    Here is an entirely different approach. Calculate for each day and each fruit the cumulative sum into a temporary table. Then index that table and join to it to get the most common fruit using keep:

    create table FruitCountCume as
        select f.fruit, da.date, count(*) over (partition by f.fruit order by d.date) as cumecnt
        from (select distinct fruit from data) f cross join
             (select distinct date from data) d left outer join
             data da
             on f.fruit = da.fruit and f.date = da.date;
    
    create index on FruitCountCume(fruit, date);
    
    select d1.date, d1.fruit,
           max(fruit) keep (dense_rank first order by d1.cumecnt - coalesce(d2.cumecnt, 0) desc) as MostCommon
    from FruitCountCume d1 left outer join
         FruitCountCume d2
         on d1.date = d2.date + 180
    group by d1.date;
    

    The full cross join may not be necessary. It is there as a convenience to make the left join more efficient.