sqlhivehiveqlmemory-optimized-tables

Best Hive SQL query for this


i have 2 table something like this. i'm running a hive query and windows function seems pretty limited in hive.

Table dept

id | name |
1 | a |
2 | b |
3 | c |
4 | d |

Table time (build with heavy load query so it's make a very slow process if i need to join to another newly created table time.)

id | date | first | last |
1 | 1992-01-01 | 1 | 1 |
2 | 1993-02-02 | 1 | 2 |
2 | 1993-03-03 | 2 | 1 |
3 | 1993-01-01 | 1 | 3 |
3 | 1994-01-01 | 2 | 2 |
3 | 1995-01-01 | 3 | 1 |

i need to retrieve something like this :

SELECT d.id,d.name,
t.date AS firstdate,
td.date AS lastdate
FROM dbo.dept d LEFT JOIN dbo.time t ON d.id=t.id AND t.first=1
LEFT JOIN time td ON d.id=td.id AND td.last=1

How the most optimized answer ?


Solution

  • GROUP BY operation that will be done in a single map-reduce job

    select      id
               ,max(name)   as name
               ,max(case when first = 1 then `date` end) as firstdate
               ,max(case when last  = 1 then `date` end) as lastdate
    
    from       (select      id
                           ,null as name 
                           ,`date`         
                           ,first         
                           ,last 
    
                from        time
    
                where       first = 1
                        or  last  = 1
    
                union all  
    
                select      id 
                           ,name         
                           ,null as `date` 
                           ,null as first 
                           ,null as last  
    
                from        dept
                ) t
    
    group by    id 
    ;
    

    +----+------+------------+------------+
    | id | name | firstdate  |  lastdate  |
    +----+------+------------+------------+
    |  1 | a    | 1992-01-01 | 1992-01-01 |
    |  2 | b    | 1993-02-02 | 1993-03-03 |
    |  3 | c    | 1993-01-01 | 1995-01-01 |
    |  4 | d    | (null)     | (null)     |
    +----+------+------------+------------+