datetimehivehive-query

Hive: No results are displayed from the query


I am writing a query on this table to get the sum of size for all the directories, group by directory where date is yesterday. I am getting no output from the below query.

test.id        test.path           test.size     test.date
1   this/is/the/path1/fil.txt      232.24           2019-06-01
2   this/is/the/path2/test.txt     324.0            2016-06-01
3   this/is/the/path3/index.txt    12.3             2017-05-01
4   this/is/the/path4/test2.txt    134.0            2019-03-23
5   this/is/the/path1/files.json   2.23             2018-07-23
6   this/is/the/path1/code.java    1.34             2014-03-23
7   this/is/the/path2/data.csv     23.42            2016-06-23
8   this/is/the/path3/test.html    1.33             2018-09-23
9   this/is/the/path4/prog.js      6.356            2019-06-23
4   this/is/the/path4/test2.txt    134.0            2019-04-23

SELECT regexp_replace(path,'[^/]+$',''), sum(cast(size as decimal)) 
from test WHERE date > date_sub(current_date, 1) GROUP BY path,size;


Solution

  • You must not group by size, only by regexp_replace(path,'[^/]+$','').
    Also, since you want only yesterday's rows why do you use WHERE date > '2019%?
    You can get yesterday's date with date_sub(current_date, 1):

    select 
      regexp_replace(path,'[^/]+$',''), 
      sum(cast(size as decimal)) 
    from test 
    where date = date_sub(current_date, 1) 
    group by regexp_replace(path,'[^/]+$','');