regexhadoophivecaserlike

CASE WHEN - LIKE - REGEXP in Hadoop Hive


I want to write a query in a hive Table using CASE WHEN, LIKE and a regular expression. I have used regexp and rlike, but I do not get the desired results. My attempts so far are the following

    select distinct ending from
(select date, ending, name, count(distinct id) 
from (select CONCAT_WS("/",year,month,day,hour) as date, id, name,
case when type = 'TRAN' then 'tran'
when events regexp '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'con'
when events not regexp '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP'  then 'aban'
else 'other'
end as ending
from data_struct1) tmp
group by date, ending, name) tmp2;

and also

select distinct ending from
    (select date, ending, name, count(distinct id) 
    from (select CONCAT_WS("/",year,month,day,hour) as date, id, name,
    case when type = 'TRAN' then 'tran'
    when events rlike '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'con'
    when events not rlike '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP'  then 'aban'
    else 'other'
    end as ending
    from data_struct1) tmp
    group by date, ending, name) tmp2;

Both queries return incorrect results (not bad syntax, just not the correct results).


Solution

  • There are a lot of docs on regex quantifiers, for example this one: https://learn.microsoft.com/en-us/dotnet/standard/base-types/quantifiers-in-regular-expressions

    select 'opencase_2,initial_state:inquiry,inquiry:no_reply:initial_state:incomplete::,inquiry:reask:secondary_state:complete::' regexp 'no_reply:[^:]+:incomplete';
    
    OK
    true
    

    Also this is wrong: rlike '%HUP'. It should be like this '.*HUP$' (in the end of the string) or simply 'HUP' if it does not matter where the HUP is located: in the middle or in the end or in the beginning of the string

    rlike and regexp in your query work the same, better use the same operator: regexp or rlike only. These two are synonyms.

    Test: https://regex101.com/r/ksG67v/1