regexhivehiveqlword-boundaryrlike

Hive rlike matching word boundary


I am new to Hive regex matching and struggling to find the right pattern for matching word boundaries:

haystack RLIKE concat('(?i)\b', 'needle', '\b')

doesn't return anything.

Sample values which I have in DB:

haystack
---------
needless to say
this is a needle
so many (needle)
these are needles

When I use haystack RLIKE concat('(?i)', 'needle'), it returns me all the rows but I am actually looking for this is a needle.


Solution

  • In Hive use two backslashes: \\b

    Demo:

    with mytable as (
    select stack(4,
    'needless to say',
    'this is a needle',
    'so many (needle)',
    'these are needles'
    ) as haystack
    )
    
    select haystack, haystack rlike concat('(?i)\\b', 'needle', '\\b') from mytable;
    

    Result:

    haystack             _c1
    needless to say      false
    this is a needle     true
    so many (needle)     true
    these are needles    false
    

    Note that so many (needle) is also matched because ( and ) are not word characters.