I am new to Hive regex matching and struggling to find the right pattern for matching word boundaries:
haystack RLIKE concat('(?i)\b', 'needle', '\b')
doesn't return anything.
Sample values which I have in DB:
haystack
---------
needless to say
this is a needle
so many (needle)
these are needles
When I use haystack RLIKE concat('(?i)', 'needle')
, it returns me all the rows but I am actually looking for this is a needle
.
In Hive use two backslashes: \\b
Demo:
with mytable as (
select stack(4,
'needless to say',
'this is a needle',
'so many (needle)',
'these are needles'
) as haystack
)
select haystack, haystack rlike concat('(?i)\\b', 'needle', '\\b') from mytable;
Result:
haystack _c1
needless to say false
this is a needle true
so many (needle) true
these are needles false
Note that so many (needle)
is also matched because (
and )
are not word characters.