I have some string values in column like below
hel_some_data
h_some_data_more_data
hello_some_more_data_data
I need output like below
hel_some data
h_some data more data
hello_some more data data
Basically I want to replace '_' with ' ' except first occurrence.
I am trying to use regex replace but unable to find how to give the position, I have found one syntax on documentation where it looks like providing the position by code from documentation itself not working, I am trying below example for documentation.
SELECT REGEXP_REPLACE('the fox', 'FOX', 'quick brown fox', 1, 'i');
But it says function not found.
Any help, highly appreciated, Thanks,
It seems that you are looking at docs for Redshift, which does have such REGEXP_REPLACE
function allowing to specify position (which does not mean what you think it does, if I understood correctly your attempt).
Athena is not Redshift, it is based on Presto/Trino and does not support such version of REGEXP_REPLACE
.
One option is to use version which splits the string into array of strings based on regex and applies join function (docs):
-- sample data
WITH dataset(str) AS (
values ('hel_some_data'),
('h_some_data_more_data'),
('hello_some_more_data_data'),
('hel_somedata')
)
-- query
select regexp_replace(str, '([^_]*_[^_]*)(.*)', x -> x[1] || replace(x[2], '_', ' '))
from dataset;
Output:
_col0
---------------------------
hel_some data
h_some data more data
hello_some more data data
hel_some
([^_]*_[^_]*)(.*)
explanation @regex101. Basically the first group matches two strings without _
separated by _
and the second one matches what is left.