sql amazon-athena presto regexp-replace trino

AWS athena regex replace except first occurrence

I have some string values in column like below

hel_some_data
h_some_data_more_data
hello_some_more_data_data

I need output like below

    hel_some data
    h_some data more data
    hello_some more data data

Basically I want to replace '_' with ' ' except first occurrence.

I am trying to use regex replace but unable to find how to give the position, I have found one syntax on documentation where it looks like providing the position by code from documentation itself not working, I am trying below example for documentation.

SELECT REGEXP_REPLACE('the fox', 'FOX', 'quick brown fox', 1, 'i');

But it says function not found.

Any help, highly appreciated, Thanks,

Solution

It seems that you are looking at docs for Redshift, which does have such REGEXP_REPLACE function allowing to specify position (which does not mean what you think it does, if I understood correctly your attempt).

Athena is not Redshift, it is based on Presto/Trino and does not support such version of REGEXP_REPLACE.

One option is to use version which splits the string into array of strings based on regex and applies join function (docs):

-- sample data
WITH dataset(str) AS (
   values ('hel_some_data'),
        ('h_some_data_more_data'),
        ('hello_some_more_data_data'),
        ('hel_somedata')
)

-- query
select regexp_replace(str, '([^_]*_[^_]*)(.*)', x -> x[1] || replace(x[2], '_', ' '))
from dataset;

Output:

           _col0
---------------------------
 hel_some data
 h_some data more data
 hello_some more data data
 hel_some

([^_]*_[^_]*)(.*) explanation @regex101. Basically the first group matches two strings without _ separated by _ and the second one matches what is left.