python, pyspark, databricks

Replace the first occurrence of a character in a PySpark DataFrame


I know this is a very basic question, but is there a way to replace only the first occurrence of a character in a PySpark DataFrame column?

I have the following value in a DataFrame column:

Gourav#Joshi#Karnataka#US#English

I want to replace only the first occurrence of # with a space and leave the rest unchanged.

Expected Output:

Gourav Joshi#Karnataka#US#English

Solution


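Use regexp_replace with a pattern anchored at the start of the string. The pattern ^([^#]*)# matches everything before the first # (captured as group 1) plus that # itself, and the replacement '$1 ' writes the captured text back followed by a space. Rows without a # do not match and are left unchanged: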

spark.sql("""
    select col, regexp_replace(col,'^([^#]*)#','$1 ') col_new
    from values ('Gourav#Joshi#Karnataka#US#English') as (col)
""").show(1, truncate=False)  # truncate=False prints the full strings
+---------------------------------+---------------------------------+
|col                              |col_new                          |
+---------------------------------+---------------------------------+
|Gourav#Joshi#Karnataka#US#English|Gourav Joshi#Karnataka#US#English|
+---------------------------------+---------------------------------+
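The same replacement can probably be written with the DataFrame API instead of SQL. The following is a minimal sketch, not a definitive implementation: the helper names replace_first_hash and replace_first_hash_local are hypothetical, and the column names are passed in by the caller. The second helper only mirrors the pattern with Python's re module so it can be checked without a Spark session; note that Python writes the group reference as \1, while Spark's regexp_replace uses $1.

```python
import re

# Everything before the first '#' (group 1), followed by that '#'.
PATTERN = r"^([^#]*)#"


def replace_first_hash(df, column, new_column):
    """Hypothetical helper: add new_column with the first '#' in column replaced by a space."""
    # Imported lazily so the local check below also runs where PySpark is not installed.
    from pyspark.sql import functions as F

    return df.withColumn(new_column, F.regexp_replace(F.col(column), PATTERN, "$1 "))


def replace_first_hash_local(value):
    """Hypothetical helper: same pattern in plain Python, for checking the regex."""
    return re.sub(PATTERN, r"\1 ", value, count=1)


if __name__ == "__main__":
    print(replace_first_hash_local("Gourav#Joshi#Karnataka#US#English"))
```

With an existing DataFrame df that has a string column named col, the call would look like replace_first_hash(df, "col", "col_new").show(truncate=False).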