python python-3.x regex google-bigquery findall

Can overlapping matches with the same start position be found using regex?


I am looking for a regex or a regex flag in Python/BigQuery that lets me find overlapping matches that all share the same start position.

For example, I have the string 1.2.5.6.8.10.12

and I would like to extract: [1., 1.2., 1.2.5., 1.2.5.6., ..., 1.2.5.6.8.10.12]

I tried running the Python code re.findall("^(\d+(?:\.|$))+", string), but it returned only ['12'].


Solution

  • Use the query below (BigQuery)

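    select text,
      array(
        -- for each i from 1 to the number of segments, extract the first i segments;
        -- a segment is a run of non-dot characters ending in a literal dot or at end of string
        select regexp_extract(text, r'((?:[^.]+(?:\.|$)){' || cast(i as string) || '})')
        from unnest(generate_array(1, array_length(split(text, '.')))) i
      ) as extracted
    from your_table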
    

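    For the sample string 1.2.5.6.8.10.12, extracted holds one prefix per segment: [1., 1.2., 1.2.5., 1.2.5.6., 1.2.5.6.8., 1.2.5.6.8.10., 1.2.5.6.8.10.12]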
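For the Python side of the question, here is a minimal sketch of the same idea. It is not part of the answer above. As far as I know, re.findall never returns two matches that start at the same position, so instead of looking for a flag this matches each segment once and slices the string up to the end of each match. The helper name dotted_prefixes is made up for illustration. The last line also shows why the original attempt returned ['12']: a repeated capturing group keeps only its final repetition.

```python
import re


def dotted_prefixes(text):
    """Return every prefix of `text` that ends right after a segment.

    Hypothetical helper for illustration; assumes segments are separated
    by literal dots, as in the question's example.
    """
    # Each match is one segment: a run of non-dot characters followed by
    # either a literal dot or the end of the string.
    segment = re.compile(r"[^.]+(?:\.|$)")
    # Slice from the start of the string to the end of each segment match.
    return [text[:m.end()] for m in segment.finditer(text)]


sample = "1.2.5.6.8.10.12"
print(dotted_prefixes(sample))

# The attempt from the question: the group repeats seven times, but
# findall reports only what the group captured on its last repetition.
print(re.findall(r"^(\d+(?:\.|$))+", sample))
```

The first print gives the seven prefixes from 1. up to 1.2.5.6.8.10.12, and the second gives ['12'].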