python regex

Extract with multiple Patterns


I'm having an issue that maybe someone can help me with. I am trying to extract either of two patterns from a string and place the result in another column. It extracts the first pattern fine, but I am missing something in getting the second one. Here's the code:

jobseries['New Column'] = jobseries['Occupation'].str.extract('(GS-\d+)(|)(WG-\d+)').fillna('')

The first string is (GS-\d+) and the second string is (WG-\d+)

I've tried a ton of variations, but none have worked.


Solution

  • You can use either

    jobseries['New Column'] = jobseries['Occupation'].str.extract(r'(GS-\d+|WG-\d+)').fillna('')
    

    or a shorter version

    jobseries['New Column'] = jobseries['Occupation'].str.extract(r'((?:GS|WG)-\d+)').fillna('')
    
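To see the difference outside pandas, here is a minimal sketch using the standard `re` module, whose `re.search` takes the first match in a string, as `str.extract` does. The sample values and the `first_match` helper are hypothetical, since the question does not show the actual `Occupation` data.

```python
import re

# Hypothetical sample values; the real 'Occupation' data is not shown in the question.
samples = ["Clerk GS-0303", "Mechanic WG-5803", "Contractor"]

original = r"(GS-\d+)(|)(WG-\d+)"  # pattern from the question
fixed = r"((?:GS|WG)-\d+)"         # alternation inside a single capture group


def first_match(pattern, text):
    """Return the first captured group, or '' when nothing matches (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(1) if m else ""


# The original pattern needs a GS code immediately followed by a WG code,
# so it finds nothing in strings that contain only one of them.
original_hits = [first_match(original, s) for s in samples]
fixed_hits = [first_match(fixed, s) for s in samples]

print(original_hits)  # ['', '', '']
print(fixed_hits)     # ['GS-0303', 'WG-5803', '']
```
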

    The points are: