I'm having an issue that maybe someone can help me with. I am trying to extract two patterns from a string and place them in another column. It extracts the first pattern fine, but I am missing something in getting the second one. Here's the code:
jobseries['New Column'] = jobseries['Occupation'].str.extract('(GS-\d+)(|)(WG-\d+)').fillna('')
The first string is (GS-\d+)
and the second string is (WG-\d+)
I've tried a ton of variations, but none have worked.
You can use either
jobseries['New Column'] = jobseries['Occupation'].str.extract(r'(GS-\d+|WG-\d+)').fillna('')
or a shorter
jobseries['New Column'] = jobseries['Occupation'].str.extract(r'((?:GS|WG)-\d+)').fillna('')
The points are:
- Use Series.str.extract and assign the result to a single column (New Column).
- ((?:GS|WG)-\d+) can be used instead of (GS-\d+|WG-\d+); it is a capturing group that matches either GS or WG, then a hyphen, and then one or more digits.
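To see the difference in practice, here is a minimal sketch. The sample Occupation values are made up for illustration, since the question does not show the real data, and expand=False is my own addition so that the single capturing group comes back as a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the real 'Occupation' values are not shown in the question.
jobseries = pd.DataFrame({
    "Occupation": [
        "Clerk GS-0303",
        "Mechanic WG-5803",
        "Contractor",
        "GS-2210 IT Specialist",
    ]
})

# One capturing group: GS or WG, then a hyphen, then one or more digits.
pattern = r"((?:GS|WG)-\d+)"

# expand=False returns a Series when the pattern has exactly one capturing group.
jobseries["New Column"] = (
    jobseries["Occupation"].str.extract(pattern, expand=False).fillna("")
)

# For comparison: the pattern from the question only matches when a GS code is
# immediately followed by a WG code, so it finds nothing in this sample data.
original = jobseries["Occupation"].str.extract(r"(GS-\d+)(|)(WG-\d+)")

print(jobseries)
print(original)
```

Rows with neither code end up as an empty string because of fillna(''). Note that str.extract only returns the first match per row; if a row could contain several codes, str.findall or str.extractall would be the calls to look at.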