I have a column with numbers separated by one of these characters : -,/,*,~,_
. I need to check if the values contain any of the characters, then split the value in another column.
Is there a different solution than shown below?
In the end, columns subnumber1, subnumber2 ...subnumber5
will be merged in one column and column number5
will be without characters. I need to use those two columns in further process.
if gdf['column_name'].str.contains('~').any():
gdf[['number1', 'subnumber1']] = gdf['column_name'].str.split('~', expand=True)
gdf
if gdf['column_name'].str.contains('^').any():
gdf[['number2', 'subnumber2']] = gdf['column_name'].str.split('^', expand=True)
gdf
Input column:
column_name
152/6*3
163/1-6
145/1
163/6^3
output:
number5 |subnumber1 |subnumber2
152 | 6 | 3
163 | 1 | 6
145 | 1 |
163 | 6 | 3
Use Series.str.split
with list of possible separators and create new DataFrame:
import re
L = ['-','/','*','~','_','^', '.']
#some values like `^.` are escape
pat = '|'.join(re.escape(x) for x in L)
df = df['column_name'].str.split(pat, expand=True).add_prefix('num')
print (df)
num0 num1 num2
0 152 6 3
1 163 1 6
2 145 1 None
3 163 6 3
EDIT: If need match values before value use:
L = ["\-_",'\^|\*','~','/']
for val in L:
df[f'before {val}'] = df['column_name'].str.extract(rf'(\d+){[val]}')
#for last value not exist separator, so match $ for end of string
df['last'] = df['column_name'].str.extract(rf'(\d+)$')
print (df)
column_name before \-_ before \^|\* before ~ before / last
0 152/2~3_4*5 3 4 2 152 5
1 152/2~3-4^5 4 4 2 152 5
2 152/6*3 NaN 6 NaN 152 3
3 163/1-6 NaN NaN NaN 163 6
4 145/1 NaN NaN NaN 145 1
5 163/6^3 6 6 NaN 163 3