I'm struggling with slicing. I thought I understood it, but in the situation below my ideas don't work.
Situation: in one of the columns of my DataFrame I want to remove, from every row, a substring that sometimes occurs and sometimes doesn't.
The problem looks like this:
1. I don't know the exact position where this substring starts (it can be different in each row).
2. The substring varies from row to row, but it always starts with the same prefix, let's say "¯main_".
3. After "¯main_" there are usually some digits. They vary, but the length is always the same (9 digits).
4. I have already split the data and have around 40 columns, each with a similar problem. That's why I'm looking for a more efficient way to solve it than splitting, generating ~40 more columns and then dropping them.
5. Sometimes the "¯main_" substring is followed by additional text that I'd like to keep in the same column.
Example:
Column1
A1-19
B2-52
C3-1245¯main_123456789
D4
Z89028
F7¯main_123456789,Z241
Looking for a result like this:
Column1
A1-19
B2-52
C3-1245
D4
Z89028
F7,Z241
The best solution that I prepared up till now:
a = test.find("¯")
b = a+14
df[0].str.slice(start = a, stop = b)
But:
1. It doesn't work properly.
2. I'm aware that test.find() returns -1 when it doesn't find the character, and I don't know how to handle that. Should I write a loop? I believe a better (more efficient) solution exists, but after a few hours of searching I decided to ask for help.
Loop over all columns, find the position of the marker in each value, append the string with the marker cut out to a helper list, and finally assign the list back to the column:
print (df)
Column1
0 NaN
1 B2-52
2 C3-1245¯main_123456789
3 D4
4 Z89028
5 F7¯main_123456789,Z241
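# "¯main_" is 6 characters, followed by 9 digits: 15 characters to cut out
n = len('¯main_') + 9
for c in df.columns:
    out = []
    for x in df[c]:
        # NaN != NaN, so this check skips missing values
        if x == x:
            p = x.find('¯')
            if p != -1:
                # keep the text before the marker and everything after it
                out.append(x[:p] + x[p+n:])
            else:
                out.append(x)
        else:
            out.append(x)
    df[c] = out
print (df)
Column1
0 NaN
1 B2-52
2 C3-1245
3 D4
4 Z89028
5 F7,Z241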
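As an alternative to the explicit loop, the same cleanup can probably be done with a vectorized regular-expression replace. The following is only a sketch under a few assumptions: the marker is always the literal "¯main_" followed by exactly nine digits (the pattern name PATTERN is made up for this example), and every column holds strings or NaN, since the .str accessor raises an error on purely numeric columns.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the example above (the NaN row is included on purpose).
df = pd.DataFrame({
    "Column1": [
        np.nan,
        "B2-52",
        "C3-1245¯main_123456789",
        "D4",
        "Z89028",
        "F7¯main_123456789,Z241",
    ]
})

# Assumed pattern: the literal marker followed by exactly nine digits.
PATTERN = r"¯main_\d{9}"

for c in df.columns:
    # Series.str.replace is vectorized and leaves NaN values untouched.
    df[c] = df[c].str.replace(PATTERN, "", regex=True)

print(df)
```

Because the pattern matches the marker wherever it occurs, there is no need to look up the position first or to handle the -1 case from find(). If the number of digits can differ in your real data, the `\d{9}` part would need adjusting.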