I want to create a new column by extracting a substring from a DataFrame (DF) column. All my testing indicates the values I am using are correct and should produce a level1 value rather than NaN. Help!
CODE SNIPPET:
import pandas as pd
string = df['currentagentsnapshot']
start = string.str.find('agent-group') + 55
stop = string.str.find('}, level2=')
df['start'] = string.str.find('agent-group') + 55
df['stop'] = string.str.find('}, level2=')
df['level1'] = string.str[df['start']:df['stop']]
print(df.head())
SAMPLE OUTPUT OF KEY FIELDS:
awsaccountid | start | stop | level1 |
---|---|---|---|
992974280925 | 410 | 414 | NaN |
992974280925 | 410 | 414 | NaN |
992974280925 | 410 | 414 | NaN |
992974280925 | 408 | 412 | NaN |
992974280925 | 408 | 412 | NaN |
Note: df['currentagentsnapshot'] is a LARGE text string. As long as start and stop are both numbers -- and stop > start -- I would expect string.str[df['start']:df['stop']] to produce the expected result.
Running the above script produces NaN instead of the expected string value.
All the examples I have found on the web use constant rather than calculated values.
When I substitute constant for calculated values in string.str[start : stop] it works.
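As far as I can tell, string.str[a:b] builds a single slice and applies it to every row, so passing whole Series as the bounds does not line them up row by row; with object-dtype columns the failed slice appears to be turned into NaN. A minimal sketch of per-row slicing follows. The sample strings and the OFFSET value are hypothetical stand-ins (the question uses 55 on much longer text), so adjust both to the real data.

```python
import pandas as pd

# Hypothetical toy data; the real 'currentagentsnapshot' strings are much longer.
df = pd.DataFrame({
    "currentagentsnapshot": [
        "x agent-group=[level1=Sales}, level2=East",
        "yy agent-group=[level1=Support}, level2=West",
    ]
})
text = df["currentagentsnapshot"]

# Assumed offset from the start of 'agent-group' to the value (55 in the question).
OFFSET = len("agent-group=[level1=")
start = text.str.find("agent-group") + OFFSET
stop = text.str.find("}, level2=")

# Slice each string with its own start/stop instead of passing Series to .str[]
df["level1"] = [s[a:b] for s, a, b in zip(text, start, stop)]
print(df["level1"].tolist())  # ['Sales', 'Support']
```

This sketch does not guard against rows where a marker is missing (find returns -1 there), so add a check if that can happen in your data.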
data = {
    'currentagent': [
        "some large text with agent-group info and }, level2=more text",
        "another example with agent-group data here and }, level2=continued",
        "yet another string agent-group details and }, level2=info",
        "text with agent-group data and }, level2=more",
        "last example of agent-group information and }, level2=content"
    ]
}
df = pd.DataFrame(data)
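def extract_level1(row):
    text = row['currentagent']
    pos = text.find('agent-group')
    stop = text.find('}, level2=')
    # find() returns -1 when a marker is missing, so test before adding the offset
    if pos == -1 or stop == -1:
        return None
    # 55 is the offset from the question; the short sample strings here end before it, so they yield None
    start = pos + 55
    return text[start:stop] if stop > start else None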
df['level1'] = df.apply(extract_level1, axis=1)
print(df)
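If the fixed offset of 55 turns out to be fragile, a regular expression with str.extract might be an alternative, since it avoids the start/stop arithmetic altogether. This is only a sketch: the 'level1=' marker and the sample strings are assumptions about the snapshot format, which the question does not show.

```python
import pandas as pd

# Hypothetical toy data; the second row has no markers at all.
df = pd.DataFrame({
    "currentagentsnapshot": [
        "x agent-group=[level1=Sales}, level2=East",
        "no markers in this row",
    ]
})

# Assumed pattern: capture the text between 'level1=' and '}, level2='
# that follows 'agent-group'. Rows that do not match get a missing value.
pattern = r"agent-group.*?level1=(.*?)\}, level2="
df["level1"] = df["currentagentsnapshot"].str.extract(pattern, expand=False)
print(df)
```

With expand=False and a single capture group, str.extract returns a Series, so it can be assigned straight to the new column.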