I have a small problem manipulating a DataFrame to create a new column that is a function of existing ones.
I am able to calculate it, but not to add it back to the original DataFrame.
Here are my test DataFrame and my new_column:
import pandas as pd

test = pd.DataFrame({'name': ["john", "jack", "albert"],
                     'day': ["2018-01-01", "2018-01-02", "2018-01-03"],
                     'result': ['c("7", "6", "")', 'c("3", "6", "10")', 'c("4", "3", "")']})

def update_result(row, x):
    # Strip the c(...) wrapper, quotes and spaces, then split on commas
    return row[x].replace("c(", "").replace(")", "").replace("\"", "").replace(" ", "").split(",")

new_column = test.apply(lambda row: update_result(row, 2), axis=1)
But when I try to add new_column to the DataFrame, I get an error, and the traceback mentions modifying a copy. Do you know the correct way to add this column?
test['result2']=new_column
I got:
ValueError: Wrong number of items passed 3, placement implies 1
and the traceback also shows this comment:
# check if we are modifying a copy
Thank you for your help.
If you want to apply a function to a specific column, you can call apply on that column rather than on the whole DataFrame:
test['result2'] = test['result'].apply(lambda value: value.replace("c(", "").replace(")", "").replace("\"", "").replace(" ", "").split(","))
Out[5]:
          day    name             result     result2
0  2018-01-01    john    c("7", "6", "")    [7, 6, ]
1  2018-01-02    jack  c("3", "6", "10")  [3, 6, 10]
2  2018-01-03  albert    c("4", "3", "")    [4, 3, ]
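As for the ValueError itself: as far as I can tell, in older pandas releases a row-wise DataFrame.apply whose function returned lists as long as the number of columns could be expanded into a DataFrame instead of a Series, which would explain "Wrong number of items passed 3, placement implies 1". Here is a minimal sketch of that difference, assuming pandas 0.23 or later, where the result_type argument makes the choice explicit. The helper name parse_r_vector is mine, not part of pandas.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ["john", "jack", "albert"],
    'day': ["2018-01-01", "2018-01-02", "2018-01-03"],
    'result': ['c("7", "6", "")', 'c("3", "6", "10")', 'c("4", "3", "")'],
})

def parse_r_vector(text):
    # Hypothetical helper: same cleanup as the lambda above
    return (text.replace("c(", "").replace(")", "")
                .replace('"', "").replace(" ", "").split(","))

# 'expand' turns each returned list into columns, giving a 3x3 DataFrame
# that cannot be stored in a single column.
expanded = test.apply(lambda row: parse_r_vector(row['result']),
                      axis=1, result_type='expand')

# 'reduce' keeps one list per row, so the result is a Series.
reduced = test.apply(lambda row: parse_r_vector(row['result']),
                     axis=1, result_type='reduce')

test['result2'] = reduced
```

Selecting the column by its label (row['result']) rather than by position also avoids depending on the column order.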
If a SettingWithCopyWarning shows up, you can set or update the column with .loc, as the warning suggests:
new_col = test['result'].apply(lambda value: value.replace("c(", "").replace(")", "").replace("\"", "").replace(" ", "").split(","))
test.loc[:, 'result2'] = new_col
The loc indexer takes the rows you want to select (: means all rows) and the column. Here result2 is the name of the column to create, but you can also update an existing column such as result the same way.
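The warning usually appears when the DataFrame you are writing to was itself obtained by filtering or slicing another one. Here is a small sketch of that situation, under the assumption that you work on a filtered subset. The names subset and parse_r_vector are made up for the illustration, and whether the warning is emitted at all depends on your pandas version.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ["john", "jack", "albert"],
    'day': ["2018-01-01", "2018-01-02", "2018-01-03"],
    'result': ['c("7", "6", "")', 'c("3", "6", "10")', 'c("4", "3", "")'],
})

def parse_r_vector(text):
    # Hypothetical helper: strip the c(...) wrapper, quotes and spaces
    return (text.replace("c(", "").replace(")", "")
                .replace('"', "").replace(" ", "").split(","))

# Taking an explicit copy makes it unambiguous that `subset` is its own
# object, so writing to it does not touch `test`.
subset = test[test['name'] != 'jack'].copy()

# Select all rows (:) and create the new column 'result2'
subset.loc[:, 'result2'] = subset['result'].apply(parse_r_vector)
```

After this, subset has the new column while test is left unchanged.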
You could also check this page, where the topic is well explained.