I am trying to assign a value and/or a series of values to a slice of a pandas dataframe selected using .loc after sorting values.
For reference, this is the code I am trying to run, first with a fixed string value ('filler'):
df.sort_values(['col_1','col_2']).loc[
    df.col_1.isin(rows_selector), 'col_2'] = 'filler'
and then with a pandas Series containing the different values (the Series has the same length as the output of .loc):
df.sort_values(['col_1','col_2']).loc[
    df.col_1.isin(rows_selector), 'col_2'] = filler_series
I would expect the above to assign the desired values in place in df, but that does not seem to happen. I would appreciate any help.
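You need an intermediate step for this to work. sort_values returns a new, sorted dataframe and leaves df itself untouched, so the assignment lands on a temporary object that is discarded as soon as the statement finishes. On top of that, while the dataframe in front of the .loc is sorted, the df used inside the .loc (df.col_1.isin()) is still the unsorted df. For example: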
import pandas as pd

df = pd.DataFrame({'col_1': [1, 1, 7, 5, 9, 7],
                   'col_2': [4, 1, 5, 6, 6, 3]})
   col_1  col_2
0      1      4
1      1      1
2      7      5
3      5      6
4      9      6
5      7      3
Running df.sort_values(['col_1','col_2']) gives:
   col_1  col_2
1      1      1
0      1      4
3      5      6
5      7      3
2      7      5
4      9      6
However, running df.col_1.isin(rows_selector) (where rows_selector = [1, 5, 9]) returns the following:
0     True
1     True
2    False
3     True
4     True
5    False
Name: col_1, dtype: bool
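Notice that the True values follow the original row order, not the order of the sort_values output, because the mask is still computed from the original dataframe. .loc matches a boolean Series to rows by index label rather than by position, so the same rows end up selected, but keeping the mask and the data on one dataframe is easier to reason about.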
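To make this concrete, here is a minimal sketch using the example data from this answer. The names before, sorted_tmp, mask and the placeholder value -1 are only there for illustration. It checks that df keeps its original order after sort_values, and that a write through .loc on the sorted result changes that result and not df.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 1, 7, 5, 9, 7],
                   'col_2': [4, 1, 5, 6, 6, 3]})
rows_selector = [1, 5, 9]

before = df.copy()

# sort_values returns a new DataFrame; df keeps its original order.
sorted_tmp = df.sort_values(['col_1', 'col_2'])
order_unchanged = df.equals(before)

# Mask built from the unsorted df; .loc aligns it to sorted_tmp by index label.
mask = df.col_1.isin(rows_selector)
selected_labels = sorted_tmp.loc[mask].index.tolist()

# Writing through .loc changes the sorted object only, never df.
sorted_tmp.loc[mask, 'col_2'] = -1
df_unchanged = df.equals(before)

print(selected_labels)
print(order_unchanged, df_unchanged)
```

In the one-line version from the question the sorted object has no name, so the modified result is simply lost once the statement has run.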
You should assign the sorted version to df, or to a new name such as df_sorted, and then use .loc on that, like this:
df_sorted = df.sort_values(['col_1','col_2'])
df_sorted.loc[df_sorted.col_1.isin(rows_selector)]
   col_1  col_2
1      1      1
0      1      4
3      5      6
4      9      6
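The selection above can be extended to the two assignments from the question. This is only a sketch under a few assumptions: the values in filler_series are made up, by_series and by_scalar are illustrative names, and the column is cast to object before writing a string because recent pandas versions complain when a string is written into an integer column.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 1, 7, 5, 9, 7],
                   'col_2': [4, 1, 5, 6, 6, 3]})
rows_selector = [1, 5, 9]

# Keep the sorted result under a name so the assignment has a lasting target.
df_sorted = df.sort_values(['col_1', 'col_2'])
mask = df_sorted.col_1.isin(rows_selector)

# Series case: hypothetical replacement values, one per selected row.
filler_series = pd.Series([10, 20, 30, 40])
by_series = df_sorted.copy()
# .to_numpy() drops the Series index so values are written in sorted row order;
# assigning the Series itself would align on index labels instead.
by_series.loc[mask, 'col_2'] = filler_series.to_numpy()

# Scalar case: cast to object first so a string fits in the integer column.
by_scalar = df_sorted.astype({'col_2': object})
by_scalar.loc[mask, 'col_2'] = 'filler'

print(by_series)
print(by_scalar)
```

If the result should live in df, assign it back (for example df = by_series). Passing filler_series directly instead of filler_series.to_numpy() would match values by index label, which only gives the intended result when the Series carries the same labels as the selected rows.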