I have created the following pandas dataframe:
import pandas as pd

ds = {'col1': [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
   col1
0     1
1     2
2     2
3     3
4     4
5     5
6     5
7     6
8     7
9     8
I then created a new column called newCol, defined as follows:
def criteria(row):
    # "A" for 0 < col1 <= 2, "B" for 2 < col1 <= 3, "C" for everything else
    if 0 < row['col1'] <= 2:
        return "A"
    elif 2 < row['col1'] <= 3:
        return "B"
    else:
        return "C"

df['newCol'] = df.apply(criteria, axis=1)
The new dataframe looks like this:
print(df)
   col1 newCol
0     1      A
1     2      A
2     2      A
3     3      B
4     4      C
5     5      C
6     5      C
7     6      C
8     7      C
9     8      C
Is it possible to create a dictionary like this:
dict = {
'0 <= 2' : "A",
'2 <= 3' : "B",
'Else' : "C"
}
and then apply it to the dataframe like this?
df['newCol'] = df['col1'].map(dict)
Can anyone help me, please?
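Yes, you could do this with an IntervalIndex. By default, pd.IntervalIndex.from_tuples builds right-closed intervals, so the key (0, 2) below becomes (0, 2], i.e. 0 < x <= 2, which matches your criteria function: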
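dic = {(0, 2): 'A',   # (0, 2] -> 'A'
       (2, 3): 'B',   # (2, 3] -> 'B'
       }
other = 'C'           # label for values outside every interval

bins = pd.IntervalIndex.from_tuples(dic)   # iterating a dict yields its keys
labels = list(dic.values())

# look each col1 value up in the intervals; misses become NaN, then `other`;
# .tolist() drops the reindexed index so the result is assigned by position
df['newCol'] = (pd.Series(labels, index=bins)
                  .reindex(df['col1']).fillna(other)
                  .tolist()
               )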
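The reindex step essentially asks, for each value in col1, which interval contains it. A minimal sketch of that lookup done explicitly with IntervalIndex.get_indexer, which returns -1 for values that fall in no interval; closed='right' is spelled out (it is already the default) to make the 0 < x <= 2 semantics visible. Combining it with np.where is my own choice for illustration, not part of the approach above, and get_indexer assumes the intervals do not overlap.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]})

dic = {(0, 2): 'A', (2, 3): 'B'}
other = 'C'

# (0, 2] and (2, 3]: closed='right' is the default, written out for clarity
bins = pd.IntervalIndex.from_tuples(list(dic), closed='right')

# position of the interval containing each value, or -1 if none contains it
# (get_indexer raises for overlapping intervals)
pos = bins.get_indexer(df['col1'])

labels = np.array(list(dic.values()), dtype=object)
# labels[pos] picks the last label where pos == -1, but np.where swaps those for `other`
df['newCol'] = np.where(pos >= 0, labels[pos], other)
print(df['newCol'].tolist())  # ['A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'C', 'C']
```

Note that 2 lands in (0, 2] and not in (2, 3], because the left edge of each interval is open.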
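But given your example, it seems more straightforward to go with pd.cut. Its bins are right-closed as well, so 0 < x <= 2 gives A, 2 < x <= 3 gives B and x > 3 gives C; note that it returns a categorical column and, unlike criteria, leaves values <= 0 as NaN:
import numpy as np

df['newCol'] = pd.cut(df['col1'], bins=[0, 2, 3, np.inf], labels=['A', 'B', 'C'])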
Output:
   col1 newCol
0     1      A
1     2      A
2     2      A
3     3      B
4     4      C
5     5      C
6     5      C
7     6      C
8     7      C
9     8      C
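If values <= 0 should also be labelled "C", as criteria does, one option (assuming pandas 1.1 or later) is to add a (-inf, 0] bin and reuse the "C" label: pd.cut only accepts duplicate labels when ordered=False. The input values -1 and 0 below are made up purely to exercise that extra bin.

```python
import numpy as np
import pandas as pd

# -1 and 0 are illustrative values added to exercise the lowest bin
s = pd.Series([-1, 0, 1, 2, 2, 3, 4, 8])

new_col = pd.cut(
    s,
    bins=[-np.inf, 0, 2, 3, np.inf],  # (-inf, 0], (0, 2], (2, 3], (3, inf]
    labels=['C', 'A', 'B', 'C'],      # 'C' reused for both outer bins
    ordered=False,                    # required when labels contain duplicates
)
print(new_col.tolist())  # ['C', 'C', 'A', 'A', 'A', 'B', 'C', 'C']
```

The result is still categorical, with the unique categories A, B and C.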
If you want to keep your original dictionary format, you could convert it first:
dic = {'0 <= 2': "A",
       '2 <= 3': "B",
       'Else': "C",
       }

# parse each 'low <= high' key into an integer (low, high) tuple, skipping 'Else'
dic2 = {tuple(map(int, k.split(' <= '))): v for k, v in dic.items()
        if k != 'Else'}
# {(0, 2): 'A', (2, 3): 'B'}
other = dic['Else']
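dic2 has the same shape as dic in the IntervalIndex code above, so it can take its place there. As an alternative, here is a sketch using numpy's np.select (a different technique from the ones in this answer, swapped in for illustration): it builds one boolean mask per (low, high) key, reading each key as low < col1 <= high like criteria does, and falls back to other where no mask matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]})

dic = {'0 <= 2': "A", '2 <= 3': "B", 'Else': "C"}
dic2 = {tuple(map(int, k.split(' <= '))): v for k, v in dic.items()
        if k != 'Else'}
other = dic['Else']

# one boolean mask per (low, high) key, read as low < col1 <= high
conditions = [(df['col1'] > low) & (df['col1'] <= high) for low, high in dic2]
# np.select takes the first matching choice per row, else the default
df['newCol'] = np.select(conditions, list(dic2.values()), default=other)
print(df['newCol'].tolist())  # ['A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'C', 'C']
```

Unlike pd.cut, this does not require the ranges to be contiguous, and values <= 0 fall through to other.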