I have created the following pandas dataframe:
ds = {'col1' : [1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4,4,4,4,4],
'feature1' : [1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4,4,4,4,4],
'col2' : [12,3,4,5,4,3,2,3,4,6,7,8,3,3,65,4,3,2,32,1,2,3,4,5,32],
}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
col1 feature1 col2
0 1 1 12
1 1 1 3
2 1 1 4
3 1 1 5
4 1 1 4
5 1 1 3
6 1 1 2
7 2 2 3
8 2 2 4
9 2 2 6
10 2 2 7
11 3 3 8
12 3 3 3
13 3 3 3
14 3 3 65
15 3 3 4
16 4 4 3
17 4 4 2
18 4 4 32
19 4 4 1
20 4 4 2
21 4 4 3
22 4 4 4
23 4 4 5
24 4 4 32
I need to create a new column (called sumOfCols
) which is the sum of the values contained in the columns whose names begin with col
(in the example above the column feature1
would not be included in the calculation).
The resulting dataframe would look like this:
Can someone help me please? Thanks in advance.
You can filter
the columns using a regex, then sum
on axis=1
:
df["sumOfCols"] = df.filter(regex="^col").sum(axis=1)
col1 feature1 col2 sumOfCols
0 1 1 12 13
1 1 1 3 4
2 1 1 4 5
3 1 1 5 6
4 1 1 4 5
5 1 1 3 4
6 1 1 2 3
7 2 2 3 5
8 2 2 4 6
9 2 2 6 8
10 2 2 7 9
11 3 3 8 11
12 3 3 3 6
13 3 3 3 6
14 3 3 65 68
15 3 3 4 7
16 4 4 3 7
17 4 4 2 6
18 4 4 32 36
19 4 4 1 5
20 4 4 2 6
21 4 4 3 7
22 4 4 4 8
23 4 4 5 9
24 4 4 32 36