Consider the following DataFrame:
import numpy as np
import pandas as pd
arrays = [['A','A','B','B','C','C'],[1,1,3,3,5,5],[2,2,4,4,6,6],[0.1,0.2,0.3,0.4,0.5,0.6]]
index = pd.MultiIndex.from_arrays(arrays,names=('Sample','P1','P2','T'))
data = np.random.rand(10,6)
df = pd.DataFrame(columns=index,data=data)
I want to select: for sample A, the column with T=0.2, and for sample C, the column with T=0.5.
I can easily select each of the single columns, e.g.:
df.loc[:,('A',slice(None),slice(None),0.2)] # or
df.loc(axis=1)[('C',slice(None),slice(None),0.5)]
But how can I combine them? I tried supplying a list of tuples:
df.loc[:,[('A',slice(None),slice(None),0.2),('C',slice(None),slice(None),0.5)]]
But that generates an error.
How can I select my columns without resorting to pd.concat?
Use boolean indexing: drop the P1 and P2 levels from the columns, then test the remaining (Sample, T) pairs with isin.
out = df.loc[:, df.columns.droplevel([1, 2]).isin([('A', 0.2), ('C', 0.5)])]
out:
Sample         A         C
P1             1         5
P2             2         6
T            0.2       0.5
0       0.836079  0.368242
1       0.870087  0.520477
2       0.582020  0.105908
3       0.736918  0.324141
4       0.386489  0.613063
5       0.969809  0.358152
6       0.325047  0.958949
7       0.995300  0.474698
8       0.674752  0.949571
9       0.622846  0.878193
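For reference, here is the same approach as a self-contained sketch, with the mask built in separate steps. The names `wanted`, `keys`, and `mask` are only illustrative, the random data is seeded purely so the run is repeatable, and the levels are dropped by name instead of by position, which should be equivalent for this frame.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (seeded so runs are repeatable).
arrays = [
    ['A', 'A', 'B', 'B', 'C', 'C'],
    [1, 1, 3, 3, 5, 5],
    [2, 2, 4, 4, 6, 6],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
]
columns = pd.MultiIndex.from_arrays(arrays, names=('Sample', 'P1', 'P2', 'T'))
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 6)), columns=columns)

# Illustrative name: the (Sample, T) pairs to keep.
wanted = [('A', 0.2), ('C', 0.5)]

# Drop the levels that are not part of the selection, by name.
keys = df.columns.droplevel(['P1', 'P2'])

# MultiIndex.isin checks each remaining (Sample, T) tuple against the wanted pairs.
mask = keys.isin(wanted)

# Select the columns where the mask is True; all four levels are kept in the result.
out = df.loc[:, mask]
print(out.columns.tolist())
```

Note that the match on T is an exact float comparison, so this works here because the values in `wanted` are the same literals used to build the index. For computed T values, a tolerance-based mask (for example with np.isclose on the T level) may be safer.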