python pandas dataframe

Filter a pandas DataFrame by specific column names in Python


I have a pandas DataFrame and a list as follows:

mylist = ['nnn', 'mmm', 'yyy']
mydata =
   xxx  yyy  zzz  nnn  ddd  mmm
0    0   10    5    5    5    5
1    1    9    2    3    4    4
2    2    8    8    7    9    0
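To reproduce the example without input_file, the sample frame can be built directly. This is a minimal sketch that hard-codes the values from the table above; the column values are copied from the question, nothing else is assumed.

```python
import pandas as pd

# Rebuild the question's sample data in memory (stand-in for reading input_file)
mydata = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "zzz": [5, 2, 8],
    "nnn": [5, 3, 7],
    "ddd": [5, 4, 9],
    "mmm": [5, 4, 0],
})
mylist = ["nnn", "mmm", "yyy"]

print(mydata)
```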

Now I want to keep only the columns named in mylist and save them as a CSV file.

That is, the expected result is:

   yyy  nnn  mmm
0   10    5    5
1    9    3    4
2    8    7    0

My current code is as follows.

import pandas as pd

mydata = pd.read_csv(input_file, header=0)

for item in mylist:
    mydata_new = mydata[item]

print(mydata_new)
mydata_new.to_csv(file_name)

My new DataFrame seems to give the wrong result. Where am I going wrong? Please help!
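Running the original loop on the sample data shows what mydata_new actually ends up holding. This sketch rebuilds the sample frame in memory instead of reading input_file, and leaves out the to_csv call.

```python
import pandas as pd

# Sample data from the question, built in memory instead of read from input_file
mydata = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "zzz": [5, 2, 8],
    "nnn": [5, 3, 7],
    "ddd": [5, 4, 9],
    "mmm": [5, 4, 0],
})
mylist = ["nnn", "mmm", "yyy"]

# The original loop: every pass reassigns mydata_new, discarding the previous column
for item in mylist:
    mydata_new = mydata[item]  # a single label selects one column as a Series

print(type(mydata_new).__name__)  # Series, not DataFrame
print(mydata_new.name)            # yyy, the last name in mylist
```

Only the last column survives, and it is a Series rather than a DataFrame, which is why the saved CSV does not match the expected output.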


Solution

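  • The loop is the problem: mydata[item] with a single name returns just that one column as a Series, and each iteration overwrites mydata_new, so only the last column (yyy) is left when you save. Instead, pass a list of column names to index df: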

    df[['nnn', 'mmm', 'yyy']]
    
       nnn  mmm  yyy
    0    5    5   10
    1    3    4    9
    2    7    0    8
    

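    If your list may contain names that are not in the DataFrame (df[[...]] raises a KeyError for those), filter with df.columns.isin instead. Note that this keeps the DataFrame's own column order rather than the order of your list: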

    df.loc[:, df.columns.isin(['nnn', 'mmm', 'yyy', 'zzzzzz'])]
    
       yyy  nnn  mmm
    0   10    5    5
    1    9    3    4
    2    8    7    0
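Putting it together for the original read/select/save task, here is a sketch under a few assumptions: the CSV text below is a stand-in for input_file, "zzzzzz" is an illustrative missing name, and the list comprehension is an alternative to the isin filter (not the answer's method) that skips missing names while keeping the order of mylist.

```python
import io
import pandas as pd

# Stand-in for input_file: the question's sample data as CSV text
csv_text = """xxx,yyy,zzz,nnn,ddd,mmm
0,10,5,5,5,5
1,9,2,3,4,4
2,8,8,7,9,0
"""
mylist = ["nnn", "mmm", "yyy", "zzzzzz"]  # "zzzzzz" is not a column in the data

mydata = pd.read_csv(io.StringIO(csv_text), header=0)

# Keep only names that exist, in the order given by mylist
wanted = [col for col in mylist if col in mydata.columns]
mydata_new = mydata[wanted]

# For comparison: the isin filter keeps the frame's own order (yyy, nnn, mmm)
by_isin = mydata.loc[:, mydata.columns.isin(mylist)]

# With no path, to_csv returns the CSV text; index=False omits the 0, 1, 2 row labels
out = mydata_new.to_csv(index=False)
print(out)
```

To write a file as in the question, call mydata_new.to_csv(file_name, index=False) instead; drop index=False if you want the row labels in the file.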