pandas dataframe col

Can I select columns in a dataframe using a for loop?


I have a dataframe with more than 400 columns, and I'm trying to select a sub-dataframe with about half of those columns based on some conditions. I have already stored the filtered column names in a list, hoping to use a for loop to iterate through them and build the new df, but I keep getting only the last column in the list.

My list has the 200 filtered columns. I used the following for loop:

for i in list:
    df1 = df[["col1", "col2"]]
    df2 = df[[i]]
    df1 = df1.join(df2)

My final result should consist of "col1", "col2" and the subsequent 200 columns but the output I keep getting is 3 columns, "col1", "col2", and the 200th column.


Solution

  • You should never join columns repeatedly. This is inefficient and will fragment the DataFrame.

    Assuming your list is named lst, you should just do:

    out = df[['col1', 'col2']+lst]
    

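    Your code failed because df1 is reset to df[["col1", "col2"]] at the start of every iteration, so only the join from the last iteration survives. Moving that line out of the loop would have worked, but this is really not a good approach: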

    df1 = df[["col1", "col2"]]
    for i in lst:
        df2 = df[[i]]
        df1 = df1.join(df2)
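
To make the comparison concrete, here is a minimal sketch on a toy DataFrame. The data and the column names "a", "b", "c" are made up for illustration, and lst stands in for the list of 200 filtered column names from the question. It checks that the single selection and the corrected loop produce the same columns and values:

```python
import pandas as pd

# Hypothetical toy data standing in for the 400-column DataFrame.
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [4, 5, 6],
    "a": [7, 8, 9],
    "b": [10, 11, 12],
    "c": [13, 14, 15],
})

# Hypothetical filtered column list (the question's 200 names).
lst = ["a", "c"]

# Single selection: one indexing operation, no repeated joins.
out = df[["col1", "col2"] + lst]

# Loop version with the initialisation moved outside the loop.
df1 = df[["col1", "col2"]]
for i in lst:
    df1 = df1.join(df[[i]])

print(list(out.columns))
print(list(df1.columns) == list(out.columns))
```

The list concatenation builds the full set of column names first, so pandas only has to construct the result once, whereas the loop creates a new intermediate DataFrame on every iteration.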