I have a dataframe with more than 400 columns, and I'm trying to select a sub-df with about half of the columns based on some conditions. I have already stored the filtered column names in a list, hoping to iterate through them with a for loop and add each one to the new df, but I keep getting only the last column in the list.
My list has the 200 filtered columns. I used the following for loop:
for i in list:
    df1 = df[["col1", "col2"]]
    df2 = df[[i]]
    df1 = df1.join(df2)
My final result should consist of "col1", "col2" and the subsequent 200 columns but the output I keep getting is 3 columns, "col1", "col2", and the 200th column.
You should never join columns repeatedly. This is inefficient and will fragment the DataFrame.
Assuming your list is named lst, you should just do:
out = df[['col1', 'col2']+lst]
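As a minimal sketch of that one-liner, here is a self-contained version on a toy DataFrame. The data and the column names a, b, c are made up for illustration, and lst stands in for your list of 200 filtered names:

```python
import pandas as pd

# Toy frame standing in for the 400-column DataFrame (hypothetical data).
df = pd.DataFrame({
    "col1": [1, 2],
    "col2": [3, 4],
    "a": [5, 6],
    "b": [7, 8],
    "c": [9, 10],
})

# Hypothetical list of filtered column names.
lst = ["a", "c"]

# A single indexing call selects the two fixed columns plus every name in lst.
out = df[["col1", "col2"] + lst]
print(out.columns.tolist())
```

The columns come back in the order given in the combined list, so no loop or join is needed.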
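Your code failed because you're overwriting df1 at each step: the first line inside the loop resets it to just col1 and col2, so only the last join survives. Moving that line before the loop would have worked, but this is really not a good approach: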
df1 = df[["col1", "col2"]]
for i in lst:
    df2 = df[[i]]
    df1 = df1.join(df2)
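To see the difference between the two loops side by side, here is a small sketch on made-up data (the columns a, b, c and the names reset_inside and reset_outside are hypothetical, chosen only for this illustration):

```python
import pandas as pd

# Toy frame with hypothetical data.
df = pd.DataFrame({
    "col1": [1, 2],
    "col2": [3, 4],
    "a": [5, 6],
    "b": [7, 8],
    "c": [9, 10],
})
lst = ["a", "b", "c"]

# Pattern from the question: df1 is reset on every iteration,
# so each pass throws away the columns joined on the previous pass.
for i in lst:
    df1 = df[["col1", "col2"]]
    df1 = df1.join(df[[i]])
reset_inside = df1.columns.tolist()

# Working loop: df1 is initialised once, before the loop,
# so the joined columns accumulate.
df1 = df[["col1", "col2"]]
for i in lst:
    df1 = df1.join(df[[i]])
reset_outside = df1.columns.tolist()

print(reset_inside)
print(reset_outside)
```

The first list ends with only the last name in lst, which matches the 3-column output described in the question, while the second contains every column. Direct selection with a combined list is still the better choice.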