python pandas dataframe select indexing

Creating new pandas dataframe from certain columns of existing dataframe


I have read a CSV file into a pandas dataframe and want to do some simple manipulations on it. I cannot figure out how to create a new dataframe from selected columns of my original dataframe. My attempt:

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']

I would like to create a new dataframe with the columns A and D from the original dataframe.


Solution

  • This is called subsetting: pass a list of column names inside []:

    dataset = pandas.read_csv('file.csv', names=names)
    
    new_dataset = dataset[['A','D']]
    

    which is the same as:

    new_dataset = dataset.loc[:, ['A','D']]
    

    If you only need the filtered columns, pass the usecols parameter to read_csv:

    new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
    

    EDIT:

    If you use only:

    new_dataset = dataset[['A','D']]
    

    and then modify the data in new_dataset, you may get this warning:

    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.

    As EdChum pointed out, add .copy() to remove the warning:

    new_dataset = dataset[['A','D']].copy()
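
The options above can be tried together in one small, self-contained sketch. It assumes a tiny in-memory CSV (the hypothetical csv_text below) in place of file.csv, and the names same_dataset and only_ad are introduced only for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for 'file.csv': two rows, no header line.
csv_text = "1,2,3,4\n5,6,7,8\n"
names = ['A', 'B', 'C', 'D']

dataset = pd.read_csv(io.StringIO(csv_text), names=names)

# A list of column names inside [] returns a DataFrame;
# .copy() makes it independent of the original.
new_dataset = dataset[['A', 'D']].copy()

# The same selection written with .loc.
same_dataset = dataset.loc[:, ['A', 'D']].copy()

# Read only the wanted columns in the first place.
only_ad = pd.read_csv(io.StringIO(csv_text), names=names, usecols=['A', 'D'])

# Changing the copy leaves the original dataframe untouched.
new_dataset.loc[0, 'A'] = 100
print(dataset.loc[0, 'A'], new_dataset.loc[0, 'A'])
```

Because new_dataset is an explicit copy, the assignment on the last lines changes only new_dataset and should not trigger the warning.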