python pandas list dataframe

Get list from pandas dataframe column or row?


I have a dataframe df imported from an Excel document like this:

cluster load_date   budget  actual  fixed_price
A   1/1/2014    1000    4000    Y
A   2/1/2014    12000   10000   Y
A   3/1/2014    36000   2000    Y
B   4/1/2014    15000   10000   N
B   4/1/2014    12000   11500   N
B   4/1/2014    90000   11000   N
C   7/1/2014    22000   18000   N
C   8/1/2014    30000   28960   N
C   9/1/2014    53000   51200   N

I want to return the contents of the first column, df['cluster'], as a list, so that I can run a for-loop over it and create an Excel worksheet for every cluster.

Is it also possible to return the contents of a whole column or row to a list? e.g.

list = [], list[column1] or list[df.ix(row1)]

Solution

  • Selecting a single column from a pandas DataFrame gives you a pandas Series. Call x.tolist() on that Series to turn it into a plain Python list. Alternatively, convert it with list(x).

    import pandas as pd
    
    data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                 'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    
    df = pd.DataFrame(data_dict)
    
    print(f"DataFrame:\n{df}\n")
    print(f"column types:\n{df.dtypes}")
    
    col_one_list = df['one'].tolist()
    
    col_one_arr = df['one'].to_numpy()
    
    print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
    print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")
    

    Output:

    DataFrame:
       one  two
    a  1.0    1
    b  2.0    2
    c  3.0    3
    d  NaN    4
    
    column types:
    one    float64
    two      int64
    dtype: object
    
    col_one_list:
    [1.0, 2.0, 3.0, nan]
    type:<class 'list'>
    
    col_one_arr:
    [ 1.  2.  3. nan]
    type:<class 'numpy.ndarray'>
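
    The question also asks about rows and about looping once per cluster, which the example above does not cover. The following is a minimal sketch under some assumptions: the small DataFrame is a hypothetical stand-in for a few rows of the Excel data, and the variable names (clusters, unique_clusters, first_row, rows_per_cluster) are made up for illustration. Note that .ix, as written in the question, was removed in pandas 1.0; .iloc (by position) and .loc (by label) are the replacements.

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the table in the question.
df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "C"],
    "budget": [1000, 12000, 15000, 12000, 22000],
    "actual": [4000, 10000, 10000, 11500, 18000],
})

# Whole column as a list; duplicates are kept.
clusters = df["cluster"].tolist()

# Distinct cluster names in order of first appearance,
# which is usually what a "one pass per cluster" loop needs.
unique_clusters = df["cluster"].drop_duplicates().tolist()

# A single row as a list, selected by position with .iloc.
first_row = df.iloc[0].tolist()

# groupby yields (name, sub-DataFrame) pairs, one per cluster,
# so no separate list of names is needed to split the data.
rows_per_cluster = {}
for name, group in df.groupby("cluster"):
    rows_per_cluster[name] = len(group)

print(clusters)
print(unique_clusters)
print(rows_per_cluster)
```

    To write one worksheet per cluster, the loop body could call group.to_excel(writer, sheet_name=name) inside a "with pd.ExcelWriter(path) as writer:" block, where path is whatever output file you choose. That step is left out of the sketch because it needs an Excel engine such as openpyxl to be installed.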