python pandas dataframe

How can I assign iterables to columns in pandas dataframes?


I have a dataframe containing rows which describe financial stocks. The following is a simplified version:

import pandas as pd

df = pd.DataFrame(
    {
        "stockprice": [100, 103, 240],
        "Characteristic1": [1, 3, 3],
        "Characteristic2": [5, 7, 1],
        "Characteristic3": [1, 4, 6],
    },
    index=["Company A", "Company B", "Company C"],
)

#            stockprice  Characteristic1  Characteristic2  Characteristic3
# Company A  100         1                5                1              
# Company B  103         3                7                4              
# Company C  240         3                1                6            

I would like to add a column in which each cell holds a long dictionary, generated from some of these characteristics, that represents a series of cashflows. Later I will want to run calculations on these generated dictionaries.

Here is a sample function that generates the dictionary, followed by the assign call I use to put it into my dataframe:

def cashflow_series(ch1=1, ch2=2):
    return {0: ch1, 0.5: ch2, 1: 7, 2: 8, 3: 9}


df.assign(
    cashflows=lambda x: cashflow_series(
        ch1=x["Characteristic1"], ch2=x["Characteristic3"]
    )
)

This returns

           stockprice  Characteristic1  Characteristic2  Characteristic3 cashflows
Company A  100         1                5                1                NaN     
Company B  103         3                7                4                NaN     
Company C  240         3                1                6                NaN

How can I fix this?

I want the new column 'cashflows' to contain a dictionary for each row, not a NaN.

I want something like this:

           stockprice  Characteristic1  Characteristic2  Characteristic3 cashflows
Company A  100         1                5                1                {0:1,..3:9}
Company B  103         3                7                4                {0:3,..3:9}
Company C  240         3                1                6                {0:3,..3:9}

Solution

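  • You'll need to use a list comprehension. assign passes the whole dataframe to the lambda, so cashflow_series is called once with entire columns and returns a single dictionary rather than one per row. pandas then appears to align that dictionary's keys (0, 0.5, 1, ...) with the dataframe index, and since none of them match you get NaN. Building one dictionary per row avoids this: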

    def cashflow_series(ch1=1,ch2=2):
        return {0:ch1,0.5:ch2,1:7,2:8,3:9}
    
    df.assign(cashflows=lambda x: [cashflow_series(ch1=x.loc[i, 'Characteristic1'],
                                                   ch2=x.loc[i, 'Characteristic3'])
                                   for i in x.index])
    

    Or with parameter unpacking:

    df.assign(cashflows=lambda x: [cashflow_series(*params)
                                   for params in
                                   x[['Characteristic1', 'Characteristic3']].values])
    

    Output:

               stockprice  Characteristic1  Characteristic2  Characteristic3                         cashflows
    Company A         100                1                5                1  {0: 1, 0.5: 1, 1: 7, 2: 8, 3: 9}
    Company B         103                3                7                4  {0: 3, 0.5: 4, 1: 7, 2: 8, 3: 9}
    Company C         240                3                1                6  {0: 3, 0.5: 6, 1: 7, 2: 8, 3: 9}
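
As a further variation, here is a minimal, self-contained sketch that pairs the two columns with zip instead of looking up each row by label. The cashflow_total column and the plain undiscounted sum are hypothetical, included only to illustrate the kind of later calculation the question mentions; they are not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "stockprice": [100, 103, 240],
        "Characteristic1": [1, 3, 3],
        "Characteristic2": [5, 7, 1],
        "Characteristic3": [1, 4, 6],
    },
    index=["Company A", "Company B", "Company C"],
)


def cashflow_series(ch1=1, ch2=2):
    return {0: ch1, 0.5: ch2, 1: 7, 2: 8, 3: 9}


# Pair the two columns row by row and build one dictionary per row.
result = df.assign(
    cashflows=lambda x: [
        cashflow_series(ch1=a, ch2=b)
        for a, b in zip(x["Characteristic1"], x["Characteristic3"])
    ]
)

# Hypothetical follow-up calculation (an assumption, not from the question):
# the undiscounted sum of each row's cashflows.
result["cashflow_total"] = result["cashflows"].apply(lambda cf: sum(cf.values()))

print(result[["cashflows", "cashflow_total"]])
```

Iterating over the columns with zip avoids a label lookup per row, which may matter on larger dataframes. The cashflows column has object dtype, so calculations on it run row by row through apply rather than as vectorised operations.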