
Financial modelling with Pandas dataframe


I have built a simple DCF model, mainly with Pandas. Essentially all calculations happen in a single dataframe. As the model grows more complex and more variables are added, I want to find a better coding style. The following example illustrates my current style, which is simple and straightforward.

# some customized formulas
def GrowthRate():
    ...
def BoundedVal():
    ...
# ....
# some operations
df['EBIT'] = df['revenue'] - df['costs']
df['NI'] = df['EBIT'] - df['tax'] - df['interests']
df['margin'] = df['NI'] / df['revenue']

I loop through all years to calculate the values. The model now has over 500 variables, and the calculations have become more complex. I am thinking of creating a separate def for each variable and updating the main df accordingly. The code above would then become:

def EBIT(t):
    # df.loc[t, col] avoids chained assignment, which may write to a copy
    df.loc[t, 'EBIT'] = df.loc[t, 'revenue'] - df.loc[t, 'costs']
    #....some more ops
    return df.loc[t, 'EBIT']

def NI(t):
    df.loc[t, 'NI'] = EBIT(t) - df.loc[t, 'tax'] - df.loc[t, 'interests']
    #....some more ops
    return df.loc[t, 'NI']

def margin(t):
    if check_df_is_nan():
        # margin = NI / revenue, matching the column formula above
        df.loc[t, 'margin'] = NI(t) / df.loc[t, 'revenue']
        #....some more ops
        return df.loc[t, 'margin']
    else:
        return df.loc[t, 'margin']

Each function can 1) calculate its result and update df, and 2) return the value when called by other functions.

To avoid redundant calculation (consider margin(t) being called multiple times), it would be better to add a "check whether the value has already been calculated" step to each def.

My questions: 1) Is it possible to add the if statement to a group of defs, similar to the if clause above? 2) I have over 50 custom defs, so the main file has become too long. I cannot simply move all defs to another file and import them, because some defs also refer to the dataframe in the main file. Any suggestions? Can I make df a global variable so that defs in other files can modify and update it?


Solution

  • For 1, just check whether the value is NaN. If it is not, it has already been calculated and can be returned directly.

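    import pandas as pd
    def EBIT(t):
        # Reuse the stored value if it has already been calculated.
        if pd.notnull(df.loc[t, 'EBIT']):
            return df.loc[t, 'EBIT']
    
        df.loc[t, 'EBIT'] = df.loc[t, 'revenue'] - df.loc[t, 'costs']
        ...
        return df.loc[t, 'EBIT']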
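
To apply the same check to a whole group of defs without repeating the if clause, one option is a decorator. The following is only a minimal sketch: the name `cached_in_df`, the sample figures, and the convention that every function takes `(df, t)` are assumptions made for illustration, not part of the original model.

```python
import functools

import numpy as np
import pandas as pd


def cached_in_df(column):
    """Hypothetical helper: reuse df.loc[t, column] when it is already filled in."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, t):
            if pd.notnull(df.loc[t, column]):
                return df.loc[t, column]   # already calculated, skip the work
            value = func(df, t)            # calculate once
            df.loc[t, column] = value      # store it for later callers
            return value
        return wrapper
    return decorator


@cached_in_df("EBIT")
def EBIT(df, t):
    return df.loc[t, "revenue"] - df.loc[t, "costs"]


@cached_in_df("NI")
def NI(df, t):
    return EBIT(df, t) - df.loc[t, "tax"] - df.loc[t, "interests"]


# Made-up sample data; result columns start out as NaN.
df = pd.DataFrame(
    {"revenue": [100.0, 120.0], "costs": [60.0, 70.0],
     "tax": [8.0, 10.0], "interests": [2.0, 3.0],
     "EBIT": np.nan, "NI": np.nan},
    index=[2020, 2021],
)

ni_2020 = NI(df, 2020)  # fills in both EBIT and NI for 2020 only
```

Each decorated function then contains only its own formula, and the NaN check lives in one place.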
    

    For 2, using a global variable might work, but it's a bad approach. You should avoid globals whenever possible.

    Instead, make each function take the data frame as an argument. Then you can pass in whichever data frame you want to operate on.

    # in some other file, e.g. operations.py
    def EBIT(df, t):
        # logic goes here
        pass
    
    # in the main file
    import operations as op
    # ...
    op.EBIT(df, t)
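
A side benefit of passing the data frame in is that the same function can run on several frames, for example different scenarios. The sketch below keeps everything in one file so it runs on its own; `make_frame` and the `base` and `stress` figures are made up for illustration.

```python
import pandas as pd


def EBIT(df, t):
    """Could live in a separate module; it touches only the frame passed in."""
    df.loc[t, "EBIT"] = df.loc[t, "revenue"] - df.loc[t, "costs"]
    return df.loc[t, "EBIT"]


def make_frame(revenue, costs):
    # Hypothetical helper that builds a small frame with an empty EBIT column.
    return pd.DataFrame(
        {"revenue": revenue, "costs": costs, "EBIT": float("nan")},
        index=[2020, 2021],
    )


base = make_frame([100.0, 120.0], [60.0, 70.0])
stress = make_frame([90.0, 95.0], [60.0, 70.0])

base_ebit = EBIT(base, 2021)      # updates only the base frame
stress_ebit = EBIT(stress, 2021)  # updates only the stress frame
```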
    

    P.S. Have you considered operating on the whole column at once rather than using t? It should be much faster.
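
As a rough illustration of the whole-column approach, here is a small sketch with made-up figures. The `revenue_growth` column is an assumed example and not one of the variables from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"revenue": [100.0, 120.0, 150.0], "costs": [60.0, 70.0, 80.0],
     "tax": [8.0, 10.0, 14.0], "interests": [2.0, 3.0, 4.0]},
    index=[2020, 2021, 2022],
)

# Each statement computes the value for every year at once, with no loop over t.
df["EBIT"] = df["revenue"] - df["costs"]
df["NI"] = df["EBIT"] - df["tax"] - df["interests"]
df["margin"] = df["NI"] / df["revenue"]

# Year-over-year changes also have a column-wise form.
df["revenue_growth"] = df["revenue"].pct_change()
```

Variables whose value in year t depends on the same variable's result in year t-1 may still need a loop or a cumulative operation.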