python pandas scipy regression

Linear Regression on each column without creating for loops or functions


I want to run a regression on each column (or row) of a pandas DataFrame without using for loops.

There is a similar post about this, "Apply formula across pandas rows/ regression line", that does a regression for each of the "rows". However, plotting the result shows that the answer given there is wrong. I couldn't comment on it because I do not have enough reputation. The main problem is that it takes the values of the columns but then uses the apply function on each row.

Currently I only know how to do this one column at a time, e.g.:

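import numpy as np
import pandas as pd
import scipy.stats

np.random.seed(1997)

df = pd.DataFrame(np.random.randn(10, 4))
# Regress a single column against the row index, one call per column
first_stats = scipy.stats.linregress(df.index, df[0])
second_stats = scipy.stats.linregress(df.index, df[1])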

I was hoping to find an answer that avoids writing a function or for loops, similar to pandas df.sum(), but instead of a sum I want a regression that returns the slope, intercept, r-value, p-value and standard error.


Solution

  • Look at the following example:

    import numpy as np
    import pandas as pd
    from scipy.stats import linregress
    
    np.random.seed(1997)
    df = pd.DataFrame(np.random.rand(100, 10))
    
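    # Regress every column against the row index; the five statistics
    # come back as rows 0-4, which rename() then labels.
    (
        df.apply(lambda x: linregress(df.index, x), result_type='expand')
          .rename(index={0: 'slope', 1: 'intercept', 2: 'rvalue', 3: 'p-value', 4: 'stderr'})
    )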
    

    It should return what you want: a DataFrame with one column per original column and one row per statistic.
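
A possible variation on the example above, in case it helps: wrapping each result in a labelled pd.Series avoids the separate rename step, and passing axis=1 covers the row-wise case from the question. This is only a sketch. The names FIELDS, col_stats and row_stats are made up for illustration, and the row-wise version assumes the column labels are numeric so that they can serve as the x values.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Illustrative labels for the five values a linregress result unpacks to
FIELDS = ["slope", "intercept", "rvalue", "pvalue", "stderr"]

np.random.seed(1997)
df = pd.DataFrame(np.random.rand(100, 10))

# Column-wise: regress each column against the row index.
# Result has one row per statistic and one column per original column.
col_stats = df.apply(
    lambda col: pd.Series(linregress(df.index, col)[:5], index=FIELDS)
)

# Row-wise: regress each row against the (numeric) column labels.
# Result has one row per original row and one column per statistic.
row_stats = df.apply(
    lambda row: pd.Series(linregress(df.columns, row)[:5], index=FIELDS),
    axis=1,
)
```

A single value can then be looked up by label, for example col_stats.loc["slope", 0] for the slope of column 0, or row_stats.loc[0, "slope"] for the slope of row 0.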