python for-loop correlation coefficients pearson-correlation

How to print the pearsonr result in the graph title for each loop iteration


I have this code where the loop iterates through the PRDCT column, calculates the r and p values, and creates a graph for each unique product code:

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import pearsonr

for prd in df_final.PRDCT.unique():
    df_tmp = df_final[df_final.PRDCT== prd].reset_index().copy()
    coeff, p = pearsonr(df_tmp['PRDCT_mean'], np.arange(0,len(df_tmp['PRDCT_mean'])))
    plt.figure(figsize = (15,6))
    plt.plot(df_tmp['Month'],df_tmp['PRDCT_mean'], marker="o")
    plt.title(prd, fontsize=18)
    plt.ylabel('PRDCT_mean')
    plt.xlabel('Month')
    plt.grid(True)
    plt.ylim((-60,60))
    plt.xticks(rotation= 'vertical',size=8)
    plt.show()

Question 1: How can I show the coefficient value for each unique product code beside that product's graph title?

Question 2: How can I save the pearsonr r and p values from each iteration separately?

I'd prefer to do both within the same code, if possible.

Thanks in advance.


Solution

  • Consider defining a function that handles all the steps: it builds the plot, adds the statistics to the title string, and returns the statistics. Then build a dictionary with a comprehension over DataFrame.groupby, keyed by product code.

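    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import pearsonr

    def run_plot_save_stats(prd, df_tmp):
        df_tmp = df_tmp.reset_index().copy()
        # Correlate the monthly means with their position in the series (0, 1, 2, ...)
        coeff, p = pearsonr(df_tmp['PRDCT_mean'], np.arange(len(df_tmp['PRDCT_mean'])))
        # Format specs work whether pearsonr returns NumPy scalars or plain floats
        title = f"Product: {prd} - pearson coeff: {coeff:.4f} p-value: {p:.4f}"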
    
        plt.figure(figsize = (15,6))
        plt.plot(df_tmp['Month'],df_tmp['PRDCT_mean'], marker="o")
        plt.title(title, fontsize=18)
        plt.ylabel('PRDCT_mean')
        plt.xlabel('Month')
        plt.grid(True)
        plt.ylim((-60,60))
        plt.xticks(rotation= 'vertical',size=8)
        plt.show()
    
        return {"pearson": coeff, "p_value": p}
    
    prod_stats_dict = {
        grp: run_plot_save_stats(grp, df) for grp, df in df_final.groupby("PRDCT")
    }
    
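    # Look up the saved statistics by product code
    # ("product1" and "product2" stand in for your actual PRDCT values)
    prod_stats_dict["product1"]["pearson"]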
    prod_stats_dict["product1"]["p_value"]
    prod_stats_dict["product2"]["pearson"]
    prod_stats_dict["product2"]["p_value"]
    ...
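
  • If you would rather keep the saved statistics in a table than in a nested dictionary, one option is to turn the dictionary into a DataFrame with one row per product. This is only a sketch: the `prod_stats_dict` below is a hypothetical stand-in with placeholder product names and numbers, and the 0.05 threshold and the CSV file name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary returned by the comprehension above;
# the product names and numbers are placeholders, not real results.
prod_stats_dict = {
    "product1": {"pearson": 0.5, "p_value": 0.04},
    "product2": {"pearson": -0.25, "p_value": 0.3},
}

# One row per product code, one column per statistic.
stats_df = pd.DataFrame.from_dict(prod_stats_dict, orient="index")
stats_df.index.name = "PRDCT"

# Example filter: products whose p-value is below an assumed 0.05 threshold.
significant = stats_df[stats_df["p_value"] < 0.05]

# Optionally write the table to disk (file name is arbitrary).
# stats_df.to_csv("pearson_stats.csv")
```

    With the table in hand, `stats_df.loc["product1", "pearson"]` retrieves a single value, and sorting or filtering across products becomes a one-liner.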