python pandas pandas-profiling

Speeding up pandas profiling analysis using check_correlation?


I'm using pandas profiling to generate a report. The dataset is very large, so to speed up processing I'm trying to turn off the correlation calculations. I used check_correlation, which I saw in another post, but I get an error from this line:

a = prof.ProfileReport(df, title='Downloads', check_correlation=False)

which raises this error:

ValueError: Config parameter "check_correlation" does not exist.


Solution

  • Since the configuration options changed in version 2, you could use:

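    import pandas_profiling  # importing this adds profile_report() to DataFrames
    
    # Switch off every correlation calculation (version 2 configuration)
    profile = df.profile_report(
        check_correlation_pearson=False,
        correlations={
            'pearson': False,
            'spearman': False,
            'kendall': False,
            'phi_k': False,
            'cramers': False,
            'recoded': False,
        },
    )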
    

    to turn off correlations. However, this is still not as fast as version 1.4. You could also look into the other configuration options in the pandas-profiling documentation.
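
If you would rather not type out the correlations dictionary by hand, it could be built from a list of names. This is only a minimal sketch: `disabled_correlations` and `CORRELATION_NAMES` are hypothetical names made up for illustration and are not part of pandas-profiling, and the keys are simply the ones from the answer above. The accepted keys may differ between releases, so check them against the version you have installed.

```python
# Correlation names copied from the answer above (assumed to match the
# configuration keys of the pandas-profiling 2.x release in use).
CORRELATION_NAMES = ("pearson", "spearman", "kendall", "phi_k", "cramers", "recoded")


def disabled_correlations(names=CORRELATION_NAMES):
    """Return a dict mapping every correlation name to False.

    Hypothetical helper for illustration; not part of pandas-profiling.
    """
    return {name: False for name in names}


correlations_off = disabled_correlations()
print(correlations_off)

# Intended use, assuming your release accepts this form of the option:
# profile = df.profile_report(correlations=correlations_off)
```

The helper only builds a plain dictionary, so it can be checked without running a profile report at all.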