pandas, data-science, eda

Exploratory Data Analysis on Datasets with Too Many Variables


My question is a little bit theoretical.

I have a dataset with 100+ columns. Every EDA method I use produces a cluttered, unreadable plot. How can I get more interpretable plots and tables from data like this?


Solution

  • @Zine

    Try plotting only the variables that are relevant to your question, rather than all 100+ columns at once.
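    The answer does not say how to pick "the variables you need"; domain knowledge is usually the best guide. As one possible heuristic, here is a minimal sketch assuming your data has a numeric column you care about (called `target` here, a hypothetical name): rank the other numeric columns by the absolute value of their Pearson correlation with it and keep only the top few for plotting. The helper `select_top_correlated` and the synthetic data are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd


def select_top_correlated(df, target, k=10):
    """Return the k numeric columns most strongly correlated with `target`.

    Uses absolute Pearson correlation, so it only captures linear relationships.
    `target` is a hypothetical column name supplied by the caller.
    """
    numeric = df.select_dtypes(include="number")
    # Pearson correlation of every other numeric column with the target column
    corr = numeric.drop(columns=target).corrwith(numeric[target])
    # Strongest relationships first, regardless of sign
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


# Synthetic example: 120 noise columns, two of which actually drive the target
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 120)), columns=[f"x{i}" for i in range(120)])
df["target"] = 3 * df["x5"] - 2 * df["x42"] + rng.normal(scale=0.5, size=500)

cols = select_top_correlated(df, "target", k=5)
print(cols)
# With matplotlib installed, a small scatter matrix is now readable, e.g.:
# pd.plotting.scatter_matrix(df[cols[:3] + ["target"]])
```

    Correlation misses non-linear relationships, so treat this as a first filter rather than a final selection.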


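    You can use Principal Component Analysis (PCA) to reduce the number of variables. PCA replaces the original columns with a smaller set of new variables (principal components), each a linear combination of the original columns, chosen so that the first few retain as much of the data's variance as possible. Some information is lost, and the components are harder to interpret than the raw columns, but two or three of them can often be plotted in place of 100+ columns. Here are some resources for learning PCA: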

    1. https://www.sartorius.com/en/knowledge/science-snippets/what-is-principal-component-analysis-pca-and-how-it-is-used-507186

    2. https://www.geeksforgeeks.org/ml-principal-component-analysispca/

    3. https://www.machinelearningplus.com/machine-learning/principal-components-analysis-pca-better-explained/
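    The idea behind PCA can be sketched in a few lines with NumPy's SVD. This is a minimal sketch, assuming the columns of interest are numeric and contain no missing values; the helper `pca` and the synthetic data are hypothetical. Columns are standardized first so that columns measured on large scales do not dominate the components. In practice, scikit-learn's `sklearn.decomposition.PCA` does the same job and exposes `explained_variance_ratio_`.

```python
import numpy as np
import pandas as pd


def pca(df, n_components=2):
    """Project the numeric columns of `df` onto their first principal components.

    Returns (scores, ratio): a DataFrame of component scores and the fraction
    of total variance explained by each returned component.
    Assumes there are no missing values (impute or drop them beforehand).
    """
    X = df.select_dtypes(include="number").to_numpy(dtype=float)
    std = X.std(axis=0)
    keep = std > 0  # constant columns carry no information and would divide by zero
    # Standardize: zero mean and unit variance per column
    X = (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]
    # SVD of the centered data: rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T
    ratio = S**2 / np.sum(S**2)  # variance explained by each component
    names = [f"PC{i + 1}" for i in range(scores.shape[1])]
    return pd.DataFrame(scores, index=df.index, columns=names), ratio[:n_components]


# Synthetic example: 100 columns that are really driven by 3 hidden factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 3))
loadings = rng.normal(size=(3, 100))
data = pd.DataFrame(latent @ loadings + 0.1 * rng.normal(size=(300, 100)),
                    columns=[f"col{i}" for i in range(100)])

scores, ratio = pca(data, n_components=3)
print(ratio.round(3))
# With matplotlib installed: scores.plot.scatter(x="PC1", y="PC2")
```

    If the first few ratios add up to most of the variance, a scatter plot of `PC1` against `PC2` gives a readable overview of the whole dataset; the rows of `Vt` show which original columns contribute to each component.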