Tags: machine-learning, neural-network, pca, cross-validation, dimension-reduction

Using features without applying PCA


Suppose there are 8 features in the dataset. I use PCA and find, from the cumulative sum of the explained variance ratio, that 99% of the information is in the first 3 features. Then why do I need to fit and transform these 3 features using PCA in order to use them for training my neural network? Why can't I just use the three features as is?


Solution

  • The reason is that when PCA tells you that 99% of the variance is explained by the first three components, it doesn't mean that it is explained by the first three features. PCA components are linear combinations of the features, but they are usually not the features themselves: the first principal component typically has non-zero weights on all eight original features. For example, PCA components must be orthogonal to each other, while the features don't have to be.
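A minimal NumPy sketch can make this concrete. The data below is synthetic (an assumed example, not from the question): 8 features are generated as mixtures of only 3 latent signals, so the first 3 principal components capture nearly all of the variance, yet each component loads on all 8 features rather than coinciding with any single one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset (assumption for illustration): 8 observed features
# driven by only 3 independent latent signals, plus a little noise.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 8))              # every feature mixes all 3 latents
X = latent @ mixing + 0.01 * rng.normal(size=(500, 8))

# PCA via SVD of the centered data (equivalent to sklearn's PCA).
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)

# Cumulative explained variance: the first 3 components cover ~99%+.
print(np.cumsum(explained_ratio)[:3])

# But the first component is a linear combination of ALL 8 features,
# not one of the original columns:
print(Vt[0])
```

Because `Vt[0]` has substantial weight on several features at once, dropping PCA and keeping the first three raw columns of `X` would discard variance that the components recover by mixing all eight columns.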