python pandas dataframe dimensionality-reduction

Dimension reduction using PCA based on columns not rows


I'm trying to reduce the dimensionality of time series data of Covid cases. I have the cases in a DataFrame with a row for each date and a column for each district. I now want to reduce the dimensions in order to remove the time warping of the data. My DataFrame looks like this:

            1001  1002
01.01.2020    35    57
02.01.2020    29    46
03.01.2020    46    61

The code I run on the DataFrame:

# assuming PCA is scikit-learn's implementation
from sklearn.decomposition import PCA

# df = above mentioned dataframe

pca = PCA(n_components=2)
transformed_df = pca.fit_transform(df)


What I want (I think) is a column-wise reduction of the dimensionality: an array containing one array per column, holding the reduced representation of that column. So len(transformed_df) should equal the number of columns I have (2 in the example above). What I get instead is an array with one array per row of the DataFrame (len(transformed_df) equals my number of rows, which would be 3 in the example above). So my question is: how do I perform the dimensionality reduction per column rather than per row?

(Addition: my real data is normalized; the numbers above are arbitrary.)


Solution

  • Well, it turns out that simply transposing the DataFrame with df.transpose() was enough:

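    # assuming PCA is scikit-learn's implementation
    from sklearn.decomposition import PCA

    # df = above mentioned dataframe

    # transpose so that each district (column) becomes one sample (row)
    pca = PCA(n_components=2)
    transformed_df = pca.fit_transform(df.transpose())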
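
To make the difference between the two calls visible, here is a minimal sketch that compares the output shapes. It assumes scikit-learn's PCA and uses made-up random data (30 dates, 5 districts); the district codes 1003 to 1005 and the names by_date, by_district, and reduced are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Made-up stand-in for the real data: 30 dates (rows) x 5 districts (columns).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=30, freq="D")
districts = [1001, 1002, 1003, 1004, 1005]
df = pd.DataFrame(rng.random((30, 5)), index=dates, columns=districts)

# PCA treats every row as one sample, so this yields one reduced vector per date.
by_date = PCA(n_components=2).fit_transform(df.to_numpy())

# After transposing, every district is one sample: one reduced vector per district.
# to_numpy() passes plain values, so the date labels play no role in the fit.
by_district = PCA(n_components=2).fit_transform(df.T.to_numpy())

# Re-attach the district codes so each result row can be traced back to its column.
reduced = pd.DataFrame(by_district, index=df.columns, columns=["pc1", "pc2"])

print(by_date.shape)      # one row per date
print(by_district.shape)  # one row per district
```

One thing to keep in mind: after transposing, the districts are the samples, so with scikit-learn's default solver n_components cannot exceed the smaller of the number of districts and the number of dates.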