I'm trying to reduce the dimensionality of time series data of Covid cases. I have the Covid cases in the form of a dataframe with a row for each date and a column for each district. I now want to reduce the dimensions in order to remove the time warping of the data. My dataframe looks like this:
Date | 1001 | 1002 |
---|---|---|
01.01.2020 | 35 | 57 |
02.01.2020 | 29 | 46 |
03.01.2020 | 46 | 61 |
The code I run on the dataframe:
from sklearn.decomposition import PCA
# df = above mentioned dataframe
pca = PCA(n_components=2)
transformed_df = pca.fit_transform(df)
What I want to receive (I think) is a column-wise reduction of the dimensionality: an array containing one inner array per column, holding the result of the dimension reduction for that column. So len(transformed_df) should equal the number of columns I have (2 in the example above). What I receive instead is, I think, an array containing one inner array per row of the dataframe, since len(transformed_df) equals my number of rows (3 in the example above). So my question is: how do I perform the dimension reduction per column rather than per row?
(Addition: My data is normalized; I chose the numbers above randomly.)
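Well, it turns out that simply using df.transpose() was enough. scikit-learn treats each row as a sample and each column as a feature, so after transposing, every district becomes one sample:
# df = above mentioned dataframe
pca = PCA(n_components=2)
transformed_df = pca.fit_transform(df.transpose())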
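To make the difference visible, here is a minimal sketch that rebuilds the three-row example from the question and compares the output shapes with and without the transpose. The variable names (per_date, per_district, reduced) and the column labels pc1 and pc2 are only illustrative and not part of the original code.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Toy data copied from the question: one row per date, one column per district.
df = pd.DataFrame(
    {"1001": [35, 29, 46], "1002": [57, 46, 61]},
    index=["01.01.2020", "02.01.2020", "03.01.2020"],
)

# scikit-learn treats rows as samples, so this yields one result row per date.
per_date = PCA(n_components=2).fit_transform(df)
print(per_date.shape)  # (3, 2)

# After transposing, each district is a sample and each date is a feature.
per_district = PCA(n_components=2).fit_transform(df.T)
print(per_district.shape)  # (2, 2)

# Optional: keep the district labels attached to the reduced coordinates.
reduced = pd.DataFrame(per_district, index=df.columns, columns=["pc1", "pc2"])
print(reduced)
```

One thing to keep in mind: after the transpose the number of samples is the number of districts, so n_components can be at most min(number of districts, number of dates). Because PCA centers the data first, at most (number of districts - 1) components can carry any variance.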