How to reduce the dimension of CSV file?

Suppose I have one CSV file with dimension m×n means m rows and n columns. I want to reduce its dimension by replacing average value of corresponding sub matrix.

Example 1:

Given we have 6×6 matrix (CSV file):

col1,col2,col3,col4,col5,col6
a1,b1,c1,d1,e1, f1
a2,b2,c2,d2,e2, f2
a3,b3,c3,d3,e3, f3
a4,b4,c4,d4,e4, f4
a5,b5,c5,d5,e5, f5
a6,b6,c6,d6,e6, f6

If we want 2×2 matrix, then resultant CSV file should be below:

col1, col2
a',  d'
a", d"

Where

a'=(a1+a2+a3+b1+b2+b3+c1+c2+c3)/9
a"=(a4+a5+a6+b4+b5+b6+c4+c5+c6)/9
d'=(d1+d2+d3+e1+e2+e3+f1+f2+f3)/9
d"=(d4+d5+d6+e4+e5+e6+f4+f5+f6)/9

Example:2

Given we have 5×6 matrix (CSV file):

col1,col2,col3,col4,col5,col6
a1,b1,c1,d1,e1, f1
a2,b2,c2,d2,e2, f2
a3,b3,c3,d3,e3, f3
a4,b4,c4,d4,e4, f4
a5,b5,c5,d5,e5, f5

If we want 2×2 matrix, then resultant CSV file should be below:

col1, col2
a',  d'
a", d"

Where

a'=(a1+a2+a3+b1+b2+b3+c1+c2+c3)/9
a"=(a4+a5+b4+b5+c4+c5)/6
d'=(d1+d2+d3+e1+e2+e3+f1+f2+f3)/9
d"=(d4+d5+e4+e5+f4+f5)/6

Example 3: Given we have 6×5 matrix (CSV file):

col1,col2,col3,col4,col5,col6
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
a3,b3,c3,d3,e3
a4,b4,c4,d4,e4
a5,b5,c5,d5,e5
a6,b6,c6,d6,e6

If we want 2×2 matrix, then resultant CSV file should be below:

col1, col2
a',  d'
a", d"

Where

a'=(a1+a2+a3+b1+b2+b3+c1+c2+c3)/9
a"=(a4+a5+a6+b4+b5+b6+c4+c5+c6)/9
d'=(d1+d2+d3+e1+e2+e3)/6
d"=(d4+d5+d6+e4+e5+e6)/6

I want the python code which can reduce the dimension by putting the average of the sum of all sub matrix. For in example1, we have given 6×6 matrix, we want 2×2 matrix, so we consider (6÷2) × (6÷2) = 3×3 sub matrix and calculate average of 9 elements of 3×3 matrix , which is the one element of resultant 2×2 matrix and so on.

And in example2, if given dimension isn't multiple of resultant dimension, we use ceiling function, we first start by consider ceiling(5÷2)×(6÷2) = 3×3 matrix, and at the end(corner) , we mayn't get 3×3 matrix, we just calculate average of remaining elements, as we see in example2, example3.

Solution

Assuming this example:

   col1  col2  col3  col4  col5  col6
0     0     1     2     3     4     5
1     6     7     8     9    10    11
2    12    13    14    15    16    17
3    18    19    20    21    22    23
4    24    25    26    27    28    29
5    30    31    32    33    34    35

You could rename the indexes (with set_axis), stack, and groupby.mean:

import math

n, m = 2, 2 # desired shape

out = (df
 .set_axis(np.arange(df.shape[0])//math.ceil(df.shape[0]/n), axis=0)
 .set_axis(np.arange(df.shape[1])//math.ceil(df.shape[1]/m), axis=1)
 .stack().groupby(level=[0, 1]).mean().unstack()
 .rename(columns=lambda x: f'col{x+1}') # optional
)

Alternative using numpy and padding (numpy.pad) with NaNs before reshape and nanmean:

import math

n, m = 2, 2 # desired shape

out = pd.DataFrame(np.nanmean(np.pad(df.astype(float), [(0, df.shape[0]%n), 
                                                        (0, df.shape[1]%m)],
                                     constant_values=np.nan)
                                .reshape(n, math.ceil(df.shape[0]/n), m, -1),
                              axis=(1, 3)
                             )
                  ).rename(columns=lambda x: f'col{x+1}')

Output:

   col1  col2
0   7.0  10.0
1  25.0  28.0

Output with a 5x6 input (last row missing):

   col1  col2
0   7.0  10.0
1  22.0  25.0