python pandas numpy

How to list a 2d array in a tabular form along with two 1d arrays from which it was generated?


I'm trying to calculate a 2d variable z = x + y, where x and y are 1d arrays of unequal length (say, x- and y-coordinate points on a spatial grid). I'd like to display the result row by row, with the values of x and y in the first two columns and the corresponding value of z in the third, something like the following for x = [1, 2] and y = [3, 4, 5]:

x  y  z
1  3  4
1  4  5
1  5  6
2  3  5
2  4  6
2  5  7

The code below works (using lists here, but will probably need numpy arrays later):

import pandas as pd

x = [1, 2]
y = [3, 4, 5]
col1 = []
col2 = []
z = []
for i in range(len(x)):
    for j in range(len(y)):
        col1.append(x[i])
        col2.append(y[j])
        z.append(x[i]+y[j])

df = pd.DataFrame(zip(col1, col2, z), columns=["x", "y", "z"])
print(df)

Just wondering: is there a better way of doing this without the loop, using some combination of meshgrid, indices, flatten, v/hstack, and reshape? x and y will typically have around 100 elements each.


Solution

  • Here is one way:

    import numpy as np
    import pandas as pd
    x = np.asarray([1, 2])[:, np.newaxis]
    y = np.asarray([3, 4, 5])
    x, y = np.broadcast_arrays(x, y)
    z = x + y
    df = pd.DataFrame(zip(x.ravel(), y.ravel(), z.ravel()), columns=["x", "y", "z"])
    print(df)
    #    x  y  z
    # 0  1  3  4
    # 1  1  4  5
    # 2  1  5  6
    # 3  2  3  5
    # 4  2  4  6
    # 5  2  5  7
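
To see why the broadcasting approach lines the columns up correctly, it may help to print the shapes at each step. The sketch below uses the same inputs; the names `xb` and `yb` are only illustrative and do not appear in the answer above.

```python
import numpy as np

x = np.asarray([1, 2])[:, np.newaxis]  # shape (2, 1): a column
y = np.asarray([3, 4, 5])              # shape (3,): a row

# Broadcasting stretches both arrays to the common shape (2, 3).
xb, yb = np.broadcast_arrays(x, y)
z = xb + yb

print(x.shape, y.shape, xb.shape, z.shape)
# (2, 1) (3,) (2, 3) (2, 3)

# ravel() flattens in row-major order, so x varies slowest and y fastest.
print(xb.ravel().tolist())  # [1, 1, 1, 2, 2, 2]
print(yb.ravel().tolist())  # [3, 4, 5, 3, 4, 5]
print(z.ravel().tolist())   # [4, 5, 6, 5, 6, 7]
```

Since `x + y` broadcasts on its own, the explicit `np.broadcast_arrays` call is only needed to obtain the expanded x and y columns for the table.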
    

    But yes, you can also use meshgrid instead of orthogonal arrays plus explicit broadcasting. You can also stay in NumPy for the final table instead of building a pandas DataFrame:

    x = np.asarray([1, 2])
    y = np.asarray([3, 4, 5])
    x, y = np.meshgrid(x, y, indexing='ij')
    z = x + y
    print(np.stack((x.ravel(), y.ravel(), z.ravel())).T)
    # [[1 3 4]
    #  [1 4 5]
    #  [1 5 6]
    #  [2 3 5]
    #  [2 4 6]
    #  [2 5 7]]
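
As a further variation, not part of the answer above, the two coordinate columns can be built directly with `np.repeat` and `np.tile`, which avoids the intermediate 2d arrays. This is a minimal sketch assuming the same x and y; `col_x` and `col_y` are illustrative names.

```python
import numpy as np
import pandas as pd

x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])

# Repeat each x value len(y) times; tile the whole y array len(x) times.
col_x = np.repeat(x, len(y))  # [1, 1, 1, 2, 2, 2]
col_y = np.tile(y, len(x))    # [3, 4, 5, 3, 4, 5]

# Passing a dict keeps the column names next to their data.
df = pd.DataFrame({"x": col_x, "y": col_y, "z": col_x + col_y})
print(df)
```

This only works when z can be computed element-wise from the flattened columns, as with x + y here; if the 2d shape of z matters later, the meshgrid or broadcasting versions are probably the better fit.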