
How to add a dimension to an array and fill the new dimension with the same set of data


I have two 1D arrays. I need to combine each element of the first array (a) with every element of the second array (b), concatenating the strings, to create a new 1D array that merges the two arrays.

Here is an example to make this clearer:

import numpy as np

a = np.array(['x', 'y'])
b = np.array(['a', 'b', 'c'])
# How can the 1D arrays above be combined to create the array c below?
c = np.array(['xa', 'xb', 'xc', 'ya', 'yb', 'yc'])
print(c)

The new array c would look like:

['xa' 'xb' 'xc' 'ya' 'yb' 'yc']

Of course, I can do this with loops, but I'm looking for a smarter approach. Thank you.
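For reference, a plain nested-loop version might look like the sketch below. The question does not show the actual loop, so this is only an assumed baseline; `pieces` is a hypothetical name.

```python
import numpy as np

a = np.array(['x', 'y'])
b = np.array(['a', 'b', 'c'])

# Nested loops: join every element of a with every element of b
pieces = []
for i in a:
    for j in b:
        pieces.append(i + j)  # i and j are np.str_ (a str subclass), so + concatenates
c = np.array(pieces)
print(c)  # ['xa' 'xb' 'xc' 'ya' 'yb' 'yc']
```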


Solution

  • For two lists, a simple approach is a list comprehension:

    In [234]: a = ['x', 'y']
         ...: b = ['a', 'b', 'c']
    In [235]: [i+j for i in a for j in b]
    Out[235]: ['xa', 'xb', 'xc', 'ya', 'yb', 'yc']
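    The same pairing can also be written with `itertools.product` from the standard library. This is an alternative to the comprehension, not part of the original answer; `product` yields pairs with the first iterable varying slowest, so the order matches.

```python
import itertools

a = ['x', 'y']
b = ['a', 'b', 'c']

# product(a, b) yields ('x', 'a'), ('x', 'b'), ... ,('y', 'c')
c = [i + j for i, j in itertools.product(a, b)]
print(c)  # ['xa', 'xb', 'xc', 'ya', 'yb', 'yc']
```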
    

    For arrays, you can use np.char.add, as shown in the other answers:

    In [236]: A=np.array(a); B=np.array(b)
    In [237]: np.char.add(A[:,None],B)
    Out[237]: 
    array([['xa', 'xb', 'xc'],
           ['ya', 'yb', 'yc']], dtype='<U2')
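    Note that this result is 2D with shape (2, 3), while the question asks for a 1D array. A minimal sketch of the extra flattening step, assuming row-major order is wanted (it matches the expected `c`):

```python
import numpy as np

A = np.array(['x', 'y'])
B = np.array(['a', 'b', 'c'])

# A[:, None] has shape (2, 1); broadcasting against B (shape (3,)) gives (2, 3)
pairs = np.char.add(A[:, None], B)
# ravel flattens in row-major (C) order: all of row 'x', then all of row 'y'
c = pairs.ravel()
print(c)  # ['xa' 'xb' 'xc' 'ya' 'yb' 'yc']
```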
    

    Timeit results on such a small example have to be viewed with caution. List operations are often faster for small examples, but they don't scale nearly as well. On the other hand, I expect np.char.add to hurt the array scaling, since the np.char functions just apply standard string methods to the array elements.
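    To check the scaling on your own machine, a sketch like the following times both approaches on larger inputs with the standard `timeit` module. The sizes (200 and 300), the generated strings, and the function names are arbitrary assumptions for illustration; no timings are claimed, since they depend on hardware and NumPy version.

```python
import timeit
import numpy as np

# Hypothetical larger inputs, sized like the numeric example further down
a = [f'x{i}' for i in range(200)]
b = [f'y{j}' for j in range(300)]
A = np.array(a)
B = np.array(b)

def with_list():
    return [i + j for i in a for j in b]

def with_char_add():
    return np.char.add(A[:, None], B)

# Make sure both give the same strings in the same order before timing
assert with_char_add().ravel().tolist() == with_list()

for name, fn in [('list comprehension', with_list), ('np.char.add', with_char_add)]:
    per_call = timeit.timeit(fn, number=10) / 10
    print(f'{name}: {per_call * 1e6:.1f} microseconds per call')
```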

    In [238]: timeit np.char.add(A[:,None],B)
    23.2 µs ± 57.4 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    In [239]: timeit [i+j for i in a for j in b]
    1.55 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
    

    If we specify object dtype when making the arrays, we can use the + operator and gain some speed:

    In [240]: A=np.array(a,object); B=np.array(b,object)    
    In [241]: A[:,None]+B
    Out[241]: 
    array([['xa', 'xb', 'xc'],
           ['ya', 'yb', 'yc']], dtype=object)    
    In [242]: timeit A[:,None]+B
    7.39 µs ± 76.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
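    This result is still 2D and has dtype=object. If a 1D fixed-width string array like the question's `c` is needed, one option (a sketch; the conversion adds its own cost and was not part of the timing above) is to flatten and convert back:

```python
import numpy as np

a = ['x', 'y']
b = ['a', 'b', 'c']

# With object dtype, + calls Python's str concatenation on each pair of elements
A = np.array(a, dtype=object)
B = np.array(b, dtype=object)

c = (A[:, None] + B).ravel()  # 1D object array
c_str = c.astype(str)         # optional: back to a fixed-width unicode array
print(c_str)  # ['xa' 'xb' 'xc' 'ya' 'yb' 'yc']
```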
    

    For reference, adding two numeric arrays:

    In [245]: %%timeit x=np.arange(2); y=np.arange(3)
         ...: x[:,None]+y
    5.95 µs ± 8.71 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    In [246]: %%timeit x=np.arange(200); y=np.arange(300)
         ...: x[:,None]+y
    100 µs ± 533 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    

    The second case produces an array 10,000 times larger (200*300 = 60,000 elements vs. 2*3 = 6), but the time increases only about 17x (100 µs vs. 5.95 µs).
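    The [:, None] indexing used throughout is what makes the pairing work: it inserts a length-1 axis so that broadcasting combines every element of the first array with every element of the second. A small numeric sketch of the shapes involved:

```python
import numpy as np

x = np.arange(2)   # shape (2,)
y = np.arange(3)   # shape (3,)

col = x[:, None]   # None (same as np.newaxis) adds an axis: shape (2, 1)
print(col.shape)   # (2, 1)

# Broadcasting: (2, 1) + (3,) -> (2, 1) + (1, 3) -> (2, 3)
out = col + y
print(out.shape)   # (2, 3)
print(out)
# [[0 1 2]
#  [1 2 3]]
```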