python arrays numpy concatenation organization

How to concatenate NumPy arrays to create a 2D NumPy array


I'm working on using AI to give me better odds at winning Keno (don't laugh, lol). My issue is that the data I gather comes in as 1D arrays, one drawing at a time. I have separate files that gather the data, format it, and perform simple maths on the data set. Now I'm trying to get the data into a certain shape for my neural network layers and am having issues.

  formatted_list = file.readlines()
  # remove newline chars
  formatted_list = list(filter(("\n").__ne__, formatted_list))
  # iterate through each drawing, format the ends and split into list of ints
  for i in formatted_list:
      i = i[1:]
      i = i[:-2]
      i = [int(j) for j in i.split(",")]
      # convert to numpy array
      temp = np.array(i)
      #t1 = np.reshape(temp, (-1, len(temp)))
      #print(np.shape(t1))
      # append to master list
      master_list.append(temp)
  print(np.shape(master_list))

This gives output of "(292,)". That is correct in that there are 292 rows of data; however, each row contains 20 columns as well. If I uncomment "#t1 = np.reshape(temp, (-1, len(temp)))" and "#print(np.shape(t1))", it gives output of "(1,20)(1,20)(1,20)(1,20)", etc. I want all of those rows stacked together with the columns kept the same, giving (292,20). How can this be accomplished?

I've tried reshaping the final list and many other things and had no luck. The result ends up with every number in each row added to the first dimension, i.e. (5840,). I was expecting to be able to append each new drawing to a master list, convert it to a numpy array, and reshape it to 292 rows of 20 columns. It just appears that it wants to keep the single dimension. I've tried numpy.concat also, with no luck. Thank you.


Solution

  • You can use np.vstack to stack the arrays in your master_list into a single 2D array.

    master_list = []
    for array in formatted_list:
        master_list.append(array)
    
    master_array = np.vstack(master_list)
    

    Alternatively, if you know the number of arrays in formatted_list and the length of each array, you can preallocate master_array and fill it row by row.

    import numpy as np
    
    # dummy data: 292 independent random rows of length 20
    formatted_list = [np.random.rand(20) for _ in range(292)]
    master_array = np.zeros((len(formatted_list), len(formatted_list[0])))
    for i, array in enumerate(formatted_list):
        master_array[i,:] = array
    

    ** Edit **

    As mentioned by hpaulj in the comments, np.array(), np.stack() and np.vstack() worked with this input and produced a numpy array with shape (7,20).
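
Since stacking works on equal-length rows, one common cause of a shape like (292,) is that the rows are not all the same length: older NumPy releases build a 1D object array in that case, while newer ones raise an error. The following is only a minimal sketch of parsing, checking, and stacking. The sample `lines` and the helper `parse_rows` are hypothetical, and the "[1,2,3]" line format is an assumption inferred from the slicing in the question.

```python
import numpy as np

# Hypothetical sample input. The bracketed, comma-separated format is an
# assumption based on the slicing done in the question's loop.
lines = ["[1,2,3,4]\n", "\n", "[5,6,7,8]\n", "[9,10,11,12]\n"]


def parse_rows(raw_lines):
    """Turn bracketed, comma-separated lines into a list of 1D int arrays."""
    rows = []
    for line in raw_lines:
        text = line.strip().strip("[]")
        if not text:  # skip blank lines
            continue
        rows.append(np.array([int(tok) for tok in text.split(",")]))
    return rows


rows = parse_rows(lines)

# Every row must have the same length before stacking.
lengths = {len(r) for r in rows}
if len(lengths) != 1:
    raise ValueError(f"rows have differing lengths: {sorted(lengths)}")

stacked = np.vstack(rows)
print(stacked.shape)  # (3, 4)

# With equal-length rows, np.array and np.stack give the same shape.
assert np.array(rows).shape == np.stack(rows).shape == stacked.shape
```

If the length check raises, printing the offending lines is usually the quickest way to find a malformed drawing in the input file.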