
Python ORM to NumPy arrays


I am building a data simulation framework with a NumPy-backed ORM, because it is much more convenient to work with classes and objects than with NumPy arrays directly. Nevertheless, the output of the simulation should be a NumPy array. bcolz also looks quite interesting as a backend here.

I would like to map all object attributes to NumPy arrays, so that the arrays act as column-oriented "persistent" storage for my classes. I also need to attach "new" attributes to the objects: attributes that I calculate with NumPy (or pandas) and then link to the objects through the same backend.

Is there an existing solution for such an approach? Would you recommend a way to build it in an HPC-friendly way? So far I have found only django-pandas. PyTables is quite slow at adding new columns (attributes).

Something like (working on pointers to np_array):

class Instance:
    def __init__(self, np_array, np_position):
        self.np_array = np_array        # shared backing array (not copied)
        self.np_position = np_position  # index of this object's element

    def get_test_property(self):
        return self.np_array[self.np_position]

    def set_test_property(self, value):
        self.np_array[self.np_position] = value
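
A side note on the "pointer" idea in the sketch above: whether a piece of a NumPy array stays linked to the original depends on how it was obtained. The following is a minimal check, not part of the original post; the variable names are chosen only for illustration. Basic slicing returns a view, while integer-list ("fancy") indexing returns a copy, and a plain integer index returns a detached scalar.

```python
import numpy as np

a = np.arange(10)

view = a[2:3]     # basic slice: a view onto a's memory
copy = a[[2]]     # fancy (list) indexing: an independent copy
scalar = a[2]     # integer index: a NumPy scalar, detached from a

view[:] = 100     # writes through to a
copy[:] = -1      # does not affect a

# np.shares_memory reports whether two arrays overlap in memory
shares_view = np.shares_memory(a, view)
shares_copy = np.shares_memory(a, copy)
```

This is why holding the array plus a position (as in the class above) or holding a one-element slice both work, whereas holding the result of `a[2]` would not.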

Solution

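  • In fact, there is a way to change NumPy or bcolz arrays by reference. For NumPy, the following code gives a simple example: each object holds a one-element slice, which is a view into the original array, so writing into the slice changes the array.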

    import numpy as np

    a = np.arange(10)

    class Case:
        def __init__(self, gcv_pointer):
            # gcv_pointer is a one-element slice, i.e. a view into `a`
            self.gcv = gcv_pointer

        def GetGCV(self):
            return self.gcv

        def SetGCV(self, value):
            # In-place assignment writes through the view into `a`
            self.gcv[:] = value

    #===============================================================================
    # NumPy
    #===============================================================================
    caseList = []
    for i in range(1, 10):
        case = Case(a[i-1:i])   # one-element view on a[i-1]
        caseList.append(case)
    gcvs = [case.GetGCV() for case in caseList]
    caseList[1].SetGCV(5)
    caseList[1].SetGCV(13)
    caseList[1].gcv[:] = 6      # a[1] is now 6

    # Link a "new" attribute to an existing object, backed by the same array
    setattr(caseList[1], 'dpd', a[5:6])

    caseList[1].dpd
    caseList[1].dpd[:] = 888    # a[5] is now 888
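
The same idea can be extended so that every attribute, including ones added later, lives in its own column. What follows is only a minimal sketch under stated assumptions: `ColumnStore` and `Row` are hypothetical names invented for this illustration, not part of NumPy, bcolz, or any ORM library, and the store keeps plain in-memory NumPy arrays.

```python
import numpy as np


class ColumnStore:
    """Hypothetical column-oriented store: one NumPy array per attribute."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.columns = {}

    def add_column(self, name, values):
        # Copy the values into an array owned by the store, one value per row.
        arr = np.array(values)
        if arr.shape != (self.n_rows,):
            raise ValueError("column must have shape (n_rows,)")
        self.columns[name] = arr

    def row(self, index):
        return Row(self, index)


class Row:
    """Hypothetical proxy object: attribute access reads/writes the store."""

    def __init__(self, store, index):
        # Bypass our own __setattr__ for the two bookkeeping fields.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for column names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._store.columns[name][self._index]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name not in self._store.columns:
            raise AttributeError(name)
        self._store.columns[name][self._index] = value


store = ColumnStore(n_rows=3)
store.add_column("gcv", [1.0, 2.0, 3.0])
rows = [store.row(i) for i in range(3)]
rows[1].gcv = 13.0   # writes into store.columns["gcv"][1]

# A derived column computed in bulk with NumPy is visible on every object.
store.add_column("dpd", store.columns["gcv"] * 2.0)
```

Because each `Row` looks the column up by name on every access, replacing or adding a column in the store never leaves objects pointing at a stale array, at the cost of a dictionary lookup per attribute access.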