The v3 field holds string values. I cannot run the code below; it fails with the error shown after it.
import numpy as np
import pandas as pd
from numba.experimental import jitclass
from numba import types
import os
os.environ['NUMBA_VERBOSE'] = '1'
# ----- BEGINNING OF THE MODIFIED PART ----- #
recordType = types.Record([
('v', {'type': types.int64, 'offset': 0, 'alignment': None, 'title': None}),
('v2', {'type': types.float64, 'offset': 8, 'alignment': None, 'title': None}),
('v3', {'type': types.bytes, 'offset': 16, 'alignment': None, 'title': None})
], 32, False)
spec = [
('data', types.Array(recordType, 1, 'C', False))
]
# ----- END OF THE MODIFIED PART ----- #
@jitclass(spec)
class Test:
    def __init__(self, data):
        self.data = data

    def loop(self):
        v = self.data['v']
        v2 = self.data['v2']
        v3 = self.data['v3']
        print("Inside loop:")
        print("v:", v)
        print("v2:", v2)
        print("v3:", v3)
# Create a dictionary with the data
data = {'v': [1, 2, 3], 'v2': [1.0, 2.0, 3.0], 'v3': ['a', 'b', 'c']}
# Create the DataFrame
df = pd.DataFrame(data)
# Define the structured array dtype
dtype = np.dtype([
('v', np.int64),
('v2', np.float64),
('v3', 'S10') # Byte string with maximum length of 10 characters
])
print(df.to_records(index=False))
# Create the structured array
data_array = np.array(list(df.to_records(index=False)), dtype=dtype)
print("Original data array:")
print(data_array)
# Create an instance of the Test class
test = Test(data_array)
test.loop()
Errors:
/home/totaljj/miniconda3/bin/conda run -n bt --no-capture-output python /home/totaljj/bt_lite_strategies/test/test_units/test_numba_obj.py
Traceback (most recent call last):
File "/home/totaljj/bt_lite_strategies/test/test_units/test_numba_obj.py", line 13, in <module>
('v3', {'type': types.bytes, 'offset': 16, 'alignment': None, 'title': None})
AttributeError: module 'numba.core.types' has no attribute 'bytes'
ERROR conda.cli.main_run:execute(124): `conda run python /home/totaljj/bt_lite_strategies/test/test_units/test_numba_obj.py` failed. (See above for error)
Process finished with exit code 1
None of Numba 0.57.1, 0.58.1, or 0.59.1 has a types.bytes type. Use types.CharSeq(10) here instead, which matches the S10 NumPy type. The record size is also wrong: it should be 26 rather than 32, since the string takes 10 bytes and the two other fields take 8 bytes each (with no alignment padding).
Here is the modified code:
import numpy as np
import pandas as pd
from numba.experimental import jitclass
from numba import types
import os
os.environ['NUMBA_VERBOSE'] = '1'
# ----- BEGINNING OF THE MODIFIED PART ----- #
recordType = types.Record([
('v', {'type': types.int64, 'offset': 0, 'alignment': None, 'title': None}),
('v2', {'type': types.float64, 'offset': 8, 'alignment': None, 'title': None}),
('v3', {'type': types.CharSeq(10), 'offset': 16, 'alignment': None, 'title': None})
], 26, False)
spec = [
('data', types.Array(recordType, 1, 'C', False))
]
# ----- END OF THE MODIFIED PART ----- #
@jitclass(spec)
class Test:
    def __init__(self, data):
        self.data = data

    def loop(self):
        v = self.data['v']
        v2 = self.data['v2']
        v3 = self.data['v3']
        print("Inside loop:")
        print("v:", v)
        print("v2:", v2)
        print("v3:", v3)
# Create a dictionary with the data
data = {'v': [1, 2, 3], 'v2': [1.0, 2.0, 3.0], 'v3': ['a', 'b', 'c']}
# Create the DataFrame
df = pd.DataFrame(data)
# Define the structured array dtype
dtype = np.dtype([
('v', np.int64),
('v2', np.float64),
('v3', 'S10') # Byte string with maximum length of 10 characters
])
print(df.to_records(index=False))
# Create the structured array
data_array = np.array(list(df.to_records(index=False)), dtype=dtype)
print("Original data array:")
print(data_array)
# Create an instance of the Test class
test = Test(data_array)
test.loop()
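As a quick sanity check of the layout, NumPy itself can report the record size and the field offsets for the structured dtype. This is only a minimal sketch using the same dtype as above; it does not involve Numba at all, it just shows where the numbers 0, 8, 16 and 26 come from.

```python
import numpy as np

# Same structured dtype as in the code above (packed, no alignment padding).
dtype = np.dtype([
    ('v', np.int64),
    ('v2', np.float64),
    ('v3', 'S10'),
])

# Total size in bytes of one record: 8 + 8 + 10.
print("itemsize:", dtype.itemsize)

# Byte offset of each field inside a record.
for name in dtype.names:
    field_dtype, offset = dtype.fields[name][:2]
    print(name, field_dtype, offset)
```

If I am not mistaken, numba.from_dtype(dtype) can also build the record type directly from the NumPy dtype, which avoids writing the offsets and size by hand.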
Note that converting a DataFrame to records can be expensive when it has many columns, since the default internal layout in Pandas (the one used here) is generally a dict of (NumPy) arrays. Records use a transposed layout, which is only good when iterating row by row and reading most fields. Besides, records tend to prevent low-level vectorization, that is, the use of SIMD instructions (which can make code a lot faster), though not all code benefits from it. With few columns, it is often better to use multiple arrays as Pandas does internally (especially when strings are involved). For more information, read about Structure of Arrays (SoA) vs Array of Structures (AoS).
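To make the layout difference concrete, here is a small NumPy-only sketch (the variable names are mine, purely for illustration). It shows that a field taken from the record array is a strided view jumping over whole records, while a per-column array is contiguous.

```python
import numpy as np

# Array of Structures (AoS): one packed 26-byte record per row.
dtype = np.dtype([('v', np.int64), ('v2', np.float64), ('v3', 'S10')])
records = np.array(
    [(1, 1.0, b'a'), (2, 2.0, b'b'), (3, 3.0, b'c')],
    dtype=dtype,
)

# A field view steps over whole records, so it is strided, not contiguous.
aos_v2 = records['v2']
print("AoS:", aos_v2.strides, aos_v2.flags['C_CONTIGUOUS'])

# Structure of Arrays (SoA): one contiguous array per column (this copies).
soa_v = np.ascontiguousarray(records['v'])
soa_v2 = np.ascontiguousarray(records['v2'])
print("SoA:", soa_v2.strides, soa_v2.flags['C_CONTIGUOUS'])

# Work touching a single column only reads that column's memory.
total = soa_v2.sum()
print("sum of v2:", total)
```

With the SoA layout, each array could be passed to the jitted code as its own argument or class field instead of one record array.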