I would like to use the RISC-V vector extension (RVV) in my C++ application. I noticed that RVV types (e.g., "vuint32m1_t") cannot be used as class members. Compiling with gcc (v13.2.0), I get the following error:
error: member variables cannot have RVV type 'vuint32m1_t'
I could not find references online.
I think that the solution could be to use standard types (e.g., uint32_t) for class members and convert them into RVV types whenever I need to perform a vector operation. I believe that this solution may degrade performance.
Does anyone have any other ideas?
RVV vector types are vector length agnostic: the vector length is not known at compile time. You can't put RVV vector types directly into classes, because the size of a class or struct needs to be known at compile time.
Usually you don't need to store vectors in classes; you store the data instead and load from it when needed. Otherwise, you could heap-allocate enough space for a vector by querying the vector length at runtime. Both of these methods are problematic if you need to load and store the same data a lot, since gcc and clang can't yet eliminate redundant predicated vector loads and stores.
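A minimal sketch of the first approach, storing plain elements and loading them into vector registers only inside the method that needs them. The class name Accumulator and its methods are made up for illustration, and the scalar fallback is only there so that the snippet also builds on targets without RVV:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical class: it stores plain uint32_t elements, never an RVV type.
class Accumulator {
public:
    explicit Accumulator(std::size_t n) : data_(n, 0u) {}

    // Adds src[0..size) element-wise into the stored data.
    // src must point to at least data().size() elements.
    void add(const std::uint32_t* src) {
        std::size_t n = data_.size();
        std::uint32_t* dst = data_.data();
#if defined(__riscv_vector)
        // Strip-mined loop: vsetvl decides how many elements this pass handles,
        // so the same code works for any hardware vector length.
        while (n > 0) {
            std::size_t vl = __riscv_vsetvl_e32m1(n);
            vuint32m1_t a = __riscv_vle32_v_u32m1(dst, vl);  // vector is a local
            vuint32m1_t b = __riscv_vle32_v_u32m1(src, vl);
            __riscv_vse32_v_u32m1(dst, __riscv_vadd_vv_u32m1(a, b, vl), vl);
            dst += vl;
            src += vl;
            n -= vl;
        }
#else
        // Scalar fallback for targets without the vector extension.
        for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
#endif
    }

    const std::vector<std::uint32_t>& data() const { return data_; }

private:
    std::vector<std::uint32_t> data_;  // plain storage, size known to the class
};
```

The vector values exist only as locals inside add(), which is allowed. The cost is the load and store on every call, as mentioned above.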
If you target one specific vector length, and only that vector length, then you can use the riscv_rvv_vector_bits attribute to declare fixed-length vector types, which can be placed inside structs and classes. Please don't use this if you can avoid it; it makes your code non-portable.
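A rough sketch of what the fixed-length variant could look like. It assumes a compiler that supports the attribute and defines the __riscv_v_fixed_vlen macro once a fixed vector length has been selected (for example via Clang's -mrvv-vector-bits option); check your compiler's documentation before relying on it. The names fixed_u32m1_t, Lanes and laneCount are made up, and the array fallback only exists so the snippet builds elsewhere:

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__riscv_vector) && defined(__riscv_v_fixed_vlen)
#include <riscv_vector.h>
// Fixed-length alias of vuint32m1_t. For LMUL=1 the bit count has to match
// the vector length the compiler was told to assume.
typedef vuint32m1_t fixed_u32m1_t
    __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen)));

// Allowed: the fixed-length type has a size known at compile time.
struct Lanes {
    fixed_u32m1_t v;
};
#else
// Fallback when no fixed vector length is configured (assumed 4 lanes).
struct Lanes {
    std::uint32_t v[4];
};
#endif

// Number of 32-bit lanes that fit in the member.
inline std::size_t laneCount() { return sizeof(Lanes) / sizeof(std::uint32_t); }
```

A binary built this way is only correct on hardware with exactly that vector length, which is why it should be a last resort.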
Do you actually need to put vectors directly into a class for your problem?
It's often possible to change the API such that it isn't needed, e.g. a forEach callback function instead of an iterator API, but sometimes you can't change the API.
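To illustrate the forEach idea, here is a small sketch under my own assumptions: Buffer and forEachChunk are hypothetical names, and the callback receives a pointer plus a chunk length. An iterator object would have to carry the traversal state (and possibly a vector) as members; here that state lives in locals inside forEachChunk, and any vector values can be locals inside the callback.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical container that exposes chunked traversal through a callback.
class Buffer {
public:
    explicit Buffer(std::vector<std::uint32_t> v) : data_(std::move(v)) {}

    // Calls f(ptr, vl) once per chunk; vl is the number of elements in it.
    template <typename F>
    void forEachChunk(F&& f) const {
        const std::uint32_t* p = data_.data();
        std::size_t n = data_.size();
        while (n > 0) {
#if defined(__riscv_vector)
            // Let the hardware pick the chunk size for 32-bit elements.
            std::size_t vl = __riscv_vsetvl_e32m1(n);
#else
            // Fallback: fixed chunks of up to 4 elements (arbitrary choice).
            std::size_t vl = n < 4 ? n : 4;
#endif
            f(p, vl);  // the callback may load p[0..vl) into a local vector
            p += vl;
            n -= vl;
        }
    }

private:
    std::vector<std::uint32_t> data_;
};
```

Inside the callback you can load the chunk with the vector intrinsics and process it; nothing of RVV type ever has to be stored in a class.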
I'm trying to collect cases where you really need this feature, because there is a solution that gets you 95% there, but it isn't trivial to implement. Compilers could provide a type that has a fixed size of 512 bits and generate code in such a way that, when your vector length is smaller, you just access the lower bits, and when your vector length is larger, you don't use the extra bits of your vectors. This gives you fully vector length agnostic code that scales from VLEN=128 to VLEN=512. It would also run on VLEN>512, although without taking full advantage of your hardware.