I am trying to use QEMU in user mode to run an ARMv9 SME program, but execution fails with the following error:

    qemu: uncaught target signal 4 (Illegal instruction) - core dumped

QEMU version: qemu-9.0.2
Target architecture: ARMv9

    matmul_opt:  // x0: M, x1: K, x2: N, x3: matLeft, x4: matRight, x5: matResult
        stp     x19, x20, [sp, #-48]!
        stp     x21, x22, [sp, #16]
        stp     x23, x24, [sp, #32]
        smstart

        // constants
        cntw    x6                      // SVLs
        mul     x22, x6, x1             // SVLs*K
        mul     x23, x6, x2             // SVLs*N
        add     x18, x23, x2            // SVLs*N + N
        add     x11, x4, x2, lsl #2     // Exit condition for N loop
        mov     x12, #0
        cntb    x6                      // SVLb
        mov     x14, #0
        ptrue   pn10.b                  // Predicate for SME2 VLx2 (a_ptr loads)
        whilelt pn8.s, x12, x0, vlx2    // tiles predicate (M dimension)
        sub     w6, w6, #8              // SVLb-8

    .Loop_M:
        // Extract tile 0/1 and tile 2/3 predicates (M) from vlx2 predicate.
        pext    { p2.s, p3.s }, pn8[0]
        mov     x16, x4                 // b_base
        mov     x9, x5                  // c_base
        whilelt pn9.b, x16, x11, vlx2   // tiles predicate (N dimension)

    .Loop_N:
        mov     x7, x3                  // a_ptr = a_base
        mov     x17, x16                // b_ptr = b_base
        mov     x10, x9                 // c_ptr0 = c_base
        // Extract tile 0/2 and tile 1/3 predicates (N) from vlx2 predicate.
        pext    { p0.b, p1.b }, pn9[0]
        add     x8, x3, x22, lsl #2     // a_base + SVLs*K FP32 elms (bytes)
        addvl   x15, x8, #-1            // Exit condition for K loop
        ld1w    {z1.s}, p2/z, [x7]      // Load 1st vector from a_ptr
        zero    {za}
        ld1w    {z2.s-z3.s}, pn9/z, [x17]               // Load 2 vectors from b_ptr
        fmopa   za0.s, p2/m, p0/m, z1.s, z2.s           // ZA0+=1st a_ptr vec OP 1st b_ptr vec
        ld1w    {z5.s}, p3/z, [x7, x22, lsl #2]         // Load 2nd vector from a_ptr
        addvl   x7, x7, #1              // a_ptr += SVLb (bytes)

    .Loop_K:
        fmopa   za2.s, p3/m, p0/m, z5.s, z2.s           // ZA2+=2nd a_ptr vec OP 1st b_ptr vec
        // ... (rest of the loop body)
        cmp     x7, x15
        b.mi    .Loop_K
        // ... (rest of the N loop body)

    .Loop_store_ZA:
        // ... (rest of the store loop body)
        cmp     w13, w6
        b.mi    .Loop_store_ZA
        // ... (rest of the M loop body)
        smstop
        ldp     x23, x24, [sp, #32]
        ldp     x21, x22, [sp, #16]
        ldp     x19, x20, [sp], #48
        ret

    aarch64-none-elf-as -march=armv9.2-a -mcpu=cortex-a710+sme2 -o test_sme test_sme.S
    qemu-aarch64 test_sme

Running this program produces the "Illegal instruction" error. What could be the reason? Does my version of QEMU support the SME2 instructions I'm using, and how can I resolve this issue?
Any help would be greatly appreciated!
The instruction "ptrue pn10.b" is defined by FEAT_SVE2p1 (and, in Streaming SVE mode, by FEAT_SME2), neither of which QEMU implements yet. The same applies to the predicate-as-counter form of "whilelt" you are using, to "pext", and to the multi-vector "ld1w {z2.s-z3.s}, pn9/z, [x17]". So you get an "illegal instruction" error, exactly as you would if you ran this binary on a real CPU that didn't implement those extensions. If you want to run your code under QEMU, you should stick to FEAT_SME/FEAT_SVE/FEAT_SVE2 for the moment. (FEAT_SVE2p1 and FEAT_SME2 are on the todo list but may be a while yet.)
The architectural features QEMU currently emulates are listed here: https://www.qemu.org/docs/master/system/arm/emulation.html