I was given the following task:
Given two arrays of 16 elements each (NIZA RESW 16 and NIZB RESW 16), store in a third array (NIZC RESW 16) the values NIZC[i]=NIZA[i]+NIZB[i], using MMX instructions and assembling it with NASM.
This is what I got so far:
%include "apicall.inc"
%include "print.inc"
segment .data
unos1 db "Array A: ", 0
unos2 db "Array B: ", 0
ispisC db "Array C : ", 0
segment .bss
NIZA RESW 16
NIZB RESW 16
NIZC RESW 16
segment .text
global start
start:
call init_console
mov esi,0
mov ecx, 16
mov eax, unos1
call print_string
call print_nl
unos_a:
call read_int
mov [NIZA+esi], eax
add esi, 2
loop unos_a
mov esi,0
mov ecx, 16
mov eax, unos2
call print_string
call print_nl
unos_b:
call read_int
mov [NIZB+esi], eax
add esi, 2
loop unos_b
movq mm0, qword [NIZA]
movq mm1, qword [NIZB]
paddq mm0, mm1
movq qword [NIZC], mm0
mov esi,NIZC
mov ecx,16
mov eax, ispisC
call print_string
call print_nl
ispis_c:
mov ax, [esi]
movsx eax, ax
call print_int
call print_nl
add esi, 2
loop ispis_c
APICALL ExitProcess, 0
After assembling the program and testing it with the following two arrays, the third array only stores 4 elements out of 16 (as shown in the following picture).
Does anybody know why it only stores 4 elements out of 16? Any help is appreciated.
In case you have questions about the helper functions: print_string and print_int print the string or integer passed in the EAX register, and print_nl prints a new line. Also note that this is a 32-bit program.
Does anybody know why it only stores 4 elements out of 16?
Because your MMX instructions operate on the first 4 array elements only. An MMX register is 64 bits wide, so a single load, add, and store covers just four 16-bit words. You need a loop to process all 16 array elements.
Your task description doesn't say it, but I see you sign-extend the values from NIZC before printing, so you seem to expect signed results. I also see that you use PADDQ to operate on 4 word-sized inputs. This will not always give correct results! E.g. if NIZA[0]=-1 and NIZB[0]=5, then you will get NIZC[0]=4, but a carry will have propagated from the first word into the second word, leaving NIZC[1] one too high. This will not happen if you use the right version of the packed addition: PADDW.
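To make the carry problem concrete, here is a small C model of the two additions. It is only a sketch of how PADDQ and PADDW treat the same 64 bits, not MMX code; the helper names (model_paddq, model_paddw, pack4, lane_of) are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of PADDQ: one 64-bit addition, so carries cross word boundaries. */
static uint64_t model_paddq(uint64_t a, uint64_t b) {
    return a + b;
}

/* Model of PADDW: four independent 16-bit additions with wraparound. */
static uint64_t model_paddw(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t s = (uint16_t)(x + y);  /* carry out of the lane is dropped */
        r |= (uint64_t)s << (16 * lane);
    }
    return r;
}

/* Pack four words into 64 bits, element 0 in the lowest 16 bits,
   matching the little-endian layout of a word array loaded by MOVQ. */
static uint64_t pack4(const int16_t w[4]) {
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++)
        r |= (uint64_t)(uint16_t)w[lane] << (16 * lane);
    return r;
}

/* Extract one word as a signed value (what MOVSX would hand to print_int). */
static int16_t lane_of(uint64_t v, int lane) {
    assert(lane >= 0 && lane < 4);
    uint16_t u = (uint16_t)(v >> (16 * lane));
    int16_t s;
    memcpy(&s, &u, sizeof s);  /* reinterpret the 16 bits as signed */
    return s;
}
```

With the inputs {-1, 10, 20, 30} and {5, 1, 2, 3}, model_paddw yields the words {4, 11, 22, 33}, while model_paddq yields {4, 12, 22, 33}: the carry out of the first word leaks into the second.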
You got lucky with the size errors on mov [NIZA+esi], eax and mov [NIZB+esi], eax. Each of these stores 4 bytes into a 2-byte element, so it also overwrites the next element. Because NIZA and NIZB follow each other in memory in the same order that you assign to them, no harm was done. If NIZB had been placed before NIZA, then assigning NIZB[15] would have corrupted NIZA[0].
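The effect of the oversized stores can be modelled in C with a flat byte buffer standing in for the .bss section. This is a sketch under assumptions: the buffer layout, the sample values and the helper names (store32, store16, load16, fill) are invented for the illustration, and little-endian byte order is written out by hand to mirror x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { COUNT = 16 };  /* elements per array, 2 bytes each */

/* Model of `mov [base+off], eax`: a little-endian 4-byte store. */
static void store32(unsigned char *mem, size_t off, uint32_t v) {
    for (int i = 0; i < 4; i++)
        mem[off + i] = (unsigned char)(v >> (8 * i));
}

/* Model of `mov [base+off], ax`: a little-endian 2-byte store. */
static void store16(unsigned char *mem, size_t off, uint16_t v) {
    mem[off]     = (unsigned char)(v & 0xFF);
    mem[off + 1] = (unsigned char)(v >> 8);
}

/* Read one 16-bit element back from the buffer. */
static uint16_t load16(const unsigned char *mem, size_t off) {
    return (uint16_t)(mem[off] | (mem[off + 1] << 8));
}

/* Fill a 16-element word array at byte offset `base` with
   value, value+1, ... using 4-byte stores (wide != 0) or 2-byte stores. */
static void fill(unsigned char *mem, size_t base, uint32_t value, int wide) {
    assert(base % 2 == 0);  /* word arrays start on even offsets in this model */
    for (size_t i = 0; i < COUNT; i++) {
        uint32_t v = (uint32_t)(value + i);
        if (wide)
            store32(mem, base + 2 * i, v);   /* also clobbers the next element */
        else
            store16(mem, base + 2 * i, (uint16_t)v);
    }
}
```

In a 96-byte buffer, filling the array at offset 0 and then the one at offset 32 with wide stores leaves both intact, because each spill is overwritten by the next store. Filling offset 32 first and offset 0 second leaves the word at offset 32 as 0 instead of its assigned value: the last wide store of the second fill wipes it out. With the 2-byte stores the order no longer matters.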
Below is a partial rewrite where I used an input subroutine in order to not have to repeat myself.
mov eax, unos1
mov ebx, NIZA
call MyInput
mov eax, unos2
mov ebx, NIZB
call MyInput
xor esi, esi
more:
movq mm0, qword [NIZA + esi]
paddw mm0, qword [NIZB + esi]
movq qword [NIZC + esi], mm0
add esi, 8
cmp esi, 32
jb more
emms ; (*)
...
MyInput:
call print_string
call print_nl
xor esi, esi
.more:
call read_int ; -> EAX
mov [ebx + esi], ax
add esi, 2
cmp esi, 32 ; Repeat 16 times
jb .more
ret
(*) For info about emms (Empty MMX State) see https://www.felixcloutier.com/x86/emms
Tip: You can write the pair mov ax, [esi] and movsx eax, ax as one instruction: movsx eax, word [esi].