I've been studying MIPS for a while now, including the R, I, and J formats and when sign extension becomes necessary for the I format.
In the case of lh (Load Half) or lb (Load Byte), once the data has been read from memory, it gets sign extended from 16 or 8 bits to 32 bits to avoid potential errors. But when does this happen?
Immediates are sign extended even before entering the ALU, but data read from memory needs to be sign extended before it is saved to the register.
Yet overview diagrams of the MIPS architecture (Single-Cycle) don't have a sign-extend step in the WriteBack stage. How does this work?
You can't take those block diagrams too seriously; there's a ton of stuff missing from them that a real processor has to do.
For example, typically missing are the datapaths and control signals to handle jal's capture of PC+4 into the $ra return address register, and also the datapaths and control signals to handle the jr $ra return-to-caller operation.
Lots more is usually missing as well. Multiply and divide are missing; on the original MIPS these involve 2 additional registers (HI and LO) as well as another specialized ALU. Coprocessor features are absent as well (coprocessor 0 relates to exceptions & interrupts, coprocessor 1 is the floating point unit).
One possibility is that the Data Memory does the sign/zero extension job since its output is 32 bits wide already, but of course, it will need size info and to know whether to zero or sign extend, and control signals for that are not shown in those diagrams.
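As a rough illustration of what such an extension block after the Data Memory would have to compute, here is a minimal Python sketch. The function name `extend_loaded`, its parameters, and the little-endian byte numbering are assumptions made for this example; they are not taken from any particular diagram or implementation. The `size` and `signed` arguments stand in for the control signals that the diagrams leave out.

```python
MASK32 = 0xFFFFFFFF

def extend_loaded(word, byte_offset, size, signed):
    """Pick `size` bytes out of a 32-bit memory word and widen them to 32 bits.

    word        -- the 32-bit value read from Data Memory
    byte_offset -- low address bits selecting the byte/halfword (assumed little-endian)
    size        -- 1 for lb/lbu, 2 for lh/lhu, 4 for lw (hypothetical control signal)
    signed      -- True for lb/lh, False for lbu/lhu (hypothetical control signal)
    """
    assert size in (1, 2, 4)
    assert byte_offset % size == 0 and byte_offset + size <= 4
    bits = 8 * size
    low_mask = (1 << bits) - 1
    # Select the addressed byte or halfword.
    raw = (word >> (8 * byte_offset)) & low_mask
    # Sign extension: copy the top bit of the narrow value into the upper bits.
    if signed and raw & (1 << (bits - 1)):
        raw |= MASK32 & ~low_mask
    return raw
```

For example, `extend_loaded(0x00000080, 0, 1, True)` gives `0xFFFFFF80` (the lb case), while the same call with `signed=False` gives `0x00000080` (the lbu case).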
You are right that it could also happen at the Registers during WriteBack, but I find that less likely: as we move to the pipelined processor, there's a forward from the MEM stage output to EX and also from MEM to MEM, and the forwarded value needs to be fully formed already, so this cannot wait until WriteBack in the pipelined processor.
MIPS originally started life as a pipelined processor, and those single cycle diagrams are basically theoretical and made for educational purposes. So, if they had to handle it, it would probably be done the same way as in the pipelined processor, which is to say likely within or immediately after the Data Memory block (and the difference between within or after is largely a matter of diagramming and grouping of functionality by blocks).
You are also correct that instruction immediate fields are sign extended during decode and before reaching the ALU. To be clear, this is only 16 to 32 bit extension, usually diagrammed as sign extension only, but the hardware must be capable of doing both sign and zero extension, e.g. for addi vs. ori. (Control signals indicating which form of extension to use for the immediates are again usually not shown.) Further, the hardware that extends immediates is already in use by the load itself, for its address offset, so it cannot also serve to extend the Data Memory output; that's not where bytes and halfwords are extended.
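The two forms of immediate extension can be sketched in a few lines of Python. The function name `extend_immediate` and the `sign_extend` flag are hypothetical stand-ins for the extension unit and its control signal; this is only an illustration of the behavior, not a description of any specific implementation.

```python
def extend_immediate(imm16, sign_extend):
    """Widen a 16-bit instruction immediate to 32 bits.

    sign_extend -- True for instructions like addi, False for ones like ori
                   (hypothetical control signal, usually not drawn in the diagrams)
    """
    imm16 &= 0xFFFF
    if sign_extend and imm16 & 0x8000:
        # Bit 15 is set: fill the upper 16 bits with ones.
        return imm16 | 0xFFFF0000
    # Zero extension, or a non-negative value: upper 16 bits stay zero.
    return imm16
```

With the immediate `0xFFFF`, the addi-style call `extend_immediate(0xFFFF, True)` yields `0xFFFFFFFF` (that is, -1), whereas the ori-style call `extend_immediate(0xFFFF, False)` yields `0x0000FFFF`.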