Suppose a Little Man Computer program requires 1 microsecond to execute one instruction. I need to write a LMC program that takes an input between 0 and 10 (inclusive), and produces an output of one 1 but after a delay of that many seconds.
For example, an input of 5 will produce an output of 1, five seconds later. The delay need not be perfect but must be accurate within 0.01% or 100 µsec. How can I achieve that?
You'll need to create loops to spend time. You'll want to have a code block that will take approximately 1 second to execute, and then repeat the execution of that block corresponding to the number that was input. So the challenge is to design code that takes 1 second to execute.
Let's try that with a loop:
LDA start
loop STA count
LDA count
SUB one
BRP loop
; ...other code...
one DAT 1
start DAT 999
count DAT
This loop would have 1000 iterations. Each iteration has 4 instructions, so the 1000 iterations take 4000µs. We could beef up the code a bit to make the loop have more instructions, but we are nowhere near 1 second, so we better employ a second loop that repeatedly executes the above. That outer loop would need to iterate some 250 times, since 250*4000 = a million, i.e. 1 second.
So we would have:
LDA start2
loop2 STA count2
; here starts the code we used earlier
LDA start3
loop3 STA count3
LDA count3
SUB one
BRP loop3
; end of the code we used earlier
LDA count2
SUB one
BRP loop2
; ... other code
one DAT 1
start2 DAT 249
start3 DAT 999
count2 DAT
count3 DAT
If we make a precise calculation of the number of instructions that are executed, then we arrive at this:
This is a bit too much (because of the overhead): the deviation is about 0.1%, so we need to tune a bit the numbers.
But let's first finish the program by implementing the outer loop that executes the above code as many times as is specified in the input:
INP
BRZ output ; output immediately!
SUB one
; a new outer loop:
loop1 STA count1
; start of what we already had:
LDA start2
loop2 STA count2
LDA start3
loop3 STA count3
LDA count3
SUB one
BRP loop3
LDA count2
SUB one
BRP loop2
; end of what we already had
LDA count1
SUB one
BRP loop1
output LDA one
OUT
HLT
one DAT 1
start2 DAT 249
start3 DAT 999
count1 DAT
count2 DAT
count3 DAT
Note that the outer loop that we added also has its overhead... again 5 instructions overhead if we include LDA start2
in that count.
One iteration of loop1
thus takes 5 + 1001250 = 1001255.
Finally, let's now tune the start2
and start3
values so that we get closer to 1000000. Playing with pairs of numbers in a spreadsheet, you can find that you get close with start2 = 541
for 542 iterations, and start3 = 459
for 460 iterations (There are other interesting pairs you could use, with similar results).
loop3
: 4loop2
: 5 + (460*4) = 1845loop1
: 5 + (542*1845) = 999995So we need 5 more instructions to get to a perfect second. That's easy... just repeat an instruction a few times in the overhead of loop1
. And so we end up with this:
INP
BRZ output ; output immediately!
SUB one
; a new outer loop:
loop1 STA count1
LDA start2
LDA start2 ; dummy instructions to spend 5µs more...
LDA start2 ; ...
LDA start2 ; ...
LDA start2 ; ...
LDA start2 ; ...
loop2 STA count2
LDA start3
loop3 STA count3
LDA count3
SUB one
BRP loop3
LDA count2
SUB one
BRP loop2
LDA count1
SUB one
BRP loop1
output LDA one
OUT
HLT
one DAT 1
start2 DAT 541 ; the two magic numbers
start3 DAT 459 ; ...
count1 DAT
count2 DAT
count3 DAT
We didn't speak yet of the overhead that is executed outside of loop1
. This includes the BRZ
, SUB
(in case non-zero input) and two instructions in the output
section (LDA
and OUT
), so 3-4µs.
This means that we get the following delays for each of the following inputs:
input | µs from INP to OUT
------+-------------------
0 | 3
1 | 1000004
2 | 2000004
... | ...
9 | 9000004
10 | 10000004
You could even get rid of those extra 4 milliseconds by skipping 4 dummy instructions in the last iteration of the outer loop. So change this:
loop1 STA count1
LDA start2
LDA start2 ; dummy instructions to spend 5µs more...
LDA start2 ; ...
LDA start2 ; ...
LDA start2 ; ...
LDA start2 ; ...
to this:
loop1 STA count1
BRZ skip ; this branches only in the last iteration
LDA start2 ; dummy instructions to spend 4µs more...
LDA start2 ; ...
LDA start2 ; ...
LDA start2 ; ...
skip LDA start2
So now the timing is exact, except for the case where the input is 0. There is no way to spend less than 3 instructions to deal with that case (zero detection, load 1, and output it).