I have a very similar question as "Binary Bomb - Phase 4" but it is still different enough that I'm not entirely sure what to do.
Here is my phase_4 code:
08048d3e <phase_4>:
8048d3e: 83 ec 2c sub $0x2c,%esp
8048d41: 8d 44 24 18 lea 0x18(%esp),%eax
8048d45: 89 44 24 0c mov %eax,0xc(%esp)
8048d49: 8d 44 24 1c lea 0x1c(%esp),%eax
8048d4d: 89 44 24 08 mov %eax,0x8(%esp)
8048d51: c7 44 24 04 75 a7 04 movl $0x804a775,0x4(%esp)
8048d58: 08
8048d59: 8b 44 24 30 mov 0x30(%esp),%eax
8048d5d: 89 04 24 mov %eax,(%esp)
8048d60: e8 6b fb ff ff call 80488d0 <__isoc99_sscanf@plt>
8048d65: 83 f8 02 cmp $0x2,%eax //making sure I have 2 inputs
8048d68: 75 0e jne 8048d78 <phase_4+0x3a> //if not explodes bomb
8048d6a: 8b 44 24 18 mov 0x18(%esp),%eax
8048d6e: 83 f8 01 cmp $0x1,%eax //has to be greater than 1
8048d71: 7e 05 jle 8048d78 <phase_4+0x3a> //otherwise jumps to bomb
8048d73: 83 f8 04 cmp $0x4,%eax
8048d76: 7e 05 jle 8048d7d <phase_4+0x3f> //has to be less than 4 or jumps to bomb
8048d78: e8 af 05 00 00 call 804932c <explode_bomb>
8048d7d: 8b 44 24 18 mov 0x18(%esp),%eax
8048d81: 89 44 24 04 mov %eax,0x4(%esp)
8048d85: c7 04 24 09 00 00 00 movl $0x9,(%esp)
8048d8c: e8 50 ff ff ff call 8048ce1 <func4> //calls function 4
8048d91: 3b 44 24 1c cmp 0x1c(%esp),%eax
8048d95: 74 05 je 8048d9c <phase_4+0x5e> //compares two values and explodes bomb if not equal
8048d97: e8 90 05 00 00 call 804932c <explode_bomb>
8048d9c: 83 c4 2c add $0x2c,%esp
8048d9f: 90 nop
8048da0: c3 ret
And here is func_4 code:
08048ce1 <func4>: //not entirely sure what's happening here but it might be a binary search?
8048ce1: 83 ec 1c sub $0x1c,%esp
8048ce4: 89 5c 24 10 mov %ebx,0x10(%esp)
8048ce8: 89 74 24 14 mov %esi,0x14(%esp)
8048cec: 89 7c 24 18 mov %edi,0x18(%esp)
8048cf0: 8b 74 24 20 mov 0x20(%esp),%esi
8048cf4: 8b 5c 24 24 mov 0x24(%esp),%ebx
8048cf8: 85 f6 test %esi,%esi
8048cfa: 7e 2b jle 8048d27 <func4+0x46>
8048cfc: 83 fe 01 cmp $0x1,%esi
8048cff: 74 2b je 8048d2c <func4+0x4b>
8048d01: 89 5c 24 04 mov %ebx,0x4(%esp)
8048d05: 8d 46 ff lea -0x1(%esi),%eax
8048d08: 89 04 24 mov %eax,(%esp)
8048d0b: e8 d1 ff ff ff call 8048ce1 <func4>
8048d10: 8d 3c 18 lea (%eax,%ebx,1),%edi
8048d13: 89 5c 24 04 mov %ebx,0x4(%esp)
8048d17: 83 ee 02 sub $0x2,%esi
8048d1a: 89 34 24 mov %esi,(%esp)
8048d1d: e8 bf ff ff ff call 8048ce1 <func4>
8048d22: 8d 1c 07 lea (%edi,%eax,1),%ebx
8048d25: eb 05 jmp 8048d2c <func4+0x4b>
8048d27: bb 00 00 00 00 mov $0x0,%ebx
8048d2c: 89 d8 mov %ebx,%eax
8048d2e: 8b 5c 24 10 mov 0x10(%esp),%ebx
8048d32: 8b 74 24 14 mov 0x14(%esp),%esi
8048d36: 8b 7c 24 18 mov 0x18(%esp),%edi
8048d3a: 83 c4 1c add $0x1c,%esp
8048d3d: c3 ret
I checked to make sure that the input must be two decimals, and I can also see that at the end two numbers are being compared with one another (line 8048d97, 0x1c(%esp) and %eax). At the beginning of phase_4 I think the code is also indicating that the first number has to be between 1 and 4, and at the end of phase 4, however the number has been modified, it must equal the second number. Please correct me if I'm wrong.
I'm just not sure what the func_4 is doing, and how to determine what the inputs should be. I think it might be the binary search, but not sure how to check how it corresponds with the first input.
what the func_4 is doing, and how to determine what the inputs should be.
8048d81: 89 44 24 04 mov %eax,0x4(%esp)
; input value loaded to [esp+4] -> [esp+0x24] inside func4
8048d85: c7 04 24 09 00 00 00 movl $0x9,(%esp)
; 9 stored to [esp+0] -> [esp+0x20] inside func4
8048d8c: e8 50 ff ff ff call 8048ce1 <func4>
; something calculated
; then the result is compared with other input value, they should be equal
I'm not sure, which input is which (you know the format string for sscanf and your ABI, so you can tell better, the one stored into [esp+0x18] and tested to be of value 2, 3 or 4 is used in calculation, the one stored into [esp+0x1c] is just compared. I would guess that input for calculation is second (for sscanf at +0x0c, the other one is at +0x08)? So password is "<func4(9,2-4)> <2-4>
"?
As that func4 is clean asm recursive code, you can just run it for all three possible inputs to see what result it produces (in debugger, stepping over instructions, so you manage to catch if some parameter leads to infinite loop or some other nastiness, plus you will get idea how it works).
Then figure out which input is which.
not sure what the func_4 is doing
Hey, that's how assembly works. You rarely take a look on stream of instructions, and recognize the algorithm from them with a quick glance. More often you need to thoroughly emulate CPU state in your head instruction by instruction, paying attention to every detail (like flag change by instruction which looks to be used only for setting up a value, then few instructions later that preserved flag is used in conditional jump), and by keeping some timeline of calculated values it's usually possible to guess the algorithm implemented.
Or if not algorithm, then at least to know what values are calculated.
I checked the code one more time, and it looks like variation on Fibonnaci theme, like func4(0,x) = 0
, func4(1,x) = x
, and func4(y, x)
= (just quick guess, too lazy to verify I got it right) x + func4(y-1, x) + func4(y-2, x)
.
edit: I'm now sure that formula is not correct, it's missing at least one constant (but maybe it's wrong even more).
So I wonder, are you too lazy to bother to emulate the CPU thoroughly and get to the correct result, or you have problem with some particular instruction, what it exactly does (ie. you are too lazy to read the instruction reference guide)? So it boils down to "are you lazy or lazy?" Because I'm definitely both, so this is my answer to your question. :)
In case you don't understand some particular detail about some instruction, just ask.
edit about comments:
"//has to be less than 4 or jumps to bomb"
No, it's jle
, that's mnemonics for "jump signed less/equal than", so the correct comment is "has to be less than or equal 4" and that's valid for values [TYPE_MIN, 4].
For "less than 4" you would need to use jl
= "jump signed less".
For "less than 4" unsigned there's jb
= "jump below", that covers values [0, 3]. jbe
covers [0, 4] values.
If you will check Jcc description, you will see those conditional jumps are based only on few flags in the flag register, nothing more (except jcxz/jecxz
, which compares cx
register and doesn't touch flags at all).
Also you may notice there are several aliases, so you can write in your code the one which semantically fits your purpose. For example jz
and je
is the same instruction, first alias stands for "jump when zero (flag)", the other is "jump when equal". So read through it and get used to those abbreviations, it will make the reading of disassembly somewhat easier.
//calls function 4
That's useless comment, that's obvious from the instruction itself. Would you add for example arguments (9, input_2), it would be of more benefit to the reader.
The next "compare" comment is of similar useless nature. What two numbers? If you already did track them down, you should write that result into comment, so something like "compares result of func4 with input_1".
I will try to show you as example the func4:
8048ce1 <func4>:
; allocates 0x1c bytes on stack for local variables
; (so return address is at [esp+0x1c] now, was at [esp])
8048ce1: 83 ec 1c sub $0x1c,%esp
; stores current ebx, esi and edi in [esp+0x10 / 0x14 / 0x18]
8048ce4: 89 5c 24 10 mov %ebx,0x10(%esp)
8048ce8: 89 74 24 14 mov %esi,0x14(%esp)
8048cec: 89 7c 24 18 mov %edi,0x18(%esp)
; esi = [esp+0x20] = first argument of func4(arg1, arg2)
8048cf0: 8b 74 24 20 mov 0x20(%esp),%esi
; ebx = [esp+0x24] = second argument of func4(arg1, arg2)
8048cf4: 8b 5c 24 24 mov 0x24(%esp),%ebx
; when esi(arg1) <= 0, jump to ExitWith0
8048cf8: 85 f6 test %esi,%esi
8048cfa: 7e 2b jle 8048d27 ExitWith0
; when esi(arg1) == 1, jump to ExitWithValueFromEbx
8048cfc: 83 fe 01 cmp $0x1,%esi
8048cff: 74 2b je 8048d2c ExitWithValueFromEbx
; store arg2 in [esp+4]
8048d01: 89 5c 24 04 mov %ebx,0x4(%esp)
; store (arg1-1) in [esp] (preparing args for recursive call)
8048d05: 8d 46 ff lea -0x1(%esi),%eax
8048d08: 89 04 24 mov %eax,(%esp)
; recursion: eax = func4(arg1-1, arg2) ; ebx/esi/edi preserved
8048d0b: e8 d1 ff ff ff call 8048ce1 <func4>
; edi = eax (result) + ebx*1 (arg2)
8048d10: 8d 3c 18 lea (%eax,%ebx,1),%edi
; [esp+4] is set to arg2 again
; (it's still there, but this asm looks like debug level of C, not very efficient)
8048d13: 89 5c 24 04 mov %ebx,0x4(%esp)
; esi -= 2 (no more original arg1 in esi), and set [esp+0]
8048d17: 83 ee 02 sub $0x2,%esi
8048d1a: 89 34 24 mov %esi,(%esp)
; second recursive call: func4(arg1-2, arg2)
8048d1d: e8 bf ff ff ff call 8048ce1 <func4>
; ebx = edi + eax*1 ; edi was arg2 + f4(arg1-1, arg2)
8048d22: 8d 1c 07 lea (%edi,%eax,1),%ebx
; so ebx = arg2 + f4(arg1-1, arg2) + f4(arg1-2, arg2), return that value
8048d25: eb 05 jmp 8048d2c ExitWithValueFromEbx
ExitWith0:
8048d27: bb 00 00 00 00 mov $0x0,%ebx
ExitWithValueFromEbx:
8048d2c: 89 d8 mov %ebx,%eax
; restore values of ebx/esi/edi (return value is in eax)
8048d2e: 8b 5c 24 10 mov 0x10(%esp),%ebx
8048d32: 8b 74 24 14 mov 0x14(%esp),%esi
8048d36: 8b 7c 24 18 mov 0x18(%esp),%edi
; restore esp value, so [esp] is return address, and return
8048d3a: 83 c4 1c add $0x1c,%esp
8048d3d: c3 ret
So I actually guessed in correctly in my answer, those ,1)
in lea confused me, I'm not used to AT&T syntax, so at a weaker moment I thought that's +1 to result, but it's *1 to index register (I'm used to Intel syntax, where it would look as lea ebx,[edi+eax]
).
But as you can see, once you start to take down the notes, and focus on single instruction only, I did manage to decipher it correctly this time.
At the moment it's important for you to make sure you understand every instruction, I mean every detail, like how that lea
works and why it doesn't read memory value, even if the argument is (...)
, and what are all possible addressing modes (how would it look if you would want to do the f(a, b) = 1 + b + f(a-1,b) + f(a-2,b)? Try to find that one, just one lea
has to be modified (any of those two)).
I'm not sure what literature you have for this, as I use Intel syntax everything (I think AT&T syntax is good for C compilers, but not so much for humans ... but overall it's not that bad, mostly annoying if you already know the other one, if you know only AT&T, it may be quite OK).
In worst case ask, although questions about instructions purpose are "low effort", as all the x86 manuals are freely available, but if you are not sure what some wording really means, and running such instruction few times in debugger doesn't help, you have to ask. Just add what you think and what words are confusing you.