I'm learning about buffer overflows because I have an exam on it tomorrow. I've been following this guide, and I'm currently on the step where I'm using immunity debugger to look for badchars. However, a weird problem is occurring where after I get to the hex number "7F", where for some reason every second number appears to be "C2".
I'm using a script which is slightly modified from the guide since I'm using python3 instead of python2. That script looks like this:
import socket
ip = "192.168.10.136"
port = 31337
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
offset = 146
eip = "B" * 4
allchars = "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
buffer = "A" * offset + eip + allchars
buffer += "\n"
s.sendall(buffer.encode('utf-8'))
I couldn't find anything online, it was a kind of hard problem to troubleshoot, especially since I'm not super experienced with python3 or immunity debugger. Any help is very appreciated, and I'll do my best to answer any questions if there's any important information I've forgotten to include.
Comment promoted to answer at OP's request.
As @Sören points out, you have encoded your original data as UTF-8. Your original data goes from byte value 01
to FF
which, because it is in a string, represents the Unicode codepoints U+0001
to U+00FF
.
But the values in the second half of that block, Unicode codepoints U+0080
to U+00FF
, are represented in UTF-8 as two bytes. So when you encoded the original data 0x80
0x81
etc as UTF-8 you got the UTF-8 2-byte representations C2 80
C2 81
etc.
To fix: Make allchars
a bytestring b"..."
and use it as is without encoding.
Unicode is a 21-bit system that can accommodate a theoretical 1,114,112 different codepoints (though only 144,697 of them were assigned as of Unicode 14.0, in 2021). UTF-8 does represent some codepoints as a single byte: essentially, the 127 characters of the ASCII character set from 1963. But it follows from number of codepoints that any representation in 8-bit units will be variable width, with some codepoints occupying two, three or even four bytes.