memset() does not work when called in GDB through the Python APIs

I'm trying to memset the data at some address in a gdb session.

Let's say it is initially filled with 1's and I'm trying to overwrite it with 0's.

(gdb) set $i = (int*)malloc(sizeof(int))

(gdb) set *$i = -1

(gdb) x/t $i

0x76d8550:  11111111111111111111111111111111

The data is not modified at all when I:

Make a gdb.Value out of the memset function pointer and call it in Python with the right address;
Run ctypes' memset(), passing it the right address.

(gdb) pi memset = gdb.parse_and_eval("(void*(*)(void*,int,size_t))memset").dereference()

(gdb) pi str(memset)

'{void *(void *, int, size_t)} 0x7fffe992e760 <memset>'

(gdb) pi

>>> i = 0x76d8550

>>> memset(i,0,4)

<gdb.Value object at 0x7fdc5c0fbef0>

>>> gdb.execute("x/t $i")

0x76d8550:  11111111111111111111111111111111

>>> import ctypes

>>> ctypes.memset(i,0,4)

124618064

>>> gdb.execute("x/t $i")

0x76d8550:  11111111111111111111111111111111

The data is modified as expected when I:

Evaluate a string with the complete memset() expression using gdb.parse_and_eval().

>>> gdb.parse_and_eval("(void*)memset({},0,4)".format(i))

<gdb.Value object at 0x7fdb9eef58b0>

>>> gdb.execute("x/t $i")

0x76d8550:  00000000000000000000000000000000

Any explanation of why the first two options aren't working?

Thanks

Solution

Judging by the addresses you get printed, you are probably running on Linux/x86_64, and possibly using GLIBC as your standard C library. If so ...

... memset is complicated.

First, there are two separate implementations of memset -- a minimal one in ld-linux.so and a full-function one inside libc.so.6.

Second, the full implementation in libc.so.6 is a GNU IFUNC, which means that it doesn't itself write to memory, it just returns the address of the function that should be used to write to memory on a given processor.
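To make the resolver idea concrete, here is a toy model in plain Python. It is not glibc's code; every name in it (memset_small, memset_bulk, memset_resolver, the "avx2" feature string) is made up for illustration. It only shows the shape of the mechanism: the resolver picks an implementation and returns it, and nothing is written until the returned function is called.

```python
# Toy model of a GNU IFUNC in plain Python (not glibc's real code).
# All names below are hypothetical and exist only for illustration.

def memset_small(buf, value, n):
    """One candidate implementation: a byte-by-byte loop."""
    for k in range(n):
        buf[k] = value
    return buf


def memset_bulk(buf, value, n):
    """Another candidate: a single slice assignment."""
    buf[:n] = bytes([value]) * n
    return buf


def memset_resolver(cpu_features):
    """Plays the role of the IFUNC resolver.

    It chooses an implementation and returns it. It never touches
    any buffer itself.
    """
    return memset_bulk if "avx2" in cpu_features else memset_small


buf = bytearray(b"\xff" * 4)

# Running the resolver alone leaves the buffer unchanged.
impl = memset_resolver({"avx2"})
assert buf == bytearray(b"\xff" * 4)

# Only calling the function it returned writes to the buffer.
impl(buf, 0, 4)
assert buf == bytearray(4)
```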

Lastly, as sbssa commented, ctypes.memset() can't possibly work, because that is the memset inside GDB itself, not the memset in the inferior (the process being debugged). By calling ctypes.memset(i,0,4) you are corrupting a random location within GDB. This could result in anything: no effect (if the corrupted address was valid but unused), an immediate crash (if the address is invalid in GDB), or a random crash in GDB later (if GDB actually uses that address for something).
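A small sketch of that point, runnable in any ordinary Python interpreter (no GDB needed): ctypes.memset writes through a raw address in whatever process is running the Python code. The buffer name local is just an illustration. Inside GDB's embedded Python, "the process running the code" is GDB, which is why the inferior never sees the write.

```python
import ctypes

# A 4-byte buffer that lives in *this* Python process. Under GDB's
# embedded Python this would be GDB's own address space, not the
# inferior's.
local = (ctypes.c_ubyte * 4)(0xFF, 0xFF, 0xFF, 0xFF)
addr = ctypes.addressof(local)

# ctypes.memset writes through the raw address in the calling process
# and returns that same address as an int, which matches the integer
# printed in the question.
ret = ctypes.memset(addr, 0, 4)

assert ret == addr
assert bytes(local) == b"\x00" * 4
```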

Putting this all together:

#include <string.h>

int jj = -1;
int main()
{
  memset(&jj, 0, sizeof(jj)); return 0;
  return 0;
}

Compiled with gcc -g x.c, and running under GDB on Fedora 38 x86_64:

gdb -q ./a.out
Reading symbols from ./a.out...
(gdb) start
Temporary breakpoint 1 at 0x40112a: file x.c, line 6.
Starting program: /tmp/a.out

Temporary breakpoint 1, main () at x.c:6
6         memset(&jj, 0, sizeof(jj)); return 0;
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.37-10.fc38.x86_64
(gdb) p &memset
$1 = (void (*)(void)) 0x7ffff7fec530 <memset>

(gdb) pi memset = gdb.parse_and_eval("(void*(*)(void*,int,size_t))memset").dereference()
(gdb) pi print(str(memset))
{void *(void *, int, size_t)} 0x7ffff7fec530 <memset>

(gdb) info sym 0x7ffff7fec530
memset in section .text of /lib64/ld-linux-x86-64.so.2

Here you can see that GDB got the wrong memset (the minimal implementation). It would still work, but is suboptimal and may have other restrictions -- it was never intended to be used outside of ld-linux itself. For example, it may assume that the buffer is 8-byte aligned, or that the size is at least 8 bytes, etc.

What about the real memset that is called in main?

(gdb) b memset
Breakpoint 2 at 0x7ffff7e7cf60 (2 locations)
(gdb) c
Continuing.

Breakpoint 2.1, 0x00007ffff7e7cf60 in memset_ifunc () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7e7cf60 in memset_ifunc () from /lib64/libc.so.6
#1  0x00007ffff7fdac42 in elf_ifunc_invoke (addr=<optimized out>) at ../sysdeps/x86_64/dl-irel.h:32
#2  _dl_fixup (l=0x7ffff7ffe2d0, reloc_arg=<optimized out>) at dl-runtime.c:125
#3  0x00007ffff7fdcf3e in _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:130
#4  0x000000000040113e in main () at x.c:6

Note that this is the IFUNC I was talking about.

(gdb) fin
Run till exit from #0  0x00007ffff7e7cf60 in memset_ifunc () from /lib64/libc.so.6
0x00007ffff7fdac42 in _dl_fixup (l=0x7ffff7ffe2d0, reloc_arg=<optimized out>) at dl-runtime.c:125
125     dl-runtime.c: No such file or directory.
(gdb) p/x $rax
$2 = 0x7ffff7f37950
(gdb) info sym 0x7ffff7f37950
__memset_avx2_unaligned in section .text of /lib64/libc.so.6

__memset_avx2_unaligned is the actual memset implementation selected for this host, out of several possible ones. The other candidates in this build of GLIBC are __memset_erms, __memset_avx2_unaligned_erms, __memset_evex_unaligned and __memset_evex_unaligned_erms.

Note that even though we've already returned from "memset", the value of jj is still -1:

(gdb) x/t &jj
0x40400c <jj>:  11111111111111111111111111111111
(gdb) watch -l jj
Hardware watchpoint 3: -location jj
(gdb) c
Continuing.

Hardware watchpoint 3: -location jj

Old value = -1
New value = 0
0x00007ffff7f37ae4 in __memset_avx2_unaligned_erms () from /lib64/libc.so.6

P.S. Why is the value changed by __memset_avx2_unaligned_erms() and not by __memset_avx2_unaligned()?

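Because the latter tail-calls into the former. For sizes below 0x20 bytes (such as the 4 bytes here), the jb at <+15> jumps into the body of __memset_avx2_unaligned_erms, which performs the store: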

(gdb) disas __memset_avx2_unaligned
Dump of assembler code for function __memset_avx2_unaligned:
   0x00007ffff7f37950 <+0>:     endbr64
   0x00007ffff7f37954 <+4>:     vmovd  %esi,%xmm0
   0x00007ffff7f37958 <+8>:     mov    %rdi,%rax
   0x00007ffff7f3795b <+11>:    cmp    $0x20,%rdx
   0x00007ffff7f3795f <+15>:    jb     0x7ffff7f37aa0 <__memset_avx2_unaligned_erms+224>
   0x00007ffff7f37965 <+21>:    vpbroadcastb %xmm0,%ymm0
   0x00007ffff7f3796a <+26>:    cmp    $0x40,%rdx
   0x00007ffff7f3796e <+30>:    ja     0x7ffff7f37a09 <__memset_avx2_unaligned_erms+73>
   0x00007ffff7f37974 <+36>:    vmovdqu %ymm0,-0x20(%rdi,%rdx,1)
   0x00007ffff7f3797a <+42>:    vmovdqu %ymm0,(%rdi)
   0x00007ffff7f3797e <+46>:    vzeroupper
   0x00007ffff7f37981 <+49>:    ret
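
Coming back to the original goal of zeroing bytes in the inferior from Python: one way to avoid calling any memset at all is to write the bytes directly with the write_memory() method of gdb.Inferior. The helper below is only a sketch. The name zero_inferior_memory is hypothetical, and it takes the inferior as a parameter so the function itself does not depend on the gdb module.

```python
def zero_inferior_memory(inferior, address, length):
    """Overwrite `length` bytes at `address` in the debugged process.

    `inferior` is expected to be a gdb.Inferior, for example the
    object returned by gdb.selected_inferior(). Its write_memory()
    method copies raw bytes into the inferior's address space, so no
    function in the inferior has to be called.
    """
    inferior.write_memory(address, b"\x00" * length)
```

Inside a GDB session with a live process, and assuming the function has been defined in GDB's Python interpreter, it would be used along the lines of zero_inferior_memory(gdb.selected_inferior(), 0x76d8550, 4), followed by x/t $i to check the result.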