I am trying to build a pintool that instruments any open() syscall targeting a specific file/directory and replaces the file path argument with another value.
For example, here is a very simple program that I want to instrument:
#include <iostream>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
using namespace std;

int main(int argc, char **argv)
{
    int i = open("/home/preet_derasari/important.txt", O_RDONLY);
    cout << "fid: " << i << endl;
}
In this example I want Pin to change the file path from /home/preet_derasari/important.txt to /home/preet_derasari/dummy.txt.
In order to do this I wrote a very simple pintool after referring to some example pintools and Pin APIs:
#include "pin.H"
#include <iostream>
#include <fstream>
#include <syscall.h>
#include <cstring>   // strcmp
#include <string>
using namespace std;

// Replacement path. It has static storage so the pointer handed to the
// kernel is still valid after SyscallEntry returns (a local std::string
// would already be destroyed when the syscall actually executes).
static const char pathDummy[] = "/home/preet_derasari/dummy.txt";

INT32 Usage()
{
    cout << "This tool redirects open() calls on one specific file" << endl
         << "to a dummy file." << endl
         << endl;
    cout << KNOB_BASE::StringKnobSummary() << endl;
    return -1;
}

// Called by Pin before every system call made by the application.
void SyscallEntry(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, void *v)
{
    ADDRINT sysNum = PIN_GetSyscallNumber(ctxt, std);
    cout << "entered syscall: " << sysNum << endl;
    if (sysNum == SYS_open)
    {
        cout << "open encountered!" << endl;
        // For open(2), argument 0 is the pathname.
        char *path = (char *)PIN_GetSyscallArgument(ctxt, std, 0);
        cout << "Original File Path: " << path << endl;
        if (strcmp(path, "/home/preet_derasari/important.txt") == 0)
        {
            PIN_SetSyscallArgument(ctxt, std, 0, (ADDRINT)pathDummy);
            cout << "Dummy File Path: " << pathDummy << endl;
        }
    }
}

int main(int argc, char *argv[])
{
    cout << "Open Syscall Value: " << SYS_open << endl;
    if (PIN_Init(argc, argv))
    {
        return Usage();
    }
    cout << "===============================================" << endl;
    cout << "This application is instrumented by MyPinTool" << endl;
    cout << "===============================================" << endl;
    PIN_AddSyscallEntryFunction(SyscallEntry, 0);
    // Start the program, never returns
    PIN_StartProgram();
    return 0;
}
I run the pintool with this command: ../../../pin -t obj-intel64/MY_pin.so -- test
where MY_pin.so is the pintool shared object library and test is the sample code mentioned above.
The output just baffles me because Pin is instrumenting all syscalls except open:
Open Syscall Value: 2
===============================================
This application is instrumented by MyPinTool
===============================================
entered syscall: 12
entered syscall: 158
entered syscall: 21
entered syscall: 257
entered syscall: 5
entered syscall: 9
entered syscall: 3
entered syscall: 257
entered syscall: 0
entered syscall: 17
entered syscall: 17
entered syscall: 17
entered syscall: 5
entered syscall: 9
entered syscall: 17
entered syscall: 17
entered syscall: 17
entered syscall: 9
entered syscall: 9
entered syscall: 9
entered syscall: 9
entered syscall: 9
entered syscall: 3
entered syscall: 158
entered syscall: 10
entered syscall: 10
entered syscall: 10
entered syscall: 11
entered syscall: 12
entered syscall: 12
entered syscall: 257
entered syscall: 5
entered syscall: 9
entered syscall: 3
entered syscall: 3
As you can see, Pin instruments all syscalls except open, i.e., syscall number 2 (based on the x86_64 ISA).
An interesting observation is that the output from my test program (cout << "fid: " << i << endl;) never appears, which makes me question whether Pin is doing something weird with the open syscall.
Can someone please help me understand why this is happening?
strace cat foo shows you that programs don't use the old open(2) system call anymore:
...
openat(AT_FDCWD, "foo", O_RDONLY) = 3
...
__NR_openat is 257, which your Pin tool observed 3 times. Apparently even the open() libc wrapper function internally uses the openat Linux system call. (The __NR_open = 2 system call does still work; the kernel also has code to pass its args to the current implementation. I don't know which is more efficient; maybe the kernel just sets up an AT_FDCWD arg and calls sys_openat(), which has to decode it again, just like glibc does in user-space.)
The open(2) man page also documents openat(2).
The dirfd argument is used in conjunction with the pathname argument as follows:
If the pathname given in pathname is absolute, then dirfd is ignored.
If the pathname given in pathname is relative and dirfd is the special value AT_FDCWD, then pathname is interpreted relative to the current working directory of the calling process (like open())....
openat / linkat and so on, when used with an fd from open(O_DIRECTORY), let programs like find avoid TOCTOU races, and/or let multi-threaded programs avoid having to actually chdir (because there's only one CWD per process, not per thread).
Using them with AT_FDCWD has no advantage or disadvantage vs. old-style open(2).