c linux bash symlink hashbang

How to find the full path to the interpreter binary in C on Linux if the script uses #!/usr/bin/env foo


I'm writing a new script interpreter in C on Linux. It lives in the ~/foo directory:

~/foo          - directory
~/foo/foo.c    - source
~/foo/foo      - compiled C binary
~/foo/data.txt - file with data the interpreter will need

I want to be able to run the interpreter from the command line, so I put a symlink in ~/bin (which is in my PATH):

~/bin/foo      - symlink to "~/foo/foo" binary

This works: when I type "foo" in a terminal, it runs the interpreter. Now I want to be able to run scripts using a "#!/usr/bin/env foo" hashbang. I wrote this executable script in ~/test.fs:

#!/usr/bin/env foo
some foo code

When I run the script using ./test.fs, it works: it is executed by the foo interpreter. However, if I print argv[0] in the interpreter itself, it contains just "foo". I tried to use the realpath(argv[0], ...) function, but it does not find the real path:

char resolved_path[MAXPATHLEN];
if(realpath(argv[0], resolved_path)) {
    printf("resolved_path=%s\n", resolved_path);
}

realpath("foo", resolved_path) returns NULL because a relative path is resolved against the current working directory, and "foo" is in ~/bin while I am running the script from another directory.

In the interpreter I simply need to find the absolute path to the "foo" binary, even if it was executed via /usr/bin/env foo and ~/bin/foo is a symlink.

If I knew the absolute path to ~/bin/foo, I could use readlink to find the target of that symlink, but I don't know how to get from "foo" to "~/bin/foo".

Here is source of interpreter:

#include <stdio.h>
#include <sys/param.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    printf("argv[0]=%s\n", argv[0]);
    char resolved_path[MAXPATHLEN];
    if(realpath(argv[0], resolved_path)) {
        printf("resolved_path=%s\n", resolved_path);
    }
    printf("actual interpreter stuff goes here...\n");
    return 0;
}

Summary:

  1. read argv[0] to be "foo"
  2. find that executed file was actually "~/bin/foo"
  3. if it is symlink find target
  4. use realpath to find absolute path of binary

I am stuck at point 2.

I cannot put the data in /usr/share/... or some other directory because this is still in development: I constantly need to edit files, and I need multiple versions at the same time, so the best way is to keep the data alongside the binary. Each version will have its data next to its binary, so I can have ~/foo1/foo, ~/foo2/foo, and so on. ~/foo1/foo will read ~/foo1/data.txt and ~/foo2/foo will read ~/foo2/data.txt.


Solution

  • how to get from "foo" to "~/bin/foo".

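    On Linux, /proc/$PID/exe (or /proc/self/exe for the current process) is a symlink to the executable. Calling readlink() on it gives the absolute path of the binary that is actually running, with symlinks such as ~/bin/foo already resolved.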
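    A minimal sketch of the /proc approach, assuming Linux with /proc mounted. The helper names get_exe_path and path_next_to_exe are made up for this illustration. Note that readlink() does not NUL-terminate its result, so the code does that itself and treats a completely filled buffer as possible truncation.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes the absolute path of the running
 * executable into buf. Returns 0 on success, -1 on error or if the
 * path may not have fit. */
int get_exe_path(char *buf, size_t size) {
    if (size == 0) return -1;
    ssize_t len = readlink("/proc/self/exe", buf, size - 1);
    if (len < 0) return -1;                 /* no /proc, no permission, ... */
    if ((size_t)len >= size - 1) return -1; /* result may be truncated */
    buf[len] = '\0';                        /* readlink() does not terminate */
    return 0;
}

/* Hypothetical helper: writes "<directory of the executable>/<name>"
 * into out, e.g. the data.txt that sits next to the binary. */
int path_next_to_exe(const char *name, char *out, size_t size) {
    char exe[PATH_MAX];
    if (get_exe_path(exe, sizeof exe) != 0) return -1;
    char *slash = strrchr(exe, '/');
    if (slash == NULL) return -1;
    *slash = '\0';                          /* keep only the directory part */
    int n = snprintf(out, size, "%s/%s", exe, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

    In main() you would call path_next_to_exe("data.txt", buf, sizeof buf) and open buf if the call returns 0. With the layout from the question, the result should point into ~/foo rather than ~/bin, because the kernel has already followed the symlink.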

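    If you are not on Linux, you can repeat the search yourself: if argv[0] contains no slash, call getenv("PATH"), split it on ":", and for each directory check whether dir/foo exists and is executable. The first match is the one that was run, provided PATH has not changed since.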

    Remember to handle errors. The executable file might no longer exist, and argv[0] may be NULL, empty, or an arbitrary string chosen by the caller.
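    The PATH search could be sketched roughly as below. The function name find_in_path is invented for this example, and the result is only a best guess: it assumes PATH is the same as when the program was started and that argv[0] really is the name that was looked up.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: repeats a PATH lookup for `name` and writes the
 * fully resolved path into out, which must hold at least PATH_MAX
 * bytes. Returns 0 on success, -1 if nothing suitable was found. */
int find_in_path(const char *name, char *out) {
    if (name == NULL || name[0] == '\0') return -1;

    /* A name that contains '/' is never searched for in PATH. */
    if (strchr(name, '/') != NULL)
        return realpath(name, out) ? 0 : -1;

    const char *path = getenv("PATH");
    if (path == NULL) return -1;

    for (;;) {
        size_t len = strcspn(path, ":");    /* length of this PATH entry */
        char candidate[PATH_MAX];
        int n;
        if (len == 0)                       /* empty entry means "." */
            n = snprintf(candidate, sizeof candidate, "./%s", name);
        else
            n = snprintf(candidate, sizeof candidate, "%.*s/%s",
                         (int)len, path, name);

        struct stat st;
        if (n > 0 && (size_t)n < sizeof candidate &&
            stat(candidate, &st) == 0 && S_ISREG(st.st_mode) &&
            access(candidate, X_OK) == 0)
            return realpath(candidate, out) ? 0 : -1; /* follows symlinks */

        if (path[len] == '\0') break;       /* that was the last entry */
        path += len + 1;
    }
    return -1;
}
```

    Calling find_in_path(argv[0], buf) after checking argc > 0 covers steps 2 to 4 of the question's summary in one go, since realpath() resolves the ~/bin/foo symlink as part of producing the absolute path.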

    You are asking an XY question. Here is the question you did not ask, but which I guess you are really interested in:

    How should you tell your executable where to find data.txt?

    Directories in PATH should hold only executables, not data. Instead, foo should take a command-line option or read an environment variable (or both) that points to the data.txt you want to use, with a default under a standard /usr directory. There are endless examples of this: vim -u file, tmux -f file, docker --config string, and virtually any other command-line tool.

    You could then create a shell script in ~/bin that just executes foo -c ~/foo/data.txt "$@".
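    As a rough sketch of that lookup order inside the interpreter: the -c option comes from the wrapper command above, while the FOO_DATA variable name, the default path, and the function name choose_data_file are assumptions made up for this example. The leading "+" in the option string is a GNU getopt extension that stops parsing at the first non-option argument, so options meant for the script are left alone.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed default location; pick whatever suits your installation. */
#define DEFAULT_DATA_FILE "/usr/share/foo/data.txt"

/* Hypothetical helper: decides which data file to use.
 * Priority: -c <file> on the command line, then the FOO_DATA
 * environment variable (an invented name), then the default.
 * Afterwards optind indexes the first non-option argument,
 * typically the script path. */
const char *choose_data_file(int argc, char *argv[]) {
    const char *from_cli = NULL;
    int opt;

    /* "+": GNU extension, stop at the first non-option argument. */
    while ((opt = getopt(argc, argv, "+c:")) != -1) {
        if (opt == 'c')
            from_cli = optarg;
    }
    if (from_cli != NULL)
        return from_cli;

    const char *from_env = getenv("FOO_DATA");
    if (from_env != NULL && from_env[0] != '\0')
        return from_env;

    return DEFAULT_DATA_FILE;
}
```

    With this in place, each development copy can be selected either through a wrapper script that passes -c or by exporting FOO_DATA in the shell, without the binary needing to know where it is installed.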

    Some tools take a compile-time parameter that sets the default location of their shared data: CMake has CMAKE_INSTALL_PREFIX and autotools has --prefix for the default installation prefix. You would use this prefix to set the program's default shared-data location to ~/foo/ at compile time.
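    Without a build system, the same idea can be approximated with a preprocessor macro. This is only a sketch: FOO_DATADIR and default_data_path are names invented here, and the fallback directory is an arbitrary choice.

```c
#include <stdio.h>

/* FOO_DATADIR is a hypothetical macro meant to be set by the build,
 * for example:
 *   cc -DFOO_DATADIR='"/home/user/foo1"' -o foo foo.c
 * The fallback below is used when the build does not define it. */
#ifndef FOO_DATADIR
#define FOO_DATADIR "/usr/local/share/foo"
#endif

/* Hypothetical helper: returns the compiled-in default data file.
 * Adjacent string literals are concatenated by the compiler, so no
 * run-time formatting is needed. */
const char *default_data_path(void) {
    return FOO_DATADIR "/data.txt";
}
```

    Building ~/foo1 and ~/foo2 with different -DFOO_DATADIR values gives each binary its own default, which a command-line option or environment variable can still override.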