
Python in Emacs: Jump to the definition of a global constant


After creating a TAGS file for my project (find . -name "*.py" | xargs etags) I can use M-. to jump to the definition of a function. That's great. But if I want the definition of a global constant -- say, x = 3 -- Emacs does not know where to find it.

Is there any way to explain to Emacs where constants, not just functions, are defined? I don't need this for anything defined within a function (or a for-loop or whatnot), just global ones.

More detail

Previous incarnations of this question used "top-level" instead of "global", but with @Thomas's help I realized that's imprecise. What I meant by a global definition is anything a module defines. Thus in

import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)

the things defined by the module are f and y, despite the sites of those definitions being indented to the right. x is a local variable, and z's definition is deleted before the end of the module.

I believe that a sufficient rule to capture all global assignments would be to ignore assignments inside def statements (noting that the def keyword itself might be indented at any level) and otherwise parse for any symbol to the left of = (noting that there might be more than one, because Python supports tuple assignments).
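To make the rule concrete, here is a minimal sketch of it using Python's standard ast module instead of text matching. The names module_level_names and SAMPLE are made up for this illustration. Two choices are assumptions on my part: class bodies are skipped along with def bodies, and imports are not counted as definitions.

```python
import ast

# The example module from above, so that line numbers can be checked.
SAMPLE = """\
import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)
"""

def module_level_names(source):
    """Map each name assigned at module scope to the lines assigning it."""
    found = {}

    def names_in(target):
        # Unpack tuple/list targets such as `y, z = 1, 2`.
        if isinstance(target, ast.Name):
            yield target
        elif isinstance(target, (ast.Tuple, ast.List)):
            for element in target.elts:
                yield from names_in(element)
        elif isinstance(target, ast.Starred):
            yield from names_in(target.value)

    def visit(statements):
        for stmt in statements:
            # Never descend into function or class bodies.
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                continue
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    for name in names_in(target):
                        found.setdefault(name.id, []).append(name.lineno)
            elif isinstance(stmt, ast.AnnAssign) and stmt.value is not None:
                for name in names_in(stmt.target):
                    found.setdefault(name.id, []).append(name.lineno)
            elif isinstance(stmt, ast.Delete):
                # A deleted name is no longer defined by the module.
                for target in stmt.targets:
                    for name in names_in(target):
                        found.pop(name.id, None)
            # Descend into if/for/while/with/try blocks, at any indentation.
            for field in ("body", "orelse", "finalbody"):
                visit(getattr(stmt, field, []))
            for handler in getattr(stmt, "handlers", []):
                visit(handler.body)

    visit(ast.parse(source).body)
    return found

if __name__ == "__main__":
    print(module_level_names(SAMPLE))
```

On the example, this reports only y (assigned on lines 7 and 12): x is local to f, z is deleted, and f itself is a function, which etags already handles.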


Solution

  • Etags does not seem to be able to produce such information for Python files, which you can easily verify by running it on a trivial test file:

    x = 3
    
    def fun():
        pass
    

    Running etags test.py produces a TAGS file with the following contents:

    /tmp/test.py,13
    def fun(3,7
    

    As you can see, x is completely absent in this file, so Emacs has no chance of finding it.

    Invoking etags' man page informs us that there is an option --globals:

       --globals
              Create tag entries for global variables in  Perl  and  Makefile.
              This is the default in C and derived languages.
    

    However, this seems to be one of those sad cases where the documentation is out of sync with the implementation, as this option does not seem to exist. (etags -h does not list it either, only --no-globals - probably because --globals is the default, as it says above.)

    Moreover, even if --globals is the default, the documentation snippet says it applies only to Perl, Makefiles, C, and derived languages. We can check whether this is the case by creating another trivial test file, this time for C:

    int x = 3;
    
    void fun() {
    }
    

    And indeed, running etags test.c produces the following TAGS file:

    /tmp/test.c,26
    int x 1,0
    void fun(3,12
    

    You see that x is correctly identified for C. So it seems that global variables are simply not supported by etags for Python.

    However, because of Python's use of whitespace, it is not too hard to identify global variable definitions in source files - you can basically grep for all lines that don't start with whitespace but contain a = sign (of course, there are exceptions).

    So, I wrote the following script to do that, which you can use as a drop-in replacement for etags, as it calls etags internally:

    #!/bin/bash
    
    # make sure that some input files are provided, or else there's
    # nothing to parse
    if [ $# -eq 0 ]; then
        # the following message is just a copy of etags' error message
        echo "$(basename "${0}"): no input files specified."
        echo "  Try '$(basename "${0}") --help' for a complete list of options."
        exit 1
    fi
    
    # extract all non-flag parameters as the actual filenames to consider
    TAGS2="TAGS2"
    argflags=($(etags -h | grep '^-' | sed 's/,.*$//' | grep ' ' | awk '{print $1}'))
    files=()
    skip=0 
    for arg in "${@}"; do
        # the variable 'skip' signals arguments that should not be
        # considered as filenames, even though they don't start with a
        # hyphen
        if [ ${skip} -eq 0 ]; then
            # arguments that start with a hyphen are considered flags and
            # thus not added to the 'files' array
            if [ "${arg:0:1}" = '-' ]; then
                if [ "${arg:0:9}" = "--output=" ]; then
                    TAGS2="${arg:9}2"
                else
                    # however, since some flags take a parameter, we also
                    # check whether we should skip the next command line
                    # argument: the arguments for which this is the case are
                    # contained in 'argflags'
                    for argflag in "${argflags[@]}"; do
                        if [ "${argflag}" = "${arg}" ]; then
                            # we need to skip the next 'arg', but in case the
                            # current flag is '-o' we should still look at the
                            # next 'arg' so as to update the path to the
                            # output file of our own parsing below
                            if [ "${arg}" = "-o" ]; then
                                # the next 'arg' will be etags' output file
                                skip=2                  
                            else
                                skip=1
                            fi
                            break
                        fi
                    done
                fi
            else
                files+=("${arg}")
            fi
        else
            # the current 'arg' is not an input file, but it may be the
            # path to the etags output file
            if [ "${skip}" = 2 ]; then
                TAGS2="${arg}2"
            fi
            skip=0
        fi
    done
    
    # create a separate TAGS file specifically for global variables
    for file in "${files[@]}"; do
        # find all lines that are not indented, are not comments or
        # decorators, and contain a '=' character, then turn them into
        # TAGS format, except that the filename is prepended
        grep -P -Hbn '^[^[@# \t].*=' "${file}" | sed -E 's/([0-9]+):([0-9]+):([^= \t]+)\s*=.*$/\3\x7f\1,\2/'
    done |\
    
    # count the bytes of each entry - this is needed for the TAGS
    # specification
    while IFS= read -r line; do
        echo "$(printf '%s\n' "${line}" | sed 's/^.*://' | wc -c):${line}"
    done |\
    
    # turn the information above into the correct TAGS file format
    awk -F: '
        BEGIN { filename=""; numlines=0 }
        { 
            if (filename != $2) {
                if (numlines > 0) {
                    print "\x0c\n" filename "," bytes+1
    
                    for (i in lines) {
                        print lines[i]
                        delete lines[i]
                    }
                }
    
                filename=$2
                numlines=0
                bytes=0
            }
    
            lines[numlines++] = $3;
            bytes += $1;
        }
        END {
            if (numlines > 0) {
                print "\x0c\n" filename "," bytes+1
    
                for (i in lines)
                    print lines[i]
            }
        }' > "${TAGS2}"
    
    # now run the actual etags, instructing it to include the global
    # variables information
    if ! etags -i "${TAGS2}" "${@}"; then
        # if etags failed to create the TAGS file, also delete the TAGS2
        # file
        /bin/rm -f "${TAGS2}"
    fi
    

    Store this script on your $PATH using a convenient name (I suggest something like etags+) and then call it like so:

    find . -name "*.py" | xargs etags+
    

    Besides creating a TAGS file, the script also creates a TAGS2 file for all global variable definitions, and adds a line to the original TAGS file that references the latter.

    From the perspective of Emacs, there is no difference in usage.
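For readers wondering what the byte counting in the script is for, here is a rough sketch of the TAGS layout as I understand it from the two sample files above. Each section starts with a form feed, then a header line of the file name and the byte size of the entries that follow. Each entry is the matched text, a DEL character, and then the line number and byte offset. The function names tags_section and include_section are made up for this illustration, and the ",include" header is my reading of how etags -i references a second tags file, so treat that part as an assumption.

```python
FORM_FEED = "\x0c"  # starts every section of a TAGS file
DEL = "\x7f"        # separates the tag text from its position

def tags_section(filename, entries):
    """Build one TAGS section from (text, line, byte_offset) entries."""
    body = "".join(
        f"{text}{DEL}{line},{offset}\n" for text, line, offset in entries
    )
    # The number in the header is the size in bytes of the entries below it.
    size = len(body.encode("utf-8"))
    return f"{FORM_FEED}\n{filename},{size}\n{body}"

def include_section(tags_file):
    """Build a section that points at another tags file (assumed layout)."""
    return f"{FORM_FEED}\n{tags_file},include\n"

if __name__ == "__main__":
    # Reproduces the section etags wrote for test.py above.
    print(repr(tags_section("/tmp/test.py", [("def fun(", 3, 7)])))
    # The kind of entry the wrapper script adds for the global x.
    print(repr(tags_section("/tmp/test.py", [("x", 1, 0)])))
    print(repr(include_section("TAGS2")))
```

The first call reproduces the 13 in the test.py header shown earlier: 8 bytes of text, 1 for DEL, 3 for "3,7", and 1 for the newline.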