Tags: python, bash, subprocess, zgrep

subprocess.check_output(), zgrep, and match limit


Context: I'm trying to find the GitHub repository of a Python package. To do that, I'm zgrepping the package archive for GitHub URLs. It works fine until I limit the output to one result:

import subprocess

# works, returns a lot of results
subprocess.check_output(["zgrep", "-oha", "github", "Django-1.10.1.tgz"])
# add -m1 to limit output: fails with exit status 2
subprocess.check_output(["zgrep", "-m1", "-oha", "github", "Django-1.10.1.tgz"])
# same command, different file: works
subprocess.check_output(["zgrep", "-m1", "-oha", "github", "grabber.py"])

From the command line, all three commands work fine. Any ideas?

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['zgrep', '-m1', '-oha', 'github', 'pkgs/Django-1.10.1.tar.gz']' returned non-zero exit status 2

Command line:

$ zgrep -m1 -oha "github.com/[^/]\+/django" pkgs/Django-1.10.1.tar.gz
github.com/django/django

Solution

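  • So, the reason is: zgrep is a shell script that simply pipes the archive through gzip and egrep. If we limit the number of results, egrep exits early and closes the pipe, so gzip fails on its next write and complains. In a console we never see it, but under subprocess the failure surfaces as a non-zero exit status, and check_output raises an exception. (A likely cause: Python 2 starts with SIGPIPE ignored and child processes inherit that setting, so gzip gets a write error instead of being killed silently.)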

    Solution: write a mini-version of zgrep that doesn't complain:

    gunzip < "$FILE" 2> /dev/null | egrep -m1 -ohia "$PATTERN"
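
    If the call is being made from Python anyway, another option is to skip the external pipeline and search the archive with the standard gzip and re modules. The following is only a minimal sketch under some assumptions: first_match_in_gzip is a hypothetical helper name (not part of zgrep or the standard library), the pattern uses Python regex syntax rather than grep syntax, and it returns only the first match, whereas egrep -m1 -o prints every match on the first matching line.

```python
import gzip
import re


def first_match_in_gzip(path, pattern):
    """Return the first match of `pattern` in the gzip file at `path`, or None.

    Hypothetical helper for illustration; roughly mimics `zgrep -m1 -oha`.
    """
    # Compile as a bytes pattern so the data is treated as binary (like grep -a).
    regex = re.compile(pattern.encode("utf-8"))
    with gzip.open(path, "rb") as archive:
        for line in archive:
            match = regex.search(line)
            if match:
                # Stop at the first hit, so the rest is never decompressed.
                return match.group(0).decode("utf-8", errors="replace")
    return None
```

    Usage would look something like first_match_in_gzip("pkgs/Django-1.10.1.tar.gz", r"github\.com/[^/]+/django"). Since no child process or pipe is involved, there is no broken-pipe exit status to deal with.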