
Ctrl-C ends my script but it is not caught by KeyboardInterrupt exception


I have a Python script that contains a big loop reading a file and doing some stuff (I am using several packages like urllib2, httplib2 or BeautifulSoup).

It looks like this:

try:
    with open(fileName, 'r') as file :
        for i, line in enumerate(file):
            try:
                # a lot of code
                # ....
                # ....
            except urllib2.HTTPError:
                print "\n >>> HTTPError"
            # a lot of other exceptions
            # ....
            except (KeyboardInterrupt, SystemExit):
                print "Process manually stopped"
                raise
            except Exception, e:
                print(repr(e))
except (KeyboardInterrupt, SystemExit):
    print "Process manually stopped"
    # some stuff

The problem is that the program stops when I hit Ctrl+C, but the interrupt is not caught by either of my two KeyboardInterrupt handlers, even though I am sure execution is inside the loop at that moment (and thus at least inside the outer try/except).

How is that possible? At first I thought one of the packages I'm using doesn't handle exceptions correctly (for example by using a bare "except:"), but if that were the case, my script wouldn't stop. The script DOES stop, so the interrupt should be caught by at least one of my two except clauses, right?

Where am I wrong?

Thanks in advance!

EDIT:

After adding a finally: clause to the try-except and printing the traceback in both try-except blocks, it usually displays None when I hit Ctrl+C, but I once managed to get this (it seems to come from urllib2, but I don't know whether it is the reason why I can't catch a KeyboardInterrupt):

Traceback (most recent call last):

File "/home/darcot/code/Crawler/crawler.py", line 294, in get_articles_from_file
  content = Extractor(extractor='ArticleExtractor', url=url).getText()
File "/usr/local/lib/python2.7/site-packages/boilerpipe/extract/__init__.py", line 36, in __init__
  connection  = urllib2.urlopen(request)
File "/usr/local/lib/python2.7/urllib2.py", line 126, in urlopen
  return _opener.open(url, data, timeout)
File "/usr/local/lib/python2.7/urllib2.py", line 391, in open
  response = self._open(req, data)
File "/usr/local/lib/python2.7/urllib2.py", line 409, in _open
  '_open', req)
File "/usr/local/lib/python2.7/urllib2.py", line 369, in _call_chain
  result = func(*args)
File "/usr/local/lib/python2.7/urllib2.py", line 1173, in http_open
  return self.do_open(httplib.HTTPConnection, req)
File "/usr/local/lib/python2.7/urllib2.py", line 1148, in do_open
  raise URLError(err)
URLError: <urlopen error [Errno 4] Interrupted system call>

Solution

  • I already suggested in my comments on the question that this problem is likely caused by the code that was left out of the question. However, the exact code should not matter, as Python normally raises a KeyboardInterrupt exception when Python code is interrupted by Ctrl-C.

    You mentioned in the comments that you use the boilerpipe Python package. This Python package uses JPype to create the language binding to Java... I can reproduce your problem with the following Python program:

    from boilerpipe.extract import Extractor
    import time
    
    try:
      for i in range(10):
        time.sleep(1)
    
    except KeyboardInterrupt:
      print "Keyboard Interrupt Exception"
    

    If you interrupt this program with Ctrl-C, the exception is not raised. The program appears to be terminated immediately, leaving the Python interpreter no chance to raise it. When the import of boilerpipe is removed, the problem disappears...

    A debugging session with gdb shows that a large number of threads are started in the Python process when boilerpipe is imported:

    gdb --args python boilerpipe_test.py
    [...]
    (gdb) run
    Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
    warning: Could not load shared library symbols for linux-vdso.so.1.
    Do you need "set solib-search-path" or "set sysroot"?
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/libthread_db.so.1".
    [New Thread 0x7fffef62b700 (LWP 3840)]
    [New Thread 0x7fffef52a700 (LWP 3841)]
    [New Thread 0x7fffef429700 (LWP 3842)]
    [New Thread 0x7fffef328700 (LWP 3843)]
    [New Thread 0x7fffed99a700 (LWP 3844)]
    [New Thread 0x7fffed899700 (LWP 3845)]
    [New Thread 0x7fffed798700 (LWP 3846)]
    [New Thread 0x7fffed697700 (LWP 3847)]
    [New Thread 0x7fffed596700 (LWP 3848)]
    [New Thread 0x7fffed495700 (LWP 3849)]
    [New Thread 0x7fffed394700 (LWP 3850)]
    [New Thread 0x7fffed293700 (LWP 3851)]
    [New Thread 0x7fffed192700 (LWP 3852)]
    

    gdb session without the boilerpipe import:

    gdb --args python boilerpipe_test.py
    [...]
    (gdb) r
    Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
    warning: Could not load shared library symbols for linux-vdso.so.1.
    Do you need "set solib-search-path" or "set sysroot"?
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/libthread_db.so.1".
    ^C
    Program received signal SIGINT, Interrupt.
    0x00007ffff7529533 in __select_nocancel () from /usr/lib/libc.so.6
    (gdb) signal 2
    Continuing with signal SIGINT.
    Keyboard Interrupt Exception
    [Inferior 1 (process 3904) exited normally]
    

    So I assume that your Ctrl-C signal gets handled in a different thread, or that JPype does other odd things that break the handling of Ctrl-C.
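
    The extra threads seen in gdb can also be observed from inside Python. This is a minimal sketch that assumes Linux, where /proc/self/task has one entry per kernel thread. The helper name native_thread_count is made up for illustration and is not part of boilerpipe or JPype. threading.enumerate() would not help here, because it only lists threads created through Python's threading module.

```python
import os


def native_thread_count():
    """Return the number of OS-level threads in this process (Linux only)."""
    # Each directory entry under /proc/self/task is one kernel thread.
    return len(os.listdir("/proc/self/task"))


before = native_thread_count()
# The import under suspicion would go here, for example:
# from boilerpipe.extract import Extractor
after = native_thread_count()
print("threads before import: %d, after import: %d" % (before, after))
```

    If the count rises sharply across the import, the package starts native threads at import time, which matches the gdb output above.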

    EDIT: As a possible workaround you can register a signal handler that catches the SIGINT signal that the process receives when you hit Ctrl-C. The signal handler gets fired even if boilerpipe and JPype are imported. This way you will get notified when the user hits Ctrl-C and you will be able to handle that event at a central point in your program. You can terminate the script if you want to in this handler. If you don't, the script will continue running where it was interrupted once the signal handler function returns. See the example below:

    from boilerpipe.extract import Extractor
    import time
    import signal
    import sys
    
    def interrupt_handler(signum, frame):
        print "Signal handler!!!"
        sys.exit(-2)  # Terminate here: once SIGINT is handled, Ctrl-C no longer ends the process by itself
    
    signal.signal(signal.SIGINT, interrupt_handler)
    
    try:
        for i in range(10):
            time.sleep(1)
    #    your_url = "http://www.zeit.de"
    #    extractor = Extractor(extractor='ArticleExtractor', url=your_url)
    except KeyboardInterrupt:
        print "Keyboard Interrupt Exception"
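
    If the script should finish the current item and clean up rather than exit inside the handler, the handler can just record the request and let the loop check it. This is a minimal sketch of that variant, not something tested against boilerpipe or JPype. The names stop_requested, request_stop and process_lines are hypothetical, and the per-line work is left as a placeholder.

```python
import signal

# Hypothetical flag: set by the signal handler, polled by the main loop.
stop_requested = False


def request_stop(signum, frame):
    """Record that SIGINT arrived; the interrupted code resumes after this returns."""
    global stop_requested
    stop_requested = True


def process_lines(lines):
    """Process lines until they run out or a stop was requested; return the count."""
    processed = 0
    for line in lines:
        if stop_requested:
            break
        # ... per-line work goes here ...
        processed += 1
    return processed


if __name__ == "__main__":
    # Register the handler after the imports that may touch signal handling.
    signal.signal(signal.SIGINT, request_stop)
    count = process_lines(["first line", "second line"])
    print("Processed %d lines" % count)
```

    The flag is only checked between iterations, so a long blocking call inside the loop delays the reaction to Ctrl-C. A handler that raises KeyboardInterrupt instead (which is what signal.default_int_handler does) would route the interrupt to the existing except clauses, assuming the handler still fires with JPype loaded as described above.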