I have a Python script that contains a big loop reading a file and doing some work on each line (I am using several packages such as urllib2, httplib2 and BeautifulSoup).
It looks like this:
try:
    with open(fileName, 'r') as file:
        for i, line in enumerate(file):
            try:
                # a lot of code
                # ....
                # ....
            except urllib2.HTTPError:
                print "\n >>> HTTPError"
            # a lot of other exceptions
            # ....
            except (KeyboardInterrupt, SystemExit):
                print "Process manually stopped"
                raise
            except Exception, e:
                print(repr(e))
except (KeyboardInterrupt, SystemExit):
    print "Process manually stopped"
    # some stuff
The problem is that the program stops when I hit Ctrl+C, but the interrupt is not caught by either of my two KeyboardInterrupt handlers, even though I am sure execution is inside the loop at that point (and thus at least inside the outer try/except).
How is that possible? At first I thought one of the packages I'm using doesn't handle exceptions correctly (for example by using a bare "except:"), but if that were the case, my script wouldn't stop. The script DOES stop, so the interrupt should be caught by at least one of my two except clauses, right?
Where am I wrong?
Thanks in advance!
EDIT:
After adding a finally: clause after the try-except and printing the traceback in both try-except blocks, it usually displays None when I hit Ctrl+C, but I once managed to get this (it seems to come from urllib2, but I don't know whether it is the reason I can't catch a KeyboardInterrupt):
Traceback (most recent call last):
  File "/home/darcot/code/Crawler/crawler.py", line 294, in get_articles_from_file
    content = Extractor(extractor='ArticleExtractor', url=url).getText()
  File "/usr/local/lib/python2.7/site-packages/boilerpipe/extract/__init__.py", line 36, in __init__
    connection = urllib2.urlopen(request)
  File "/usr/local/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/local/lib/python2.7/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/local/lib/python2.7/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/local/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7/urllib2.py", line 1173, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/lib/python2.7/urllib2.py", line 1148, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 4] Interrupted system call>
As I already suggested in my comments on the question, this problem is likely caused by the code section that is left out of the question. The exact code should not matter, however, because Python normally raises a KeyboardInterrupt exception when Python code is interrupted by Ctrl-C.
You mentioned in the comments that you use the boilerpipe Python package. This package uses JPype to create the language binding to Java. I can reproduce your problem with the following Python program:
from boilerpipe.extract import Extractor
import time

try:
    for i in range(10):
        time.sleep(1)
except KeyboardInterrupt:
    print "Keyboard Interrupt Exception"
If you interrupt this program with Ctrl-C, the exception is not raised. It seems that the program is terminated immediately, leaving the Python interpreter no chance to raise the exception. When the import of boilerpipe is removed, the problem disappears.
A debugging session with gdb indicates that a large number of threads are started by the Python process when boilerpipe is imported:
gdb --args python boilerpipe_test.py
[...]
(gdb) run
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffef62b700 (LWP 3840)]
[New Thread 0x7fffef52a700 (LWP 3841)]
[New Thread 0x7fffef429700 (LWP 3842)]
[New Thread 0x7fffef328700 (LWP 3843)]
[New Thread 0x7fffed99a700 (LWP 3844)]
[New Thread 0x7fffed899700 (LWP 3845)]
[New Thread 0x7fffed798700 (LWP 3846)]
[New Thread 0x7fffed697700 (LWP 3847)]
[New Thread 0x7fffed596700 (LWP 3848)]
[New Thread 0x7fffed495700 (LWP 3849)]
[New Thread 0x7fffed394700 (LWP 3850)]
[New Thread 0x7fffed293700 (LWP 3851)]
[New Thread 0x7fffed192700 (LWP 3852)]
gdb session without the boilerpipe import:
gdb --args python boilerpipe_test.py
[...]
(gdb) r
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7529533 in __select_nocancel () from /usr/lib/libc.so.6
(gdb) signal 2
Continuing with signal SIGINT.
Keyboard Interrupt Exception
[Inferior 1 (process 3904) exited normally]
So I assume that your Ctrl-C signal gets handled in a different thread, or that JPype does other odd things that break the handling of Ctrl-C.
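If you want to confirm the extra threads without gdb, the following is a minimal sketch that reads the OS-level thread count of the current process. It assumes Linux, where /proc/self/status contains a "Threads:" line; the function name os_thread_count is my own and not part of any package. Note that threading.enumerate() would not help here, because it only lists threads created through Python's threading module.

```python
def os_thread_count(status_path="/proc/self/status"):
    """Return the kernel's thread count for this process, or None if not found.

    Assumes the Linux /proc layout; status_path is a parameter only so the
    parsing can be tried against a saved copy of the file.
    """
    with open(status_path) as status:
        for line in status:
            # The line looks like "Threads:\t14"
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return None

# Possible usage (Linux only):
#   before = os_thread_count()
#   from boilerpipe.extract import Extractor
#   after = os_thread_count()
#   print("threads before/after import: %s / %s" % (before, after))
```

Comparing the value before and after the boilerpipe import should show roughly the same jump that the gdb session shows, if the threads are indeed created at import time.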
EDIT: As a possible workaround, you can register a signal handler that catches the SIGINT signal that the process receives when you hit Ctrl-C. The signal handler fires even if boilerpipe and JPype are imported. This way you are notified when the user hits Ctrl-C and can handle that event at a central point in your program. You can terminate the script in this handler if you want to. If you don't, the script continues running where it was interrupted once the signal handler function returns. See the example below:
from boilerpipe.extract import Extractor
import time
import signal
import sys

def interrupt_handler(signum, frame):
    print "Signal handler!!!"
    # Terminate the process here: catching the signal removes the
    # default "close process" behaviour of Ctrl-C.
    sys.exit(-2)

# Register the handler for SIGINT (sent on Ctrl-C)
signal.signal(signal.SIGINT, interrupt_handler)

try:
    for i in range(10):
        time.sleep(1)
        # your_url = "http://www.zeit.de"
        # extractor = Extractor(extractor='ArticleExtractor', url=your_url)
except KeyboardInterrupt:
    print "Keyboard Interrupt Exception"
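If you would rather not exit inside the handler, here is a rough sketch of the other variant mentioned above: the handler only records that Ctrl-C was pressed, and the loop checks that flag between lines so it can finish the current item and clean up. The names StopFlag, process_lines and work are made up for illustration, and the code is written so that it runs on both Python 2.7 and Python 3. I have not tested this variant together with boilerpipe.

```python
import signal

class StopFlag(object):
    """Hypothetical helper that remembers a Ctrl-C request."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Only record the request; the main loop decides when to stop.
        self.requested = True

def process_lines(lines, flag, work):
    """Call work(line) for each line until the flag is set.

    Returns the number of lines that were processed.
    """
    done = 0
    for line in lines:
        if flag.requested:
            break
        work(line)
        done += 1
    return done

def main(file_name, work):
    """Sketch of the wiring; file_name and work come from your own script."""
    flag = StopFlag()
    signal.signal(signal.SIGINT, flag.handler)
    with open(file_name, 'r') as f:
        done = process_lines(f, flag, work)
    if flag.requested:
        print("Process manually stopped after %d lines" % done)
```

The trade-off is that the script only stops between two lines, so a long-running call inside work() still has to return (or fail, as in the URLError traceback in the question) before the flag is checked.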