Last modified: 2014-02-02 19:33:59 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1657/ Reported by: valhallasw Created on: 2013-08-22 15:14:10.237000 Subject: core hangs on killed http(s) connection Original description: Steps to reproduce: >>> p = pywikibot.Page(pywikibot.getSite('nl', 'wikipedia'), 'Wikipedia') >>> p.get() # now kill the connection, e.g. using tcpview in windows >>> p = pywikibot.Page(pywikibot.getSite('nl', 'wikipedia'), 'Wikipedia') >>> p.get() # hangs Expected result: pywikibot reconnects and executes the .get() Actual result: complete hang (deadlock?)
Stack trace during hang: *** STACKTRACE - START *** # ThreadID: 140208457426688 File: "/usr/lib/python2.7/threading.py", line 524, in __bootstrap self.__bootstrap_inner() File: "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner self.run() File: "connectionkilltest.py", line 16, in run for filename, lineno, name, line in traceback.extract_stack(stack): # ThreadID: 140208472311552 File: "/usr/lib/python2.7/threading.py", line 524, in __bootstrap self.__bootstrap_inner() File: "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner self.run() File: "pywikibot/comms/threadedhttp.py", line 343, in run item.data = self.http.request(*item.args, **item.kwargs) File: "pywikibot/comms/threadedhttp.py", line 219, in request max_redirects, connection_type File: "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1444, in request (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey) File: "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1196, in _request (response, content) = self._conn_request(conn, request_uri, method, body, headers) File: "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1166, in _conn_request response = conn.getresponse() File: "/usr/lib/python2.7/httplib.py", line 1030, in getresponse response.begin() File: "/usr/lib/python2.7/httplib.py", line 407, in begin version, status, reason = self._read_status() File: "/usr/lib/python2.7/httplib.py", line 365, in _read_status line = self.fp.readline() File: "/usr/lib/python2.7/socket.py", line 430, in readline data = recv(1) File: "/usr/lib/python2.7/ssl.py", line 241, in recv return self.read(buflen) File: "/usr/lib/python2.7/ssl.py", line 160, in read return self._sslobj.read(len) # ThreadID: 140208510514944 File: "pwb.py", line 103, in <module> run_python_file(fn, args) File: "pwb.py", line 49, in run_python_file exec compile(source, filename, "exec") in main_mod.__dict__ File: "connectionkilltest.py", line 36, in <module> p.put_async('test edit') File: "pywikibot/page.py", line 887, in put_async async=True, callback=callback, **kwargs) File: "pywikibot/page.py", line 872, in put async=async, callback=callback, **kwargs) File: "pywikibot/page.py", line 782, in save if not force and not self.botMayEdit(): File: "pywikibot/page.py", line 710, in botMayEdit username = self.site.user() File: "pywikibot/site.py", line 206, in user if self.logged_in(True): File: "pywikibot/site.py", line 842, in logged_in if sysop and 'sysop' not in self.userinfo['groups']: File: "pywikibot/site.py", line 910, in getuserinfo uidata = uirequest.submit() File: "pywikibot/data/api.py", line 284, in submit body=paramstring) File: "pywikibot/comms/http.py", line 118, in request request.lock.acquire() File: "/usr/lib/python2.7/threading.py", line 323, in acquire self.__cond.wait() File: "/usr/lib/python2.7/threading.py", line 243, in wait waiter.acquire() *** STACKTRACE - END *** So: probably not a deadlock, just the ssl library hanging.
- **Priority**: 5 --> 2
Works without any issues in windows - could be either related to linux (v. windows) or killing the connection locally or upstream (the original was run from a VM). Reducing priority.
- **Group**: confirmed --> rewrite
I do not know whether this is similar this bug reported above and am unable to investigate the reason of that bug. I only found the last output where the bot checks a wiki page every 15 seconds: defendant: Textkontrolle accuser: Neozoon time: 2013 Okt 20 21:26 >> 05:20:37: defendant: Textkontrolle accuser: Neozoon time: 2013 Okt 20 21:26 >> 05:20:54: ... a lot of lines deleted ... >> 05:42:19: >> 05:42:33: ... >> 05:53:41: >> 05:53:55: >> 05:54:10: >> 05:54:24: >> 05:54:38: >> 05:54:53: >> 05:55:07: >> 05:55:21: now it stops, CTL-C does not work, I have to kill the process via task manager. I checked my router and found it had an autoreset of the internet connection 5 minutes before 6 a.m. this morning and I guess this was also yesterday. On the other hand the same bot worked well the three days before.
Additional comment: pwb.py version Pywikibot: pywikibot/__init__.py (r-1 (unknown), ???????, 2013/09/21, 15:53:03, OUTDATED) Release version: 2.0b1 Python: 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] unicode test: ok
Yes my bug depends on a killed internet connection while my router was off for 20 seconds!
Here are the last log entries before the bot hanged: 2013-11-20 05:50:37 api.py, 685 in __iter__: DEBUG PropertyGenerator received [u'7165976']; limit=-1 2013-11-20 05:50:51 vm-auto-erl-neu.py, 79 in output: INFO >> 05:50:51: 2013-11-20 05:50:51 threadedhttp.py, 97 in pop_connection: DEBUG Retrieved connection from 'https:de.wikipedia.org' pool. 2013-11-20 05:50:51 threadedhttp.py, 215 in request: DEBUG ('https://de.wikipedia.org/w/api.php', 'POST', 'maxlag=5000&format=json&bkprop=user%7Cby%7Ctimestamp%7Cexpiry%7Creason&list=blocks&bkusers=&meta=userinfo&action=query&bkend=20131120005959&uiprop=blockinfo%7Chasmsg', {'cookie': 'centralauth_Token=8c1df3ec61f7a92677f4c0b561c30585; centralauth_User=Xqbot; forceHTTPS=1; forceHTTPS=true; dewikiUserName=Xqbot; dewikiSession=3144518b40881659814792e3c0fc34cc; dewikiUserID=627628', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': 'pwb/r-1 (unknown) Pywikipediabot/2.0'}, 5, None) 2013-11-20 05:50:51 api.py, 308 in submit: DEBUG API response received: {"query":{"blocks":[],"userinfo":{"id":627628,"name":"Xqbot"}}} 2013-11-20 05:50:51 api.py, 611 in update_limit: DEBUG PropertyGenerator: Set query_limit to 5000. 2013-11-20 05:50:51 threadedhttp.py, 97 in pop_connection: DEBUG Retrieved connection from 'https:de.wikipedia.org' pool. 2013-11-20 05:50:51 threadedhttp.py, 215 in request: DEBUG ('https://de.wikipedia.org/w/api.php', 'POST', 'maxlag=5000&format=json&rvprop=ids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Ccontent&prop=info%7Crevisions&titles=Wikipedia%3AVandalismusmeldung&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo%7Chasmsg', {'cookie': 'centralauth_Token=8c1df3ec61f7a92677f4c0b561c30585; centralauth_User=Xqbot; forceHTTPS=1; forceHTTPS=true; dewikiUserName=Xqbot; dewikiSession=3144518b40881659814792e3c0fc34cc; dewikiUserID=627628', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': 'pwb/r-1 (unknown) Pywikipediabot/2.0'}, 5, None)
Change 108325 had a related patch set uploaded by Ladsgroup: [BUGFIX] fixing bug 55127 https://gerrit.wikimedia.org/r/108325
Minimal test case for the SSL issue: import httplib2 h = httplib2.Http() resp, content = h.request("https://en.wikipedia.org/", "GET") raw_input("kill the connection...") resp, content = h.request("https://en.wikipedia.org/", "GET") print "done" So there are three issues: - A connection can hang indefinitely if the connection is killed. There should be some timeout for receiving some data here. - The framework does not respond to ctrl-c when a thread doesn't return. This can be solved by changing pywikibot/comms/http.py to be non-blocking (e.g. in a loop with a 100ms time.sleep) - Threads do not receive KeyboardInterrupts (which is OK!) but this means a non-returning call will hang forever (i.e. will not be killed when python exits) Amirs solution might be useful for the last issue, but I'm not 100% sure about that. I think I'd rather hook the atexit handler to kill all processing threads. Maybe we can also start threads as daemons, which should kill them automatically when the main thread exits...
I tried this on _windows_ but I didn't got it hanging: Traceback (most recent call last): File "<pyshell#4>", line 1, in <module> resp, content = h.request("https://en.wikipedia.org/", "GET") File "C:\Pywikipedia\ssh\pywikibot\core\externals\httplib2\python2\httplib2\__init__.py", line 1591, in request (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey) File "C:\Pywikipedia\ssh\pywikibot\core\externals\httplib2\python2\httplib2\__init__.py", line 1333, in _request (response, content) = self._conn_request(conn, request_uri, method, body, headers) File "C:\Pywikipedia\ssh\pywikibot\core\externals\httplib2\python2\httplib2\__init__.py", line 1289, in _conn_request response = conn.getresponse() File "C:\Python27\lib\httplib.py", line 1030, in getresponse response.begin() File "C:\Python27\lib\httplib.py", line 407, in begin version, status, reason = self._read_status() File "C:\Python27\lib\httplib.py", line 365, in _read_status line = self.fp.readline() File "C:\Python27\lib\socket.py", line 430, in readline data = recv(1) File "C:\Python27\lib\ssl.py", line 241, in recv return self.read(buflen) File "C:\Python27\lib\ssl.py", line 160, in read return self._sslobj.read(len) error: [Errno 10060] Ein Verbindungsversuch ist fehlgeschlagen, da die Gegenstelle nach einer bestimmten Zeitspanne nicht richtig reagiert hat, oder die hergestellte Verbindung war fehlerhaft, da der verbundene Host nicht reagiert hat
Change 110853 had a related patch set uploaded by Merlijn van Deen: (bug 55127) Implement network timeouts https://gerrit.wikimedia.org/r/110853
This solves the basic issue, but does not fix the other two issues (the main thread will not respond to ctrl-c until the timeout is reached; also the thread will not be stopped until the timeout is raised)
Change 110866 had a related patch set uploaded by Merlijn van Deen: (bug 55127) Allow user to cut off pending network requests at shutdown https://gerrit.wikimedia.org/r/110866
Change 110867 had a related patch set uploaded by Merlijn van Deen: (bug 55127) Allow user to abort during http request https://gerrit.wikimedia.org/r/110867
And with the latest two patches, this should also be fixed.
Change 110853 merged by jenkins-bot: (bug 55127) Implement network timeouts https://gerrit.wikimedia.org/r/110853
Change 110866 merged by jenkins-bot: (bug 55127) Allow user to cut off pending network requests at shutdown https://gerrit.wikimedia.org/r/110866
Change 110867 merged by jenkins-bot: (bug 55127) Allow user to abort during http request https://gerrit.wikimedia.org/r/110867
Change 108325 abandoned by Xqt: [BUGFIX] fixing bug 55127 Reason: https://gerrit.wikimedia.org/r/110853 https://gerrit.wikimedia.org/r/110866 https://gerrit.wikimedia.org/r/110867 https://gerrit.wikimedia.org/r/108325