Last modified: 2013-05-05 20:18:11 UTC
It seems that the TorBlock extension is not working anymore, probably because https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=208.80.152.2 gives an HTTP 403 (Forbidden). I noticed because my open proxy monitoring system started to report edits via (long-existing) TOR exit nodes, so I have now started maintaining the TOR exit nodes in my database of open proxies again. Recent test: http://nl.wikipedia.org/w/index.php?title=Overleg_Wikipedia:Zandbak&diff=27083157&oldid=26766446, reported here: http://nl.wikipedia.org/w/index.php?title=Wikipedia:Open_proxy_detectie&diff=27083159&oldid=27082535. Usable input for TOR exit nodes is now http://torstatus.blutmagie.de/ (linked from the new main page of torproject.org), filtered for exit nodes. That is less precise, because configurations per http://meta.wikimedia.org/wiki/Tor_Exit_Node_Configuration can't be taken into account. An alternative is https://www.torproject.org/projects/tordnsel.html.en, but that generates a lot of traffic. A combination of both may be considered if the original URL remains inaccessible.
Removing "easy" tag, the extension needs to be mostly rewritten. Ideally we would get the data from Tor's directory system rather than relying on a script running on the personal webspace of some random software developer.
The list at http://meta.wikimedia.org/wiki/Tor_Exit_Node_Configuration is outdated. We are using an IP per project now.
The python script used is not hard. We could almost run it locally or reimplement it in PHP. Its algorithm is:
* Fetch a raw list of all exit addresses by grepping ExitAddress from RawExitList, and store it as a parsed exit list (a list of all potential exit IPs).
* When asked about an IP + port, query the tordnsel service [1] at <clientIP>.<port>.<target>.ip-port.exitlist.torproject.org for each IP on the parsed list (NXDomain = not accessible, 127.0.0.2 = accessible) and cache the result.
The problem is getting the exit-address list. TorBulkExitList.py [2] reads it from a local file at /srv/check.torproject.org/tordnsel/state/exit-addresses, with a comment pointing to http://exitlist.torproject.org/exitAddresses as the download source (which doesn't load). The tordnsel page mentions that "it establishes a persistent controller connection to Tor, receiving updated nodes and exit policies as Tor fetches them from directories.", which is probably the best approach. Another way may be "parsing the cached-routers file". Both alternatives seem to require running tor.
[1] https://www.torproject.org/projects/tordnsel.html.en
[2] https://svn.torproject.org/svn/check/trunk/cgi-bin/TorBulkExitList.py
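A minimal sketch of the two steps above in Python, assuming the raw exit list has already been fetched as text. The ExitAddress sample below is hand-made, and the query-name builder follows the layout written in this comment (the live service additionally reverses the IP octets, DNSBL-style):

```python
# Sketch of the algorithm described above, NOT the real TorBulkExitList.py.
# The sample data is invented for illustration.

def parse_exit_list(raw_text):
    """Extract candidate exit IPs from 'ExitAddress <ip> <timestamp>' lines."""
    ips = []
    for line in raw_text.splitlines():
        if line.startswith("ExitAddress"):
            ips.append(line.split()[1])
    return ips

def dnsel_query_name(client_ip, port, target_ip):
    """Build the ip-port.exitlist.torproject.org query name, as laid out
    in the comment above (the real service reverses the octets)."""
    return "%s.%s.%s.ip-port.exitlist.torproject.org" % (client_ip, port, target_ip)

raw = """ExitNode ABCDEF0123456789ABCDEF0123456789ABCDEF01
Published 2013-03-01 12:00:00
LastStatus 2013-03-01 13:00:00
ExitAddress 198.51.100.7 2013-03-01 13:02:00
"""
print(parse_exit_list(raw))  # ['198.51.100.7']
print(dnsel_query_name("198.51.100.7", 80, "208.80.152.2"))
```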
If we know of a tor server providing directory information (e.g. 10.10.10.10:9030), I think we could get the list of servers by doing:

wget -O - http://10.10.10.10:9030/tor/status-vote/current/consensus | grep '^r ' | cut -d ' ' -f 7

Of course, if we have the consensus (or a cached copy from a client), we also have the list of accepted/rejected ports for each server. But it seems we would also need a copy of the descriptors for matching the IPs. For that we would fetch http://10.10.10.10:9030/tor/server/all instead and use different parsing.
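The grep/cut pipeline above can be done in Python once the consensus document is downloaded. In a v3 consensus, router status entries look like "r <nickname> <identity> <digest> <date> <time> <IP> <ORPort> <DirPort>", so the IP is the 7th space-separated field, matching `cut -d ' ' -f 7`. The sample text below is invented:

```python
# Sketch: extract relay IPs from an already-downloaded consensus document.

def relay_ips(consensus_text):
    ips = []
    for line in consensus_text.splitlines():
        if line.startswith("r "):
            fields = line.split(" ")
            if len(fields) >= 7:
                ips.append(fields[6])  # 7th field = relay IP
    return ips

sample = (
    "network-status-version 3\n"
    "r moria1 AAAA BBBB 2013-03-01 12:00:00 128.31.0.34 9101 9131\n"
    "s Exit Fast Running Valid\n"
)
print(relay_ips(sample))  # ['128.31.0.34']
```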
I think it's best to run our own Tor client rather than rely on someone else's. There's no single Tor server with a sufficient commitment to uptime, including *.torproject.org, as this bug demonstrates. Extracting information from a normal Tor client gives us a better chance of being able to build a valid exit list if the relationship between the Tor community and projects like TorBlock becomes adversarial. There's no file called cached-routers in my version of Tor, but there is a /var/lib/tor/cached-consensus which seems to have the required information.
/var/lib/tor/cached-consensus is the consensus. The other data could be available in /var/lib/tor/cached-descriptors, but that seems to include more things, like public keys of hidden services. If we run a tor client, it seems preferable to investigate the control protocol to get a live feed from our client. The other method may be kept for users who won't be running a daemon.
(In reply to comment #7) > If we run a tor client, it seems preferable to investigate the protocol to get > a live feed from our client. The other method may be kept for users which won't > be running a daemon. Can we export the exit list from TorBlock itself and publish that data on WMF servers for the benefit of TorBlock users without a Tor client? Say with an API module?
Is there any progress on this matter? With the current batch of spam bots, it would be good to be able to rule this out as one of the holes in the defence. We are seeing such a persistent level of spam bot attacks, seemingly concentrated within certain IP networks (within a number of /16s, some with repeat IPs, others single-use IPs), that one wonders whether this matter is part of the issue. From an IP address we get a burst of account creations, some creating the same account on multiple sites, and/or multiple accounts. It generally focuses on the same set of wikis, though it has been noticed spreading to more wikis, with no obvious pattern. Nothing specific is showing in XFF. Thanks.
It looks like both https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=208.80.152.2 and http://exitlist.torproject.org/exit-addresses are working again. Running our own node seems like it would get us the list of all exit nodes (the same as http://exitlist.torproject.org/exit-addresses). The preferred way to check an IP seems to be the DNS exit list (tordnsel), since that would test whether a particular IP address is a hidden node. But doing a DNS lookup for every incoming IP seems like a lot of traffic for us. Tim, are you still thinking the whole extension needs a rewrite?
Then blocking will have fixed itself. Note we don't want to block all exit nodes, only those from which we can be reached. What do you mean by hidden node? A hidden service?
Oh, I was referring to this: "Previous DNSELs scraped Tor's network directory for exit node IP addresses, but this method fails to list nodes that don't advertise their exit address in the directory. TorDNSEL actively tests through these nodes to provide a more accurate list." [1] Since users have to know about the servers they route through, any unlisted node would have a somewhat limited userbase, so I'm not sure we need to be as concerned about them for spam. But if we really want to catch them all, we would need to do a DNS lookup for each connecting IP. Is it possible to confirm that the list is being updated for enwiki, and that billinghurst's problems are not related/current? If that's the case, I think this ticket can either be closed or turned into a feature request for making the extension use the DNS method. [1] - https://www.torproject.org/projects/tordnsel.html.en
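For reference, a per-connection DNSEL check could look like the sketch below: resolve the ip-port query name and treat an answer of 127.0.0.2 as "is an exit to us" and NXDomain as "is not". The octet reversal follows the tordnsel docs; the resolver is injectable so the logic can be shown with stubs instead of live DNS:

```python
import socket

# Sketch of a per-connection DNSEL check, assuming the tordnsel
# ip-port query format (reversed octets, DNSBL-style).

def is_tor_exit(client_ip, target_port, target_ip, resolve=socket.gethostbyname):
    rev = lambda ip: ".".join(reversed(ip.split(".")))
    name = "%s.%d.%s.ip-port.exitlist.torproject.org" % (
        rev(client_ip), target_port, rev(target_ip))
    try:
        return resolve(name) == "127.0.0.2"
    except socket.gaierror:  # NXDomain: not a matching exit
        return False

# Exercising the logic with stub resolvers instead of live DNS:
def always_listed(name): return "127.0.0.2"
def never_listed(name): raise socket.gaierror(name)
print(is_tor_exit("1.2.3.4", 80, "208.80.152.2", always_listed))  # True
print(is_tor_exit("1.2.3.4", 80, "208.80.152.2", never_listed))   # False
```

Doing this inline for every incoming connection is exactly the traffic concern raised above, so it would need caching at minimum.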
Yes, it is updated. I just asked and the list currently has 642 IPs. The problems reported in March should be gone now.
So this: (In reply to comment #5) > I think it's best to run our own Tor client rather than rely on someone else's. > There's no Tor single server with a sufficient commitment to uptime, including > *.torproject.org as this bug demonstrates. should be split to another bug?
I'd just repurpose this bug, then.
(In reply to comment #14) > I'd just repurpose this bug, then. Ok.
FWIW, Extension:TorBlock was recently rewritten (and as of now actually works) using the Onionoo protocol. If WMF wants, it can set up its own Onionoo server and then just point the $wgOnionooServer variable to the local server.
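For anyone evaluating that route, a sketch of consuming an Onionoo details document is below. The field names ("relays", "exit_addresses", "or_addresses") follow the Onionoo protocol; the JSON here is a hand-made sample rather than live data, and this is not TorBlock's actual implementation:

```python
import json

# Sketch: collect exit-relay IPs from an Onionoo /details response.
# Sample data is invented for illustration.

def exit_ips_from_onionoo(details_json):
    ips = set()
    for relay in json.loads(details_json).get("relays", []):
        for addr in relay.get("exit_addresses", []):
            ips.add(addr)
        for addr in relay.get("or_addresses", []):
            ips.add(addr.rsplit(":", 1)[0])  # strip the port
    return sorted(ips)

sample = json.dumps({
    "relays": [
        {"fingerprint": "ABCD",
         "or_addresses": ["198.51.100.7:9001"],
         "exit_addresses": ["203.0.113.9"]},
    ]
})
print(exit_ips_from_onionoo(sample))  # ['198.51.100.7', '203.0.113.9']
```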