Last modified: 2012-08-23 19:07:27 UTC
I've just noticed that the link from http://en.wikipedia.org/w/index.php?title=Service-oriented_architecture&oldid=77758121#SOA_definitions to TechEncyclopedia contains a ";jsessionid=..." part. The link works perfectly without this random, session-specific addition. I removed it manually (and verified that the link still works), but I was wondering whether it would be worthwhile to have a bot look for links that contain such parts and either remove them automatically or mark them to be checked by humans.
Resolving WORKSFORME. These sorts of things could easily be added to the URL blacklist or the AbuseFilter.
The point is not to block such links as malicious, but to detect that the session ID was included by mistake and remove it, since I expect this is the usual case.
The AbuseFilter can be set to warn instead of preventing the action. Also, MediaWiki shouldn't be deciding that session IDs aren't supposed to be part of URIs. They are a perfectly valid part of a URI and should be parsed as such. If a community wants to write a bot to do that, they can. Re-suggest WFM.
Yes, this is a typical use case for the AbuseFilter or for bot fixes.
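For anyone who wants to try the bot route, here is a rough sketch of the detect-and-strip step in Python. It is only an illustration, not an existing bot or any MediaWiki API: the function names `strip_jsessionid` and `clean_wikitext` are made up for this example, the URL pattern is a deliberately crude approximation of external links in wikitext, and it assumes the session ID always appears as a ";jsessionid=..." path parameter as in the link reported above. It returns the list of changed URLs so that a human can review them instead of trusting the automatic removal.

```python
import re

# Assumption: the session ID appears as a ";jsessionid=<token>" path parameter.
# The token ends at the next "?", "#", ";", "/", whitespace, or wikitext delimiter.
JSESSIONID_RE = re.compile(r';jsessionid=[^?#;/\s\]|<>"]*', re.IGNORECASE)

# Crude approximation of an external link inside wikitext (hypothetical pattern).
URL_RE = re.compile(r'https?://[^\s\]|<>"]+')


def strip_jsessionid(url):
    """Return the URL with any ';jsessionid=...' path parameter removed."""
    return JSESSIONID_RE.sub("", url)


def clean_wikitext(text):
    """Strip session IDs from all URLs in text.

    Returns (new_text, changed_urls), where changed_urls lists the original
    URLs that were modified, so they can be flagged for human review.
    """
    changed = []

    def replace(match):
        url = match.group(0)
        cleaned = strip_jsessionid(url)
        if cleaned != url:
            changed.append(url)
        return cleaned

    return URL_RE.sub(replace, text), changed


if __name__ == "__main__":
    sample = "See [http://example.com/a;jsessionid=XYZ title] here"
    new_text, flagged = clean_wikitext(sample)
    print(new_text)
    print(flagged)
```

Since a session ID is a valid part of a URI, a bot built on this should probably check that the cleaned link still resolves, or only flag the edit, rather than rewrite blindly.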