Last modified: 2013-06-14 03:48:03 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T18166, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 16166 - Remove "hits" from CAPTCHA dictionary
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Low minor with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/Image:Pe...
Keywords: platformeng
Depends on: 21025
Blocks:
Reported: 2008-10-28 20:27 UTC by Ilmari Karonen
Modified: 2013-06-14 03:48 UTC (History)
Description Ilmari Karonen 2008-10-28 20:27:23 UTC
I'm forking this from bug 10408.  The word "hits" tends to produce unfortunate CAPTCHAs whenever it gets appended to a plural word.  While we discuss the reasonability of adding a blacklist regexp for naughty words at the original bug, in the meantime it would be reasonable to simply remove the offending word from Wikimedia's word list.
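As a rough illustration of the problem, the sketch below joins every ordered pair from a tiny made-up word list and flags the combinations that match a blacklist pattern. The sample words, the pattern, and the function name `offending_pairs` are assumptions for illustration only; they are not Wikimedia's word list or the logic of the ConfirmEdit script.

```python
import re
from itertools import permutations

# Hypothetical blacklist pattern, for illustration only.
BLACKLIST = re.compile(r"shit")

def offending_pairs(words, pattern=BLACKLIST):
    """Return (first, second) pairs whose concatenation matches the pattern."""
    return [
        (a, b)
        for a, b in permutations(words, 2)
        if pattern.search(a + b)
    ]

# Made-up sample list: a plural, a neutral word, and "hits".
sample = ["cats", "tree", "hits"]
print(offending_pairs(sample))
```

Neither "cats" nor "hits" matches the pattern on its own; only the joined form does. That is why a check on single words would miss it, and why removing "hits" from the list is the quick fix.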
Comment 1 Sam Reed (reedy) 2010-12-01 18:09:06 UTC
Brion, any idea on this one?
Comment 2 Platonides 2011-02-06 14:33:29 UTC
The words 'shit' and 'shits' should be removed too. :)
Comment 3 Sam Reed (reedy) 2011-07-07 23:30:43 UTC
Do we actually have a "dictionary", or is this farmed off to Google using ConfirmEdit/Recaptcha...?
Comment 4 Brion Vibber 2011-07-08 20:18:01 UTC
We use/used a system dictionary file I think (eg /usr/share/words or some such) to generate the captcha images via the python script included with ConfirmEdit. Not sure whether they're being actively generated now or if we're just using a large existing image pool.

(Wikimedia doesn't use Recaptcha due to its proprietary licensing.)
Comment 5 Platonides 2011-07-16 20:08:15 UTC
Yes, we use that file with a few filters (I prefer not to give too many details here, although our wordlist is public).
I don't think the files are ever regenerated.

Anyone with shell access should be able to fix this and regenerate the files; giving it to Tim just because he is familiar with it.
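A minimal sketch of that kind of word-list cleanup, assuming a plain one-word-per-line file. The filters shown (lowercase ASCII letters only, a length range) and the names `load_words` and `REMOVED` are hypothetical; the filters actually applied to Wikimedia's list are deliberately not spelled out in this thread.

```python
import io

# Words to drop, per this bug and comment 2.
REMOVED = {"hits", "shit", "shits"}

def load_words(lines, min_len=4, max_len=6, removed=REMOVED):
    """Keep lowercase alphabetic words in a length range, minus removed ones."""
    words = []
    for line in lines:
        word = line.strip()
        if not (min_len <= len(word) <= max_len):
            continue  # too short or too long (assumed filter)
        if not (word.isascii() and word.isalpha() and word.islower()):
            continue  # skip proper nouns, apostrophes, non-ASCII (assumed filter)
        if word in removed:
            continue  # explicitly removed words
        words.append(word)
    return words

# In-memory stand-in for a system dictionary file.
sample_file = io.StringIO("hits\ntree\nCats\ndon't\nplanet\nox\n")
print(load_words(sample_file))
```

Filtering the list only affects newly generated images, so the existing image pool would still need to be regenerated afterwards.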
Comment 6 p858snake 2011-07-17 01:57:47 UTC
(In reply to comment #4)
> (Wikimedia doesn't use Recaptcha due to its proprietary licensing.)
We do for fund-raising stuff. The reason we don't normally is the reliance on an external service (as of the last discussion), but we apparently pay no attention to that for fund-raising.
Comment 7 Platonides 2011-07-17 12:56:57 UTC
We shouldn't. The developer who worked on it didn't really know how to use our FancyCaptcha at that time.
Comment 8 Rob Lanphier 2012-11-01 00:34:59 UTC
What's left to even do on this one?
Comment 9 MZMcBride 2012-11-01 01:06:22 UTC
(In reply to comment #8)
> What's left to even do on this one?

I think resolving bug 21025 would be a better use of time than focusing energy on this bug.
Comment 10 Platonides 2012-11-02 19:12:47 UTC
Marked as depending on bug 21025, as they are closely related.

RobLa, this is one of those 10-minute bugs that lag for years because they need shell access.

Steps to take:
- Find out where on fenari the blacklist is.
- Commit it to ConfirmEdit (closes bug 21025)
- Add 'hits' to it and commit.
- Run captcha.py in a new folder, e.g. /mnt/upload7/private/captcha-en (see bug 38699 or similar IRC logs for bug 38391)
- In CommonSettings.php, change $wgCaptchaDirectory to the new folder and sync-file.
- Close this bug

Captchas that have already been sent (but not yet answered) will still work.
Comment 11 Aaron Schulz 2012-12-18 23:52:09 UTC
Looks like /mnt/upload7/private/captcha2 has word lists, including a bad one.
Comment 12 Platonides 2012-12-19 00:07:27 UTC
Can you commit it to the ConfirmEdit repository? :)
Comment 13 Andre Klapper 2013-04-18 17:52:09 UTC
MW (ConfirmEdit extension) territory -> no "ops" in this case.

(In reply to comment #11 by Aaron Schulz)
> Looks like /mnt/upload7/private/captcha2 has has word lists, including a bad
> one.

Does this need any help from another team? I assume not.
Comment 14 MZMcBride 2013-06-14 01:39:27 UTC
Now that bug 21025 is resolved, this bug just needs someone in ops to verify which blacklist is being used on Wikimedia wikis.
Comment 15 Aaron Schulz 2013-06-14 01:50:00 UTC
There isn't one really. The one in my home dir on fenari should be put somewhere standard...probably in puppet too (though it would be an amusing commit to make).
Comment 16 MZMcBride 2013-06-14 01:52:40 UTC
I assume Wikimedia wikis are using captcha.py. The only question then is whether it's already passing the --blacklist option currently. If not, the default value should get picked up in the next code update. Maybe.
Comment 17 Aaron Schulz 2013-06-14 03:48:03 UTC
(In reply to comment #16)
> I assume Wikimedia wikis are using captcha.py. The only question then is
> whether it's already passing the --blacklist option currently. If not, the
> default value should get picked up in the next code update. Maybe.

We only run it manually; I did the last run...it should really happen periodically with proper image rotation though...I think that's a bug report.
