Last modified: 2014-09-03 01:10:52 UTC
In light of the recent compromised accounts on the English Wikipedia, I'd like to propose a few improvements to the way log-ins to MediaWiki are secured. It is my firm belief that both of these compromised accounts were the result of simplistic password cracking: in one case it appears that the user's username was the same as his password; in the other, it appears that the user's password was "password". As such, my first recommendation is that users be required to select a password at least 6-8 characters long, containing at least one digit and both uppercase and lowercase alphabetic characters. Basically, this is just to force users to select stronger passwords. Secondly, I would like to suggest a log-in captcha at Special:Userlogin: after one failed attempt, the user must also complete the captcha to log in. This will prevent automated password crackers from being used to obtain users' passwords and will make it much more difficult and time-consuming for others to guess passwords manually. I would also like to propose that the highly insecure log-in method provided by Api.php be removed. It uses a simple GET with the user's username and password in the URL, and has absolutely no throttling whatsoever. Clearly, this is a high security risk. Whether or not the captcha idea is accepted, I would like to suggest that a throttle on log-in attempts be implemented, such that after X tries to authenticate from a host, regardless of the username, that host must wait 30 seconds before being allowed to try again. This will further curb both automated and manual password cracking. With the millions of users of MediaWiki, it's time that we started to get serious about security issues, especially on Wikimedia. Most other prominent sites have realized this; it's time we did too.
At present, any idiot who knows any programming at all can set up a script that uses the monkey-on-a-keyboard approach to guess any password; this is simply unacceptable. Even if my ideas are rejected, I do hope that _something_ will be done to improve security.
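As a rough illustration, the password rule proposed in the opening comment (a minimum length plus at least one digit, one uppercase and one lowercase letter) could be checked like this. This is a minimal sketch, not MediaWiki code: the function name `meets_policy`, the default minimum of 8 characters (the comment says 6-8), and the extra username comparison (motivated by the compromised account whose password matched its username) are all assumptions.

```python
import string


def meets_policy(password: str, username: str = "", min_length: int = 8) -> bool:
    """Check the proposed rule: minimum length, at least one digit,
    one uppercase and one lowercase ASCII letter."""
    if len(password) < min_length:
        return False
    # Assumption: also reject a password that simply repeats the username.
    if username and password.lower() == username.lower():
        return False
    has_digit = any(c in string.digits for c in password)
    has_upper = any(c in string.ascii_uppercase for c in password)
    has_lower = any(c in string.ascii_lowercase for c in password)
    return has_digit and has_upper and has_lower


print(meets_policy("password"))                       # False: no digit, no uppercase letter
print(meets_policy("Tr0ub4dor"))                      # True
print(meets_policy("Example1", username="example1"))  # False: same as the username
```

Note that a rule like this only forces a mix of character classes; it says nothing about whether the result is a dictionary word with a digit tacked on.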
Captchas might not be such a good idea; they make things very hard for those who cannot see the captcha. But the other ideas about throttling are very good. Please fix.
While we're at it, can we get better captchas than simple sums that even a bot could read? We could have a captcha competition, with people submitting the best captcha images. Okay, maybe not a competition. Just not the crappy ones most sites have that are actually completely unreadable.
We have better captchas that use Python libraries, but they generate more overhead and are turned off.
(In reply to comment #1)
> Captchas might not be such a good idea, it makes it very hard on those that
> cannot see the captcha. But the other ideas of throttling are very good. Please fix.

Well, a lot of sites provide captchas in both audio and visual form so that blind users can use them as well, and we certainly don't have to use the illegible ones that most sites use. I find the ones that, say, Google uses to be quite legible. Captchas also do not have to create overhead for the devs, as there are many free captcha libraries that can easily be incorporated into MediaWiki.
Additionally, encrypt the passwords and use HTTPS. I know the secure server needs more resources, but at least some sort of encryption can be used to stop passwords from being intercepted.
(In reply to comment #5)
> Additionally, encrypt the passwords, and use HTTPS. I know, the secure server
> needs more resources, but at least some sort of encryption can be done to stop
> passwords from being intercepted.

And, from what a little birdy told me, the secure server uses a null cipher for performance reasons, so if this is the case you're not really getting any added security, other than the illusion of security provided by the little padlock in your status bar :). I don't see why we couldn't at least use some kind of MD5 hashing or PGP encryption on transmissions, though. Granted, MD5 has now been shown to be broken, but it would help quite a bit. I think encryption of posts is probably a lower priority than the issues mentioned above, though, and it would require quite a bit more resources.
(In reply to comment #6)
> And, from what a little birdy told me, the secure server uses a null cipher for
> performance reasons, so you're really not getting any added security if this is
> the case, other than the illusion of security provided by the little padlock in
> your status bar :).

Update: It appears this was wrong. I heard it at http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_%28technical%29&direction=next&oldid=96637542#Status_of_secure.wikimedia.org.3F, but it turns out to be false. Having just checked, it appears the secure server does use an actual MD5 hash, so logging in through it is likely to be fairly secure. It would therefore definitely improve security to handle all log-in requests through the secure server, though I understand that this may be quite resource-intensive.
Throttle login attempts, please. I mistype my username and password fairly often, but 5 attempts in, say, 5 minutes should be enough for anyone. Don't do captchas; they're a nuisance.
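A "5 attempts in 5 minutes" limit is naturally a sliding window: keep the timestamps of recent failures and forget those older than the window. The sketch is hypothetical and in-memory; the class name, the injectable `clock` (used so the example is deterministic) and keying by username are assumptions, and a real wiki farm would need shared storage rather than a Python dict.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_attempts` failed logins per key within `window` seconds."""

    def __init__(self, max_attempts=5, window=300.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window = window
        self.clock = clock
        self.failures = defaultdict(deque)  # key -> timestamps of recent failures

    def _recent(self, key, now):
        """Drop failures older than the window and return the rest."""
        q = self.failures[key]
        while q and now - q[0] >= self.window:
            q.popleft()
        return q

    def allowed(self, key):
        """True if another attempt may be made for this key right now."""
        return len(self._recent(key, self.clock())) < self.max_attempts

    def record_failure(self, key):
        now = self.clock()
        self._recent(key, now).append(now)


# Deterministic example with a fake clock (seconds).
fake_now = [0.0]
limiter = SlidingWindowLimiter(clock=lambda: fake_now[0])
for _ in range(5):
    limiter.record_failure("ExampleUser")
print(limiter.allowed("ExampleUser"))  # False: 5 failures within 5 minutes
fake_now[0] = 301.0
print(limiter.allowed("ExampleUser"))  # True: the old failures have aged out
```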
(In reply to comment #8)
> Throttle login attempts, please. I mistype my username and password fairly
> often, but 5 attempts in, say, 5 minutes, should be enough for anyone. Don't do
> captchas, they're a nuisance.

Captchas would be useful along with throttling, only appearing when you have entered the wrong password several times.
Without captchas, people could get around IP throttling by simply using a lot of Tor nodes/open proxies simultaneously. The throttling would have to be per-account as well as per-IP.
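To illustrate why the limit has to be tracked per account as well as per IP, here is a hypothetical sketch in which each failure is counted against both the client IP and the target account, and an attempt is refused if either counter has reached its limit. The limits, the fixed 10-minute window and all names are assumptions chosen for illustration.

```python
import time
from collections import defaultdict


class DualKeyThrottle:
    """Count failed logins against both the client IP and the target account,
    and refuse an attempt if either has reached its limit in the current window."""

    def __init__(self, ip_limit=10, account_limit=20, window=600.0, clock=time.monotonic):
        self.limits = {"ip": ip_limit, "account": account_limit}
        self.window = window
        self.clock = clock
        # (kind, value) -> [window start, failures in this window]
        self.counters = defaultdict(lambda: [0.0, 0])

    def _entry(self, kind, value):
        now = self.clock()
        entry = self.counters[(kind, value)]
        if now - entry[0] >= self.window:
            entry[0], entry[1] = now, 0  # the old window has expired
        return entry

    def allowed(self, ip, account):
        return (self._entry("ip", ip)[1] < self.limits["ip"]
                and self._entry("account", account)[1] < self.limits["account"])

    def record_failure(self, ip, account):
        self._entry("ip", ip)[1] += 1
        self._entry("account", account)[1] += 1


clock_now = [0.0]
throttle = DualKeyThrottle(ip_limit=3, account_limit=5, clock=lambda: clock_now[0])
# An attacker rotates through proxies, making two guesses per address.
for i in range(5):
    throttle.record_failure(f"198.51.100.{i // 2}", "ExampleAdmin")
print(throttle.allowed("198.51.100.9", "ExampleAdmin"))  # False: account limit hit
print(throttle.allowed("198.51.100.9", "SomeoneElse"))   # True
```

Keying on the account is what defeats proxy-hopping, but it also means anyone can lock a real user out by failing logins to that account, which is the DoS concern raised elsewhere in this thread.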
Another idea would be to require the lost-password retrieval procedure after a given number of unsuccessful tries in a short time. The downside is that if you haven't filled in an e-mail address you can't get your access back, but on the other hand it would solve the problem without requiring captchas.
Security is nice, but it shouldn't make logging in impossible for legitimate users even in cases of flood.
(In reply to comment #11)
> An other idea would be to require a Lost password retrieval procedure after a
> given number of unsuccessful tries in a short time. The bad side is that if you
> don't have any e-mail address filled in you can't get your access back, but on
> the other hand that would solve the problem without requiring captchas.

This would introduce a straightforward vector for abuse: anyone could force a password reset on someone else's account just by failing to log in to it a few times.
I oppose the captcha. I support a delay of between 3 and 15 minutes after three bad attempts from a given IP. This should apply foremost to the IP that attempted the login; Max Semenik has pointed out that applying it to the account for which login was attempted is a bad thing because it can prevent genuine logins. However, it could additionally be enacted when there is good evidence of IP-hopping (at least x IPs attempted logins to a given account and failed at least y times). The password strength test that has been proposed should include a dictionary search for each substring, as well as for common letter-number substitutions, specifically O-0, I-1, A-4, T-7, S-5, Z-2, E-3 and also Z-7; this list may be incomplete.
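The dictionary search with letter-number substitutions could work by sliding every dictionary word along the password and treating a digit as matching any letter it commonly stands in for. This is a hypothetical sketch: the substitution table comes from the list in the comment (inverted, so 7 can stand for T or Z), while the toy word list, the minimum word length of 4 and the function names are assumptions. Matching character by character avoids enumerating every possible un-substituted spelling, which would grow exponentially with the number of digits.

```python
# Letter-number substitutions from the comment (O-0, I-1, A-4, T-7, S-5, Z-2,
# E-3, Z-7), inverted: each digit maps to the letters it may replace.
SUBSTITUTES = {"0": "o", "1": "i", "2": "z", "3": "e", "4": "a", "5": "s", "7": "tz"}


def chunk_spells(chunk, word):
    """True if `chunk` spells `word`, allowing the digit substitutions above."""
    return all(c == w or w in SUBSTITUTES.get(c, "") for c, w in zip(chunk, word))


def find_dictionary_word(password, dictionary, min_len=4):
    """Return the first (lowercase) dictionary word hidden anywhere in the
    password, or None if there is none."""
    text = password.lower()
    for word in dictionary:
        if len(word) < min_len:
            continue
        for start in range(len(text) - len(word) + 1):
            if chunk_spells(text[start:start + len(word)], word):
                return word
    return None


words = ["password", "maze", "horse"]  # toy word list for illustration
print(find_dictionary_word("xxP4ssw0rd!", words))  # password
print(find_dictionary_word("a-m4z3-b", words))     # maze
print(find_dictionary_word("qx9vbn", words))       # None
```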
Of course, we don't want throttling at a level that would allow a DoS.
(In reply to comment #15)
> Of course, we don't want throttling at a level that will allow a DOS.

Clearly. For this reason, I suggest that the throttling be targeted at hosts and not usernames, as the latter is just asking for trouble. Additionally, I would suggest that the throttling not be done in a manner that requires WM to maintain a connection with the host; rather, it should simply remember the time of the last failed log-in attempt from a given host and reject any log-in attempts within, say, 10-15 seconds of that last attempt. If the purpose of the throttling is to prevent brute-forcing passwords, that should be sufficient.

To phi1ipp@yahoo.com: Would you mind elaborating on why you oppose the use of a captcha? To me, this seems like the most logical and least resource-intensive way of curbing brute-force cracking, and it is the method that most sites have adopted. It is not as if the captcha would kick in every time you log in; rather, it would only do so if you fail to log in successfully from a given host, say, 3 times. Again, the captcha would be targeted at the host, not at the username.
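The "remember only the last failure" approach needs just one timestamp per host and no open connection. A minimal in-memory sketch, assuming a 15-second cooldown (the upper end of the 10-15 seconds suggested) and hypothetical names; a real deployment would also need to expire old entries so the table does not grow without bound.

```python
import time


class LastFailureThrottle:
    """Keep only the time of each host's most recent failed login and reject
    attempts from that host for `cooldown` seconds afterwards."""

    def __init__(self, cooldown=15.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self.last_failure = {}  # host -> time of its last failed attempt

    def allowed(self, host):
        last = self.last_failure.get(host)
        return last is None or self.clock() - last >= self.cooldown

    def record_failure(self, host):
        self.last_failure[host] = self.clock()


now = [100.0]
throttle = LastFailureThrottle(clock=lambda: now[0])
throttle.record_failure("203.0.113.7")
now[0] = 110.0
print(throttle.allowed("203.0.113.7"))  # False: only 10 s since the last failure
print(throttle.allowed("192.0.2.44"))   # True: other hosts are unaffected
now[0] = 115.0
print(throttle.allowed("203.0.113.7"))  # True: the cooldown has passed
```

At one guess per 15 seconds, a single host can try at most 5,760 passwords a day, which is the whole point of the scheme.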
About using HTTPS: it would be great if we could log in via the secure server but still use the faster non-HTTPS one for everything else (the secure server could return a single-use token in a redirect to the normal servers; the normal servers would then check the token, invalidate it, and copy the session data from the secure server). This way a sniffer would only be able to hijack the session, not steal the password.
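The single-use token handoff in the comment above could look roughly like this. It is a hypothetical sketch, not how MediaWiki sessions actually work: a plain dict stands in for whatever store both the secure and the normal servers could reach, and the 60-second lifetime and all names are assumptions. `secrets.token_urlsafe` produces an unguessable token, and `dict.pop` both looks the token up and invalidates it, so a replayed token is rejected.

```python
import secrets
import time


class LoginHandoff:
    """Single-use tokens handed from the secure login server to the normal
    servers through a redirect."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.pending = {}  # token -> (session data, time issued)

    def issue(self, session_data):
        """Secure-server side: call after the password has been verified."""
        token = secrets.token_urlsafe(32)
        self.pending[token] = (session_data, self.clock())
        return token

    def redeem(self, token):
        """Normal-server side: pop() invalidates the token on first use."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        session_data, issued = entry
        if self.clock() - issued > self.ttl:
            return None  # expired tokens are rejected (and already removed)
        return session_data


handoff = LoginHandoff()
token = handoff.issue({"user": "ExampleUser"})
print(handoff.redeem(token))  # {'user': 'ExampleUser'}
print(handoff.redeem(token))  # None: the token was already used
```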
Repurposing to tracking bug, there's too much going on here. Please open new bugs to discuss specific requests not already covered elsewhere.
Might be good to have a limit on failed logins per hour.
> To phi1ipp@yahoo.com: Do you mind to elaborate on why you oppose the use of a
> captcha?

It's redundant with the delay mechanism. Additionally, my experience of captchas from a variety of other websites is that they rarely work. They also exclude some visually impaired users. If an exception is made for those users, the whole purpose is defeated.
(In reply to comment #20)
> > To phi1ipp@yahoo.com: Do you mind to elaborate on why you oppose the use of a
> > captcha?
>
> It's redundant with the delay mechanism. Additionally, my experience of captchas
> from a variety of other websites is that they rarely work. They also exclude
> some visually impaired users. If an exception is made for those users, the whole
> purpose is defeated.

No, because a) the group of targets is vastly smaller and b) they could be subjected to stricter password requirements.

No delay mechanism is planned for addition because of DoS concerns (lock out all admins/users by spamming random passwords). Per-IP delay is still a concern, albeit a smaller one, due to dynamic IPs.

Anyway, please keep discussion of specific features to specific bugs (e.g., bug 9836), not the tracker bug.
Thanks for the security clarification, Daniel Cannon. I had also heard that secure.wikimedia.org used null encryption. TLS (the successor to SSL) for login only would be great, but there are some people, hopefully a minority, who will want to use the secure server for everything: for example, people who use a proxy that is hardblocked over HTTP but not over HTTPS, or people who are just paranoid. It would probably be better to implement a separate TLS login for the regular site. However, the developers know more than I do about what kind of load the servers can handle. Once TLS login is required, it might not be a bad idea to require everyone to change their passwords. Too much security could open the door to denial-of-service attacks, where a cracker makes it impossible for the owner of an account to access it. Requiring a user to request a new password after a certain number of failed tries would make it easy to DoS users without e-mail addresses. The same goes for captchas, as some users may be blind or using a text-only browser such as Lynx. Perhaps, after a certain number of failed attempts, the software could refuse to let the user try again until a given amount of time expires OR the user enters a captcha?
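The closing suggestion (after a number of failures, wait OR solve a captcha) can be expressed as a small bit of state per key. A hypothetical sketch: whether the key is an IP, an account or both, the 3-failure threshold, the 5-minute lockout and the `captcha_solved` flag (which a real captcha system would have to supply) are all assumptions.

```python
import time


class LoginGate:
    """After `max_failures` failed attempts for a key, refuse further attempts
    until `lockout` seconds have passed OR a captcha has been solved."""

    def __init__(self, max_failures=3, lockout=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.lockout = lockout
        self.clock = clock
        self.state = {}  # key -> (failure count, time of last failure)

    def may_attempt(self, key, captcha_solved=False):
        count, last_failure = self.state.get(key, (0, 0.0))
        if count < self.max_failures:
            return True
        if captcha_solved or self.clock() - last_failure >= self.lockout:
            self.state.pop(key, None)  # either condition lifts the lock
            return True
        return False

    def record_failure(self, key):
        count, _ = self.state.get(key, (0, 0.0))
        self.state[key] = (count + 1, self.clock())

    def record_success(self, key):
        self.state.pop(key, None)


now = [0.0]
gate = LoginGate(clock=lambda: now[0])
for _ in range(3):
    gate.record_failure("ExampleUser")
print(gate.may_attempt("ExampleUser"))                       # False: locked
print(gate.may_attempt("ExampleUser", captcha_solved=True))  # True: captcha lifts the lock
```

Because either condition clears the lock, a blind or text-only user is never forced through the captcha; they can simply wait.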
> Anyway, please keep discussion on specific features to specific bugs
> (e.g., bug 9836), not the tracker bug.

My comments were replies to questions. Regards.
Adding 'tracking' keyword.
I also don't like captchas. But please don't force users to use at least one digit or something like that, because instead of increasing security it will actually reduce the search space.

Instead I propose this:

(1) Convert the password at least to Unicode NFC (and apply any other suitable normalization, like compressing whitespace), possibly even to NFKC (to avoid compatibility characters). If the password, after normalization, is different from what the user typed, make sure to inform the user and ask them to confirm it.

(2) Compute the basic size S of the alphabet:
- if a lowercase ASCII letter is used anywhere in the password, add 26 to the alphabet size;
- if an uppercase ASCII letter is used anywhere in the password, add 26 to the alphabet size;
- if a decimal ASCII digit is used anywhere in the password, add 10 to the alphabet size;
- if ASCII punctuation is used anywhere in the password, add the size of the ASCII punctuation subset to the alphabet size;
- on localized wikis, consider other subsets consisting of the non-ASCII letters used in their alphabet (take the CLDR data appropriate for that language, remove the characters already part of the previous subsets, and add the number of remaining characters to the basic size S);
- if other Unicode characters are included, accept them individually by adding 1 to S for each distinct character (but inform users that they may have difficulty logging in from some environments with such a password).

(3) Take the base-2 logarithm of the alphabet size and multiply it by the password length N. This gives the raw "bit-length" strength of a password. In other words: raw bit-length strength = log2(S) * N.

(4) If a space is accepted in the password, it should only occur in the middle, not at the beginning or end, and never in sequences of more than one space.
Because of that, a password of length N cannot contain more than (N - 1) DIV 2 spaces, which adds ((N - 1) DIV 2) * log(S + 1)/log(S) to the raw bit-length strength. Of course you can check that basic default passwords are not used (like "0000", "1234", "password", "admin", the username itself, any word contained in the user's own public identity such as his public first or last name, any word contained in his user page, or in the first 1 KB of his talk page). But using any large dictionary to forbid passwords may actually reduce the bit-length strength against brute-force attacks rather than increase it (even if it protects against dictionary-based attacks), by allowing attackers to skip the words contained in that known dictionary. It may also miss many well-known common words (including first names) from other languages (my opinion is that the dictionary used should just be built from the terminology used in the MediaWiki messages stored in the "MediaWiki:" namespace, in all supported languages, and for each extension activated on the wiki where the account is created). However, even if a password is not strong enough, the user should still not be denied access completely: he should be denied use of the secure server and informed that his password is not strong enough to be used there, but he will have the option of using the non-secure servers. I also suggest that the user's Preferences panel show this password bit-length strength (computed as above) along with a colored bar indicating the basic security of his account and whether the bit-length strength is suitable for identification on the secure server.
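Steps (1) to (4) of the proposal can be sketched with Python's standard `unicodedata` module. This is an illustration under stated assumptions, not a vetted strength meter: NFKC is used for step (1); "ASCII punctuation" is taken to mean the 32 characters in `string.punctuation`; the CLDR-based localized alphabets are left out; and spaces are only checked for placement, being excluded from both S and N, so the extra space term mentioned in the comment is not modelled.

```python
import math
import string
import unicodedata

ASCII_PUNCTUATION = set(string.punctuation)  # the 32 printable ASCII punctuation marks
ASCII_CLASSES = set(string.ascii_letters + string.digits) | ASCII_PUNCTUATION


def normalize_password(password):
    """Step (1): NFKC-normalize and report whether the input changed,
    so the user can be asked to confirm."""
    normalized = unicodedata.normalize("NFKC", password)
    return normalized, normalized != password


def check_spaces(password):
    """Step (4): spaces only between other characters, never two in a row."""
    return not (password.startswith(" ") or password.endswith(" ") or "  " in password)


def alphabet_size(chars):
    """Step (2): basic alphabet size S for the characters used."""
    size = 0
    if any(c in string.ascii_lowercase for c in chars):
        size += 26
    if any(c in string.ascii_uppercase for c in chars):
        size += 26
    if any(c in string.digits for c in chars):
        size += 10
    if any(c in ASCII_PUNCTUATION for c in chars):
        size += len(ASCII_PUNCTUATION)
    # Every other character adds 1 per distinct character (CLDR alphabets omitted).
    size += len({c for c in chars if c not in ASCII_CLASSES})
    return size


def raw_bit_strength(password):
    """Step (3): log2(S) * N over the non-space characters."""
    chars = password.replace(" ", "")
    if not chars:
        return 0.0
    return math.log2(alphabet_size(chars)) * len(chars)


print(normalize_password("\uff50\uff41\uff53\uff53"))              # ('pass', True)
print(check_spaces("correct  horse"))                              # False: double space
print(round(raw_bit_strength("Tr0ub4dor"), 1))                     # 53.6
print(round(raw_bit_strength("correct horse battery staple"), 1))  # 117.5
```

The last two lines show the point the comment is making: a long lowercase pass phrase scores far higher than a short mixed-class password, without any rule forcing digits or capitals.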
(In reply to comment #25)
> I also don't like captchas. But please don't force users to use at least one
> digit or something like that, because instead of increasing the security it
> will actually reduce the search space.
>
> Instead I propose this:
> [snip huge proposal]

Per comment #18, please open new bugs for new proposals.

I'll add as a quick side note that I do not believe in restricting users' passwords to force them to be stronger: users choose their own passwords, as wisely or as unwisely as they want. This is their own responsibility. We can and should help keep their passwords from being compromised by using SSL for every login; I believe there's already an open bug for that.
Note that for dictionary lookups (when evaluating the bit-length strength), good dictionaries are already available: you can just check whether the word exists as an article in the main namespace of each existing Wiktionary with more than about 10,000 entries. This requires just a single HTTP request per tested wiki. The computed bit-length strength can then be reduced to the base-2 logarithm of the tested Wiktionaries' sizes (measured as the number of articles in the main namespace of the tested wikis, summed together), or to their logarithmic average. The computed numeric value should also be made visible (and recomputed each time the user visits their Preferences page, in case the algorithm is later updated), in addition to a color-coded visual evaluation of that value (for example, black: insufficient and not acceptable; red: strong warning; yellow: acceptable; green: good; blue: strong).
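The dictionary cap and the color scale could be combined as follows. In this hypothetical sketch, `exists_in` stands in for the per-wiki HTTP lookup (no network call is made), the entry count in the example is made up, the raw strength is assumed to have been computed beforehand, and the bit thresholds for each color are assumptions, since the comment names the colors but not the cut-offs. The cap uses the summed entry counts rather than the logarithmic average the comment mentions as an alternative.

```python
import math

# Hypothetical cut-offs in bits; the comment names the colors but not the values.
BANDS = [(28, "black"), (40, "red"), (56, "yellow"), (72, "green")]


def adjusted_strength(password, raw_bits, exists_in, wiki_sizes):
    """Cap the strength of a password found in any tested Wiktionary at
    log2 of the combined entry count of the tested Wiktionaries.

    exists_in(wiki, word) stands in for the per-wiki HTTP lookup;
    wiki_sizes maps each tested wiki to its number of main-namespace entries.
    """
    if any(exists_in(wiki, password) for wiki in wiki_sizes):
        return min(raw_bits, math.log2(sum(wiki_sizes.values())))
    return raw_bits


def color_band(bits):
    """Map a strength in bits to the color scale from the comment."""
    for limit, color in BANDS:
        if bits < limit:
            return color
    return "blue"


sizes = {"example-wiktionary": 2 ** 20}  # made-up entry count


def fake_lookup(wiki, word):
    return word.lower() in {"password"}


bits = adjusted_strength("Password", 47.6, fake_lookup, sizes)
print(bits, color_band(bits))  # 20.0 black
print(color_band(53.6))        # yellow
```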
"I do not believe in restricting user's passwords to force them to be stronger: users choose their own passwords, as wisely or as unwisely as they want." Did I suggest that? No. I am proposing precisely to help users choose their passwords wisely, and in fact more freely than the other suggestions above allow, because I don't want to force users to use a mix of uppercase/lowercase letters or digits. The algorithm will be relaxed enough to let users choose whatever characters they want, whatever password length they want, or even pass phrases (when spaces are accepted), WITHOUT reducing the search space (in fact it does NOT restrict the search space, but increases it by allowing MORE freedom for users and creating MORE difficulties for password crackers). And I also caution that dictionary lookups are bad if the dictionary is well-known and is enforced in a restrictive way (because enforcing it will actually help the password crackers).