Last modified: 2014-06-27 13:25:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T44599, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42599 - Add option to apply nofollow only to external links added in revisions marked unpatrolled
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Patrolling
Version: 1.21.x
Hardware: All All
Importance: Low enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: parser
Depends on:
Blocks:
Reported: 2012-12-01 10:57 UTC by Nathan Larson
Modified: 2014-06-27 13:25 UTC
CC List: 11 users


Description Nathan Larson 2012-12-01 10:57:27 UTC
In accordance with Google's suggestion to use or not use nofollow depending on trustworthiness of the user or edit ( http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569 ), this is a proposal to add an option (turned on or off by a new configuration setting) to apply nofollow only to external links added in revisions marked unpatrolled.

Is there any reason why this shouldn't be implemented as a core feature? I'm thinking a boolean el_patrolled field should be added to the externallinks table. The value will be 1 if (a) the URL was added by an autoconfirmed user, or (b) the URL was added by a newbie/anon but was already in the table as patrolled on another page. Otherwise, the value will be 0, and nofollow will apply to the link.

The value will be switched to 1 as soon as any page is patrolled that contains that URL, and then nofollow will NOT apply to that link anymore. I suppose caches would need to be cleared accordingly.
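
One way to read the el_patrolled rules above is the following Python sketch. It is only an illustration under stated assumptions: a real change would be PHP and SQL inside MediaWiki, and the ExternalLinksTable class and all of its method names are hypothetical stand-ins, not existing MediaWiki code.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalLinksTable:
    """Toy stand-in for the externallinks table (hypothetical, not MediaWiki code)."""

    # Maps (page_id, url) -> el_patrolled flag.
    rows: dict = field(default_factory=dict)

    def url_patrolled_anywhere(self, url):
        """True if this URL is already recorded as patrolled on any page."""
        return any(flag for (_, u), flag in self.rows.items() if u == url)

    def add_link(self, page_id, url, autoconfirmed):
        """Record a newly added link and return its el_patrolled value.

        Rule (a): added by an autoconfirmed user.
        Rule (b): added by a newbie/anon, but already patrolled elsewhere.
        """
        patrolled = autoconfirmed or self.url_patrolled_anywhere(url)
        self.rows[(page_id, url)] = patrolled
        return patrolled

    def mark_page_patrolled(self, page_id):
        """Flip every row carrying a URL found on the patrolled page.

        Returns the affected URLs so a caller could purge caches for them.
        """
        urls = {u for (p, u) in self.rows if p == page_id}
        for key in self.rows:
            if key[1] in urls:
                self.rows[key] = True
        return urls

    def needs_nofollow(self, page_id, url):
        """nofollow applies only while el_patrolled is still 0."""
        return not self.rows[(page_id, url)]
```

The set returned by mark_page_patrolled is where the cache clearing mentioned above would hook in; how that purge would actually be done is left open here.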

I don't think the spam whitelist should be used in determining whether nofollow will be applied to a link, because even whitelisted domains are susceptible to being used for spamming. E.g., a person can link to a spammy page he added to a mostly legitimate website. Right now, nofollow is generally used for all external links (since $wgNoFollowLinks defaults to true) so this would still be a significant improvement in the precision of this anti-spam method.
Comment 1 Nathan Larson 2012-12-02 11:53:46 UTC
Does the premise of this proposal, that most spamlinks are in unpatrolled revisions, seem sound? I think this might work well especially if a beefed-up autopromote scheme (e.g. one that doesn't count edits outside of content namespaces toward one's edit count) were to be implemented; https://www.mediawiki.org/wiki/Extension:EnhancedAutopromote could be revised to make that possible. Also, perhaps this could be used in conjunction with $wgNoFollowNsExceptions to apply nofollow to non-content namespaces, since spammy links in userspace and so on might have a greater tendency to go unreverted.
Comment 2 Nathan Larson 2013-11-15 01:52:26 UTC
I think there should be a configuration setting to disable this feature if wiki system administrators want to just apply nofollow or dofollow to ALL external links (as per the status quo) depending on the value of $wgNoFollowLinks.
Comment 3 Nathan Larson 2013-11-15 01:56:00 UTC
Also, I guess if they've set $wgUseRCPatrol to false, then this feature should automatically disable itself.
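
The interaction between the settings discussed in comments 2 and 3 could look roughly like this Python sketch. The parameter nofollow_unpatrolled_only is a hypothetical name for the proposed new setting; no_follow_links and use_rc_patrol mirror $wgNoFollowLinks and $wgUseRCPatrol. The choice that $wgNoFollowLinks = false switches nofollow off entirely, even with the new option on, is an assumption, since the thread does not settle it.

```python
def link_rel(el_patrolled, no_follow_links=True, use_rc_patrol=True,
             nofollow_unpatrolled_only=False):
    """Return the rel value for an external link, or None for no rel attribute.

    nofollow_unpatrolled_only is a hypothetical name for the proposed setting.
    """
    if not no_follow_links:
        # Assumption: with $wgNoFollowLinks off, no link gets nofollow.
        return None
    if nofollow_unpatrolled_only and use_rc_patrol:
        # Proposed behaviour: only unpatrolled links keep nofollow.
        return None if el_patrolled else "nofollow"
    # Status quo, also used when $wgUseRCPatrol is false and the
    # feature therefore disables itself.
    return "nofollow"
```

With the defaults, the function reproduces the status quo: every external link gets nofollow regardless of its patrol state.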
Comment 4 Nemo 2013-11-15 07:32:48 UTC
(In reply to comment #3)
> Also, I guess if they've set $wgUseRCPatrol to false, then this feature
> should automatically disable itself.

Yes; and sysadmins would be interested in this only if their wiki has a strict definition of what's patrollable which matches the assumptions here. Patrolling, however, would cause reparsing; I have no idea what consequences this would have.
Comment 5 Nathan Larson 2013-11-15 18:58:19 UTC
I doubt the impact of reparsing on performance would be all that major (depending on what you consider major), since typically most users' edits are autopatrolled anyway. Anons' and newbies' edits have usually been a relatively small proportion of the edits on the wikis I've seen. Compare https://en.wikipedia.org/w/api.php?action=query&list=logevents&letype=patrol&lelimit=500 (patrol log) to https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctype=edit&rclimit=500 (recent edits), and define the following variables:
*A: Timestamp of most recent patrol event
*B: Timestamp of 500th most recent patrol event
*C: Timestamp of most recent edit
*D: Timestamp of 500th most recent edit

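E = A - B (time spanned by the last 500 patrol events)
F = C - D (time spanned by the last 500 edits)

E / F ~ 16. In other words, 500 patrol events take about 16 times as long to accumulate as 500 edits, so patrol activity is about 1/16th as heavy as editing activity. Hmm, would that be a dealbreaker for WMF to do that much reparsing?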
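
For anyone wanting to redo this estimate, the arithmetic is small enough to sketch in Python. The function below assumes the four timestamps have been copied by hand from the two API responses, in the ISO 8601 form the MediaWiki API returns; the function names are made up for this example, and the timestamps in the usage lines are placeholders chosen to give a ratio of 16, not measurements.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a timestamp such as '2013-11-15T18:58:19Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def span_ratio(a, b, c, d):
    """Return E / F, where E = A - B and F = C - D.

    a, b: timestamps of the most recent and 500th most recent patrol event.
    c, d: timestamps of the most recent and 500th most recent edit.
    """
    e = (parse_ts(a) - parse_ts(b)).total_seconds()
    f = (parse_ts(c) - parse_ts(d)).total_seconds()
    return e / f


# Placeholder timestamps only: 80 minutes of patrols vs. 5 minutes of edits.
ratio = span_ratio("2013-11-15T18:00:00Z", "2013-11-15T16:40:00Z",
                   "2013-11-15T18:00:00Z", "2013-11-15T17:55:00Z")
```

A ratio of R means patrol events arrive at roughly 1/R the rate of edits over the sampled window.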
Comment 6 Nathan Larson 2013-11-15 20:11:29 UTC
On the other hand, wouldn't patrol actions only require reparsing if the anon/newbie added a new external link? I can do an analysis to find out how often that occurs, if the information would be helpful.
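
The check suggested here amounts to a set difference between the external links before and after the edit. A rough Python sketch, assuming links can be found with a simple regular expression (MediaWiki's own parser is far more thorough, and both function names are hypothetical):

```python
import re

# Simplified URL matcher; an assumption, not MediaWiki's link-detection rules.
URL_RE = re.compile(r'https?://[^\s\]<>"]+')


def external_links(wikitext):
    """Return the set of external URLs found in a piece of wikitext."""
    return set(URL_RE.findall(wikitext))


def patrol_needs_reparse(old_text, new_text):
    """True only if the edit introduced an external link not present before."""
    return bool(external_links(new_text) - external_links(old_text))
```

Under this rule, patrolling an edit that only changes prose, or that removes links, would not trigger a reparse.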
Comment 7 Nathan Larson 2013-12-09 00:35:40 UTC
(In reply to comment #4)
> Yes; and sysadmins would be interested in this only if their wiki has a
> strict definition of what's patrollable which matches the assumptions here.

We could add a hook that lets extensions implement other methods for designating external links as patrolled.
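
As a rough illustration of the hook idea, here is a Python sketch of a handler registry in which an extension can override the core decision for a link. It is a simplified analogue, not MediaWiki's actual hook mechanism; every name in it (register_handler, is_link_patrolled, trust_own_domain, the wiki.example.org domain) is hypothetical.

```python
# Registered handlers, called in registration order.
_patrolled_link_handlers = []


def register_handler(handler):
    """Register a callable(url, page_id) -> True, False, or None (no opinion)."""
    _patrolled_link_handlers.append(handler)
    return handler


def is_link_patrolled(url, page_id, core_decision):
    """Let handlers override the core el_patrolled decision.

    The first handler returning something other than None wins;
    otherwise the core decision stands.
    """
    for handler in _patrolled_link_handlers:
        verdict = handler(url, page_id)
        if verdict is not None:
            return verdict
    return core_decision


@register_handler
def trust_own_domain(url, page_id):
    """Example handler: always treat links to a hypothetical own domain as patrolled."""
    if url.startswith("https://wiki.example.org/"):
        return True
    return None
```

A wiki whose notion of "patrollable" differs from the core assumptions could register its own handler instead of changing the core logic.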
Comment 8 Nathan Larson 2014-01-05 10:06:51 UTC
A downside to using this option is that good external links would keep nofollow until patrolled. I suspect that big sites like Wikipedia will reject this option because they'll worry about stealth spammers marking spammy external links as patrolled (whether by autopatrol or as part of a tag team that includes a patroller). I suspect that small sites with little or no spam will prefer to just keep $wgNoFollowLinks set to false, since that allows them to reap any advantages of dofollow as quickly as possible (i.e. without waiting for pages to be patrolled); see https://www.mediawiki.org/wiki/Manual:Costs_and_benefits_of_using_nofollow . I don't know whether any mid-sized sites would want this option; if so, please post a comment, so that can be taken into consideration in deciding whether it's worth the effort of implementing. Thanks.
