Last modified: 2012-08-04 21:10:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T34993, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 32993 - Need full Article Samples White-list and Black-list
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: ArticleFeedbackv5
Version: unspecified
Hardware: All All
Importance: Highest blocker
Target Milestone: ---
Assigned To: Dario Taraborelli
Depends on:
Blocks: 32885 39042
Reported: 2011-12-12 18:35 UTC by Fabrice Florin
Modified: 2012-08-04 21:10 UTC (History)

Description Fabrice Florin 2011-12-12 18:35:54 UTC
We need to get these two article sample lists from Dario:

* Article samples white-list (Article Feedback_5 category)
* Article samples black-list (Article Feedback_Blacklist)

See data/metrics plan:
http://meta.wikimedia.org/wiki/Research:Article_feedback/Data_and_metrics
Comment 1 Fabrice Florin 2011-12-12 20:51:21 UTC
A couple of things need to happen to make this work:

1) Dario will ask for WMF engineering help to run a bot, so that we have the AFTv5 whitelist by end of day Monday.

2) Roan will show OmniTI how to change the AFT4 configuration file on prototype to treat the AFT5 whitelist as a blacklist for AFT4 -- or do the job himself.

3) Roan will change these configuration files again on en.labs.wikimedia.org pre-deployment, then on en.wikimedia.org at launch.
Comment 2 Fabrice Florin 2011-12-12 20:57:03 UTC
See related bug 32885 for testing that all of this works.
https://bugzilla.wikimedia.org/show_bug.cgi?id=32885
Comment 3 Dario Taraborelli 2011-12-14 04:54:36 UTC
Copying here from the mail I sent earlier:

http://toolserver.org/~dartar/temp/enwiki_noredirect_random.txt

This is a 0.3% random sample (N=11851) from all enwiki articles excluding redirects. The list is slightly larger than 10K articles to account for the fact that ~6% of all enwiki articles are disambiguation pages that will be blacklisted.
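The sample-size reasoning above can be checked with a little arithmetic. This is only a sketch: it assumes the ~6% disambiguation share holds within the random sample, and the variable names are illustrative rather than taken from any actual script.

```python
# Figures quoted in the comment above.
sample_size = 11851       # N of the random sample
sample_fraction = 0.003   # 0.3% of enwiki articles, redirects excluded
disambig_share = 0.06     # ~6% disambiguation pages (approximate figure)

# Assumption: the disambiguation share in the sample matches the overall share.
expected_usable = round(sample_size * (1 - disambig_share))

# Approximate size of the article population the sample was drawn from.
implied_population = round(sample_size / sample_fraction)

print(expected_usable)
print(implied_population)
```

Under that assumption, roughly 11,100 articles remain after the blacklist is applied, which stays above the 10K target.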

* articles from the random sample list (enwiki_noredirect_random.txt) need to be added to a hidden category called Category:Article Feedback 5
* AFT5 needs to be configured to honor Category:Article Feedback Blacklist as a blacklist and Category:Article Feedback 5 as a whitelist
* AFT4 needs to be configured to honor both Category:Article Feedback Blacklist and Category:Article Feedback 5 as blacklists
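
The category rules above can be sketched as two predicates. This is a hypothetical illustration of the intended behaviour, not the extension's actual code: the function names are made up, it assumes the blacklist takes precedence over the whitelist, and it assumes AFT4 is otherwise shown on every article.

```python
# Category names come from the comment above; everything else is hypothetical.
AFT5_WHITELIST = "Article Feedback 5"
FEEDBACK_BLACKLIST = "Article Feedback Blacklist"


def shows_aft5(categories):
    """AFT5 appears only on whitelisted articles that are not blacklisted."""
    cats = set(categories)
    # Assumption: the blacklist wins when an article is in both categories.
    return AFT5_WHITELIST in cats and FEEDBACK_BLACKLIST not in cats


def shows_aft4(categories):
    """AFT4 treats both categories as blacklists."""
    cats = set(categories)
    # Assumption: AFT4 is enabled on every article not excluded here.
    return AFT5_WHITELIST not in cats and FEEDBACK_BLACKLIST not in cats


# A sampled article gets AFT5 and loses AFT4.
print(shows_aft5([AFT5_WHITELIST]), shows_aft4([AFT5_WHITELIST]))
# A sampled disambiguation page gets neither tool.
both = [AFT5_WHITELIST, FEEDBACK_BLACKLIST]
print(shows_aft5(both), shows_aft4(both))
```

With these rules no article ever shows both tools at once, which is the point of reusing the AFT5 whitelist as an AFT4 blacklist.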
Comment 4 Yoni Shostak 2011-12-14 19:08:14 UTC
So can we close this ticket?
Comment 5 Fabrice Florin 2011-12-14 19:10:24 UTC
Yes, I am closing this ticket now. 

The bot is now about 1/3 of the way through populating the new category on en.wikipedia. It should be done in a couple of hours. See below.


On Dec 14, 2011, at 7:19 AM, Sam Reed wrote:

Hi All,
 
I gzipped the files in Dario's ~/public_html/temp folder [1] [2] [3], so that downloads are much faster for those of us on non-super-fast connections ;)
 
Category created [4]
 
Bot is now running [5]
 
Currently it's running at 11-14 epm (edits per minute), which works out to about 14 hours at the low end…
 
Sam
 
[1] http://toolserver.org/~dartar/temp/enwiki_geotagged.txt.gz
[2] http://toolserver.org/~dartar/temp/enwiki_noredirect.txt.gz
[3] http://toolserver.org/~dartar/temp/enwiki_noredirect_random.txt.gz
[4] https://en.wikipedia.org/wiki/Category:Article_Feedback_5
[5] https://en.wikipedia.org/wiki/Special:Contributions/Reedy_Bot
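
The run-time estimate in Sam's mail can be reproduced with a short calculation. This sketch assumes the bot makes one edit per article and processes all 11851 articles in the random sample; the function name is illustrative.

```python
def hours_to_finish(article_count, edits_per_minute):
    """Hours needed to tag every article at a steady edit rate."""
    # Assumption: exactly one edit per article, with no pauses or retries.
    return article_count / edits_per_minute / 60


ARTICLES = 11851  # size of the random sample

for epm in (11, 14):
    print(f"{epm} epm: {hours_to_finish(ARTICLES, epm):.1f} hours")
```

The 14-hour figure matches the faster rate of 14 epm; at 11 epm the same run would take closer to 18 hours.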
