Last modified: 2012-08-04 21:10:38 UTC
We need to get from Dario these two article sample lists:
* Article samples white-list (Article Feedback_5 category)
* Article samples black-list (Article Feedback_Blacklist)

See data/metrics plan: http://meta.wikimedia.org/wiki/Research:Article_feedback/Data_and_metrics
A couple of things need to happen to make this work:
1) Dario will ask for WMF engineering help to run the bot, so that we have the AFTv5 white list by end of day Monday.
2) Roan will show OmniTI how to change the AFT4 configuration file on prototype to treat the AFT5 white list as a blacklist for AFT4, or do the job himself.
3) Roan will change these configuration files again on en.labs.wikimedia.org pre-deployment, then on en.wikimedia.org at launch.
See related bug 32885 for testing that all of this works: https://bugzilla.wikimedia.org/show_bug.cgi?id=32885
Copying here from the mail I sent earlier:

http://toolserver.org/~dartar/temp/enwiki_noredirect_random.txt

This is a 0.3% random sample (N=11851) from all enwiki articles excluding redirects. The list is slightly larger than 10K articles to account for the fact that ~6% of all enwiki articles are disambiguation pages that will be blacklisted.

* articles from the random sample list (enwiki_noredirect_random.txt) need to be added to a hidden category called Category:Article Feedback 5
* AFT5 needs to be configured to honor Category:Article Feedback Blacklist as a blacklist and Category:Article Feedback 5 as a whitelist
* AFT4 needs to be configured to honor both Category:Article Feedback Blacklist and Category:Article Feedback 5 as blacklists
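To make the intended whitelist/blacklist behaviour concrete, here is a minimal sketch of the eligibility rules described in the three bullets above. It is not the extension's actual code: the function names are hypothetical, it assumes category membership is the only criterion, and only the two category names come from this ticket.

```python
# Category names are taken from this ticket; everything else is a
# hypothetical illustration, not the ArticleFeedback extension's code.
AFT5_WHITELIST = "Article Feedback 5"
AFT_BLACKLIST = "Article Feedback Blacklist"


def aft5_enabled(categories):
    """AFT5 shows only on whitelisted pages that are not blacklisted."""
    cats = set(categories)
    return AFT5_WHITELIST in cats and AFT_BLACKLIST not in cats


def aft4_enabled(categories):
    """AFT4 treats both categories as blacklists."""
    cats = set(categories)
    return AFT5_WHITELIST not in cats and AFT_BLACKLIST not in cats


if __name__ == "__main__":
    # A page in the random sample gets AFT5 and loses AFT4.
    sampled = ["Living people", "Article Feedback 5"]
    print(aft5_enabled(sampled), aft4_enabled(sampled))
```

Under these rules no page can show both tools at once: any page where AFT5 is enabled carries the whitelist category, which disables AFT4.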
So can we close this ticket?
Yes, I am closing this ticket now. The bot is now 1/3 done populating the new category on En.WP. Should be done in a couple of hours. See below.

On Dec 14, 2011, at 7:19 AM, Sam Reed wrote:

Hi All,

I gzipped the files in Dario's ~/public_html/temp folder [1] [2] [3], so download would be much faster for us on non-super-fast connections ;)

Category created [4]

Bot is now running [5]

Currently it’s running at 11-14 epm, reckons about 14 hours at the low end…

Sam

[1] http://toolserver.org/~dartar/temp/enwiki_geotagged.txt.gz
[2] http://toolserver.org/~dartar/temp/enwiki_noredirect.txt.gz
[3] http://toolserver.org/~dartar/temp/enwiki_noredirect_random.txt.gz
[4] https://en.wikipedia.org/wiki/Category:Article_Feedback_5
[5] https://en.wikipedia.org/wiki/Special:Contributions/Reedy_Bot
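For reference, the run-time estimate in Sam's mail can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes "epm" means edits per minute and that the bot makes exactly one edit per sampled article (N=11851 from the sample list); the function name is made up for illustration.

```python
SAMPLE_SIZE = 11851  # N from enwiki_noredirect_random.txt


def hours_to_finish(pages, edits_per_minute):
    """Hours needed to edit every page once at a steady rate."""
    return pages / edits_per_minute / 60


# Assumption: one edit per article, steady rate for the whole run.
fastest = hours_to_finish(SAMPLE_SIZE, 14)  # roughly 14 hours
slowest = hours_to_finish(SAMPLE_SIZE, 11)  # roughly 18 hours

if __name__ == "__main__":
    print(f"{fastest:.1f} to {slowest:.1f} hours")
```

At the quoted 11-14 epm this gives a window of roughly 14 to 18 hours, which matches the "about 14 hours at the low end" figure.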