Last modified: 2013-01-03 11:28:47 UTC
Currently, all uploads to Commons using UploadWizard are categorised into: http://commons.wikimedia.org/wiki/Category:Uploaded_with_UploadWizard There are now some 2 million files in this category, which makes it almost impossible to browse. It would be better if uploads were categorised by the month and year they were uploaded, e.g. "Category:Uploaded with UploadWizard (November 2012)", to make browsing easier. Can we get UploadWizard amended so that this happens in future? Current uploads can be re-categorised by one of our friendly Commons bot operators.
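To make the proposed naming scheme concrete, here is a minimal sketch of how a month-and-year category title could be derived from an upload date. The function name `monthly_category` and the hard-coded English month names are assumptions for illustration only; this is not UploadWizard code.

```python
from datetime import date

# English month names are spelled out so the result does not depend on the
# locale of the machine running the script.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def monthly_category(upload_date, base="Uploaded with UploadWizard"):
    """Return a category title such as
    'Category:Uploaded with UploadWizard (November 2012)'.

    Hypothetical helper: the title format follows the example in the report.
    """
    month_name = MONTHS[upload_date.month - 1]
    return f"Category:{base} ({month_name} {upload_date.year})"
```

A bot re-categorising existing files could call the same helper with each file's original upload date.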
The category is no longer of any use, at least not in its current form (without a date). I think it was used for statistics, but now it only proves that MediaWiki can handle very large categories, and proving that is not within Wikimedia Commons' scope. I guess month categories would require changing UpWiz's code and can't be done just by changing LocalSettings.php, so I am not retargeting this bug.
(In reply to comment #0) Please enter reasonable summaries for bugs; a URL doesn't explain much. I have edited it now. UW can be customized to upload to different categories by using autoCategories in UploadWizard.config.php, so I guess this has been handled manually. Also, it doesn't make much sense for UW to support month-wise categorization. Maybe some Commons admin can shed more light on what is needed.
First the Commons community should decide what it wants: * Get rid of the category * Keep the category split up by date * Empty tag template * (maybe another option) Discussion should probably happen at https://commons.wikimedia.org/wiki/Category_talk:Uploaded_with_UploadWizard and you should advertise it at the village pump. When you have consensus you should probably come back to Bugzilla to have it implemented. In the meantime I would close this bug.
(In reply to comment #3) > First the Commons community should decide what it wants: > * Get rid of the category > * Keep the category split up by date > * Empty tag template > * (maybe another option) Logically, using change tags for upload actions by UploadWizard would make sense. Then users could browse by different dates in Special:Log/upload, filtered to just UploadWizard uploads. I suppose which approach is best depends on what the point of adding the category was, but I thought I should mention that as another possibility (or something to do in addition to the category).
I would favor getting rid of the category entirely. One of the reasons the category was created was actually to find bugs in UploadWizard when it was first being developed. For example, if a few image pages had a weird formatting problem and they all happened to be among the 10,000 images in the 'Uploaded with UploadWizard' category, we knew the problem was an UploadWizard bug. Now the category is fairly useless for debugging purposes (and there are other ways we can see if an image is uploaded via UploadWizard anyway).
I have raised the issue for discussion at https://commons.wikimedia.org/wiki/Commons:Categories_for_discussion/2012/12/Category:Uploaded_with_UploadWizard -- I wouldn't expect consensus to take forever, so we could leave this report open for the time being?
This category is still used for statistical purposes. For example, the "upload activity levels" table at http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm relies on the existence of this category. I don't see why the benefit of either removing or subdividing this category outweighs the cost.
(In reply to comment #7) I don't know why you still need to track this. You're at 70% now. This indicates that UpWiz is now usable :-) I still think it was wrong to make a buggy tool the default. For anyone else who didn't recognize it, Erik's comment means WONTFIX.
Actually, we were at 46.3% in the pre-WLM month, and then at 69.6% during WLM. It'll probably drop again to ~50% in October/November as the influx of new uploaders goes down. Tracking those types of changes over time, and seeing how they may be influenced e.g. by the integration of new features like the Flickr upload feature or improved WP integration, is precisely the point.
(In reply to comment #7) > used for statistical purposes For statistical purposes, you can use the upload summary. Surely there is no need for the category to remain on the page forever? Can I build automated removal of that category into our clean-up regexps?
My understanding is that Erik's scripts run each time against the whole dump, so if you removed the category after a while, the counts would change. We should give Erik time to clarify that. It's possible that these could be switched over to parse the edit summary, provided the software adds and has always added the summary in the same way it has added the category. But the tradeoffs and issues with that approach need to be analyzed first. I still fail to see what's accomplished by doing this, as we also have other "Uploaded with/to" categories that have already become very large (Commonist is at >100K files, images from WLM2012 is at >300K files). What's the problem we're trying to solve by removing or restructuring these categories? What, if any, operation is currently slowed down by these categories?
Ryan, what are those other ways to see if an image is uploaded via UploadWizard? The category Category:Uploaded_with_UploadWizard is not suitable for navigation, but I don't see a need to remove it, either. Its purpose is to log the upload tool.
Indeed, all dump stats are regenerated from scratch each time. This is on purpose. Provided the integrity of the dumps is maintained, and I think we're good there, that gives us the most consistent metrics over time, as fixed bugs (few) and new metrics (occasional) yield improved or new data for all history. But it brings the caveat Erik described above. If the decision is taken to remove the category, I can switch to the comment field. An alternative (e.g. if the comment was not consistently set from the beginning) would be to have a script replace [[..uploadwizard..]] by an in-article comment <!--uploadwizard (do not remove this comment, for stats purposes)-->
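The replacement script mentioned above could look roughly like the following sketch. It assumes the category appears in the page wikitext as a plain `[[Category:Uploaded with UploadWizard]]` link, optionally with a sort key; the pattern and the function name `replace_category_with_marker` are illustrative assumptions, not an existing bot.

```python
import re

# Assumed wikitext form of the category link. Spaces or underscores are
# accepted in the title, and an optional "|sortkey" part is tolerated.
CATEGORY_LINK = re.compile(
    r"\[\[\s*[Cc]ategory\s*:\s*[Uu]ploaded[ _]with[ _]UploadWizard\s*"
    r"(?:\|[^\]]*)?\]\]"
)

# Marker text taken from the comment above.
MARKER = "<!--uploadwizard (do not remove this comment, for stats purposes)-->"


def replace_category_with_marker(wikitext):
    """Swap the UploadWizard category link for the in-article stats comment.

    Pages without the category link are returned unchanged.
    """
    return CATEGORY_LINK.sub(MARKER, wikitext)
```

Because the marker is an HTML comment, it would stay in the page source for dump-based scripts while no longer adding the page to the category.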
(In reply to comment #13) Thanks for the reply. I guess at some point you'll stop gathering statistics. Just let us know when, so we can remove this category while other work is being done on the pages anyway (e.g. categorizing or i18n replacements). I don't think there is a pressing need to create more work now [there were times when the comment field was completely empty in UpWiz uploads, and even now it is not perfect], so I suggest "resolved later".
[Removing RESOLVED LATER as that is deprecated.]
(In reply to comment #15) > [Removing RESOLVED LATER as that is deprecated.] There should be a message telling me this when selecting it. (In reply to comment #13) > all dump stats are regenerated from scratch each time This way your statistics are wrong. The cumulative risk of each single file being deleted grows over time. If you then generate statistics from the files (uploaded with the wizard) that are still alive, it will always look as if the number of Upload Wizard uploads is growing, even if it actually remains constant. This effect is small but it's there. However, the percentage of Upload Wizard uploads may remain constant over time if you also compute the "total" numbers from non-deleted files. But there is no proof that Upload Wizard uploads have the same chance of getting deleted as any other uploaded file. Using the upload log from the point when Upload Wizard started using a special "upload summary" is more reliable. Finally, I think the sources used for these statistics and how they were computed should be stated next to each table/figure, e.g. in footnotes. Often this is both important and interesting.
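The deletion effect described above can be illustrated with toy numbers. The sketch below assumes a constant number of uploads per month and a constant, independent monthly deletion probability per file; both figures are hypothetical and only serve to show the direction of the bias, not its real size.

```python
def surviving_counts(uploads_per_month, monthly_deletion_rate, months):
    """Expected number of files from each upload month still present when
    the dump is taken.

    Index 0 is the oldest month, the last index is the most recent month.
    Assumes every file faces the same deletion probability in every month.
    """
    survival = 1.0 - monthly_deletion_rate
    return [
        uploads_per_month * survival ** (months - 1 - month)
        for month in range(months)
    ]


# Hypothetical figures: 1000 uploads every month, 1% deleted per month.
counts = surviving_counts(1000, 0.01, 12)
```

With these assumed figures the oldest month shows roughly 895 surviving files against 1000 for the latest month, so a count based only on surviving files suggests growth although the upload rate never changed.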
@Platonides: The file upload comment says "User created page with UploadWizard", although this isn't as easy to exploit for statistical queries. Looks like the opinion at the Commons discussion is split (as well as in the bug comments here). I'm going to go ahead and close this as WONTFIX. If stats.wikimedia.org stops relying on it or a consensus develops on Commons to delete it, feel free to reopen.
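For illustration, a per-month UploadWizard share could be derived from upload comments along the following lines. The record shape (an ISO timestamp string paired with the upload comment) and the function name are assumptions, not the stats.wikimedia.org scripts, and the approach only works for periods in which UploadWizard actually set this exact comment.

```python
from collections import Counter

# Upload comment quoted in the comment above.
UPLOADWIZARD_COMMENT = "User created page with UploadWizard"


def uploadwizard_share_by_month(log_entries):
    """Return {"YYYY-MM": fraction of uploads made with UploadWizard}.

    log_entries is a hypothetical iterable of (iso_timestamp, comment)
    pairs, e.g. ("2012-09-01T10:00:00Z", "User created page with UploadWizard").
    """
    totals = Counter()
    wizard = Counter()
    for timestamp, comment in log_entries:
        month = timestamp[:7]  # "YYYY-MM" prefix of the ISO timestamp
        totals[month] += 1
        if comment == UPLOADWIZARD_COMMENT:
            wizard[month] += 1
    return {month: wizard[month] / totals[month] for month in totals}
```

Matching on an exact string is fragile: any change to the comment text in a later UploadWizard version would silently drop those uploads from the count.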
(In reply to comment #17) > @Platonides: The file upload comment says "User created page with > UploadWizard", although this isn't as easy to exploit for statistical > queries. > Hopefully we don't start using that for stats. It's questionable whether it's appropriate for the img_comment to be the same for all files uploaded with UploadWizard (it kind of defeats the point of having an img_comment field). It's not unimaginable that later versions of UploadWizard could change that behaviour (and earlier versions of UploadWizard didn't even have that behaviour) </offtopic rant>