Last modified: 2014-08-17 22:04:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T59490, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57490 - Write extension to store data about what revisions were imported from what wikis
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Extensions requests
Version: master
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2013-11-23 21:03 UTC by Nathan Larson
Modified: 2014-08-17 22:04 UTC (History)

Description Nathan Larson 2013-11-23 21:03:14 UTC
Especially in this age of Scribunto, imported revisions sometimes end up breaking templates or causing other problems. E.g., you might import a bunch of templates from Wikipedia, then import some from other wikis, and find that you're getting script errors because you overwrote something you didn't mean to. It can be hard to sort out which templates are causing which problems. Re-importing from Wikipedia doesn't necessarily help, because those revisions might be older than the revisions that are causing the problems.

It would be helpful to be able to run a query and find out which pages' current revisions (page.page_latest) were imported from which wikis. That way, one could sort out what needs to be reverted. Therefore, I propose one of two ways to store in the database the value of the <sitename> element from the import XML file:
(1) Add a revision.rev_imported field to store the name of the source wiki of imported revisions.
(2) Add a new table that will store the same data as in option 1.

Option 1 makes sense if a large proportion of the revisions on the wiki will have been imported from other wikis. Option 2 makes sense if the proportion is lower. I suspect that people are going to want to go with option 2.
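
To make option 2 concrete, here is a rough sketch using Python and an in-memory SQLite database. The revision_import table and its ri_rev_id / ri_source_wiki columns are hypothetical names invented for illustration, the page and revision tables are cut down to the columns the query needs, and the sample dump is a minimal stand-in for a real export file. It is not a proposed final schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified schema. Only revision_import is new; its name and
# columns are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT, page_latest INTEGER);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
CREATE TABLE revision_import (
    ri_rev_id INTEGER PRIMARY KEY,   -- imported revision
    ri_source_wiki TEXT NOT NULL     -- value of <sitename> in the import XML
);
"""


def source_sitename(xml_text):
    """Return the text of <siteinfo>/<sitename>, whatever the XML namespace."""
    root = ET.fromstring(xml_text)
    node = root.find("{*}siteinfo/{*}sitename")  # {*} wildcard needs Python 3.8+
    return node.text if node is not None else None


def current_revisions_by_source(conn):
    """List (page_title, source wiki) for pages whose current revision was imported."""
    rows = conn.execute(
        """
        SELECT page_title, ri_source_wiki
        FROM page
        JOIN revision_import ON ri_rev_id = page_latest
        ORDER BY page_title
        """
    )
    return rows.fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    dump = "<mediawiki><siteinfo><sitename>Wikipedia</sitename></siteinfo></mediawiki>"
    site = source_sitename(dump)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, "Template:Infobox", 11), (2, "Template:Navbox", 20)],
    )
    conn.executemany("INSERT INTO revision VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])
    # Revision 10 came from the dump above; revision 11 later overwrote it from
    # another wiki; revision 20 is a local edit and has no row here.
    conn.executemany(
        "INSERT INTO revision_import VALUES (?, ?)",
        [(10, site), (11, "SomeOtherWiki")],
    )
    return current_revisions_by_source(conn)
```

Because only imported revisions get a row, the extra table stays small when imports are a minority of revisions, which is the case where option 2 is supposed to win.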
Comment 1 Nathan Larson 2014-01-15 14:01:24 UTC
Another potential use: a wiki that wants to make sure it complies with licensing requirements could run a query to find which pages have revisions imported from Wikipedia, and verify that those pages also have the CC BY-SA license template applied (if necessary).
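
A licensing check along those lines might look roughly like the following sketch. It assumes the same hypothetical revision_import table (ri_rev_id, ri_source_wiki) as a store for import data, uses a stripped-down stand-in for templatelinks with only tl_from and tl_title, and the template name "CC-BY-SA" is a placeholder rather than the name of any real template.

```python
import sqlite3

# Simplified stand-in tables; revision_import is hypothetical.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
CREATE TABLE revision_import (ri_rev_id INTEGER PRIMARY KEY, ri_source_wiki TEXT NOT NULL);
CREATE TABLE templatelinks (tl_from INTEGER, tl_title TEXT);
"""


def pages_missing_license(conn, source_wiki, license_template):
    """Pages with any revision imported from source_wiki that do not
    transclude license_template."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.page_title
        FROM page p
        JOIN revision r ON r.rev_page = p.page_id
        JOIN revision_import ri ON ri.ri_rev_id = r.rev_id
        WHERE ri.ri_source_wiki = ?
          AND NOT EXISTS (
              SELECT 1 FROM templatelinks tl
              WHERE tl.tl_from = p.page_id AND tl.tl_title = ?
          )
        ORDER BY p.page_title
        """,
        (source_wiki, license_template),
    )
    return [title for (title,) in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Solar System"), (2, "Moon"), (3, "Local policy")],
    )
    conn.executemany("INSERT INTO revision VALUES (?, ?)", [(10, 1), (20, 2), (30, 3)])
    conn.executemany(
        "INSERT INTO revision_import VALUES (?, ?)",
        [(10, "Wikipedia"), (20, "Wikipedia")],
    )
    # Only "Solar System" carries the (placeholder) license template.
    conn.execute("INSERT INTO templatelinks VALUES (?, ?)", (1, "CC-BY-SA"))
    return pages_missing_license(conn, "Wikipedia", "CC-BY-SA")
```

In this toy data, "Moon" is the only page with Wikipedia-sourced revisions and no license template, so it is the one that would need attention.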
Comment 2 Nathan Larson 2014-01-15 14:34:19 UTC
It could also produce some interesting statistics for WikiApiary on which revisions, pages, etc. are widely imported, and where, across the wikisphere. We could determine what percentage of a wiki's revisions (and current revisions) consist of imported content, which could give better measures of how much new content a wiki has added. We could diff the most recently imported revision of a page against its current revision and see how much of the content differs, to get the statistic for that page.
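
As a rough illustration of those two statistics, the sketch below computes the share of imported revisions and a per-page "how much changed since import" figure. The function names are made up for this example, and the line-based difflib ratio is just one possible way to measure difference; it is not something WikiApiary is known to use.

```python
from difflib import SequenceMatcher


def imported_share(total_revisions, imported_revisions):
    """Percentage of a wiki's revisions that were imported."""
    if total_revisions == 0:
        return 0.0
    return 100.0 * imported_revisions / total_revisions


def changed_fraction(imported_text, current_text):
    """Fraction (0.0 to 1.0) by which the current revision differs from the
    most recently imported one, compared line by line."""
    matcher = SequenceMatcher(
        None,
        imported_text.splitlines(),
        current_text.splitlines(),
        autojunk=False,
    )
    return 1.0 - matcher.ratio()
```

For example, imported_share(200, 50) gives 25.0, and a four-line page with one line rewritten since import gives a changed_fraction of 0.25.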
Comment 3 Nathan Larson 2014-01-15 16:13:13 UTC
For WikiApiary purposes, an acceptable alternative might be to fix bug 60090, adding the revision data to log_params.
