Last modified: 2010-05-15 15:29:14 UTC
I created a patch for undeleting particular revisions. With it, Special:Undelete shows a checkbox per revision and undeletes only the checked revisions. We sysops of ja.Wikipedia badly need "deletion of particular revisions", but Special:Import has not been released yet. I believe this patch is a simple and reliable way to resolve this problem.
Created attachment 43 Patch for undeletion of particular revisions.
A couple notes on the patch: * There's no validation on the timestamps, and they're not escaped. This could allow SQL injection attacks. * The timestamp conversion functions aren't used so it probably won't work on PostgreSQL. Looks otherwise generally OK; can you update to current CVS and repost please?
Yeah! If we can use this, ja.wp will settle down quite a bit. This will make ja.wp people happy!!
(In reply to comment #3) > Yeah! If we can use this, ja.wp will settle down quite a bit. > This will make ja.wp people happy!! Testing some more... ひらがな
This is a test post.
Sorry for the comment spam. ;) Testing again: カタカナ [firefox]
Posting a comment from Mac OS X Safari. Can you read this?
Created attachment 48 Patch for undeletion of particular revisions. (HEAD) A new patch for CVS HEAD with timestamp validation and SQL sanitizing. In addition, for backward compatibility, this patch restores all revisions when no revisions are checked. The timestamp conversion functions are not used yet, because SpecialUndelete.php does not seem to use them. I'll make a patch for REL1_3 again later.
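For readers following along, here is a minimal sketch of the kind of validation and fallback described in the comment above. It is not the patch itself (which is PHP) and makes several assumptions: timestamps are 14-digit strings in YYYYMMDDHHMMSS form, an in-memory SQLite table stands in for the archive table with only two columns, and the function names are hypothetical.

```python
import re
import sqlite3

# Assumption: timestamps are 14 ASCII digits (YYYYMMDDHHMMSS).
TIMESTAMP_RE = re.compile(r"[0-9]{14}")


def validate_timestamps(raw_values):
    """Return the timestamps unchanged, or raise if any is malformed."""
    clean = []
    for value in raw_values:
        if not TIMESTAMP_RE.fullmatch(value):
            raise ValueError(f"invalid timestamp: {value!r}")
        clean.append(value)
    return clean


def select_revisions_to_restore(conn, title, checked):
    """Pick archived revisions of `title` to restore.

    With no boxes checked, every archived revision is selected,
    which mirrors the backward-compatible behaviour described above.
    """
    timestamps = validate_timestamps(checked)
    sql = "SELECT ar_timestamp FROM archive WHERE ar_title = ?"
    params = [title]
    if timestamps:
        # Placeholders keep user input out of the SQL text itself.
        marks = ",".join("?" for _ in timestamps)
        sql += f" AND ar_timestamp IN ({marks})"
        params.extend(timestamps)
    sql += " ORDER BY ar_timestamp"
    return [row[0] for row in conn.execute(sql, params)]


def make_demo_db():
    """Build a tiny stand-in archive table (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE archive (ar_title TEXT, ar_timestamp TEXT)")
    conn.executemany(
        "INSERT INTO archive VALUES (?, ?)",
        [
            ("Foo", "20040801120000"),
            ("Foo", "20040802120000"),
            ("Foo", "20040803120000"),
        ],
    )
    return conn
```

Rejecting malformed input outright and binding the rest as parameters are two separate defences; either alone would probably address the injection concern, but together they are cheap.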
No notification mails...
Created attachment 49 Patch for undeletion of particular revisions. (HEAD) Update; fixed a small bug.
Created attachment 50 Patch for undeletion of particular revisions. (REL1_3) for REL1_3
Looks generally functional, but there's a serious problem: it doesn't handle compressed revisions ($wgCompressRevisions) properly. If the article is not currently present (so a new 'cur' entry has to be created), and you don't restore the last revision, the raw old_text of the most recent restored revision is inserted into cur_text. If that revision was compressed, we see binary gibberish instead of the expected text. With the old code this wasn't a problem since the most recent revision to be restored would always have come from cur in the first place and was thus uncompressed. It'll be necessary to use Article::getRevisionText() on the entry to be placed into cur, breaking up the INSERT...SELECT into two queries.
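A rough sketch of the fix suggested above, in Python rather than the PHP of the actual code base. It assumes that a revision flagged 'gzip' holds raw DEFLATE data and that flags are a comma-separated string; `compress_text`, `get_revision_text` and `text_for_cur` are hypothetical names used only for illustration, not the real Article::getRevisionText().

```python
import zlib


def compress_text(text):
    """Stand-in for storing a compressed revision (raw DEFLATE, an assumption)."""
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def get_revision_text(stored, flags):
    """Return readable text for a stored revision.

    `flags` is a comma-separated string such as "gzip" or "".
    """
    if "gzip" in flags.split(","):
        return zlib.decompress(stored, -zlib.MAX_WBITS).decode("utf-8")
    return stored.decode("utf-8")


def text_for_cur(restored_revisions):
    """Return the uncompressed text of the newest restored revision.

    Each revision is a (timestamp, stored_bytes, flags) tuple. The text
    is decoded first and only then handed to whatever writes the new
    'cur' entry, instead of copying the raw stored bytes across.
    """
    timestamp, stored, flags = max(restored_revisions, key=lambda r: r[0])
    return get_revision_text(stored, flags)
```

The point is only the ordering: read the row, decode it according to its flags, then insert, which is why a single INSERT...SELECT no longer works.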
Created attachment 55 Patch for undeletion of particular revisions. (HEAD) Update; compatible with $wgCompressRevisions. I verified that this correctly undeletes revisions with ar_flags='gzip'. If this patch is OK, I will create a new patch for REL1_3.
Committed patch to CVS HEAD; needs testing on PostgreSQL.
Created attachment 78 Patch for undeletion of particular revisions. (REL1_3) Backport to REL1_3.
If this feature is on by default I fear it may result in widespread history "corruption", and histories not compatible with GFDL. Ideally all revisions should be undeleted or at least listed, but those you'd want to omit could be hidden or the content deleted. At the very least some log should show that (and which) revisions were not undeleted.
(In reply to comment #16) > Ideally all revisions should be undeleted or at least listed, but > those you'd want to omit could be hidden or the content deleted. I think hiding particular versions may cause a problem: diffs would still show content that has copyright or other problems.
(In reply to comment #16) > If this feature is on by default I fear it may result in widespread history > "corruption", and histories not compatible with GFDL. This feature is certainly not a panacea. For certain pages, this feature should not be applied. But it solves quite a few problems. The GFDL compliance problem could be solved simply by following a procedure like this: 1. Undelete the whole thing. 2. Revert to the last version before the copyvio was inserted. 3. Delete. 4. Undelete the latest version and the versions before the copyvio was inserted. By doing this, we can legitimately omit the history info in between. Of course, if the history info could be preserved while the main text is hidden, that would be more useful. But it is quite helpful as it is now.
I think the GFDL compliance problem with history information may be solved by deciding on a suitable usage policy. (In reply to comment #18) > The problem with GFDL compliance could be achieved simply by following a certain procedure like this: > 1. undelete the whole thing > 2. revert to the last version before the copyvio is inserted. > 3. delete. > 4. undelete the latest version, and the versions before the copyvio is inserted. I propose the following usage policy: If the target article was previously reverted to the last version before the problem occurred, only the revisions from the "problem occurred" version up to the revert to that last clean version must not be undeleted. If the target article was not reverted to the last version before the problem occurred, the "problem occurred" version and everything after it must not be undeleted. The policy above is pretty simple, but I think it is important.
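The proposed policy can be written down as a small selection rule. This is only a sketch under stated assumptions: revisions are identified by sortable timestamp strings, the revert itself is treated as safe to undelete (as in step 4 of the quoted procedure), and `revisions_to_undelete` is a hypothetical helper, not part of the patch.

```python
def revisions_to_undelete(timestamps, problem_ts, revert_ts=None):
    """Return the timestamps that the policy allows to be undeleted.

    problem_ts: timestamp of the revision where the problem first appeared.
    revert_ts:  timestamp of the revert to the last clean version, or
                None if the article was never reverted.
    """
    ordered = sorted(timestamps)
    if revert_ts is None:
        # Never reverted: the problem revision and everything after it
        # stay deleted.
        return [ts for ts in ordered if ts < problem_ts]
    # Reverted: only the span from the problem revision up to (but not
    # including) the revert stays deleted.
    return [ts for ts in ordered if ts < problem_ts or ts >= revert_ts]
```

For example, with revisions at 01, 02, 03, 04 and 05, a problem introduced at 02 and a revert at 04, the rule keeps 01, 04 and 05.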
Some requirements for this general type of feature: 1. Legal compliance by not continuing to distribute specific revisions 2. Legal record keeping by never permanently deleting any revision (so a situation which does result in subsequent legal action doesn't end up with all records either party needs no longer available from an independent source) 3. Showing an accurate history which includes any problematic items in that history but without the copyright infringement or other problem. 4. Easy and complete reversibility of any action This feature set is best achieved by a per-revision hide flag which can be turned on or off and which allows anyone with the right account setting to see the hidden articles in their proper context (sysop flag seen by any person with sysop in their user rights, for example). Also avoids problems with the various compression features, either gzip or diff-based, since nothing is actually being deleted or restored. Doesn't seem like a good idea to ship this in 1.4, since it's not really the right approach to the problem.
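To make the per-revision hide flag concrete, here is a minimal sketch. Nothing in it reflects an existing schema: the `Revision` class, the `hidden` field, the placeholder text and the rule that "sysop" rights reveal hidden text are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int
    author: str
    text: str
    hidden: bool = False  # hypothetical per-revision hide flag


def set_hidden(revision, hidden):
    """Turn the flag on or off; the stored text is never touched."""
    revision.hidden = hidden


def render_history(revisions, user_rights):
    """Return (rev_id, author, text) rows as a given viewer would see them.

    Every revision stays listed, so the history remains accurate;
    only the text of hidden revisions is withheld from viewers
    without the assumed "sysop" right.
    """
    can_see_hidden = "sysop" in user_rights
    rows = []
    for rev in revisions:
        if rev.hidden and not can_see_hidden:
            rows.append((rev.rev_id, rev.author, "(revision hidden)"))
        else:
            rows.append((rev.rev_id, rev.author, rev.text))
    return rows
```

Because hiding only flips a flag, it is trivially reversible and does not interact with how the text is stored or compressed, which covers requirements 2 to 4 above.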
A basic issue with the suggestion in comment #20 is that we distribute database dumps. If the reason we're deleting individual revisions is because we can't legally distribute them, then not separating them creates a problem.
Regarding Jamesday's comment: I agree that recordkeeping is sometimes important, but it can easily be done using XML export. It is not a reason to delay the introduction of this feature. Some articles dealing with controversial subjects (sexual, religious, etc.) receive what seems to be intentional copy-and-paste of copyrighted material just so that the articles get deleted. Occasionally this happens to things on the Main Page and to articles with a very long history. It is very hard to choose between deleting them altogether and bearing the legal risks. I know this is not news to Jamesday, but the Japanese ISP liability law makes people like Wikipedia admins liable for not deleting obvious infringement that they know about.
I checked with some admins and looked at the lists of versions to be deleted. The requests date back as far as a year, and there are over 1,000 versions to be deleted on Japanese Wikipedia. Not everything could be taken care of by this feature, but this (or XML Import) would help a lot.
(In reply to comment #20) > This feature set is best achieved by a per-revision hide flag which can be > turned on or off and which allows anyone with the right account setting to see > the hidden articles in their proper context (sysop flag seen by any person with > sysop in their user rights, for example). Also avoids problems with the various > compression features, either gzip or diff-based, since nothing is actually being > deleted or restored. I think this feature causes a problem: diffs would still show content that has various problems. The patch restores particular revisions from the archive table to the old and cur tables, so the other revisions remain in the archive table. If somebody (a lawyer or an ISP) needs the deleted content, sysops can restore it in order to respond to the request. (It would be best if a restore made in response to such a request could be marked as temporary, so that only the temporarily restored revisions could be re-deleted.) I also think the following feature would be helpful: each entry in the archive table has a "Protected" flag that marks the revision to prevent it from being permanently deleted from the archive table.
Jamesday suggested to me on IRC to encrypt individual versions instead of deleting them. As far as I understand (and that's not saying much) this would make things easier, because no info is actually lost and no records need to be deleted, so there's no mess-up in the IDs. Just the content of a single field in the database is updated (and possibly something like "encrypted" could be added to the log string in the version entry, but that's not necessary). I like that idea very much, because it is simple, transparent and easy to undo. Also, it would untie this bug from #603, as far as I can see. If the software had access to the secret key too, it could make the encrypted version visible to admins, etc. But that's not really necessary. The most important thing is that texts that have been put on Wikipedia in violation of copyright are no longer publicly accessible.
This was applied on 1.4+ some time ago. Resolving as fixed.