Last modified: 2010-05-15 15:33:31 UTC
References:
http://bugzilla.wikipedia.org/show_bug.cgi?id=603 (delete/undelete cycle does not preserve oldid)
http://bugzilla.wikipedia.org/show_bug.cgi?id=454 (Enotif 1.33)

Introducing an abbreviation: last-visited revision (LVR; lvr)

I would like to ask you something about a suggestion to improve the recent-changes and page-history behaviour.

Status:
=======
As far as I understand the software and Brion, old_id is *not* a permanent, fixed identifier for a given revision of a page. It is only valid for a while and under certain circumstances. However, the Email-Notification patch (Enotif) and some users request a "(diff-to-my-last-visited-revision)" (lvr-diff) link. This *is* implemented in the Enotif 1.33 patch coming this weekend, based on old_id, and it works unless that lvr revision is deleted from the database, e.g. by scheduled RC pruning.

Problems:
=========
1) old_id cannot be used as a 100% secure pointer to a given revision (e.g. the LVR) of a page.
2) Currently, any older revision of a page is deleted after a while.

Question to you and proposal:
=============================
Given that the RC history may be pruned after a while, and that old_id can change through a delete/undelete cycle of the page, I propose to build an LVR repository (last-visited revision), which can be compressed.

If a page pageX is watched by a user UserZ, I propose to permanently save the "last visited revision (lvr) of pageX". This is the page revision just before UserZ got an enotif because someone else edited pageX to revision (lvr+1). Enotif 1.33 already implements this and knows (lvr), but currently does not save the page content. This pageX(lvr) must then be touched neither by regular RC history pruning nor by delete/undelete cycles; it must be preserved.
To free storage, it theoretically only needs to be saved *until* UserZ visits the *current* revision of the page, as this action automatically clears the notification flag (this mechanism is open to further improvement). In the worst case, we need a repository of size "total number of watched pages of all watching users". For example, if 1,000 users each have 50 pages in their watchlists, we need a repository for up to 50,000 entries, storing the last-visited revisions of all watched pages for all users.

Please let me know what you think of my proposal. Enotif could manage the repository, as it keeps track of users visiting their watch-listed pages. The repository can be a separate database, or it can be realized as a flag in the old and rc tables that forbids RC pruning or other routines from manipulating (e.g. deleting) that particular LVR.

Invitation:
===========
If you have another idea, or if I have overlooked something (which can happen), please let me know at mailto:mail@tgries.de?Subject=LVR .

Thanks in advance,
Tom Berlin
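A minimal sketch of the proposed LVR repository, to make the bookkeeping concrete. It assumes an in-memory dict keyed by (user, page); the class `LvrRepository`, its methods and the `prune` pass with its `cutoff` parameter are all hypothetical and are not part of MediaWiki or the Enotif patch.

```python
from dataclasses import dataclass, field


@dataclass
class LvrRepository:
    """Hypothetical store of last-visited revisions, keyed by (user, page)."""

    # (user, page) -> revision id the user last saw before being notified
    entries: dict[tuple[str, str], int] = field(default_factory=dict)

    def on_notify(self, user: str, page: str, lvr: int) -> None:
        # Record the LVR only once: later edits must not overwrite the
        # revision the user actually saw last.
        self.entries.setdefault((user, page), lvr)

    def on_visit_current(self, user: str, page: str) -> None:
        # Visiting the current revision clears the notification flag,
        # so the saved LVR is no longer needed.
        self.entries.pop((user, page), None)

    def protected(self) -> set[tuple[str, int]]:
        # (page, revision) pairs that pruning must not delete.
        return {(page, rev) for (_user, page), rev in self.entries.items()}


def prune(revisions: list[tuple[str, int]], repo: LvrRepository,
          cutoff: int) -> list[tuple[str, int]]:
    """Hypothetical pruning pass: drop revisions below `cutoff` unless protected."""
    keep = repo.protected()
    return [(page, rev) for page, rev in revisions
            if rev >= cutoff or (page, rev) in keep]
```

The size of `entries` is bounded by users × watched pages per user, which is the worst case estimated in the proposal (1,000 × 50 = 50,000).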
I don't really see the need for this; as far as I know, old revisions are *never* deleted automatically in MediaWiki, and it is perfectly reasonable for manual deletions to prevent diffs being made against the deleted revisions (indeed, it would be something of a security glitch if it didn't, since it would allow any user to view deleted pages, which are normally considered restricted data). So all that is actually needed for an lvr-enabled watchlist (which is bug 536) is revision IDs that are guaranteed to last; the only currently identified impediments are therefore bug 181 (current revisions do not have a lasting ID in current DB schema), and bug 603 (old_id not preserved across delete-undelete cycle). Unless I'm missing something, such as plans to introduce automatic pruning of old revisions, I suggest closing this as invalid, or as a duplicate of bug 536. If there are plans for pruning, perhaps this could be rephrased as a blocker for that - i.e. if you are developing a pruning system, you need to mark lv revisions with a do_not_prune flag.
(In reply to comment #1)
> I don't really see the need for this; as far as I know, old revisions are
> *never* deleted automatically in MediaWiki

I am not sure whether you are right. How can one get to older revisions of a page, say (worst case) the initial version? This appears to be limited to a certain maximum "go-back" number (say, 500 at most): when I use the page history, I can see 500 revisions at maximum.

But I see what you mean. If you are right that currently all revisions can be retrieved, then only the "keep old_id permanent" problem in http://bugzilla.wikipedia.org/show_bug.cgi?id=603 (delete/undelete cycle does not preserve old_id) needs to be solved. This weekend I'll look at the 1.3.7 and CVS code to check whether everything fits together.

Thank you for your valuable comments! Then I can close this bugzilla and track only http://bugzilla.wikipedia.org/show_bug.cgi?id=536 . Provisionally, as it looks now, this bug 804 depends at least on 536, so I have set this dependency flag now.

Tom
Dear Brion,

instead of so ultra-quickly invalidating this bug without any comment, which could be regarded as unfriendly, could you, as the master brain, please answer the question stated here: are really ALL OLD VERSIONS kept? I am not sure. If, and only if, really all old versions are kept, then this 804 is a 100% duplicate of bug 536 and can be deleted.

We mini-developers cannot keep track of all of Brion Vibber's MediaWiki features, and I would kindly encourage you to give understandable explanations, from which many mini-developers can certainly learn, don't you think? I guess everyone admires you and your co-workers for your admirable results, as I do, but the documentation of the MediaWiki code sometimes leaves doubts about its meaning and mechanisms, at least for me.

Tom
Yes, every revision is kept in the old table.
(In reply to comment #4)
> Yes, every revision is kept in the old table.

Can you then please program (as soon as possible) a quick ad-hoc fix for the http://bugzilla.wikipedia.org/show_bug.cgi?id=603 (delete/undelete cycle does not preserve old_id) problem? Then everyone would be happy:

- The Enotif patch can permanently point to a certain revision ("lvr") of a page.
- As stated elsewhere, the Enotif patch will very soon display the requested marker (on the "lvr" revision), regardless of whether the user enabled or disabled receiving mails. (I hope this is clear: every user can decide whether he/she wants to receive such MAILS. The "lvr" (or "updated") marker is shown independently of sending the mail.)
- Several bugzillas can be closed when I come up with the "lvr" marker as a by-product of Enotif.

I guess closing this bugzilla is fine now, as question a) is dealt with in bugzilla 603 (old_id) and question b) has now been answered by Brion (yes, all revisions are kept). I am happy now, really.

So to summarise: can you, Brion, program or propose a QUICK solution to the http://bugzilla.wikipedia.org/show_bug.cgi?id=603 problem? We need permanent IDs. What about using md5() in this context? Perhaps this leads to a solution: use md5(namespace:page_title:revision-id) as a unique number? I have had excellent experiences using md5() in several of my other programs, and I also know the md5() collision paper http://eprint.iacr.org/2004/199/ , but this discovery shouldn't be a problem for us. (May I say "us" now?)

Tom
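A sketch of the md5() idea, assuming Python's standard `hashlib` and the colon-joined key format quoted above; `lvr_key` is a hypothetical helper. It also shows one limitation: the digest is derived from the revision id, so it is only as stable as that id. If undeletion assigns a new old_id, the key changes too.

```python
import hashlib


def lvr_key(namespace: str, page_title: str, revision_id: int) -> str:
    """Hypothetical helper: md5 of "namespace:page_title:revision-id"."""
    raw = f"{namespace}:{page_title}:{revision_id}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# The same inputs always give the same 32-character hex key ...
key = lvr_key("Main", "PageX", 1234)
# ... but a new old_id after undeletion gives a different key.
key_after_undelete = lvr_key("Main", "PageX", 5678)
```

Because page titles may themselves contain colons, a real key would also need an unambiguous encoding of its parts; this sketch ignores that.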
Just thought I'd clear up some misunderstandings:

(In reply to comment #5)
> We need permanent IDs. What's about the usage of md5() in this context, perhaps
> this leads to a solution: use md5(namespace:page_title:revision-id) as unique
> number ?

There isn't really any need for a new ID: every revision currently has a unique key in the "old" table of the database (which in a future version will also include the current revision of each page). The only reason bug 603 exists is that the "archive" table doesn't include this information, only the timestamp and contents - so when a revision is undeleted, it is simply given a new, unique, value as its old_id.

> When I use page history, I see that I can have 500 at maximum.

I think you're misunderstanding the interface here: if you click the "next 50" link enough times, you will simply continue through the history until you reach the very first revision; the other links "(20 | 50 | 100 | 250 | 500)" are for setting how many revisions to show *on each page*. So clicking the "500" will go to a page with the first 500 revisions and - if there are more - a link labelled "next 500".

[e.g. http://meta.wikimedia.org/w/wiki.phtml?title=Main+Page&action=history&limit=500&offset=0 - that page actually has somewhere over 1000 revisions stored]

[btw, were it valid, this would have blocked bug 536, not been blocked by it]
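The paging behaviour described in this comment can be sketched as plain limit/offset pagination, assuming the history is a list of revision ids ordered newest first. `history_page` and `walk_history` are hypothetical functions, not MediaWiki's actual code; the point is that following "next" repeatedly reaches the oldest revision whatever the per-page limit.

```python
from typing import Optional


def history_page(revisions: list[int], limit: int,
                 offset: int) -> tuple[list[int], Optional[int]]:
    """Return one page of history and the offset for the "next" link (or None)."""
    page = revisions[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(revisions) else None
    return page, next_offset


def walk_history(revisions: list[int], limit: int) -> list[int]:
    """Follow "next" links from offset 0 until the last page is reached."""
    seen: list[int] = []
    offset: Optional[int] = 0
    while offset is not None:
        page, offset = history_page(revisions, limit, offset)
        seen.extend(page)
    return seen
```

For a hypothetical page with 1,200 stored revisions and a limit of 500, three pages (500, 500 and 200 revisions) cover the whole history, down to the initial version.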
(In reply to comment #6)
> (In reply to comment #5)
> There isn't really any need for a new ID: every revision currently has a unique
> key in the "old" table of the database (which in a future version will also
> include the current revision of each page). The only reason bug 603 exists is
> that the "archive" table doesn't include this information, only the timestamp
> and contents - so when a revision is undeleted, it is simply given a new,
> unique, value as its old_id.

Thank you, Rowan, for explaining. I also think you fully understand what I (and perhaps others, too) need: a permanent revision number for a (namespace:page_title;revision), regardless of where the content is actually saved in the database tables. After an accidental or intended deletion, this revision id must be flagged as "invisible, but in use", and thus not released, so that normal user access to that specific revision is prohibited until the page revision is possibly undeleted later by a WikiAdmin. In that case, exactly that id needs to be restored.

Can someone program this for the next release? It would close many bugzillas at once ...

Anyway, thank you so much for explaining.

Tom Berlin

P.S. Have you tried my Enotif patch, see http://bugzilla.wikipedia.org/show_bug.cgi?id=454 ?
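A minimal sketch of the behaviour requested here and in bug 603, assuming an archive row that keeps the original old_id so undeletion can restore it instead of allocating a new one. The `Wiki` class and its fields are hypothetical, not the real MediaWiki schema; the `preserve_id` flag simply contrasts the two undelete strategies.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Hypothetical model of the "old" and "archive" tables."""

    old: dict[int, str] = field(default_factory=dict)      # old_id -> text
    archive: dict[int, str] = field(default_factory=dict)  # kept old_id -> text
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def save(self, text: str) -> int:
        old_id = next(self._ids)
        self.old[old_id] = text
        return old_id

    def delete(self, old_id: int) -> None:
        # Move the row to the archive, keeping its id. The counter only
        # moves forward, so the id stays "invisible, but in use" and is
        # never handed out to another revision.
        self.archive[old_id] = self.old.pop(old_id)

    def undelete(self, old_id: int, preserve_id: bool = True) -> int:
        text = self.archive.pop(old_id)
        # preserve_id=False mimics bug 603: the restored row gets a fresh id.
        new_id = old_id if preserve_id else next(self._ids)
        self.old[new_id] = text
        return new_id
```

With `preserve_id=True`, an lvr-diff link stored before the deletion points at the same revision again after undeletion.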