Last modified: 2014-05-23 23:18:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T28122, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 26122 - No way to get the ID of a deleted page from deletion logs
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: API
Version: unspecified
Hardware: All All
Importance: Normal minor
Target Milestone: ---
Assigned To: Matthew Flaschen
Keywords: patch, patch-need-review
Depends on: 18104
Blocks: 27810
Reported: 2010-11-25 20:34 UTC by Superyetkin
Modified: 2014-05-23 23:18 UTC



Attachments
Partial patch: records old page_id into log_page on page deletions (831 bytes, patch)
2010-12-01 21:52 UTC, Brion Vibber

Description Superyetkin 2010-11-25 20:34:10 UTC
Using the "recentchanges" option, I can get the title of a deleted page but there is no way to learn the corresponding ID. Is there any workaround to this?
Comment 1 Superyetkin 2010-11-30 16:26:22 UTC
Any update?
Comment 2 Roan Kattouw 2010-11-30 16:27:19 UTC
Deleted pages, being deleted, don't have page IDs.
Comment 3 Superyetkin 2010-11-30 16:29:13 UTC
How come deleted pages have names but not IDs?
Comment 4 Roan Kattouw 2010-11-30 16:31:02 UTC
(In reply to comment #3)
> How come deleted pages have names but not IDs?

Because they don't exist any more, they've been deleted.
Comment 5 Superyetkin 2010-11-30 17:00:49 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > How come deleted pages have names but not IDs?
> 
> Because they don't exist any more, they've been deleted.

You do not seem to get what I mean here? The name of the deleted page is included in the API response, but pageID (now old) is not. What is the use of delete logs without having the ability to get pageID as well as pagename?
Comment 6 Roan Kattouw 2010-11-30 17:03:02 UTC
(In reply to comment #5)
> You do not seem to get what I mean here? The name of the deleted page is
> included in the API response, but pageID (now old) is not. What is the use of
> delete logs without having the ability to get pageID as well as pagename?
What would you do with that page ID? The page is no longer known by that ID in the page table or anywhere else.
Comment 7 Superyetkin 2010-11-30 17:08:09 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > You do not seem to get what I mean here? The name of the deleted page is
> > included in the API response, but pageID (now old) is not. What is the use of
> > delete logs without having the ability to get pageID as well as pagename?
> What would you do with that page ID? The page is no longer known by that ID in
> the page table or anywhere else.

I need the pageID because my logging script checks whether a certain page has been deleted, and I prefer not to use pagenames for efficiency reasons. The (old) pageID must be included in the API response for delete logs.
Comment 8 Roan Kattouw 2010-11-30 17:47:20 UTC
(In reply to comment #7)
> I need the pageID as my logging script checks to see if a certain page is
> deleted and I do not prefer to use pagenames for efficiency reasons. The (old)
> pageID must be included in the API response for delete logs.
If you want to do an existence check, use the title, not the page ID.
Comment 9 Superyetkin 2010-11-30 17:56:35 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > I need the pageID as my logging script checks to see if a certain page is
> > deleted and I do not prefer to use pagenames for efficiency reasons. The (old)
> > pageID must be included in the API response for delete logs.
> If you want to do an existence check, use the title, not the page ID.

It results in space inefficiency. Any legitimate reason for not including the old pageID in the API response for recent changes?
Comment 10 Bryan Tong Minh 2010-11-30 19:27:32 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > I need the pageID as my logging script checks to see if a certain page is
> > > deleted and I do not prefer to use pagenames for efficiency reasons. The (old)
> > > pageID must be included in the API response for delete logs.
> > If you want to do an existence check, use the title, not the page ID.
> 
> It results in space inefficiency. Any legitimate reason for not including the
> old pageID in the API response for recent changes?

Because it is not stored anywhere. MediaWiki forgets the page id once a page is deleted.
Comment 11 Victor Vasiliev 2010-11-30 22:42:29 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > (In reply to comment #7)
> > > > I need the pageID as my logging script checks to see if a certain page is
> > > > deleted and I do not prefer to use pagenames for efficiency reasons. The (old)
> > > > pageID must be included in the API response for delete logs.
> > > If you want to do an existence check, use the title, not the page ID.
> > 
> > It results in space inefficiency. Any legitimate reason for not including the
> > old pageID in the API response for recent changes?
> 
> Because it is not stored anywhere. MediaWiki forgets the page id once a page is
> deleted.

It is stored in ar_page_id as far as I am aware.
Comment 12 Superyetkin 2010-11-30 23:04:45 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > (In reply to comment #8)
> > > > (In reply to comment #7)
> > > > > I need the pageID as my logging script checks to see if a certain page is
> > > > > deleted and I do not prefer to use pagenames for efficiency reasons. The (old)
> > > > > pageID must be included in the API response for delete logs.
> > > > If you want to do an existence check, use the title, not the page ID.
> > > 
> > > It results in space inefficiency. Any legitimate reason for not including the
> > > old pageID in the API response for recent changes?
> > 
> > Because it is not stored anywhere. MediaWiki forgets the page id once a page is
> > deleted.
> 
> It is stored in ar_page_id as far as I am aware.

There is no such key in the response array.
Comment 13 Roan Kattouw 2010-12-01 19:16:54 UTC
(In reply to comment #12)
> > It is stored in ar_page_id as far as I am aware.
> 
> There is no such key in the response array.
He meant the ar_page_id field in the database.

And yes, it's stored there, but it doesn't really have any meaning. It's not even used for restoring the page (although it should be; there's a separate bug about that).
Comment 14 Superyetkin 2010-12-01 19:21:16 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > > It is stored in ar_page_id as far as I am aware.
> > 
> > There is no such key in the response array.
> He meant the ar_page_id field in the database.
> 
> And yes, it's stored there, but it doesn't really have any meaning. It's not
> even used for restoring the page (although it should be; there's a separate bug
> about that).

The topic is related to an API request, so there is no need to say ar_page_id stores it since it is unreachable. This is a major issue and I cannot see why it is so hard to include this in the response. The meaningless pageID key (0 for deleted pages) is there, but the old page ID is not. That is a shame!
Comment 15 Brion Vibber 2010-12-01 19:27:52 UTC
(In reply to comment #14)
> The topic is related to an API request, so there is no need to say ar_page_id
> stores it since it is unreachable.

It's actually a very useful thing to say -- it indicates to the other developers that yes, there is a way internally to get that information, and therefore that's how an implementation would be built to expose the information.

Please bear with us; these discussions sometimes take a while to get everyone on the same page!


A general note however: the original page_id number isn't locked to a page title as such, but is rather a property of each individual page *revision* that's been deleted.

There may be multiple past page IDs that have belonged to a given title. In fact, the same deleted page ID may be associated with multiple past titles, if it's had individual revisions deleted while it's been in the system and it's been renamed over time.

So depending on exactly what sort of request you're pulling, it might or might not be appropriate or straightforward to pass back a page ID from archive.ar_page_id.

superyetkin -- can you give an example of a recentchanges API request that you're doing, so we can confirm what format style is there and think about how it might be able to fit in there?
Comment 16 Roan Kattouw 2010-12-01 19:31:22 UTC
(In reply to comment #15)
> There may be multiple past page IDs that have belonged to a given title. In
> fact, the same deleted page ID may be associated with multiple past titles, if
> it's had individual revisions deleted while it's been in the system if it's
> been renamed over time.
> 
> So depending on exactly what sort of request you're pulling, it might or might
> not be appropriate or straightforward to pass back a page ID from
> archive.ar_page_id.
> 
Given these considerations, deleted page IDs are utterly useless AFAICT. If you have a use case other than "I want to save 20 bytes of bandwidth by not using titles", I'd love to hear it.
Comment 17 Brion Vibber 2010-12-01 19:48:00 UTC
A good use case would be performing cleanup on a page that was deleted without worrying about accidentally deleting a different page of the same name, since something pulling data from recentchanges would not be synchronous. Using title as primary key could end up referring to the wrong page, for instance if pages have been shuffled around and some of the pages deleted and recreated.

(It may be that this isn't a really suitable use for the recentchanges API; this sort of thing was definitely an issue in developing the OAI extension, which is actually designed to serialize the latest actions on wiki pages into a stream you can pull to build an up-to-date replica.)
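To make the race concrete, here is a minimal sketch using a toy in-memory model. Everything in it (the `pages` dict and the `create`, `delete`, `exists_by_title` and `exists_by_id` helpers) is hypothetical and only stands in for a wiki; it is not MediaWiki code. It shows a title being deleted and then recreated before an asynchronous consumer gets around to checking it.

```python
# Hypothetical in-memory model of a wiki; none of these names come from MediaWiki.
pages = {}    # page_id -> title, for pages that currently exist
next_id = 1


def create(title):
    """Create a page and hand out a fresh page ID."""
    global next_id
    page_id = next_id
    next_id += 1
    pages[page_id] = title
    return page_id


def delete(page_id):
    """Delete a page; its ID is never reused."""
    return pages.pop(page_id)


def exists_by_title(title):
    return title in pages.values()


def exists_by_id(page_id):
    return page_id in pages


old_id = create("Foo")
delete(old_id)           # the consumer will see this deletion later, asynchronously
new_id = create("Foo")   # meanwhile, someone recreates the same title

# A title-based check concludes that the deleted page is back.
print(exists_by_title("Foo"))   # True
# An ID-based check still reports the deleted page as gone.
print(exists_by_id(old_id))     # False
```

A cleanup job keyed on the title would act on the new page here, while one keyed on the old ID would not.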



Backing up to more general cases -- there *is* a log_page field which holds an optional page ID for referenced items in log events. For deletion events, this currently seems to store a 0. Storing the pre-deletion page ID, and exposing that through API requests for log events, probably makes a lot of sense.

Possibly that would/could/should be accessible through recentchanges as well, or possibly not.
Comment 18 Superyetkin 2010-12-01 19:57:52 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > There may be multiple past page IDs that have belonged to a given title. In
> > fact, the same deleted page ID may be associated with multiple past titles, if
> > it's had individual revisions deleted while it's been in the system if it's
> > been renamed over time.
> > 
> > So depending on exactly what sort of request you're pulling, it might or might
> > not be appropriate or straightforward to pass back a page ID from
> > archive.ar_page_id.
> > 
> Given these considerations, deleted page IDs are utterly useless AFAICT. If you
> have a use case other than "I want to save 20 bytes of bandwidth by not using
> titles", I'd love to hear it.

I cannot believe it! You are saying using page IDs just does not make sense? To answer your question: YES, I want to utilize space efficiency and prefer to use page IDs instead of pagenames. Satisfied?

Can you answer my previous question: is there a logical reason why a defunct pageid (0 for deleted pages) is returned and not the old ID? Can you not see that my request is ONLY for delete logs?
Comment 19 Roan Kattouw 2010-12-01 20:41:48 UTC
(In reply to comment #18)
> Can you answer my previous question that if there is a logical reason why a
> defunct pageid (0 for deleted pages) is returned and not the old ID?
Yes, because:
* it's quite possible that there are multiple page IDs, or none at all (if the page never existed)
* there's barely any use for it/them, if at all
* we'd have to go through extra trouble (query an additional table) to get it

> Can you
> not see that my request is ONLY for delete logs?
I can see that just fine; I have eyes, you know. It doesn't matter much what it's for.
Comment 20 Superyetkin 2010-12-01 20:43:29 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > The topic is related to an API request, so there is no need to say ar_page_id
> > stores it since it is unreachable.
> 
> It's actually a very useful thing to say -- it indicates to the other
> developers that yes, there is a way internally to get that information, and
> therefore that's how an implementation would be built to expose the
> information.

What is the use for saying it if that piece of information is not accessible?
This is a bug report about the API, so we do not need to know it is stored in
the database, right? I do not think there is a single "developer" who thinks it
is not stored in the database!
Comment 21 Superyetkin 2010-12-01 20:47:01 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > Can you answer my previous question that if there is a logical reason why a
> > defunct pageid (0 for deleted pages) is returned and not the old ID?
> Yes, because:
> * it's quite possible that there are multiple page IDs, or none at all (if the
> page never existed)
> * there's barely any use for it/them, if at all
> * we'd have to go through extra trouble (query an additional table) to get it
> 
> > Can you
> > not see that my request is ONLY for delete logs?
> I can see that just fine; I have eyes, you know. It doesn't matter much what
> it's for.

We are not talking about pages that did not exist here. My query is about the delete logs, so this is off-topic. You do not seem to get what we are after. You just cannot force people to store pages with name instead of ID. This is illogical.
Comment 22 Bryan Tong Minh 2010-12-01 21:01:33 UTC
(In reply to comment #20)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > The topic is related to an API request, so there is no need to say ar_page_id
> > > stores it since it is unreachable.
> > 
> > It's actually a very useful thing to say -- it indicates to the other
> > developers that yes, there is a way internally to get that information, and
> > therefore that's how an implementation would be built to expose the
> > information.
> 
> What is the use for saying it if that piece of information is not accessible?
> This is a bug report about the API, so we do not need to know it is stored in
> the database, right? I do not think there is a single "developer" who think it
> is not stored in the database!

This is a discussion forum not only to request features, but also on how to implement them. The fact that the deleted page id is stored in the database is not as obvious as it may seem to your misinformed mind. 

Also, please try to be respectful in your responses here; your current tone is absolutely disrespectful and arrogant towards the developers who are considering whether or not your request has merit. I might add to this that people are much more likely to fulfill a civil and respectful request, than one like yours.

If you are unable to do so, go somewhere else and live with the fact that your requested feature may not be implemented.
Comment 23 Superyetkin 2010-12-01 21:14:48 UTC
(In reply to comment #22)
> (In reply to comment #20)
> > (In reply to comment #15)
> > > (In reply to comment #14)
> > > > The topic is related to an API request, so there is no need to say ar_page_id
> > > > stores it since it is unreachable.
> > > 
> > > It's actually a very useful thing to say -- it indicates to the other
> > > developers that yes, there is a way internally to get that information, and
> > > therefore that's how an implementation would be built to expose the
> > > information.
> > 
> > What is the use for saying it if that piece of information is not accessible?
> > This is a bug report about the API, so we do not need to know it is stored in
> > the database, right? I do not think there is a single "developer" who think it
> > is not stored in the database!
> 
> This is a discussion forum not only to request features, but also on how to
> implement them. The fact that the deleted page id is stored in the database is
> not as obvious as it may seem to your misinformed mind. 
> 
> Also, please try to be respectful in your responses here; your current tone is
> absolutely disrespectful and arrogant towards the developers who are
> considering whether or not your request has merit. I might add to this that
> people are much more likely to fulfill a civil and respectful request, than one
> like yours.
> 
> If you are unable to do so, go somewhere else and live with the fact that your
> requested feature may not be implemented.

I am also a developer and know what respect means, so you are the last person to "teach" me what moral values mean. However, I can tell that your tone does not seem friendly, so you need to watch your words before they come to your mouth.

I am not begging for anything here, but it is just unreasonable to include the meaningless pageid parameter for deleted pages in the API response. Also, you cannot explain how you would use the archive ID because it has no meaning for API requests.
Comment 24 Brion Vibber 2010-12-01 21:28:38 UTC
This is a place of business where people do work for the common benefit; please limit yourself to polite, on-topic discussion or your Bugzilla account will be suspended.

Reassigning priority to "minor". Reopening bug; the Bugzilla account of the submitter has been blocked so that productive comments can still be added.
Comment 25 Brion Vibber 2010-12-01 21:36:46 UTC
Here's a sample API recentchanges query that pulls log entries:

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctype=log&rcprop=loginfo

A page deletion comes up like this:

      <rc type="log" logid="33091010" logtype="delete" logaction="delete">
        <param />
      </rc>

which indeed isn't super detailed.

(The log_params field is not used for these entries, nor apparently is there special handling as there is for move entries in the API output format.)


There's also the logevents query, which provides a different format:

http://en.wikipedia.org/w/api.php?action=query&list=logevents&lelimit=200

      <item logid="33091102" pageid="0" ns="10" title="Template:Le Tourment Vert" type="delete" action="delete" user="WOSlinker" timestamp="2010-12-01T21:35:01Z" comment="[[WP:CSD#T3|T3]]: Unused, redundant template" />

This looks like it would directly expose the log_page ID value if it were stored at delete logging time.
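For anyone scripting against this, a minimal sketch of reading the pageid attribute from a logevents item like the one above. The helper names `logevents_url` and `deleted_page_id` are made up for illustration, and `format=xml` is added to the query here (the URL in the comment does not include it). The sample string is copied from the comment rather than fetched, so nothing touches the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"  # endpoint taken from the comment above


def logevents_url(limit=200):
    """Build the logevents query URL (format=xml is an addition for parsing)."""
    params = {"action": "query", "list": "logevents",
              "lelimit": limit, "format": "xml"}
    return API + "?" + urlencode(params)


# Sample <item> copied verbatim from the comment above.
SAMPLE = ('<item logid="33091102" pageid="0" ns="10" '
          'title="Template:Le Tourment Vert" type="delete" action="delete" '
          'user="WOSlinker" timestamp="2010-12-01T21:35:01Z" '
          'comment="[[WP:CSD#T3|T3]]: Unused, redundant template" />')


def deleted_page_id(item_xml):
    """Return the pageid reported for a delete log item, or None for other types."""
    item = ET.fromstring(item_xml)
    if item.get("type") != "delete":
        return None
    return int(item.get("pageid", "0"))


print(logevents_url())
print(deleted_page_id(SAMPLE))   # 0, as in the sample output above
```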
Comment 26 Brion Vibber 2010-12-01 21:52:19 UTC
Created attachment 7882
Partial patch: records old page_id into log_page on page deletions

This is a quick patch which moves the resetting of the article id on the title object from before to after the log entry saving in Article::doArticleDelete().

With this change in, the old page ID now gets stored into log_page in the logging table record instead of it recording 0.
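The effect of the reordering can be sketched outside MediaWiki. The `Title` class, `save_log_entry` and the two `delete_*` functions below are hypothetical stand-ins written for this illustration; they are not the actual code in Article::doArticleDelete(). The only point is that the log row records whatever ID the title object holds at the moment the entry is saved.

```python
class Title:
    """Hypothetical stand-in for a title object that caches its page ID."""

    def __init__(self, text, article_id):
        self.text = text
        self.article_id = article_id

    def reset_article_id(self):
        # After deletion the title no longer points at an existing page.
        self.article_id = 0


log = []  # stand-in for rows in the logging table


def save_log_entry(title):
    # The row captures whatever ID the title holds right now.
    log.append({"log_title": title.text, "log_page": title.article_id})


def delete_reset_first(title):
    """Old order: reset the ID, then write the log entry."""
    title.reset_article_id()
    save_log_entry(title)


def delete_log_first(title):
    """Patched order: write the log entry, then reset the ID."""
    save_log_entry(title)
    title.reset_article_id()


delete_reset_first(Title("Foo", 42))
delete_log_first(Title("Bar", 43))
print(log)
# [{'log_title': 'Foo', 'log_page': 0}, {'log_title': 'Bar', 'log_page': 43}]
```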

However, the API logevents query does not appear to be using that value in its output; in fact it appears to pull whatever the current page ID for the logged title is, regardless of what's recorded as log_page. (Eg, if you create a new page with the same title, logevents shows you the page ID of the *new* page on all log entries for the old page, even those that recorded a different, older page ID.)
Comment 27 Roan Kattouw 2010-12-02 13:58:13 UTC
(In reply to comment #26)
> Created attachment 7882
> Partial patch: records old page_id into log_page on page deletions
> 
> This is a quick patch which moves the resetting of the article id on the title
> object from before to after the log entry saving in Article::doArticleDelete().
> 
Have you tested this in the UI too? At first sight it looks like this would produce a blue link rather than a red link in Special:Log and RC.

> However, the API logevents query does not appear to be using that value in its
> output; in fact it appears to pull whatever the current page ID for the logged
> title is, regardless of what's recorded as log_page. (Eg, if you create a new
> page with the same title, logevents shows you the page ID of the *new* page on
> all log entries for the old page, even those that recorded a different, older
> page ID.)
That's an interesting bug, lemme look at that.
Comment 28 p858snake 2011-04-30 00:10:07 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 29 Sumana Harihareswara 2011-11-09 16:26:20 UTC
Inferring from discussion that Brion's patch still needs further review, so, +need-review.  And Roan, Brion, did you open a new bug re the API logevents issue regarding page ID vs log_page, or fix it?
Comment 30 Sumana Harihareswara 2012-06-13 19:01:19 UTC
Brad, can you take a look at this?
Comment 31 Aaron Halfaker 2013-11-14 22:51:38 UTC
Hi folks. I just came across this bug. I figured it might be helpful if I added my own use case.

I'm a research scientist working in the analytics team at the WMF. I'm working with the Growth team to redesign page creation for newcomers and we'd like to understand how page creation/deletion worked historically. To do that, I'm trying to reconstruct the history of page creations, deletions and moves.

In order to track edits to pages that lead up to deletion, I'm using a combination of the Wikipedia API, the archive table and the logging table. Because of this bug, I'm unable to join archive revisions with their "delete" event without matching both titles and a range of timestamps (which is slow, to say the least). This isn't just more difficult and time consuming, it's also much more error prone.

Storing the ID of the page at time of deletion in the log_page field would resolve this issue for me.
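A rough sketch of the join this would allow, run against an in-memory SQLite database rather than a real wiki. The column names follow the archive and logging tables, but the schema is cut down to a handful of columns (no namespace, for instance) and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified schema: only the columns needed for the join.
CREATE TABLE archive (ar_rev_id INTEGER, ar_page_id INTEGER,
                      ar_title TEXT, ar_timestamp TEXT);
CREATE TABLE logging (log_id INTEGER, log_type TEXT, log_action TEXT,
                      log_page INTEGER, log_title TEXT, log_timestamp TEXT);

-- Invented data: the title 'Foo' was deleted twice, as page 10 and later as page 11.
INSERT INTO archive VALUES
    (1, 10, 'Foo', '20130101000000'),
    (2, 10, 'Foo', '20130102000000'),
    (3, 11, 'Foo', '20130201000000');
INSERT INTO logging VALUES
    (100, 'delete', 'delete', 10, 'Foo', '20130103000000'),
    (101, 'delete', 'delete', 11, 'Foo', '20130202000000');
""")

# With the old page ID in log_page, each archived revision joins to its
# deletion event directly, with no title or timestamp-range matching.
rows = conn.execute("""
    SELECT ar_rev_id, log_id
    FROM archive
    JOIN logging ON log_page = ar_page_id
    WHERE log_type = 'delete' AND log_action = 'delete'
    ORDER BY ar_rev_id
""").fetchall()

print(rows)   # [(1, 100), (2, 100), (3, 101)]
```

Joining on the title alone would pair every archived revision of 'Foo' with both deletion events, which is why the timestamp ranges are needed today.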
Comment 32 Gerrit Notification Bot 2014-02-15 05:21:43 UTC
Change 113523 had a related patch set uploaded by leucosticte:
Implement way to get the ID of a deleted page from deletion logs. WikiPage::doDeleteArticleReal will tell ManualLogEntry::insert() what the page_id is, so it can be stored in log_page; then ApiQueryLogEvents will provide that data.

https://gerrit.wikimedia.org/r/113523
Comment 33 Matthew Flaschen 2014-02-15 05:26:22 UTC
I just assigned this to myself today, and I've been working on it.  In the future, please coordinate.
Comment 34 Nathan Larson 2014-02-15 05:34:02 UTC
(In reply to Matthew Flaschen from comment #33)
> I just assigned this to myself today, and I've been working on it.  In the
> future, please coordinate.

Yeah, the assignment thing is kind of a no-win situation sometimes, because when I try to coordinate, some people will say, "NO! I'm working on it!" Then, after six months of people bugging them periodically, still nothing. But I'll note you down as a non-cookie-licker for future reference.
Comment 35 Gerrit Notification Bot 2014-02-15 05:44:46 UTC
Change 113523 abandoned by leucosticte:
Implement way to get the ID of a deleted page from deletion logs. WikiPage::doDeleteArticleReal will tell ManualLogEntry::insert() what the page_id is, so it can be stored in log_page; then ApiQueryLogEvents will provide that data.

Reason:
matt's writing his own patch

https://gerrit.wikimedia.org/r/113523
Comment 36 Gerrit Notification Bot 2014-02-15 07:14:59 UTC
Change 113525 had a related patch set uploaded by Mattflaschen:
WIP: Store the page_id in the logging table for deletions.

https://gerrit.wikimedia.org/r/113525
Comment 37 Matthew Flaschen 2014-02-20 22:24:56 UTC
I no longer consider this a draft, and would appreciate reviews.  The commit should explain itself pretty well in the commit message and release notes, but here are a couple notes:

* This is the first usage of log_page in ApiQueryLogEvents. It always pulls the pageid out of log_page for deletions. 

Before, as Brion noted, it would use the page_id as of query time (if the page even exists then), which was wrong, since it was unrelated to the deletion action.  When the page didn't exist at query time (which of course is common), it would use 0 since the join found nothing.
* Special:Log works fine; it does not affect which links are red.
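The before/after behaviour described in the first note can be imitated with an in-memory SQLite database. This is only an illustration under simplifying assumptions: the tables are reduced to a few columns, namespaces are ignored, the data is invented, and the two queries are written for this sketch rather than taken from ApiQueryLogEvents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified schema; namespaces are left out.
CREATE TABLE page (page_id INTEGER, page_title TEXT);
CREATE TABLE logging (log_id INTEGER, log_type TEXT,
                      log_page INTEGER, log_title TEXT);

-- 'Foo' was deleted as page 10 and later recreated as page 20.
-- 'Bar' was deleted as page 11 and has not been recreated.
INSERT INTO page VALUES (20, 'Foo');
INSERT INTO logging VALUES (1, 'delete', 10, 'Foo'),
                           (2, 'delete', 11, 'Bar');
""")

# Before: the page ID comes from a join on the title at query time, so a
# recreated title reports the new page's ID and a missing title reports 0.
before = conn.execute("""
    SELECT log_id, COALESCE(page_id, 0)
    FROM logging
    LEFT JOIN page ON page_title = log_title
    ORDER BY log_id
""").fetchall()

# After: deletions report the ID that was stored in log_page at deletion time.
after = conn.execute("""
    SELECT log_id, log_page
    FROM logging
    WHERE log_type = 'delete'
    ORDER BY log_id
""").fetchall()

print(before)   # [(1, 20), (2, 0)]
print(after)    # [(1, 10), (2, 11)]
```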
Comment 38 Gerrit Notification Bot 2014-05-23 23:15:58 UTC
Change 113525 merged by jenkins-bot:
Store page_id in logging table for deletions and make queryable

https://gerrit.wikimedia.org/r/113525
