Last modified: 2014-01-03 16:08:21 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T24390, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 22390 - Purge foreign pages using an image/media file where this data is available
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: GlobalUsage
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Aaron Schulz
Keywords: platformeng
Duplicates: 22073, 42582
Depends on:
Blocks:
Reported: 2010-02-05 03:15 UTC by Andrew Garrett
Modified: 2014-01-03 16:08 UTC (History)
Description Andrew Garrett 2010-02-05 03:15:53 UTC
When GlobalUsage is available, and a new version of a file is uploaded, in theory we could do some magic to purge the pages using that image on foreign wikis (in Squid at the very least).
Comment 1 Bryan Tong Minh 2010-03-20 23:07:46 UTC
*** Bug 22073 has been marked as a duplicate of this bug. ***
Comment 2 Andre Klapper 2012-12-19 17:04:16 UTC
*** Bug 42582 has been marked as a duplicate of this bug. ***
Comment 3 Bawolff (Brian Wolff) 2013-02-07 16:38:45 UTC
So just to clarify this bug - the bad that happens:
*Person moves a file at commons. The url of the media file now changes. Pages on client wikis that use that file will be broken until such a time as the pages get re-rendered. (An alternative or complementary solution would be to give an HTTP redirect for those old urls. This would make life nicer for hotlinkers)
*Person uploads a new version of an image file with different dimensions. Because file urls only have the width in them, the height of the corresponding thumb changes. However the height attribute on the <img> tag won't change until next time that page gets re-rendered.
*Slightly separate issue but related (This was bug 22073): User edits the description page on commons, we would want to purge the memcache entries for this description page (Slightly complicated because we don't know what languages this page has been cached in [varies by userlanguage], but we could probably take a good guess based on image usage).
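
The height mismatch in the second point can be illustrated with a small sketch. Only the idea that the thumbnail url carries the width but not the height comes from the description above; the function names, the url layout, the example file name and the rounding rule are hypothetical and are not taken from MediaWiki.

```python
def thumb_url(name: str, width: int) -> str:
    # Hypothetical url layout: only the requested width appears in the path.
    return f"/thumb/{name}/{width}px-{name}"


def thumb_height(orig_width: int, orig_height: int, width: int) -> int:
    # Scale the height to keep the aspect ratio (the rounding rule is an assumption).
    return max(1, round(orig_height * width / orig_width))


# Page HTML cached while the file was 800x600.
cached_url = thumb_url("Example.jpg", 220)
cached_height = thumb_height(800, 600, 220)

# A new 800x1200 version is uploaded under the same name.
new_url = thumb_url("Example.jpg", 220)
new_height = thumb_height(800, 1200, 220)

# Same url, different real height: the cached height attribute is now stale.
stale = cached_url == new_url and cached_height != new_height
```

Because the url is unchanged, nothing tells the client wiki that its cached <img> height no longer matches the thumbnail, which is why the using pages have to be re-rendered.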

For the first two points, we need to somehow do the equivalent of a cross-wiki HTMLCacheUpdate of the globalusage tables. This is the more serious issue imo. I'm increasing the priority to normal since this actively causes broken images in articles (albeit temporarily). On a third party wiki host, using foreign repos but not 404-image-render-handlers, this would probably cause even more serious breakage.

For the third point (which is kind of a separate issue), we need to clear some memcache entries (Somehow figuring out which ones are appropriate. Clearing content language of where image is used is probably good enough for now), and possibly squid/varnish cache.
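
A rough sketch of the "good guess" for the third point, assuming one cache entry per language: collect the content languages of the wikis that use the file and clear only those entries. The key format, the function name and the wiki-to-language mapping are hypothetical illustrations, not MediaWiki's actual cache keys.

```python
def description_cache_keys(file_name, using_wikis, content_language):
    """Return hypothetical cache keys, one per content language of a using wiki."""
    # Deduplicate: several wikis may share a content language.
    languages = sorted({content_language[wiki] for wiki in using_wikis})
    return [f"file-description:{file_name}:{lang}" for lang in languages]


keys = description_cache_keys(
    "Example.jpg",
    using_wikis=["enwiki", "dewiki", "enwikibooks"],
    content_language={"enwiki": "en", "dewiki": "de", "enwikibooks": "en"},
)
```

Entries cached for other user languages would be missed by this guess and would only expire on their own.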

A further clarification point, to be explicit, is that this bug does *not* involve showing old images to the user, only broken images.
Comment 4 Bawolff (Brian Wolff) 2013-08-20 20:39:06 UTC
(In reply to comment #3)
> So just to clarify this bug - the bad that happens:
> *Person moves a file at commons. The url of the media file now changes. Pages
> on client wikis that use that file will be broken until such a time as the
> pages get re-rendered. (An alternative or complementary solution would be to
> give an HTTP redirect for those old urls. This would make life nicer for
> hotlinkers)

I'm sure there's a more specific bug, but I can't find it, so I should mention I submitted a patch for the http redirect thing https://gerrit.wikimedia.org/r/80135

That does not solve this bug (only makes it a little less severe). We still need to solve this bug for the below case:

> *Person uploads a new version of an image file with different dimensions.
> Because file urls only have the width in them, the height of the
> corresponding thumb changes. However the height attribute on the <img> tag
> won't change until next time that page gets re-rendered.
Comment 5 Bawolff (Brian Wolff) 2013-09-08 14:17:21 UTC
(In reply to comment #4)

> I'm sure there's a more specific bug, but I can't find it, so I should
> mention I submitted a patch for the http redirect thing
> https://gerrit.wikimedia.org/r/80135

I forgot that this won't make the full-size image url redirect. Oh well, still a step in the right direction. Most of the issues are with thumbnails anyway.
Comment 6 Gerrit Notification Bot 2013-11-26 07:14:14 UTC
Change 97659 had a related patch set uploaded by Aaron Schulz:
Added support for purging backlinks in the wiki farm

https://gerrit.wikimedia.org/r/97659
Comment 7 Gerrit Notification Bot 2013-12-05 18:14:02 UTC
Change 97659 merged by jenkins-bot:
Added support for purging backlinks in the wiki farm

https://gerrit.wikimedia.org/r/97659
Comment 8 Bawolff (Brian Wolff) 2013-12-05 18:16:20 UTC
(In reply to comment #7)
> Change 97659 merged by jenkins-bot:
> Added support for purging backlinks in the wiki farm
> 
> https://gerrit.wikimedia.org/r/97659

Woo! Thanks Aaron. I guess we should not mark this bug as fixed until the added setting is enabled on commons.
Comment 9 Gerrit Notification Bot 2013-12-12 21:30:29 UTC
Change 101106 had a related patch set uploaded by Aaron Schulz:
Cross-wiki backlink purging for commons file changes

https://gerrit.wikimedia.org/r/101106
Comment 10 Gerrit Notification Bot 2013-12-17 19:07:43 UTC
Change 101106 merged by jenkins-bot:
Cross-wiki backlink purging for commons file changes

https://gerrit.wikimedia.org/r/101106
Comment 11 Jean-Fred 2013-12-23 09:02:15 UTC
To split a file, sysops need to delete it, undelete part of the history, rename, and undelete the rest. I always supposed this never made a big fuss because the articles using the file were not regenerated right away. Is this use case still safe?
Comment 12 Bawolff (Brian Wolff) 2013-12-23 09:14:01 UTC
(In reply to comment #11)
> To split a file, sysops need to delete it, undelete part of the history,
> rename, and undelete the rest. I always supposed this never made a big fuss
> because the articles using the file were not regenerated right away. Is this
> use case still safe?

Should be fine. If the file is used by more than 200,000 pages on a single wiki, we don't purge the pages using it on that wiki. In any case, I imagine most cases where you do this sort of thing are for files used on fewer than 500 pages, which would be an inconsequential number of pages to purge.

As a technical point, the pages in question aren't actually regenerated immediately - what actually happens is they're marked as needing to be regenerated next time someone visits them.
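
A minimal sketch of the behaviour described in this comment, under stated assumptions: the 200,000 figure and the "mark as stale, regenerate on next visit" idea come from the comment, while the class, the field names and the function are hypothetical and do not reflect the actual GlobalUsage or job queue code.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_BACKLINKS_PER_WIKI = 200_000  # per-wiki limit quoted in the comment above


@dataclass
class Page:
    title: str
    touched: float = 0.0      # time the page was last marked stale
    rendered_at: float = 0.0  # time the cached HTML was generated

    def needs_rerender(self) -> bool:
        # The page is only regenerated when someone next visits it.
        return self.touched > self.rendered_at


def purge_backlinks(usage: dict[str, list[Page]], now: float,
                    limit: int = MAX_BACKLINKS_PER_WIKI) -> list[str]:
    """Mark pages using a changed file as stale; return the wikis skipped."""
    skipped = []
    for wiki, pages in usage.items():
        if len(pages) > limit:
            skipped.append(wiki)  # too many backlinks: leave this wiki alone
            continue
        for page in pages:
            page.touched = now    # invalidate only, no re-rendering here
    return skipped
```

Nothing is re-rendered inside `purge_backlinks`; it only moves the invalidation timestamp forward, so the cost per page is small even when a few hundred pages use the file.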
Comment 13 Jean-Fred 2013-12-23 09:22:59 UTC
(In reply to comment #12) 
> In any case, I imagine most cases where you do this sort of thing
> are for files used on less than 500 pages, which would be an
> inconsequential amount of pages to purge.

But these pages (e.g. high-profile Wikipedia articles) containing a temporarily deleted file will be regenerated with a red link, right? Splitting is definitely not a long process but it may take a few minutes; I just hope this will not cause editing communities to understandably come with pitches and forks to the sysop who defaced their article for a few minutes :-)
Comment 14 Bawolff (Brian Wolff) 2013-12-23 09:30:20 UTC
(In reply to comment #13)
> (In reply to comment #12) 
> > In any case, I imagine most cases where you do this sort of thing
> > are for files used on less than 500 pages, which would be an
> > inconsequential amount of pages to purge.
> 
> > But these pages (eg high-profile Wikipedia articles) containing a temporarily
> > deleted file will be regenerated with a red link, right? Splitting is
> > definitely not a long process but it may take a few minutes ; I just hope
> > this will not cause editing communities to understandably come with pitches
> > and forks to the sysop who defaced their article for a few minutes :-)

Well, the job queue isn't instant, and may take a couple of minutes to get to the page, but ignoring that - this is just about regenerating those pages' HTML. The image itself disappears the moment you delete it (and always has) since its url at upload.wikimedia.org goes away. The difference now would be that instead of an <img> tag that doesn't render, the page might have a redlink for the image (assuming the job queue is fast enough) for a couple of minutes.

If people didn't notice and complain previously when you did this sort of thing, I don't think they'll start noticing now.
----

As an aside, the main thing that pops to mind here is we need a better mechanism for splitting histories :)
Comment 15 Jean-Fred 2013-12-26 13:30:00 UTC
(In reply to comment #14)
> The image itself disappears the moment you delete it (and always has) since
> its url at upload.wikimedia.org goes away.

Oh, does it? I had always assumed the thumb was cached for a while but I never actually checked that. Thanks for the information :)

> The difference now would be instead of
> an <img> tag that doesn't render, the page might have a redlink for the image
> (Assuming the job queue is fast enough) for a couple minutes.

> If people didn't notice and complain previously when you did this sort of
> thing, I don't think they'll start noticing now.

Perfect then ; thanks Brian for reassuring me :-)

> As an aside, the main thing that pops to mind here is we need a better
> mechanism for splitting histories :)

We sure do :-)
