Last modified: 2014-05-14 22:20:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T37367, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 35367 - 404 for old file versions on Wikimedia Commons with empty archive name
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Media storage
Version: unspecified
Hardware: All All
Importance: Low major with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 41320 56218 60766
Depends on: 54776 24417 39615
Blocks:
Reported: 2012-03-20 21:15 UTC by emijrp
Modified: 2014-05-14 22:20 UTC (History)

Description emijrp 2012-03-20 21:15:19 UTC
Hi all;

I'm trying to download Wikimedia Commons, but I have found some errors. For
example:
* oi_archive_name is empty for this file 
http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg#filehistory
* link is broken and you get an empty file
http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory

Are you aware of these errors in old files? Is this going to be fixed?

Regards,
emijrp
Comment 1 Sam Reed (reedy) 2012-03-20 21:29:53 UTC
(In reply to comment #0)
It can only be fixed if said files exist in some backup/similar
Comment 2 Aaron Schulz 2012-03-20 21:31:33 UTC
It may still be on NFS; I've seen this in various places.
Comment 3 emijrp 2012-03-20 21:37:08 UTC
(In reply to comment #1)
> It can only be fixed if said files exist in some backup/similar

There are more errors like those; I didn't make a comprehensive list.
Comment 4 Platonides 2012-03-20 22:18:57 UTC
There are more bugs like this.
Comment 5 Mark A. Hershberger 2012-03-23 03:03:45 UTC
Just came across this on http://commons.wikimedia.org/wiki/File:Pyrenees_relief_map_with_rivers-fr.svg
Comment 6 Nemo 2013-09-29 08:45:07 UTC
Bawolff, do you have suggestions on how to break this bug down into actionable items?
We probably need the following:
1) a maintenance script to list the files affected by each of the problems in question (empty oi_archive_name, archived versions whose links return "404 Not Found", etc.),
2) scripts or similar to correct the wrong metadata (where that's the problem) or to look for missing files on NFS and restore them,
3) a bug to track the need to do something about the leftovers.

I'm downloading all the Commons files with emijrp's script, so we already have huge lists of suspects, e.g. https://archive.org/download/wikimediacommons-201208/2012-08-check.txt
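
As a rough illustration of item 1 above, here is a minimal Python sketch that sorts old file versions into the problem categories mentioned (empty oi_archive_name, archive link returning 404). It is not an existing maintenance script: the OldVersion record, the classify() and http_status() helpers, and the idea of passing in a ready-made archive URL are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class OldVersion(NamedTuple):
    """One old file version (hypothetical record, not a MediaWiki class)."""
    name: str          # value of oi_name
    archive_name: str  # value of oi_archive_name
    url: str           # assumed: full URL of the archived file


def http_status(url: str) -> int:
    """Return the status of a HEAD request, or 0 if the host is unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=30) as resp:
            return resp.status
    except HTTPError as err:   # 4xx/5xx responses are raised as exceptions
        return err.code
    except URLError:           # DNS failure, refused connection, etc.
        return 0


def classify(
    versions: Iterable[OldVersion],
    status_of: Callable[[str], int] = http_status,
) -> Dict[str, List[str]]:
    """Group file names by the kind of problem found."""
    report: Dict[str, List[str]] = {
        "empty_archive_name": [],
        "not_found": [],
        "other": [],
        "ok": [],
    }
    for version in versions:
        if version.archive_name == "":
            # Without an archive name there is no URL worth requesting.
            report["empty_archive_name"].append(version.name)
            continue
        status = status_of(version.url)
        if status == 404:
            report["not_found"].append(version.name)
        elif status == 200:
            report["ok"].append(version.name)
        else:
            report["other"].append(version.name)
    return report
```

The status lookup is passed in as a parameter so the classification can be checked against a canned list of responses before it is pointed at a live server.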
Comment 7 Nemo 2013-09-29 08:45:30 UTC
(Data loss -> critical.)
Comment 8 Bawolff (Brian Wolff) 2013-09-29 13:30:13 UTC
Well, the easiest ones to find would be everything returned by
select oi_name, oi_timestamp from oldimage where oi_archive_name = '';
This could be done by anyone with Labs access.
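
For anyone who wants to try the query before running it against a real replica, the following is a small Python sketch that uses an in-memory SQLite database as a stand-in. The three-column oldimage table is a simplification of the real one, the sample rows are invented, and the helper names build_demo_db() and find_empty_archive_names() are made up for this example.

```python
import sqlite3
from typing import List, Tuple

# The query from the comment above, unchanged apart from keyword case.
QUERY = "SELECT oi_name, oi_timestamp FROM oldimage WHERE oi_archive_name = ''"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the oldimage table (simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oldimage ("
        " oi_name TEXT NOT NULL,"
        " oi_archive_name TEXT NOT NULL DEFAULT '',"
        " oi_timestamp TEXT NOT NULL)"
    )
    # Invented sample rows: two with an archive name, two without.
    conn.executemany(
        "INSERT INTO oldimage (oi_name, oi_archive_name, oi_timestamp)"
        " VALUES (?, ?, ?)",
        [
            ("Example_a.jpg", "20100101000000!Example_a.jpg", "20100101000000"),
            ("Example_b.ogg", "", "20090202000000"),
            ("Example_c.svg", "20110303000000!Example_c.svg", "20110303000000"),
            ("Example_d.png", "", "20080404000000"),
        ],
    )
    return conn


def find_empty_archive_names(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (oi_name, oi_timestamp) pairs whose archive name is empty."""
    return sorted(conn.execute(QUERY).fetchall())
```

Calling find_empty_archive_names(build_demo_db()) returns only the two rows that were inserted with an empty archive name.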

After that, one can look in the thumbnail log. From what I've seen of it, it's full of lines about thumbnails failing due to a missing src path (this seems to be the main cause of failing PNG thumbnails now that vips has removed the size limit on that format).

As an aside, it'd be nice if we graphed the number of missing files somewhere in Ganglia. Anecdotally, it seems like there are more of them than there used to be. It would be good to get real stats on this very scary problem.
Comment 9 Bawolff (Brian Wolff) 2013-10-05 02:49:50 UTC
Btw, one probable cause of recent incidents may have been fixed - see bug 54736

See also related bug 54776
Comment 10 Aaron Schulz 2014-05-14 21:26:48 UTC
*** Bug 60766 has been marked as a duplicate of this bug. ***
Comment 11 Aaron Schulz 2014-05-14 22:17:15 UTC
*** Bug 41320 has been marked as a duplicate of this bug. ***
Comment 12 Aaron Schulz 2014-05-14 22:19:44 UTC
*** Bug 56218 has been marked as a duplicate of this bug. ***
