
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are now handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T21476, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 19476 - OOM on getting metadata for some OGG files (metadata reading hits memory_limit)
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: OggHandler
Version: unspecified
Hardware/OS: All / All
Priority: Normal
Severity: major
Target Milestone: ---
Assigned To: Tim Starling
Duplicates: 19870, 20801
Depends on:
Blocks:
Reported: 2009-07-02 12:28 UTC by Andrew Garrett
Modified: 2010-02-02 10:26 UTC
CC: 8 users

See Also:
Web browser: ---
Mobile Platform: ---


Attachments
patch to use ffmpeg2theora for metadata (3.28 KB, patch)
2009-11-03 23:39 UTC, Michael Dale

Description Andrew Garrett 2009-07-02 12:28:20 UTC
Test case attached.

Steps to reproduce:

1. Open eval.php and create an OggHandler object.
2. Set your memory_limit below 50M.
3. Call $OggHandler->getMetadata( null, '/path/to/test/case' );

Result:
PHP dies with OOM.
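
For concreteness, the repro roughly amounts to the following sketch (the ini_set() call and the .ogg path are assumptions about how one would drive this from eval.php):

  // Sketch of the repro; assumes the OggHandler extension is loaded.
  ini_set( 'memory_limit', '50M' );  // reproduce the constrained limit
  $handler = new OggHandler;
  $metadata = $handler->getMetadata( null, '/path/to/test/case.ogg' );
  // With an affected file, PHP dies with "Allowed memory size ... exhausted"
  // before the next line runs.
  echo strlen( $metadata ), "\n";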

This is occurring on Wikimedia sites for *some* files with uncached metadata.

I did some research and debugging, and it always seems to die in _decodePageHeader() in File/Ogg.php. It tries to list the streams (of which, in theory, there should only be 5 or 6), storing the data as it goes, and then runs through the streams to generate aggregate data.

Using COUNT_RECURSIVE and no memory_limit, I counted the number of pieces of stream information stored in _streamList, both for the test case and for the featured media of the day, which happened to be [[File:Eichmann_trial_news_story.ogg]]:

> $h = new OggHandler; $m = $h->getMetadata( null, '/Users/andrew/En-The_Raven-wikisource.ogg' )
Class File_Ogg not found; skipped loading
Memory used: 50356180
Size of _streamList is 398175

> $h = new OggHandler; $m = $h->getMetadata( null, '/Users/andrew/Eichmann_trial_news_story.ogg' );
Class File_Ogg not found; skipped loading
Memory used: 7901476
Size of _streamList is 10662
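
For reference, instrumentation along these lines would produce the numbers above (a sketch: _streamList is internal to File_Ogg and is assumed here to be accessible for debugging):

  // Hypothetical measurement; File_Ogg parses the page headers on construction.
  $ogg = new File_Ogg( '/path/to/file.ogg' );
  echo "Memory used: ", memory_get_usage(), "\n";
  // COUNT_RECURSIVE counts nested entries, not just top-level streams.
  echo "Size of _streamList is ", count( $ogg->_streamList, COUNT_RECURSIVE ), "\n";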


RECOMMENDED RESOLUTION:

It makes the most sense to resolve this by aggregating whatever data needs to be aggregated as the stream list is generated, rather than storing everything and aggregating at the end.
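
A sketch of that shape of fix (illustrative only; the method and field names below are invented for this example, not taken from File/Ogg.php):

  // Maintain running per-stream aggregates while decoding page headers,
  // instead of appending every page to $this->_streamList.
  private function _aggregatePageHeader( array $header ) {
      $serial = $header['serial'];
      if ( !isset( $this->_streamAggregates[$serial] ) ) {
          $this->_streamAggregates[$serial] = array(
              'pages'          => 0,
              'dataLength'     => 0,
              'lastGranulePos' => 0,
          );
      }
      $agg =& $this->_streamAggregates[$serial];
      $agg['pages']++;
      $agg['dataLength'] += $header['dataLength'];
      // Duration can be derived from the highest granule position seen.
      $agg['lastGranulePos'] = max( $agg['lastGranulePos'], $header['granulePos'] );
  }

Memory use then stays proportional to the number of streams (5 or 6) rather than the number of pages.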
Comment 1 Andrew Garrett 2009-07-02 12:29:39 UTC
Adding the attachment failed. The test case is available at http://en.wikisource.org/wiki/Media:En-The_Raven-wikisource.ogg
Comment 2 Platonides 2009-07-03 22:13:16 UTC
OOM can also happen within exif_read_data for jpegs with lengthy exif data.
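For illustration (exif_read_data() is the stock PHP function; the path and limit are placeholders):

  // A JPEG with very large embedded EXIF data (e.g. maker notes or an
  // embedded thumbnail) is parsed into one big array, which can hit
  // memory_limit the same way.
  ini_set( 'memory_limit', '50M' );
  $exif = exif_read_data( '/path/to/jpeg-with-huge-exif.jpg' );
  var_dump( $exif === false ? 'read failed' : count( $exif, COUNT_RECURSIVE ) );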
Comment 3 Tim Starling 2009-08-04 05:49:25 UTC
*** Bug 19870 has been marked as a duplicate of this bug. ***
Comment 4 Brion Vibber 2009-09-25 16:57:27 UTC
*** Bug 20801 has been marked as a duplicate of this bug. ***
Comment 5 Brion Vibber 2009-09-25 16:57:56 UTC
Bumping this up from an enhancement...
Comment 6 Brion Vibber 2009-09-25 18:59:52 UTC
*** Bug 20811 has been marked as a duplicate of this bug. ***
Comment 7 Nemo 2009-10-01 21:48:43 UTC
I'm still experiencing the same problem described in bug 20811, also with a 40 MB DjVu file (this one: http://www.archive.org/details/VocabolarioAccademiciCruscaEdi3Vol3).
Comment 8 Mike.lifeguard 2009-10-19 18:08:04 UTC
(In reply to comment #4)
> *** Bug 20801 has been marked as a duplicate of this bug. ***
> 

On this bug, note that even Special:WhatLinksHere/File:... fails:

http://meta.wikimedia.org/wiki/Special:WhatLinksHere/Image:Screencast_-_Spam_blacklist_introduction_and_COIBot_reports_-_small.ogg

No metadata should need to be loaded here at all, not even the duration, which is apparently "needed" for the image description page. The same goes for pages that link to large files: they don't need file metadata, so they shouldn't try to fetch it.

Also, if this metadata is so expensive to extract that we run out of memory, then it should be stored so the work only has to be done once, at upload time.
Comment 9 Andrew Garrett 2009-10-19 18:09:54 UTC
(In reply to comment #8)
> 
> As well, if this metadata is so expensive to get we run out of memory, then it
> should be stored so it only needs to be done once on upload.
> 

It is stored, but it obviously can't be if the processing failed.
Comment 10 Mike.lifeguard 2009-10-19 18:10:27 UTC
+mdale in case he can help :)
Comment 11 Platonides 2009-10-19 20:15:15 UTC
What about using an external program for that, if available?
That would allow finer-grained control over memory use, and a failure wouldn't kill the whole page.
Comment 12 Michael Dale 2009-10-19 20:39:33 UTC
I recommend we use the ffmpeg2theora --info command. It outputs the data as JSON and seeks to the end of the file to get the duration, so it is much faster than oggz-info-style commands, which do a linear scan of the file and output unstructured data that would have to be parsed. Also, ffmpeg2theora is a static binary, so it should be easier to deploy. I will create a patch.
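A sketch of that approach (not the actual patch; wfEscapeShellArg() and wfShellExec() are MediaWiki's shell helpers, and the 'duration' field name in the decoded JSON is an assumption):

  // Ask ffmpeg2theora for JSON metadata instead of parsing the Ogg
  // container in PHP.
  $cmd = 'ffmpeg2theora --info ' . wfEscapeShellArg( $path );
  $retval = 0;
  $json = wfShellExec( $cmd, $retval );
  if ( $retval === 0 ) {
      $info = json_decode( $json, true );
      if ( is_array( $info ) && isset( $info['duration'] ) ) {
          $duration = (float)$info['duration'];
      }
  }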
Comment 13 Michael Dale 2009-10-26 06:50:53 UTC
I created the patch to call out to ffmpeg2theora in r57933. But ffmpeg2theora does not list the offset time, so we either have to "fix" the ffmpeg Ogg demuxer to know about stream offsets or use a different tool.

Regardless, we should fix the PHP fallback to be less memory-heavy.
Comment 14 Michael Dale 2009-10-27 21:31:09 UTC
Jan has patched ffmpeg2theora, freed has deployed it, and I will shortly push the updated ffmpeg2theora time-grabbing code to deployment.
Comment 15 Michael Dale 2009-11-03 23:39:17 UTC
Created attachment 6748 [details]
patch to use ffmpeg2theora for metadata

Here is a patch for the wmf-deployment branch. I never got clarity from anyone on whether we can push this out or not.
Comment 16 Nemo 2009-11-15 08:20:04 UTC
Ehm, can you apply the patch? I haven't been able to upload a file to Commons for two months now...
Comment 17 Michael Dale 2009-11-15 08:45:43 UTC
Yeah, it would be good to get this applied, and/or to have it reviewed and be told what has to be changed.
Comment 18 Tim Starling 2009-12-30 08:57:10 UTC
Fixed in r60492.
Comment 19 Nemo 2010-02-02 09:47:12 UTC
We are at r61846 (https://wikitech.wikimedia.org/?diff=24985), but I still have the same problem described in bug 20811#c0.
Comment 20 Alexandre Emsenhuber [IAlex] 2010-02-02 10:07:33 UTC
(In reply to comment #19)
> We are at r61846
This is the version of /branches/wmf-deployment, not /trunk/phase3; this doesn't mean that r60492 has been deployed yet.
Comment 21 Nemo 2010-02-02 10:26:43 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > We are at r61846
> This is the version of /branches/wmf-deployment, not /trunk/phase3; this
> doesn't mean that r60492 has been deployed yet.

Thank you. Sorry.
Anyway, the DjVu bug seems resolved, at least for some files; see bug 20811#c6.
