Last modified: 2014-02-12 23:40:01 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T23937, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 21937 - mwdumper uses too much memory
Status: REOPENED
Product: Utilities
Classification: Unclassified
Component: mwdumper
Version: unspecified
Hardware: PC Windows XP
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Brion Vibber
Depends on:
Blocks:
Reported: 2009-12-24 00:14 UTC by Tisza Gergő
Modified: 2014-02-12 23:40 UTC (History)

Description Tisza Gergő 2009-12-24 00:14:04 UTC
I tried to run the GUI version of the newest revision (r60229) of mwdumper under Java 6 Update 17 on an Intel Core i7 with 3.25 GB of RAM and WinXP SP3, and it gave this error:

Exception in thread "Thread-8" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.StringCoding.safeTrim(Unknown Source)
at java.lang.StringCoding.access$300(Unknown Source)
at java.lang.StringCoding$StringEncoder.encode(Unknown Source)
at java.lang.StringCoding.encode(Unknown Source)
at java.lang.String.getBytes(Unknown Source)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:493)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:603)
at com.mysql.jdbc.ByteArrayBuffer.writeStringNoNull(ByteArrayBuffer.java:544)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1638)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2972)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2902)
at com.mysql.jdbc.Statement.execute(Statement.java:529)
at org.mediawiki.importer.SqlServerStream.writeStatement(SqlServerStream.java:25)
at org.mediawiki.importer.SqlWriter.flushInsertBuffer(SqlWriter.java:195)
at org.mediawiki.importer.SqlWriter.bufferInsertRow(SqlWriter.java:184)
at org.mediawiki.importer.SqlWriter15.writeRevision(SqlWriter15.java:68)
at org.mediawiki.importer.PageFilter.writeRevision(PageFilter.java:67)
at org.mediawiki.dumper.ProgressFilter.writeRevision(ProgressFilter.java:56)
at org.mediawiki.importer.XmlDumpReader.closeRevision(XmlDumpReader.java:346)
at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:204)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)

According to the Java docs, the default max heap size is 1/4 of the physical memory, that is, around 800 MB here. Since a single revision is at most 2 MB, there is no reason for mwdumper to require that much space. (It ran on the huwiki full history dump, writing directly to the database.)
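
For reference, the limit a given JVM actually applies can be checked directly. The following is a minimal sketch: `HeapCheck` is a hypothetical name used only for illustration and is not part of mwdumper, while `Runtime.maxMemory()` is the standard call that reports the maximum amount of memory the JVM will attempt to use.

```java
// Illustrative helper only; HeapCheck is a hypothetical name, not mwdumper code.
final class HeapCheck {
    /** Maximum amount of memory the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the standard `-Xmx` option (for example `java -Xmx1g HeapCheck`) should report the raised limit. The default value depends on the JVM version and mode, so the figure printed without `-Xmx` may differ from the estimate above.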
Comment 1 Tisza Gergő 2009-12-24 00:18:46 UTC
After manually raising the max heap size, it ran smoothly, unlike the older versions available from download.wikimedia.org, which didn't even start. Is there any reason to recommend the broken old versions instead of a current one? ([[mw:MWDumper]] points to a third version attached in a bug report, which also didn't seem to work.)
Comment 2 Diederik van Liere 2011-01-29 20:16:13 UTC
The solution seems to be to increase the size of the heap as explained on http://www.mediawiki.org/wiki/Manual:MWDumper#Troubleshooting

I'll mark this bug as Resolved and Worksforme. If the bug reporter feels that this is still an issue, then please reopen the bug.
Comment 3 Bawolff (Brian Wolff) 2011-01-29 20:18:30 UTC
As a bigger question though: why does it need so much memory? Doesn't it interpret the dumps a little at a time, and thus shouldn't need all that much memory?
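
The stack trace in the description shows the dump being read through a SAX parser, which delivers events as it reads rather than loading the whole document. The following is a minimal sketch of that streaming pattern, not mwdumper's actual XmlDumpReader: the class name `StreamingRevisionCounter` is hypothetical, and it assumes dump elements named `revision` and `text`. The handler holds only the text of the revision currently being parsed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of streaming parsing; not mwdumper code.
final class StreamingRevisionCounter extends DefaultHandler {
    private final StringBuilder text = new StringBuilder(); // current revision text only
    private boolean inText;
    int revisions;         // number of <revision> elements seen
    int largestTextChars;  // length of the longest <text> body seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("text".equals(qName)) {
            inText = true;
            text.setLength(0); // reuse the buffer for each revision
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("text".equals(qName)) {
            inText = false;
            largestTextChars = Math.max(largestTextChars, text.length());
        } else if ("revision".equals(qName)) {
            revisions++;
            text.setLength(0); // nothing from this revision is retained
        }
    }

    /** Parses the given XML string and returns the handler with its counters filled in. */
    static StreamingRevisionCounter scan(String xml) {
        StreamingRevisionCounter handler = new StreamingRevisionCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler;
    }
}
```

With a reader of this shape, memory use on the parsing side is bounded by the largest single revision, which suggests the extra memory is consumed somewhere downstream of the parser.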
Comment 4 Tisza Gergő 2011-01-29 21:49:58 UTC
(In reply to comment #2)
> The solution seems to be to increase the size of the heap as explained on
> http://www.mediawiki.org/wiki/Manual:MWDumper#Troubleshooting

Yeah, I'm probably aware of that, since I was the one who added it there :)

The point, as Bawolff said, is that MWDumper should not need a default heap size of ~1GB when the largest revision is below 2MB. Either there is a memory leak, or something is done really inefficiently.
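
The stack trace in the description fails inside `String.getBytes`, called from `SqlWriter.flushInsertBuffer` via the MySQL driver, so at that moment the whole buffered INSERT statement exists as a string and is being copied again as a byte array. How large mwdumper lets that buffer grow is not shown in this report. As a sketch of one way to keep such a buffer bounded, under the assumption that rows are batched into multi-row INSERT statements, the hypothetical class below flushes before the statement exceeds a fixed number of characters. `BoundedInsertBuffer`, its parameters, and the table name in the example are all illustrative and not taken from mwdumper.

```java
import java.util.function.Consumer;

// Hypothetical illustration of a size-bounded insert buffer; not mwdumper code.
final class BoundedInsertBuffer {
    private final String prefix;          // e.g. "INSERT INTO t VALUES "
    private final int maxChars;           // soft upper bound on one statement's length
    private final Consumer<String> sink;  // receives each finished statement
    private final StringBuilder buffer = new StringBuilder();

    BoundedInsertBuffer(String prefix, int maxChars, Consumer<String> sink) {
        this.prefix = prefix;
        this.maxChars = maxChars;
        this.sink = sink;
    }

    /** Adds one "(...)" row tuple, flushing first if it would push the statement past maxChars. */
    void addRow(String tuple) {
        if (buffer.length() > 0 && buffer.length() + 1 + tuple.length() > maxChars) {
            flush();
        }
        if (buffer.length() == 0) {
            buffer.append(prefix);
        } else {
            buffer.append(',');
        }
        buffer.append(tuple);
    }

    /** Sends the buffered statement to the sink, if any, and empties the buffer. */
    void flush() {
        if (buffer.length() == 0) {
            return;
        }
        sink.accept(buffer.toString());
        buffer.setLength(0);
    }

    public static void main(String[] args) {
        BoundedInsertBuffer b =
            new BoundedInsertBuffer("INSERT INTO t VALUES ", 30, System.out::println);
        b.addRow("(1)");
        b.addRow("(2)");
        b.addRow("(3)");
        b.flush();
    }
}
```

A single row longer than `maxChars` is still sent as its own statement, so the bound is soft: peak memory is roughly a small multiple of the larger of `maxChars` and the biggest row, since the builder, the string copy, and the encoded bytes can coexist briefly.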
