Last modified: 2013-06-18 13:26:46 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T15721, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13721 - Data too long for column 'rev_comment'
Status: RESOLVED FIXED
Product: Utilities
Classification: Unclassified
Component: mwdumper
Version: unspecified
Hardware: All All
Importance: Normal critical
Target Milestone: ---
Assigned To: JulesWinnfield-hu
Reported: 2008-04-12 20:31 UTC by Mohamed Magdy
Modified: 2013-06-18 13:26 UTC
CC List: 8 users



Attachments
truncate comment at 255 Bytes (1.87 KB, patch)
2010-04-04 20:57 UTC, Umherirrender

Description Mohamed Magdy 2008-04-12 20:31:06 UTC
C:\dumper>java -client -classpath mwdumper.jar;mysql-connector-java-3.1.14/mysql-connector-java-3.1.14-bin.jar org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/wikiar?user=usr&password=pass" "--format=sql:1.5" "D:\arwiki-20080405-pages-articles.xml.bz2"
1,000 pages (25.65/sec), 1,000 revs (25.65/sec)
2,000 pages (20.713/sec), 2,000 revs (20.713/sec)
3,000 pages (24.385/sec), 3,000 revs (24.385/sec)
4,000 pages (24.352/sec), 4,000 revs (24.352/sec)
5,000 pages (25.293/sec), 5,000 revs (25.293/sec)
Exception in thread "main" java.io.IOException: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'rev_comment' at row 809
        at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
        at org.mediawiki.dumper.Dumper.main(Unknown Source)

mysql->5.0.50-enterprise-gpl-nt

C:\dumper>java -showversion
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)
Comment 1 Brion Vibber 2008-04-14 19:29:38 UTC
Can you double-check that the proper encoding's being used?

The most compatible case is probably to use the binary schema. You may or may not have troubles with other modes.
Comment 2 Mohamed Magdy 2008-04-15 15:24:01 UTC
I have tried all three:
# Backwards-compatible UTF-8
# Experimental MySQL 4.1/5.0 UTF-8
# Experimental MySQL 4.1/5.0 binary
But I get the exact same error (I drop the db, then reinstall MW). Does mwdumper have some encoding schema setting I should change?
Comment 3 Mohamed Magdy 2009-03-23 17:40:53 UTC
I found that the error isn't from mwdumper but from the data dumps. The problem is that the dump contains comments that are longer than the column type allows. When I changed rev_comment from tinyblob to blob, it imported without errors. Should this be changed in MediaWiki, or what?
Comment 4 Christopher Sahnwaldt 2009-10-16 23:40:27 UTC
As a workaround until the dump is fixed, mwdumper should make sure that a comment is at most 255 bytes long and truncate it if necessary. I implemented this fix and checked it in at http://dbpedia.svn.sourceforge.net/viewvc/dbpedia?view=rev&revision=1771 . Seems to fix that problem for me. Feel free to copy that code back to mediawiki if you want.
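
A minimal sketch of the kind of truncation this comment describes, assuming comments are encoded as UTF-8 and the limit is 255 bytes. It is not the code from the dbpedia revision or from mwdumper; the class and method names (CommentTruncator, truncateUtf8) are hypothetical. It counts bytes per code point so that a multi-byte character is never split at the boundary.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mwdumper.
class CommentTruncator {
    // Assumed limit: rev_comment is a tinyblob, which holds at most 255 bytes.
    static final int MAX_BYTES = 255;

    // Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes,
    // cutting only between whole code points.
    static String truncateUtf8(String s, int maxBytes) {
        if (s == null || s.length() == 0) {
            return s;
        }
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes) {
                break; // the next character would not fit completely
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    // Number of bytes UTF-8 needs for one code point.
    static int utf8Length(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 200; n++) {
            sb.append('\u0628'); // Arabic letter beh, 2 bytes in UTF-8
        }
        String cut = truncateUtf8(sb.toString(), MAX_BYTES);
        System.out.println(cut.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

StandardCharsets needs Java 7 or later; on older JDKs getBytes("UTF-8") would serve the same purpose in main.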
Comment 5 Brion Vibber 2009-10-20 00:09:12 UTC
Ahhhh ok I think I see the base issue -- if a 2-byte or 3-byte char is cut off at the 255-byte boundary when stored, it becomes an invalid char. The XML dump outputter runs UTF-8 validation and turns the bad char into a valid U+FFFD ... which is 3 bytes of UTF-8, pushing the comment over the 255-byte limit again.

Yeah, this should be fixed in our DB and MediaWiki should be smarter about truncation, but in the meantime it should be easy to make mwdumper smarter for this too.
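
The byte arithmetic in this comment can be reproduced with a small sketch. It assumes storage cuts the comment at 255 bytes regardless of character boundaries and that validation replaces the broken tail with U+FFFD; Java's String decoder is used here only as a stand-in for the dump's UTF-8 validation. The names (TruncationDemo, cutAndRevalidate) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo, not part of MediaWiki or mwdumper.
class TruncationDemo {
    // Cuts the UTF-8 bytes of s at limit, then decodes them again.
    // Java's decoder replaces a malformed trailing sequence with U+FFFD.
    static String cutAndRevalidate(String s, int limit) {
        byte[] all = s.getBytes(StandardCharsets.UTF_8);
        byte[] cut = Arrays.copyOf(all, Math.min(limit, all.length));
        return new String(cut, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 2 ASCII bytes + 127 two-byte letters = 256 bytes, so the cut at
        // byte 255 lands in the middle of the last two-byte character.
        StringBuilder sb = new StringBuilder("aa");
        for (int n = 0; n < 127; n++) {
            sb.append('\u0628');
        }
        String repaired = cutAndRevalidate(sb.toString(), 255);
        int size = repaired.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("bytes after validation: " + size);
    }
}
```

In this example the 254 intact bytes are followed by one orphaned lead byte; replacing that single byte with the 3-byte U+FFFD gives 254 + 3 = 257 bytes, which no longer fits a 255-byte column.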
Comment 6 Umherirrender 2010-04-04 20:57:09 UTC
Created attachment 7263
truncate comment at 255 Bytes

It also works when you append

&jdbcCompliantTruncation=false

to the --output parameter.

I have also added a patch that truncates the comment, based on the implementation by Christopher Sahnwaldt (comment 4).
Comment 7 Christopher Sahnwaldt 2010-04-07 18:54:17 UTC
Minor gripe: the patch uses String.isEmpty(), which was only added in JDK 1.6. Maybe use String.length() == 0 instead, so MWDumper still compiles under 1.5.
Comment 8 JulesWinnfield-hu 2012-10-31 01:17:18 UTC
Gerrit change Ieff7eba1.
Comment 9 Andre Klapper 2012-11-02 10:54:04 UTC
This doesn't suddenly become a blocker after 3 years of existence... :)
Comment 10 JulesWinnfield-hu 2012-11-02 16:56:49 UTC
(In reply to comment #9)
Why isn't it? Mwdumper can't be used to import dumps because of this bug.
Comment 11 Umherirrender 2012-11-03 17:28:58 UTC
See comment 6 for a workaround
Comment 12 JulesWinnfield-hu 2012-11-03 21:33:44 UTC
(In reply to comment #11)
Unfortunately it works only for the jdbc connector, and it's not a solution for the sql output, is it?
Comment 13 Umherirrender 2012-11-05 17:34:26 UTC
(In reply to comment #12)
> (In reply to comment #11)
> Unfortunately it works only for the jdbc connector, and it's not a solution for
> the sql output, is it?

Yes, that is true. For the raw sql this is not a solution.
Comment 14 JulesWinnfield-hu 2012-11-05 21:00:30 UTC
(In reply to comment #11)
> See comment 6 for a workaround
Didn't work for me. Still gives Data too long for column 'rev_comment'.
Comment 15 JulesWinnfield-hu 2012-11-14 13:37:13 UTC
Gerrit change Ic078f6ee.
Comment 16 Lupo 2012-12-20 07:31:31 UTC
Change Ic078f6ee is merged now.
