Last modified: 2014-08-04 15:51:03 UTC
When I run mwdumper on enwiki-20100622-pages-articles.xml, it crashes after 310,000 pages with:

ERROR 1064 (42000) at line 6034: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''{{Infobox CanadianMP | name= Dean Allison\n| image = \n| term_start=October 4, ' at line 1

The command I ran:

java -jar mwdumper.jar --format=sql:1.5 enwiki-20100622-pages-articles.xml --filter=latest | mysql -u user -ppassword -D wikidb --default-character-set=utf8 -f

Please advise; I couldn't find a workaround for this. Thank you!

Vasile Ceteras
Sounds like bad character escaping. Adding it to my list of things to test when I get an updated data set downloaded on my big test comp...
The jar from http://download.wikimedia.org/tools/ is out of sync with trunk. Try to load the source from trunk and run with that. See r12972 for the escaping fix.
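For anyone wondering what "escaping" means here: the snippet below is only a rough sketch of the kind of string escaping a MySQL SQL writer has to do before wrapping page text in single quotes. It is not the code from r12972 and not mwdumper's actual implementation; the class name SqlEscapeSketch and the method escape are made up for illustration, and it assumes the server is not running with NO_BACKSLASH_ESCAPES.

```java
public class SqlEscapeSketch {
    // Hypothetical helper (not mwdumper's real code): backslash-escapes the
    // characters that are special inside a MySQL single-quoted string literal.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\0': out.append("\\0");  break; // NUL byte
                case '\n': out.append("\\n");  break; // newline
                case '\r': out.append("\\r");  break; // carriage return
                case 0x1a: out.append("\\Z");  break; // Ctrl-Z
                case '\'': out.append("\\'");  break; // single quote
                case '"':  out.append("\\\""); break; // double quote
                case '\\': out.append("\\\\"); break; // backslash itself
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample text loosely modelled on the fragment in the error message,
        // with an apostrophe added to show the quoting.
        String text = "{{Infobox CanadianMP | name= Dean Allison\n| image = it's";
        System.out.println("'" + escape(text) + "'");
    }
}
```

If any one of these characters is written out unescaped, the quoted string ends early or is misread, and MySQL reports a syntax error near the text that follows, which would look much like the ERROR 1064 in the original report.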
Vasile Ceteras: Is this still an issue if you load the source from trunk?
I'm really sorry, but I'm not a Java developer, so I couldn't compile the source. I've got the svn checkout, but ant can't compile it. I'm running CentOS 6, 64-bit.

svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/mwdumper/
...
Checked out revision 115751.
cd mwdumper
ant
...
compile:
[javac] Compiling 39 source files to /home/vceteras/Downloads/mwdumper/mwdumper/bin
[javac] /home/vceteras/Downloads/mwdumper/mwdumper/src/org/mediawiki/dumper/gui/DumperGui.java:253: annotations are not supported in -source 1.4
[javac] (use -source 5 or higher to enable annotations)
...

I then edited build.xml to set source="1.5" on line 19 and target="1.5" on line 20. Now the build ends with "[javac] 100 errors": lots of packages do not exist, plus two "cannot find symbol" errors.

I apologise for the late answer; I've been terribly busy and couldn't get any of my co-workers to help either. It would be really great if someone could compile mwdumper for everyone who wants to use it without learning Java.