Last modified: 2008-01-04 09:49:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 12059 - Database collation settings of SMW sometimes do not match MW's
Database collation settings of SMW sometimes do not match MW's
Product: MediaWiki extensions
Classification: Unclassified
Semantic MediaWiki (Other open bugs)
PC Linux
: Normal critical (vote)
: ---
Assigned To: Nobody - You can work on this!
Depends on:
  Show dependency treegraph
Reported: 2007-11-21 17:14 UTC by Michael W. Dietrich
Modified: 2008-01-04 09:49 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Michael W. Dietrich 2007-11-21 17:14:49 UTC
When running "php SMW_refreshData.php -v" whilst upgrading from 0.7.x (version from svn) after some pages have been processed properly I get the following error message and processing stops.

(116) Processing page with ID 1393 ...
Es gab einen Syntaxfehler in der Datenbankabfrage.
Die letzte Datenbankabfrage lautete: „ SELECT  DISTINCT `m3WDBpage`.page_title as title, `m3WDBpage`.page_namespace as namespace, `m3WDBpage`.page_id as id  FROM `prop9`, `m3WDBpage` INNER JOIN `m3WDBsmw_relations` AS rel8 ON rel8.subject_id=`m3WDBpage`.page_id LEFT JOIN `m3WDBredirect` AS rd10 ON rd10.rd_from=rel8.object_id  WHERE ((`m3WDBpage`.page_namespace='0') OR (`m3WDBpage`.page_namespace='6')) AND (prop9.title=rel8.relation_title AND ((rel8.object_title='Msg' AND rel8.object_namespace=0) OR (rd10.rd_title='Msg' AND rd10.rd_namespace=0)))  ORDER BY `m3WDBpage`.page_title ASC  LIMIT 51  “ aus der Funktion „<tt>SMW::getQueryResult</tt>“.
MySQL meldete den Fehler: „<tt>1267: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin1_general_ci,IMPLICIT) for
 operation '=' (localhost)</tt>“.  

If this can't be fixed I've to step back to a bakup version of the database and to 0.7.

As I'm aware, that the german language version of SMW is Not used to often i doubt that this bug will be critical to users of other language versions but german.
Comment 1 Markus Krötzsch 2007-11-23 09:08:11 UTC
This is the general "illegal collation" issue that some people get when importing DB-dumps. In short, MediaWiki does not set any collation to its tables, and neither does SMW. But sometimes a DB-import creates unwnated collation information for some of the tables (typically for the MediaWiki part), while the rest does not have the collations set. To fix this, you need to set the collation information for all tables/columns/dbs in the same way, see for details.

There is currently no automatic fix for that problem, but it is not a specific SMW1.0 software issue (the same things can happen in SMW0.7).

To find out about the collations of one table, use the mysql command "SHOW CREATE TABLE `tablename`;" To set the collation of one specific field, use something like "ALTER TABLE `tablename` CHANGE `subject_title` `subject_title` varchar(255) character set latin1 NOT NULL default '' collate latin1_bin;" where the rest of the declaration (varchar...) should be the same as before.
Comment 2 Markus Krötzsch 2007-11-23 09:30:17 UTC
Another, possibly simpler, workaround is the following:

* Check out the collation MediaWiki has now in your DB by using e.g. "SHOW
CREATE TABLE categorylinks;"

* Set this as a default collation for your database, e.g. "ALTER DATABASE yourdatabase CHARACTER SET latin1 COLLATE latin1_bin;"

* Delete all SMW tables, currently this is "DROP TABLE smw_relations, smw_attributes, smw_longstrings, smw_nary, smw_subprops, smw_specialprops, smw_nary_relations, smw_nary_attributes, smw_nary_longstrings;" You can always findout the names of all current tables by running SMW_setup.php  (or by pressing the Special:SMWAdmin button in the wiki).

* Then setup the tables again, running "php SMW_setup.php" (or using SMWAdmin)

* Then populate the tables again, running "php SMW_refreshdata.php -vp" and then "php SMW_refreshdata.php -v".

This should do the trick, but it obviously will break the wiki for some short time as long as the tables are gone. If all fails, switching off inline queries temporarily (see settings documented in SMW_GlobalSettings.php) should prevent major errors in wiki usage (i.e. normal pages will display, queries will remain empty).
Comment 3 Michael W. Dietrich 2007-11-23 17:11:23 UTC
Well everything works out well with your solution Markus - so thanks so far (probably you want to correct the typo in php SMW_refreshdata.php -vp and php SMW_refreshdata.php -v to php SMW_refreshData.php -vp and php SMW_refreshData.php -v as people under unixish environments might wonder why they can't just Drag/Drop/Execute your solutions commands directly from the web to a local console.


when Executing the "final" php SMW_refreshData.php -v Everythng works out well until suddenly.... (see console copy below).

(238) Processing page with ID 1525 ...
(239) Processing page with ID 1526 ...

Warning: DOMDocument::loadXML(): Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0x69 0x73 0x63 in Entity, line: 6 in /var/wiki2/includes/Parser.php on line 2967
Parser::preprocessToDom generated invalid XML
#0 /var/wiki2/includes/Parser.php(3021): Parser->preprocessToDom('----?als [[Test...')
#1 /var/wiki2/includes/Parser.php(915): Parser->replaceVariables('----?als [[Test...')
#2 /var/wiki2/includes/Parser.php(295): Parser->internalParse('----?als [[Test...')
#3 /var/wiki2/extensions/SemanticMediaWiki/maintenance/SMW_refreshData.php(99): Parser->parse('----?als [[Test...', Object(Title),
 Object(ParserOptions), true, true, 2265)
#4 {main}    
Comment 4 Michael W. Dietrich 2007-11-26 07:14:41 UTC
Hi Markus,

heres another QS issue with  this

When trying to resolve the above mentioned probs myself I tried out this:

s-ddf-wiki:/var/wiki2/maintenance# php SMW_refreshData.php --help

Refreshing all semantic data in the database!
Processing pages from ID 0 to ID 2969 ... 

I thik it qould be better style if a script (even only a maintanance script) would write out some info instead of processing itself when called with the --help commandline parameter. Just a gusee

Comment 5 Markus Krötzsch 2007-12-28 14:38:24 UTC
I have now implemented two further features that should close this bug completely:

(1) SMW uses the "binary" keyword for creating tables, just as MediaWiki does. Not doing this before may have caused some collation differences between the two under certain conditions.

(2) The refresh script now provides an automated way to recreate all SMW database tables (just as in the manual solution above). The parameter to achive such a "full" refresh is called "-f". An example run is as follows:

%  php SMW_refreshData.php -vpf  &&  php SMW_refreshData.php -v

Note that a second refresh must be triggered, just as in the earlier solution. You can of course split both runs into two commands, and combine them with start (-s) and end (-e) settings to process the pages in chunks or in parallel.

This should close *this* bug (which is apparently not related to the last two comments that reopened it). Final comments to the above remarks:

* Re:#3: try using the -s and -e parameters to skip problematic pages in refreshing. Update them manually through the wiki. Check whether the reported problem relates to SMW at all (the error seems to originate in Parser.php, not in SMW).

* Re:#4: I agree, but cannot make myself care too much about it. Note that most (all?) other MediaWiki maintenance scripts behave just the same.

Feel free to file new bugs if you find that either of these issues needs further discussion.
Comment 6 Michael W. Dietrich 2008-01-03 17:30:40 UTC
After *some more investigation* in this it firstly seems that the behaviour that led to reopening this bug was discovered in smw with some 27xxx Build of mediawiki software and was somewhat healed with a 29xxx build of mediawiki from late December 2007. After installing another new build of smw 29xxx from Jan 2008 I still discovered some pages not visible in IE but only in firefox.

*FINDING THAT* a Teamplate named "ask" used in those pages lead to very long output which was refering to the content that was put ou merely nonsens in this versions of smw and mediawiki (as I do not want to doubt that it had some sense in former versions)

"FIXING THE WIKI* was as simple as opening the Teamplate ask *in edit mode* (because it did also not open in some Browsers because of the output length) e.g. by calling (where is the url of the wiki and m3WDB is a link to index.php) *AND* writing "noinclude" tags around the templates content. This made ALL Wikipages showable and editable again and did give the chance to correct those error symptoms described above.

Thanks for helping and I agree *Issue resolved/fixed*

Just let me comment on 

<cit>* Re:#4: I agree, but cannot make myself care too much about it. Note that most
(all?) other MediaWiki maintenance scripts behave just the same.</cit>

It was becaus MediaWiki behaved so less self explantory that you wrote smw - am I right? So what do we need semantics for if we do not even document our source code better that others do. *To suggest a solution*: If you where so kind sending me a explanation of all parameters implemented in the maintanance scripts I will go to the smw svn, check out the scripts and write those few lines  giving the /?-call some more documentative charme - What do you think about that?

regards and hapy new year
Comment 7 Markus Krötzsch 2008-01-04 09:49:53 UTC
Good to hear that the other problems are gone now. We SMW developers strongly discourage the use of universal "ask" templates (i.e. templates that query for everything in order to not have to use custom queries on different pages), but they tend to reappear at least on In most cases, special-purpose tempaltes with only one or two relevant queries are much more useful.

<cit> If you where so kind sending me a explanation of all parameters implemented in the maintanance scripts I will go to the smw svn, check out the scripts and write those few
lines giving the /?-call some more documentative charme - What do you think
about that?</cit>

Now I begin to understand the problem. The parameters are already explained *within* the scripts as part of the initial comments in each of them. If this explanation is not sufficient, feel free to ask via the mailinglists! Of course, having a --help parameter would not hurt either, and possibly one can even format the scripts in such a way that the current initial comment text is also the returned help text (i.e. bcomes an initial string definition instead of a comment, reducing effort for maintaining the docu). So feel free to do that change if you consider it helpful.

Thanks, and a late Happy New Year as well!


Note You need to log in before you can comment on or make changes to this bug.