Last modified: 2012-08-04 20:48:37 UTC
Hello, all, I was recently installing MW at a site after a long time again. Platform: CentOS MediaWiki: 1.15.1 Database: PostgreSQL 8.1.11 While configuring it, it choke when I set $wgMaxCredits to 3 in LocalSettings.php. The exact error is attached below. The PostgreSQL error message made it apparent immediately that GROUP BY was not going to be the thing to solve what getCredits() was supposed to do, so I fixed the query to use DISTINCT ON. Here's the patch: $hideBit = Revision::DELETED_USER; // username hidden? - $sql = "SELECT {$userTable}.*, MAX(rev_timestamp) as timestamp + $sql = "SELECT + DISTINCT ON (user_id) + {$userTable}.*, + rev_timestamp as timestamp FROM $revTable LEFT JOIN $userTable ON rev_user = user_id WHERE rev_page = $pageId AND rev_user != $user AND rev_deleted & $hideBit = 0 - GROUP BY rev_user, rev_user_text, user_real_name - ORDER BY timestamp DESC"; + ORDER BY user_id, timestamp DESC"; if($limit > 0) { $sql .= ' LIMIT '.$limit; } if($offset > 0) { $sql .= ' OFFSET '.$offset; } The sad thing is, it won't work with MySQL, of course, and I have no idea on how you MW guys circumvent such stuff. This is the original error message: A database error has occurred Query: SELECT mwuser.*, MAX(rev_timestamp) as timestamp FROM revision LEFT JOIN mwuser ON rev_user = user_id WHERE rev_page = 6 AND rev_user != 2 AND rev_deleted & 4 = 0 GROUP BY rev_user, rev_user_text, user_real_name ORDER BY timestamp DESC Function: Article::getContributors Error: 1 ERROR: column "mwuser.user_id" must appear in the GROUP BY clause or be used in an aggregate function Backtrace: #0 /var/www/wiki/includes/db/Database.php(616): DatabasePostgres->reportQueryError('ERROR: column ...', 1, 'SELECT mwuser.*...', 'Article::getCon...', false) #1 /var/www/wiki/includes/Article.php(715): Database->query('SELECT mwuser.*...', 'Article::getCon...') #2 /var/www/wiki/includes/Credits.php(103): Article->getContributors() #3 /var/www/wiki/includes/Credits.php(65): Credits::getContributors(Object(Article), 2, true) #4 /var/www/wiki/includes/SkinTemplate.php(371): Credits::getCredits(Object(Article), 3, true) #5 /var/www/wiki/includes/OutputPage.php(968): SkinTemplate->outputPage(Object(OutputPage)) #6 /var/www/wiki/includes/Wiki.php(345): OutputPage->output() #7 /var/www/wiki/index.php(117): MediaWiki->finalCleanup(Array, Object(OutputPage)) #8 {main}
-need-review, -patch : Patch is not viable since it doesn't work on MySQL by the submitters own admissions -testme : This is used for old bugs when we aren't sure are still bugs
Oops, sorry about the testme tag... I guess I didn't read carefully enough. :-/ Anyways, having some modest interest in this being fixed, I'm going to persevere, ;-) The other two options would be to: - use named columns from mwuser and group by each and every one of them or - switch the join tables and make the other table a subselect - afterall, the query wants user data and just uses revisions as a means to weed records out Both (especially the other one, which is better in terms of performance, but more questionable in terms of compatibility, see query plans below) work in MySQL 4.1 and newer as well (http://dev.mysql.com/doc/refman/4.1/en/unnamed-views.html). Forgot to mention yesterday, both changes also fix for 18078 - it's very related. Hope this helped in any way. Kind regards, Grega -- The two queries: a) group by SELECT m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount, MAX(r.rev_timestamp) AS timestamp FROM $userTable AS m LEFT JOIN $revTable AS r ON r.rev_user = m.user_id WHERE r.rev_page = $pageId AND r.rev_user != $user AND r.rev_deleted & $hideBit = 0 GROUP BY m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount ORDER BY timestamp DESC b) subselect SELECT m.*, r.timestamp FROM $userTable AS m JOIN (SELECT rev_user, MAX(rev_timestamp) AS timestamp FROM $revTable WHERE rev_page = $pageId AND rev_user != $user AND rev_deleted & $hideBit = 0 GROUP BY rev_user ORDER BY timestamp DESC) AS r ON m.user_id = r.rev_user; What the PostgreSQL planner thinks of the two: wiki=# explain analyze select m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount, max(r.rev_timestamp) as timestamp from mwuser m left join revision r on r.rev_user = m.user_id where r.rev_page = 3 and r.rev_user != 1 and r.rev_deleted & 4 = 0 group by m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount order by timestamp desc; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Sort (cost=7.50..7.51 rows=1 width=112) (actual time=0.361..0.363 rows=1 loops=1) Sort Key: (max(r.rev_timestamp)) Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=7.48..7.49 rows=1 width=112) (actual time=0.344..0.346 rows=1 loops=1) -> Nested Loop (cost=0.00..7.47 rows=1 width=112) (actual time=0.068..0.296 rows=6 loops=1) Join Filter: (m.user_id = r.rev_user) -> Seq Scan on revision r (cost=0.00..6.33 rows=1 width=12) (actual time=0.031..0.114 rows=6 loops=1) Filter: ((rev_user <> 1) AND (rev_page = 3) AND (((rev_deleted)::integer & 4) = 0)) -> Seq Scan on mwuser m (cost=0.00..1.06 rows=6 width=104) (actual time=0.003..0.012 rows=6 loops=6) Total runtime: 0.521 ms (10 rows) wiki=# explain analyze select m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount, r.timestamp from mwuser m join (select rev_user, max(rev_timestamp) as timestamp from revision where rev_page = 3 and rev_user != 1 and rev_deleted & 4 = 0 group by rev_user order by timestamp desc) r on m.user_id = rev_user; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=6.39..7.48 rows=1 width=112) (actual time=0.244..0.250 rows=1 loops=1) Hash Cond: (m.user_id = r.rev_user) -> Seq Scan on mwuser m (cost=0.00..1.06 rows=6 width=104) (actual time=0.013..0.024 rows=6 loops=1) -> Hash (cost=6.37..6.37 rows=1 width=12) (actual time=0.191..0.191 rows=1 loops=1) -> Subquery Scan r (cost=6.36..6.37 rows=1 width=12) (actual time=0.181..0.186 rows=1 loops=1) -> Sort (cost=6.36..6.36 rows=1 width=12) (actual time=0.176..0.178 rows=1 loops=1) Sort Key: (max(revision.rev_timestamp)) Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=6.34..6.35 rows=1 width=12) (actual time=0.155..0.160 rows=1 loops=1) -> Seq Scan on revision (cost=0.00..6.33 rows=1 width=12) (actual time=0.027..0.122 rows=6 loops=1) Filter: ((rev_user <> 1) AND (rev_page = 3) AND (((rev_deleted)::integer & 4) = 0)) Total runtime: 0.385 ms (12 rows)
Applied a conditional group by in r59123
Need a lot more columns than the ones in the example: rolled back the change until I have time to dig in again.
*** Bug 22107 has been marked as a duplicate of this bug. ***
Added dependency to Bug 384 for tracking.
*** Bug 28193 has been marked as a duplicate of this bug. ***
Fixed in r87129.
Reverted in r87152: causes failures like this (seen when attempting to log in) SkinTemplate::makeTalkUrlDetails given invalid pagename User: Backtrace: #0 /var/www/trunk/includes/SkinTemplate.php(691): SkinTemplate->makeTalkUrlDetails('User:') #1 /var/www/trunk/includes/SkinTemplate.php(495): SkinTemplate->buildPersonalUrls(Object(OutputPage)) #2 /var/www/trunk/includes/OutputPage.php(1906): SkinTemplate->outputPage(Object(OutputPage)) #3 /var/www/trunk/includes/Wiki.php(402): OutputPage->output() #4 /var/www/trunk/index.php(146): MediaWiki->finalCleanup() #5 {main}
Done again in r87164.
Most of r87164 doesn't seem to have anything obvious to do with the query & postgres, but seems to be retooling User so it doesn't bother fetching full row data during later processing on the results, if you only ask for fields that were already provided. Could perhaps do with some regression test cases on things like this?
Grega Bremec, are you still running into this problem with MediaWiki 1.17.0?
This still exists up to 1.19 2011-09-26 22:00:48 UTC ERROR: function first(text) does not exist at character 102 2011-09-26 22:00:48 UTC HINT: No function matches the given name and argument types. You might need to add explicit type casts. 2011-09-26 22:00:48 UTC STATEMENT: SELECT /* WikiPage::getContributors 65.65.116.130 */ rev_user as user_id,rev_user_text AS user_name,FIRST(user_real_name) AS user_real_name,MAX(rev_timestamp) AS timestamp FROM "revision" LEFT JOIN mwuser ON ((rev_user = user_id)) WHERE rev_page = '1' AND (rev_user != 1) AND ((rev_deleted & 4) = 0) GROUP BY rev_user,rev_user_text ORDER BY timestamp DESC
As reported by hashar in bug 33686: Fixed in trunk by r98222. Richard, you can manually apply the patch to fix the issue (aka replace FIRST() by MIN()). Merged in REL1_18 with r108767. And indeed, as of r113993 I can't reproduce it with PostgreSQL anymore.