Last modified: 2012-08-04 20:48:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T23196, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 21196 - $wgMaxCredits breaks MW on PostgreSQL
$wgMaxCredits breaks MW on PostgreSQL
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Database (Other open bugs)
1.20.x
All All
: Low major with 1 vote (vote)
: ---
Assigned To: Nobody - You can work on this!
: need-unittest
: 22107 28193 (view as bug list)
Depends on:
Blocks: postgres
  Show dependency treegraph
 
Reported: 2009-10-19 23:19 UTC by Grega Bremec
Modified: 2012-08-04 20:48 UTC (History)
8 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Grega Bremec 2009-10-19 23:19:10 UTC
Hello, all,

I was recently installing MW at a site after a long time again.

Platform: CentOS
MediaWiki: 1.15.1
Database: PostgreSQL 8.1.11

While configuring it, it choke when I set $wgMaxCredits to 3 in LocalSettings.php. The exact error is attached below.

The PostgreSQL error message made it apparent immediately that GROUP BY was not going to be the thing to solve what getCredits() was supposed to do, so I fixed the query to use DISTINCT ON. Here's the patch:

                $hideBit = Revision::DELETED_USER; // username hidden?
 
-               $sql = "SELECT {$userTable}.*, MAX(rev_timestamp) as timestamp
+               $sql = "SELECT
+                               DISTINCT ON (user_id)
+                               {$userTable}.*,
+                               rev_timestamp as timestamp
                        FROM $revTable LEFT JOIN $userTable ON rev_user = user_id
                        WHERE rev_page = $pageId
                        AND rev_user != $user
                        AND rev_deleted & $hideBit = 0
-                       GROUP BY rev_user, rev_user_text, user_real_name
-                       ORDER BY timestamp DESC";
+                       ORDER BY user_id, timestamp DESC";
 
                if($limit > 0) { $sql .= ' LIMIT '.$limit; }
                if($offset > 0) { $sql .= ' OFFSET '.$offset; }

The sad thing is, it won't work with MySQL, of course, and I have no idea on how you MW guys circumvent such stuff.

This is the original error message:

A database error has occurred
Query: SELECT mwuser.*, MAX(rev_timestamp) as timestamp
FROM revision LEFT JOIN mwuser ON rev_user = user_id
WHERE rev_page = 6
AND rev_user != 2
AND rev_deleted & 4 = 0
GROUP BY rev_user, rev_user_text, user_real_name
ORDER BY timestamp DESC
Function: Article::getContributors
Error: 1 ERROR: column "mwuser.user_id" must appear in the GROUP BY clause or be used in an aggregate function

Backtrace:

#0 /var/www/wiki/includes/db/Database.php(616): DatabasePostgres->reportQueryError('ERROR: column ...', 1, 'SELECT mwuser.*...', 'Article::getCon...', false)
#1 /var/www/wiki/includes/Article.php(715): Database->query('SELECT mwuser.*...', 'Article::getCon...')
#2 /var/www/wiki/includes/Credits.php(103): Article->getContributors()
#3 /var/www/wiki/includes/Credits.php(65): Credits::getContributors(Object(Article), 2, true)
#4 /var/www/wiki/includes/SkinTemplate.php(371): Credits::getCredits(Object(Article), 3, true)
#5 /var/www/wiki/includes/OutputPage.php(968): SkinTemplate->outputPage(Object(OutputPage))
#6 /var/www/wiki/includes/Wiki.php(345): OutputPage->output()
#7 /var/www/wiki/index.php(117): MediaWiki->finalCleanup(Array, Object(OutputPage))
#8 {main}
Comment 1 Chad H. 2009-10-20 16:29:45 UTC
-need-review, -patch : Patch is not viable since it doesn't work on MySQL by the submitters own admissions
-testme : This is used for old bugs when we aren't sure are still bugs
Comment 2 Grega Bremec 2009-10-20 19:36:46 UTC
Oops, sorry about the testme tag... I guess I didn't read carefully enough. :-/

Anyways, having some modest interest in this being fixed, I'm going to persevere, ;-)

The other two options would be to:

- use named columns from mwuser and group by each and every one of them or
- switch the join tables and make the other table a subselect - afterall, the query wants user data and just uses revisions as a means to weed records out

Both (especially the other one, which is better in terms of performance, but more questionable in terms of compatibility, see query plans below) work in MySQL 4.1 and newer as well (http://dev.mysql.com/doc/refman/4.1/en/unnamed-views.html).

Forgot to mention yesterday, both changes also fix for 18078 - it's very related.

Hope this helped in any way.

Kind regards,
Grega

--

The two queries:

a) group by

SELECT m.user_id, m.user_name, m.user_real_name, m.user_email,
       m.user_editcount, MAX(r.rev_timestamp) AS timestamp
    FROM $userTable AS m
    LEFT JOIN $revTable AS r ON r.rev_user = m.user_id
    WHERE r.rev_page = $pageId AND
          r.rev_user != $user AND
          r.rev_deleted & $hideBit = 0
    GROUP BY m.user_id, m.user_name, m.user_real_name,
             m.user_email, m.user_editcount
    ORDER BY timestamp DESC

b) subselect

SELECT m.*, r.timestamp
    FROM $userTable AS m
    JOIN (SELECT rev_user, MAX(rev_timestamp) AS timestamp
              FROM $revTable
              WHERE rev_page = $pageId AND
	            rev_user != $user AND
                    rev_deleted & $hideBit = 0
             GROUP BY rev_user
             ORDER BY timestamp DESC) AS r
    ON m.user_id = r.rev_user;

What the PostgreSQL planner thinks of the two:

wiki=# explain analyze select m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount, max(r.rev_timestamp) as timestamp from mwuser m left join revision r on r.rev_user = m.user_id where r.rev_page = 3 and r.rev_user != 1 and r.rev_deleted & 4 = 0 group by m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount order by timestamp desc;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7.50..7.51 rows=1 width=112) (actual time=0.361..0.363 rows=1 loops=1)
   Sort Key: (max(r.rev_timestamp))
   Sort Method:  quicksort  Memory: 25kB
   ->  HashAggregate  (cost=7.48..7.49 rows=1 width=112) (actual time=0.344..0.346 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..7.47 rows=1 width=112) (actual time=0.068..0.296 rows=6 loops=1)
               Join Filter: (m.user_id = r.rev_user)
               ->  Seq Scan on revision r  (cost=0.00..6.33 rows=1 width=12) (actual time=0.031..0.114 rows=6 loops=1)
                     Filter: ((rev_user <> 1) AND (rev_page = 3) AND (((rev_deleted)::integer & 4) = 0))
               ->  Seq Scan on mwuser m  (cost=0.00..1.06 rows=6 width=104) (actual time=0.003..0.012 rows=6 loops=6)
 Total runtime: 0.521 ms
(10 rows)

wiki=# explain analyze select m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount, r.timestamp from mwuser m join (select rev_user, max(rev_timestamp) as timestamp from revision where rev_page = 3 and rev_user != 1 and rev_deleted & 4 = 0 group by rev_user order by timestamp desc) r on m.user_id = rev_user;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=6.39..7.48 rows=1 width=112) (actual time=0.244..0.250 rows=1 loops=1)
   Hash Cond: (m.user_id = r.rev_user)
   ->  Seq Scan on mwuser m  (cost=0.00..1.06 rows=6 width=104) (actual time=0.013..0.024 rows=6 loops=1)
   ->  Hash  (cost=6.37..6.37 rows=1 width=12) (actual time=0.191..0.191 rows=1 loops=1)
         ->  Subquery Scan r  (cost=6.36..6.37 rows=1 width=12) (actual time=0.181..0.186 rows=1 loops=1)
               ->  Sort  (cost=6.36..6.36 rows=1 width=12) (actual time=0.176..0.178 rows=1 loops=1)
                     Sort Key: (max(revision.rev_timestamp))
                     Sort Method:  quicksort  Memory: 25kB
                     ->  HashAggregate  (cost=6.34..6.35 rows=1 width=12) (actual time=0.155..0.160 rows=1 loops=1)
                           ->  Seq Scan on revision  (cost=0.00..6.33 rows=1 width=12) (actual time=0.027..0.122 rows=6 loops=1)
                                 Filter: ((rev_user <> 1) AND (rev_page = 3) AND (((rev_deleted)::integer & 4) = 0))
 Total runtime: 0.385 ms
(12 rows)
Comment 3 Greg Sabino Mullane 2009-11-15 23:31:51 UTC
Applied a conditional group by in r59123
Comment 4 Greg Sabino Mullane 2009-11-16 13:41:36 UTC
Need a lot more columns than the ones in the example: rolled back the change until I have time to dig in again.
Comment 5 Alexandre Emsenhuber [IAlex] 2010-01-15 14:58:12 UTC
*** Bug 22107 has been marked as a duplicate of this bug. ***
Comment 6 Stoyan Angelov 2010-01-15 15:28:21 UTC
Added dependency to Bug 384 for tracking.
Comment 7 Alexandre Emsenhuber [IAlex] 2011-03-22 20:15:55 UTC
*** Bug 28193 has been marked as a duplicate of this bug. ***
Comment 8 Alexandre Emsenhuber [IAlex] 2011-04-29 18:28:02 UTC
Fixed in r87129.
Comment 9 Brion Vibber 2011-04-30 00:00:18 UTC
Reverted in r87152: causes failures like this (seen when attempting to log in)

SkinTemplate::makeTalkUrlDetails given invalid pagename User:

Backtrace:

#0 /var/www/trunk/includes/SkinTemplate.php(691): SkinTemplate->makeTalkUrlDetails('User:')
#1 /var/www/trunk/includes/SkinTemplate.php(495): SkinTemplate->buildPersonalUrls(Object(OutputPage))
#2 /var/www/trunk/includes/OutputPage.php(1906): SkinTemplate->outputPage(Object(OutputPage))
#3 /var/www/trunk/includes/Wiki.php(402): OutputPage->output()
#4 /var/www/trunk/index.php(146): MediaWiki->finalCleanup()
#5 {main}
Comment 10 Alexandre Emsenhuber [IAlex] 2011-04-30 14:20:46 UTC
Done again in r87164.
Comment 11 Brion Vibber 2011-06-08 00:30:51 UTC
Most of r87164 doesn't seem to have anything obvious to do with the query & postgres, but seems to be retooling User so it doesn't bother fetching full row data during later processing on the results, if you only ask for fields that were already provided.

Could perhaps do with some regression test cases on things like this?
Comment 12 Sumana Harihareswara 2011-09-26 15:43:58 UTC
Grega Bremec, are you still running into this problem with MediaWiki 1.17.0?
Comment 13 OverlordQ 2011-09-26 22:02:14 UTC
This still exists up to 1.19

2011-09-26 22:00:48 UTC ERROR:  function first(text) does not exist at character 102
2011-09-26 22:00:48 UTC HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
2011-09-26 22:00:48 UTC STATEMENT:  SELECT /* WikiPage::getContributors 65.65.116.130 */  rev_user as user_id,rev_user_text AS user_name,FIRST(user_real_name) AS user_real_name,MAX(rev_timestamp) AS timestamp  FROM "revision" LEFT JOIN mwuser ON ((rev_user = user_id))  WHERE rev_page = '1' AND (rev_user != 1) AND ((rev_deleted & 4) = 0)  GROUP BY rev_user,rev_user_text ORDER BY timestamp DESC
Comment 14 Marcin Cieślak 2012-03-16 03:15:02 UTC
As reported by hashar in bug 33686:

Fixed in trunk by r98222. Richard, you can manually apply the patch to fix the
issue (aka replace FIRST() by MIN()).

Merged in REL1_18 with r108767.

And indeed, as of r113993 I can't reproduce it with PostgreSQL anymore.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links