Last modified: 2011-03-13 18:04:32 UTC

Bug 9096 - Job-queue gets multiple identical entries / db load too high
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Database
Version: 1.9.x
Hardware: All All
Importance: Lowest minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2007-02-25 14:11 UTC by Gunter Schmidt
Modified: 2011-03-13 18:04 UTC

Description Gunter Schmidt 2007-02-25 14:11:23 UTC
When you edit a template, jobs are added to the job queue. If you edit the same template several times within a short period, the job queue ends up with identical entries.

This does not cause an error, but it does cause unnecessary database load.

I propose that, before inserting, the job table be checked for existing entries with the same job_cmd, job_namespace and job_title.
Since the job queue is updated after the edit is finished, there should be no problem with entries that exist at the time of the check but have already been completed by the time the table is updated.
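
The proposed check could look roughly like the following. This is only a minimal sketch, not MediaWiki code: Python's sqlite3 module stands in for the MySQL database, the job table is reduced to the columns named above, and the helper insert_if_absent and the sample job values are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the job table (assumed, simplified schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE job ("
    " job_id INTEGER PRIMARY KEY,"
    " job_cmd TEXT NOT NULL,"
    " job_namespace INTEGER NOT NULL,"
    " job_title TEXT NOT NULL)"
)


def insert_if_absent(db, cmd, namespace, title):
    """Hypothetical helper: queue a job only if no identical entry exists."""
    exists = db.execute(
        "SELECT 1 FROM job"
        " WHERE job_cmd = ? AND job_namespace = ? AND job_title = ?"
        " LIMIT 1",
        (cmd, namespace, title),
    ).fetchone()
    if exists is not None:
        return False  # identical job already queued, skip the insert
    db.execute(
        "INSERT INTO job (job_cmd, job_namespace, job_title) VALUES (?, ?, ?)",
        (cmd, namespace, title),
    )
    return True


# Three quick edits to the same template try to queue the same job.
results = [insert_if_absent(db, "refreshLinks", 0, "Main_Page") for _ in range(3)]
queue_length = db.execute("SELECT COUNT(*) FROM job").fetchone()[0]
```

Note that this adds a SELECT to every insert, which is the cost that has to be weighed against the shorter queue.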

The benefits would be:
* less db load
* faster processing of long queues, because they would be considerably shorter

At the time of writing, the English Wikipedia has a job queue of 82,305 entries, so I am marking this as a high-priority bug.
Comment 1 Gunter Schmidt 2007-02-25 14:54:24 UTC
I am sorry, I just found that the job cleanup seems to check this already: it deletes all identical jobs in a single step.

Thus the DB load is not significantly higher than if the duplicates had never been inserted in the first place.

The only drawback is that you do not see the actual length of the job queue. One could use a grouped query in SpecialStatistics.php.
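
The one-step cleanup described in this comment can be pictured roughly as follows. This is a sketch under assumptions, not the actual MediaWiki implementation: sqlite3 stands in for MySQL, the schema is simplified, and pop_job is a hypothetical helper name.

```python
import sqlite3

# In-memory stand-in for the job table (assumed, simplified schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE job ("
    " job_id INTEGER PRIMARY KEY,"
    " job_cmd TEXT, job_namespace INTEGER, job_title TEXT)"
)
# Three identical entries for Page_A and one entry for Page_B.
rows = [("refreshLinks", 0, "Page_A")] * 3 + [("refreshLinks", 0, "Page_B")]
db.executemany(
    "INSERT INTO job (job_cmd, job_namespace, job_title) VALUES (?, ?, ?)", rows
)


def pop_job(db):
    """Hypothetical helper: take the oldest job and drop all its duplicates."""
    job = db.execute(
        "SELECT job_cmd, job_namespace, job_title FROM job"
        " ORDER BY job_id LIMIT 1"
    ).fetchone()
    if job is None:
        return None
    # A single DELETE removes the job together with every identical entry.
    db.execute(
        "DELETE FROM job"
        " WHERE job_cmd = ? AND job_namespace = ? AND job_title = ?",
        job,
    )
    return job


first = pop_job(db)
remaining = db.execute("SELECT COUNT(*) FROM job").fetchone()[0]
```

After the first pop, only the Page_B entry is left, even though the raw queue length was four.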

Comment 2 Gunter Schmidt 2007-02-25 18:24:53 UTC
One would need to add:

= $numJobs = $dbr->selectField( 'job', 'COUNT(*)', '', $fname );
+ $numJobsGrouped = $dbr->selectField( 'job', 'count(DISTINCT `job_title`,`job_cmd`,`job_namespace`)', '', $fname);

= $wgLang->formatNum( $images ),
+ $wgLang->formatNum( $numJobsGrouped )

in SpecialStatistics.php, and add some text to the message Sitestatstext.

I am not sure about the database load of COUNT(DISTINCT ...) on large systems, so it might not be a good idea.

Another possible query would be SELECT COUNT(*) AS C FROM `job` WHERE `job_id` IN (SELECT MIN(`job_id`) FROM `job` GROUP BY `job_cmd`, `job_namespace`, `job_title`).
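
To make the difference between the raw and the grouped count concrete, here is a small sketch. It is an illustration under assumptions: sqlite3 stands in for MySQL, and because SQLite does not accept a multi-column COUNT(DISTINCT ...), the grouped count uses a GROUP BY subquery instead. The sample rows are invented.

```python
import sqlite3

# In-memory stand-in for the job table (assumed, simplified schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE job ("
    " job_id INTEGER PRIMARY KEY,"
    " job_cmd TEXT, job_namespace INTEGER, job_title TEXT)"
)
rows = (
    [("refreshLinks", 0, "Page_A")] * 3   # same job queued three times
    + [("refreshLinks", 0, "Page_B")] * 2  # same job queued twice
    + [("htmlCacheUpdate", 0, "Page_A")]   # different job_cmd, so a distinct job
)
db.executemany(
    "INSERT INTO job (job_cmd, job_namespace, job_title) VALUES (?, ?, ?)", rows
)

# Raw queue length, duplicates included.
raw_count = db.execute("SELECT COUNT(*) FROM job").fetchone()[0]

# Number of distinct (job_cmd, job_namespace, job_title) combinations.
grouped_count = db.execute(
    "SELECT COUNT(*) FROM ("
    " SELECT 1 FROM job GROUP BY job_cmd, job_namespace, job_title"
    ")"
).fetchone()[0]
```

Here the raw count is 6 while the grouped count is 3. The sketch says nothing about how expensive the grouped query is on a large table, which is the open question raised above.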
Comment 3 Rob Church 2007-02-25 19:13:01 UTC
We could just add a unique index on those three columns and use an INSERT IGNORE
when stuffing rows into the job queue, but I'd like another opinion on whether
or not the duplicates are, in fact, causing load that we need to be worried about.
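
As a rough illustration of this suggestion, the sketch below adds a unique index on the three columns and lets the database discard duplicates on insert. It is written under assumptions: sqlite3 stands in for MySQL, so INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE, and the index name job_dedup and the sample values are made up.

```python
import sqlite3

# In-memory stand-in for the job table (assumed, simplified schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE job ("
    " job_id INTEGER PRIMARY KEY,"
    " job_cmd TEXT NOT NULL,"
    " job_namespace INTEGER NOT NULL,"
    " job_title TEXT NOT NULL)"
)
# Hypothetical index name; the three columns are the ones named in this report.
db.execute(
    "CREATE UNIQUE INDEX job_dedup ON job (job_cmd, job_namespace, job_title)"
)

# Queue the same job three times; the index silently rejects the repeats.
for _ in range(3):
    db.execute(
        "INSERT OR IGNORE INTO job (job_cmd, job_namespace, job_title)"
        " VALUES (?, ?, ?)",
        ("refreshLinks", 0, "Main_Page"),
    )
# A job for a different title is still accepted.
db.execute(
    "INSERT OR IGNORE INTO job (job_cmd, job_namespace, job_title)"
    " VALUES (?, ?, ?)",
    ("refreshLinks", 0, "Other_Page"),
)

queue_length = db.execute("SELECT COUNT(*) FROM job").fetchone()[0]
```

Compared with a separate existence check, the deduplication happens inside the insert itself, at the price of maintaining the index on every write.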
Comment 4 Brion Vibber 2007-03-05 20:11:32 UTC
The duplicates are used because the original checking on add was very expensive
(the inserts must be very fast, while the processing can take as long as it needs).

An INSERT IGNORE might not do too bad, though, dunno.
Comment 5 Tim Starling 2007-05-19 20:54:58 UTC
I didn't use a unique index in the original code because I imagined that at some stage in the future, we may want to add job types that require execution of duplicates. For example, a job type with no attached title, defined entirely by the last few bytes of a large job_params blob, would create duplicates in a (job_cmd,job_namespace,job_title) key. The current method is good enough for now, although I would like to switch to a specialised non-MySQL data structure at some stage.

-- Tim Starling
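
The objection in this comment can be reproduced in a small sketch. Everything here is assumed for illustration: sqlite3 stands in for MySQL, the schema is simplified, and "sendMail" is an invented title-less job type whose work is defined only by job_params.

```python
import sqlite3

# In-memory stand-in for the job table, now including job_params.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE job ("
    " job_id INTEGER PRIMARY KEY,"
    " job_cmd TEXT NOT NULL,"
    " job_namespace INTEGER NOT NULL,"
    " job_title TEXT NOT NULL,"
    " job_params BLOB NOT NULL)"
)
db.execute(
    "CREATE UNIQUE INDEX job_dedup ON job (job_cmd, job_namespace, job_title)"
)

# Two genuinely different jobs that share the same (cmd, namespace, title).
submitted = [
    ("sendMail", 0, "", b"to=alice"),
    ("sendMail", 0, "", b"to=bob"),
]
for job in submitted:
    db.execute(
        "INSERT OR IGNORE INTO job"
        " (job_cmd, job_namespace, job_title, job_params)"
        " VALUES (?, ?, ?, ?)",
        job,
    )

# Only the first job survives; the second was dropped as a "duplicate".
stored = db.execute("SELECT job_params FROM job").fetchall()
```

With the unique key in place, the second job is lost even though its parameters differ, which is the case the key would have to be widened or dropped to support.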
