Last modified: 2011-03-13 18:04:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T11096, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 9096 - Job-queue gets multiple identical entries / db load too high


Summary:	Job-queue gets multiple identical entries / db load too high

Status:	RESOLVED WONTFIX

Product:	MediaWiki
Classification:	Unclassified
Component:	Database
Version:	1.9.x
Hardware:	All All

Importance:	Lowest minor
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-02-25 14:11 UTC by Gunter Schmidt
Modified:	2011-03-13 18:04 UTC
CC List:	1 user

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Gunter Schmidt 2007-02-25 14:11:23 UTC

When you edit a template, the job queue gets filled. If you edit the same template several times within a short period, the job queue ends up with identical entries.

This does not result in an error, but in unnecessary database load.

I propose checking the job table, before inserting, for existing entries with the same job_cmd, job_namespace and job_title.
Since the job queue is updated after the edit is finished, entries that exist at the time of the check but have already been completed by the time the table is updated should not be a problem.
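
The proposed check could be sketched roughly as follows. This is only an illustration in Python, with an in-memory SQLite table standing in for MediaWiki's MySQL job table; it is not the MediaWiki code. The function names, the column types and the example job command are assumptions.

```python
import sqlite3


def make_job_table():
    """Create an in-memory stand-in for the job table (column types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        " job_id INTEGER PRIMARY KEY,"
        " job_cmd TEXT NOT NULL,"
        " job_namespace INTEGER NOT NULL,"
        " job_title TEXT NOT NULL)"
    )
    return conn


def enqueue_if_absent(conn, cmd, namespace, title):
    """Insert a job unless an identical (cmd, namespace, title) row is queued.

    Returns True if a row was inserted, False if a duplicate was found.
    """
    existing = conn.execute(
        "SELECT 1 FROM job"
        " WHERE job_cmd = ? AND job_namespace = ? AND job_title = ? LIMIT 1",
        (cmd, namespace, title),
    ).fetchone()
    if existing is not None:
        return False
    conn.execute(
        "INSERT INTO job (job_cmd, job_namespace, job_title) VALUES (?, ?, ?)",
        (cmd, namespace, title),
    )
    return True
```

Note that this costs one extra SELECT per insert, so it trades read load at insert time for a shorter queue.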

The benefit would be:
* less db load
* faster updates on long queues, because they are considerably shorter 

As I write this, the English Wikipedia has a job queue of 82,305 entries, so I am filing this as a high-priority bug.

Comment 1 Gunter Schmidt 2007-02-25 14:54:24 UTC

Sorry, I just found that the job cleanup seems to check for this already, so it deletes all similar jobs in a single step.

The DB load is therefore not significantly higher than if the duplicates had never been inserted in the first place.

The only drawback is that you do not see the actual length of the job queue. One could add a grouped query to SpecialStatistics.php.
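
The "delete all similar jobs in one step" behaviour described above might look roughly like this. It is a sketch under assumptions, using Python with an in-memory SQLite table rather than MediaWiki's own code; the helper names and column types are hypothetical.

```python
import sqlite3


def new_queue(jobs):
    """Build an in-memory job table holding the given (cmd, namespace, title) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        " job_id INTEGER PRIMARY KEY,"
        " job_cmd TEXT NOT NULL,"
        " job_namespace INTEGER NOT NULL,"
        " job_title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO job (job_cmd, job_namespace, job_title) VALUES (?, ?, ?)",
        jobs,
    )
    return conn


def pop_job(conn):
    """Take the oldest job and remove it together with all identical entries.

    Returns the (cmd, namespace, title) tuple, or None if the queue is empty.
    """
    job = conn.execute(
        "SELECT job_cmd, job_namespace, job_title FROM job"
        " ORDER BY job_id LIMIT 1"
    ).fetchone()
    if job is None:
        return None
    # One DELETE removes the job itself and every duplicate of it.
    conn.execute(
        "DELETE FROM job"
        " WHERE job_cmd = ? AND job_namespace = ? AND job_title = ?",
        job,
    )
    return job
```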

Comment 2 Gunter Schmidt 2007-02-25 18:24:53 UTC

One would need to add:

= $numJobs = $dbr->selectField( 'job', 'COUNT(*)', '', $fname );
+ $numJobsGrouped = $dbr->selectField( 'job', 'count(DISTINCT `job_title`,`job_cmd`,`job_namespace`)', '', $fname);

= $wgLang->formatNum( $images ),
+ $wgLang->formatNum( $numJobsGrouped )

in SpecialStatistics.php (lines marked "=" already exist, lines marked "+" are new),

and add some text to the message Sitestatstext.

I am not sure about the database load of COUNT(DISTINCT ...) on large systems, so it might not be a good idea.

Another possible SELECT would be: SELECT COUNT(*) AS C FROM `job` WHERE `job_id` IN (SELECT (`job_id`) FROM `job` GROUP BY `job_cmd`, `job_namespace`, `job_title`).
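
To make the difference between the raw and the grouped queue length concrete, here is a small sketch in Python with an in-memory SQLite table. SQLite does not accept the multi-column COUNT(DISTINCT a, b, c) form that MySQL allows, so the sketch counts the rows of a SELECT DISTINCT subquery instead, which should give the same number. The function names and column types are assumptions.

```python
import sqlite3


def build_queue(jobs):
    """Build an in-memory job table holding the given (cmd, namespace, title) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        " job_id INTEGER PRIMARY KEY,"
        " job_cmd TEXT NOT NULL,"
        " job_namespace INTEGER NOT NULL,"
        " job_title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO job (job_cmd, job_namespace, job_title) VALUES (?, ?, ?)",
        jobs,
    )
    return conn


def queue_lengths(conn):
    """Return (total rows, distinct jobs) for the job table."""
    total = conn.execute("SELECT COUNT(*) FROM job").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM"
        " (SELECT DISTINCT job_cmd, job_namespace, job_title FROM job)"
    ).fetchone()[0]
    return total, distinct
```

With three identical entries for one template and one entry for another page, queue_lengths reports 4 rows but only 2 distinct jobs.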

Comment 3 Rob Church 2007-02-25 19:13:01 UTC

We could just add a unique index on those three columns and use an INSERT IGNORE
when stuffing rows into the job queue, but I'd like another opinion on whether
or not the duplicates are, in fact, causing load that we need to be worried about.
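
A rough sketch of the unique-index approach, for illustration only. It uses Python with an in-memory SQLite table, where INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE; the index name, function names and column types are assumptions, not MediaWiki's schema.

```python
import sqlite3


def make_unique_queue():
    """Create a job table with a unique index on (cmd, namespace, title)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        " job_id INTEGER PRIMARY KEY,"
        " job_cmd TEXT NOT NULL,"
        " job_namespace INTEGER NOT NULL,"
        " job_title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX job_key"
        " ON job (job_cmd, job_namespace, job_title)"
    )
    return conn


def enqueue(conn, cmd, namespace, title):
    """Insert a job; a duplicate key is silently skipped by the database.

    Returns True if a row was actually added.
    """
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO job (job_cmd, job_namespace, job_title)"
        " VALUES (?, ?, ?)",
        (cmd, namespace, title),
    )
    return conn.total_changes > before
```

Compared with a separate lookup before each insert, the duplicate check here happens inside the index maintenance of a single statement.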

Comment 4 Brion Vibber 2007-03-05 20:11:32 UTC

The duplicates are used because the original checking on add was very expensive
(the inserts must be very fast, while the processing can take as long as it needs).

An INSERT IGNORE might not do too bad, though, dunno.

Comment 5 Tim Starling 2007-05-19 20:54:58 UTC

I didn't use a unique index in the original code because I imagined that at some stage in the future, we may want to add job types that require execution of duplicates. For example, a job type with no attached title, defined entirely by the last few bytes of a large job_params blob, would create duplicates in a (job_cmd,job_namespace,job_title) key. The current method is good enough for now, although I would like to switch to a specialised non-MySQL data structure at some stage.

-- Tim Starling
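
The concern about title-less jobs can be illustrated with a small sketch, again in Python with an in-memory SQLite table as a stand-in. The job command "hypotheticalBatch" and the parameter strings are made up for the example, and the column types are assumptions.

```python
import sqlite3


def queue_two_titleless_jobs():
    """Queue two different jobs that share an empty title under a unique key.

    Returns the job_params values that actually ended up in the table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        " job_id INTEGER PRIMARY KEY,"
        " job_cmd TEXT NOT NULL,"
        " job_namespace INTEGER NOT NULL,"
        " job_title TEXT NOT NULL,"
        " job_params BLOB NOT NULL)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX job_key"
        " ON job (job_cmd, job_namespace, job_title)"
    )
    # Two distinct units of work, distinguished only by job_params.
    for params in (b"range=1-1000", b"range=1001-2000"):
        conn.execute(
            "INSERT OR IGNORE INTO job"
            " (job_cmd, job_namespace, job_title, job_params)"
            " VALUES (?, ?, ?, ?)",
            ("hypotheticalBatch", 0, "", params),
        )
    rows = conn.execute("SELECT job_params FROM job ORDER BY job_id").fetchall()
    return [row[0] for row in rows]
```

In this sketch the second job is silently dropped even though it describes different work, because job_params is not part of the key.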
