Last modified: 2013-09-27 09:37:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T47007, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 45007 - Update special pages more frequently to account for bad runs
Status: RESOLVED DUPLICATE of bug 53227
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: wmf-deployment
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-02-14 16:28 UTC by Malafaya
Modified: 2013-09-27 09:37 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Malafaya 2013-02-14 16:28:49 UTC
Recently, the special pages update jobs have had trouble actually finishing their work.
About 50% of the time, the jobs are terminated by a fatal error (with no pattern, judging by the reasons I've been told), either because a stubborn wiki's database tables grew too big or because a bad update went live while the jobs were still running.
To account for these bad runs, I would like to suggest running the jobs every 1, 1.5 or 2 days, instead of the current 3. If a specific wiki needs to be named, my case applies to pt.wiktionary.
Thank you.
Comment 1 Malafaya 2013-02-14 17:10:36 UTC
MaxSem said the job ran yesterday (13th), although according to the 3-day schedule it should only have run today (14th; the last run was on the 11th at 00:00 UTC).
It finished with an error:

/home/wikipedia/logs/norotate/updateSpecialPages.log:

Fatal error: Call to a member function getText() on a non-object in /usr/local/apache/common-local/php-1.21wmf9/extensions/MobileFrontend/includes/MobileContext.php on line 273

Log file modified: 2013-02-13 05:17:24.374378000 +0000
Comment 2 db [inactive,noenotif] 2013-02-14 20:27:17 UTC
The job should automatically report this in the server admin log, so that people can see the errors and maybe fix them. LocalisationUpdate reports success there; maybe this job can do that too.
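
One way to get the kind of reporting suggested here is a small wrapper that always emits a one-line outcome for the job. A minimal sketch in Python, assuming a hypothetical `report` callback that stands in for whatever actually posts to the server admin log (that mechanism is not described in this bug):

```python
import subprocess
import sys


def run_and_report(command, report):
    """Run `command` and pass a one-line summary to `report`.

    `report` is a hypothetical callback; a real setup would post the
    line to the server admin log instead of printing or collecting it.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        report("updateSpecialPages: finished successfully")
    else:
        # Keep only the last line of output so the report stays short.
        lines = (result.stdout + result.stderr).strip().splitlines()
        last = lines[-1] if lines else "no output"
        report(f"updateSpecialPages: FAILED (exit {result.returncode}): {last}")
    return result.returncode


if __name__ == "__main__":
    # Stand-in for the real job: a child process that exits with an error.
    fake_job = [sys.executable, "-c", "print('Fatal error: demo'); raise SystemExit(255)"]
    run_and_report(fake_job, print)
```

Reporting both outcomes, not just failures, also makes a run that never started visible by the absence of any message.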
Comment 3 Malafaya 2013-02-15 18:19:52 UTC
Fixing the problem usually happens shortly after the error is thrown. But that won't fix the special pages update, which will have to wait at least until the next run (3 more days, if the next run happens to be successful).
Comment 4 Daniel Zahn 2013-08-20 00:32:12 UTC
< Danny_B> update of special pages is off now?
< Danny_B> or the periods have been prolonged?
..
< mutante>  monthday => "*/3"
< mutante> hour => 5
..
< mutante> ./manifests/misc/maintenance.pp
< mutante> class misc::maintenance::update_special_pages

< Danny_B> so it doesn't run obviously
< Danny_B> last update: 13. 8. 2013, 14:15

< mutante> command => "flock -n /var/lock/update-special-pages /usr/local/bin/update-special-pages > /home/wikipedia/logs/norotate/updateSpecialPages.log 2>&1",
< mutante> uhm, yeah, i don't know about the commandline

< Reedy> Never happy
< Danny_B> anyway, in case it would be helpful to track down the issue - cs wikis lack the update

site.pp

< mutante> 1178     # Wrong log file location
< mutante> 1179     class { misc::maintenance::update_special_pages: enabled => true }
< mutante> 2762     # Broken cron jobs moved back to hume:
< mutante> 2765     class { misc::maintenance::update_special_pages: enabled => false }



< mutante> so, the enabled one is on hume in site.pp
< mutante> not on the new host terbium

< mutante> !createbug

< mutante> cat: /home/wikipedia/logs/norotate/updateSpecialPages.log: No such file or directory
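
For reference, the `monthday => "*/3"` setting quoted in the log above does not mean "every 72 hours". In standard cron syntax, a step over `*` in the day-of-month field is counted from day 1 of each month, so the spacing resets at every month boundary. A small Python sketch (the helper name is made up for illustration) that lists the matching days:

```python
import calendar


def step_days(year, month, step=3):
    """Days of the month matched by a cron day-of-month field of '*/step'."""
    days_in_month = calendar.monthrange(year, month)[1]
    # '*' expands to the range starting at 1; the step is counted from there.
    return list(range(1, days_in_month + 1, step))


print(step_days(2013, 2))  # February 2013
print(step_days(2013, 8))  # August 2013
```

Under this reading the job is due on days 1, 4, 7, 10, 13 and so on, which matches the "last update: 13. 8. 2013" quoted above. In a 31-day month a run on the 31st is followed by another on the 1st.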
Comment 5 Sam Reed (reedy) 2013-08-23 14:10:12 UTC
reedy@hume:/home/wikipedia/log/norotate$ flock -n /var/lock/update-special-pages /usr/local/bin/update-special-pages > /home/wikipedia/logs/norotate/updateSpecialPages.log 2>&1
reedy@hume:/home/wikipedia/log/norotate$
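
The `flock -n` prefix in this command takes a non-blocking lock: if another process already holds /var/lock/update-special-pages, the new invocation fails immediately instead of waiting. A sketch of the same guard in Python using the standard `fcntl` module (Unix only; the lock path below is a throwaway temporary file, not the production one):

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock on `path`, or None if it is taken."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "update-special-pages.lock")
first = try_lock(lock_path)    # lock is free, so this succeeds
second = try_lock(lock_path)   # held by `first`, so this returns None at once
print(first is not None, second is None)
first.close()                  # closing the file releases the lock
```

The guard keeps a slow run from overlapping with the next scheduled one, at the cost of silently skipping that next run.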
Comment 6 Gerrit Notification Bot 2013-08-23 14:12:31 UTC
Change 80560 had a related patch set uploaded by Reedy:
Maintenance scripts should be run as Apache

https://gerrit.wikimedia.org/r/80560
Comment 7 Gerrit Notification Bot 2013-08-23 14:13:43 UTC
Change 80560 abandoned by Reedy:
Maintenance scripts should be run as Apache

Reason:
user => "apache",

https://gerrit.wikimedia.org/r/80560
Comment 8 Sam Reed (reedy) 2013-08-23 14:22:02 UTC
I'm not sure how just running it more frequently would make it any more likely to complete successfully. You're just going to make more fails more frequently.

Ideally, if it dies doing one wiki, this shouldn't stop execution on every other subsequent wiki (which has been an issue in the past)
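
A sketch of that isolation, assuming each wiki is processed in its own child process so that a fatal error only ends that child. The command built by `command_for` is a hypothetical placeholder, not the real update-special-pages invocation:

```python
import subprocess
import sys


def update_all(wikis, command_for):
    """Run one child process per wiki and keep going after failures.

    Returns (succeeded, failures), where failures maps wiki -> exit code.
    """
    succeeded = []
    failures = {}
    for wiki in wikis:
        result = subprocess.run(command_for(wiki))
        if result.returncode == 0:
            succeeded.append(wiki)
        else:
            # Record the failure and move on to the next wiki.
            failures[wiki] = result.returncode
    return succeeded, failures


def demo_command(wiki):
    # Hypothetical stand-in: "fails" for one wiki, succeeds for the rest.
    code = 255 if wiki == "badwiki" else 0
    return [sys.executable, "-c", f"raise SystemExit({code})"]


if __name__ == "__main__":
    print(update_all(["ptwiktionary", "badwiki", "cswiki"], demo_command))
```

The returned failure map could also feed a summary report, so a single broken wiki shows up without blocking the others.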
Comment 9 Sam Reed (reedy) 2013-08-23 14:27:56 UTC
Monitor the current manual run via http://noc.wikimedia.org/~reedy/updateSpecialPages.log
Comment 10 Malafaya 2013-08-30 10:35:52 UTC
(In reply to comment #8)
> I'm not sure how just running it more frequently would make it any more
> likely to complete successfully. You're just going to make more fails more
> frequently.
More failures are likely. But with a run only every 3 days, it is quite common to go 6 or 9 days without a special page update. Right now it's been 6 days and the Wanted Categories page hasn't been updated at pt.wiktionary (I think the last update was your manual run). And I'm betting it won't be today either. So another 3 days will have to pass before the next attempt.
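
The arithmetic behind this argument can be made explicit under a simplifying assumption that is not established in this bug: each run succeeds independently with the same probability p. The number of runs until the next success is then geometric with mean 1/p, so the mean gap between successful updates is the interval divided by p. A short Python sketch using the roughly 50% failure rate from the description:

```python
def expected_gap_days(interval_days, success_probability):
    """Mean days between successful runs, assuming independent runs."""
    if not 0 < success_probability <= 1:
        raise ValueError("success_probability must be in (0, 1]")
    return interval_days / success_probability


for interval in (3, 2, 1.5, 1):
    print(interval, expected_gap_days(interval, 0.5))
```

With p = 0.5, a 3-day interval gives a mean gap of 6 days and a 1-day interval gives 2 days. If failures are persistent rather than random, the independence assumption does not hold and a shorter interval gains nothing.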
Comment 11 Nemo 2013-09-27 07:36:37 UTC
(In reply to comment #8)
> I'm not sure how just running it more frequently would make it any more
> likely to complete successfully. You're just going to make more fails more
> frequently.

So, closing this as a duplicate of bug 53227: let's keep one bug per issue, not one per proposed way to address it. Bug 53227 also shows that the diagnosis behind this proposal is probably wrong, because failures seem consistent rather than occasional, when there are failures.

> Ideally, if it dies doing one wiki, this shouldn't stop execution on every
> other subsequent wiki (which has been an issue in the past)

This is maybe worth a separate bug? If the scripts can't be improved easily, it should be rather easy to make the cronjobs more atomic.

*** This bug has been marked as a duplicate of bug 53227 ***
Comment 12 Malafaya 2013-09-27 09:33:54 UTC
If the runs become more reliable than in the past then surely this doesn't make much sense anymore. Let's go with bug 53227 for now.
Comment 13 Malafaya 2013-09-27 09:37:19 UTC
P.S.:

> Bug 53227 also shows that the diagnosis
> behind this proposal is probably wrong...

When I submitted this bug in February, bug 53227 was not yet an issue. The constant (bad) live updates were the problem then (see Comment #1).
