Last modified: 2013-03-17 12:19:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20255, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 18255 - Backup systems (tracking)
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal major with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://wikitech.wikimedia.org/view/Ba...
Keywords: ops
Depends on:
Blocks:

Reported: 2009-03-30 12:40 UTC by elubarsky
Modified: 2013-03-17 12:19 UTC (History)
Description elubarsky 2009-03-30 12:40:32 UTC
I think there should be a bug to track updates to the docs and eventual implementations.

IMHO we need to make sure text & images are protected from hardware faults, accidental software errors (which can replicate..), and even malicious deletion/corruption by someone gaining access through a security breach.
Comment 1 Brion Vibber 2009-04-22 16:48:18 UTC
Assigning fun sysadmin task to Fred, adding CCs for Rob and Tomasz.

The backup procedures/status chart at http://wikitech.wikimedia.org/view/Backup_procedures needs to be audited; it hasn't been updated in several months. Some services have been moved to new servers, and automatic backup jobs might not have been updated; others were last seen still pending full offsite backups, or had only ad-hoc procedures.

We want to make sure all the empty or red spots are filled in and cleaned up, and that we have a good idea how to recover from loss of any one of these services.
Comment 2 Tomasz Finc 2009-04-22 17:09:29 UTC
I'd love to have a status page for the backups that's updated regularly. Should be easy enough with a call to the MediaWiki write API upon success or failure.
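
A minimal sketch of that idea, assuming a status page that gets one line appended per backup run. The page name `Backup_status`, the job names, and the `TOKEN` placeholder are hypothetical; the parameters themselves (`action=edit`, `title`, `appendtext`, `summary`, `token`, `format`) are the standard ones for a MediaWiki edit request. The function only builds the request body, so sending it with an authenticated session is left out.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_status_edit(job, succeeded, finished_at, token, page="Backup_status"):
    """Build POST parameters for a MediaWiki action=edit request that
    appends one status line for a backup job to a wiki page."""
    state = "OK" if succeeded else "FAILED"
    stamp = finished_at.strftime("%Y-%m-%d %H:%M UTC")
    return {
        "action": "edit",
        "title": page,  # hypothetical page name
        "appendtext": f"\n* {stamp}: {job}: {state}",
        "summary": f"backup {job}: {state}",
        "token": token,  # CSRF token, fetched beforehand from the API
        "format": "json",
    }


# Example: a failed run of a hypothetical "images" backup job.
params = build_status_edit(
    "images",
    succeeded=False,
    finished_at=datetime(2009, 4, 22, 17, 9, tzinfo=timezone.utc),
    token="TOKEN",  # placeholder, not a real token
)
body = urlencode(params).encode("utf-8")  # ready to POST to the wiki's api.php
```

The backup script would call this once at the end of each run, on both the success and the failure path, so the page shows the latest state either way.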
Comment 3 Brion Vibber 2009-04-22 17:22:00 UTC
Amen, brother!

(Should this also be integrated to Nagios monitoring/alerts?)
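
One way this could look is a small check plugin that reports how old the newest backup file is. This is only a sketch: the thresholds (26 and 50 hours) and the idea of judging freshness by file modification time are assumptions, not anything decided here. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def backup_age_hours(path, now=None):
    """Return the age of the file at `path` in hours, or None if it is missing."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return None
    now = time.time() if now is None else now
    return (now - mtime) / 3600.0


def classify_age(age_hours, warn_hours=26, crit_hours=50):
    """Map a backup age to a (exit_code, status_line) pair.
    The default thresholds are assumptions for a daily backup job."""
    if age_hours is None:
        return UNKNOWN, "BACKUP UNKNOWN - no backup file found"
    if age_hours >= crit_hours:
        return CRITICAL, f"BACKUP CRITICAL - last backup is {age_hours:.0f}h old"
    if age_hours >= warn_hours:
        return WARNING, f"BACKUP WARNING - last backup is {age_hours:.0f}h old"
    return OK, f"BACKUP OK - last backup is {age_hours:.0f}h old"


def main(argv):
    """Print one status line and return the exit code for the given path."""
    path = argv[1] if len(argv) > 1 else ""
    code, message = classify_age(backup_age_hours(path))
    print(message)
    return code
```

Run as a plugin, the script would end with `sys.exit(main(sys.argv))` so that Nagios picks up the exit code.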

Comment 4 Fred Vassard 2009-04-29 17:49:39 UTC
I am currently evaluating a free solution called Amanda to properly take care of backups.
I believe this solution would also offer easy notification of successes / failures and enable us to have the dashboard Tomasz mentioned.
Also 'waiting' on storage to be available, which should theoretically arrive once I have had enough time to work through Amanda's configuration setup.
More to come! 

--Fred.
Comment 5 elubarsky 2009-07-05 05:04:17 UTC
Considering how valuable and high-quality this data is (and the massive effort the community has put into it), I'm rather concerned that there haven't been many concrete updates to this bug and pretty much none at all to http://wikitech.wikimedia.org/view/Backup_procedures. From that page, images and OTRS data seem to be quite vulnerable even though they're obviously very important to the project. Could we at least have some reassurance?
Comment 6 Andrew Garrett 2009-07-29 16:39:01 UTC
Adding 'tracking' keyword.
Comment 7 Mark A. Hershberger 2011-03-06 21:27:13 UTC
Giving half of Fred's old bugs to Hashar since I trust him to get it done or reassign if he doesn't have time.
Comment 8 Antoine "hashar" Musso (WMF) 2011-03-10 12:29:22 UTC
hexmode>
Tasks coming to mind:
- identify which items need to be backed up (this bug report provides some)
- identify for each item:
-- a backup solution (rsync, ftp, NAS, ..), incremental, snapshots, both?
-- off-site, off-foundation requirements
-- backup frequency (daily, weekly, monthly).
-- retention duration (month, year ...)
- make sure the backup tasks are documented and monitored
- make sure operations know where the documents are and train them
- actually test backup procedures and data from time to time on a test server. This is to ensure the tools and procedures are up to date. There is nothing worse than a bad backup (data filled with zeroes, for example) or wasting 2 hours looking for the documentation / hacking scripts around.
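
As an illustration of the last point, here is a rough sketch of two checks a restore test could run on recovered data: that it is not empty or zero-filled, and that it matches a checksum recorded when the backup was taken. Recording a SHA-256 at backup time is an assumption, and all function names are made up for this example.

```python
import hashlib
import io


def is_blank(stream, chunk_size=1 << 20):
    """Return True if the stream is empty or contains only zero bytes."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return True
        if chunk.strip(b"\x00"):  # anything left after stripping zeroes is real data
            return False


def sha256_hex(stream, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a stream without loading it whole."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_restore(stream, expected_sha256):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if is_blank(stream):
        problems.append("restored data is empty or zero-filled")
    stream.seek(0)  # rewind before hashing
    if sha256_hex(stream) != expected_sha256:
        problems.append("checksum does not match the one recorded at backup time")
    return problems


# Example with in-memory data standing in for a restored file.
good = b"page text"
recorded = hashlib.sha256(good).hexdigest()
print(verify_restore(io.BytesIO(good), recorded))
print(verify_restore(io.BytesIO(bytes(1024)), recorded))
```

On a test server the streams would be files opened in binary mode from the restored tree.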

I believe this should be raised to CT Woo and assigned to an operations program manager.

Wikitech has some documentation:
https://wikitech.wikimedia.org/view/Backup_procedures
Comment 10 Antoine "hashar" Musso (WMF) 2011-10-17 12:43:00 UTC
Resetting this bug to default assignee. There is not really anything I could do; that is an operations team issue.
Comment 11 Andre Klapper 2012-11-10 16:47:06 UTC
Not sure why this report was marked as "tracking" if nothing has been ever tracked here (no dependency bug reports), plus scope unclear: What would be needed to get this fixed? Sounds rather like a continuous task (not fixable).
Comment 12 Nemo 2012-11-10 18:41:57 UTC
(In reply to comment #11)
> Not sure why this report was marked as "tracking" if nothing has been ever
> tracked here (no dependency bug reports), 

Probably, because we never found someone bothering to create/track the relevant subtasks.
Comment 13 Antoine "hashar" Musso (WMF) 2013-03-16 20:51:50 UTC
The backup system is not tracked in Bugzilla anymore. The ops team uses RT and has a different set of tools to plan its backup strategy; this bug report is hence rather useless and I am closing it.
Comment 14 elubarsky 2013-03-17 06:23:00 UTC
Hi Antoine, could you post a link to where this is now tracked?
Comment 15 Andre Klapper 2013-03-17 11:32:02 UTC
This report was meant as a "tracking" report, but random different requests were added to it directly instead of being filed separately and marked as dependencies. So to me it's already unclear what you'd actually like to "track".

FYI, RT is located at https://rt.wikimedia.org/ (refer to bug 30413 for potential meta-discussions about RT itself).
Comment 16 elubarsky 2013-03-17 12:19:15 UTC
A few years ago the situation at https://wikitech.wikimedia.org/wiki/Backup_procedures wasn't nearly as good as it is now. In particular there weren't proper disconnected off-site backups for text & especially images. So I started this report "to make sure text & images are protected from hardware faults, accidental software errors (which can replicate..), and even malicious deletion/corruption by someone gaining access through a security breach."

It's great to see that much progress has been made (thanks to everyone at Wikimedia for that!), but would you say that the risks in my original concern are now minimised as much as possible? I think matters relating to the progress of backup systems should be kept public since they are obviously of concern to Wikipedia editors who'd like their considerable efforts to not get lost.
