Last modified: 2014-09-23 22:33:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T22079, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 20079 - Provide a better means of status update delivery in WMF error message
Status: PATCH_TO_REVIEW
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 16043 20083
Blocks:
Reported: 2009-08-05 21:37 UTC by Martin Peeks
Modified: 2014-09-23 22:33 UTC (History)
CC List: 16 users

Description Martin Peeks 2009-08-05 21:37:37 UTC
See also: https://bugzilla.wikimedia.org/show_bug.cgi?id=16043   (this blocks it if anything)

Great change has taken place in #wikipedia with regard to opping practices.
It remains difficult to manage the channel during times of downtime, especially
with little or no support from sysadmins (if the channel gets particularly
hectic, while +m might not be warranted, it is impossible to read both the
channel and #wikimedia-tech).

A far better solution than Mike suggests is that the Wikimedia sysadmins go to
the effort of creating some easy, quick to update and accessible method of
telling users what is going on.  Not many people use and are familiar with IRC
- and I'd expect that for 90% of people who see the "site are down" message,
their usual next step would be to (ironically!) visit wikipedia to see what IRC
means!  It therefore serves very few users as a means of providing status
updates.

It would be relatively trivial for someone to create (yet another) IRC bot for
#wikimedia-tech which could write comments given to it to either a blog or
something like twitter (and thus an RSS feed).  This would be accessible to
many many more users affected by Wikimedia downtime.
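
As a rough illustration of the feed half of that idea, here is a minimal sketch that turns a list of operator comments into an RSS 2.0 document. The function name `build_status_feed`, the feed title, the `status.example.org` link and the sample messages are all made up for the example; nothing here reflects an existing Wikimedia bot, and the IRC side (collecting the comments) is left out entirely.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_status_feed(updates, title="Site status", link="https://status.example.org/"):
    """Render (timestamp, text) pairs as an RSS 2.0 document, newest first.

    The title and link defaults are placeholders, not real endpoints.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Operator-written status updates"
    for when, text in sorted(updates, key=lambda u: u[0], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        # RSS expects RFC 822 style dates; format_datetime produces them.
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")


# Invented sample comments, as a bot might have collected them from a channel.
updates = [
    (datetime(2009, 8, 5, 21, 40, tzinfo=timezone.utc), "Investigating database errors"),
    (datetime(2009, 8, 5, 22, 5, tzinfo=timezone.utc), "Sites back to normal"),
]
feed_xml = build_status_feed(updates)
print(feed_xml)
```

The resulting file could be served from any static host, which matters here because the feed has to stay reachable while the main sites are down.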

An IRC channel is no longer fit for purpose.
Comment 1 Platonides 2009-08-05 22:17:19 UTC
Where would you place that status page so it doesn't get "slashdotted" on a wikipedia outage?

A long time ago, there was an external page serving for that, which was taken down on wikipedia 
failures. Now wikipedia traffic is orders of magnitude greater.

An appropriate place to set the messages could be the toolserver (if WM-DE is ok with it), 
independent but nearby. However, it only makes sense if the source of the problem isn't in esams 
itself!
I can't think of a scenario where the squids present the error message, the toolserver is not 
accessible and which isn't trivially solved by rerouting to tampa. Nonetheless I feel there might 
be an unsuspected problem there.
Comment 2 Martin Peeks 2009-08-05 23:26:54 UTC
(In reply to comment #1)
> Where would you place that status page so it doesn't get "slashdotted" on a
> wikipedia outage?
>
> A long time ago, there was an external page serving for that, which was taken
> down on wikipedia failures. Now wikipedia traffic is orders of magnitude
> greater.
>
> An appropriate place to set the messages could be the toolserver (if WM-DE is
> ok with it), independent but nearby. However, it only makes sense if the
> source of the problem isn't in esams itself!
> I can't think of a scenario where the squids present the error message, the
> toolserver is not accessible and which isn't trivially solved by rerouting to
> tampa. Nonetheless I feel there might be an unsuspected problem there.

toolserver did cross my mind.  Alternatively, use a completely separate service such as a hosted blog or, as an increasing number of services do, use twitter.

Alternatively, could the "Site down" notice be modified such that it draws a short status string from somewhere and presents it to users?
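
A minimal sketch of that last idea, assuming the status string is fetched by some callable supplied by whoever deploys it. `render_error_page`, `fetch_status` and the placeholder message text are hypothetical names for illustration, not part of any existing error page. The point is that any failure to obtain the string falls back to the plain notice, so the error page itself can never break.

```python
import html

# Placeholder wording; the real notice text is not reproduced here.
GENERIC_MESSAGE = "The site is temporarily unavailable."


def render_error_page(fetch_status, max_length=200):
    """Return error-page text, adding a short status line when one is available.

    fetch_status is a hypothetical callable returning the current status
    string; any exception or empty result falls back to the generic message.
    """
    try:
        status = (fetch_status() or "").strip()
    except Exception:
        status = ""
    if not status:
        return GENERIC_MESSAGE
    # Truncate and escape: the string comes from outside the page template.
    return f"{GENERIC_MESSAGE}\nCurrent status: {html.escape(status[:max_length])}"


print(render_error_page(lambda: "Database maintenance, back shortly"))
```

Where the string lives (a file on the cache hosts, an external service) is the open question raised above; the sketch deliberately leaves that behind the callable.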

Comment 3 Derk-Jan Hartman 2009-12-19 13:14:09 UTC
message bot already spews to twitter. http://twitter.com/wikimediatech
Comment 4 Platonides 2009-12-19 15:44:47 UTC
It would be more consistent with our values linking to http://identi.ca/wikimediatech instead of twitter.
Still, I don't think the Server admin log is appropriate as a general status information.
Make a feed with #wikipedia-tech topic? :)
Comment 5 FT2 2011-05-25 14:30:31 UTC
IRC is useful for many people. So are identi.ca and twitter, client (or app) and browser based. We should provide a few routes, not just one. There's no need not to tell people about IRC as one of those. If anything's up IRC will surely get to know of it. I was in #wikipedia on 24 May and demands weren't unreasonable, posted a message there now and then, people got the idea. Easy.


Suggestion:

   <Standard and user-friendly generic error message>

   If this persists more than a few minutes, the current status and updates can be viewed at:

     * IRC: <channel details>
            <http://web link> (web based)
     * identi.ca: <details>
     * Twitter: wikimedia-network-status
                <http://search.twitter.com/search?q=wikimedia-network-status> (web based)
     * Our external status pages: <list>

   Almost-current versions of articles can be read from the following cache websites:

     * <list>
Comment 6 Nemo 2011-05-27 07:07:19 UTC
(In reply to comment #5)
> IRC is useful for many people. So are identi.ca and twitter, client (or app)
> and browser based. We should provide a few routes, not just one. 

That's bug 16043. This bug asks for

> some easy, quick to update and accessible method of
> telling users what is going on.

http://status.wikimedia.org/ seems the way, but sysadmins need to decide how to update it with notices.

[Link to largely unhelpful discussion, just for historical purposes: http://thread.gmane.org/gmane.org.wikimedia.foundation/52853 .]
Comment 7 Nemo 2013-06-09 20:25:29 UTC
Guillaume, Sumana, Tilman, Matthew or whoever is responsible for this: do we have *any* location right now where users can expect to find information about (if not report) current outages and technical problems, which could be linked from the error page?

As of bug 16043 comment 24, the new (varnish) error page won't have even a link to IRC, while it would be nice if it gave some directions.
status.wikimedia.org doesn't give any updates; I think even Twitter would be better than nothing, but I don't remember https://twitter.com/wikimedia consistently reporting such information with less than a few hours' delay.
Perhaps https://wikitech.wikimedia.org/view/Server_admin_log would be a suitable target? It's both more open for posting (hence more complete) and "moderated" (by editing the wiki). It mostly contains obscure information, but during outages the top lines will probably be about what people are looking for; informative messages can easily be made in bold.

This issue saw no progress in years... Can we find a simple solution, and who's in the position of taking a decision on the topic?
Comment 8 Tomasz W. Kozlowski 2013-06-09 20:54:12 UTC
Just post stuff /both/ to Identi.ca and Twitter (with the most popular accounts or hashtags) with a simple IRC-to-Identi.ca/Twitter bot. Won't hurt.
Comment 9 Andre Klapper 2013-06-11 18:01:21 UTC
(In reply to comment #7 by Nemo)

> Perhaps https://wikitech.wikimedia.org/view/Server_admin_log would be a
> suitable target? 

See comment 4 - It's likely too techy.

This request has some bikeshed potential - is there a defined scope for which kinds of issues users should be informed about? (I probably shouldn't ask this, to keep this focused.)
Comment 10 Ken Snider 2013-07-03 21:42:06 UTC
Hello everyone,

Apologies for being late to this discussion.

Is the sort of information we are currently exposing at http://status.wikimedia.org the kind of information you are looking for? Or something else?

Thanks.
Comment 11 Nemo 2013-07-04 06:11:24 UTC
Hello Ken.

(In reply to comment #10)
> Is the sort of information we are currently exposing at
> http://status.wikimedia.org the kind of information you are looking for? Or
> something else?

Something else. status.wikimedia.org reports only the worst cases of downtime (when sites are not even accessible), for some of the services. What's needed is information on whether the sites are functioning (e.g. up, down, read only, r/w but there's a fatal if you try to save, Europe cut off) and what's being done about it.
A recent example could be https://status.github.com/messages
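
To make the kind of information meant here concrete, a small sketch of a status record covering those states. The names `SiteState` and `StatusReport` and the sample report are invented for the illustration and do not describe how status.wikimedia.org or any other existing service works.

```python
from dataclasses import dataclass
from enum import Enum


class SiteState(Enum):
    # States taken from the list above; the wording is illustrative.
    UP = "up"
    DOWN = "down"
    READ_ONLY = "read only"
    SAVE_FAILING = "readable, but saving fails"
    REGION_CUT_OFF = "unreachable from one region"


@dataclass
class StatusReport:
    state: SiteState
    action: str = ""  # what is being done about it, in plain language

    def summary(self):
        text = f"Status: {self.state.value}"
        return f"{text}. {self.action}" if self.action else text


report = StatusReport(SiteState.READ_ONLY, "Database maintenance in progress")
print(report.summary())
```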
Comment 12 Gerrit Notification Bot 2013-11-23 22:11:15 UTC
Change 97190 had a related patch set uploaded by Nemo bis:
Add Twitter account to Varnish's error page

https://gerrit.wikimedia.org/r/97190
Comment 13 MZMcBride 2013-11-25 06:35:58 UTC
(In reply to comment #12)
> Change 97190 had a related patch set uploaded by Nemo bis:
> Add Twitter account to Varnish's error page
> 
> https://gerrit.wikimedia.org/r/97190

I think this proposed change might mistakenly give the impression that the "wikimedia" Twitter account is used to provide site status information and it's definitely not, even during actual outages and issues.
Comment 14 MZMcBride 2013-11-25 06:38:10 UTC
(In reply to comment #13)

Comment 4 notes that Twitter is not really aligned with Wikimedia's open source values, though in the time since comment 4 was made, identi.ca no longer exists, I believe. :-/
Comment 15 p858snake 2013-11-25 06:50:22 UTC
Copy from gerrit comments:

> Dzahn: didn't you mean https://twitter.com/wikimediatech instead of https://twitter.com/wikimedia ? ...snip...

I disagree with using the wikitech logs because most end users will not understand what they mean

eg: <p858snake|l> most end users will not know what "cp1002 hdd is full" or "fenari is in swap" or perhaps "exim is being stupid" means
<p858snake|l> or how that relates to why its boke
Comment 16 Tilman Bayer 2013-11-25 07:49:37 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Change 97190 had a related patch set uploaded by Nemo bis:
> > Add Twitter account to Varnish's error page
> > 
> > https://gerrit.wikimedia.org/r/97190
> 
> I think this proposed change might mistakenly give the impression that the
> "wikimedia" Twitter account is used to provide site status information and
> it's definitely not, even during actual outages and issues.


It's definitely been used for major outages, see e.g.:

https://twitter.com/Wikimedia/status/232469652691894272
https://twitter.com/Wikimedia/status/232519974663643136
https://twitter.com/Wikimedia/status/350485792956755968
https://twitter.com/Wikipedia/status/398888528039276544 (was retweeted by @wikimedia, too)

Since mid-2011, Twitter has been listed as a communications tool for such cases at 
https://wikitech.wikimedia.org/wiki/Incident_response#Communicating_with_the_public .

Of course it's a matter of judgment how severe an incident needs to be to be reported on @wikimedia. Issues that don't affect a lot of users, or short outages, may indeed not be covered there. The wording in the patch ("You may be able to get further information in Wikimedia's <a href="https://twitter.com/wikimedia">Twitter feed</a>") should be sufficiently non-committal.
Comment 17 Nemo 2013-11-25 08:13:59 UTC
It would be interesting to have a guesstimate of how many views of that error page happen to coincide with something that would trigger an update of that Twitter handle, i.e. if it's mostly seen during major outages (a lot of views in rare events) or minor ones (fewer views but many more events).
Since IRC was removed, the error message no longer provides any way (however hard) to get really up-to-date information. I don't know however if that's a goal, maybe not.
Comment 18 Erik Moeller 2013-11-25 08:27:04 UTC
I don't think it's an issue if folks check out @wikimedia from an error message and find no updates there, as long as the message is worded accordingly. The proposed message in https://gerrit.wikimedia.org/r/#/c/97190/ already says "may be", which I think is sufficient, but we could also add "in case of ongoing outages" since we'll likely never tweet something for an intermittent site issue.
Comment 19 MZMcBride 2013-12-07 17:54:09 UTC
(In reply to comment #16)
>> I think this proposed change might mistakenly give the impression that the
>> "wikimedia" Twitter account is used to provide site status information and
>> it's definitely not, even during actual outages and issues.
> 
> It's definitely been used for major outages[...]

Yes, it has been used previously. But site outages and issues happen 24/7 and I can assure you we've had many outages and large site issues of varying strengths that have gone unreported to Twitter. There's also the issue of tweets coming post-incident (see below).

> [...] see e.g.:
> 
> https://twitter.com/Wikimedia/status/232469652691894272
> https://twitter.com/Wikimedia/status/232519974663643136
> https://twitter.com/Wikimedia/status/350485792956755968
> https://twitter.com/Wikipedia/status/398888528039276544 (was retweeted by
> @wikimedia, too)

A user visits Wikipedia and sees an error page. They refresh or come back a few minutes later and the site is back. In only one of the four cases mentioned here would there have been any useful information from Twitter. In three of the four cases, the message was put out after the site issue was resolved (e.g., "Site back to normal after problems affecting logged-in users."). Any user who saw the Wikimedia error message and clicked over to Twitter would not have been provided any useful information.

If we insist on including a link to Twitter, I think it might be better to include a link such as <https://twitter.com/search?q=wikipedia+down>. That's how a user can actually determine whether the site is having issues during an actual outage.

Otherwise we will simply be directing users to a feed (@wikimedia) of "check out this project on Wikisource" or "see the Commons image of the day" when the sites are inaccessible. That doesn't seem ideal to me.
