Last modified: 2013-06-18 15:18:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T21587, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 19587 - secure.wikimedia.org speed and status
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: SSL related
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Antoine "hashar" Musso (WMF)
Duplicates: 19588
Depends on:
Blocks: ssl
 
Reported: 2009-07-08 13:42 UTC by William Allen Simpson
Modified: 2013-06-18 15:18 UTC (History)


Attachments
peaks during off-peak time (15.66 KB, image/png)
2009-07-19 16:12 UTC, William Allen Simpson
peaks at same time each day (15.64 KB, image/png)
2009-07-20 04:59 UTC, William Allen Simpson

Description William Allen Simpson 2009-07-08 13:42:57 UTC
[originally reported on wikitech]

I've been using secure for login for over a year now, and at first it seemed 
pretty good, other than the inability to switch sites easily (bug 5440).

And always editing links from secure.wikimedia.org/.../w to en.wikipedia.org/w, 
but I've gotten used to doing that extra bit by hand.

Anyway, it's just been a dog lately. During EDT daylight hours, it often
gives an error saying it is not able to access the page, especially when saving.

So, I've reverted to the old practice from the days of 2005-2006, and 
mostly edit in very off-peak hours. Yet it slowed down drastically again!

Here's my test log, edits queued and ready to go, demonstrating roughly how 
long they take to come back and display:

;off hours
# 2009-07-01T06:54:45
# 2009-07-01T06:55:59 1 minute 14 seconds

;peak time
# 2009-07-01T17:05:24
# 2009-07-01T17:06:17          53 seconds
# 2009-07-01T17:06:53          36 seconds
# 2009-07-01T17:08:00 1 minute  7 seconds
# 2009-07-01T17:08:40          40 seconds
# 2009-07-01T17:09:45 1 minute  5 seconds
# 2009-07-01T17:11:49 2 minutes 4 seconds
# 2009-07-01T17:12:44          55 seconds
# 2009-07-01T17:13:49 1 minute  5 seconds
# 2009-07-01T17:15:00 1 minute 11 seconds
# 2009-07-01T17:16:10 1 minute 10 seconds

In short, sometimes as slow off-peak as peak.
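
The intervals in the log above can be rechecked from the timestamps alone. This is only a minimal sketch: the timestamps are copied from the log, while the helper name `intervals` and the variable names are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the test log above.
OFF_PEAK = ["2009-07-01T06:54:45", "2009-07-01T06:55:59"]
PEAK = [
    "2009-07-01T17:05:24", "2009-07-01T17:06:17", "2009-07-01T17:06:53",
    "2009-07-01T17:08:00", "2009-07-01T17:08:40", "2009-07-01T17:09:45",
    "2009-07-01T17:11:49", "2009-07-01T17:12:44", "2009-07-01T17:13:49",
    "2009-07-01T17:15:00", "2009-07-01T17:16:10",
]


def intervals(stamps):
    """Return the gaps, in whole seconds, between consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


peak = intervals(PEAK)
off_peak = intervals(OFF_PEAK)
print("off-peak seconds:", off_peak)
print("peak seconds:", peak)
print("peak mean: %.1f s" % (sum(peak) / len(peak)))
```

The single off-peak save (74 seconds) comes out slower than the mean of the ten peak-time saves (64.6 seconds), which matches the summary above.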

Does this mean that many secure users are from Asia?

Are there too many secure users?

Is there anywhere that configuration and usage of secure is listed?
Comment 1 Brion Vibber 2009-07-13 16:58:21 UTC
Fred can you take a peek and see if we can monitor status of secure server? I haven't noticed any problems using it, nor did the load graphs look particularly unpleasant when I checked last week, but we want to make sure it's not going to crap when we're not looking.
Comment 2 Brion Vibber 2009-07-13 16:58:43 UTC
*** Bug 19588 has been marked as a duplicate of this bug. ***
Comment 3 Fred Vassard 2009-07-13 17:49:08 UTC
Secure.wikimedia.org seems to point to bart and uses apache2 to proxy the ssl connection over to the cluster.
However, bart is also the nagios monitoring server and will therefore see spikes in CPU usage from time to time, depending on the nagios scheduler.
Also, this server is very low on memory: 

[root@bart conf]# free -m
             total       used       free     shared    buffers     cached
Mem:          3550       3129        420          0        406       1118
-/+ buffers/cache:       1604       1945
Swap:         1983          0       1983

which could cause some of the issues you are seeing. 
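
For anyone reading the `free -m` output above, the "-/+ buffers/cache" row can be derived from the "Mem:" row. A rough sketch of that arithmetic follows; the numbers are copied from the output, and the variable names are only illustrative.

```python
# Values in MB, copied from the "Mem:" row of the `free -m` output above.
total, used, free = 3550, 3129, 420
buffers, cached = 406, 1118

# Memory held by applications: "used" minus reclaimable buffers and page cache.
used_by_apps = used - buffers - cached
# Memory that could be made available: "free" plus the reclaimable part.
reclaimable_free = free + buffers + cached

print("used by applications (MB):", used_by_apps)
print("free plus buffers/cache (MB):", reclaimable_free)
```

The results differ by 1 MB from the 1604 and 1945 printed by `free`, presumably because each column is rounded separately. Read this way, a large share of the "used" figure is buffers and cache, and swap is untouched, so the numbers alone may not prove that memory is the bottleneck.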

I will enable process accounting on that server to try and get a better view as to what is going on. 

Ganglia graphs available at http://ganglia.wikimedia.org/pmtpa/?c=Miscellaneous&h=bart.wikimedia.org&m=&r=hour&s=descending&hc=4

Also note, this server is set to be decommissioned in the near future.
Comment 4 William Allen Simpson 2009-07-19 16:12:19 UTC
Created attachment 6367 [details]
peaks during off-peak time

Thank you for the ganglia link. The server list had "ssl" 
instead of secure.wikimedia.org, so I'd missed it.

I've been looking at the graphs from time to time, and 
this was a fine example.
Comment 5 William Allen Simpson 2009-07-19 16:23:57 UTC
Noting for the record that http://nagios.wikimedia.org/ has been reporting 
MEMCACHED CRITICAL - Can not connect to 10.0.2.159:11000 (Connection refused) 
for some time now....
Comment 6 William Allen Simpson 2009-07-20 04:59:48 UTC
Created attachment 6371 [details]
peaks at same time each day

For comparison between 07-19 and 07-20: both days have the first CPU peak at the
same time. However, there is a 07-20 network peak at the same time of day as the
second 07-19 CPU peak, indicating some kind of regular process, too.
Comment 7 Chris Wood 2011-02-04 03:30:35 UTC
Dunno if this helps, but I've noticed this problem only on these pages so far (I look at a lot of Wikipedia articles):
https://secure.wikimedia.org/wikipedia/en/wiki/Barack_Obama
https://secure.wikimedia.org/wikipedia/en/wiki/Barack
https://secure.wikimedia.org/wikipedia/en/wiki/Obama

I'm accessing Wikipedia from New Zealand. The pages seem to be perpetually inaccessible (a few days so far). Of course the non-secure pages work fine.
Comment 8 Chris Wood 2011-02-13 20:31:27 UTC
The above three pages are still inaccessible for me. Also I've found another:
https://secure.wikimedia.org/wikipedia/en/wiki/9/11
Comment 9 Bawolff (Brian Wolff) 2011-02-13 20:37:56 UTC
>https://secure.wikimedia.org/wikipedia/en/wiki/Obama

The first time I tried to access it, I got a 502 error about the proxy not being able to read. The second time, it went through rather quickly. Perhaps the parser cache is separate for secure and for everything else, and that page just takes so long to render that it times out(?)
Comment 11 Krinkle 2011-03-02 07:35:57 UTC
(In reply to comment #10)
> Some more: https://secure.wikimedia.org/wikipedia/en/wiki/World_War_II
> https://secure.wikimedia.org/wikipedia/en/wiki/World_War_2
> https://secure.wikimedia.org/wikipedia/en/wiki/World_war_2
> https://secure.wikimedia.org/wikipedia/en/wiki/WWII
> https://secure.wikimedia.org/wikipedia/en/wiki/Ww2
> https://secure.wikimedia.org/wikipedia/en/wiki/WW2

For me too. All return "502 Proxy Error"
Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /wikipedia/en/wiki/World_War_II.

Reason: Error reading from remote server

Apache/2.2.8 (Ubuntu) mod_fastcgi/2.4.6 PHP/5.2.4-2ubuntu5.12wm1 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g Server at secure.wikimedia.org Port 443
>Statuscode:502 Bad Gateway
>Connection:Keep-Alive
>Content-Length:616
>Content-Type:text/html; charset=iso-8859-1
>Date:Wed, 02 Mar 2011 07:33:01 GMT
>Keep-Alive:timeout=1, max=100
Comment 12 Bawolff (Brian Wolff) 2011-03-02 18:16:17 UTC
That again appears to be the proxy timing out. I only get the 502 when the page was not served from the parser cache. If it's served from the parser cache, it works fine from secure.

Probably the timeout on the proxy server needs to be increased (or someone could make the parser be super fast, but that's a little more difficult ;)
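
The theory above can be illustrated with a toy model. This is a sketch under stated assumptions, not how the real proxy or parser cache is implemented: the function name, the render times, and the timeout values are all hypothetical, and it assumes the backend finishes rendering and caches the page even after the proxy has given up.

```python
def fetch_via_proxy(page, parser_cache, render_seconds, proxy_timeout):
    """Return the HTTP status the proxy would send for one request (toy model)."""
    if page in parser_cache:
        return 200  # cached HTML comes back well within the timeout
    # Cache miss: assume the backend still finishes rendering and stores
    # the result, even if the proxy has already stopped waiting.
    parser_cache.add(page)
    if render_seconds[page] > proxy_timeout:
        return 502  # proxy gave up: "Error reading from remote server"
    return 200


# Hypothetical render times in seconds; not measured values.
render_seconds = {"World_War_II": 45.0, "Sandbox": 0.5}

cache = set()
print(fetch_via_proxy("World_War_II", cache, render_seconds, proxy_timeout=30))
print(fetch_via_proxy("World_War_II", cache, render_seconds, proxy_timeout=30))
print(fetch_via_proxy("Sandbox", cache, render_seconds, proxy_timeout=30))
```

In this model a slow page fails only on a cache miss, and a longer `proxy_timeout` (say 60 instead of 30) would let the first request succeed too, which is the fix suggested above.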
Comment 13 William Allen Simpson 2011-03-03 19:27:15 UTC
[tried sending this via email, trying again]

We've seen these Proxy errors before with the server overloaded.  It's
currently on singer.  But I don't see (via Ganglia) the huge cpu spikes
we used to have on bart with nagios.

However, I was just going to post to wikitech that I've been seeing
other problems from secure lately, too:

* Edits don't seem to flush the cache properly.  After noticing this
weekend, I had to action=flush a dozen pages by hand to see my article
and category changes reflected via normal access.

* It's losing the user name on edits, showing up with IP instead.  I'm
not sure this wasn't due to my user error somehow -- but it was fairly
frequent back in the old overloaded days, hadn't happened to me for a
couple of years, and just showed up again yesterday!
Comment 14 Bawolff (Brian Wolff) 2011-03-03 20:19:29 UTC
(In reply to comment #13)
> [tried sending this via email, trying again]

yeah, trying to reply by email to bugmail doesn't work.

> We've seen these Proxy errors before with the server overloaded.  It's
> currently on singer.  But I don't see (via Ganglia) the huge cpu spikes
> we used to have on bart with nagios.

If my theory is correct, it's not caused by load.

[..]
> 
> * Edits don't seem to flush the cache properly.  After noticing this
> weekend, I had to action=flush a dozen pages by hand to see my article
> and category changes reflected via normal access.

There were recently some issues with the job queue (bug 27727); this may be related to that (though that wouldn't be secure-specific).

> 
> * It's losing the user name on edits, showing up with IP instead.  I'm
> not sure this wasn't due to my user error somehow -- but it was fairly
> frequent back in the old overloaded days, hadn't happened to me for a
> couple of years, and just showed up again yesterday!

That's a more interesting issue, I have no idea what could cause that.
Comment 15 Mark A. Hershberger 2011-03-06 21:27:14 UTC
Giving half of Fred's old bugs to Ashar since I trust him to get it done or reassign if he doesn't have time.
Comment 16 p858snake 2011-03-10 01:16:45 UTC
Resetting this back to wikibugs, and almost willing to close it.

It appears this was assigned back before we had status.* and the related tools, and that is what was wanted. Getting it included is now Bug 27912.

And the 502 errors are also a separate bug (bug 25271), which could probably get duped either way.
Comment 17 Antoine "hashar" Musso (WMF) 2011-03-10 07:30:17 UTC
Assigning back to me. Pending actions:

- make sure it is monitored by nagios and ganglia
- check whether the peaks have disappeared; if not, either
--- move the process generating them elsewhere
--- move secure.w.o somewhere else
Comment 18 William Allen Simpson 2011-03-10 16:25:53 UTC
Regarding comment 16, I had already filed Bug 19588 on the Proxy errors, but Brion marked it as a duplicate of this bug (back in comment 2). So maybe they should be split again?
Comment 19 Chris Wood 2011-07-21 05:28:51 UTC
Merge with bug 25271?
Comment 20 Antoine "hashar" Musso (WMF) 2011-07-27 18:21:42 UTC
The Wikimedia Foundation operations team is rebuilding the HTTPS system from scratch, which will solve this bug for good.

HTTPS was enabled on test some days ago:
http://blog.wikimedia.org/2011/07/19/protocol-relative-urls-enabled-on-test-wikipedia-org/

Therefore, this bug will not be fixed since the architecture is going to be replaced.
Comment 21 Brion Vibber 2011-07-28 17:15:16 UTC
Sounds more like "almost FIXED" than WONTFIX. :)
