Last modified: 2014-05-22 17:49:02 UTC
A very big, long-term suggestion here: a real performance testing cluster.

The beta cluster is currently unsuitable for testing code for many performance problems, because it runs entirely on VMs. It generally won't tell you about cachebusting either, for several of our caching layers. Several people are working on Vagrant roles that will help developers test in more realistic dev environments, with million-row databases and warm caches, but CPU and other constraints still won't be realistic. So, as one can see in https://www.mediawiki.org/wiki/Performance_profiling_for_Wikimedia_code, right now we can only roughly predict many interactions until the code hits production.

It is not possible to have a testing cluster that exactly mirrors production. And of course it is the developer's responsibility to know the constraints of the production system and know how her code exercises those systems. And with heterogeneous deployment, it's *possible* to notice some problems while they're only affecting less-trafficked wikis. But some people have expressed interest in creating a more realistic environment to test/predict how efficient code will be when deployed to very-high-traffic wikis, especially with unique configurations.

So: I'm adding this bug report so that others can comment, WONTFIX it, mention what parts of our performance-related environment are currently hardest to test but would be easier to mock up, etc.
CCing Aaron and Ori, who are the performance engineers. I know that at least Ori raised the subject in a previous bug report. Maybe we can investigate this after the HHVM migration, which is taking a good share of our productivity this quarter and probably the next as well.
Added to https://www.mediawiki.org/wiki/Wikimedia_Release_and_QA_Team/Wishlist
does it make sense to bring this up in the "scrum of scrums"?
(In reply to Daniel Zahn from comment #3)
> does it make sense to bring this up in the "scrum of scrums"?

We can, but I think a larger conversation needs to happen between Platform/RelEng/Ops re support/team capacity. It won't be on our (RelEng's) todo list for this quarter, at least.