Last modified: 2013-08-01 08:50:02 UTC
Created attachment 11973 [details] search in UI returns nothing
According to the tracking bug, querying Search via curl works, but Search in the UI does not; see the screenshot.
For the last couple of days the PHP entry points were returning Error 500 because the Thanks extension was not in mediawiki/extensions.git (that is fixed now). Lucene search polls all the wikis via the OAI extension, which was definitely being served Error 500 pages; that might have broken the search system. The two search instances use puppetmaster::self, so their puppet configuration has to be updated manually. I updated them a few hours ago. Doing a search does not work right now:
http://en.wikipedia.beta.wmflabs.org/w/api.php?format=json&action=opensearch&search=F&namespace=0&suggest=
gives:
["F",[]]
We need to investigate the PHP error logs and look at the search box logs.
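The empty reply above lends itself to a quick smoke check. The following is a minimal sketch, assuming the two-element reply shape shown in this comment (["F",[]] when nothing matches); the function name has_suggestions is hypothetical, and plain pattern matching stands in for a real JSON parser, so it will not cope with whitespace or other reply shapes.

```shell
# Hypothetical helper: classify an opensearch reply passed as a string.
# Assumes the compact two-element shape seen in this bug: ["term",[...]].
# Exit status: 0 = at least one suggestion, 1 = empty list, 2 = unrecognised reply.
has_suggestions() {
    case "$1" in
        *',[]]')        return 1 ;;  # suggestion list is empty, e.g. ["F",[]]
        '['*',['*']]')  return 0 ;;  # suggestion list has content
        *)              return 2 ;;  # not the expected shape (e.g. an error page)
    esac
}

# Possible use against the beta API (not executed here):
#   reply=$(curl -s 'http://en.wikipedia.beta.wmflabs.org/w/api.php?format=json&action=opensearch&search=F&namespace=0&suggest=')
#   has_suggestions "$reply" || echo "search returned no suggestions"
```

Keeping a separate exit status for an unrecognised reply makes it possible to tell an empty index apart from an Error 500 page.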
deployment-search01:~$ curl -x localhost:8123 http://localhost/search/enwiki/Main
curl: (7) couldn't connect to host
deployment-search01:~$
I have restarted lucene-search2 there.
Search is working again. What is troublesome is that lucene-search2 should be restarted by puppet automatically whenever it dies. I am leaving this bug open to monitor it a bit more.
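Until the automatic restart works, the manual probe-and-restart done in this bug could be scripted. This is only a sketch of a cron-style watchdog, not the puppet configuration: the function name ensure_running is hypothetical, and the service name in the usage comment is an assumption (the thread refers to it as both lucene-search2 and lucene-search-2).

```shell
# Hypothetical watchdog: run a probe command; if it fails, run a restart command.
# Both arguments are command strings executed with sh -c.
# Prints "up" when the probe succeeds, "restarted" after a restart was issued.
ensure_running() {
    probe_cmd=$1
    restart_cmd=$2
    if sh -c "$probe_cmd" >/dev/null 2>&1; then
        echo "up"
    else
        sh -c "$restart_cmd" >/dev/null 2>&1
        echo "restarted"
    fi
}

# Possible use on the search instance (not executed here; service name assumed):
#   ensure_running \
#     "curl -sf -x localhost:8123 http://localhost/search/enwiki/Main" \
#     "service lucene-search-2 restart"
```

The probe reuses the curl command from the previous comment, so a refused connection triggers the restart.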
The lucene process is probably being killed by the OOM killer. We need to tweak the java -Xmx parameter to limit the amount of memory being used.
Both en.m.wikipedia.beta.wmflabs.org and en.m.wikipedia.org have a San Francisco article:
http://en.m.wikipedia.beta.wmflabs.org/wiki/San_Francisco
http://en.m.wikipedia.org/wiki/San_Francisco
At en.m.wikipedia.org, when you enter "San" in the search box, several search suggestions appear (wikipedia.png attachment). No search suggestions appear when the same is done at en.m.wikipedia.beta.wmflabs.org (wmflabs.png attachment).
Created attachment 12054 [details] wikipedia screenshot
Created attachment 12055 [details] wmflabs screenshot
Passing en.m.wikipedia.org Jenkins and Sauce Labs jobs:
https://wmf.ci.cloudbees.com/job/_debug-MobileFrontend-template/6/
https://saucelabs.com/tests/ba82274d4aaf4449919ff77c17292969
Failing en.m.wikipedia.beta.wmflabs.org Jenkins and Sauce Labs jobs:
https://wmf.ci.cloudbees.com/job/_debug-MobileFrontend-template/7/
http://saucelabs.com/jobs/a1b31beeaa2749d88335f2b94307bef8
Rewording the summary. The root cause is the java process asking for 20GB of memory on an instance that has only 4GB. I have hacked the script locally to limit memory to 2GB. I will see how well that goes, then hack the puppet class and init.d script to let us easily tweak the memory settings for lucene.
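One way to avoid hard-coding a heap size larger than the instance is to derive it from the memory actually present. The sketch below is an illustration only: the function name heap_flag_for is hypothetical, and the choice of half the total memory is an assumption that simply mirrors the 2GB-on-4GB setting described above.

```shell
# Hypothetical helper: given total memory in kB, print a java -Xmx flag
# capped at half of it (integer arithmetic, result in megabytes).
heap_flag_for() {
    total_kb=$1
    heap_mb=$(( total_kb / 1024 / 2 ))
    echo "-Xmx${heap_mb}m"
}

# Possible use on a Linux instance (not executed here):
#   heap_flag_for "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

For an instance reporting exactly 4GB (4194304 kB), this prints -Xmx2048m.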
Command running right now is: /usr/bin/java -Xmx2000m -Dsun.rmi.transport.tcp.handshakeTimeout=10000 -Djava.rmi.server.codebase=file:///a/search/lucene-search/LuceneSearch.jar -Djava.rmi.server.hostname=deployment-search01 -classpath :/usr/share/java/udp2log-log4j.jar:/a/search/lucene-search/LuceneSearch.jar org.wikimedia.lsearch.config.StartupManager :)
Taking bug, raising priority. I need to fix that this week.
The deployment-search01 Icinga report is:
http://icinga.wmflabs.org/cgi-bin/icinga/extinfo.cgi?type=2&host=deployment-search01.pmtpa.wmflabs&service=Lucene+frontend
I have restarted the lucene-search-2 service, which was apparently no longer listening, although there has been no OOM message :-] So we have some progress!
Patches are:
https://gerrit.wikimedia.org/r/#/c/59995/ convert java opts to shell variables
https://gerrit.wikimedia.org/r/#/c/59996/ conf file for lucene.jobs.sh
https://gerrit.wikimedia.org/r/#/c/60000/ enable conf file loading
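The general pattern behind these three patches (defaults held in shell variables, overridable from a conf file) can be sketched as follows. This is not the content of the Gerrit changes: the variable names LUCENE_JAVA_HEAP and LUCENE_JAVA_EXTRA, the function name build_java_cmd, and the conf path in the usage comment are all hypothetical.

```shell
# Hypothetical sketch: build the java command line from shell variables,
# letting an optional conf file override the defaults.
# The body runs in a subshell "( ... )" so the variables do not leak to the caller.
build_java_cmd() (
    conf_file=$1

    # Defaults, used when the conf file is missing or does not set them.
    LUCENE_JAVA_HEAP="-Xmx2000m"
    LUCENE_JAVA_EXTRA=""

    # Source the conf file if it is readable. Pass a path containing a slash,
    # otherwise "." searches PATH for the file.
    if [ -r "$conf_file" ]; then
        . "$conf_file"
    fi

    cmd="/usr/bin/java $LUCENE_JAVA_HEAP"
    if [ -n "$LUCENE_JAVA_EXTRA" ]; then
        cmd="$cmd $LUCENE_JAVA_EXTRA"
    fi
    echo "$cmd"
)

# Possible use (conf path is an assumption, not executed here):
#   build_java_cmd /etc/lucene-search.conf
```

With this layout, changing the heap size on a given instance means editing one line of the conf file rather than patching the init script.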
Pending ops review; updating summary to reflect that.
Peter has merged the changes and deployed them in production. I have to make sure that works fine in labs and will most probably recreate the existing instances. Most of the work has been completed, thus lowering priority.
Chad and Nik have migrated beta to the CirrusSearch extension, which uses an ElasticSearch backend. Hence this Lucene search bug is no longer valid :-)