Last modified: 2013-08-01 08:50:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T48459, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 46459 - [OPS] lucene-search-2 uses too much memory on labs
Status: RESOLVED WORKSFORME
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Antoine "hashar" Musso (WMF)
Keywords: ops
Depends on:
Blocks: 34250
Reported: 2013-03-22 18:17 UTC by Chris McMahon
Modified: 2013-08-01 08:50 UTC (History)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
search in UI returns nothing (64.14 KB, image/png)
2013-03-22 18:17 UTC, Chris McMahon
wikipedia screenshot (91.99 KB, image/png)
2013-04-08 18:07 UTC, Željko Filipin
wmflabs screenshot (116.30 KB, image/png)
2013-04-08 18:08 UTC, Željko Filipin

Description Chris McMahon 2013-03-22 18:17:03 UTC
Created attachment 11973 [details]
search in UI returns nothing

According to the tracking bug, querying Search via curl works, but Search in the UI does not; see the attached screenshot.
Comment 1 Antoine "hashar" Musso (WMF) 2013-03-22 20:46:45 UTC
For the last couple of days the PHP entry points were returning Error 500 because the Thanks extension was not in mediawiki/extensions.git (that is fixed now).  Lucene search polls all the wikis via the OAI extension, which was certainly serving Error 500 pages as well, and that might have broken the search system.

The two search instances use puppetmaster::self, so their puppet configuration has to be updated manually.  I updated them a few hours ago.

Doing a search does not work right now:

http://en.wikipedia.beta.wmflabs.org/w/api.php?format=json&action=opensearch&search=F&namespace=0&suggest=

Gives out:

["F",[]]

Need to investigate the PHP error logs and look at the search box logs.
Comment 2 Antoine "hashar" Musso (WMF) 2013-03-22 21:03:29 UTC
deployment-search01:~$ curl -x localhost:8123 http://localhost/search/enwiki/Main
curl: (7) couldn't connect to host
deployment-search01:~$

I have restarted lucene-search2 there
Comment 3 Antoine "hashar" Musso (WMF) 2013-03-22 21:10:57 UTC
Search is working again.  What is troublesome is that lucene-search2 should be restarted by puppet automatically whenever it dies.  I am leaving this bug open to monitor it a bit more.
Comment 4 Antoine "hashar" Musso (WMF) 2013-03-23 12:52:56 UTC
The lucene process is probably being killed by the kernel OOM killer. We need to tweak the java -Xmx parameter to limit the amount of memory being used.
Comment 5 Željko Filipin 2013-04-08 18:06:21 UTC
Both en.m.wikipedia.beta.wmflabs.org and en.m.wikipedia.org have a San Francisco article:

http://en.m.wikipedia.beta.wmflabs.org/wiki/San_Francisco
http://en.m.wikipedia.org/wiki/San_Francisco

At en.m.wikipedia.org, when you enter "San" in the search box, several search suggestions appear (wikipedia.png attachment). No search suggestions appear when the same is done at en.m.wikipedia.beta.wmflabs.org (wmflabs.png attachment).
Comment 6 Željko Filipin 2013-04-08 18:07:39 UTC
Created attachment 12054 [details]
wikipedia screenshot
Comment 7 Željko Filipin 2013-04-08 18:08:05 UTC
Created attachment 12055 [details]
wmflabs screenshot
Comment 9 Antoine "hashar" Musso (WMF) 2013-04-08 20:50:11 UTC
Rewording the summary. The root cause is the java process requesting 20GB of memory on an instance that has only 4GB.

I have hacked the script locally to limit memory to 2GB.  Will see how well it goes then hack the puppet class and init.d script to let us easily tweak the memory settings for lucene.
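
As a rough illustration of the mismatch described above, the sketch below compares a proposed heap cap against the memory available on the instance. The function names `heap_fits` and `total_mb` are hypothetical helpers invented for this example and are not part of lucene-search-2; reading `/proc/meminfo` assumes a Linux host.

```shell
# heap_fits XMX_MB TOTAL_MB
# Succeeds only if the proposed heap cap (in MB) is strictly below the
# total memory of the machine (in MB). Hypothetical helper.
heap_fits() {
    [ "$1" -lt "$2" ]
}

# total_mb
# Prints total physical memory in MB, read from /proc/meminfo (Linux only).
total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

For example, `heap_fits 2000 "$(total_mb)" || echo "heap cap exceeds RAM"` would flag a 20GB request on a 4GB instance but accept the 2GB cap. A real check would also leave headroom for the operating system and other processes.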
Comment 10 Antoine "hashar" Musso (WMF) 2013-04-08 20:51:33 UTC
Command running right now is:

/usr/bin/java -Xmx2000m -Dsun.rmi.transport.tcp.handshakeTimeout=10000 -Djava.rmi.server.codebase=file:///a/search/lucene-search/LuceneSearch.jar -Djava.rmi.server.hostname=deployment-search01 -classpath :/usr/share/java/udp2log-log4j.jar:/a/search/lucene-search/LuceneSearch.jar org.wikimedia.lsearch.config.StartupManager

:)
Comment 11 Antoine "hashar" Musso (WMF) 2013-04-08 20:52:08 UTC
Taking bug, raising priority. I need to fix that this week.
Comment 12 Antoine "hashar" Musso (WMF) 2013-04-10 06:42:00 UTC
The deployment-search01 Icinga report is http://icinga.wmflabs.org/cgi-bin/icinga/extinfo.cgi?type=2&host=deployment-search01.pmtpa.wmflabs&service=Lucene+frontend

I have restarted the lucene-search-2 service, which was apparently no longer listening, although there has been no OOM message :-] So we have some progress!
Comment 13 Antoine "hashar" Musso (WMF) 2013-04-20 13:13:40 UTC
Patches are:

https://gerrit.wikimedia.org/r/#/c/59995/ convert java opts to shell variables
https://gerrit.wikimedia.org/r/#/c/59996/ conf file for lucene.jobs.sh 
https://gerrit.wikimedia.org/r/#/c/60000/ enable conf file loading
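
The general pattern these patch titles describe (Java options held in shell variables, with an optional configuration file that can override them) might look roughly like the sketch below. This is not the content of the Gerrit changes: the variable name `LUCENE_JAVA_XMX`, the function names, and the default conf path are all assumptions made for illustration.

```shell
# Default heap cap; the variable name is hypothetical.
LUCENE_JAVA_XMX="2000m"

# load_lucene_conf PATH
# Sources PATH if it is readable, letting it override the defaults above.
load_lucene_conf() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}

# lucene_java_opts
# Prints the JVM memory option built from the current variable value.
lucene_java_opts() {
    printf '%s\n' "-Xmx${LUCENE_JAVA_XMX}"
}

# The conf path below is an assumed location, not the real one.
load_lucene_conf "${LUCENE_CONF:-/etc/default/lucene-search-2.example}"
lucene_java_opts
```

With this layout, an instance with little memory only needs a one-line conf file such as `LUCENE_JAVA_XMX="512m"`, and the startup script itself stays identical across hosts.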
Comment 14 Antoine "hashar" Musso (WMF) 2013-04-24 11:26:49 UTC
Pending ops review; updating the summary to reflect that.
Comment 15 Antoine "hashar" Musso (WMF) 2013-05-13 20:12:41 UTC
Peter has merged the changes and deployed them in production. I have to make sure they work fine in labs and will most probably recreate the existing instances.

Most of the work has been completed, thus lowering priority.
Comment 16 Antoine "hashar" Musso (WMF) 2013-08-01 08:50:02 UTC
Chad and Nik have migrated beta to the CirrusSearch extension, which uses an
ElasticSearch backend.  Hence this Lucene search bug is no longer valid :-)
