Last modified: 2013-09-26 15:06:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T54906, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52906 - User interface elements should not show up in CirrusSearch search result excerpts
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Assigned To: Nik Everett
Reported: 2013-08-16 00:22 UTC by Sumana Harihareswara
Modified: 2013-09-26 15:06 UTC (History)


Attachments
"[edit | edit source]" in the search results snippet (22.38 KB, image/png)
2013-08-16 00:41 UTC, Sumana Harihareswara

Description Sumana Harihareswara 2013-08-16 00:22:52 UTC
1. Search test2wiki for "environment": https://test2.wikipedia.org/w/index.php?search=environment&title=Special%3ASearch

2. Notice that the first result has, as the text snippet:

Geography
making maps Countries of the world Natural environment[edit | edit source] Climate Soil Rivers Rocks

3. Click through to https://test2.wikipedia.org/wiki/Geography and notice that "[edit | edit source]" is not in the real text of the article.


I think CirrusSearch should not be displaying "[edit | edit source]" in the text snippets in the search results.
Comment 1 Sumana Harihareswara 2013-08-16 00:40:43 UTC
Another repro case, slightly different:

Search for "Valiant" https://test2.wikipedia.org/w/index.php?search=valiant&title=Special%3ASearch and you'll see the result "Blooper", with the text excerpt being:

"A Bug's Life, Toy Story 2, Monsters, Inc., and Valiant. Contents 1 The "blooper" in pop culture 1"

The words after "Valiant" are part of the table of contents of the page.
Comment 2 Sumana Harihareswara 2013-08-16 00:41:12 UTC
Created attachment 13104
"[edit | edit source]" in the search results snippet
Comment 3 Nik Everett 2013-08-16 01:25:21 UTC
Try this one:  https://test2.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=edit+source&fulltext=Search

The action item here: remove the edit links and any other automatically added text from the page before dropping it into the search backend. Also, remove the table of contents if possible. I'm pretty sure the edit links and their ilk are super high priority but I'm not sure of the priority on the table of contents.
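
(Editorial illustration, not part of the original comment.) The action item above can be sketched roughly as follows: walk the rendered HTML and drop the text of any element that is generated interface chrome before the text is indexed. This is not the code from the Gerrit change and not how CirrusSearch does it. The helper name `extract_search_text`, the marker sets `SKIP_CLASSES` and `SKIP_IDS`, and the choice of the `mw-editsection` class and `toc` id as markers are all assumptions made for illustration, and the sketch assumes well-formed markup.

```python
from html.parser import HTMLParser

# Assumed markers for generated UI elements (illustrative, not taken from the bug).
SKIP_CLASSES = {"mw-editsection", "toc"}
SKIP_IDS = {"toc"}
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SearchTextExtractor(HTMLParser):
    """Collects page text, skipping everything inside marked UI elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            # Already inside a skipped element: just track nesting.
            self._skip_depth += 1
            return
        attr = dict(attrs)
        classes = set((attr.get("class") or "").split())
        if classes & SKIP_CLASSES or attr.get("id") in SKIP_IDS:
            self._skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join text nodes and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_search_text(html):
    """Hypothetical helper: return the text that would be sent to the index."""
    parser = SearchTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Other generated text, such as a "JavaScript disabled" notice, would need its own marker added to the sets; which markers are appropriate depends on the markup the wiki actually emits.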
Comment 4 Sumana Harihareswara 2013-08-16 23:55:39 UTC
https://test2.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=video+sorry&fulltext=Search gets me a link to https://test2.wikipedia.org/wiki/Birch_beer that includes the excerpt:

a heart as big as a whale. Also: enjoy this video! Sorry, your browser either has JavaScript disabled

So that's one more automatically added bit of text to remove from the search corpus.
Comment 5 Nik Everett 2013-08-20 14:13:11 UTC
I've pushed a fix to gerrit:  https://gerrit.wikimedia.org/r/#/c/80018/

I'll set this bug to PATCH_TO_REVIEW once I push some regression tests to review as well.
Comment 6 Nik Everett 2013-08-20 14:40:58 UTC
Tests: https://gerrit.wikimedia.org/r/#/c/80021/

I forgot to include the bug number in the commit messages but these links should help.
Comment 7 MZMcBride 2013-08-20 16:28:54 UTC
Tweaked the summary a bit. Older summary included: "[edit | edit source]", ToC text, & "JS disabled" warning. Genericized this to user interface elements and clarified that this is a CirrusSearch-specific issue.
Comment 8 Gerrit Notification Bot 2013-08-21 11:43:29 UTC
Change 80018 had a related patch set uploaded by TTO:
Remove parts of rendered page from search.

https://gerrit.wikimedia.org/r/80018
Comment 9 Gerrit Notification Bot 2013-08-23 14:31:21 UTC
Change 80018 merged by jenkins-bot:
Remove parts of rendered page from search.

https://gerrit.wikimedia.org/r/80018
Comment 10 Nik Everett 2013-09-26 15:06:42 UTC
Live and working.
