Last modified: 2014-09-23 23:30:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T17218, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15218 - LinkSearch results should use as much of the path as is provided, not simply search by domain
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: unspecified
Hardware: All All
Importance: Low normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch, patch-reviewed
Duplicates: 32671
Depends on:
Blocks:
Reported: 2008-08-17 20:23 UTC by Mike.lifeguard
Modified: 2014-09-23 23:30 UTC
CC List: 7 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
linksearch documentation enhancements (674 bytes, patch)
2009-02-23 18:46 UTC, Dan Jacobson
improving MessagesEn.php documentation for linksearch (798 bytes, patch)
2011-11-10 05:49 UTC, Sumana Harihareswara

Description Mike.lifeguard 2008-08-17 20:23:35 UTC
So you have a wildcard on the front, which is great.

But on the other end, if you have some path after the domain name, it is ignored: [[Special:Linksearch/*.linkedin.com/in/nguyenta]] is equivalent to [[Special:Linksearch/*.linkedin.com]], but shouldn't be. It should list only the links which follow the /in/nguyenta path (like Special:Prefixindex).
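
The requested behaviour can be sketched as follows. This is only an illustration of the matching rule being asked for, not MediaWiki code: the function names `parse_search_target` and `link_matches` are hypothetical, and the sketch assumes a target of the form `[*.]domain[/path]` with no protocol.

```python
from urllib.parse import urlsplit


def parse_search_target(target: str) -> tuple:
    """Split a target such as '*.linkedin.com/in/nguyenta' into its parts."""
    host, slash, rest = target.partition("/")
    wildcard = host.startswith("*.")
    if wildcard:
        host = host[2:]  # drop the leading '*.'
    return wildcard, host.lower(), slash + rest


def link_matches(url: str, target: str) -> bool:
    """Return True if url is on the target domain and under the target path."""
    wildcard, domain, path = parse_search_target(target)
    parts = urlsplit(url)
    host = parts.hostname or ""
    on_domain = host == domain or (wildcard and host.endswith("." + domain))
    # An empty path matches every URL on the domain; a non-empty path
    # restricts results to that prefix, as Special:Prefixindex does for titles.
    return on_domain and parts.path.startswith(path)


links = [
    "http://www.linkedin.com/in/nguyenta",
    "http://www.linkedin.com/pub/someone-else",
]
print([u for u in links if link_matches(u, "*.linkedin.com")])
print([u for u in links if link_matches(u, "*.linkedin.com/in/nguyenta")])
```

With the path honoured, the second search would list only the first link, whereas the domain-only search lists both.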
Comment 1 Brion Vibber 2008-08-20 17:44:56 UTC
Currently we can't cleanly do both a subdomain wildcard *and* a path prefix, because of the way the index is done.

A URL like this:

  http://sub.example.com/path/file.html

gets transformed to this indexable form:

  http://com.example.sub./path/file.html
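
A minimal sketch of that transformation, assuming plain `scheme://host/path` URLs. The function name `make_url_index` is hypothetical, and ports, user info and fragments are ignored for brevity, so this is not necessarily how MediaWiki itself builds the index.

```python
from urllib.parse import urlsplit


def make_url_index(url: str) -> str:
    """Reverse the host labels so that subdomains end up at the end of the host."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    labels = parts.hostname.split(".")
    # 'sub.example.com' -> 'com.example.sub.' (note the trailing dot)
    reversed_host = ".".join(reversed(labels)) + "."
    index = f"{parts.scheme}://{reversed_host}{parts.path or '/'}"
    if parts.query:
        index += "?" + parts.query
    return index


print(make_url_index("http://sub.example.com/path/file.html"))
# http://com.example.sub./path/file.html
```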

This indexable form lets us do a subdomain-wildcard search:

  LIKE 'http://com.example.%'

or a path prefix search:

  LIKE 'http://com.example.sub./path/%'

Both of these are a straight prefix match, which is very efficiently indexed.
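
The two prefix searches can be tried out with a small sketch like the one below. The helper names and the `links` table are hypothetical, and SQLite is used here only to check which rows each pattern matches; it says nothing about how MySQL uses its index.

```python
import sqlite3


def like_escape(text: str) -> str:
    """Escape LIKE metacharacters so the input is matched literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def reversed_host(host: str) -> str:
    """'sub.example.com' -> 'com.example.sub.'"""
    return ".".join(reversed(host.lower().split("."))) + "."


def subdomain_pattern(scheme: str, domain: str) -> str:
    """Pattern for every host at or below the given domain."""
    return like_escape(f"{scheme}://{reversed_host(domain)}") + "%"


def path_prefix_pattern(scheme: str, host: str, path: str) -> str:
    """Pattern for one exact host plus a path prefix."""
    return like_escape(f"{scheme}://{reversed_host(host)}{path}") + "%"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE links (el_index TEXT)")
db.execute("CREATE INDEX links_el_index ON links (el_index)")
db.executemany("INSERT INTO links VALUES (?)", [
    ("http://com.example.sub./path/file.html",),
    ("http://com.example.www./index.html",),
    ("http://org.example.sub./path/file.html",),
])


def search(pattern: str) -> list:
    """Return the indexed URLs matching a LIKE pattern, in sorted order."""
    rows = db.execute(
        "SELECT el_index FROM links WHERE el_index LIKE ? ESCAPE '\\' ORDER BY el_index",
        (pattern,),
    )
    return [row[0] for row in rows]


print(subdomain_pattern("http", "example.com"))  # http://com.example.%
print(search(subdomain_pattern("http", "example.com")))
print(search(path_prefix_pattern("http", "sub.example.com", "/path/")))
```

In both cases the only wildcard is the trailing `%`, which is what makes the pattern a straight prefix match.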

But if we wanted to search both a subdomain wildcard *and* a path, our query would look like this:

  LIKE 'http://com.example.%/path/%'

This can get us some efficient lookups for the first wildcard, but then the second wildcard has to be matched within the results, which is potentially very slow depending on the number of matches. For instance, assuming there are lots of links to other Wikipedia pages, a search like this:

  LIKE 'http://org.wikipedia.%/w/api.php%'

would end up being very inefficient, since we wouldn't get the indexing boost on the path part.
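
The cost can be illustrated with a rough model in which a sorted list stands in for the index. This is a sketch of the reasoning above, not of MySQL's actual query execution; the function names and sample URLs are made up for the example.

```python
from bisect import bisect_left


def prefix_range(sorted_keys: list, prefix: str) -> list:
    """Find all keys starting with prefix by binary search, like an index range scan."""
    start = bisect_left(sorted_keys, prefix)
    end = bisect_left(sorted_keys, prefix + "\U0010ffff")  # just past the prefix
    return sorted_keys[start:end]


def wildcard_and_path(sorted_keys: list, domain_prefix: str, path_part: str) -> tuple:
    """Emulate LIKE '<domain_prefix>%<path_part>%' and count the rows examined."""
    candidates = prefix_range(sorted_keys, domain_prefix)  # uses the ordering
    # The second wildcard cannot use the ordering: every candidate is inspected.
    matches = [k for k in candidates if path_part in k[len(domain_prefix):]]
    return matches, len(candidates)


keys = sorted([
    "http://com.example.www./w/api.php",
    "http://org.wikimedia.commons./wiki/File:X.png",
    "http://org.wikipedia.de./wiki/Foo",
    "http://org.wikipedia.en./w/api.php?action=query",
    "http://org.wikipedia.en./wiki/Bar",
    "http://org.wikipedia.fr./w/api.php",
])
matches, examined = wildcard_and_path(keys, "http://org.wikipedia.", "/w/api.php")
print(matches)
print(f"examined {examined} rows for {len(matches)} matches")
```

In this model the number of rows examined grows with the number of links to the whole domain, not with the number of links that actually match the path.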

Now in theory at least a good query optimizer might be able to speed that up a lot, but my impression is that MySQL isn't that smart about it right now and would just not bother touching the indexes for the second wildcard.
Comment 2 Mike.lifeguard 2008-08-20 18:10:07 UTC
So if you specify the prefix you can search within the path as well? That I did not know, and will make life much easier. Still, it'd be nice to have both I guess.
Comment 3 Raimond Spekking 2008-09-26 07:06:57 UTC
Extension is now part of MediaWiki core (1.14alpha) -> changing product and component
Comment 4 Brion Vibber 2008-12-19 02:41:32 UTC
De-assigning this from me. Would be nice, but don't know a good clean way to index it in straight MySQL.
Comment 5 Dan Jacobson 2009-02-23 18:46:25 UTC
Created attachment 5850
linksearch documentation enhancements
Comment 6 Dan Jacobson 2009-02-23 18:48:26 UTC
Darn, if one makes an attachment, the comments that one types into this box here are blown away.

I was trying to say:
Why not just let * by itself work? I want to find all links on my
small wiki, so why force me to use one query for each possible TLD?!

Then part of my attachment wouldn't be needed.
Comment 7 p858snake 2011-04-30 00:08:58 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 8 Sumana Harihareswara 2011-11-10 05:49:53 UTC
Created attachment 9406
improving MessagesEn.php documentation for linksearch

This is just jidanni's patch, but rebased against current trunk.

-'linksearch-text'  => 'Wildcards such as "*.wikipedia.org" may be used.<br />
-Supported protocols: <tt>$1</tt>',
+'linksearch-text'  => 'Wildcards such as "*.wikipedia.org" may be used. Need at least a TLD, e.g., *.org<br />
+Supported protocols: <tt>$1</tt> (but don\'t enter them below!)',
Comment 9 Mark A. Hershberger 2011-11-16 21:45:49 UTC
r103386
Comment 10 Jesús Martínez Novo (Ciencia Al Poder) 2014-04-17 16:23:36 UTC
*** Bug 32671 has been marked as a duplicate of this bug. ***
