Last modified: 2014-09-23 23:30:51 UTC
So you have a wildcard on the front, which is great. But on the other end, if you have some path after the domain name, it is ignored: [[Special:Linksearch/*.linkedin.com/in/nguyenta]] is equivalent to [[Special:Linksearch/*.linkedin.com]], but shouldn't be. It should list only the links which follow the /in/nguyenta path (like Special:Prefixindex).
Currently we can't cleanly do both a subdomain wildcard *and* a path prefix, because of the way the index is done. A URL like this:
  http://sub.example.com/path/file.html
gets transformed to this indexable form:
  http://com.example.sub./path/file.html
This lets us do a subdomain-wildcard search:
  LIKE 'http://com.example.%'
or a path prefix search:
  LIKE 'http://com.example.sub./path/%'
Both of these are a straight prefix match, which is very efficiently indexed. But if we wanted to search both a subdomain wildcard *and* a path, our query would look like this:
  LIKE 'http://com.example.%/path/%'
This can get us some efficient lookups for the first wildcard, but then the second wildcard has to be matched within the results, potentially very slow depending on the number of matches. For instance assuming there are lots of links to other Wikipedia pages, a search like this:
  LIKE 'http://org.wikipedia.%/w/api.php%'
would end up being very inefficient, since we wouldn't get the indexing boost on the path part. Now in theory at least a good query optimizer might be able to speed that up a lot, but my impression is that MySQL isn't that smart about it right now and would just not bother touching the indexes for the second wildcard.
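To make the transformation above concrete, here is a minimal Python sketch of the reversed-host index form and the LIKE patterns it produces. This is not MediaWiki's actual code: the function names `to_index_form` and `like_pattern` are hypothetical, and the sketch assumes plain `http` URLs with no port or user info.

```python
from urllib.parse import urlsplit


def to_index_form(url: str) -> str:
    """Reverse the host labels and append a trailing dot (hypothetical helper)."""
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split("."))) + "."
    rest = parts.path
    if parts.query:
        rest += "?" + parts.query
    return f"{parts.scheme}://{reversed_host}{rest}"


def like_pattern(search: str, scheme: str = "http") -> str:
    """Build a LIKE pattern from a search such as '*.example.com/path/'.

    Assumption: a leading '*.' is the only wildcard the user can type.
    """
    host, _, path = search.partition("/")
    wildcard = host.startswith("*.")
    if wildcard:
        host = host[2:]
    reversed_host = ".".join(reversed(host.split("."))) + "."
    if wildcard:
        # Subdomain wildcard: a pure prefix match only if no path follows.
        pattern = f"{scheme}://{reversed_host}%"
        if path:
            # Second wildcard: this part cannot use the prefix index.
            pattern += f"/{path}%"
        return pattern
    # No subdomain wildcard: host and path form one straight prefix.
    return f"{scheme}://{reversed_host}/{path}%"


print(to_index_form("http://sub.example.com/path/file.html"))
print(like_pattern("*.example.com"))
print(like_pattern("sub.example.com/path/"))
print(like_pattern("*.wikipedia.org/w/api.php"))
```

Only the last pattern contains a `%` in the middle, which is the case described above where the path part no longer benefits from the prefix index.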
So if you specify the prefix you can search within the path as well? That I did not know, and will make life much easier. Still, it'd be nice to have both I guess.
Extension is now part of MediaWiki core (1.14alpha) -> changing product and component
De-assigning this from me. Would be nice, but don't know a good clean way to index it in straight MySQL.
Created attachment 5850 [details] linksearch documentation enhancements
Darn, if one makes an attachment, the comments that one types into this box here are blown away. I was trying to say: why not just let * by itself work? I want to find all links on my small wiki, so why force me to use one query for each possible TLD?! Then part of my attachment wouldn't be needed.
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Created attachment 9406 [details] improving MessagesEn.php documentation for linksearch
This is just jidanni's patch, but rebased against current trunk.
-'linksearch-text' => 'Wildcards such as "*.wikipedia.org" may be used.<br />
-Supported protocols: <tt>$1</tt>',
+'linksearch-text' => 'Wildcards such as "*.wikipedia.org" may be used. Need at least a TLD, e.g., *.org<br />
+Supported protocols: <tt>$1</tt> (but don\'t enter them below!)',
r103386
*** Bug 32671 has been marked as a duplicate of this bug. ***