Last modified: 2014-11-19 10:24:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T12808, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 10808 - Introduce Special:Suffixindex
Introduce Special:Suffixindex
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Special pages (Other open bugs)
unspecified
All All
: Low enhancement with 4 votes (vote)
: ---
Assigned To: Nobody - You can work on this!
http://en.wikipedia.org/wiki/Wikipedi...
: schema-change
: 2168 11395 21864 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-08-05 22:07 UTC by Eep²
Modified: 2014-11-19 10:24 UTC (History)
14 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Eep² 2007-08-05 22:07:19 UTC
Is this possible? Some requests/discussions about it (and the problems caused by NOT having it):

http://meta.wikimedia.org/wiki/MediaWiki_Ideal#Suffixindex
http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_%28disambiguation_pages%29/Archive_29#All_pages_beginning_with...

Of course, wildcard search (such as implemented in http://www.mediawiki.org/wiki/Extension:Wildcard_search ) would probably work too, but it would be nice if MediaWiki had that functionality by default.
Comment 1 Brion Vibber 2007-08-08 17:40:10 UTC
No DB support for doing this efficiently.
Comment 2 Eep² 2007-08-08 17:42:41 UTC
Well, how about changing the DB then? :)
Comment 3 Eep² 2007-09-04 11:22:06 UTC
Using Dynamic Page List (http://www.mediawiki.org/wiki/Extension:DynamicPageList ), a simple single-line query can do this (and MUCH more): {{#dpl:namespace=|titlematch=%{{PAGENAME}}}}

Why can't MediaWiki do this by default? Doesn't seem like it would be that hard to do...
Comment 4 Brion Vibber 2007-09-04 19:12:48 UTC
Because it's very expensive, requiring a scan of up to several million page entries. It's only efficient on a very small wiki.
Comment 5 Eep² 2007-09-04 21:02:33 UTC
I just don't get how a prefix index can somehow be magically OK yet a suffix index is soo expensive? Limit the results to x per query or something if it's THAT expensive. There are most definitely (and certainly) ways of limiting database queries so they're not so "expensive"...
Comment 6 Brion Vibber 2007-09-05 14:26:35 UTC
A prefix index is just an index. Because of the way indexes work in the database, doing an exact-prefix match comes "for free".

We didn't have to do *anything* to get the prefix index. It was already there.

Limiting results would not have any useful effect -- to be efficient, you have to limit the amount of data you search through. And that requires having a sensible index for it.


To do a _suffix_ index, we'd have to:

1) Create a second indexed field with a *reversed* copy of the content

2) Populate that field throughout the entire database -- an expensive one-time setup operation which will require a few hours' work, careful coordination, and some downtime for the site.

3) Keep that field around forever.

While it's possible to do it, there's not a lot of interest in it and it's a minor feature at best for a big change, so you'll probably not see people tripping over themselves to implement it right away.
Comment 7 Eep² 2007-09-07 12:20:25 UTC
On the way to a proper and decent search engine then?
Comment 8 Rob Church 2007-09-19 11:55:46 UTC
*** Bug 11395 has been marked as a duplicate of this bug. ***
Comment 9 MZMcBride 2009-04-20 01:35:52 UTC
Re-opening this. There are legitimate use cases for this (like Wiktionary, cf. bug 17999). Nothing's really holding this up except a developer willing to implement it (i.e., no reason to leave this as LATER).
Comment 10 百楽兎 2009-12-16 11:19:48 UTC
*** Bug 21864 has been marked as a duplicate of this bug. *** 

Comment 11 百楽兎 2009-12-16 11:20:46 UTC
*** Bug 21864 has been marked as a duplicate of this bug. ***
Comment 12 Dan Jacobson 2010-11-20 08:50:37 UTC
From an outsider's point of view it looks like an oversight to only have [[Special:PrefixIndex]].

On it you could put mention the reason for no complimentary [[Special:SuffixIndex]] there for people to see...

Tons of times when I wanted to use it...

"Let's find all the "... Agency"s on my wiki", etc.
Comment 13 Casey Brown 2010-11-20 23:41:50 UTC
(In reply to comment #12)
> "Let's find all the "... Agency"s on my wiki", etc.

...or "pages in Chinese (PAGE/zh)". :-)
Comment 14 MZMcBride 2011-05-17 18:50:46 UTC
I wonder if the page_props table could be used for this. The current schema has the following unique index:

CREATE UNIQUE INDEX /*i*/pp_page_propname ON /*_*/page_props (pp_page,pp_propname);

If a second index were added on (pp_propname, pp_value), you could store the reversed page title with a "suffixindex" key fairly easily, couldn't you? Except that pp_value is a blob... bleh.
Comment 15 Nemo 2011-07-14 11:19:53 UTC
Because someone (like me) could look for this feature here, I leave here a link to a Toolserver tool that can be used to get such titles (thank you MZMcBride), for instance http://toolserver.org/~mzmcbride/yanker/?db=enwiktionary_p&list=pages&category=&page=&pattern=.%2Bps%24&namespace=0&limit=5000&sort=&wrapper=%241
Comment 16 Bawolff (Brian Wolff) 2011-09-11 04:02:40 UTC
So say we want a list of All things ending with "ed" - would you want that list in alphabetical order? (If the answer is yes this would be really difficult to do, if no then less difficult). Since if you want it in alphabetical order we'd basically need an index sorting it in both directions since first we jump to everything ending in "ed", and then we need to sort from the other end (normal direction) of the word. Otherwise in our suffix search (for say the suffix "ed") we'd have sorting like "yielded, zigzagged, recovered, edited, amortized" (Note how the third last letter of each word is alphabetically ordered)


I suppose one could create a table that for each page name had a partially reversed page name, and to what point its reversed, for all possible points-

For example for the page Fred (with page_id 1) we'd have:

page_id | page_reversed_point | page_reversed
--------+---------------------+--------------
 1      | 1                   | dFre
 1      | 2                   | edFr
 1      | 3                   | redF
 1      | 4                   | Fred
--------+---------------------+--------------

So each page would have up to 255 entries depending on how long the title is. (because the user could search for a suffix between 1 to 255 letters long). However, having such a table would be INSANE ;)


Thus unless SuffixIndex would be useful with results in non-alphabetical order (which seems unlikely) this should be wontfixed as unfeasible imo.
Comment 17 Nemo 2011-09-11 08:55:23 UTC
(In reply to comment #16)
> Thus unless SuffixIndex would be useful with results in non-alphabetical order
> (which seems unlikely) this should be wontfixed as unfeasible imo.

Alphabetical order doesn't seem necessary at all to me.
Comment 18 MZMcBride 2011-09-11 19:11:08 UTC
(In reply to comment #16)
> Thus unless SuffixIndex would be useful with results in non-alphabetical order
> (which seems unlikely) this should be wontfixed as unfeasible imo.

I don't think it's that infeasible to add a page.page_title_reversed column and index it (possibly in a page_title_reversed,page_title pair).
Comment 19 Bawolff (Brian Wolff) 2011-09-11 19:23:20 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > Thus unless SuffixIndex would be useful with results in non-alphabetical order
> > (which seems unlikely) this should be wontfixed as unfeasible imo.
> 
> I don't think it's that infeasible to add a page.page_title_reversed column and
> index it (possibly in a page_title_reversed,page_title pair).

My point was that would result in a non-alphabeticaly ordered suffix list. If that's ok then it is potentially feasible to do, if a non-alphabetical list is not acceptable, it would not be possible to reasonably do this (As far as I can tell).
Comment 20 Dan Jacobson 2011-09-14 01:38:52 UTC
Just add an SQL "ORDER BY" clause at the end.
Comment 21 Max Semenik 2011-09-14 10:02:39 UTC
(In reply to comment #20)
> Just add an SQL "ORDER BY" clause at the end.

And it will be sorted by reverse title. Not quite useful.
Comment 22 Chong-Dae Park 2012-11-23 02:52:41 UTC
*** Bug 2168 has been marked as a duplicate of this bug. ***
Comment 23 Janus Troelsen 2013-08-27 00:58:45 UTC
There is at least some DB support for doing this efficiently.

In PostgreSQL, since 9.1, there is a built-in function for reversing strings: http://www.postgresql.org/docs/9.2/static/functions-string.html

In 9.0 and earlier, the function can be built in PL/pgSQL (search for it :)

It is also possible to build indexes on expressions at least since 7.4: http://www.postgresql.org/docs/7.4/static/indexes-expressional.html (didn't find any older documentation)
Comment 24 Bawolff (Brian Wolff) 2013-08-27 02:02:39 UTC
(In reply to comment #23)
> There is at least some DB support for doing this efficiently.
> 
> In PostgreSQL, since 9.1, there is a built-in function for reversing strings:
> http://www.postgresql.org/docs/9.2/static/functions-string.html
> 
> In 9.0 and earlier, the function can be built in PL/pgSQL (search for it :)
> 
> It is also possible to build indexes on expressions at least since 7.4:
> http://www.postgresql.org/docs/7.4/static/indexes-expressional.html (didn't
> find any older documentation)

We use mysql (technically mariadb) not postgress
Comment 25 Nemo 2013-08-27 09:25:57 UTC
(In reply to comment #24)
> We use mysql (technically mariadb) not postgress

It would be nice if PostgreSQL support also had some things *more* in MediaWiki, isn't it possible to add a feature depending on the used DB?
Comment 26 Bawolff (Brian Wolff) 2013-08-27 16:46:59 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > We use mysql (technically mariadb) not postgress
> 
> It would be nice if PostgreSQL support also had some things *more* in
> MediaWiki, isn't it possible to add a feature depending on the used DB?

Its possible, but something we don't generally do (in core anyways. Extensions can do what they want).

Typically our db compatibility policy is that it must work on mysql, as mysql is our primary supported db platform.

-----

Adding a fancy index like that in postgress, is probably not that much different from adding an extra field in mysql with the title reversed.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links