Last modified: 2014-09-23 19:54:28 UTC
DidYouMean is designed for the English Wiktionary to automate the use of the {{see}} template there, which links articles whose titles differ only by capitalisation, use of diacritics, spaces, hyphenation, apostrophes, etc. It adds two metadata tables which are maintained by hooks in all places where articles can be created, renamed, or deleted. Metadata is kept only for non-redirects in the main namespace. A list of links to "similar" articles is added to all article pages in view mode and also to the 'nogomatch' and 'noarticletext' pages.
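To make "differ only by capitalisation, diacritics, spaces, hyphenation, apostrophes" concrete, here is a minimal Python sketch of one way such a normalisation key could be computed. It is an illustration only, not the extension's actual code: the function names `normalise_title` and `similar_titles` and the exact set of ignored characters are assumptions.

```python
import unicodedata
from collections import defaultdict

# Characters ignored when comparing titles (an assumed set: space, hyphen,
# underscore, straight and curly apostrophes).
IGNORED = set(" -_'\u2019")


def normalise_title(title: str) -> str:
    """Hypothetical normalisation key for a page title."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop the ignored punctuation and fold case.
    return "".join(ch for ch in base if ch not in IGNORED).casefold()


def similar_titles(titles):
    """Group titles that share a key; keep only groups with more than one member."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalise_title(title)].append(title)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

Titles that share a key are the ones a page would list as "similar"; a real implementation would store the key in a database table rather than rebuild the grouping in memory.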
Created attachment 3075 [details] source for DidYouMean extension
Created attachment 3076 [details] DidYouMean extension diff for mainline code Hooks for 'noarticletext' and SpecialUndelete
Created attachment 3077 [details] DidYouMean diff for the extension itself The code for the extension and its installer
Since en.wiktionary.org (and presumably others) have [Appendix:Names] and all those name entries, were you planning on adding any other name-oriented normalizing to this? Or is SOUNDEX the next phase?
Handling appendices would require parsing whole pages, which is more complex than just parsing the {{see}} template. Soundex turned out to be a lot more promiscuous than I expected. It seemed to take into account only the first part of each word, resulting in enormous lists of matches for each word, many of them not as alike as you'd expect. Metaphone should be better but I couldn't get the library to work in the account you gave me. I'd been thinking about anagrams and textonyms next, but a) they are language-dependent, and b) they require parsing and replacing whole sections of articles, which as often as not are not in any well-defined format. Another idea is to scan all redlinks and possibly blue links, except that they won't have canonical casing and there is no easy way to sort the wheat from the chaff akin to ignoring redirects in article space.
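For illustration, the over-matching described above can be reproduced with the classic four-character American Soundex, sketched below in Python. This is an assumption about the variant: the implementation actually tested for the extension is not stated in this thread and may behave differently. The function name `soundex` is illustrative.

```python
# Letter-to-digit table of classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Classic Soundex: first letter plus three digits, zero-padded."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and unknown letters) have no code
        if code and code != prev:
            digits.append(code)
        prev = code
    # Only the first three digits survive, so long words are cut short.
    return (first + "".join(digits) + "000")[:4]
```

Because the code is truncated after three digits, every word beginning with "inter" (international, interstellar, interpret, ...) maps to the same key, which is consistent with the enormous match lists reported above.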
Well, I meant for the resulting main namespace entries, not taking apart the Appendices themselves.
Please add wikibugs-l@wikipedia.org to the CC list when you assign the bugs.
First impressions are that this is quite a neat little extension and could have great potential use. The "did you mean" message itself needs to be more obtrusive - think coloured boxes - it's almost invisible on a search results page.
Thanks Rob. The idea was that on the English Wiktionary it will just look like what we've already been doing for ages, without all the manual labour. Once it's out there people should modify it to do something bigger on the search page, and maybe not ignore redirects for Wikipedia like it does for Wiktionary.
Created attachment 3144 [details] DidYouMean extension diff for mainline code
* Fixed return value at 'noarticletext'
* Use new hook in SpecialUndelete instead of my own
Created attachment 3145 [details] DidYouMean diff for the extension itself
* Fix broken installer
* Use new SpecialDelete hook instead of my own
Created attachment 3168 [details] extension diff with changes suggested by Brion
* Added table prefix in .sql file
* Added addQuotes and tableName calls to constructed queries
Created attachment 3197 [details] extension diff with changes suggested by Tim Starling
* All functions and variables are now prefixed with wfDym-
* The database lookup is now done inside the parser hook
Created attachment 3198 [details] Fixed extension diff Fixed a regression that slipped in.
Committed the current version to extensions in r19837 to make it a little easier to work with updates while testing.
A few notes on current state of the extension...

Setup:
* Should use update hooks so the table can get installed by standard update.php
* install.php should be replaced with a script that simply allows rebuilding the normalization entries

Caching:
* 'see also' bits embedded into pages won't be automatically updated when the page is already cached. For cache-correctness, it'll need to look up affected pages on addition/removal of normalization entries and schedule them for purges (and, possibly, link refresh)

Internationalization:
* It's hardcoded for particular English templates, which seems a bit icky.

In general I'm not too comfortable with the way it messes about with the text of pages as they're parsed. A totally separate 'similar pages' UI component might be cleaner. *shrug*
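The cache-correctness point can be sketched as follows: whenever a normalization entry is added or removed, every other page sharing the same key is scheduled for a purge. This is a rough in-memory Python model, not MediaWiki code; the class `SimilarTitleIndex` and the `normalise` and `purge` callbacks are hypothetical names standing in for the extension's table and for whatever purge mechanism the wiki provides.

```python
from collections import defaultdict


class SimilarTitleIndex:
    """Hypothetical in-memory stand-in for the normalization table."""

    def __init__(self, normalise, purge):
        self._normalise = normalise      # title -> normalised key
        self._purge = purge              # callback: invalidate one cached page
        self._by_key = defaultdict(set)  # key -> titles sharing that key

    def add(self, title):
        key = self._normalise(title)
        # Pages already sharing the key must now show a link to the new title.
        for other in sorted(self._by_key[key] - {title}):
            self._purge(other)
        self._by_key[key].add(title)

    def remove(self, title):
        key = self._normalise(title)
        self._by_key[key].discard(title)
        # Remaining pages must drop their link to the removed title.
        for other in sorted(self._by_key[key]):
            self._purge(other)
```

For example, with `str.casefold` as the key function, adding "Polish" after "polish" purges "polish", while adding an unrelated title purges nothing. A rename can be modelled as a remove followed by an add.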
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Marking "reviewed" as the extension has been reviewed by Brion in comment 16.
I've removed DidYouMean from https://www.mediawiki.org/wiki/Review_queue until the author responds to comment 16.
Andrew Dunbar: Resetting the assignee and status of this issue because there has been no progress in recent years. Feel free to take it again when you are actually planning to fix this. Thanks.