Last modified: 2014-09-23 22:36:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. Logging in is not possible, and apart from the display of bug reports and their history, links might be broken. See T4336, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 2336 - Automatic links to internal wiki articles based on patterns
Status: REOPENED
Product: MediaWiki extensions
Classification: Unclassified
Component: Extensions requests
Version: unspecified
Hardware: All All
Importance: Low enhancement with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch, patch-reviewed
Duplicates: 4886 7015
Depends on:
Blocks:
Reported: 2005-06-05 18:41 UTC by jediarchives11
Modified: 2014-09-23 22:36 UTC
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
First draft of extension (4.71 KB, text/plain)
2005-07-25 16:18 UTC, Gregory Szorc
Second draft of extension (4.91 KB, text/plain)
2006-01-06 02:29 UTC, jediarchives11
Version 0.3 (4.97 KB, text/plain)
2006-01-07 06:18 UTC, jediarchives11
Version 0.45 (5.61 KB, text/plain)
2007-06-25 17:04 UTC, jediarchives11

Description jediarchives11 2005-06-05 18:41:56 UTC
When a user is editing a page, would there be a way to have the system automatically
check for related articles? For example, a user adds the word "physics" but doesn't link it
because he doesn't know that there is an article about physics. With automatic
linking, the system would check the newly edited part of the article, see whether any
of the words match article names, and change each matching word into a link to that article.
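A minimal sketch of the requested behaviour, written in Python rather than the PHP the extension uses: every whole word that matches an existing article title is wrapped in wiki link syntax. The `autolink` function and the `titles` set (a stand-in for a database lookup) are hypothetical, and the sketch ignores existing links and multi-word titles.

```python
import re

def autolink(text, titles):
    """Wrap words that match an existing article title in [[...]] link syntax.

    `titles` is a hypothetical set of lower-cased article titles; a real
    extension would look these up in the wiki database instead.
    """
    def replace(match):
        word = match.group(0)
        return f"[[{word}]]" if word.lower() in titles else word

    # \b...\b matches whole words only; spaces and punctuation are left untouched.
    return re.sub(r"\b\w+\b", replace, text)

print(autolink("Students study physics at Case.", {"physics"}))
# Students study [[physics]] at Case.
```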
Comment 1 lɛʁi לערי ריינהארט 2005-06-05 20:08:35 UTC
changed subject from "automatic linking" to "automatic wikification"
Comment 2 Ævar Arnfjörð Bjarmason 2005-06-05 20:09:42 UTC
Changed the product to MediaWiki extensions.
Comment 3 Gregory Szorc 2005-07-25 16:18:24 UTC
Created attachment 741 [details]
First draft of extension

First draft of automatic wikification extension.  It needs some work in the
regular expression arena.  It is designed to work with the 1.5 db layout.
Comment 4 jediarchives11 2005-07-26 05:11:55 UTC
Wow. Someone decided to make this extension. Thanks. Unfortunately I'm not that knowledgeable about coding,
but I will try to help as much as I can. I've already read the code and, thanks
to your comments, have been able to understand most of it. I already have a couple of
comments that I hope will help. Does the extension support custom namespaces?
And how many SQL queries will it generate? I know my web host limits the number of
SQL queries per user per hour. Also, if you would like to test on a wiki other than the CWRU
wiki that you run, you can use my wiki, which should be up soon.
Comment 5 Gregory Szorc 2005-07-26 12:58:22 UTC
The good news about this extension is that it only generates 1 SQL query.  The
bad news about the SQL query is that it can be massive, depending on the size of
the article being saved.  However, this giant query only gets executed when a
page is actually saved.
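To illustrate why a single query can become large: one approach (assumed here, not taken from the attachment) collects every distinct word in the saved text and checks them all in one `IN (...)` clause. The sketch uses Python's built-in sqlite3 with a toy `page` table; the `page_namespace`/`page_title` column names follow MediaWiki's page table, but the schema is simplified and `existing_titles` is hypothetical.

```python
import re
import sqlite3

def existing_titles(conn, text):
    """Return the words in `text` that are titles of main-namespace pages.

    One query with an IN (...) clause: the placeholder list grows with the
    number of distinct words in the article, hence the "massive" query.
    """
    # MediaWiki capitalizes the first letter of titles by default.
    words = {w[0].upper() + w[1:] for w in re.findall(r"\w+", text)}
    if not words:
        return set()
    placeholders = ", ".join("?" for _ in words)
    sql = ("SELECT page_title FROM page "
           f"WHERE page_namespace = 0 AND page_title IN ({placeholders})")
    return {row[0] for row in conn.execute(sql, sorted(words))}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_namespace INTEGER, page_title TEXT)")
conn.executemany("INSERT INTO page VALUES (?, ?)",
                 [(0, "Physics"), (0, "Case"), (1, "Physics")])
print(existing_titles(conn, "physics and chemistry"))
# {'Physics'}
```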

There are currently some limitations to the extension.  The primary limitations
are the poorly written regular expressions.  As it stands, the replacement
regular expression is the worst.  It will replace text, but will mess up
formatting in the process.  In addition, the script does not yet support
namespaces other than the main namespace.  This change should be trivial,
however.  Functions for generating links to internal topics can be found in
'includes/Title.php' (I believe).

Also, automatic wikification, although it sounds cool, has some drawbacks.  When
I ran it on some test articles on http://wiki.case.edu, it would convert common
words like "case" to links because "Case" is the shorthand name of my
university.  Unless I am mistaken, the MediaWiki hook system does not allow you
to return the text from the pre-save hook (which this extension is) and have the
user verify it.

If this extension is to be used in production environments, it will need
attention from people with more experience with regular expressions than I have.
Once those problems are fixed, I will attend to the other issues.
Comment 6 jediarchives11 2005-07-26 14:08:36 UTC
Sorry, but what exactly do you mean by "regular expressions"?  Also, to reduce the size of the 
query, would it be helpful to have the extension determine what is different between the new 
and old versions before scanning for links?  That won't add links to articles created since 
the last time the entire article was scanned, so maybe not.  And you're right, it might be 
best to have readers check it before it adds the links.  If that's not possible, something 
else would have to be done.
Comment 7 Gregory Szorc 2005-07-26 15:49:36 UTC
A regular expression is a method to match text patterns.  They are a very powerful tool.  See http://en.wikipedia.org/wiki/Regular_expression for more info.

Finding a diff between versions and then doing the substitution would be very difficult. You would have to extract the old contents, run a search on the new terms, and somehow do a string replace on the autowikified links only in the new text. The last part seems a bit challenging.
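The first half of this idea can be sketched with Python's difflib (not something the extension uses): compare the old and new revisions word by word and keep only the words that the edit inserted or replaced. Restricting the actual replacement to those regions, the hard part described above, is not attempted; `added_words` is a hypothetical name.

```python
import difflib

def added_words(old_text, new_text):
    """Return the words that appear in inserted or replaced regions of new_text."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return added

print(added_words("Case is a university",
                  "Case is a university with a physics department"))
# ['with', 'a', 'physics', 'department']
```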

In all honesty, I think it would be more beneficial to spend time writing thorough documentation on creating links than working on this extension. When it comes to creating content, humans will always be able to do a better job than computers. Automatic wikification, although cool, will not always be perfect.

An alternative to investigate would be a tool run by experienced wiki users that scans articles for possible links and prompts whether to change the text into a link.
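A sketch of that review-first alternative, assuming a Python helper rather than anything in MediaWiki: `suggest_links` only reports candidate links with their character offsets, and `apply_links` wraps just the spans a person accepted. Both names and the input title set are hypothetical.

```python
import re

def suggest_links(text, titles):
    """List (start, end, word) for each word matching a lower-cased title.

    The text is not modified; a reviewer decides which suggestions to apply.
    """
    return [(m.start(), m.end(), m.group(0))
            for m in re.finditer(r"\b\w+\b", text)
            if m.group(0).lower() in titles]

def apply_links(text, accepted):
    """Wrap only the accepted spans, working right to left so that
    earlier offsets remain valid after each insertion."""
    for start, end, word in sorted(accepted, reverse=True):
        text = text[:start] + f"[[{word}]]" + text[end:]
    return text

text = "Case offers physics"
suggestions = suggest_links(text, {"case", "physics"})
# The reviewer rejects "Case" and accepts "physics".
print(apply_links(text, [s for s in suggestions if s[2] != "Case"]))
# Case offers [[physics]]
```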
Comment 8 jediarchives11 2005-08-19 03:23:45 UTC
I don't know why I didn't think of this before, but could an exclude list / key / 
attribute / column / whatever be the solution to at least one problem.  For 
example, make the "case" article exempt from automatic wikification.   This can be 
done whatever way makes it easiest to code.  This would eliminate one major 
problem of words with multiple meanings being turned into links when they 
shouldn't be.
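An exclude list amounts to removing titles from the candidate set before matching. A Python sketch under that assumption (the extension later implemented this as a PHP `$excludelist` array; the function below is hypothetical):

```python
import re

def autolink(text, titles, exclude=()):
    """Link words matching `titles`, except those in `exclude` (case-insensitive)."""
    linkable = {t.lower() for t in titles} - {w.lower() for w in exclude}

    def replace(match):
        word = match.group(0)
        return f"[[{word}]]" if word.lower() in linkable else word

    return re.sub(r"\b\w+\b", replace, text)

print(autolink("Read about physics at Case",
               {"Physics", "Case", "About"}, exclude=["case", "about"]))
# Read about [[physics]] at Case
```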
Comment 9 jediarchives11 2005-10-19 01:56:34 UTC
I feel that if someone who knows more coding could work on this, it could be made 
much better.
Comment 10 JDPorter 2005-10-19 02:04:19 UTC
I will help with the regexes.
Comment 11 jediarchives11 2005-11-16 16:30:30 UTC
Not sure how the regex stuff is going, but I have another question. Is it possible to 
run this just once through the database by running the file on the internet, or does 
it have to be done when pages are saved?  What would I have to change to get it to 
work that way?
Comment 12 jediarchives11 2005-12-17 22:08:52 UTC
I've been working on getting this to run for all pages in the main namespace at
one time, and it has become very confusing and frustrating. If anyone who knows
MediaWiki and PHP can help out, their help would be greatly appreciated. Thanks.
Comment 13 jediarchives11 2006-01-06 02:29:31 UTC
Created attachment 1266 [details]
Second draft of extension

The second draft fixed a bug in the first draft that would take out the space
before the word that is linked.  Also, the extension does not seem to be
linking phrases, although it should.  Hopefully I will figure out how to make
an exclude list soon.
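One common way to link phrases (assumed here; the attachment's regex is not shown) is a single alternation of all titles, sorted longest first so that "quantum physics" is preferred over "physics". Zero-width `\b` boundaries never consume the neighbouring spaces, which avoids one plausible cause of a dropped-space bug. `link_phrases` is a hypothetical Python helper.

```python
import re

def link_phrases(text, titles):
    """Link single- and multi-word titles; longer titles are tried first."""
    if not titles:
        return text
    # Longest first, so the alternation prefers "quantum physics" over "physics".
    ordered = sorted(titles, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b",
                         re.IGNORECASE)
    # \b is zero-width: the spaces around a match are neither consumed nor lost.
    return pattern.sub(lambda m: f"[[{m.group(0)}]]", text)

print(link_phrases("She studies quantum physics and physics.",
                   ["physics", "quantum physics"]))
# She studies [[quantum physics]] and [[physics]].
```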
Comment 14 Filip Maljkovic [Dungodung] 2006-01-06 15:57:18 UTC
Didn't you consider the possibility that this autowikification tool
links to too many articles? Let's face it, en: Wikipedia is a big one and there
are a lot of articles about lots of different things. The result could be
almost entirely blue text, which is somewhat unwanted. On the other hand, small
wikis have few articles, so this would barely help them. All in all, I think it's
a good idea, but it needs human control, IMO. After all, let's not forget that
red links aren't bad in small wikis; on the contrary, they are helpful and good for
the project. But red links aren't part of this extension, so I'll shut up. :)
Comment 15 jediarchives11 2006-01-06 19:42:50 UTC
Yes, I never expected this to be used on en: Wikipedia. It would be used on small to
medium wikis, mainly to add links to articles that the author didn't know about. I do
agree, however, that there should be some human control. Maybe instead of
adding the links automatically, it could just suggest them and let the user
choose which to include.

Soon, I will upload a new version, one that now includes a way to exclude pages.  For 
example, my wiki's about page kept linking.  With the exclude list, you can add the 
word "about" as a word to exclude.  You can also use this to keep the number of links 
down.

Lastly, you're right, red links are good, but as I said, this extension doesn't do
anything with them.
Comment 16 jediarchives11 2006-01-07 06:18:08 UTC
Created attachment 1270 [details]
Version 0.3

This new version includes a way to exclude words by modifying the $excludelist
array. Eventually, you will be able to set this in LocalSettings.php. Also,
linking phrases now works correctly.

Unfortunately, new bugs have been discovered. The extension will not link the
last word in the article, and words followed by periods or commas (e.g. "home,") are not
handled correctly.
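Both symptoms are typical of a pattern that requires a space after each word: the last word has no trailing space, and "home," is followed by a comma. This is only a guess at the cause, not the extension's actual regex. The Python sketch contrasts such a pattern with one based on word boundaries; both functions and `TITLES` are hypothetical.

```python
import re

TITLES = {"home", "physics"}  # hypothetical lower-cased article titles

def link_space_delimited(text):
    # Fragile: a word only matches if a space follows it, so "home," and the
    # final word of the text are never linked.
    return re.sub(r"(\w+) ",
                  lambda m: (f"[[{m.group(1)}]] " if m.group(1).lower() in TITLES
                             else m.group(0)),
                  text)

def link_word_boundaries(text):
    # \b also matches before punctuation and at the end of the string.
    return re.sub(r"\b\w+\b",
                  lambda m: (f"[[{m.group(0)}]]" if m.group(0).lower() in TITLES
                             else m.group(0)),
                  text)

text = "Go home, then study physics"
print(link_space_delimited(text))  # Go home, then study physics
print(link_word_boundaries(text))  # Go [[home]], then study [[physics]]
```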
Comment 17 Melancholie 2006-01-25 12:40:13 UTC
For the German Wikipedia there is a wikifier on:

http://217.160.138.71/development/wikipedia/wikify/

It works fine and could serve as an example for this request.
Comment 18 jediarchives11 2006-01-25 18:44:21 UTC
Unfortunately I don't speak German, so if there is anyone that could translate 
this to English or provide the code (with English comments) here, that would be 
great.
Comment 19 Brion Vibber 2006-02-06 02:31:46 UTC
*** Bug 4886 has been marked as a duplicate of this bug. ***
Comment 20 Rob Church 2006-02-21 04:04:53 UTC
A note on running this on all pages once; if well-written, then a wrapper around
the code could be provided in the form of a custom maintenance script which
could rip through all article pages.
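The wrapper idea, sketched in Python: the linking function stays unchanged and a loop applies it once to every page. The `pages` dict is a hypothetical stand-in for the page and revision tables; a real MediaWiki maintenance script would be PHP and would save each changed page as a new revision. The sketch does not skip a page's own title in text that is already linked, so running it twice would double-wrap links.

```python
import re

def link_words(text, titles):
    """Wrap words that match a lower-cased title in [[...]]."""
    return re.sub(r"\b\w+\b",
                  lambda m: (f"[[{m.group(0)}]]" if m.group(0).lower() in titles
                             else m.group(0)),
                  text)

def run_over_all_pages(pages):
    """Apply link_words to every page once and return the titles that changed.

    `pages` maps title -> wikitext. A page is never linked to itself.
    """
    titles = {t.lower() for t in pages}
    changed = []
    for title, text in list(pages.items()):
        new_text = link_words(text, titles - {title.lower()})
        if new_text != text:
            pages[title] = new_text
            changed.append(title)
    return changed

pages = {"Physics": "Physics is a science.", "Science": "Physics is one science."}
print(run_over_all_pages(pages))  # ['Physics', 'Science']
print(pages["Physics"])           # Physics is a [[science]].
```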
Comment 21 Filip Maljkovic [Dungodung] 2006-08-15 15:35:52 UTC
*** Bug 7015 has been marked as a duplicate of this bug. ***
Comment 22 Daniel 2006-08-16 07:23:53 UTC
Would it be possible to add an option to wikify only words from
a "whitelist" (nothing else would be considered)?

I mean the counterpart of the $excludelist from comment #16: an $includelist.
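An include list inverts the exclude list: when it is set, only whitelisted titles can ever be linked. In set terms this is an intersection instead of a difference. The Python function below is a hypothetical sketch, not part of the extension.

```python
import re

def autolink(text, titles, include=None):
    """Link words matching `titles`; if `include` is given, only titles in it."""
    linkable = {t.lower() for t in titles}
    if include is not None:
        linkable &= {w.lower() for w in include}  # whitelist: keep only these
    return re.sub(r"\b\w+\b",
                  lambda m: (f"[[{m.group(0)}]]" if m.group(0).lower() in linkable
                             else m.group(0)),
                  text)

print(autolink("physics and chemistry", {"Physics", "Chemistry"}, include=["physics"]))
# [[physics]] and chemistry
```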

Thanks for support!
Comment 23 jediarchives11 2007-06-25 17:04:49 UTC
Created attachment 3826 [details]
Version 0.45
Comment 24 Sumana Harihareswara 2011-12-23 18:03:03 UTC
Adding "need-review" keyword to indicate extension awaits review.  jediarchives11, you might want to check whether an extension like this already exists (look on mediawiki.org) - if it doesn't, you should probably update your extension to work with MediaWiki as it is now, and then follow these instructions: https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment
Comment 25 Max Semenik 2012-04-20 21:11:19 UTC
Rm patch-needs-review - never going to be deployed on WMF for non-technical reasons. Technically, works only with $wgDBprefix = 'wiki', uses raw SQL, pegs master with requests perfectly suitable for slaves.
