Last modified: 2014-09-24 00:14:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T33015, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 31015 - Convert between English language variants in display of pages
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Language converter
Version: unspecified
Hardware: All All
Importance: Low enhancement with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n, javascript
Depends on:
Blocks:

Reported: 2011-09-19 22:38 UTC by jjk
Modified: 2014-09-24 00:14 UTC
CC List: 13 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description jjk 2011-09-19 22:38:25 UTC
Submitted by Jeff Kinz

This is an idea to settle the US vs. UK spelling issue on WP.

Overview: Support both, have browser show the one desired by the viewer.


 How it would work: For authoring pages:

    AS = alternate spelling(s)

    A notation like {UK:colour|US:color} for page input and editing. 
    
    It can support as many alternate spellings (AS) per word as needed.
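A minimal sketch of how a server or script might expand the proposed notation for one variant. The regular expression, the names `parse_group` and `render`, and the fallback order (requested variant, then a default, then the first listed spelling) are illustrative assumptions, not part of any MediaWiki API.

```python
import re
from typing import Dict

# One alternate-spelling (AS) group in the proposed notation, e.g.
# {UK:colour|US:color}. Assumption: a group starts with a letters-only
# variant label followed by ':' and contains no nested braces.
AS_GROUP = re.compile(r"\{([A-Za-z]+:[^{}]*)\}")


def parse_group(body: str) -> Dict[str, str]:
    """Turn 'UK:colour|US:color' into {'UK': 'colour', 'US': 'color'}."""
    spellings: Dict[str, str] = {}
    for option in body.split("|"):
        variant, _, word = option.partition(":")
        spellings[variant.strip()] = word
    return spellings


def render(text: str, variant: str, default: str = "US") -> str:
    """Replace every AS group with the spelling for `variant`.

    Falls back to `default`, then to the first listed spelling, when a
    group does not list the requested variant.
    """
    def choose(match: "re.Match[str]") -> str:
        spellings = parse_group(match.group(1))
        if variant in spellings:
            return spellings[variant]
        if default in spellings:
            return spellings[default]
        return next(iter(spellings.values()))

    return AS_GROUP.sub(choose, text)


source = "The {UK:colour|US:color} of the {UK:centre|US:center|AU:centre}"
print(render(source, "UK"))  # The colour of the centre
print(render(source, "US"))  # The color of the center
```

Because each group carries all of its spellings, adding another variant only means adding another `LABEL:word` option to the groups where that variant differs.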

    AS words that are not in AS notation could be detected when changes
    are submitted and converted to AS notation automatically.
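The automatic conversion on submission could look roughly like this. The word table is a tiny hypothetical stand-in for a real UK/US spelling list, and the function names are illustrative. This naive version tags every occurrence of a listed word, which is exactly the behaviour the RISK DUE to IGNORANCE section worries about.

```python
import re

# Hypothetical, tiny UK -> US spelling table; a real one would be far larger.
UK_TO_US = {"colour": "color", "centre": "center", "organise": "organize"}
US_TO_UK = {us: uk for uk, us in UK_TO_US.items()}

# Either an existing AS group ({...}) or a plain word.
TOKEN = re.compile(r"\{[^{}]*\}|[A-Za-z]+")


def match_case(original: str, word: str) -> str:
    """Capitalize `word` if `original` starts with a capital letter."""
    return word[:1].upper() + word[1:] if original[:1].isupper() else word


def tag_alternate_spellings(wikitext: str) -> str:
    """Wrap known AS words that are not yet in AS notation."""
    def tag(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.startswith("{"):
            return token  # already in AS notation: leave untouched
        lower = token.lower()
        if lower in UK_TO_US:
            uk, us = lower, UK_TO_US[lower]
        elif lower in US_TO_UK:
            uk, us = US_TO_UK[lower], lower
        else:
            return token  # not an AS word
        uk, us = match_case(token, uk), match_case(token, us)
        return f"{{UK:{uk}|US:{us}}}"

    return TOKEN.sub(tag, wikitext)


print(tag_alternate_spellings("Colour at the center"))
# {UK:Colour|US:Color} at the {UK:centre|US:center}
```

Running it twice gives the same result, since text that is already in AS notation is skipped.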

    It's possible that no AS notation is needed at all. If the servers can
    detect all AS words upon submission and set a flag on the page, or set
    AS notation on the AS words, then the server can present the preferred
    spelling of the word on the page at output time**. (see below)
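If the server relies on a per-page flag instead of inline notation, output-time conversion might be a plain word substitution, as in this sketch. The spelling table, the `has_as_words` field and the `render_page` helper are hypothetical. Because it swaps every occurrence regardless of meaning, this flag-only approach is also the one exposed to the ambiguity described under RISK DUE to IGNORANCE.

```python
import re
from typing import Dict

# Hypothetical spelling table; real coverage would need a large word list.
UK_TO_US = {"colour": "color", "centre": "center"}
US_TO_UK = {us: uk for uk, us in UK_TO_US.items()}
TABLES: Dict[str, Dict[str, str]] = {"US": UK_TO_US, "UK": US_TO_UK}


def convert_at_output(text: str, variant: str) -> str:
    """Swap every known AS word into the viewer's preferred variant."""
    table = TABLES[variant]

    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        target = table.get(word.lower())
        if target is None:
            return word  # not an AS word
        return target.capitalize() if word[0].isupper() else target

    return re.sub(r"[A-Za-z]+", swap, text)


def render_page(page: Dict[str, object], variant: str) -> str:
    """Only run conversion on pages whose flag was set at submission."""
    text = str(page["text"])
    return convert_at_output(text, variant) if page.get("has_as_words") else text
```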


 How it might work: For Displaying: 

    Two possibilities.

    First method:  The WP server determines which AS the viewer wants
    and generates the page with that version of the spelling.
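For the first method, the server needs some signal of which variant the viewer wants. This sketch assumes that a saved user preference wins if present, and that otherwise the browser's standard `Accept-Language` header is consulted. The tag-to-variant mapping and the `US` default are illustrative choices, not anything the proposal specifies.

```python
from typing import Optional

# Illustrative mapping from language tags to spelling variants.
TAG_TO_VARIANT = {"en-gb": "UK", "en-us": "US"}
SUPPORTED = set(TAG_TO_VARIANT.values())


def preferred_variant(accept_language: str,
                      user_pref: Optional[str] = None,
                      default: str = "US") -> str:
    """Pick a spelling variant from a saved preference or Accept-Language."""
    if user_pref in SUPPORTED:
        return user_pref  # explicit preference beats browser hints
    best, best_q = default, 0.0
    for part in accept_language.split(","):
        tag, _, params = part.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # malformed weight: treat as not acceptable
        variant = TAG_TO_VARIANT.get(tag.strip().lower())
        # q=0 means "not acceptable", so it never beats the starting 0.0.
        if variant is not None and q > best_q:
            best, best_q = variant, q
    return best
```

Because the output now depends on the request, any cache in front of the server would have to keep one copy per variant; comments 6 and 7 discuss exactly this.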

    Second method: JavaScript in the page looks up the viewer's
    preference and alters the document to show the matching spelling.


    Summary: Using both may be the most cost-effective.

    1. JavaScript in the page looks for a preference cookie and displays
       the page using the selected spelling style.

    2. If there is no cookie yet, the JavaScript displays the AS words
       as clickable. If the viewer clicks on one, a "select spelling
       style" dialog is shown.

    3. Determine the preference.
    4. Set a cookie to last until the end of the viewer's current session(s).
    5. Page content is based on the cookie preference.
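The decision in steps 1 to 5 can be sketched with Python's standard `http.cookies` module. In the proposal this logic would run as JavaScript in the page; it is written server-side here only to keep the example self-contained. The cookie name `spelling_variant` and the helper names are hypothetical. Leaving out `Expires` and `Max-Age` makes the cookie last only for the browser session, which matches step 4.

```python
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "spelling_variant"  # hypothetical cookie name
VARIANTS = {"UK", "US"}


def read_preference(cookie_header: str) -> Optional[str]:
    """Step 1: return the saved variant, or None if there is no valid cookie."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(COOKIE_NAME)
    if morsel is not None and morsel.value in VARIANTS:
        return morsel.value
    return None


def decide_display(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return ('render', variant), or ('ask', None) to trigger step 2's dialog."""
    variant = read_preference(cookie_header)
    return ("render", variant) if variant else ("ask", None)


def session_cookie(variant: str) -> str:
    """Step 4: a Set-Cookie value with no Expires/Max-Age (session only)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    jar = SimpleCookie()
    jar[COOKIE_NAME] = variant
    jar[COOKIE_NAME]["path"] = "/"
    return jar[COOKIE_NAME].OutputString()
```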


RISK DUE to IGNORANCE- 
    Are there any English words which have two meanings, but only one
    of those meanings has UK/US alternate spelling?

    By this I mean the following: Assume a word 'A' which has two
    different meanings: A-1 and A-2. 

    For meaning A-1, A is spelled 'A' in both US and UK spelling.  
    For meaning A-2 the UK spelling is still 'A'
    For meaning A-2 the US spelling is "A#". 
    

    If word A appears in a page with meaning A-1, and that page is processed
    for display with US alternate spelling, A is changed to A#, thereby
    changing the meaning from A-1 to A-2.

    I don't know if any such words exist, but they may. If any such words
    exist, then this idea cannot be used. The problem of determining semantic
    word meanings from context is only partially solved by Bayesian analysis
    or hidden Markov models, and both are expensive to calculate while neither
    produces human-level answers.


    If no such words exist, then this solution is a viable one. 

    A word with one spelling but multiple meanings: "read". It can mean
    "I will read the manual." or it can mean "I have read the manual."
    The first is pronounced like "reed". The second is pronounced like
    "red".

    One word, one spelling, two different meanings, two different
    pronunciations.

    Worse, the phrase "I read the book" can use either meaning of the
    word, so programmatically deciding which meaning of a word the
    sentence is using is not workable. In this case it doesn't matter
    which meaning is used, because both are spelled the same, but that
    may not be true for all AS words.


    Here is a contrived example of a word with 2 meanings, whose spelling
    changes in one UK/US spelling style:

    Assume the word "blew" has two meanings, both are past tense.

    #1 - To strike, hit.  "I blew him down."       (I struck him down)
    #2 - blowing air.     "I blew across the cup." (I breathed across the cup)

    Assume that in US spelling meaning #2 is spelled bloo while meaning
    #1 is spelled the same way, "blew", in both UK and US.

    In converting #1 above from UK to US spelling the meaning would change:

    "I blew him down"     ( I struck him down )
    "I bloo him down."    ( I breathed him down )   {have a mint, fella!}
 
    Because the server cannot determine which meaning the UK version has,
    it cannot accurately determine which word to display for a US page.
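The failure is easy to reproduce with a naive word-for-word converter. The table below encodes only the contrived assumption from this example ("bloo" is not a real US spelling), and `naive_convert` is a hypothetical name.

```python
import re

# Contrived table from the example: only meaning #2 of "blew" should change,
# but a lookup table has no way to express that.
UK_TO_US = {"blew": "bloo"}


def naive_convert(text: str) -> str:
    """Replace words by table lookup alone, with no notion of meaning."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: UK_TO_US.get(m.group(0), m.group(0)),
                  text)


print(naive_convert("I blew across the cup."))  # I bloo across the cup.  (intended)
print(naive_convert("I blew him down."))        # I bloo him down.  (wrong meaning)
```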
     

    Conclusion: 

    If it can be determined that there are no English words that 
    fit the scenario above, this idea can be used. Otherwise, it cannot. 

** Or a bot can scan the WP database and set the flag on any
    pages with any AS words on them.
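Such a bot pass could be as simple as the following sketch, which checks each page's words against a spelling list. The dictionary of pages stands in for whatever database access a real bot would use, and the word list is hypothetical.

```python
import re
from typing import Dict, Set

# Hypothetical list of words that have a UK or US alternate spelling.
AS_WORDS = {"colour", "color", "centre", "center", "organise", "organize"}


def pages_to_flag(pages: Dict[str, str]) -> Set[str]:
    """Return the titles of pages containing at least one AS word."""
    flagged: Set[str] = set()
    for title, text in pages.items():
        words = {w.lower() for w in re.findall(r"[A-Za-z]+", text)}
        if words & AS_WORDS:
            flagged.add(title)
    return flagged


print(pages_to_flag({"Paint": "A bright colour.", "Granite": "A hard rock."}))
# {'Paint'}
```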

-- 
This email partially created with "Dragon Naturally Speaking" speech
recognition system.  A tool I'm proud to have worked on.
Note: the email may have incorrectly transcribed content.

Jeff Kinz, Emergent Design.    "Carpe Diem!"

"Piscis Carpe" ->"Fish the Seize"
Comment 1 Robin Pepermans (SPQRobin) 2011-09-19 22:44:34 UTC
This could probably be done by making a LanguageConverter for English (I had actually made this some time ago, but there might be issues with the fallback system). 
In that case, JavaScript is not needed for this.
Comment 2 Thor Malmjursson 2011-09-19 22:45:29 UTC
Seems like an exceptionally bright idea to settle one of the longest-running,
age-old arguments on Wiki: "Whose spelling is right?" Personally, I don't
mind, since English is English, and I regularly mix the two anyhow. I'm voting
for this.
Comment 3 jjk 2011-09-19 23:03:48 UTC
One additional note:  

If editors enter the UK/US alternate spellings for each instance of an AS word, then this idea can be used even if there are English words that produce the scenario described under RISK DUE to IGNORANCE.
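With per-instance markup, conversion only touches spans an editor has explicitly tagged, so an untagged "blew" can never be rewritten. A minimal sketch, reusing the contrived "blew"/"bloo" pair from the description; the function name is illustrative.

```python
import re

# An explicitly tagged instance, e.g. {UK:blew|US:bloo}.
# Assumption: every option inside a group has the form LABEL:word.
AS_GROUP = re.compile(r"\{([A-Za-z]+:[^{}]*)\}")


def render_tagged_only(text: str, variant: str) -> str:
    """Convert only editor-tagged groups; plain words are never touched."""
    def choose(match: "re.Match[str]") -> str:
        options = dict(opt.split(":", 1) for opt in match.group(1).split("|"))
        # Fall back to the first listed spelling if the variant is missing.
        return options.get(variant, next(iter(options.values())))

    return AS_GROUP.sub(choose, text)


text = "I blew him down. I {UK:blew|US:bloo} across the cup."
print(render_tagged_only(text, "US"))  # I blew him down. I bloo across the cup.
```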
Comment 4 jjk 2011-09-19 23:11:21 UTC
And another: 

Another way to defeat the RDI (Risk Due to Ignorance) scenario is to scan each change submission for new instances of AS words and generate a dialog box for the editor to select the correct alternate spellings.   

If the meaning of the word being used has no alternate spelling, then the editor selects "no alternate" and that instance is not tagged as an AS word, so the RDI scenario never happens.

If the meaning of the word being used does have an alternate spelling across the pond, then the editor selects that alternate from the list and it is kept in the page, so the AS selection code has a human expert decision to rely on. Once again, no RDI scenario.
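A rough sketch of that submission-time check, with the dialog replaced by a callback so the logic can run on its own. `ask` stands in for the editor's choice and returns either the chosen alternate or `None` for "no alternate"; the candidate table (the contrived "blew"/"bloo" pair) and all names are hypothetical.

```python
import re
from typing import Callable, List, Optional

# Hypothetical table of words that *may* have an alternate spelling.
CANDIDATES = {"blew": ["bloo"]}

# Either an existing AS group ({...}) or a plain word.
TOKEN = re.compile(r"\{[^{}]*\}|[A-Za-z]+")

# ask(word, position_in_text, options) -> chosen alternate, or None.
AskFn = Callable[[str, int, List[str]], Optional[str]]


def review_submission(text: str, ask: AskFn) -> str:
    """Ask the editor about each untagged candidate word, left to right."""
    def handle(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.startswith("{") or token.lower() not in CANDIDATES:
            return token  # already tagged, or not a candidate
        choice = ask(token, match.start(), CANDIDATES[token.lower()])
        if choice is None:
            return token  # editor chose "no alternate": leave untagged
        return f"{{UK:{token}|US:{choice}}}"

    return TOKEN.sub(handle, text)
```

The editor is asked once per untagged occurrence; text that is already tagged is skipped, so later edits do not re-trigger the dialog for it.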
Comment 5 Helder 2011-09-26 23:30:44 UTC
You may also want to take a look at the following proposal for using the LanguageConverter on Wikisources to modernize old texts:
http://wikisource.org/wiki/Wikisource:Scriptorium/Archives/Jan_2010_-_Dec_2010#Using_LanguageConverter_syntax_at_Wikisources

There is also some JavaScript being used on some Wikisources for modernization of old texts. See
* [[MediaWiki talk:Modernisation.js]]
* [[fr:s:Wikisource:Scriptorium/Janvier_2011#New_version_of_script_for_modernization]]
It was also adapted so that it could be used as a "Language Converter" on Portuguese Wikipedia (since bug 26121 is still open).

There are instructions for using it as a user script to deal with English variants on English Wikipedia. See:
[[Wikipedia:WikiProject_User_scripts/Scripts/Language_Converter]]
Comment 6 Alex Monk 2013-07-28 02:51:19 UTC
(In reply to comment #0)
>     Two possibilities.
> 
>     First method:  The WP server determines which AS the viewer wants
>     and generates the page with that version of the spelling.

I don't think this can be done because of caching.
Comment 7 Liangent 2013-07-28 10:01:04 UTC
(In reply to comment #6)
> (In reply to comment #0)
> >     Two possibilities.
> > 
> >     First method:  The WP server determines which AS the viewer wants
> >     and generates the page with that version of the spelling.
> 
> I don't think this can be done because of caching.

This is already done on zhwiki.
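The caching objection goes away if each variant is cached as its own entry. The following is a generic sketch of that idea, not a description of how zhwiki's caches are actually configured; `VariantCache` and its methods are hypothetical.

```python
from typing import Callable, Dict, Tuple


class VariantCache:
    """Cache rendered pages under (title, variant) instead of title alone."""

    def __init__(self, render: Callable[[str, str], str]) -> None:
        self._render = render
        self._store: Dict[Tuple[str, str], str] = {}
        self.misses = 0  # how many times rendering actually ran

    def get(self, title: str, variant: str) -> str:
        key = (title, variant)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._render(title, variant)
        return self._store[key]

    def purge(self, title: str) -> None:
        """After an edit, drop every cached variant of the page."""
        for key in [k for k in self._store if k[0] == title]:
            del self._store[key]
```

The cost of this design is storage: each converted page is kept once per variant, and an edit has to purge all of them.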


