Last modified: 2010-05-15 16:02:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T16539, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 14539 - linktrail for digraphs with apostrophe or grave accent
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: 1.13.x
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
 
Reported: 2008-06-14 14:45 UTC by AlefZet
Modified: 2010-05-15 16:02 UTC (History)

Description AlefZet 2008-06-14 14:45:56 UTC
[[:en:Karakalpak language]] uses digraphs with apostrophes, such as A', N', O', U'.
[[:en:Uzbek language]] uses digraphs with a grave accent, such as G` and O`, and uses the apostrophe (') as a separate letter.
See [[:en:Alphabets derived from the Latin]]

r36253 introduced $linkTrail = '/^(\'?\p{L&}+)(.*)$/usD'; which works well for many languages, but not for Karakalpak and Uzbek.

[[a'bc]]de becomes <u>a'bcde</u>
[[abc]]'de   =>    <u>abc'de</u>
[[abc]]d'e   =>    <u>abcd</u>'e rather than <u>abcd'e</u>

[[a`bc]]de becomes <u>a`bcde</u>
[[abc]]`de   =>    <u>abc</u>`de rather than <u>abc`de</u>
[[abc]]d`e   =>    <u>abcd</u>`e rather than <u>abcd`e</u>
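
The splits listed above can be reproduced with a small Python sketch. It is only an approximation: Python's re module has no \p{L&}, so [^\W\d_] (roughly "any Unicode letter") stands in for it, \Z stands in for the D modifier, and split_trail is a hypothetical helper name, not a MediaWiki function.

```python
import re

# Approximation of the r36253 pattern '/^(\'?\p{L&}+)(.*)$/usD'.
# Assumption: [^\W\d_] stands in for \p{L&}, which Python's re lacks.
LINK_TRAIL = re.compile(r"^('?[^\W\d_]+)(.*)\Z", re.DOTALL)


def split_trail(text):
    """Split the text following ]] into (part joined to the link, remainder)."""
    match = LINK_TRAIL.match(text)
    if match is None:
        # Nothing qualifies as a trail, so the whole text stays outside the link.
        return "", text
    return match.group(1), match.group(2)


# Trailing texts taken from the examples in this report.
for trail in ["de", "'de", "d'e", "`de", "d`e"]:
    print(repr(trail), "->", split_trail(trail))
```

With this pattern, an apostrophe is accepted only as the first character of the trail and a grave accent is never accepted, which matches the behaviour reported above.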

I am not an expert on regular expressions. What regex would handle these cases correctly?

Perhaps a new per-language variable is needed, listing the punctuation characters that should be treated as letters or parts of letters?
Comment 1 Daniel Friesen 2008-06-15 18:32:54 UTC
Wait, what is the issue?

A) [[Foo]]Bar's does not link the 's currently.
B) ` should be included as a punctuation character.
C) [[Abc]]'de includes the 'de as part of the link but it shouldn't in Karakalpak or Uzbek.

A ToDo of mine was to move the definition of the flat character pattern for things that match a letter into a constant. That way we can create slightly altered $linkTrails for languages that need special exceptions, like » in Ba, which can't be included in the default because it would break other languages. It would also simplify the creation of more complex regexes.

If the issue is A), that's a ToDo of mine. The more complex regex noted above is one that uses the character classes twice to allow a single ' anywhere in the trail, not only at the start.
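
A rough Python sketch of that idea, for illustration only: the letter class lives in one constant and is reused twice so that a single apostrophe may appear anywhere before the final run of letters. The names LETTER, DEFAULT_TRAIL, APOSTROPHE_TRAIL and split_trail are hypothetical, and [^\W\d_] is an assumed stand-in for PHP's \p{L&}.

```python
import re

# Assumption: [^\W\d_] approximates PHP's \p{L&} (Python's re has no \p{...}).
LETTER = r"[^\W\d_]"

# Plain trail: letters only.
DEFAULT_TRAIL = re.compile(rf"^({LETTER}+)(.*)\Z", re.DOTALL)

# The letter class is used twice: optional letters, at most one apostrophe,
# then at least one letter, so the apostrophe is not tied to the start.
APOSTROPHE_TRAIL = re.compile(rf"^({LETTER}*'?{LETTER}+)(.*)\Z", re.DOTALL)


def split_trail(pattern, text):
    """Return (part joined to the link, remainder) for the text after ]]."""
    match = pattern.match(text)
    if match is None:
        return "", text
    return match.group(1), match.group(2)


print(split_trail(DEFAULT_TRAIL, "Bar's"))     # apostrophe ends the trail
print(split_trail(APOSTROPHE_TRAIL, "Bar's"))  # apostrophe kept inside the trail
```

Because the pattern still requires a letter after the apostrophe, a trailing quote as in abc' stays outside the link.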

If the issue is B), that's the reason why I created [http://lists.wikimedia.org/pipermail/wikitech-l/2008-June/038323.html this thread] in wikitech-l and it would be helpful there to create a good list, and also get input on what characters should be common and what ones shouldn't. (The intent is to make linkTrails as language independent as possible).

If the issue is C), then I can fix that by working on the note above, and creating a definition for those two languages but without the punctuation.
Comment 2 AlefZet 2008-06-17 11:52:32 UTC
The issue is described above. Please read it carefully.

[[abc]]d'e   =>    <u>abcd</u>'e (incorrect) rather than <u>abcd'e</u> (correct)

[[abc]]`de   =>    <u>abc</u>`de (incorrect) rather than <u>abc`de</u> (correct)
[[abc]]d`e   =>    <u>abcd</u>`e (incorrect) rather than <u>abcd`e</u> (correct)

Possible solutions:
a. include ' and ` as characters in the regex, or
b. introduce a special per-language variable in the Messages file that adds the above characters to the regex (when needed)
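
Option b could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the EXTRA_TRAIL_CHARS table, its language codes and contents, and the build_link_trail and split_trail helpers are hypothetical, and [^\W\d_] only approximates PHP's \p{L&}.

```python
import re

# Assumption: [^\W\d_] approximates PHP's \p{L&}.
LETTER = r"[^\W\d_]"

# Hypothetical per-language table of punctuation treated as letter elements.
EXTRA_TRAIL_CHARS = {
    "kaa": "'",   # Karakalpak: apostrophe digraphs
    "uz": "'`",   # Uzbek: apostrophe and grave accent
}


def build_link_trail(lang):
    """Build a link-trail pattern, adding the language's extra characters."""
    extra = EXTRA_TRAIL_CHARS.get(lang, "")
    if extra:
        unit = rf"(?:{LETTER}|[{re.escape(extra)}])"
    else:
        unit = LETTER
    return re.compile(rf"^({unit}+)(.*)\Z", re.DOTALL)


def split_trail(lang, text):
    """Return (part joined to the link, remainder) for the text after ]]."""
    match = build_link_trail(lang).match(text)
    if match is None:
        return "", text
    return match.group(1), match.group(2)


for lang in ["en", "kaa", "uz"]:
    print(lang, [split_trail(lang, t) for t in ["d'e", "`de", "d`e"]])
```

Languages without an entry in the table keep the letters-only trail, so the extra characters do not affect them.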
Comment 3 Daniel Friesen 2008-07-04 13:18:55 UTC
Oh right, need to close this one.
Bug 14655 led me to remove, in r36693, the feature that allowed ' inside the LinkTrail.
