Last modified: 2010-05-15 16:02:54 UTC

Bug 14539 - linktrail for digraphs with apostrophe or grave accent
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Hardware: All, OS: All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody
Depends on:
Reported: 2008-06-14 14:45 UTC by AlefZet
Modified: 2010-05-15 16:02 UTC
CC List: 2 users



Description AlefZet 2008-06-14 14:45:56 UTC
[[:en:Karakalpak language]] uses digraphs with apostrophes like A', N', O', U' 
[[:en:Uzbek language]] uses digraphs with a grave accent (backtick), such as G` and O`, and uses the apostrophe (') as a separate letter.
See [[:en:Alphabets derived from the Latin]]

r36253 introduced $linkTrail = '/^(\'?\p{L&}+)(.*)$/usD';, which works well for many languages but not for Karakalpak and Uzbek.

[[a'bc]]de becomes <u>a'bcde</u>
[[abc]]'de   =>    <u>abc'de</u>
[[abc]]d'e   =>    <u>abcd</u>'e rather than <u>abcd'e</u>

[[a`bc]]de becomes <u>a`bcde</u>
[[abc]]`de   =>    <u>abc</u>`de rather than <u>abc`de</u>
[[abc]]d`e   =>    <u>abcd</u>`e rather than <u>abcd`e</u>
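To reproduce the examples above outside MediaWiki, here is a minimal Python sketch of how the r36253 pattern splits the text that follows a link. Python's re module has no \p{L&}, so [^\W\d_] is used as an approximation. It is not identical, because it also matches uncased letters such as CJK ideographs. \Z stands in for PHP's D modifier and re.DOTALL for s. The helper names split_trail and render are hypothetical, and <u>...</u> marks the linked part as in the report.

```python
import re

# Rough Python stand-in for r36253's '/^(\'?\p{L&}+)(.*)$/usD'.
# [^\W\d_] approximates \p{L&}; \Z mimics PHP's D modifier; DOTALL mimics s.
CURRENT_TRAIL = re.compile(r"^('?[^\W\d_]+)(.*)\Z", re.DOTALL)


def split_trail(pattern, text):
    """Split text after a link into (absorbed into link, left outside)."""
    m = pattern.match(text)
    if m is None:
        return "", text  # no match: nothing is glued onto the link
    return m.group(1), m.group(2)


def render(target, after, pattern):
    """Show the linked span with <u>...</u>, as in the bug report."""
    absorbed, rest = split_trail(pattern, after)
    return "<u>" + target + absorbed + "</u>" + rest


cases = [("a'bc", "de"), ("abc", "'de"), ("abc", "d'e"),
         ("a`bc", "de"), ("abc", "`de"), ("abc", "d`e")]
for target, after in cases:
    print(f"[[{target}]]{after}  =>  {render(target, after, CURRENT_TRAIL)}")
```

Running it prints the same six results as the list above, for example [[abc]]d'e  =>  <u>abcd</u>'e. The backtick never matches, so it always ends the trail.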

I am not an expert on regular expressions. What regex would produce the correct result?

Maybe we need to introduce a new, language-dependent variable listing punctuation characters that should be treated as letters or parts of letters?
Comment 1 Daniel Friesen 2008-06-15 18:32:54 UTC
Wait, what is the issue?

A) [[Foo]]Bar's currently does not include the 's in the link.
B) ` should be included as a punctuation character.
C) [[Abc]]'de includes 'de in the link, but it shouldn't in Karakalpak or Uzbek.

A ToDo of mine is to move the definition of the flat character pattern that matches a letter into a constant. That way we can create slightly altered $linkTrails for languages that need special exceptions, such as » in Ba, which can't be included globally because it would break other languages. It would also simplify building more complex regexes.

If the issue is A), that's on my ToDo list. The more complex regex mentioned above is one that uses the character class twice, allowing a single ' that is not restricted to the start of the trail.
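A minimal Python sketch of the kind of pattern described here follows. It assumes the letter class is written twice around an optional apostrophe: letters, at most one ', then at least one more letter. As before, [^\W\d_] only approximates \p{L&}, and LETTER, SINGLE_APOS_TRAIL and split_trail are illustrative names, not MediaWiki code.

```python
import re

# Hypothetical constant for "a letter" (approximates PCRE's \p{L&}).
LETTER = r"[^\W\d_]"

# Letters, then at most one apostrophe, then at least one more letter.
# The ' may appear at the start or in the middle, but not at the end.
SINGLE_APOS_TRAIL = re.compile(rf"^({LETTER}*'?{LETTER}+)(.*)\Z", re.DOTALL)


def split_trail(pattern, text):
    """Split text after a link into (absorbed into link, left outside)."""
    m = pattern.match(text)
    if m is None:
        return "", text  # no match: nothing is glued onto the link
    return m.group(1), m.group(2)


print(split_trail(SINGLE_APOS_TRAIL, "Bar's"))     # case A: [[Foo]]Bar's
print(split_trail(SINGLE_APOS_TRAIL, "d'e"))
print(split_trail(SINGLE_APOS_TRAIL, "s' house"))  # trailing ' stays outside
```

A letter must follow the apostrophe, so a trailing ' is left outside the link, and a second ' ends the trail. The backtick is still not covered.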

If the issue is B), that's why I started this thread on wikitech-l. It would help there to build a good list and to get input on which characters should be common and which shouldn't. (The intent is to make linkTrails as language-independent as possible.)

If the issue is C), I can fix it by finishing the constant described above and creating a definition for those two languages without the punctuation.
Comment 2 AlefZet 2008-06-17 11:52:32 UTC
The issue is the one described above; please read it carefully:

[[abc]]d'e   =>    <u>abcd</u>'e (incorrect) rather than <u>abcd'e</u> (correct)

[[abc]]`de   =>    <u>abc</u>`de (incorrect) rather than <u>abc`de</u> (correct)
[[abc]]d`e   =>    <u>abcd</u>`e (incorrect) rather than <u>abcd`e</u> (correct)

Possible solutions:
a. include ' and ` as letter characters in the regex, or
b. introduce a special per-language variable in the Messages file that adds the above characters to the regex when needed.
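Option b might look roughly like the following Python sketch. It assumes a per-language table of extra characters that count as letters in the trail. EXTRA_TRAIL_CHARS, build_link_trail and split_trail are hypothetical names, and the actual Messages-file variable is not specified in this report. The sketch also factors the letter class into one constant, as Comment 1 proposes. Languages without an entry keep the r36253 pattern, and [^\W\d_] again only approximates \p{L&}.

```python
import re

LETTER = r"[^\W\d_]"  # approximation of PCRE's \p{L&}

# Hypothetical per-language setting: characters treated as letters in the trail.
EXTRA_TRAIL_CHARS = {
    "kaa": "'",   # Karakalpak: A', N', O', U'
    "uz": "'`",   # Uzbek: G`, O` and the apostrophe letter
}


def build_link_trail(lang):
    """Compile a linkTrail-like pattern for the given language code."""
    extra = EXTRA_TRAIL_CHARS.get(lang, "")
    if not extra:
        # Default: r36253 behavior (optional leading ', then letters).
        return re.compile(rf"^('?{LETTER}+)(.*)\Z", re.DOTALL)
    # re.escape keeps characters such as ] or - literal inside the class.
    unit = rf"(?:{LETTER}|[{re.escape(extra)}])"
    return re.compile(rf"^({unit}+)(.*)\Z", re.DOTALL)


def split_trail(pattern, text):
    """Split text after a link into (absorbed into link, left outside)."""
    m = pattern.match(text)
    if m is None:
        return "", text  # no match: nothing is glued onto the link
    return m.group(1), m.group(2)


for lang in ("en", "kaa", "uz"):
    trail = build_link_trail(lang)
    print(lang, [split_trail(trail, t) for t in ("d'e", "`de", "d`e")])
```

Treating ' as a full letter means a trailing ' is absorbed too. That is a trade-off a real per-language setting would have to weigh.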
Comment 3 Daniel Friesen 2008-07-04 13:18:55 UTC
Oh right, I need to close this one.
Bug 14655 led me to remove support for ' inside the linkTrail in r36693.
