Last modified: 2005-11-15 08:04:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2563, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 563 - Broken Unicode links to Wikipedia at Commons
Product: Wikimedia
Classification: Unclassified
Component: Interwiki links
Hardware: All All
Importance: High normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: utf8
Duplicates: 645
Depends on:
Blocks: unicode
Reported: 2004-09-23 01:05 UTC by Paweł Dembowski
Modified: 2005-11-15 08:04 UTC (History)

Description Paweł Dembowski 2004-09-23 01:05:20 UTC
There's a problem at Wikimedia Commons with linking to Wikipedia articles with
Unicode characters. For example, a link to [[w:pl:Stanis%C5%82aw Lem]] links to
[[w:pl:Stanis]] instead. External links like
[ Stanis%C5%82aw Lem]] can of course be
used, but it'd be better if they were no-arrow interwiki links.
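
For illustration, here is a minimal Python sketch of what the percent-encoded title in the report stands for. The assumption (not confirmed at this point in the report) is that the title is cut at the first character that a Latin-1 based wiki cannot represent; the variable names are made up for the example.

```python
from urllib.parse import unquote

# The title as written in the reported link.
raw_title = "Stanis%C5%82aw Lem"

# %C5%82 is the UTF-8 byte sequence for the Polish letter "ł" (U+0142).
title = unquote(raw_title, encoding="utf-8")

# Latin-1 only covers U+0000..U+00FF, so "ł" has no Latin-1 representation.
# Assumption: the broken link keeps only the text before that character.
first_bad = next(i for i, ch in enumerate(title) if ord(ch) > 0xFF)
visible_part = title[:first_bad]

print(title)         # Stanisław Lem
print(visible_part)  # Stanis
```

Under that assumption the truncated result matches the [[w:pl:Stanis]] target described above.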
Comment 1 Brion Vibber 2004-10-05 01:03:49 UTC
*** Bug 645 has been marked as a duplicate of this bug. ***
Comment 2 Brion Vibber 2004-10-05 01:04:58 UTC
From duplicate bug 645:

Dear friends,

at the 
first remark named "link to [[w:ro:Discuţie 
Utilizator:Gangleri]] ~ [[w:ro:User_talk:Gangleri]]" illustrates 
that some special characters such as "ţ" cause problems in InterWiki 

If this issue is already known please let me know where to read 
about such "exceptions".

It might be somehow related to too because it 
shows the same effect.

Best regards Reinhardt
Comment 3 Brion Vibber 2004-10-05 01:06:16 UTC
Looks like a problem with the way the redirect is handled through
Comment 4 Jim Scarborough 2004-10-10 19:32:58 UTC
has a link "[[Juraj Beneš]]"
which displays correctly on that page as "Juraj Beneš" but links incorrectly to
"Juraj Beneš" ("Juraj_Bene%C5%A1").
Comment 5 lɛʁi לערי ריינהארט 2004-11-09 21:11:32 UTC
I made a central place [[Wikipedia:Invalid article names]] in 
[[:Category:Wikipedia maintenance]] where such broken links can be listed.
This is an equivalent to [[Wikipedia:Duplicate articles]]. I know some 
others. I suppose they have been created with previous versions of WikiMedia 
Regards Reinhardt
Comment 6 lɛʁi לערי ריינהארט 2004-11-09 21:53:46 UTC
Notes: some weeks ago I made some remarks at 
rks#Invalid links (lists)]]. You may read some other related sections as well.

The problem is both the existence of pages which do not comply with UTF-8 (?) 
and the usage of such links in en.wikipedia and directed to en.wikipedia from 
other projects.

It should be much easier to verify the namespace than to use a bot to scan for 
such links inside the articles, talks, categories ... .
Regards ~~~~ (Reinhardt)
Comment 8 lɛʁi לערי ריינהארט 2004-11-15 01:10:42 UTC
Dear friends,

a) Do we need a second test environment?
b) Should InterWiki translation be fixed in 1.4-cvs?
shows that we are testing on a Unicode environment.

This means that issues related to InterWiki translations through 
en.wikipedia CAN NOT BE TESTED HERE.

My proposal:

Please make an environment and make the translations 
from [[test:]] to [[xx:]] as [[:fr:]], [[:pl:]], [[:ro:]], [[:ru:]], 
[[:he:]], [[:ja:]], [[:bg:]] ... through because 
problems regarding translations to those targets (except [[:fr:]]?) are 

There are a lot of bugs which can be included then:
- bug 563,
- translations related to the anchor part of a link, where the anchor 
contains special characters such as (, ), ", ', Unicode characters and so on.
- ???

Regards Reinhardt
Comment 9 lɛʁi לערי ריינהארט 2004-11-15 11:34:58 UTC
I was thinking about this again. It seems that three test environments are 
needed in order to fix this / to emulate all combinations of final 

B through which translations are made;
A and C in order to test:

- A translated through B to C
- C translated through B to A
- A translated through B to B
- C translated through B to C
- B translated to A
- B translated to C

If B will be a UTF-8 environment as it is now in 1.3.7. B should be a UTF-8 
too.

Regards Reinhardt
Comment 10 Kjell ANDRÉ 2005-01-06 14:00:15 UTC
(In reply to comment #3)
> Looks like a problem with the way the redirect is handled through

When testing a few variants of interwiki links you can see that:

: [[w:pl:Stanis%C5%82aw Lem]] and [[:en:pl:Stanis%C5%82aw Lem]] do not work.
: but [[:pl:Stanis%C5%82aw Lem]] and [[de:pl:Stanis%C5%82aw Lem]] do.

Obviously the first (or only) prefix determines how the name is interpreted. If it is a language prefix for a UTF8 
based wiki it works since the name is valid for that wiki; if it is a prefix for a Latin-1 based wiki, the name is 
not valid for that wiki and is instead cut off at the invalid character, hence the broken link.

To correct this, the normalization of names must be based on the last (or only) language prefix, not the first.
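
A rough Python sketch of the difference between the two rules, for illustration only. It is not the MediaWiki code: the WIKI_ENCODING table and all function names are hypothetical, and the encodings are simply inferred from the four test links above (w and en treated as Latin-1, pl and de as UTF-8).

```python
from urllib.parse import unquote

# Hypothetical prefix table; encodings inferred from the test links above.
WIKI_ENCODING = {"w": "latin-1", "en": "latin-1", "pl": "utf-8", "de": "utf-8"}


def split_prefixes(link):
    """Split 'w:pl:Title' into (['w', 'pl'], 'Title')."""
    parts = link.lstrip(":").split(":")
    prefixes = []
    while len(parts) > 1 and parts[0] in WIKI_ENCODING:
        prefixes.append(parts.pop(0))
    return prefixes, ":".join(parts)


def valid_part(title, encoding):
    """Keep the title up to the first character the encoding cannot represent."""
    kept = []
    for ch in title:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            break
        kept.append(ch)
    return "".join(kept)


def normalize(link, use_last_prefix):
    """Normalize the title using either the first or the last prefix."""
    prefixes, raw_title = split_prefixes(link)
    title = unquote(raw_title, encoding="utf-8")
    if not prefixes:
        return title
    deciding = prefixes[-1] if use_last_prefix else prefixes[0]
    return valid_part(title, WIKI_ENCODING[deciding])


for link in ("w:pl:Stanis%C5%82aw Lem", ":en:pl:Stanis%C5%82aw Lem",
             ":pl:Stanis%C5%82aw Lem", "de:pl:Stanis%C5%82aw Lem"):
    print(link, "|", normalize(link, False), "|", normalize(link, True))
```

With the first-prefix rule the two links that start with w or en come out as "Stanis"; with the last-prefix rule all four keep the full title.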

Best regards, Kjell ANDRÉ

Comment 11 Brion Vibber 2005-02-27 09:59:56 UTC
Now that bug 65 is fixed this was pretty easy to add on top. Checked in and put live.
