Last modified: 2014-08-28 00:19:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T33838, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 31838 - zh-tw still gives simplified Chinese in link titles on Facebook and Google
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All
OS: All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://news.gmane.org/group/gmane.sci...
Reported: 2011-10-20 03:43 UTC by Dan Jacobson
Modified: 2014-08-28 00:19 UTC
CC List: 6 users

Description Dan Jacobson 2011-10-20 03:43:07 UTC
Others will have to fill in the bug details as all this is over my head.
All I know is
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/56069
says
> How sad that the first answer here is a "Not our problem :-)!"...
so it must be our problem.
Comment 1 Mark A. Hershberger 2011-10-20 15:35:53 UTC
Resolving invalid until someone points out on this bug what, specifically, we are supposed to do.
Comment 2 Christian Neubauer 2011-10-20 17:24:43 UTC
The suggestion in the thread was to not include rel="canonical" on zh-tw or zh-hk pages since they don't really fit the definitions listed here:  https://www.google.com/support/webmasters/bin/answer.py?answer=189077.  Specifically, M. Williamson said:

 Umm what the link actually says is this:

 "This is recommended in the following scenarios:
 - You translate only the template of your page, such as the navigation and
 footer, and keep the bulk of your content in a single language. This is
 common on pages that feature user-generated content.
 - Your page targets users in multiple regions (for example, en-us, en-uk,
 and en-ie), but each regional version differs only in small details, such as
 the currency used."

 Neither of these are true; the entire contents of the whole page are
 different (therefore the first scenario does not apply), and Simplified vs.
 Traditional is a non-trivial difference not at all analogous to "small
 details such as the currency used" (therefore the second scenario does not
 apply either).
Comment 3 Daniel Friesen 2011-10-20 18:14:54 UTC
Those are "recommendations", pure guidelines, and they are not an exhaustive list of precisely when that should be used.
The intention of that list is essentially to tell you that it's incorrect to use the hreflang pattern if the entire contents are human-translated. In other words, it's incorrect to use this pattern to point rel=canonical to en.wp, hreflang=de to de.wp, hreflang=ja to ja.wp, etc., because the content of those pages is not guaranteed to be the same: it's written by different communities, and manual updates can fall out of sync. Google does not want to send users to two different pages when one is manually translated and may be out of date.
Script conversion applied to the whole content is NOT mentioned on that page, so nothing about it can be inferred without asking Google directly.
Comment 4 Dan Jacobson 2011-10-21 03:18:57 UTC
All I know is that showing Taiwan users Simplified Chinese previews on
Facebook links, no matter how hard one tries to avoid creating them, is
probably just as bad as showing Indian Hindi users Pakistani Urdu
previews. Even though the languages might sound the same, when users
see the wrong "alphabet" they say "I'm not going to click on that, that's
meant for people who live in a different region". They will not take
more than one second to decide and will click elsewhere.
Comment 5 Mark 2011-10-21 08:49:29 UTC
The point is that any link to the "canonical" version of the page has a relatively good chance of being partly unintelligible to readers. The entire contents of the page are different; as I stated in my e-mail, these are non-trivial differences. Imagine if linking to an English Wikipedia article displayed a link to a page in French...
Comment 6 Dan Jacobson 2011-10-22 01:19:53 UTC
At least make sure the two countries involved have established diplomatic relations.

Otherwise, if their hard work is shown as representing the larger country, all the effort the smaller country has put into editing Wikipedia is wiped out.
Comment 7 Daniel Friesen 2011-10-22 02:25:24 UTC
I should make something clear.
Doing a search for links from other sites to zh.wp gives me various forms of links:
- Some pointing to http://zh.wikipedia.org/wiki/*
- Some pointing to http://zh.wikipedia.org/zh/*
- Some pointing to http://zh.wikipedia.org/zh-tw/*
- Some pointing to http://zh.wikipedia.org/zh-hant/*
- etc...

If rel=canonical is removed, search engines will begin to treat the same article as a different article depending on its link. That includes /wiki/ being treated as a different article from the same article in the default variant.

This can have a very negative effect on zh.wp in searches. Because of the separation between /wiki/ and the default variant, zh.wp may be penalized for what appears to be duplicate content. Users may end up with multiple, unintuitive results for the same article. zh.wp pages may also start to rank lower in search engines: as people link to different variants and paths, the incoming ranking for a single page is split among multiple versions of itself, giving each of them a lower rank.
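To make the ranking split described above concrete, here is a minimal Python sketch. The inbound-link URLs are made up, and `canonical_key` is a hypothetical helper that collapses the /wiki/, /zh/, /zh-tw/, /zh-hant/ (and similar) paths into one article key, which is roughly the consolidation a search engine performs when it honors rel=canonical. It only counts links; it is not how any search engine actually computes rank.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit

# Path prefixes assumed to serve the same zh.wikipedia article
# (the link forms listed above plus a few other variants).
VARIANT_PREFIXES = ("wiki", "zh", "zh-tw", "zh-hant", "zh-hans", "zh-cn", "zh-hk")

def canonical_key(url: str) -> str:
    """Hypothetical helper: collapse any variant path to one article key."""
    parts = urlsplit(url)
    prefix, _, title = parts.path.lstrip("/").partition("/")
    if prefix in VARIANT_PREFIXES and title:
        return f"{parts.netloc}/wiki/{unquote(title)}"
    return url

# Made-up inbound links to one article under different URL forms.
inbound = [
    "http://zh.wikipedia.org/wiki/Example",
    "http://zh.wikipedia.org/zh/Example",
    "http://zh.wikipedia.org/zh/Example",
    "http://zh.wikipedia.org/zh-tw/Example",
    "http://zh.wikipedia.org/zh-hant/Example",
]

# Without rel=canonical: each URL form collects only its own links.
split_counts = Counter(inbound)
# With rel=canonical: every link is credited to the single article.
merged_counts = Counter(canonical_key(u) for u in inbound)

print(max(split_counts.values()))                        # 2
print(merged_counts["zh.wikipedia.org/wiki/Example"])    # 5
```

In this toy case the strongest single URL gets 2 of the 5 links, while the consolidated article gets all 5.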

This will also have a negative effect on the results users see. Instead of being based on the language a user has selected, search results will be based purely on how pages rank. In other words, if most people link to a /zh/ page, a user visiting Google with zh-tw will see search results linking to /zh/ because the /zh-tw/ page doesn't rank as well.


Some possible actions:
- Remove rel=canonical and accept the issues this will cause for zh.wp. - This will require a discussion on zh.wp and community consensus. As this change is already possible with MW, this bug will be closed as WORKSFORME, and if the zh.wp community reaches consensus, a shell bug to change the setting of $wgCanonicalLanguageLinks on zh.wp can be opened.
- Facebook and the search engines that implement rel=canonical can implement support for rel=alternate hreflang= so that they always serve URLs relevant to the visitor's language. - This bug will be closed as INVALID, and someone can open a bug report in the relevant place.
- We could implement Content-Language based variant redirection on the /wiki/ path so that visiting users are redirected automatically to their variant. - There may, however, be technical reasons this is not possible. Users following links from sites that point to a specific variant may still end up on a variant they don't want.
- Implementing og:url pointing to the current variant's URL may be possible. Facebook and Google's +1 button both seem to support this, so it's a possibility. - This, however, will interfere with Open Graph and have an effect similar to the search engine one on the number of shares/likes and +1s a page has (i.e., if 25 people like one article on zh.wp, 3 of them on the zh-tw page and the rest on the zh page, a user who can see how many likes a page got will see 3 likes when visiting the zh-tw page, when in reality that page got 25 likes). This will also not fix any issue in search engines.
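The third option (redirecting /wiki/ to the reader's variant) could look roughly like the Python sketch below. It assumes the reader's preference is read from the Accept-Language request header, which is the header browsers send to advertise the user's languages; the variant list and the `parse_accept_language`, `pick_variant` and `redirect_target` helpers are hypothetical and do not reflect MediaWiki's actual language-converter code.

```python
from typing import List, Optional
from urllib.parse import quote

# Assumed set of variants served as zh.wikipedia.org/<variant>/<title>.
SUPPORTED_VARIANTS = ("zh-cn", "zh-tw", "zh-hk", "zh-sg", "zh-hans", "zh-hant")

def parse_accept_language(header: str) -> List[str]:
    """Return the tags in an Accept-Language header, most preferred first.

    Tags with q=0 ("not acceptable") are dropped; ties keep header order.
    """
    weighted = []
    for index, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        if tag and q > 0:
            weighted.append((-q, index, tag))
    return [tag for _, _, tag in sorted(weighted)]

def pick_variant(header: str) -> Optional[str]:
    """First preferred tag that is also a supported variant, else None."""
    for tag in parse_accept_language(header):
        if tag in SUPPORTED_VARIANTS:
            return tag
    return None

def redirect_target(title: str, header: str) -> Optional[str]:
    """Hypothetical: where a request for /wiki/<title> would be redirected.

    None means no redirect, i.e. serve /wiki/ in the default variant.
    """
    variant = pick_variant(header)
    return f"/{variant}/{quote(title)}" if variant else None

print(redirect_target("Example", "zh-TW,zh;q=0.9,en;q=0.8"))  # /zh-tw/Example
print(redirect_target("Example", "en-US,en;q=0.9"))           # None
```

As the comment notes, this only helps readers who land on /wiki/; links that already name a variant are left alone. One likely complication is caching, since responses for /wiki/ would then have to vary on the Accept-Language header.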


By the way, rel=canonical was added in r75617 as an alternative because there seemed to be issues with implementing conditional redirection. Relevant bug: bug 21672.
Comment 8 Dan Jacobson 2011-10-22 03:23:52 UTC
I even selected Google's "Only Traditional Chinese Results".

And what do I get? Wikipedia's Simplified Chinese title, however with
Traditional Chinese content.

So fix those atrocious titles!

http://www.google.com.tw/search?q=趙少康&hl=zh-TW&ie=UTF-8&oe=UTF-8&prmd=ivns&source=lnt&tbs=lr:lang_1zh-TW&lr=lang_zh-TW&sa=X&ei=iDWiTuyIMKfgmAX75e2fCQ&ved=0CAkQpwUoAg

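      + Web (網路)
      + All Chinese pages (所有中文網頁)
-->   + Traditional Chinese pages (繁體中文網頁)
      + Pages from Taiwan (台灣的網頁)
      + Translated foreign pages (外文網頁翻譯版)

Traditional Chinese pages (繁體中文網頁)

Search results (搜尋結果)

 1. 趙少康- 维基百科,自由的百科全书 (Jaw Shaw-kong - Wikipedia, the free encyclopedia)

    You publicly +1'd this. Undo (您公開 +1 了這個項目。 復原)

    趙少康(1950年11月16日-),生於台灣,中華民國前政治人物,在政壇上有「政治金童」之稱。曾代表新黨參選1994年台北市長選舉。
    現為著名媒體人。父親為河南涉縣( ...
    [Jaw Shaw-kong (born November 16, 1950), born in Taiwan, is a former politician of the Republic of China, known in politics as the "political golden boy". He ran in the 1994 Taipei mayoral election for the New Party. He is now a well-known media personality. His father was from She County, Henan ( ...]

    生平 - 家庭 - 相關條目 - 参考資料 (Life - Family - Related articles - References)
    zh.wikipedia.org/zh-tw/趙少康 - Cached (頁庫存檔) - Similar (類似內容)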
Comment 9 Daniel Friesen 2011-11-07 18:56:37 UTC
Side note: the relevant bug explicitly asking for rel=alternate hreflang=* appears to be bug 27362.
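For reference, the rel=alternate hreflang markup that bug 27362 asks for would look roughly like the output of this Python sketch. The variant codes are taken from the zh.wp URL paths mentioned above and used directly as hreflang values; the exact variant list and the `hreflang_links` helper are assumptions for illustration, not what MediaWiki emits.

```python
from html import escape
from typing import Iterable, List
from urllib.parse import quote

BASE = "http://zh.wikipedia.org"
# Assumed variant list; each variant is served at /<variant>/<title>.
VARIANTS = ("zh-hans", "zh-hant", "zh-cn", "zh-hk", "zh-sg", "zh-tw")

def hreflang_links(title: str, variants: Iterable[str] = VARIANTS) -> List[str]:
    """Build one <link rel="alternate" hreflang=...> tag per variant of a page."""
    path = quote(title)  # percent-encode non-ASCII titles such as 趙少康
    tags = []
    for variant in variants:
        href = f"{BASE}/{variant}/{path}"
        tags.append(f'<link rel="alternate" hreflang="{variant}" href="{escape(href)}">')
    return tags

for tag in hreflang_links("Example"):
    print(tag)
```

Every variant page would carry the full set of tags, so in principle a crawler that honors hreflang could hand a Taiwan visitor the zh-tw URL while still treating all the variants as one article.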
