Last modified: 2014-11-16 00:08:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T36919, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 34919 - Language conversion is not applied in documents delivered by the Collection extension
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Collection
Version: unspecified
Hardware: All All
Importance: Normal major with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Depends on: 41716
Blocks:
Reported: 2012-03-03 01:22 UTC by Ziyuan Yao
Modified: 2014-11-16 00:08 UTC
CC List: 14 users



Attachments
Корисник:Никола Смоленски/Collection bugs.pdf (42.95 KB, application/pdf)
2014-09-25 20:28 UTC, Nemo

Description Ziyuan Yao 2012-03-03 01:22:43 UTC
After Bug 33430 was fixed, the Chinese Wikipedia community says there is still another problem that prevents them from adopting the latest MediaWiki version that provides PDF/ebook creation for the Chinese Wikipedia.

The remaining problem is that the wikitext of the Chinese Wikipedia mixes simplified and traditional Chinese (mainland editors tend to contribute in simplified Chinese, while Taiwanese and Hong Kong editors tend to contribute in traditional Chinese), so it needs to be converted to all-simplified or all-traditional before being displayed or made into PDFs.
Comment 1 Liangent 2012-03-03 04:33:52 UTC
Language converter is not only used on zhwiki.
Comment 2 Volker Haas 2012-03-05 08:20:18 UTC
Is the conversion to all-simplified or all-traditional done for "regular" display in the browser, and is it therefore only a problem with the PDFs at the moment? If that is the case:

* How is the conversion done for the browser?
* Can someone provide a minimal example with simplified and traditional Chinese?
* What would be a good starting point for reading up on the issues of simplified vs. traditional Chinese and the conversion methods?
Comment 3 Ziyuan Yao 2012-03-05 08:40:29 UTC
The Chinese Wikipedia itself already has a simplified <-> traditional Chinese automatic conversion tool for displaying. It is explained here:

http://meta.wikimedia.org/wiki/Automatic_conversion_between_simplified_and_traditional_Chinese

An example of the conversion in action:

Simplified: http://zh.wikipedia.org/zh-cn/%E4%BA%94%E4%BB%A3%E5%8D%81%E5%9B%BD

Traditional: http://zh.wikipedia.org/zh-tw/%E4%BA%94%E4%BB%A3%E5%8D%81%E5%9B%BD
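The two example URLs above differ only in the variant path segment. A minimal Python sketch of how such variant URLs could be built follows; the helper name `variant_url` and its parameters are hypothetical, and the URL layout is simply the one visible in the examples, not a documented API.

```python
from urllib.parse import quote


def variant_url(title, variant, host="zh.wikipedia.org"):
    """Build a URL of the form http://<host>/<variant>/<percent-encoded title>.

    `variant` is a path segment such as "zh-cn" or "zh-tw", as seen in the
    example links above. Spaces become underscores, as in wiki page titles.
    """
    encoded = quote(title.replace(" ", "_"))  # UTF-8 percent-encoding
    return "http://{}/{}/{}".format(host, variant, encoded)


if __name__ == "__main__":
    for variant in ("zh-cn", "zh-tw"):
        print(variant_url("五代十国", variant))
```

Fetching both URLs and comparing the rendered text is one way to see the converter's effect on a single article.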
Comment 4 Liangent 2012-03-05 09:14:41 UTC
(In reply to comment #2)
> Is the conversion to all-simplified or all-traditional done for "regular"
> display in the browser - and therefore only a problem with the PDFs at the
> moment? If that is the case: 
> 
> * how is the conversion done for the browser
> * can someone provide a minimal example with simplified and traditional Chinese
> * what would be a good start to read in order to understand the problem of
> simplified vs. traditional Chinese and conversion methods

Technically, the language conversion process runs after the normal parsing process. This means that if you parse the article in your own way (to generate a PDF), you have to apply the conversion to your parser result yourself. Note that the current converter (in languages/LanguageConverter.php) is designed to convert HTML only.
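To illustrate what "conversion after parsing" on HTML could look like, here is a rough Python sketch that converts only the text between tags and leaves the tags (including attribute values) untouched. It is not how LanguageConverter.php works internally: the three-entry table, the character-by-character lookup, and the names `TOY_TABLE`, `convert_text` and `convert_html` are all assumptions made for the example. A real converter uses large, phrase-based tables.

```python
import re

# Toy simplified -> traditional table (assumption; real tables are far larger
# and match whole phrases, not single characters).
TOY_TABLE = {"国": "國", "汉": "漢", "语": "語"}

# The capturing group makes re.split() keep the tags in its result.
TAG_RE = re.compile(r"(<[^>]*>)")


def convert_text(text, table):
    """Replace each character that has an entry in the table."""
    return "".join(table.get(ch, ch) for ch in text)


def convert_html(html, table):
    """Convert text nodes only; tags are passed through unchanged."""
    parts = TAG_RE.split(html)
    # re.split() with one capturing group alternates text, tag, text, ...
    # so the odd-indexed parts are the tags.
    return "".join(
        part if index % 2 else convert_text(part, table)
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(convert_html('<a title="国">汉语</a>', TOY_TABLE))
```

A renderer that parses wikitext itself would need an equivalent pass over its own output tree instead of over HTML.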
Comment 5 Ziyuan Yao 2012-03-05 09:18:53 UTC
I'm sure there are many PHP-based simplified/traditional Chinese conversion libraries.
Comment 6 Liangent 2012-03-05 09:21:17 UTC
(In reply to comment #5)
> I'm sure there are many PHP-based simplified/traditional Chinese conversion
> libraries.

mwlib (the wikitext parser & PDF generator used by Extension:Collection) is not written in PHP. Besides, you have to consider conversion markup such as -{}-.
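As a rough illustration of why the -{}- markup matters, the Python sketch below handles only its two simplest forms: `-{text}-`, which leaves the text unconverted, and `-{variant:text; variant:text}-`, which picks the text for the requested variant. Flags, nesting, fallback chains and on-site conversion tables are not covered. The variant list, the one-entry table and all function names are assumptions made for the example, not mwlib or MediaWiki code.

```python
import re

# Hypothetical set of variant codes recognised as rule keys in this sketch.
VARIANTS = {"zh-hans", "zh-hant", "zh-cn", "zh-tw", "zh-hk", "zh-sg"}

MARKUP_RE = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def convert_plain(text, table):
    """Character-by-character toy conversion (real converters match phrases)."""
    return "".join(table.get(ch, ch) for ch in text)


def resolve_markup(body, variant):
    """Return the output for the inside of one -{...}- block."""
    rules = {}
    for item in body.split(";"):
        key, sep, value = item.partition(":")
        if sep and key.strip() in VARIANTS:
            rules[key.strip()] = value.strip()
    if not rules:
        return body  # plain -{text}-: keep the text as written
    # Simplification: fall back to the first rule if the variant is missing.
    return rules.get(variant, next(iter(rules.values())))


def convert_wikitext(text, variant, table):
    """Convert text outside -{...}- and resolve each -{...}- block."""
    out, pos = [], 0
    for match in MARKUP_RE.finditer(text):
        out.append(convert_plain(text[pos:match.start()], table))
        out.append(resolve_markup(match.group(1), variant))
        pos = match.end()
    out.append(convert_plain(text[pos:], table))
    return "".join(out)


if __name__ == "__main__":
    table = {"国": "國"}  # toy simplified -> traditional table
    print(convert_wikitext("中国-{国}-", "zh-tw", table))
    print(convert_wikitext("-{zh-cn:鼠标; zh-tw:滑鼠}-", "zh-tw", table))
```

Even this reduced version shows that a plain character-mapping library is not enough on its own: the markup has to be parsed before any table lookup happens.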
Comment 7 Volker Haas 2012-03-05 09:58:08 UTC
The conversion script doesn't exactly look trivial: http://svn.wikimedia.org/doc/LanguageConverter_8php_source.html

Does anybody have an idea how to get the conversion done without having to reimplement the language converter in Python for mwlib?
Comment 8 Ziyuan Yao 2012-03-05 10:02:17 UTC
Google for an existing Python-based conversion library?
Comment 9 Ralf Schmitt 2012-03-05 10:05:23 UTC
or just ask for patches?
Comment 10 Ziyuan Yao 2012-03-05 10:08:23 UTC
Google Translate also offers simp. <-> trad. Chinese conversion. Maybe call its API?
Comment 11 Liangent 2012-03-05 11:17:25 UTC
(In reply to comment #10)
> Google Translate also offers simp. <-> trad. Chinese conversion. Maybe call its
> API?

Even in LanguageConverter.php, more code is devoted to tasks such as parsing conversion markup, selecting the proper parts to convert, reading the on-site conversion table and handling page links than to actually converting the text.
Comment 12 Ziyuan Yao 2012-03-07 04:44:20 UTC
I increasingly believe that such features would be better implemented on the client side, e.g. a "site to PDF ebook" program that converts a given site (blog, wiki, pages up to a certain depth from a start page, etc.) to a PDF.
Comment 13 Ziyuan Yao 2012-03-07 04:45:23 UTC
If you do it too far toward the back end, you have too much processing in the middle, like this Chinese conversion thing.
Comment 14 Volker Haas 2012-03-07 07:41:14 UTC
The problem with the "client-side" approach is that every client needs to re-implement these specific features (like the simplified/traditional conversion).

If we ever use HTML as the base for PDF rendering, this problem will be solved as long as MediaWiki takes care of the transformation. In the meantime I'd happily accept a patch for the problem, but I lack the time to implement the simplified/traditional conversion.
Comment 15 Ziyuan Yao 2012-03-07 07:52:07 UTC
(In reply to comment #14)
> The problem with the "client-side" approach is that every client needs to
> re-implement these specific features (like the simplified/traditional conversion).

No, because simplified/traditional conversion is already taken care of by the Chinese Wikipedia on the server side.

> 
> If we ever use HTML as the base for PDF rendering this problem will be solved
> as long as MediaWiki takes care of the transformation. In the meantime I'd
> happily accept a patch for the problem, but I lack the time to implement the
> simplified/traditional conversion.

That's exactly why I think third-party client-side or browser-side PDF/ebook creation solutions would provide what PediaPress hasn't provided.
Comment 16 Tian-Jian "Barabbas" Jiang 2012-09-08 02:29:51 UTC
FYI, before LanguageConverter.php, there was a quick-and-dirty trial in LanguageZh.php: https://bugzilla.wikimedia.org/show_bug.cgi?id=5343
Comment 18 Nemo 2014-09-25 20:28:44 UTC
Created attachment 16595 [details]
Корисник:Никола Смоленски/Collection bugs.pdf

Serbian test case PDF as produced by [[mw:OCG]]/rdf2latex/new PDF rendering.
