Last modified: 2014-10-17 11:43:52 UTC

Bug 68922 - Many non-latin fonts don't cover the latin character set
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Collection
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: C. Scott Ananian
Keywords: i18n
Depends on:
Blocks: 28206
Reported: 2014-07-31 15:47 UTC by C. Scott Ananian
Modified: 2014-10-17 11:43 UTC (History)

Description C. Scott Ananian 2014-07-31 15:47:05 UTC
The XeLaTeX backend for the new OCG PDF renderer does not fall back to another font when the font selected for a given language does not contain a given code point.  Many of the candidate fonts for a given language do not include the Latin code pages.  As a result, page numbers, dates, citation numbers, and even bullets in lists render as tofu (blank square boxes).

The LaTeX renderer should keep track of which code pages are present in a font, and add explicit font-switch commands to the output when needed.  (This includes redefining the command used for bullets in lists to ensure it is rendered in a Latin font.)

Ideally we could automatically generate coverage tables from a font.  But at *least* we should treat the Latin code points as a special case, and fall back to the default Latin font when Latin code points are used with a font that lacks Latin code page coverage.

(This has been an issue for Russian, Indic languages, the Google Noto fonts, etc., and accounts for most of the present difficulty in choosing an appropriate font for a particular wiki language.)
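
A minimal sketch of the font-switch idea described above, written in Python for illustration only (the renderer itself is not Python). It assumes the set of code points covered by the main font is already known, and that a hypothetical `\latinfallback` font command has been defined in the preamble, e.g. with fontspec's `\newfontfamily`. The names `split_runs`, `to_xelatex`, and `DEVANAGARI` are invented for this example, and escaping of LaTeX special characters is left out.

```python
# Assumed coverage: pretend the main font maps only the Devanagari block.
DEVANAGARI = set(range(0x0900, 0x0980))


def split_runs(text, coverage):
    """Split text into (covered, substring) runs.

    Whitespace stays with the run it follows, so a space between two
    words does not trigger a font switch on its own.
    """
    runs = []
    for ch in text:
        if ch.isspace() and runs:
            covered = runs[-1][0]
        else:
            covered = ch.isspace() or ord(ch) in coverage
        if runs and runs[-1][0] == covered:
            runs[-1][1].append(ch)
        else:
            runs.append((covered, [ch]))
    return [(covered, "".join(chars)) for covered, chars in runs]


def to_xelatex(text, coverage):
    """Wrap runs the main font cannot render in the fallback font group."""
    out = []
    for covered, chunk in split_runs(text, coverage):
        if covered:
            out.append(chunk)
        else:
            # \latinfallback is a hypothetical command, not from the patch.
            out.append("{\\latinfallback " + chunk + "}")
    return "".join(out)


# A Devanagari word followed by a page number in Latin digits.
print(to_xelatex("\u092a\u0943\u0937\u094d\u0920 12", DEVANAGARI))
```

With this approach the digits end up inside a `{\latinfallback ...}` group while the Devanagari word is emitted unchanged; the same wrapping could be applied to the bullet command for lists.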
Comment 1 Gerrit Notification Bot 2014-08-02 20:27:15 UTC
Change 151360 had a related patch set uploaded by Cscott:
Use Lohit fonts when possible.

https://gerrit.wikimedia.org/r/151360
Comment 2 Gerrit Notification Bot 2014-08-03 08:44:12 UTC
Change 151360 merged by jenkins-bot:
Use Lohit fonts when possible.

https://gerrit.wikimedia.org/r/151360
Comment 3 C. Scott Ananian 2014-08-13 21:40:17 UTC
The above patches partially fix the problem: they switch to the default Latin font for Latin code pages.  But we should really enumerate the full set of code points mapped by a font.  That's part two of fixing this bug.
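
As a rough illustration of what enumerating the mapped code points could look like, the sketch below (Python, for illustration only) compresses a code point to glyph name mapping into sorted ranges and answers coverage queries against them. The mapping here is a hand-written stand-in; with the fontTools library one could presumably obtain a real one from `TTFont.getBestCmap()`, but that step is an assumption and is not part of this example. The names `coverage_ranges` and `covers` are hypothetical.

```python
def coverage_ranges(codepoints):
    """Collapse an iterable of code points into sorted inclusive ranges."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp  # extend the current range by one
        else:
            ranges.append([cp, cp])  # start a new range
    return [(lo, hi) for lo, hi in ranges]


def covers(ranges, ch):
    """Return True if the character falls inside any covered range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


# Stand-in cmap: a font that maps A-C, "|" and "}" but not "~".
sample_cmap = {
    0x41: "A", 0x42: "B", 0x43: "C",
    0x7C: "bar", 0x7D: "braceright",
}

table = coverage_ranges(sample_cmap)
print(table)
print(covers(table, "B"), covers(table, "~"))
```

A table like this could be generated once per font and consulted per character, instead of special-casing only the Latin code pages.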
Comment 4 C. Scott Ananian 2014-08-13 21:46:58 UTC
Note that the default Latin font doesn't cover the ~ character (!), which is used in https://en.wikipedia.org/wiki/Moon#Internal_structure in the sentence, "this is only ~20% the size of the Moon, in contrast to the ~50% of most other terrestrial bodies".
Comment 5 Nemo 2014-08-21 10:11:46 UTC
This sort of font hardcoding really doesn't scale... https://gerrit.wikimedia.org/r/#/c/151360/1/lib/index.js,cm

Is it really impossible to load the appropriate fonts as installed on the server? We already install them for EasyTimeline etc. (e.g. bug 20825).
See also the ULS fontrepo: https://git.wikimedia.org/tree/mediawiki%2Fextensions%2FUniversalLanguageSelector/HEAD/data%2Ffontrepo
