Last modified: 2012-06-04 01:13:47 UTC
PDF export extension doesn't support لا in Arabic script. https://secure.wikimedia.org/wikibooks/ar/w/index.php?title=%D8%AE%D8%A7%D8%B5:%D9%83%D8%AA%D8%A7%D8%A8&bookcmd=download&collection_id=b1bc7ad46bebae20&writer=rl&return_to=%D8%B3%D9%84%D9%81%D9%86%D9%8A+3+%D8%AC%D9%86%D9%8A%D9%87%3A+%D8%A7%D9%84%D8%A5%D8%AA%D8%B5%D8%A7%D9%84%D8%A7%D8%AA+%D9%88%D8%A7%D9%84%D9%85%D8%AC%D8%AA%D9%85%D8%B9+%D9%81%D9%8A+%D9%85%D8%B5%D8%B1
The PDF export extension has a problem with "*" on https://secure.wikimedia.org/wikibooks/ar/wiki/%D8%B3%D9%84%D9%81%D9%86%D9%8A_3_%D8%AC%D9%86%D9%8A%D9%87:_%D8%A7%D9%84%D8%A5%D8%AA%D8%B5%D8%A7%D9%84%D8%A7%D8%AA_%D9%88%D8%A7%D9%84%D9%85%D8%AC%D8%AA%D9%85%D8%B9_%D9%81%D9%8A_%D9%85%D8%B5%D8%B1 : page 4 of the PDF version looks messy.
*** Bug 17766 has been marked as a duplicate of this bug. ***
And the PDFs are still LTR instead of RTL.
Could any of you please do me two favors:

1) provide a minimal example:
a) create a page in your user space containing exactly a single arabic word which has one character missing in the PDF OR
b) post this example word right here

2) I also need a suggestion for a font which is suitable for the arabic script. This font ideally completely covers the following unicode blocks:
* Arabic http://www.unicode.org/charts/PDF/U0600.pdf
* Arabic Supplement http://www.unicode.org/charts/PDF/U0750.pdf
* Arabic Presentation Forms-A http://www.unicode.org/charts/PDF/UFB50.pdf
* Arabic Presentation Forms-B http://www.unicode.org/charts/PDF/UFE70.pdf
The most important thing is that this font is guaranteed to include the missing glyph in the example provided in 1).

----
I fixed the problem for fa.wikipedia.org (LTR instead of RTL) but didn't update the render servers yet. I'll do that later today probably.
(In reply to comment #4)
> Could any of you please do me two favors:
>
> 1) provide a minimal example:
> a) create a page in your user space containing exactly a single arabic word
> which has one character missing in the PDF OR
> b) post this example word right here
>
> 2) I also need a suggestion for a font which is suitable for the arabic script.
> This font ideally completely covers the following unicode blocks:
> * Arabic http://www.unicode.org/charts/PDF/U0600.pdf
> * Arabic Supplement http://www.unicode.org/charts/PDF/U0750.pdf
> * Arabic Presentation Forms-A http://www.unicode.org/charts/PDF/UFB50.pdf
> * Arabic Presentation Forms-B http://www.unicode.org/charts/PDF/UFE70.pdf
> The most important thing is that this font is guaranteed to include the missing
> glyph in the example provided in 1).
>
> ----
> I fixed the problem for fa.wikipedia.org (LTR instead of RTL) but didn't update
> the render servers yet. I'll do that later today probably.

1) I wrote "لا" in my user space ( http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf ).
2) The problem is with the ligature ( http://en.wikipedia.org/wiki/Typographic_ligature ) connection ل + ا ==> لا. It is a rendering problem, not a missing glyph.
3) The best font, which fa.wiki and fa.book now use for printing ( http://fa.wikipedia.org/wiki/%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%8C:Print.css ) ( http://fa.wikibooks.org/wiki/%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%8C:Print.css ), is Nazli, which is also used in SVG rendering. It is also beautiful for Arabic printing, but so far the Arabic wiki projects don't have any print CSS definition.
4) We have many RTL wikis, such as "mzn, glk, ckb, ur, pnb, arz, dv, ps, sd, ks, yi, ..", that need RTL support.
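The distinction drawn above between a missing glyph and a shaping problem for ل + ا can be illustrated with Python's standard unicodedata module. This is only a sketch and is not taken from mwlib: the word is stored as two ordinary code points, and the joined form is a separate presentation-form ligature that a renderer has to substitute. The constant and function names are illustrative.

```python
import unicodedata

LAM = "\u0644"                # ARABIC LETTER LAM
ALEF = "\u0627"               # ARABIC LETTER ALEF
LAM_ALEF_ISOLATED = "\ufefb"  # presentation-form ligature from Arabic Presentation Forms-B


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text]


word = LAM + ALEF
for code_point, name in describe(word):
    print(code_point, name)

# The ligature's compatibility decomposition is exactly LAM followed by ALEF,
# so the source text contains no "missing" character; shaping picks the glyph.
print(unicodedata.normalize("NFKD", LAM_ALEF_ISOLATED) == word)
```

A font can therefore contain both letters and still show a box if the rendering step maps the pair to a ligature glyph it cannot find or encode.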
1- It also has a problem with ZWNJ ( http://en.wikipedia.org/wiki/Zero-width_non-joiner ).
1-1- Sample: http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf
2- Would you please tell me where we can translate words like (fetching, rendering, ..)? I checked http://translatewiki.net/w/i.php?title=Special:Translate&group=ext-collection-other&language=fa&task=view&offset=0&limit=500#msg_coll-rendering_finished_title but I couldn't find where to translate them.
Thanks a lot for compiling the page with the problematic markup!

Regarding the font choice: Nazli is already used...therefore the problem with the missing glyphs is probably not caused by the font - I'll investigate. The only problem with Nazli seems to be that there is no bold variant. Can you confirm that? Wouldn't that be a problem?

I also added the above mentioned languages to the list of RTL languages [1]

Words like fetching, rendering can currently not be localized. The problem is that currently only the rendering software is localized. The component which fetches the resources and starts the rendering process is not localized.

Note: I found out that the width calculation for Arabic currently fails; that is probably the reason for the strange alignment of all text. I am currently trying to solve that.

[1] http://code.pediapress.com/git/mwlib.rl?p=mwlib.rl;a=commit;h=1abb5024b50b82bc117c730c8c255529fb3ec484
(In reply to comment #7)
> Thanks a lot for compiling the page with the problematic markup!
>
> Regarding the font choice: Nazli is already used...therefore the problem with
> the missing glyphs is probably not caused by the font - I'll investigate. The
> only problem with Nazli seems to be that there is not bold variant. Can you
> confirm that? Wouldn't that be a problem?
>
> I also added the above mentioned languages to the list of RTL languages [1]
>
> Words like fetching, rendering can currently not be localized. The problem is
> that currently only the rendering software is localized. The component which
> fetches the resources and starts the rendering process is not localized.
>
> Note: I found out that the width calculation for arabic currently fails, that
> is probably the reason for the strange alignment of all text. I am currently
> trying to solve that.
>
> [1]
> http://code.pediapress.com/git/mwlib.rl?p=mwlib.rl;a=commit;h=1abb5024b50b82bc117c730c8c255529fb3ec484

Thank you for your responsive replies.
In http://meta.wikimedia.org/wiki/SVG_fonts Nazli has a bold variant. We also use it in http://fa.wikipedia.org/wiki/%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%8C:Print.css as ''.wikitable caption {font-weight: bold;}'' for wikitable captions, so it has a bold variant.
I listed other bugs at https://secure.wikimedia.org/wikibooks/ar/wiki/%D9%85%D8%B3%D8%AA%D8%AE%D8%AF%D9%85:Reza1615/test#problem_with_charts.27_alinment because it currently has RTL.
According to bug 745, please also add 'am', 'arc', 'bcc', 'bqi', 'dz', 'ha', 'he', 'ku', 'ug' as RTL wikis.
After the servers were updated, all of the bugs on fa.wiki are solved ( http://fa.wikipedia.org/wiki/%D8%AA%D9%88%DA%A9%D9%84_%D9%85%D8%A7_%D8%A8%D9%87_%D8%AE%D8%AF%D8%A7%D8%B3%D8%AA ) except:
1- Problem in rendering لا
2- Would you please convert page numbers and any numbers in the text, as formatnum does: 1==>۱ 2==>۲ 3==>۳ 4==>۴ 5==>۵ 6==>۶ 7==>۷ 8==>۸ 9==>۹ 0==>۰
3- Rendering of reordered non-spacing marks
4- PDF export on fa.wiki doesn't support ''' as bold
5- The infobox font is smaller than usual and its shape is wider than usual: http://fa.wikipedia.org/wiki/%D9%85%DB%8C%D9%84%D8%A7%D9%86
6- PDF export doesn't correctly support location maps on any wiki (including en.wiki): http://en.wikipedia.org/wiki/Ahvaz
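The digit mapping requested in item 2 (1==>۱ ... 0==>۰) can be sketched in Python with a translation table. This is a minimal illustration and not the extension's actual code; the name `to_persian_digits` is made up for the example, and a real implementation would also have to decide which numbers (page numbers, list counters, citations) pass through it.

```python
WESTERN_DIGITS = "0123456789"
# Extended Arabic-Indic (Persian) digits occupy U+06F0..U+06F9 in order.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))

_TO_PERSIAN = str.maketrans(WESTERN_DIGITS, PERSIAN_DIGITS)


def to_persian_digits(text):
    """Replace every Western digit in text with its Persian counterpart."""
    return text.translate(_TO_PERSIAN)


print(to_persian_digits("page 42"))
```

Non-digit characters pass through unchanged, so the same helper could be applied to a whole page-number string.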
Maybe the problem with لا comes from the Nazli font, so I added the Roya font in Mediawiki:print.css. Roya and Nazli are open source, GNU-based fonts. You can see details here: http://fa.farsiweb.ir/fawiki/Persian_Fonts
For the missing bold: Nazli has the nazlib font; maybe you didn't use it!
I think there is another bug for books. See http://fa.wikipedia.org/wiki/%DB%B4%DB%B2_(%D9%85%DB%8C%D9%84%D8%A7%D8%AF%DB%8C) and please make a PDF from this page. The table has so many problems! :(
I added the above mentioned languages to the set of rtl-languages.

More importantly: I have fixed the bug that was responsible for the general mis-alignment of all arabic text. (More specifically: all text that required character shaping)

The missing bold font is also fixed (nazlib as suggested).

Furthermore I fixed the problem with "لا" - this should have taken care of all "missing glyph" boxes in the text.

The main problem that persists is rendering of complex/stacked non-spacing marks as shown in [1]. I have no idea how to fix that - that might be complex.

Amir: I'll take a look at the article you mentioned.

[1] https://secure.wikimedia.org/wikibooks/ar/wiki/%D9%85%D8%B3%D8%AA%D8%AE%D8%AF%D9%85:Reza1615/test#rendering_reordered_non-spacing_marks
I updated http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf
1- It contains the bugs that Amir mentioned; this bug is with {{#expr: }} and {{formatnum: |R}}
2- This extension doesn't support Farsi numbers in text, also with #
3- For non-spacing marks, Firefox had this bug and they solved it in this report: https://bugzilla.mozilla.org/show_bug.cgi?id=635639
4- This extension forces all text to be RTL, but some text must be LTR (I mentioned it in my sample)
I updated http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf : it doesn't support italic and bold italic.
Regarding the italic problem: I couldn't find an italic (nor bold italic) variant of the Nazli font. For this to work I need: 4 font files (regular, bold, italic, bold-italic) which cover the unicode ranges Arabic, Arabic Supplement, Arabic Presentation Forms A + B. The zip file linked from [1] does not contain a font satisfying all criteria. [1] http://fa.farsiweb.ir/fawiki/Persian_Fonts
I just stumbled over Freefarsi: http://fpf.sourceforge.net/per/index.html
Ubuntu/Debian package: ttf-freefarsi
This font looks promising - I'll check if it works properly for Arabic/Farsi.
I have checked this font in LibreOffice and it seems to be OK.
Created attachment 8933 [details] test pdf with nazli font Source of PDF: http://fa.wikipedia.org/w/index.php?title=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf&oldid=5430743
Created attachment 8934 [details] test pdf with freefarsi font Source of PDF: http://fa.wikipedia.org/w/index.php?title=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf&oldid=5430743
I just uploaded two test documents. The one using freefarsi fixes the italic/bold-italic issue - this was expected.

One more thing that seems (mostly?) fixed is the non-spacing marks: could you guys please check the "rendering reordered non-spacing marks" section and compare it to the source [1]. The only issue I can see is wrong vertical positioning of the non-spacing marks. Any hints on how to solve that are appreciated (thanks for the firefox bug link, that already helped...).

Is the freefarsi version readable? The reason that I am asking is: completely fixing this issue might be beyond the scope currently.

[1] http://fa.wikipedia.org/w/index.php?title=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf&oldid=5430743
(In reply to comment #22)
> Created attachment 8934 [details]
> test pdf with freefarsi font
>
> Source of PDF:
> http://fa.wikipedia.org/w/index.php?title=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf&oldid=5430743

1. The bugs with non-spacing marks are solved.
2. Italic and bold italic are now OK.

New bugs that appear with Freefarsi:
1. This font replaces the bullets of "*" wiki markup with ة
2. According to http://fa.wikipedia.org/wiki/%D9%BE%D8%B1%D9%88%D9%86%D8%AF%D9%87:L2_versus_L3.jpg , هٔ cannot be rendered well with Freefarsi, so this font is not good enough for Persian and Arabic texts!

Please don't use this font, because it is not standard. Nazli is the best: I checked it in http://fa.wikipedia.org/w/index.php?title=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf&printable=yes and the browser can render it in italic form. So, in my opinion, it is better to use an italic render engine instead of an italic font.
The vertical positioning of the non-spacing marks is not so important and it is OK :) The horizontal position is important; in this sample it is correct.
Italic:
If freefarsi can't be used that's fine with me, but: I can't get italic to work if I use the Nazli font. The browser might render a non-italic font in an italic way - the render engine I am using to produce PDFs can't do that. => I need a font including all its variants (normal, bold, italic, bold-italic). Everything that is missing will be rendered using the regular font.

Something completely different:
The direction of the text is sometimes broken. This happens if pretty much anything except plain text is used in conjunction with LTR text when the base direction is RTL.
Example: C<sub>50</sub>H<sub>70</sub>... (as you pointed out Reza)
or an even more fun example: '''توسط word one''' word two - همکاری

I can't fix this. Not today, and not anywhere in the near future. If this is supposed to be fixed I need to switch the underlying PDF render engine. Since there is no real alternative to what we use now (reportlab) this is even more complicated. Therefore I am guessing that changing the render engine would take at least 6 months. Unfortunately this is completely out of scope at the moment.

If the text direction problem I just described is a show stopper for right-to-left languages then we can stop now. If the problem does not occur too often and is acceptable we can go on. The third alternative is swapping the rendering engine, but a sponsor for 6 months of work would need to be found ;)
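To make the mixed-direction example above concrete, here is a small Python sketch that splits a string into runs by strong direction using the standard unicodedata module. It is a deliberate simplification and not the full Unicode Bidirectional Algorithm, and it says nothing about how reportlab or mwlib handle direction internally; the function names are illustrative.

```python
import unicodedata


def strong_direction(ch):
    """Return "LTR" or "RTL" for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):
        return "RTL"
    return None  # digits, spaces and punctuation are weak or neutral


def direction_runs(text):
    """Group characters into runs; weak/neutral characters join the preceding run."""
    runs = []  # list of [direction, characters]
    for ch in text:
        direction = strong_direction(ch)
        if runs and (direction is None or direction == runs[-1][0]):
            runs[-1][1] += ch
        else:
            runs.append([direction, ch])
    return [(direction, chars) for direction, chars in runs]


sample = "\u062a\u0648\u0633\u0637 word one"  # the bold part of the example above
for direction, chars in direction_runs(sample):
    print(direction, repr(chars))
```

Once markup such as bold or subscript cuts a paragraph into separately laid out fragments, each fragment has to be placed according to runs like these, which is where the ordering can go wrong.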
Created attachment 8942 [details]
updating webfarsi italic font and solving freefarsi bug

I made an italic version of Nazli, and I also solved the Freefarsi bullet bug. Would you please test them?
Is it possible for the PDF render engine to render Python or other language code in color? http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf2
Would you please solve the other bugs, such as the numbers and the infobox's size?
I rendered test documents with the updated fonts you, Reza, provided. I chose to upload them to our server, which is more convenient for me and shouldn't make a difference for anybody else.

Please compare:
http://pediapress.com/files/rtl/test_nazli.pdf
http://pediapress.com/files/rtl/test_freefarsi.pdf

I spotted a possible bug in the bold variant of the Nazli font: the line-heights seem to be a little too big. This might result in the missing spacing below the bold paragraph. If the Nazli font is supposed to be used then this should be fixed as well.

Reza: I'll look into the other issues. Regarding the numbers: which numbers should be converted to the Farsi variant? Only for numbered lists or also for page numbers etc.?
Created attachment 8958 [details]
Nazli font bold New version

1- I attached a new version of Nazli bold; please replace the old one with it.
1-1- In the Nazli version I saw that the non-spacing marks had a problem, but in Freefarsi they are OK. Sadly, the Freefarsi font is not in Persian style; some of its glyphs are in Urdu style. Is it possible to solve the non-spacing marks problem in Nazli?
2- For numbers:
2-1- page numbers
2-2- numbered lists
2-3- citation numbers (reference numbers)
2-4- all of the numbers in the infobox in my sample are in Persian, but in the PDF version only some of them are converted.
1- Syntax highlighting in the English version is colorful, but on fa.wiki it is black and white:
http://en.wikipedia.org/wiki/User:Reza1615/pdf
http://fa.wikipedia.org/wiki/کاربر:Reza1615/pdf
2- The location map is incorrect on both en.wiki and fa.wiki.
(In reply to comment #29)
> Created attachment 8958 [details]
> Nazli font bold New version
>
> 1-I attached new version of Nazli bold version please replace it with old one
> 1-1- in Nazli version i saw the non-spacing marks had problem but in freefarsi
> it is ok. sadly freefarsi font is not Persian style some of its glyphs are in
> Urdu's style.is it possible to solve the non-spacing marks' problem in Nazli?

This probably has to be "fixed" in the font. My assumption is based on the mozilla/firefox bug report you mentioned earlier: the non-spacing marks have "wrong" width values. A rendering engine with better support for diacritics probably would not exhibit this behavior. But for the PDF rendering backend we are using, the font probably needs to be corrected: the width of the non-spacing marks has to be explicitly set (or unset?) to zero. A comparison of one non-spacing mark in the two fonts might reveal the problem.

>
> 2- for numbers:
> 2-1-page number
> 2-2-numbered lists
> 2-3-citation numbers (references numbers)
> 2-4- all of the numbers in Infobox in my sample are in person but in pdf
> version only some of them are converted.

I'll check that and try to fix it.
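The comparison suggested above could start from a check like the following Python sketch. It uses the standard unicodedata module to pick out non-spacing marks (general category Mn) and flags those whose advance width is not zero. The width table is hypothetical input, for example values read from a font with a separate tool; whether zero is the right value for this backend is the guess made in the comment above, not something this sketch establishes.

```python
import unicodedata


def non_spacing_marks(text):
    """Return the characters in text whose general category is Mn."""
    return [ch for ch in text if unicodedata.category(ch) == "Mn"]


def marks_with_nonzero_width(advance_widths):
    """Filter a {character: advance width} mapping (hypothetical font data)
    down to non-spacing marks whose advance width is not zero."""
    return {
        ch: width
        for ch, width in advance_widths.items()
        if unicodedata.category(ch) == "Mn" and width != 0
    }


# HEH followed by HAMZA ABOVE: only the second character is a non-spacing mark.
print(non_spacing_marks("\u0647\u0654"))

# Made-up widths: FATHA is fine, SHADDA would be flagged, BEH is a base letter.
widths = {"\u064e": 0, "\u0651": 250, "\u0628": 600}
print(marks_with_nonzero_width(widths))
```

Running the same check over both fonts' width tables would show whether Freefarsi and Nazli actually differ on these marks.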
Created attachment 8959 [details]
Nazli New pak

I changed all of the non-spacing mark properties for (Nazli, Nazlib, Nazli-italic, Nazlibold-italic). Would you please test them?
(In reply to comment #30)
> 1-syntax highlight in english version is colorful but in fa.wiki is black and
> wight
> http://en.wikipedia.org/wiki/User:Reza1615/pdf
> http://fa.wikipedia.org/wiki/کاربر:Reza1615/pdf

fixed with:
http://code.pediapress.com/git/mwlib.rl?p=mwlib.rl;a=commit;h=59d45fe60762fa55e43eaac58d9b3ee9fed985fb

> 2- location map in both en.wiki and fa.wiki is incorrect.

This is a known issue: absolute positioned content isn't rendered correctly. Unfortunately I have no idea how to fix this.
(In reply to comment #32)
> Created attachment 8959 [details]
> Nazli New pak
>
> I changed all of non-spacing marks properties for
> (Nazli,Nazlib,Nazli-italic,Nazlibold-italic).would you please test them?

There seems to be a problem with the font. The rendering engine crashes with the following traceback:

Traceback (most recent call last):
  File "/home/volker/repos/mwlib.rl/mwlib/rl/rlwriter.py", line 528, in renderBook
    self.doc.build(elements)
  File "/home/volker/py26/lib/python2.6/site-packages/mwlib.ext-0.12.3-py2.6-linux-i686.egg/mwlib/ext/reportlab/platypus/doctemplate.py", line 906, in build
    self._endBuild()
  File "/home/volker/py26/lib/python2.6/site-packages/mwlib.ext-0.12.3-py2.6-linux-i686.egg/mwlib/ext/reportlab/platypus/doctemplate.py", line 848, in _endBuild
    if getattr(self,'_doSave',1): self.canv.save()
  File "/home/volker/py26/lib/python2.6/site-packages/mwlib.ext-0.12.3-py2.6-linux-i686.egg/mwlib/ext/reportlab/pdfgen/canvas.py", line 1123, in save
    self._doc.SaveToFile(self._filename, self)
  File "/home/volker/py26/lib/python2.6/site-packages/mwlib.ext-0.12.3-py2.6-linux-i686.egg/mwlib/ext/reportlab/pdfbase/pdfdoc.py", line 235, in SaveToFile
    f.write(self.GetPDFData(canvas))
  File "/home/volker/py26/lib/python2.6/site-packages/mwlib.ext-0.12.3-py2.6-linux-i686.egg/mwlib/ext/reportlab/pdfbase/pdfdoc.py", line 247, in GetPDFData
    fnt.addObjects(self)
  File "/home/volker/py26/lib/python2.6/site-packages/mwlib.ext-0.12.3-py2.6-linux-i686.egg/mwlib/ext/reportlab/pdfbase/ttfonts.py", line 1126, in addObjects
    pdfFont.ToUnicode = doc.Reference(cmapStream, 'toUnicodeCMap:' + baseFontName)
  File "/home/volker/py26/lib/python2.6/site-packages/mwlib.ext-0.12.3-py2.6-linux-i686.egg/mwlib/ext/reportlab/pdfbase/pdfdoc.py", line 516, in Reference
    raise ValueError, "redefining named object: "+repr(name)
ValueError: redefining named object: 'toUnicodeCMap:AAAAAA+Nazli'
(In reply to comment #29)
> Created attachment 8958 [details]
> Nazli font bold New version
>

Is it OK?
(In reply to comment #33)
> (In reply to comment #30)
> > 2- location map in both en.wiki and fa.wiki is incorrect.
>
> This is a known issue: absolute positioned content isn't rendered correctly.
> Unfortunately I have no idea how to fix this.

I updated http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf#test
Maybe it helps you.
(In reply to comment #35)
> (In reply to comment #29)
> > Created attachment 8958 [details]
> > Nazli font bold New version
> >
>
> is it ok?

This is getting a little confusing... The new font version with the fixed non-spacing marks does not render at all. The PDF export crashes due to some error in the font I guess.
(In reply to comment #36)
> (In reply to comment #33)
> > (In reply to comment #30)
> >
> > 2- location map in both en.wiki and fa.wiki is incorrect.
> >
> > This is a known issue: absolute positioned content isn't rendered correctly.
> > Unfortunately I have no idea how to fix this.
>
> I updated
> http://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Reza1615/pdf#test
> may be it helps you

Thanks, but that does not help. I investigated the issue with absolute positioned stuff intensively in the past. The result was that for the downloadable PDFs I have no idea how to solve it.
Let me sum up the font situation:

Freefarsi: all four variants present, non-spacing marks rendered correctly. BUT: one character (هٔ) not suited for Persian or Arabic.

Nazli: all four font variants present (thanks Reza!) and working. Non-spacing marks are not rendered correctly. Reza tried to fix the font, but that resulted in crashes of the rendering engine. Error message below:

ValueError: redefining named object: 'toUnicodeCMap:AAAAAA+Nazli'

Rendering with the Nazli font works if I only use the regular variant. Therefore my suspicion is that there was a mistake when constructing the bold (possibly also the italic/bold-italic) variant of the font. Maybe the internal font name was accidentally changed from something like NazliBold to Nazli - therefore colliding with the regular Nazli font.

@Reza: could you double check that and, if possible, fix it? In that case I could use your Nazli font variants with fixed non-spacing mark positioning.
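The suspected name collision could be checked along the following lines. This Python sketch only shows the comparison step: it assumes the internal font names have already been read from each file by some other tool, and the file names and names in the sample mapping are made up for illustration, not taken from the attachments.

```python
from collections import defaultdict


def find_name_collisions(internal_names):
    """Given {font file: internal font name}, return the names that are
    shared by more than one file, with the files that share them."""
    files_by_name = defaultdict(list)
    for font_file, name in internal_names.items():
        files_by_name[name].append(font_file)
    return {
        name: sorted(files)
        for name, files in files_by_name.items()
        if len(files) > 1
    }


# Hypothetical data: the bold file was accidentally saved with the regular name.
fonts = {
    "nazli.ttf": "Nazli",
    "nazlib.ttf": "Nazli",
    "nazli-italic.ttf": "NazliItalic",
}
print(find_name_collisions(fonts))
```

An empty result would rule out this particular explanation for the "redefining named object" error.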
I added the internationalization keyword (i18n). Can people still reproduce this problem? Is it still affecting people? I'm checking since we've deployed MediaWiki 1.18 to all Wikimedia Foundation wikis and that has some fixes that might have solved this problem.
Thanks to Volker Haas, who solved them in the PDF export engine.