Last modified: 2009-08-27 12:13:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T13215, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 11215 - Install PdfHandler extension


Summary:	Install PdfHandler extension

Status:	RESOLVED FIXED

Product:	Wikimedia
Classification:	Unclassified
Component:	General/Unknown
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement with 12 votes
Target Milestone:	---
Assigned To:	Rob Halsell

URL:	http://www.mediawiki.org/wiki/Extensi...
Whiteboard:
Keywords:	shell

Depends on:
Blocks:

Reported:	2007-09-06 18:20 UTC by Martin Seidel
Modified:	2009-08-27 12:13 UTC
CC List:	17 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Martin Seidel 2007-09-06 18:20:55 UTC

Please install the PdfHandler extension on Wikimedia servers.

The handler supports PDF files, extracts pages, generates thumbnails from the extracted pages, and displays PDF files in a multipage view like DjVu. It works together with ProofreadPage, and embedding images with [[Image:Bla.pdf|page=5]] works too. The extension was tested by Raymond and me. A preview is available at http://www.xarax.eu/wiki/Bild:110.pdf. The extension would be useful for proofreading on all Wikisource projects, and also for generating image thumbnails on all Wikimedia projects.
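
A minimal sketch of the embedding syntax mentioned above, written in Python only for illustration. The helper name `pdf_page_embed` is hypothetical and not part of the extension; it simply builds the `[[Image:...|page=N]]` wikitext shown in the description.

```python
def pdf_page_embed(filename: str, page: int) -> str:
    """Build wikitext that embeds one page of an uploaded multipage file.

    Hypothetical helper: it only formats the [[Image:...|page=N]] syntax
    quoted in the bug description and does not talk to a wiki.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"[[Image:{filename}|page={page}]]"


if __name__ == "__main__":
    # Prints the same wikitext as the example in the description.
    print(pdf_page_embed("Bla.pdf", 5))
```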

Comment 1 joergens.mi 2007-09-06 19:47:07 UTC

I support this wish. The arguments can be seen above

Comment 2 Gregory Maxwell 2007-09-09 15:40:18 UTC

The example given is a scanned document. Why are these files not using djvu instead? Djvu files are usually substantially smaller (especially if the pdf is not using the patented jbig compression), we have pre-existing support for them, and the djvu viewer is nicer for high resolution images.

Comment 3 joergens.mi 2007-09-09 19:52:22 UTC

1 Because we typically get PDF files from academic libraries; most archives and libraries don't even know this LizardTech format.

2 Most programs support PDF; only some support DjVu.

3 PDF is a de facto standard worldwide; DjVu is not.

4 Downloading and reading a PDF is possible for everybody who owns a PC, because Acrobat Reader is on every PC. For DjVu you have to find the appropriate programs.

5 A PDF file sometimes even has the transcription of the text (OCR) embedded; I don't know whether that is possible in DjVu.

6 PDF is a format that is accepted by Commons; we should be able to use it in a useful manner.

I hope the 6 points above will give you the answers to your question.

Comment 4 JeLuF 2007-11-04 17:16:38 UTC

According to http://www.mediawiki.org/wiki/Extension:PdfHandler#Bugs_and_enhancements the extension can't handle already uploaded images. That's a show stopper. 

Please fix and re-open this bug afterwards.

Comment 5 Martin Seidel 2007-12-01 02:13:49 UTC

With the current MediaWiki (1.12alpha (r28001)), the extension works with already uploaded PDF files.

Comment 6 ThomasV 2007-12-01 11:56:59 UTC

I support the activation of this extension on Wikisource.

We all agree that the Djvu format is technically more 
appropriate for scanned documents. 

However, the process of creating Djvu files is too difficult for most contributors. 
As a result, contributors do not provide scanned images of their wikisource 
documents; they only provide OCR-ed text, and they keep the pdf on their own 
hard disk, which makes collaborative verification impossible.

Activating this extension on Wikisource would solve that problem.

Comment 7 John Mark Vandenberg 2007-12-04 02:09:40 UTC

GIF files are also not desirable when PNG is more appropriate, but we do not prohibit GIF files from being viewed.

I agree that this is a useful improvement to the wikisource infrastructure, for the reasons given by others in comment #3 and comment #6.  Wikisource guidelines should recommend DJVU over PDF, and provide information to assist in the learning curve.

Comment 8 iPork 2007-12-14 17:38:45 UTC

I support this extension and I agree completely with ThomasV.

Comment 9 Accurimbono 2007-12-14 17:41:19 UTC

Strong support! Accurimbono

Comment 10 Yann Forget 2007-12-27 22:11:51 UTC

Please install this extension. Thanks, Yann

Comment 11 Joshua Sherurcij 2008-01-28 06:38:05 UTC

I strongly support the need for this extension at Wikisource, to improve attempts at collaborative effort and proofreading, as well as hosting documents that would otherwise be unhosted. -Joshua

Comment 12 Brion Vibber 2008-12-24 23:14:20 UTC

Taking a peek...

Comment 13 Brion Vibber 2008-12-24 23:50:07 UTC

Software requirements for deployment:

* Install ghostscript and xpdf-utils on image scalers (gs, pdfinfo)
* Install xpdf-utils on app servers (pdfinfo)

In general I think it seemed to be working ok when I last tested and tweaked it, so once the software deps are in we can throw it on test.wikipedia.org.
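
The dependency list above can be sketched as a small pre-deployment check. This is an illustration under assumptions, not the extension's actual code: the helper names (`missing_tools`, `parse_page_count`, `gs_page_command`) are hypothetical, the 150 dpi default is arbitrary, and the Ghostscript options shown are standard ones rather than the exact set PdfHandler passes.

```python
import re
import shutil

# Binaries named in the requirements: gs (ghostscript), pdfinfo (xpdf-utils).
REQUIRED_TOOLS = ("gs", "pdfinfo")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_page_count(pdfinfo_output):
    """Extract the page count from the 'Pages:' line of pdfinfo output.

    Returns None when no such line is present.
    """
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def gs_page_command(pdf_path, page, out_path, dpi=150):
    """Build (but do not run) a Ghostscript command rendering one page as JPEG."""
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=jpeg",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-r{dpi}",
        f"-sOutputFile={out_path}",
        pdf_path,
    ]


if __name__ == "__main__":
    absent = missing_tools()
    print("missing tools:", ", ".join(absent) if absent else "none")
```

On an image scaler both tools would be needed, while on an app server only `pdfinfo` is required, so the check there could be called as `missing_tools(("pdfinfo",))`.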

Comment 14 zephyrus4 2008-12-26 22:24:35 UTC

	
1. I agree with ThomasV's comments. 2. I think converting PDF files into DjVu files requires time and effort that could instead be given to effective proofreading. 3. I prefer DjVu when it can be done, but lots of documents won't be converted so fast: checking whether an edit is vandalism is very difficult with the many PDFs that exist in libraries if we have insufficient tools, or no tools at all, to use them. So I ask for these tools too. Zeph.

Comment 15 zephyrus4 2009-02-27 09:59:40 UTC

There is a problem with the result: if I select the "Version imprimable" option in the left menu of http://fr.wikisource.org/wiki/Du_contrat_social/Texte_entier the plain text that I get is correct, but if I select the "Version PDF" option the text is cut into pieces and it is no more understandable at all. Zeph.

Comment 16 Brion Vibber 2009-02-27 18:22:51 UTC

(In reply to comment #15)
> There is a problem with the result: if I select the "Version imprimable" option
> in the left menu of
> http://fr.wikisource.org/wiki/Du_contrat_social/Texte_entier the plain text
> that I get is correct, but if I select the "Version PDF" option the text is cut
> into pieces and it is no more understandable at all. Zeph.

PDF export (Collection extension -> mwlib) is totally unrelated. This bug is about installing support for inline display of uploaded PDF files in pages.

Comment 17 Techman224 2009-02-27 18:26:41 UTC

I support it

Comment 18 Mike.lifeguard 2009-02-27 23:34:23 UTC

(In reply to comment #17)
> I support it
> 

Please don't add useless comments like this one. Commenting on bugs is for offering technical information related to solving the bug. Anything else simply lowers the SNR, making life difficult for everyone. Please CC yourself if you want to follow progress, and vote if all you have to offer is "I support it."

Comment 19 ThomasV 2009-03-19 06:30:31 UTC

I no longer support the activation of this. 
I changed my mind for the following reason:

Djvu files may contain a text layer, which can store the result of an OCR. This 
text layer can be extracted and provide a starting point for corrections on the 
wiki. Soon this will be done automatically, when a page is edited for the first 
time, without the need to use a robot (see latest changes to ProofreadPage).

The text layer is not supported by the PDF format. Users who start to work on a 
PDF project might later realize that they want a djvu file, because they need to 
start from an OCR. This will force them to rename all the pages. This is messy.

In contrast, if they start from a djvu file, it will always be possible to add 
or improve the text layer, by uploading a new version of the file. Moving pages 
around is not needed.

In addition, the last year has shown that the community has learned to create and 
handle djvu files, which is the appropriate format for scans.

Comment 20 Mike.lifeguard 2009-03-19 17:37:05 UTC

(In reply to comment #19)
> I no longer support the activation of this. 
Sorry, this is not the place to put such comments. Discussion and consensus-building belongs on the wiki, not on the bug. The bug is for technical implementation of the request. However, to briefly address your comments, there is no reason that other wikis cannot make use of this extension, even if Wikisource prefers djvu. Furthermore, there is no reason we cannot support both, and every reason we should do so.

Comment 21 Brion Vibber 2009-03-19 18:13:22 UTC

(In reply to comment #19)
> I no longer support the activation of this. 
> I changed my mind for the following reason:
> 
> Djvu files may contain a text layer, which can store the result of an OCR. This 
> text layer can be extracted and provide a starting point for corrections on the 
> wiki.

So can PDF.
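
As a sketch of the point that a PDF can also carry an extractable text layer: the `pdftotext` tool, shipped alongside `pdfinfo` in xpdf-utils, can dump the text of a single page. The helper names below are hypothetical, and nothing here is taken from PdfHandler or ProofreadPage.

```python
import subprocess


def pdftotext_command(pdf_path, page):
    """Build a pdftotext command that writes one page's text to stdout.

    -f and -l select the first and last page; "-" as the output file
    means standard output.
    """
    return ["pdftotext", "-f", str(page), "-l", str(page), pdf_path, "-"]


def extract_page_text(pdf_path, page):
    """Run pdftotext and return the page's text layer as a string.

    Assumes pdftotext is installed; the result is empty or whitespace
    when the page has no text layer (for example, a bare scan).
    """
    result = subprocess.run(
        pdftotext_command(pdf_path, page),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```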

Comment 22 Brion Vibber 2009-08-12 23:29:44 UTC

IIRC we fixed this up to stick it on the Usability wiki when we moved it to shared infrastructure... if the PDF rendering is working on the scaler boxes we should be free to enable it generally.

Comment 23 ThomasV 2009-08-27 12:13:26 UTC

PdfHandler seems to have been installed.
