Last modified: 2014-02-20 23:59:56 UTC

Bug 21061 - Add uploaded file text and metadata from files to fulltext search set
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: All, OS: All
Importance: Low enhancement with 4 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 21062
Blocks: 13370 Wikisource 6421 6422

Reported: 2009-10-08 17:52 UTC by Brion Vibber
Modified: 2014-02-20 23:59 UTC
CC List: 5 users

Description Brion Vibber 2009-10-08 17:52:54 UTC
We're starting to integrate text extraction for DjVu and PDF files (currently used by the ProofreadPage extension), but the extracted text is not yet exposed to search indexing.

This is also frequently requested for text document formats such as .doc and .odf. Some extensions already do this, but there is no clean interface for plugging it in that all search backends can support.

Note that supporting the Lucene search backend, which updates its index separately, may require additional attention.

Related bugs:
* bug 6421 - search djvu file text
* bug 6422 - search pdf file text
* bug 13370 - search file metadata

Another interesting idea:
* bug 18045 - search text of linked files (but if these are remote, that's much harder to handle!)

Things we need:
* a clear interface on File for data that needs to be fetched (EXIF metadata, page text)
* a clear interface on the SearchEngine class for plugging additional info into updates
* a way to expose additional searchable info to the Lucene search's updaters (perhaps a plugin to the OAI interface that passes in extra data fields?)
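The three needs listed above can be pictured as a small provider pattern. This is a minimal Python sketch only, assuming hypothetical names (`SearchableFile`, `get_metadata`, `get_page_texts`, `SearchUpdate`, `metadata_provider`, `page_text_provider`, `build_update`, `FakeDjVuFile`); it is not MediaWiki's actual File or SearchEngine API, which is written in PHP. The file side exposes fetchable data, pluggable providers turn that data into named fields, and the resulting update payload carries those extra fields to whichever backend consumes it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class SearchableFile(Protocol):
    """Hypothetical File-side interface: data a file can expose for indexing."""

    def get_metadata(self) -> Dict[str, str]: ...

    def get_page_texts(self) -> List[str]: ...


# Hypothetical plugin type: turns a file into extra named search fields.
FieldProvider = Callable[[SearchableFile], Dict[str, str]]


@dataclass
class SearchUpdate:
    """Hypothetical payload a search backend receives for one page."""
    title: str
    wikitext: str
    extra_fields: Dict[str, str] = field(default_factory=dict)


def metadata_provider(f: SearchableFile) -> Dict[str, str]:
    # Flatten metadata (e.g. EXIF) into one searchable string, keys sorted.
    meta = f.get_metadata()
    return {"file_metadata": " ".join(f"{k} {v}" for k, v in sorted(meta.items()))}


def page_text_provider(f: SearchableFile) -> Dict[str, str]:
    # Join extracted per-page text (e.g. from DjVu or PDF) into one field.
    return {"file_text": "\n".join(f.get_page_texts())}


def build_update(title: str, wikitext: str, f: SearchableFile,
                 providers: List[FieldProvider]) -> SearchUpdate:
    # Each registered provider contributes fields; backends need not know which.
    update = SearchUpdate(title, wikitext)
    for provider in providers:
        update.extra_fields.update(provider(f))
    return update


class FakeDjVuFile:
    """Stand-in file object used only for this example."""

    def get_metadata(self) -> Dict[str, str]:
        return {"Pages": "2", "Author": "Anon"}

    def get_page_texts(self) -> List[str]:
        return ["First page", "Second page"]


u = build_update("File:Example.djvu", "description", FakeDjVuFile(),
                 [metadata_provider, page_text_provider])
print(u.extra_fields)
```

Keeping providers as plain callables is one way new formats (.doc, .odf) could be added without changing each search backend; how a separately updating indexer such as the Lucene one would receive `extra_fields` remains the open question raised above.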
Comment 1 User:Docu 2009-12-15 01:04:04 UTC
Related:

* Bug 21795 "camera categories" (proposal c would allow searching metadata through categories they generate)
Comment 2 DrTrigon 2013-12-29 22:00:26 UTC
Bug 6421 could finally be closed; thanks to everybody involved there!
