Last modified: 2014-06-08 06:00:00 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T40498, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 38498 - Add an OAI-PMH API for index page
Product: MediaWiki extensions
Classification: Unclassified
Component: ProofreadPage
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Tpt
Depends on: 37419
Blocks: oai-pmh Wikisource
Reported: 2012-07-19 15:27 UTC by Tpt
Modified: 2014-06-08 06:00 UTC (History)
CC List: 11 users



Description Tpt 2012-07-19 15:27:37 UTC
We need a good API to export the metadata of index pages in a standardized, interoperable format so that we can share it with the GLAM community, which uses the OAI-PMH protocol. I think it would be a good idea to add this feature to ProofreadPage. I'm writing a first version of this feature (demo on labs: ).

To add this feature I've written a small metadata typing system. It's a very big addition to ProofreadPage, and it requires rewriting the configuration of index pages (bug 37419) on all Wikisources. I suggest testing this feature on one Wikisource for as long as stabilisation takes, then deploying it to all Wikisources once a special page to help configure ProofreadPage is done (bug 37839).
Comment 1 Brion Vibber 2012-07-19 21:14:27 UTC
I'm familiar with OAI-PMH, but could you lay out a brief explanation of how you propose exposing data via this protocol and how to manage it?

(I'm the author of the existing OAI extension for MediaWiki, which we've used in the past for some whole-wiki mirrors and today mostly for updating our search indexes.)
Comment 2 Tpt 2012-07-20 13:16:15 UTC
The data are the content of the pages in the Index namespace. This namespace is managed by ProofreadPage and stores, in a template (MediaWiki:Proofreadpage_index_template), metadata about the scans that are being proofread (example: ). The OAI-PMH API is meant to expose this metadata: the standard bibliographic data (title, author...) plus data related to proofreading (number of pages proofread in the book...). We can imagine sets built from the categories that group index pages.

I've improved the configuration system of the index pages ([[MediaWiki:Proofreadpage data config]]) so that it can say: this entry in the index template corresponds to this property in a list of properties known by ProofreadPage (title, author, publisher, identifier...), has this type (string, page link, number, LCCN...), and has multiple values separated by these strings ('; ', ' and '...).

With this configuration ProofreadPage gets the values of the entries of the index template, splits them (using the strings listed in the config), checks whether they respect the type, and exposes them through the API. I've implemented simple Dublin Core, but I would also like to provide the data with a more efficient system.
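
The split-and-type-check step described above can be sketched roughly as follows. This is an illustration in Python, not the extension's actual PHP code: the `CONFIG` layout, the entry and property names, the `VALIDATORS` table, and the `extract` helper are all hypothetical and only mirror the description in this comment.

```python
import re

# Hypothetical config: index template entry -> known property, type, delimiters.
CONFIG = {
    "Author": {"property": "author", "type": "string", "delimiters": ["; ", " and "]},
    "Year": {"property": "year", "type": "number", "delimiters": []},
}

# Hypothetical type checks; a real system would also cover page links, LCCN, etc.
VALIDATORS = {
    "string": lambda value: len(value) > 0,
    "number": lambda value: re.fullmatch(r"\d+", value) is not None,
}


def extract(entries):
    """Map raw index template entries to typed, multi-valued properties."""
    result = {}
    for name, raw in entries.items():
        rule = CONFIG.get(name)
        if rule is None:
            continue  # entry is not declared in the config, so it is not exposed
        values = [raw]
        # Split on each configured delimiter in turn.
        for delimiter in rule["delimiters"]:
            values = [part for value in values for part in value.split(delimiter)]
        is_valid = VALIDATORS[rule["type"]]
        # Keep only the values that respect the declared type.
        kept = [value.strip() for value in values if is_valid(value.strip())]
        if kept:
            result[rule["property"]] = kept
    return result
```

In this sketch, values that fail the type check are dropped silently; whether the real feature drops, flags, or falls back to a plain string in that case is not stated in this report.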
Comment 3 Max Klein 2012-10-22 20:29:17 UTC
This would be a hugely useful addition. I'm currently employed by OCLC, who run WorldCat [1]. If this OAI-PMH API were to become active on Wikisource, it would be possible for WorldCat to start cataloguing Wikisource. This would be a great boon to both systems, because WorldCat would be more useful in showing Free Full Text versions, and Wikisource could tap into the big traffic that WorldCat pulls. WorldCat already supports harvesting via OAI-PMH, so this would be very low-hanging fruit.
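
For illustration, the harvester's side of such an exchange could look roughly like the Python sketch below. It assumes only the standard OAI-PMH `ListRecords` verb with the `oai_dc` (simple Dublin Core) metadata prefix; the endpoint URL, the helper names, and the `SAMPLE` response are hypothetical and do not describe the actual Wikisource endpoint or WorldCat's harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard XML namespace for Dublin Core elements.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hand-written, illustrative response; not real Wikisource output.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Book</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return base_url + "?" + urlencode(params)


def titles(xml_text):
    """Collect every dc:title found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iterfind(".//dc:title", NS)]
```

A real harvester would also need to follow resumption tokens and handle error responses, which this sketch leaves out.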
Comment 4 Tpt 2012-11-01 14:14:21 UTC
Patch uploaded:
Comment 5 Max Klein 2012-11-01 16:45:09 UTC
(In reply to comment #4)
> Patch uploaded:

Let me know when this is live TPT and I will start harvesting.
Comment 6 db [inactive,noenotif] 2012-11-19 20:14:41 UTC
(In reply to comment #4)
> Patch uploaded:

Status Merged
Comment 7 Nemo 2012-11-20 10:21:54 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > Patch uploaded:
> Status Merged

I've copied the available information to [[mw:Extension:Proofread_Page#OAI-PMH]], please someone add more.
Comment 8 Nemo 2012-11-20 10:25:05 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Patch uploaded:
> Let me know when this is live TPT and I will start harvesting.

It will presumably go live for Wikisources with wmf4 on Wednesday, November 28 (let us know what's missing for this bug report or additional ones to be considered closed).
Comment 9 Andre Klapper 2013-02-13 12:40:05 UTC
(In reply to comment #8)
> (let us know what's
> missing for this bug report or additional ones to be considered closed).

Tpt: Could you answer this? Or can this report be closed as FIXED?
Comment 10 Tpt 2013-02-13 19:36:46 UTC
Yes, all the most important features are now deployed and work fine. I'm closing this as FIXED.
