Last modified: 2014-09-16 17:27:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T70538, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 68538 - Get dbpedia off OAI
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: wmf-deployment
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 68867
Reported: 2014-07-24 22:09 UTC by Sam Reed (reedy)
Modified: 2014-09-16 17:27 UTC
CC List: 6 users

Description Sam Reed (reedy) 2014-07-24 22:09:03 UTC
Cf. an email from Sebastian Hellmann in February 2013:

"We built *a lot* of infrastructure which depends on the updates.
So if the OAI-PMH stream would suddenly not work anymore, it would jeopardize three or four open-source projects and would cause a lot of problems further down the data chain (i.e. people who get the data from us).

So yes, we are still using the OAI-PMH stream and we even plan to extend the usage to more language versions of Wikipedia and many language versions of Wiktionary.
Of course, we are willing to change to the MediaWiki API, if necessary (and we also have the manpower to achieve this within several months).
There were two major reasons, why we didn't switch, yet:
1. we have a running system, there is no real incentive to switch unless you tell us to.
2. we didn't have a contact from Wikimedia. I wrote one or two emails in the past, but didn't get a response.
3. We did not find any good documentation on how to get *all* updates from Wikipedia. Query RC and then do Special:Export requests?
4. We were afraid to get blocked, since we would be over the 1 request per second limit.


We would be happy, if we could get into contact and settle this matter to be compatible with the future. We are in contact with WikiData already (Anja Jentzsch worked on DBpedia before)."
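
For reference, the approach the email asks about (query recent changes, then fetch each changed page through Special:Export) could be sketched roughly as below. This is only an illustration, not how DBpedia or Wikimedia actually do it: the endpoint URLs, the helper names (recent_changes_url, titles_from_response, export_url, throttled) and the one-second interval taken from the email are all assumptions, and continuation of long result sets is left out. The sketch only builds URLs and parses an already-decoded API response; it performs no network requests itself.

```python
import time
from urllib.parse import urlencode, quote

# Assumed endpoints for English Wikipedia; other wikis would use their own hosts.
API_URL = "https://en.wikipedia.org/w/api.php"
EXPORT_BASE = "https://en.wikipedia.org/wiki/Special:Export/"


def recent_changes_url(since, limit=50):
    """Build a MediaWiki API URL listing changes made at or after `since`."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rcdir": "newer",      # oldest first, starting at rcstart
        "rcstart": since,      # ISO 8601 timestamp
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def titles_from_response(data):
    """Extract distinct page titles, in first-seen order, from a decoded response."""
    seen = []
    for change in data.get("query", {}).get("recentchanges", []):
        title = change["title"]
        if title not in seen:
            seen.append(title)
    return seen


def export_url(title):
    """Build a Special:Export URL for a single page title."""
    return EXPORT_BASE + quote(title.replace(" ", "_"), safe="")


def throttled(items, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than one per `min_interval` seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

A client would fetch recent_changes_url(...), decode the JSON, and then loop over throttled(export_url(t) for t in titles_from_response(data)) to stay under the request limit mentioned in point 4.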




I re-enabled OAI auditing earlier today, and at the time of writing it would seem that dbpedia is the only user of the OAI interface...
Comment 1 Brion Vibber 2014-07-24 22:11:35 UTC
Note that old search used OAI for internal updates at least on some wikis, but this should be gone soon with full CirrusSearch deployment.

What's the situation with the new rcstream etc things -- can these be adapted to send page text as well, or do we have a better way for them to do that kind of data fetch?
Comment 2 Sam Reed (reedy) 2014-07-30 17:14:17 UTC
Since 20140724215329

mysql:wikiadmin@db1038 [oai]> select oa_client, ou_name, count(oa_client) from oaiaudit left join oaiuser on oa_client = ou_id group by oa_client;
+-----------+--------------+------------------+
| oa_client | ou_name      | count(oa_client) |
+-----------+--------------+------------------+
|         0 | NULL         |             1055 |
|         6 | lsearch2     |           126808 |
|        12 | fresheye.com |             5967 |
|        13 | dbpedia      |            38854 |
+-----------+--------------+------------------+
4 rows in set (0.37 sec)
