Last modified: 2014-02-12 23:38:07 UTC
Add information to langlinks/ll in api.php?action=query&prop=langlinks&titles=... that says whether each langlink is stored in the repository or locally.

Bots need this information because they currently search for the source of a langlink in local wiki pages. If they cannot find it in the main page, they start searching the included pages (in the template namespace, langlinks are mostly included from a subpage). This costs many page source requests and a lot of processing time in parsers and bot frameworks.

If bots knew that the langlinks are already stored at Wikidata, they would not have to request the source code of many local pages.

Example: http://de.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Vorlage:! currently returns

    <api>
      <query>
        <pages>
          <page pageid="5327033" ns="10" title="Vorlage:!">
            <langlinks>
              <ll lang="ace" xml:space="preserve">Pola:!</ll>
              <ll lang="ar" xml:space="preserve">قالب:!</ll>
              <ll lang="as" xml:space="preserve">সাঁচ:!</ll>
            </langlinks>
          </page>
        </pages>
      </query>
    </api>

Maybe this can be extended to

    <api>
      <query>
        <pages>
          <page pageid="5327033" ns="10" title="Vorlage:!">
            <langlinks>
              <ll lang="ace" storage="repository" xml:space="preserve">Pola:!</ll>
              <ll lang="ar" storage="local" xml:space="preserve">قالب:!</ll>
              <ll lang="as" storage="repository" xml:space="preserve">সাঁচ:!</ll>
            </langlinks>
          </page>
        </pages>
      </query>
    </api>

If querying this information is expensive, an extra parameter should be added (like llurl, which requests the full URL as extra info) so that it is only returned when requested.
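For illustration, a bot could consume the proposed output roughly like this. It is only a sketch: the `storage` attribute is the proposal from this report and is not part of the current API output, and the helper name `split_by_storage` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample response in the *proposed* format. The `storage` attribute is
# hypothetical; the current API does not return it.
PROPOSED_RESPONSE = """\
<api>
  <query>
    <pages>
      <page pageid="5327033" ns="10" title="Vorlage:!">
        <langlinks>
          <ll lang="ace" storage="repository" xml:space="preserve">Pola:!</ll>
          <ll lang="ar" storage="local" xml:space="preserve">قالب:!</ll>
          <ll lang="as" storage="repository" xml:space="preserve">সাঁচ:!</ll>
        </langlinks>
      </page>
    </pages>
  </query>
</api>
"""


def split_by_storage(xml_text):
    """Group langlinks by the proposed `storage` attribute.

    Links without the attribute are filed under "unknown", so the bot can
    fall back to searching the local page source for them.
    """
    groups = {}
    root = ET.fromstring(xml_text)
    for ll in root.iter("ll"):
        storage = ll.get("storage", "unknown")
        groups.setdefault(storage, []).append((ll.get("lang"), ll.text))
    return groups


groups = split_by_storage(PROPOSED_RESPONSE)
# Only links stored locally (or of unknown origin) require fetching wikitext.
needs_source_lookup = groups.get("local", []) + groups.get("unknown", [])
```

With such an attribute the bot would fetch local page sources only for the entries in `needs_source_lookup` and skip the repository-backed links entirely.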
This bug came up in a thread on Project chat (http://www.wikidata.org/wiki/Wikidata:Project_chat#Prioritizing_Hungarian_articles) and it could be important to fix. It does cause load issues, but it will not affect us very much, as only one bot is involved for now.
(In reply to comment #1)
> This bug came up in a thread on Project chat
> (http://www.wikidata.org/wiki/Wikidata:Project_chat#Prioritizing_Hungarian_articles)
> and it could be important to fix. It does cause load issues, but it will not
> affect us very much, as only one bot is involved for now.

Is this bug still current? The thread says: «i hope that bugzilla:41345 will be available before client extension goes live. Merlissimo (talk) 16:25, 30 November 2012 (UTC)». That has already happened, and it seems bots found another way?
This is still open. There is no real solution. Because only the article namespace is imported at the moment, bots simply assume that langlinks are on Wikidata if they are not found in the main source. Handling langlinks from included subpages, as in the template namespace, will be impossible if this bug is not resolved.
To solve this bug, could someone comment on who creates the langlinks table entries in the client DB? I might be mistaken, but it seems that the langlinks are not pulled dynamically from the repo, but rather copied in the background or on null edits. If this is the case, we might have to modify the langlinks table to include an extra column for the "source".
re #4: Langlinks are pulled directly from the repo, but only when the page is re-rendered. When an item changes on wikidata.org, a background process (dispatchChanges.php) is used to invalidate the respective pages, so they get re-rendered. This may take a few minutes.
re #3: I currently see no easy way to do this. There is just no place to store this info on the client, and schema changes to large tables (like adding a field to the langlinks table) are only done if absolutely necessary. We could add a separate table to track this, but that has additional implications, needs more thought and is not trivial to code either. I'm actually quite happy that we can manage without *any* changes to the client database.
(In reply to comment #4)
> To solve this bug, could someone comment on who creates langlinks table entries
> in the client DB? I might be mistaken, but it seems that the langlinks are not
> pulled dynamically from the repo, but rather copied in the background or on
> null edits.

When the page is parsed and a langlink is found, it calls addLanguageLink() on the ParserOutput object. The Wikidata client code hooks into the ParserAfterParse hook and does the same for all the additional language links it wants to add. The accumulated list of language links in the ParserOutput (eventually) gets saved to the langlinks table.

> If this is the case, we might have to modify langlinks table to
> include an extra column for the "source".

Seems that way to me. ParserOutput and whatever does the actual updating of langlinks would also have to be changed to handle the extra field.
It just occurred to me that we could stuff the list of "local" links, without the ones from Wikidata, into the page_props table. It would be serialized data, so we couldn't directly compare it to what's in the langlinks table in a query, but when asking for the langlinks of a specific page, it would be sufficient to tell which link comes from where.
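A rough sketch of that idea, written in Python only to show the data flow (it is not MediaWiki code). The property name `local_langlinks`, the JSON serialization and both function names are assumptions made for this example; the comment above specifies none of them.

```python
import json

# Hypothetical property name; the thread does not specify one.
LOCAL_LINKS_PROP = "local_langlinks"


def serialize_local_links(local_links):
    """Serialize the links found in local wikitext into one property value."""
    return json.dumps(sorted(local_links), ensure_ascii=False)


def annotate_langlinks(langlinks_rows, page_props):
    """Label each langlinks row as "local", "repository" or "unknown".

    langlinks_rows: (lang, title) pairs, as they would come from the
                    langlinks table for one page
    page_props:     dict of property name -> serialized value for that page
    """
    raw = page_props.get(LOCAL_LINKS_PROP)
    if raw is None:
        # No property stored for this page: the origin cannot be determined.
        return [(lang, title, "unknown") for lang, title in langlinks_rows]
    # JSON turns tuples into lists, so convert back before building the set.
    local = {tuple(pair) for pair in json.loads(raw)}
    return [
        (lang, title, "local" if (lang, title) in local else "repository")
        for lang, title in langlinks_rows
    ]


props = {LOCAL_LINKS_PROP: serialize_local_links([("ar", "قالب:!")])}
rows = [("ace", "Pola:!"), ("ar", "قالب:!"), ("as", "সাঁচ:!")]
annotated = annotate_langlinks(rows, props)
```

The comparison happens in application code after both values have been loaded for a single page, which matches the limitation noted above: the serialized list cannot be joined against the langlinks table in SQL.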
Polluting page_props is just an ugly hack. The best idea would be adding a new column to langlinks, or not storing Wikidata links in langlinks at all.
(In reply to comment #0)
> Add info if a langlinks is stored at repository or local to langlinks/ll on
> api.php?action=query&prop=langlinks&titles=...
>
> Bots need this info, because currently bots try to search for a langlink source
> on local wikipages. If they cannot find its source on the main page they start
> searching for langlink on included pages (mostly on template namespace langlinks
> are included from subpage). This costs many page source requests and
> processing time for parsers and bot frameworks.
>
> But if bots would know that langlinks are already stored at wikidata they do
> not have to request source code of many local pages.
>
> Example:
> http://de.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Vorlage:!
>
> currently returns
> <api>
>   <query>
>     <pages>
>       <page pageid="5327033" ns="10" title="Vorlage:!">
>         <langlinks>
>           <ll lang="ace" xml:space="preserve">Pola:!</ll>
>           <ll lang="ar" xml:space="preserve">قالب:!</ll>
>           <ll lang="as" xml:space="preserve">সাঁচ:!</ll>
>         </langlinks>
>       </page>
>     </pages>
>   </query>
> </api>
>
> maybe this can be extended to
> <api>
>   <query>
>     <pages>
>       <page pageid="5327033" ns="10" title="Vorlage:!">
>         <langlinks>
>           <ll lang="ace" storage="repository" xml:space="preserve">Pola:!</ll>
>           <ll lang="ar" storage="local" xml:space="preserve">قالب:!</ll>
>           <ll lang="as" storage="repository" xml:space="preserve">সাঁচ:!</ll>
>         </langlinks>
>       </page>
>     </pages>
>   </query>
> </api>
>
> If querying this info takes much resources an extra parameter should be added
> (like llurl for fullurl extra info) and info should only be shown if
> requested.

I'd rather suggest something like:

    <api>
      <query>
        <pages>
          <page pageid="5327033" ns="10" title="Vorlage:!">
            <langlinks>
              <ll lang="ace" shared="" xml:space="preserve">Pola:!</ll>
              <ll lang="ar" xml:space="preserve">قالب:!</ll>
              <ll lang="as" shared="" xml:space="preserve">সাঁচ:!</ll>
            </langlinks>
          </page>
        </pages>
      </query>
    </api>
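One detail a client would need to watch with this flag-style variant: the attribute carries an empty value, so its presence has to be tested, not its value. A minimal sketch, assuming the `shared` attribute exactly as suggested above (it is a proposal, not existing API output); `is_shared` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Fragment in the suggested format; `shared=""` is a hypothetical flag.
SHARED_FLAG_RESPONSE = """\
<langlinks>
  <ll lang="ace" shared="" xml:space="preserve">Pola:!</ll>
  <ll lang="ar" xml:space="preserve">قالب:!</ll>
  <ll lang="as" shared="" xml:space="preserve">সাঁচ:!</ll>
</langlinks>
"""


def is_shared(ll):
    """Return True if the element carries the proposed `shared` flag.

    The attribute value is an empty string, which is falsy in Python, so the
    check tests for presence of the attribute and ignores its value.
    """
    return "shared" in ll.attrib


root = ET.fromstring(SHARED_FLAG_RESPONSE)
# Languages whose link is not marked as coming from the shared repository.
local_only = [ll.get("lang") for ll in root.iter("ll") if not is_shared(ll)]
```

A truthiness check such as `if ll.get("shared"):` would treat every link as not shared, because `""` evaluates to false.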