Last modified: 2014-07-18 00:47:34 UTC
My bot created http://www.wikidata.org/w/index.php?title=Q38790&oldid=432372 and half an hour later it created an item having the same sitelinks: http://www.wikidata.org/w/index.php?title=Special:Undelete&target=Q39272&timestamp=20121113025848 I expected this to always fail with an error.
I see the logs, but I cannot reproduce the issue. Can you point me to the code Merlbot is using? It is simply using the setitems module, right? But which flags are set? It definitely should fail with an error, and that is what happens for me when I try it.
We should look whether the uniqueness check is done on the master database - if not, the conflict would not be detected before saving. Then, the item's primary data blob would be saved with the conflicting info, and only the secondary database update would cause a unique key error. I can imagine the above scenario - but not really with a delay of 30 minutes. Slave lag just doesn't get that high. Anyway: merl, can you check whether the request that created the second entry actually returned ok, or whether it returned a fatal error of some sort?
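The scenario above (blob saved first, unique key error only on the secondary update) can be illustrated with a minimal Python sketch that uses sqlite3 as a stand-in. The table names (items, items_per_site) and the save_item function are hypothetical and are not the Wikibase schema or code. The sketch only shows that if both writes share one transaction, a unique key violation on the sitelink table also rolls back the blob.

```python
import json
import sqlite3


def make_db():
    # In-memory stand-in; table names are hypothetical, not the Wikibase schema.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, blob TEXT)")
    db.execute(
        "CREATE TABLE items_per_site ("
        "item_id INTEGER, site TEXT, title TEXT, UNIQUE (site, title))"
    )
    return db


def save_item(db, item_id, sitelinks):
    """Write the blob and the sitelink rows in one transaction.

    Returns True on success, False if a sitelink is already in use.
    """
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "INSERT INTO items (id, blob) VALUES (?, ?)",
                (item_id, json.dumps({"sitelinks": sitelinks})),
            )
            db.executemany(
                "INSERT INTO items_per_site (item_id, site, title) "
                "VALUES (?, ?, ?)",
                [(item_id, site, title) for site, title in sitelinks.items()],
            )
        return True
    except sqlite3.IntegrityError:
        # Unique index on (site, title) rejected a row; nothing was committed.
        return False
```

With this layout the unique index is the final arbiter, so a stale read in an earlier uniqueness check cannot leave a conflicting blob behind.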
GET: action=wbsetitem&formal=xml POST: bot=1&exclude=ns|title|touched|descriptions&token=cf55697143ae059949f41143c66255e1%2B%5C&summary=Bot%3A+Erg%C3%A4nze%3A+%5B%5Bpt%3ASmooth+collie%5D%5D%2C%5B%5Bde%3AKurzhaarcollie%5D%5D%2C%5B%5Bja%3A%E3%82%B9%E3%83%A0%E3%83%BC%E3%82%B9%E3%83%BB%E3%82%B3%E3%83%AA%E3%83%BC%5D%5D%2C%5B%5Bpl%3AOwczarek+szkocki+kr%C3%B3tkow%C5%82osy%5D%5D%2C%5B%5Bcs%3AKolie+kr%C3%A1tkosrst%C3%A1%5D%5D%2C%5B%5Bru%3A%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D1%8B%D0%B9%5D%5D%2C%5B%5Bfr%3AColley+%C3%A0+poil+court%5D%5D%2Cw%2Clt%2Cfi%2Cit&data=%7B%22sitelinks%22%3A%5B%7B%22site%22%3A%22ptwiki%22%2C%22title%22%3A%22Smooth+collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22dewiki%22%2C%22title%22%3A%22Kurzhaarcollie%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22jawiki%22%2C%22title%22%3A%22%E3%82%B9%E3%83%A0%E3%83%BC%E3%82%B9%E3%83%BB%E3%82%B3%E3%83%AA%E3%83%BC%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22plwiki%22%2C%22title%22%3A%22Owczarek+szkocki+kr%C3%B3tkow%C5%82osy%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22cswiki%22%2C%22title%22%3A%22Kolie+kr%C3%A1tkosrst%C3%A1%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22ruwiki%22%2C%22title%22%3A%22%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D1%8B%D0%B9%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22frwiki%22%2C%22title%22%3A%22Colley+%C3%A0+poil+court%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22enwiki%22%2C%22title%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22ltwiki%22%2C%22title%22%3A%22Trumpaplaukis+kolis%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22fiwiki%22%2C%22title%22%3A%22Sile%C3%A4karvainen+skotlanninpaimenkoira%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22itwiki%22%2C%22title%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%5D%2C%22labels%22%3A%5B%7B%22language%22%3A%22pt%22%2C%22value%22%3A%22Smooth+collie%22%2C%22ad
d%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Kurzhaarcollie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22ja%22%2C%22value%22%3A%22%E3%82%B9%E3%83%A0%E3%83%BC%E3%82%B9%E3%83%BB%E3%82%B3%E3%83%AA%E3%83%BC%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22pl%22%2C%22value%22%3A%22Owczarek+szkocki+kr%C3%B3tkow%C5%82osy%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22cs%22%2C%22value%22%3A%22Kolie+kr%C3%A1tkosrst%C3%A1%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22ru%22%2C%22value%22%3A%22%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D1%8B%D0%B9%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fr%22%2C%22value%22%3A%22Colley+%C3%A0+poil+court%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22en%22%2C%22value%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22lt%22%2C%22value%22%3A%22Trumpaplaukis+kolis%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Sile%C3%A4karvainen+skotlanninpaimenkoira%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22it%22%2C%22value%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%5D%2C%22aliases%22%3A%5B%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Kurzhaar+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Collie+Smooth%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Kurzhaariger+Schottischer+Sch%C3%A4ferhund%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Langhaariger+Schottischer+Sch%C3%A4ferhund%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22pl%22%2C%22value%22%3A%22Collie+kr%C3%B3tkow%C5%82osy%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22pl%22%2C%22value%22%3A%22Collie+Smooth%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22cs%22%2C%22value%22%3A%22Kr%C3%A1tkosrst%C3%A1+kolie%22%2C%22a
dd%22%3A%22%22%7D%2C%7B%22language%22%3A%22ru%22%2C%22value%22%3A%22%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D0%B0%D1%8F%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fr%22%2C%22value%22%3A%22Colley+a+poil+court%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Nahkacollie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Sile%C3%A4karvainen+collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Lyhytkarvainen+collie%22%2C%22add%22%3A%22%22%7D%5D%7D SETITEM:/api /api:servedby=srv300 /api/error:code=save-failed /api/error:info=Edit not allowed: * Site link [[cswiki:Kolie krátkosrstá]] already used by item [[Q38790]]. * Site link [[dewiki:Kurzhaarcollie]] already used by item [[Q38790]]. * Site link [[enwiki:Smooth Collie]] already used by item [[Q38790]]. * Site link [[fiwiki:Sileäkarvainen skotlanninpaimenkoira]] already used by item [[Q38790]]. * Site link [[frwiki:Colley à poil court]] already used by item [[Q38790]]. * Site link [[itwiki:Smooth Collie]] already used by item [[Q38790]]. * Site link [[jawiki:スムース・コリー]] already used by item [[Q38790]]. * Site link [[ltwiki:Trumpaplaukis kolis]] already used by item [[Q38790]]. * Site link [[plwiki:Owczarek szkocki krótkowłosy]] already used by item [[Q38790]]. * Site link [[ptwiki:Smooth collie]] already used by item [[Q38790]]. * Site link [[ruwiki:Колли короткошёрстный]] already used by item [[Q38790]]. I also found a second duplicate (about "Osteuropäischer Schäferhund") which is related to the first error in time. wb_items_per_site is missing the sitelink information for this item (unique index). I already wrote in irc that it could be useful to write a maintenance script which compares json objects to rows in wb_items_per_site tables, so that more errors can be found.
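A minimal Python sketch of the comparison such a maintenance script would perform. It assumes the item JSON has already been loaded into a dict and the wb_items_per_site rows into (item_id, site, title) tuples; the function name and both data shapes are hypothetical, not the real table layout.

```python
def find_missing_rows(item_blobs, per_site_rows):
    """Return sitelinks present in an item's JSON but absent from the table.

    item_blobs:    {item_id: {"sitelinks": {site: title}}}  (assumed shape)
    per_site_rows: iterable of (item_id, site, title)        (assumed shape)
    """
    indexed = set(per_site_rows)
    missing = []
    for item_id, blob in sorted(item_blobs.items()):
        for site, title in sorted(blob.get("sitelinks", {}).items()):
            if (item_id, site, title) not in indexed:
                missing.append((item_id, site, title))
    return missing
```

Every tuple it returns is a sitelink that the unique index never saw, which is the condition described above for the duplicate item.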
Several people in the dev team have tried to locate this bug without success. I wonder if we should put this on a semipermanent hold for now and double check it later when we have the new branch on wikidata.org. Putting more work into a bughunt when it is so close to a new rollout seems counterproductive. With the new rollout we should probably do more logging in the transaction so we can identify whether there are any problems.
merl, can you reproduce the bug on our test system http://wikidata-test-repo.wikimedia.de ?
The question is how likely it is that this bug happens. A third error was reported on Wikidata. We really need a script that detects these duplicates. If there are only these three reported errors, reproducing the bug on the test wiki will be very difficult. My bot checks the existence of sitelinks using wbgetitems/wbgetentities. So if the second item was created by my bot and the delay after the first item was more than five minutes, the response of the get module must also have been wrong.
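A duplicate detector along these lines could be sketched as follows, in Python. It assumes the items are available as a dict mapping item id to its sitelinks; that shape and the function name are hypothetical. It groups items by (site, title) and reports every sitelink claimed by more than one item.

```python
from collections import defaultdict


def find_duplicate_sitelinks(item_blobs):
    """Map each (site, title) used by several items to the sorted item ids.

    item_blobs: {item_id: {"sitelinks": {site: title}}}  (assumed shape)
    """
    owners = defaultdict(list)
    for item_id, blob in item_blobs.items():
        for site, title in blob.get("sitelinks", {}).items():
            owners[(site, title)].append(item_id)
    return {key: sorted(ids) for key, ids in owners.items() if len(ids) > 1}
```

Because it reads the item data directly rather than the unique-indexed table, it would also catch items whose sitelink rows were never written.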
Just a note, so that we don't forget to delete all currently existing duplicates after this bug is fixed.
de:Superkritisch http://www.wikidata.org/wiki/Q291263 http://www.wikidata.org/wiki/Q291265
de:Osteuropäischer Schäferhund http://www.wikidata.org/wiki/Q38504 http://www.wikidata.org/wiki/Q38505
de:St. Charles http://www.wikidata.org/wiki/Q296142 http://www.wikidata.org/wiki/Q296143
de:Leicester (Begriffsklärung) http://www.wikidata.org/wiki/Q298181 http://www.wikidata.org/wiki/Q298182
de:Vineland (New Jersey) http://www.wikidata.org/wiki/Q139482 http://www.wikidata.org/wiki/Q153881
I checked the logs for de:Superkritisch: On the first try my bot got a 504 response, which caused the request to be resent to the server. The second try succeeded and returned id 291265. Some hours later my bot sent a wbgetitems query containing de:Superkritisch and got 291263 as the response.
I4972bd7b avoids a race condition concerning the uniqueness check on sitelinks. That may have caused this bug, although only if two edits were made less than 30 seconds apart. There's a similar issue with the uniqueness check on label/description for items, but that doesn't concern the bug as reported.
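On the bot side, the 504 case suggests that a timed-out create may still have been saved. A minimal Python sketch of a retry that looks the sitelinks up before resending is given below. The GatewayTimeout class and the create and lookup callables are hypothetical placeholders supplied by the caller, not part of any real client library, and the lookup is assumed to return up-to-date data, which may not hold if the read is served from a lagging copy.

```python
class GatewayTimeout(Exception):
    """Hypothetical error raised by the caller's HTTP layer on a 504."""


def create_once(create, lookup, sitelinks, max_attempts=3):
    """Create an item, but never resend blindly after a timeout.

    create(sitelinks) -> item id, may raise GatewayTimeout  (hypothetical)
    lookup(sitelinks) -> existing item id or None           (hypothetical)
    """
    for attempt in range(max_attempts):
        if attempt > 0:
            # The timed-out request may have been saved server-side anyway.
            existing = lookup(sitelinks)
            if existing is not None:
                return existing
        try:
            return create(sitelinks)
        except GatewayTimeout:
            continue
    raise GatewayTimeout(f"gave up after {max_attempts} attempts")
```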
Change I4972bd7b: (bug 42325) Avoid race condition in SiteLinkTable
Verified in Wikidata demo sprint 26
Reopening, it seems this problem still exists. [[d:Q10000000]] was a duplicate of the now-deleted [[d:Q7357369]]. Additionally, my duplicate searcher has been finding them every now and then, but when my bot went on a mass-creation spree they started popping up even faster (see [[Special:Log/delete/Legoktm]])

mysql> select count(*) from logging where log_type="delete" and log_action="delete" and log_timestamp >= 20130300000000 and log_comment like "Exact dupe%";
+----------+
| count(*) |
+----------+
|      475 |
+----------+
1 row in set (0.12 sec)

("Exact dupe of Q###" is the deletion summary my bot prefills)
(Lowering priority, this is no longer as big of an issue as it used to be)
Might be related to bug 45882 which maybe leaves the secondary storage in an inconsistent (-> incomplete) state.
*** Bug 48260 has been marked as a duplicate of this bug. ***
I think it is now impossible to create true duplicates after recent updates. Does anyone know how it could be tested/confirmed? Then we can close this bug.
Sadly this still occurs on the site if bots submit the same new item twice during a very small time span. For example: https://www.wikidata.org/w/index.php?title=Q17294561&action=history and https://www.wikidata.org/w/index.php?title=Q17294560&action=history In this case both items were created within the same second... I'm not sure how to fix this problem. The only possible fix I can think of right now is to build a create-lock mechanism that uses memcached during entity creation, but that's going to be a lot of work for an ugly solution.
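A rough Python sketch of such a create-lock follows. It relies only on the memcached-style add operation, which stores a key only if it does not already exist. FakeCache is an in-process stand-in written for this example, and create_with_lock, the key format, and the 30-second TTL are all hypothetical choices, not existing Wikibase code.

```python
import threading
import time


class FakeCache:
    """In-process stand-in for a cache with memcached-style add()."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value, ttl):
        # Store only if the key is absent or expired; report whether we stored.
        now = time.monotonic()
        with self._mutex:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False
            self._data[key] = (value, now + ttl)
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def create_with_lock(cache, site, title, create, ttl=30):
    """Run create(site, title) only if no other request holds the lock."""
    key = f"create-lock:{site}:{title}"  # hypothetical key format
    if not cache.add(key, "1", ttl):
        return None  # another request is already creating this item
    try:
        return create(site, title)
    finally:
        cache.delete(key)
```

The lock only covers the window in which two creations overlap. Once it is released, a later request still depends on the normal uniqueness check seeing the saved sitelinks, and the TTL keeps a crashed request from blocking the sitelink indefinitely.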
(In reply to Marius Hoch from comment #17)
> Sadly this still occurs on the site if bots submit the same new item twice
> during a very small time span.
>
> For example:
> https://www.wikidata.org/w/index.php?title=Q17294561&action=history
> and https://www.wikidata.org/w/index.php?title=Q17294560&action=history
>
> In this case both items were created within the same second... I'm not sure
> how to fix these problem.

While it would be nice if Wikidata prevented this, the more important issue (and much simpler) is fixing the bug in the bot which created two identical items. My guess is the bot is running multiple threads and they are not coordinated, and probably even competing with each other.