Last modified: 2014-07-18 00:47:34 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T44325, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 42325 - prevent creation of items having the same sitelinks
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Wikidata bugs
Duplicates: 48260
Depends on: 45882
Blocks:
Reported: 2012-11-21 13:53 UTC by merl
Modified: 2014-07-18 00:47 UTC (History)

Description merl 2012-11-21 13:53:06 UTC
My bot created http://www.wikidata.org/w/index.php?title=Q38790&oldid=432372 and half an hour later it created an item having the same sitelinks: http://www.wikidata.org/w/index.php?title=Special:Undelete&target=Q39272&timestamp=20121113025848

I expected that this always fails with an error.
Comment 1 denny vrandecic 2012-11-26 10:28:26 UTC
I see the logs, but I cannot reproduce the issue. Can you point me to the code Merlbot is using? It is simply using the setitems module, right? But which flags are set?

It definitely should fail with an error, and that is what happens for me when I try it.
Comment 2 Daniel Kinzler 2012-11-26 23:49:57 UTC
We should check whether the uniqueness check is done against the master database. If not, the conflict would not be detected before saving: the item's primary data blob would be saved with the conflicting info, and only the secondary database update would cause a unique key error.

I can imagine the above scenario - but not really with a delay of 30 minutes. Slave lag just doesn't get that high.

Anyway: merl, can you check whether the request that created the second entry actually returned ok, or whether it returned a fatal error of some sort?
Comment 3 merl 2012-11-27 00:04:23 UTC
GET:
action=wbsetitem&formal=xml
POST:
bot=1&exclude=ns|title|touched|descriptions&token=cf55697143ae059949f41143c66255e1%2B%5C&summary=Bot%3A+Erg%C3%A4nze%3A+%5B%5Bpt%3ASmooth+collie%5D%5D%2C%5B%5Bde%3AKurzhaarcollie%5D%5D%2C%5B%5Bja%3A%E3%82%B9%E3%83%A0%E3%83%BC%E3%82%B9%E3%83%BB%E3%82%B3%E3%83%AA%E3%83%BC%5D%5D%2C%5B%5Bpl%3AOwczarek+szkocki+kr%C3%B3tkow%C5%82osy%5D%5D%2C%5B%5Bcs%3AKolie+kr%C3%A1tkosrst%C3%A1%5D%5D%2C%5B%5Bru%3A%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D1%8B%D0%B9%5D%5D%2C%5B%5Bfr%3AColley+%C3%A0+poil+court%5D%5D%2Cw%2Clt%2Cfi%2Cit&data=%7B%22sitelinks%22%3A%5B%7B%22site%22%3A%22ptwiki%22%2C%22title%22%3A%22Smooth+collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22dewiki%22%2C%22title%22%3A%22Kurzhaarcollie%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22jawiki%22%2C%22title%22%3A%22%E3%82%B9%E3%83%A0%E3%83%BC%E3%82%B9%E3%83%BB%E3%82%B3%E3%83%AA%E3%83%BC%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22plwiki%22%2C%22title%22%3A%22Owczarek+szkocki+kr%C3%B3tkow%C5%82osy%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22cswiki%22%2C%22title%22%3A%22Kolie+kr%C3%A1tkosrst%C3%A1%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22ruwiki%22%2C%22title%22%3A%22%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D1%8B%D0%B9%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22frwiki%22%2C%22title%22%3A%22Colley+%C3%A0+poil+court%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22enwiki%22%2C%22title%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22ltwiki%22%2C%22title%22%3A%22Trumpaplaukis+kolis%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22fiwiki%22%2C%22title%22%3A%22Sile%C3%A4karvainen+skotlanninpaimenkoira%22%2C%22add%22%3A%22%22%7D%2C%7B%22site%22%3A%22itwiki%22%2C%22title%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%5D%2C%22labels%22%3A%5B%7B%22language%22%3A%22pt%22%2C%22value%22%3A%22Smooth+collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A
%22de%22%2C%22value%22%3A%22Kurzhaarcollie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22ja%22%2C%22value%22%3A%22%E3%82%B9%E3%83%A0%E3%83%BC%E3%82%B9%E3%83%BB%E3%82%B3%E3%83%AA%E3%83%BC%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22pl%22%2C%22value%22%3A%22Owczarek+szkocki+kr%C3%B3tkow%C5%82osy%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22cs%22%2C%22value%22%3A%22Kolie+kr%C3%A1tkosrst%C3%A1%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22ru%22%2C%22value%22%3A%22%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D1%8B%D0%B9%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fr%22%2C%22value%22%3A%22Colley+%C3%A0+poil+court%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22en%22%2C%22value%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22lt%22%2C%22value%22%3A%22Trumpaplaukis+kolis%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Sile%C3%A4karvainen+skotlanninpaimenkoira%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22it%22%2C%22value%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%5D%2C%22aliases%22%3A%5B%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Kurzhaar+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Collie+Smooth%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Kurzhaariger+Schottischer+Sch%C3%A4ferhund%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Smooth+Collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22de%22%2C%22value%22%3A%22Langhaariger+Schottischer+Sch%C3%A4ferhund%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22pl%22%2C%22value%22%3A%22Collie+kr%C3%B3tkow%C5%82osy%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22pl%22%2C%22value%22%3A%22Collie+Smooth%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22cs%22%2C%22value%22%3A%22Kr%C3%A1tkosrst%C3%A1+kolie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3
A%22ru%22%2C%22value%22%3A%22%D0%9A%D0%BE%D0%BB%D0%BB%D0%B8+%D0%BA%D0%BE%D1%80%D0%BE%D1%82%D0%BA%D0%BE%D1%88%D1%91%D1%80%D1%81%D1%82%D0%BD%D0%B0%D1%8F%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fr%22%2C%22value%22%3A%22Colley+a+poil+court%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Nahkacollie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Sile%C3%A4karvainen+collie%22%2C%22add%22%3A%22%22%7D%2C%7B%22language%22%3A%22fi%22%2C%22value%22%3A%22Lyhytkarvainen+collie%22%2C%22add%22%3A%22%22%7D%5D%7D

SETITEM:/api
/api:servedby=srv300
/api/error:code=save-failed
/api/error:info=Edit not allowed:
* Site link [[cswiki:Kolie krátkosrstá]] already used by item [[Q38790]].
* Site link [[dewiki:Kurzhaarcollie]] already used by item [[Q38790]].
* Site link [[enwiki:Smooth Collie]] already used by item [[Q38790]].
* Site link [[fiwiki:Sileäkarvainen skotlanninpaimenkoira]] already used by item [[Q38790]].
* Site link [[frwiki:Colley à poil court]] already used by item [[Q38790]].
* Site link [[itwiki:Smooth Collie]] already used by item [[Q38790]].
* Site link [[jawiki:スムース・コリー]] already used by item [[Q38790]].
* Site link [[ltwiki:Trumpaplaukis kolis]] already used by item [[Q38790]].
* Site link [[plwiki:Owczarek szkocki krótkowłosy]] already used by item [[Q38790]].
* Site link [[ptwiki:Smooth collie]] already used by item [[Q38790]].
* Site link [[ruwiki:Колли короткошёрстный]] already used by item [[Q38790]].

I also found a second duplicate (about "Osteuropäischer Schäferhund") which is related to the first error in time.

wb_items_per_site is missing the sitelink information for this item (unique index). I already suggested on IRC that it could be useful to write a maintenance script that compares the JSON objects with the rows in the wb_items_per_site table, so that more errors can be found.
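
A minimal sketch of the consistency check suggested above, in Python. It assumes each item's JSON blob has already been parsed into a dict with a `sitelinks` mapping of site id to page title, and that the secondary table has been read into `(item_id, site, title)` tuples. Both shapes and the name `find_mismatches` are illustrative assumptions, not the actual Wikibase schema or maintenance script.

```python
def find_mismatches(items, table_rows):
    """Compare sitelinks stored in item blobs with rows of the secondary table.

    items: dict mapping item id -> {"sitelinks": {site: title}}  (assumed shape)
    table_rows: iterable of (item_id, site, title) tuples          (assumed shape)
    Returns (missing, stale): links found only in the blobs, and rows
    found only in the table.
    """
    from_blobs = {
        (item_id, site, title)
        for item_id, item in items.items()
        for site, title in item.get("sitelinks", {}).items()
    }
    from_table = set(table_rows)
    missing = sorted(from_blobs - from_table)
    stale = sorted(from_table - from_blobs)
    return missing, stale


if __name__ == "__main__":
    items = {
        "Q38790": {"sitelinks": {"dewiki": "Kurzhaarcollie"}},
        "Q39272": {"sitelinks": {"dewiki": "Kurzhaarcollie"}},
    }
    rows = [("Q38790", "dewiki", "Kurzhaarcollie")]
    missing, stale = find_mismatches(items, rows)
    print("missing from table:", missing)
    print("stale in table:", stale)
```

An item whose links show up as missing from the table is a candidate duplicate, since a unique index on the table would have rejected its row.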
Comment 4 jeblad 2012-11-28 09:07:54 UTC
Several people in the dev team have tried to locate this bug without success.

I wonder if we should put this on a semipermanent hold for now and double check it later when we have the new branch on wikidata.org. Putting more work into a bughunt when it is so close to a new rollout seems counterproductive.

With the new rollout we should probably do more logging in the transaction so we can identify whether there are any problems.
Comment 5 Anja Jentzsch 2012-11-28 09:12:44 UTC
merl, can you reproduce the bug on our test system http://wikidata-test-repo.wikimedia.de ?
Comment 6 merl 2012-11-30 13:33:43 UTC
The question is how likely this bug is to occur. A third error was reported on Wikidata. We really need a script that detects these duplicates. If there are only these three reported errors, reproducing the bug on the test wiki will be very difficult.

My bot checks the existence of sitelinks using wbgetitems/wbgetentities. So if the second item was created by my bot more than five minutes after the first one, the response of the get module must have been wrong as well.
Comment 7 merl 2012-12-03 00:56:50 UTC
Just a note, so that we don't forget to delete all currently existing duplicates after this bug is fixed.

de:Superkritisch
http://www.wikidata.org/wiki/Q291263
http://www.wikidata.org/wiki/Q291265

de:Osteuropäischer Schäferhund
http://www.wikidata.org/wiki/Q38504
http://www.wikidata.org/wiki/Q38505

de:St. Charles
http://www.wikidata.org/wiki/Q296142
http://www.wikidata.org/wiki/Q296143

de:Leicester (Begriffsklärung)
http://www.wikidata.org/wiki/Q298181
http://www.wikidata.org/wiki/Q298182

de:Vineland (New Jersey)
http://www.wikidata.org/wiki/Q139482
http://www.wikidata.org/wiki/Q153881
Comment 8 merl 2012-12-03 01:45:43 UTC
I checked the logs for de:Superkritisch:

On the first try my bot got a 504 response, so the request was resent to the server. The second try succeeded and returned id 291265.

Some hours later my bot sent a wbgetitems query containing de:Superkritisch and got 291263 as the response.
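
The sequence described above (timeout, resend, two items) is consistent with the first request having been stored even though its response was lost. A hedged Python sketch of a client that looks the item up before resending follows. `FakeRepo`, `GatewayTimeout`, `lookup`, `create_item` and `create_once` are all hypothetical names for this illustration and are not part of the Wikibase API.

```python
class GatewayTimeout(Exception):
    """Stands in for an HTTP 504: the client cannot tell whether the write happened."""


class FakeRepo:
    """Hypothetical in-memory repository that times out on the first create,
    but only after it has already stored the item."""

    def __init__(self):
        self.items = {}          # item id -> frozenset of (site, title)
        self.next_id = 1
        self.fail_next = True

    def lookup(self, sitelinks):
        """Return the id of an item sharing any of the given links, else None."""
        wanted = frozenset(sitelinks)
        for item_id, links in self.items.items():
            if links & wanted:
                return item_id
        return None

    def create_item(self, sitelinks):
        item_id = f"Q{self.next_id}"
        self.next_id += 1
        self.items[item_id] = frozenset(sitelinks)
        if self.fail_next:
            self.fail_next = False
            raise GatewayTimeout()   # item was stored, response was lost
        return item_id


def create_once(repo, sitelinks, max_tries=3):
    """Create an item, but look it up before every resend."""
    for _ in range(max_tries):
        try:
            return repo.create_item(sitelinks)
        except GatewayTimeout:
            existing = repo.lookup(sitelinks)
            if existing is not None:
                return existing      # the "failed" request actually succeeded
    raise RuntimeError("giving up after repeated timeouts")
```

This only helps if the lookup sees the earlier write. If the lookup is answered from a lagging or incomplete secondary store, the client would still resend and create a duplicate.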
Comment 9 Daniel Kinzler 2012-12-03 16:02:07 UTC
I4972bd7b avoids a race condition concerning the uniqueness check on sitelinks. That may have caused this bug, although only if two edits were made less than 30 seconds apart.

There's a similar issue with the uniqueness check on label/description for items, but that doesn't concern the bug as reported.
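
To illustrate the general kind of race mentioned above (this is not what change I4972bd7b actually does), here is a small Python sketch using an in-memory SQLite database. Two create requests both run an application-level check before either of them writes. The table `links`, its columns, and the function names are assumptions made for this example only, and the sketch models just the link table, not the primary data blob.

```python
import sqlite3


def link_in_use(conn, site, title):
    """Application-level uniqueness check: is this site link already stored?"""
    row = conn.execute(
        "SELECT item_id FROM links WHERE site = ? AND title = ?", (site, title)
    ).fetchone()
    return row is not None


def interleaved_creates(unique_index):
    """Run two create requests whose checks both happen before either insert.
    Returns the number of rows stored for the same site link."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (item_id TEXT, site TEXT, title TEXT)")
    if unique_index:
        conn.execute("CREATE UNIQUE INDEX links_site_title ON links (site, title)")

    link = ("dewiki", "Kurzhaarcollie")
    # Both requests run the check first and see no conflict.
    ok_a = not link_in_use(conn, *link)
    ok_b = not link_in_use(conn, *link)

    for item_id, allowed in (("Q38790", ok_a), ("Q39272", ok_b)):
        if not allowed:
            continue
        try:
            conn.execute("INSERT INTO links VALUES (?, ?, ?)", (item_id, *link))
        except sqlite3.IntegrityError:
            pass  # the database rejected the second writer
    count = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print("without unique index:", interleaved_creates(False))
    print("with unique index:", interleaved_creates(True))
```

The check alone cannot prevent the duplicate when the two requests interleave. A constraint enforced by the database at write time can, provided the write it guards is the one that decides whether the item is saved.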
Comment 10 jeblad 2012-12-05 10:45:03 UTC
Change I4972bd7b: (bug 42325) Avoid race condition in SiteLinkTable
Comment 11 abraham.taherivand 2012-12-12 16:43:14 UTC
Verified in Wikidata demo sprint 26
Comment 12 Kunal Mehta (Legoktm) 2013-04-05 01:39:35 UTC
Reopening, it seems this problem still exists.

[[d:Q10000000]] was a duplicate of the now-deleted [[d:Q7357369]].

Additionally, my duplicate searcher has been finding them every now and then, but when my bot went on a mass-creation spree they started popping up even faster (see [[Special:Log/delete/Legoktm]])

mysql> select count(*) from logging where log_type="delete" and log_action="delete" and log_timestamp >= 20130300000000 and log_comment like "Exact dupe%";
+----------+
| count(*) |
+----------+
|      475 |
+----------+
1 row in set (0.12 sec)

("Exact dupe of Q###" is the deletion summary my bot prefills)
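
A duplicate search of this kind can be sketched in a few lines of Python. This is not the bot's actual code. It assumes the items are already available as a dict mapping item id to a `{site: title}` mapping, and `find_exact_dupes` is a hypothetical name.

```python
from collections import defaultdict


def find_exact_dupes(items):
    """Group item ids that carry an identical, non-empty set of site links.

    items: dict mapping item id -> {site: title}  (assumed shape)
    Returns a list of sorted id lists, one per group of duplicates.
    """
    groups = defaultdict(list)
    for item_id, sitelinks in items.items():
        if sitelinks:
            groups[frozenset(sitelinks.items())].append(item_id)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


if __name__ == "__main__":
    items = {
        "Q291263": {"dewiki": "Superkritisch"},
        "Q291265": {"dewiki": "Superkritisch"},
        "Q38504": {"dewiki": "Osteuropäischer Schäferhund"},
    }
    print(find_exact_dupes(items))
```

Grouping by the full set of links only finds exact duplicates. Items that share some but not all links would need a per-link index instead.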
Comment 13 Kunal Mehta (Legoktm) 2013-04-05 01:40:34 UTC
(Lowering priority, this is no longer as big of an issue as it used to be)
Comment 14 Marius Hoch 2013-04-05 01:42:58 UTC
Might be related to bug 45882, which may leave the secondary storage in an inconsistent (i.e. incomplete) state.
Comment 15 denny vrandecic 2013-06-27 10:17:27 UTC
*** Bug 48260 has been marked as a duplicate of this bug. ***
Comment 16 Matěj Suchánek 2014-06-08 10:55:18 UTC
I think it is now impossible to create true duplicates after recent updates. Does anyone know how it could be tested/confirmed? Then we can close this bug.
Comment 17 Marius Hoch 2014-07-17 23:34:37 UTC
Sadly this still occurs on the site if bots submit the same new item twice during a very small time span.

For example:
https://www.wikidata.org/w/index.php?title=Q17294561&action=history
and https://www.wikidata.org/w/index.php?title=Q17294560&action=history

In this case both items were created within the same second... I'm not sure how to fix this problem. The only possible fix I can think of right now is to build a create-lock mechanism that uses memcached during entity creation, but that's going to be a lot of work for an ugly solution.
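
A rough Python sketch of what such a create-lock could look like. It relies on an atomic add-if-absent operation, which memcached provides through its `add` command. Here `FakeCache` is an in-process stand-in for the cache, and `create_with_lock`, the key format and the `store` dict are assumptions for illustration, not Wikibase code.

```python
import threading
import time


class FakeCache:
    """In-process stand-in for a cache with an atomic add-if-absent operation."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, ttl):
        """Store key only if absent or expired; return True on success."""
        now = time.monotonic()
        with self._guard:
            expires = self._data.get(key)
            if expires is not None and expires > now:
                return False
            self._data[key] = now + ttl
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


def create_with_lock(cache, store, item_id, site, title, ttl=30):
    """Create an item only if this request wins the lock for its site link."""
    key = f"create-lock:{site}:{title}"
    if not cache.add(key, ttl):
        return False                 # another request is creating this link
    try:
        if (site, title) in store:
            return False             # link already belongs to an item
        store[(site, title)] = item_id
        return True
    finally:
        cache.delete(key)
```

The expiry time keeps a crashed request from holding the lock forever. The lock only narrows the window, though: if the cache entry is evicted or the existence check reads stale data, duplicates remain possible.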
Comment 18 John Mark Vandenberg 2014-07-18 00:47:34 UTC
(In reply to Marius Hoch from comment #17)
> Sadly this still occurs on the site if bots submit the same new item twice
> during a very small time span.
> 
> For example:
> https://www.wikidata.org/w/index.php?title=Q17294561&action=history
> and https://www.wikidata.org/w/index.php?title=Q17294560&action=history
> 
> In this case both items were created within the same second... I'm not sure
> how to fix this problem.

While it would be nice if Wikidata prevented this, the more important (and much simpler) issue is fixing the bug in the bot that created two identical items.

My guess is the bot is running multiple threads and they are not coordinated, and probably even competing with each other.
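
If that guess is right, the bot-side fix could be as small as a shared registry of claimed pages. The Python sketch below assumes a multi-threaded bot in which every worker may encounter the same page. `CreationClaims` and `run_workers` are hypothetical names, and the list append stands in for the real API call.

```python
import threading


class CreationClaims:
    """Shared registry so that only one worker thread creates a given page's item."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def claim(self, site, title):
        """Return True for the first caller per (site, title), False afterwards."""
        with self._lock:
            if (site, title) in self._claimed:
                return False
            self._claimed.add((site, title))
            return True


def run_workers(pages, n_threads=4):
    """Every thread walks the full page list; claims keep creations unique."""
    claims = CreationClaims()
    created = []
    created_lock = threading.Lock()

    def worker():
        for site, title in pages:
            if claims.claim(site, title):
                with created_lock:
                    created.append((site, title))  # a real bot would call the API here

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return created
```

This coordinates threads within one process only. Several bot processes, or several different bots, would still depend on the server rejecting the second create.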
