Last modified: 2014-11-19 10:31:18 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T42009, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 40009 - Special:Import increases NUMBEROFARTICLES for each Revision instead of each Article
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Export/Import
Version: 1.24rc
Hardware: All (OS: All)
Importance: High normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 45269
Depends on: 57788
Blocks:
Reported: 2012-09-05 15:24 UTC by Marcus Buck
Modified: 2014-11-19 10:31 UTC
CC List: 13 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Marcus Buck 2012-09-05 15:24:46 UTC
We recently had Special:Import enabled for transwiki imports from some other MediaWiki wikis (https://bugzilla.wikimedia.org/show_bug.cgi?id=38943).

But apparently the article counter (NUMBEROFARTICLES) is not incremented once for each imported article, but once for each revision of the imported article!

So if I import an article with 100 revisions in its version history, NUMBEROFARTICLES goes up by 100 instead of by 1.

I hope this can be fixed and that the article counter can be reset to its actual value (it seems this needs to be done by someone with shell access, according to this related but different bug report: https://bugzilla.wikimedia.org/show_bug.cgi?id=5703). I assume this also affects other wikis and simply went unnoticed there, because their higher article creation rates masked the unusual jumps in the article count. Please check and update the article counters of the other wikis too if this applies.

Thank you very much for your time
User:Slomox
Marcus Buck
Comment 1 Marcus Buck 2012-09-06 06:52:23 UTC
I just realized that I only indirectly mentioned (by linking bug:38943) that I am speaking about nds.wikipedia.org. So, yeah, it's nds.wp where we've observed the problem.
Comment 2 Marcus Buck 2012-09-26 11:02:00 UTC
*tumbleweed*
Comment 3 db [inactive,noenotif] 2013-03-16 18:08:09 UTC
*** Bug 45269 has been marked as a duplicate of this bug. ***
Comment 4 Stefan Fussan 2013-04-02 10:08:26 UTC
The same happened on Wikivoyage. The number of articles jumped from 13,230 (2013-03-22) to 13,539 (2013-03-29) to 14,160 (2013-04-01) after importing two or three articles.
Comment 5 Andre Klapper 2013-04-15 11:54:35 UTC
Bug 5703 ("Special:Import needs to update site statistics") might be related, although it describes the opposite problem (and may no longer be valid; comment 18 and comment 21 there leave this unclear).
Comment 6 Jens 2013-07-07 14:16:37 UTC
Same problem at http://frr.wikipedia.org/. The counter jumped from ~3,400 to ~5,600 after several imports on July 2. The counter works correctly when I FIRST import a page and THEN work on the imported page (rename it, etc.). But the counter obviously counts every single revision as an article when I merge an imported article into an existing article.

I'd be glad if the NUMBEROFPAGES count for namespace 0 on frrwiki could be reset to the correct number.

Thanks for your attention
Murma174

http://frr.wikipedia.org/wiki/Spezial:Statistik
http://frr.wikipedia.org/wiki/Benutzer_Diskussion:Murma174
Comment 7 Jens 2013-07-07 14:25:32 UTC
In addition to Comment 6:

Although the counter works correctly when I import the page first, the imported page does not show up in http://frr.wikipedia.org/wiki/Spezial:Neue_Seiten

Murma174
Comment 8 Jens 2013-07-18 12:47:19 UTC
Addition to Comment 6:
I couldn't reproduce the bug mentioned above. Today it worked fine when I imported a template from dewiki into an existing template on frrwiki. The counter did not increase, which is correct in this case.

Murma174
Comment 9 Jens 2013-08-08 08:04:18 UTC
Addition to Comment 6:
Important information on the bug!

The bug occurs when I import a page and leave the "importing all revisions" checkbox checked.

When I uncheck the checkbox and first import only the last revision, and afterwards import the complete revision history, the bug does not occur!

A good workaround for this bug could be to leave this checkbox UNCHECKED by default.
Comment 10 Jens 2013-08-16 07:57:45 UTC
Another information on the bug:

The counter adds [n-1] articles when [n] revisions are imported.

Example: Importing an article with 42 revisions increases the number of articles by 41.
Comment 11 Andyrom75 2013-10-04 06:41:40 UTC
Same thing happens on http://it.wikivoyage.org (I think on all Wikivoyage editions in general, if not on all wikis).
Comment 12 Andyrom75 2013-10-10 22:24:03 UTC
I've compared all the Wikivoyage statistics with the real article counts (excluding redirects). No language edition has the right number. Some have a higher article count due to the bug described above, while others have a lower count, for a reason I can't determine.

Can someone take charge of resolving this bug?
Comment 13 adehertogh 2013-12-13 06:47:16 UTC
Same problem on http://fr.wikivoyage.org: 1 article imported and the counter grew by 60 articles!
Comment 14 Ricordisamoa 2014-02-16 07:48:54 UTC
Someone on this bug? Pleeease!
Comment 15 Andre Klapper 2014-02-16 19:24:41 UTC
If you want to speed this up: Patches are welcome: http://www.mediawiki.org/wiki/Developer_access
Comment 16 Ricordisamoa 2014-02-16 23:15:21 UTC
(In reply to Andre Klapper from comment #15)
> If you want to speed this up: Patches are welcome:
> http://www.mediawiki.org/wiki/Developer_access

I know very little PHP, but I would recommend increasing its priority.
Comment 17 Andyrom75 2014-03-13 09:14:53 UTC
Considering that no one has solved this bug in 1.5 years, is it possible to reset/reprocess the various https://en.wikivoyage.org/wiki/Special:Statistics pages (I mean for each language)? The problem is that those numbers do not reflect the real number of articles, images, etc.

At least the current discrepancy will be mitigated.
Comment 18 zhuyifei1999 2014-03-21 09:46:44 UTC
(In reply to Andyrom75 from comment #17)
> Considering that no one in 1.5y has solved this bug, is it possible to
> reset/reprocessed the various
> https://en.wikivoyage.org/wiki/Special:Statistics (I mean for each
> language)? The problem is that those numbers do not reflect the real amount
> of articles, images, etc...
> 
> At least the current discrepancy will be mitigated.

That's bug 57788
Comment 19 This, that and the other (TTO) 2014-07-20 05:55:08 UTC
Confirming that this bug exists on the WMF cluster, but I cannot reproduce it on a local MediaWiki installation. I wonder if it is something to do with memcached (which I do not have locally)?
Comment 20 This, that and the other (TTO) 2014-07-21 11:22:02 UTC
I observe that this bug does *not* occur when importing additional revisions to an existing page (see [[testwiki:Kennebunk Free Library]]).

I now suspect this is because, upon importing each revision, Import#importOldRevision creates a new WikiPage object to check whether the page being imported already exists on the target wiki. This existence check is performed using the slave database servers. If the page does not exist, the site-wide article count is incremented.

Because this process happens several times in quick succession (once for each revision of the new page), the slave databases have not caught up to the fact that the page was created upon the import of the first revision. So the site-wide article count is incremented once for each revision.

(This also explains why I couldn't reproduce the issue locally! I don't have a master/slave setup.)

If my theory is right, a workaround for this issue, until it can be resolved, would be to create the page you want to import (with bogus content) before importing. Then, once the import is done, you can delete the page and selectively restore all revisions except the bogus revision.
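The replica-lag theory in comment 20 can be illustrated with a small simulation. This is a toy Python model, not MediaWiki code: the `Database`, `Cluster` and `import_page` names are hypothetical, and replication lag is modelled crudely by syncing the replica only after the whole import has finished. Each revision checks whether the page exists and bumps the count if it does not; reading that check from the replica gives one increment per revision, while reading it from the primary (the idea behind change 148309) gives a single increment.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy stand-in for one database server: just the set of existing page titles."""
    pages: set = field(default_factory=set)


@dataclass
class Cluster:
    """Hypothetical primary/replica pair; the replica only catches up on sync()."""
    primary: Database = field(default_factory=Database)
    replica: Database = field(default_factory=Database)

    def sync(self) -> None:
        # Replication catches up: copy the primary's state to the replica.
        self.replica.pages = set(self.primary.pages)


def import_page(cluster: Cluster, title: str, revisions: int, read_from_primary: bool) -> int:
    """Import `revisions` revisions of `title`; return how much the article count grew."""
    article_count_delta = 0
    for _ in range(revisions):
        db = cluster.primary if read_from_primary else cluster.replica
        if title not in db.pages:
            # The page looks new, so the site-wide article count is bumped.
            article_count_delta += 1
        # Every revision is written to the primary; the first one creates the page.
        cluster.primary.pages.add(title)
    cluster.sync()  # in this model, replication lag ends only after the whole import
    return article_count_delta


print(import_page(Cluster(), "Example", 100, read_from_primary=False))  # 100
print(import_page(Cluster(), "Example", 100, read_from_primary=True))   # 1
```

Importing extra revisions into a page the replica already knows about adds nothing in this model, which matches the observation that the bug does not occur when importing into an existing page.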
Comment 21 Gerrit Notification Bot 2014-07-22 02:05:00 UTC
Change 148309 had a related patch set uploaded by TTO:
Use master DB to check for page existence during import

https://gerrit.wikimedia.org/r/148309
Comment 22 Gerrit Notification Bot 2014-07-26 06:58:34 UTC
Change 148309 merged by jenkins-bot:
Use master DB to check for page existence during import

https://gerrit.wikimedia.org/r/148309
Comment 23 This, that and the other (TTO) 2014-07-26 07:27:19 UTC
Tentatively marking fixed. If this issue still shows up on Wikimedia wikis after 7 August, we can reopen this and investigate further.
Comment 24 Andyrom75 2014-07-26 11:19:38 UTC
This, that and the other, is your patch already live? If so, we can test it by importing a page. If the article counter increases by 1 instead of by the total number of revisions, the test has succeeded and the bug is definitely fixed.
Comment 25 This, that and the other (TTO) 2014-07-26 11:26:40 UTC
(In reply to Andyrom75 from comment #24)
> This, that and the other, your patch is already running?

Not yet; you'll have to wait until at least 4 August to be able to try it out on Wikivoyage editions.
Comment 26 Andyrom75 2014-08-18 16:44:28 UTC
This, that and the other, I've reopened the bug because a few pages were imported on zh:voy on Aug 17th and the page count increased drastically, by almost 200, while I can see only 935 pages in NS:0.

Could you take a look at it? Thanks
Comment 27 This, that and the other (TTO) 2014-08-22 07:06:49 UTC
Just tested this on MediaWiki.org, and it worked correctly (page count increased by 1, number of edits increased by the right amount).
Comment 28 This, that and the other (TTO) 2014-08-29 02:43:29 UTC
Apparently still happening; see bug 57788 comment 3. No idea what could be causing it. Perhaps we need to add some logging to production MediaWiki...
Comment 29 Andyrom75 2014-09-10 06:14:22 UTC
Is this something you can take care of, or do you need someone else's support?
Comment 30 This, that and the other (TTO) 2014-09-10 08:06:03 UTC
I don't have time at the moment to look into this. It seems to me that the erroneous increasing of statistics upon import is happening less often than it was before, but it obviously still needs to be fixed.
Comment 31 Andyrom75 2014-11-17 08:05:55 UTC
TTO, do you have some of your precious spare time to take charge of this bug again too?

Once it is solved, we could rerun the statistics to reset the count (hopefully for the last time).

Thanks
Comment 32 This, that and the other (TTO) 2014-11-17 10:12:07 UTC
I can confirm that this bug is still occurring. I imported all history of [[Ratnapura Portuguese fort]] (4 countable revisions) to testwiki, and content page count increased by 4. 

On a local test installation, import of the same page increased the content page count by 1. So it is not occurring on a simple wiki, but it is occurring on a complex setup (WMF cluster).

This makes the issue very difficult for me to debug.
Comment 33 Andyrom75 2014-11-17 10:38:07 UTC
"Very difficult" doesn't mean "impossible", so I keep relying on your troubleshooting skills :-P

PS
I don't understand what you mean by "simple wiki vs. WMF cluster". I suppose the other bug also affected all the wikis.
Comment 34 Gerrit Notification Bot 2014-11-17 11:02:16 UTC
Change 173779 had a related patch set uploaded by TTO:
Debugging statements to try to diagnose bug 40009

https://gerrit.wikimedia.org/r/173779
Comment 35 This, that and the other (TTO) 2014-11-17 11:07:31 UTC
Ignore that, the bot wasn't meant to pick that up
Comment 36 Gerrit Notification Bot 2014-11-17 11:18:26 UTC
Change 173783 had a related patch set uploaded by Ori.livneh:
Debugging statements to try to diagnose bug 40009

https://gerrit.wikimedia.org/r/173783
Comment 37 Gerrit Notification Bot 2014-11-17 11:24:50 UTC
Change 173779 merged by jenkins-bot:
Debugging statements to try to diagnose bug 40009

https://gerrit.wikimedia.org/r/173779
Comment 38 This, that and the other (TTO) 2014-11-19 02:17:17 UTC
The problem seems to be around here [1]: the database is being queried for page links, to determine whether the page is countable before each revision is imported. However, this invariably returns false when called at [2]. Obviously querying the slave is pretty pointless in this context, but querying the master isn't much better, because (IIRC) page link updates are done via the job queue.

I've been thinking about this for some hours now, and I think I have an acceptable way of fixing this. Patch is on its way.

I would dearly love to get rid of that horrible "stateless" WikiRevision class and replace it with something that is context-aware...

[1] http://git.wikimedia.org/blob/mediawiki%2Fcore.git/713ee118efd2d99b9700124b605bfc1ca50939bb/includes%2Fpage%2FWikiPage.php#L867
[2] http://git.wikimedia.org/blob/mediawiki%2Fcore.git/713ee118efd2d99b9700124b605bfc1ca50939bb/includes%2FImport.php#L1478
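The diagnosis in comment 38 can be sketched the same way. The model below assumes a link-based counting rule (a non-redirect page counts if it contains at least one [[wikilink]]) and a hypothetical links table that the import never fills, standing in for link updates deferred to the job queue. `text_is_countable` and `import_revisions` are illustrative names, not MediaWiki functions, and the real WikiPage logic is more involved. Because the "was it countable before?" answer comes from the empty links table, every countable revision looks like a page becoming countable.

```python
import re

# Hypothetical rule: any [[...]] wikilink makes the text "linked".
LINK_RE = re.compile(r"\[\[[^\]]+\]\]")


def text_is_countable(text: str) -> bool:
    """Assumed link-based rule: not a redirect, and contains at least one wikilink."""
    is_redirect = text.lstrip().upper().startswith("#REDIRECT")
    return not is_redirect and bool(LINK_RE.search(text))


def import_revisions(title: str, revision_texts: list[str]) -> int:
    """Return the article-count change when 'old' countability is read from stored links."""
    # Hypothetical links table (title -> link targets). The import never fills it,
    # modelling link updates that only happen later via the job queue.
    links_table: dict[str, set[str]] = {}
    delta = 0
    for text in revision_texts:
        # Old status is derived from stored links, which are still missing -> False.
        was_countable = bool(links_table.get(title))
        # New status is computed from the revision text itself.
        is_countable = text_is_countable(text)
        delta += int(is_countable) - int(was_countable)
    return delta


print(import_revisions("Example", ["A [[link]]"] * 4))  # 4
```

With four countable revisions the model adds 4, the same pattern as the testwiki import described in comment 32.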
Comment 39 Andyrom75 2014-11-19 07:58:17 UTC
I haven't reviewed the scripts, but your idea seems reasonable.
Comment 40 Gerrit Notification Bot 2014-11-19 10:31:15 UTC
Change 174386 had a related patch set uploaded by TTO:
Cache countable statistics to prevent multiple counting on import

https://gerrit.wikimedia.org/r/174386
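Change 174386 is titled "Cache countable statistics to prevent multiple counting on import", but its diff is not shown in this thread, so the following is only a hypothetical sketch of that idea. Instead of re-deriving the previous status from the links table, the importer remembers the countability it computed for the previous revision of each page and counts only the transitions. The counting rule is the same assumed link-based rule as above, and `import_with_cache` is an illustrative name.

```python
import re

# Hypothetical rule: any [[...]] wikilink makes the text "linked".
LINK_RE = re.compile(r"\[\[[^\]]+\]\]")


def text_is_countable(text: str) -> bool:
    """Assumed link-based rule: not a redirect, and contains at least one wikilink."""
    is_redirect = text.lstrip().upper().startswith("#REDIRECT")
    return not is_redirect and bool(LINK_RE.search(text))


def import_with_cache(title: str, revision_texts: list[str], cache: dict[str, bool]) -> int:
    """Return the article-count change, using cached countability between revisions."""
    delta = 0
    for text in revision_texts:
        # Reuse the status computed for the previous revision (new page: not countable).
        was_countable = cache.get(title, False)
        is_countable = text_is_countable(text)
        delta += int(is_countable) - int(was_countable)
        # Remember this revision's status for the next one in the same import.
        cache[title] = is_countable
    return delta


print(import_with_cache("Example", ["A [[link]]"] * 4, {}))  # 1
```

With the cache, the counter tracks changes in countability (for example +1, then -1 if a later revision turns the page into a redirect) instead of adding one for every countable revision.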
