Last modified: 2014-02-03 20:06:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and, apart from displaying bug reports and their history, links might be broken. See T53254, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 51254 - tag_summary missing records
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Sean Pringle
Depends on:
Blocks: 40867

Reported: 2013-07-12 19:18 UTC by Sean Pringle
Modified: 2014-02-03 20:06 UTC
CC List: 8 users

Description Sean Pringle 2013-07-12 19:18:45 UTC
tag_summary duplicates data in change_tag, but is missing some records. 

E.g.:

select * from change_tag where ct_rev_id = 563615370;
+-----------+-----------+-----------+--------------+-----------+
| ct_rc_id  | ct_log_id | ct_rev_id | ct_tag       | ct_params |
+-----------+-----------+-----------+--------------+-----------+
| 589674173 |      NULL | 563615370 | visualeditor | NULL      |
+-----------+-----------+-----------+--------------+-----------+

select * from tag_summary where ts_rev_id = 563615370;
Empty set (0.01 sec)

Cause unknown at time of writing.
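Instead of spot-checking IDs one at a time, you can list every affected revision with an anti-join: change_tag rows that have no matching tag_summary row. The sketch below is illustrative only. It runs against an in-memory SQLite database holding cut-down copies of the two tables. The column names come from the output above, while the ts_tags column and the second sample row are assumptions. The same LEFT JOIN ... IS NULL pattern carries over to MySQL.

```python
import sqlite3

def find_missing_summaries(db: sqlite3.Connection) -> list[int]:
    """Return ct_rev_id values that have tags in change_tag but no tag_summary row."""
    query = """
        SELECT DISTINCT ct.ct_rev_id
        FROM change_tag AS ct
        LEFT JOIN tag_summary AS ts ON ts.ts_rev_id = ct.ct_rev_id
        WHERE ct.ct_rev_id IS NOT NULL   -- revision tags only; log tags skipped
          AND ts.ts_rev_id IS NULL       -- no summary row matched
        ORDER BY ct.ct_rev_id
    """
    return [row[0] for row in db.execute(query)]

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE change_tag (ct_rc_id INTEGER, ct_log_id INTEGER,
                             ct_rev_id INTEGER, ct_tag TEXT, ct_params TEXT);
    CREATE TABLE tag_summary (ts_rc_id INTEGER, ts_log_id INTEGER,
                              ts_rev_id INTEGER, ts_tags TEXT);
""")
# The revision from the report, plus a hypothetical one that is summarised correctly.
db.executemany("INSERT INTO change_tag VALUES (?, ?, ?, ?, NULL)",
               [(589674173, None, 563615370, "visualeditor"),
                (1, None, 100, "mobile edit")])
db.execute("INSERT INTO tag_summary VALUES (1, NULL, 100, 'mobile edit')")
print(find_missing_summaries(db))  # [563615370]
```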

Relevant recent activity:

https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#VisualEditor_tag_not_working_correctly

https://bugzilla.wikimedia.org/show_bug.cgi?id=40867
Comment 1 Bartosz Dziewoński 2013-07-12 19:22:31 UTC
Only seems to affect en.wp right now (works correctly on pl.wp and mw.org, for example).
Comment 2 Rob Lanphier 2013-07-12 20:51:38 UTC
Sean and Asher narrowed this down to a problem with the schema change tool that we use, and are working on a strategy to fix the data. This looks like a strictly database-related problem that, once fixed, should stay fixed (assuming we don't try another similar schema migration before an upstream fix is made to the migration tool).
Comment 3 Bartosz Dziewoński 2013-07-12 20:56:21 UTC
Was it determined if any other databases apart from en.wp's one were affected?
Comment 4 Sam Reed (reedy) 2013-07-12 21:00:14 UTC
(In reply to comment #3)
> Was it determined if any other databases apart from en.wp's one were
> affected?

Not sure. The wikis that may have this issue are:

+       'arwiki' => true,
+       'commonswiki' => true,
+       'cswiki' => true,
+       'dewiki' => true,
+       'elwiki' => true,
+       'enwiki' => true,
+       'enwikisource' => true,
+       'enwiktionary' => true,
+       'eswiki' => true,
+       'etwiki' => true,
+       'fawiki' => true,
+       'fiwiki' => true,
+       'frwiki' => true,
+       'hewiki' => true,
+       'huwiki' => true,
+       'idwiki' => true,
+       'itwiki' => true,
+       'jawiki' => true,
+       'ltwiki' => true,
+       'mrwiki' => true,
+       'nlwiki' => true,
+       'plwiki' => true,
+       'ptwiki' => true,
+       'rowiki' => true,
+       'ruwiki' => true,
+       'simplewiki' => true,
+       'svwiki' => true,
+       'trwiki' => true,
+       'ukwiki' => true,
+       'zhwiki' => true,

Cf. bug 40867, comment 6.
Comment 5 Sean Pringle 2013-07-12 22:17:09 UTC
Firstly, we've determined this problem occurred due to an (apparent) bug in pt-online-schema-change when using a combination of:

- A table without primary key
- A table with unique indexes that all include nullable columns
- An unfortunately timed REPLACE statement in normal db traffic

Posc (pt-online-schema-change) performs an online table alteration by:

- Creating a copy of the table with altered schema
- Setting triggers on the original table to keep the copy updated
- Copying data across using a batch process

In this case, posc set a DELETE trigger on tag_summary using a poor choice of UNIQUE index (ts_log_id): one with low cardinality on a nullable column. Then, during the batching process, an external REPLACE statement with ts_log_id=NULL caused far too many rows to be deleted from the temporary table being altered. Because many rows in tag_summary have ts_log_id=NULL, the table was massively reduced in size.
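The failure mode described above can be reproduced in miniature. This is a simplified model, not pt-online-schema-change itself. The shadow table name tag_summary_new and its trigger are hypothetical. The model assumes the DELETE trigger matches rows on the chosen unique key with a null-safe comparison (<=> in MySQL, IS in SQLite). SQLite only fires delete triggers for REPLACE conflicts when recursive triggers are enabled, so a plain DELETE stands in for the row removal that a REPLACE performs.

```python
import sqlite3

COLUMNS = "ts_rc_id INTEGER, ts_log_id INTEGER, ts_rev_id INTEGER, ts_tags TEXT"

def simulate_trigger_on_nullable_key() -> tuple[int, int]:
    """Delete ONE row from the original table; return (original rows, copy rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript(f"""
        CREATE TABLE tag_summary ({COLUMNS});
        CREATE TABLE tag_summary_new ({COLUMNS});  -- hypothetical shadow copy
        -- Hypothetical DELETE trigger keyed on the nullable ts_log_id column,
        -- using a null-safe comparison (IS in SQLite, <=> in MySQL).
        CREATE TRIGGER tag_summary_del AFTER DELETE ON tag_summary
        BEGIN
            DELETE FROM tag_summary_new WHERE ts_log_id IS OLD.ts_log_id;
        END;
    """)
    # Five revision rows (ts_log_id NULL) and one log row (ts_log_id set).
    rows = [(i, None, 1000 + i, "visualeditor") for i in range(1, 6)]
    rows.append((6, 42, None, "blanking"))
    db.executemany("INSERT INTO tag_summary VALUES (?, ?, ?, ?)", rows)
    # Stand-in for the batch copy step.
    db.execute("INSERT INTO tag_summary_new SELECT * FROM tag_summary")
    # Normal traffic removes a single revision row; the trigger fires once,
    # but NULL IS NULL is true, so every NULL-keyed row in the copy matches.
    db.execute("DELETE FROM tag_summary WHERE ts_rc_id = 1")
    original = db.execute("SELECT COUNT(*) FROM tag_summary").fetchone()[0]
    copy = db.execute("SELECT COUNT(*) FROM tag_summary_new").fetchone()[0]
    return original, copy

print(simulate_trigger_on_nullable_key())  # (5, 1)
```

Deleting one row from the original removes five from the copy. A key column that is both nullable and low-cardinality makes a poor row identifier for this kind of trigger.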

Now to the fix:

We've checked the other wikis and found no problems; only enwiki was affected.

Furthermore, only enwiki.tag_summary was affected. We've verified that enwiki.change_tag is complete and did not suffer the same problem. This was based on:

- Index cardinality and table size information collected before running the schema migration
- An investigation of the events in the binary log surrounding the migration period

We are currently rebuilding tag_summary from the change_tag data. As of this writing, the rebuild should complete within 30 minutes.
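Conceptually, a rebuild like this regroups change_tag rows by the change they belong to and concatenates their tags. The following sketches that idea and is not the script that was actually run. It uses SQLite in memory and assumes ts_tags holds a comma-separated tag list, as in MediaWiki's tag_summary schema of that era. The sample rows are hypothetical. Collecting every tag of a change, rather than picking one, is what keeps multi-tag revisions intact.

```python
import sqlite3

def rebuild_tag_summary(db: sqlite3.Connection) -> None:
    """Repopulate tag_summary from change_tag: one row per change, all of its tags."""
    db.executescript("""
        DELETE FROM tag_summary;
        INSERT INTO tag_summary (ts_rc_id, ts_log_id, ts_rev_id, ts_tags)
        SELECT ct_rc_id, ct_log_id, ct_rev_id, GROUP_CONCAT(ct_tag, ',')
        FROM change_tag
        -- GROUP BY puts NULLs in the same group, so (rc_id, NULL, rev_id) works.
        GROUP BY ct_rc_id, ct_log_id, ct_rev_id;
    """)

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE change_tag (ct_rc_id INTEGER, ct_log_id INTEGER,
                             ct_rev_id INTEGER, ct_tag TEXT, ct_params TEXT);
    CREATE TABLE tag_summary (ts_rc_id INTEGER, ts_log_id INTEGER,
                              ts_rev_id INTEGER, ts_tags TEXT);
""")
# Hypothetical sample: revision 500 carries two tags, revision 501 one.
db.executemany("INSERT INTO change_tag VALUES (?, ?, ?, ?, NULL)",
               [(10, None, 500, "visualeditor"),
                (10, None, 500, "visualeditor-needcheck"),
                (11, None, 501, "mobile edit")])
rebuild_tag_summary(db)
for rev_id, tags in db.execute("SELECT ts_rev_id, ts_tags FROM tag_summary"):
    print(rev_id, tags)
```

MySQL also has GROUP_CONCAT, but it truncates the result at group_concat_max_len (1024 by default). In this form, the order of tags within ts_tags is not guaranteed by either engine.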
Comment 6 Sean Pringle 2013-07-12 23:32:51 UTC
enwiki.tag_summary rebuild is complete.
Comment 7 Steven Walling 2013-07-12 23:37:27 UTC
Just checked this on-wiki as well. Seems fixed.
Comment 8 Robert Rohde 2013-07-13 00:59:47 UTC
Sorry to add to what I'm sure was a bit of a hectic day for someone, but I'm still seeing lingering bits of corruption. Perhaps some sort of edge case that wasn't handled correctly by the rebuild? 99.9% of tags may be okay at this point, but here are some examples that still seem to be errors.

An API query for 200 revisions tagged as "blanking":

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=blanking&rclimit=200&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp|ids&rccontinue=2013-07-12T22:20:40Z|589061595

While this query returns 200 entries, we find that only 188 of them report as actually having the "blanking" tag.

The remainder are things like:
  rcid="590123889" timestamp="2013-07-12T14:30:16Z"
  <tag>visualeditor</tag>

  rcid="590032703" timestamp="2013-07-12T00:33:31Z"
  <tag>mobile edit</tag>

where some other tag is reported but the expected "blanking" tag is not.
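The check described here is easy to automate: query by rctag, then confirm that each returned entry actually reports that tag. This sketch assumes each result entry has already been parsed into a dict with an rcid and a tags list, roughly the shape of the API's JSON output when rcprop includes ids and tags. Fetching is left out. The first two sample entries reuse the rcids quoted above, and the third is hypothetical.

```python
def split_by_tag(entries: list[dict], tag: str) -> tuple[list[int], list[int]]:
    """Return (rcids that report `tag`, rcids that do not) for rctag=`tag` results."""
    reported, missing = [], []
    for entry in entries:
        if tag in entry.get("tags", []):
            reported.append(entry["rcid"])
        else:
            missing.append(entry["rcid"])
    return reported, missing

sample = [
    {"rcid": 590123889, "tags": ["visualeditor"]},
    {"rcid": 590032703, "tags": ["mobile edit"]},
    {"rcid": 590000000, "tags": ["blanking"]},  # hypothetical correct entry
]
ok, bad = split_by_tag(sample, "blanking")
print(f"{len(ok)} of {len(sample)} report the tag; mismatches: {bad}")
```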

For another example of this issue see the API query for the "visualeditor-needcheck" tag:

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=visualeditor-needcheck&rclimit=200&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp|ids

This tag should only be applied if the "visualeditor" tag is also present, but we observe that most of the results have either "visualeditor" or "visualeditor-needcheck" but not both.  A few entries even have other tags entirely.
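That invariant can be checked mechanically over the entries returned for rctag=visualeditor-needcheck. This is a sketch. It assumes each entry has been parsed into a dict with an rcid and a tags list, which is a hypothetical shape, and it omits fetching. The rcids are made up to show each pattern.

```python
REQUIRED = {"visualeditor", "visualeditor-needcheck"}

def needcheck_violations(entries: list[dict]) -> list[int]:
    """Return rcids among rctag=visualeditor-needcheck results lacking either tag."""
    return [e["rcid"] for e in entries if not REQUIRED <= set(e.get("tags", []))]

sample = [
    {"rcid": 1, "tags": ["visualeditor", "visualeditor-needcheck"]},  # consistent
    {"rcid": 2, "tags": ["visualeditor"]},
    {"rcid": 3, "tags": ["visualeditor-needcheck"]},
    {"rcid": 4, "tags": ["mobile edit"]},
]
print(needcheck_violations(sample))  # [2, 3, 4]
```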


What appears to have happened is that the rebuild didn't correctly handle cases where a single revision was subject to multiple tags. Instead, it looks as though the rebuilt table applies at most one tag to each historical revision. Most of the time that's okay, since few revisions actually have multiple tags, but it still leaves some corruption and missing data in the rare cases where a revision is expected to have multiple tags.
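A check along these lines would flag revisions whose summary kept only some of their tags. This is a hypothetical sketch. For each revision, it compares the set of tags in change_tag with the set recorded in tag_summary, assuming ts_tags is comma-separated. It runs on an in-memory SQLite database with made-up rows, and it ignores log-entry tags (ct_rev_id NULL) for brevity.

```python
import sqlite3
from collections import defaultdict

def find_incomplete_summaries(db: sqlite3.Connection) -> dict[int, list[str]]:
    """Map rev_id -> tags present in change_tag but absent from tag_summary.ts_tags."""
    expected = defaultdict(set)
    for rev_id, tag in db.execute(
            "SELECT ct_rev_id, ct_tag FROM change_tag WHERE ct_rev_id IS NOT NULL"):
        expected[rev_id].add(tag)
    recorded = {}
    for rev_id, tags in db.execute(
            "SELECT ts_rev_id, ts_tags FROM tag_summary WHERE ts_rev_id IS NOT NULL"):
        recorded[rev_id] = set(tags.split(",")) if tags else set()
    problems = {}
    for rev_id, tags in expected.items():
        missing = tags - recorded.get(rev_id, set())
        if missing:
            problems[rev_id] = sorted(missing)
    return problems

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE change_tag (ct_rc_id INTEGER, ct_log_id INTEGER,
                             ct_rev_id INTEGER, ct_tag TEXT, ct_params TEXT);
    CREATE TABLE tag_summary (ts_rc_id INTEGER, ts_log_id INTEGER,
                              ts_rev_id INTEGER, ts_tags TEXT);
    -- Hypothetical data: revision 500 has two tags but its summary kept only one.
    INSERT INTO change_tag VALUES (10, NULL, 500, 'visualeditor', NULL),
                                  (10, NULL, 500, 'visualeditor-needcheck', NULL),
                                  (11, NULL, 501, 'mobile edit', NULL);
    INSERT INTO tag_summary VALUES (10, NULL, 500, 'visualeditor-needcheck'),
                                   (11, NULL, 501, 'mobile edit');
""")
print(find_incomplete_summaries(db))  # {500: ['visualeditor']}
```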
Comment 9 Andre Klapper 2013-07-15 11:31:45 UTC
(In reply to comment #8)
> A API query of 200 revisions tags as flagged as "blanking":
> While this query returns 200 entries, we find that only 188 of them report as
> actually having the "blanking" tag.

That's still the case today.
Comment 10 Greg Grossmeier 2013-07-15 17:31:58 UTC
Lowering priority a bit since I don't think there is data loss here (the table that was used to recreate the data still exists).

James: Assigning to you to determine the priority for getting around to fixing this data (since it affects VE related data, and you know what metrics are being tracked).
Comment 11 Sean Pringle 2013-07-15 21:33:56 UTC
Am investigating whether the tag_summary rebuild was conceptually flawed with regard to revisions with multiple tags, or not.

Also dumping enwiki binlogs on a slave (we have a month's worth) and pulling out all change_tag queries. Will reload them offline and join against a copy of change_tag to prove whether it is, in fact, completely intact.
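The verification described here boils down to replaying the logged changes and diffing the result against the table. This is a sketch that assumes the binlog events have already been decoded into (operation, row) pairs. That format is hypothetical, and real mysqlbinlog output would need parsing first. Because change_tag has no primary key, rows are compared as whole tuples. A Counter is used instead of a set because a keyless table can legitimately hold duplicate rows. Only rows the replay expects but the table lacks are reported, so older rows outside the binlog window do not cause false alarms.

```python
from collections import Counter

# Hypothetical decoded-binlog format: ("INSERT" | "DELETE", row) pairs, where a
# row is (ct_rc_id, ct_log_id, ct_rev_id, ct_tag).
def rows_lost(events: list[tuple[str, tuple]], table_rows: list[tuple]) -> list[tuple]:
    """Rows the replayed events say should exist but the table copy lacks."""
    expected = Counter()
    for op, row in events:
        if op == "INSERT":
            expected[row] += 1
        elif op == "DELETE":
            expected[row] -= 1
    # Counter subtraction keeps only positive counts, so rows deleted again
    # (or deleted after being inserted before the window) drop out.
    return list((expected - Counter(table_rows)).elements())

events = [
    ("INSERT", (1, None, 100, "visualeditor")),
    ("INSERT", (2, None, 101, "mobile edit")),
    ("DELETE", (2, None, 101, "mobile edit")),
    ("INSERT", (3, None, 102, "blanking")),
]
intact = [(1, None, 100, "visualeditor"), (3, None, 102, "blanking")]
damaged = [(1, None, 100, "visualeditor")]
print(rows_lost(events, intact))   # []
print(rows_lost(events, damaged))  # [(3, None, 102, 'blanking')]
```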
Comment 12 Sean Pringle 2013-07-16 18:08:02 UTC
As Robert suggested in comment 8, the rebuild process missed some rows where revisions had multiple tags.

The script has been fixed and will run in batches on enwiki today. More info shortly...
Comment 13 Sean Pringle 2013-07-16 18:10:00 UTC
Btw, change_tag still looks complete to me; the binlog shows no problems there. Should just be the tag_summary rebuild logic at fault.
Comment 14 Sean Pringle 2013-07-16 21:25:40 UTC
Rebuild #2 of tag_summary has completed and the reports in comment 8 look better (to me). Anyone care to verify...
Comment 15 James Forrester 2013-07-16 22:19:46 UTC
(In reply to comment #14)
> Rebuild #2 of tag_summary has completed and the reports in comment 8 look
> better (to me). Anyone care to verify...

Appears to work for me, yes. Might be worth waiting for others to weigh in, but from my POV this is fixed.
Comment 16 Robert Rohde 2013-07-17 05:16:07 UTC
Much better, but I'm still seeing some issues:

Looking for 500 "blanking" tags gives 498 "blanking" plus 2 labeled as just "mobile edit".

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=blanking&rclimit=500&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp
Comment 17 Robert Rohde 2013-07-17 12:53:47 UTC
As a follow up, the two problematic tags I note in Comment 16 are both recent.  It is possible they have a different underlying cause than the previous corruption.  For example, this might represent a logic error in how the "mobile edit" tag is being recorded.
Comment 18 Steven Walling 2013-07-25 22:58:25 UTC
(In reply to comment #16)
> Much better, but I'm still seeing some issues:
> 
> Looking for 500 "blanking" tags gives 498 "blanking" plus 2 labeled as just
> "mobile edit".
> 
> http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=blanking&rclimit=500&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp

There are other strange things going on with tags... 

http://en.wikipedia.org/wiki/Wikipedia_talk:Tags#Incorrect_tagging

Not sure if it's related or if we should file a separate bug for incorrect tagging. I think mobile is also suffering from this issue (or was as of yesterday).
Comment 19 Bartosz Dziewoński 2013-07-25 23:05:14 UTC
Whatever is causing that (maybe just a misconfigured local filter?), it's most likely not related to this bug.
Comment 20 Bartosz Dziewoński 2013-10-18 19:55:17 UTC
That was bug 52077. Closing this.
