Last modified: 2014-09-18 17:43:12 UTC
The dewiki_p.watchlist table is missing at the labs DB. Toolserver had a sanitized view of the watchlist table, so please add the same view to labs. Thank you.
Toolserver view: mysql> describe watchlist; +----------------------------+----------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------------------+----------------+------+-----+---------+-------+ | wl_namespace | int(11) | NO | | 0 | | | wl_title | varbinary(255) | NO | | | | | ts_wl_user_touched_cropped | varbinary(8) | NO | | | | +----------------------------+----------------+------+-----+---------+-------+ 3 rows in set (0.00 sec) ts_wl_user_touched_cropped is an abbreviated form of that user's user_touched value so users running queries can filter out inactive users. It included year, month and day. Example: 20131126.
This needs an okay from Legal; adding Michelle Paulson as CC.
*???* no reply since 2013-12-04...
Michelle / Luis: Any legal updates here? Steinsplitter: Asking a specific question to a specific person is likely more effective.
I think Luis was handling this issue, not me. I'll ping him about it.
I have been handling these generally, but wasn't cc'd on this one so didn't know about it. Andre, lets discuss separately if some other approach makes sense for consistent notifications around bugzilla/legal. May also want to see discussion at bug 58196. Am I correct in understanding that the sanitization here is (1) remove wl_user and (2) the sanitized user_touch? (But that as with bug 58196, this is on a per-wiki basis, so the sanitization of user_touch to a particular date may not be strong enough for particular wikis?)
(In reply to Luis Villa (WMF Legal) from comment #6) > Am I correct in understanding that the sanitization here is (1) remove > wl_user and (2) the sanitized user_touch? (But that as with bug 58196, this > is on a per-wiki basis, so the sanitization of user_touch to a particular > date may not be strong enough for particular wikis?) Yes, wl_user is definitely unavailable. And yes, "ts_wl_user_touched_cropped" has similar issues as those mentioned in bug 58196. If we split out "ts_wl_user_touched_cropped" to a separate ticket, there shouldn't be any issue with exposing only wl_namespace and wl_title (for now). This allows users to count the number of rows (to determine the number of page watchers of a particular title), but gives away no personal data. In my mind, exposing wl_namespace and wl_title alone don't rise to the level of requiring legal approval, but I guess it can't hurt to ask. :-)
(In reply to MZMcBride from comment #7) > [...] > If we split out "ts_wl_user_touched_cropped" to a separate ticket, there > shouldn't be any issue with exposing only wl_namespace and wl_title (for > now). This allows users to count the number of rows (to determine the number > of page watchers of a particular title), but gives away no personal data. In > my mind, exposing wl_namespace and wl_title alone don't rise to the level of > requiring legal approval, but I guess it can't hurt to ask. :-) If an equivalent for ts_wl_user_touched_cropped wouldn't be approved leaving only (wl_namespace, wl_title), from a performance perspective it would probably be very useful to not expose that table at all, but provide those count aggregates as hourly/daily snapshots (or optimize the indexes/views for this specific query).
Any progress on this? I would like to get the average number of watchers for the Portuguese Wikipedia and the number of watchers for a group of articles. So the solution proposed by MZMcBride works in my case. While we don't have access to the watchlist table, is there an alternative way to get this information?