Last modified: 2014-02-23 01:07:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T51088, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 49088 - Make archive table partially accessible on Wikimedia Labs
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Sean Pringle
Depends on: 49189
Blocks: labs-replication
Reported: 2013-06-03 17:27 UTC by Kunal Mehta (Legoktm)
Modified: 2014-02-23 01:07 UTC (History)
CC List: 14 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Kunal Mehta (Legoktm) 2013-06-03 17:27:16 UTC
On the Toolserver, users have limited access to the archive table. This is used for getting deleted edit counts and for other analysis.

mysql> describe archive;
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| ar_namespace  | int(11)          | NO   |     | 0       |       |
| ar_title      | varbinary(255)   | NO   |     |         |       |
| ar_user       | int(5) unsigned  | NO   |     | 0       |       |
| ar_user_text  | varbinary(255)   | NO   |     |         |       |
| ar_timestamp  | varbinary(14)    | NO   |     |         |       |
| ar_minor_edit | tinyint(1)       | NO   |     | 0       |       |
| ar_flags      | tinyblob         | NO   |     | NULL    |       |
| ar_rev_id     | int(8) unsigned  | YES  |     | NULL    |       |
| ar_len        | int(8) unsigned  | YES  |     | NULL    |       |
| ar_page_id    | int(10) unsigned | YES  |     | NULL    |       |
| ar_parent_id  | int(10) unsigned | YES  |     | NULL    |       |
+---------------+------------------+------+-----+---------+-------+
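As a rough illustration of the deleted-edit-count use case, the sketch below builds an in-memory table with a subset of the columns shown above and counts one user's archived revisions. It is a minimal sketch under stated assumptions: SQLite stands in for the MySQL replica, the sample rows and the `deleted_edit_count` helper are hypothetical, and the query a real tool would run against the replica may differ.

```python
import sqlite3

# Hypothetical sample data; these are not real archive rows.
SAMPLE_ROWS = [
    (0, "Example_page", 101, "ExampleUser", "20130601120000", 5001),
    (0, "Example_page", 101, "ExampleUser", "20130601121500", 5002),
    (2, "ExampleUser/Draft", 101, "ExampleUser", "20130602090000", 5003),
    (0, "Other_page", 202, "OtherUser", "20130602100000", 5004),
]


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the archive table."""
    conn = sqlite3.connect(":memory:")
    # Subset of the columns from `describe archive` above; the varbinary
    # columns are mapped to TEXT to keep the sketch simple.
    conn.execute(
        """CREATE TABLE archive (
               ar_namespace INTEGER NOT NULL DEFAULT 0,
               ar_title     TEXT    NOT NULL,
               ar_user      INTEGER NOT NULL DEFAULT 0,
               ar_user_text TEXT    NOT NULL,
               ar_timestamp TEXT    NOT NULL,
               ar_rev_id    INTEGER
           )"""
    )
    conn.executemany("INSERT INTO archive VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def deleted_edit_count(conn: sqlite3.Connection, user_text: str) -> int:
    """Count archived (deleted) revisions attributed to a user name."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM archive WHERE ar_user_text = ?", (user_text,)
    ).fetchone()
    return count


if __name__ == "__main__":
    conn = make_demo_db()
    print(deleted_edit_count(conn, "ExampleUser"))  # prints 3
```

Matching on `ar_user_text` also counts anonymous edits by IP; a tool interested only in registered accounts could filter on `ar_user` (the user ID) instead.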

From #wikimedia-labs earlier today (trimmed for relevance):
[10:15:38 AM] <legoktm>	 Oh yeah Coren, is there an eta for the archive table being available? and is there a bug tracking it?
[10:16:35 AM] <Coren>	 legoktm: I don't think there's a bug tracking it, and it's a couple weeks before I have a definitive answer.
<...>
[10:18:49 AM] <legoktm>	 Out of curiosity, is it a legal issue or a technical one that's holding it back?
[10:21:32 AM] <Coren>	 legoktm: Legal.
<...>
[10:23:01 AM] <Coren>	 legoktm: I can tell you offhand that, if it is going to be okayed at all, it will be on a per-case basis and likely require approval with a process similar to that of getting the researcher right.
Comment 1 Alex Monk 2013-06-03 17:38:04 UTC
(In reply to comment #0)
> [10:23:01 AM] <Coren>     legoktm: I can tell you offhand that, if it is going to be okayed at all, it will be on a per-case basis and likely require approval with a process similar to that of getting the researcher right.

What is that process?
Comment 2 Marc A. Pelletier 2013-06-12 16:26:43 UTC
> What is that process?

That's actually part of the question Legal will have to resolve (it is currently on their desk).
Comment 3 Marc A. Pelletier 2013-07-10 02:33:44 UTC
Legal has approved replication of a suitably redacted archive table (in particular, it will not have edit summaries). However, replicating that table involves "interesting" technical hurdles: it first requires a delicate (and long overdue) upgrade of the table itself on the masters.

Our DB team is now aware of the request, and they'll be able to get on it as soon as resources and time allow.
Comment 4 Kunal Mehta (Legoktm) 2013-07-10 03:49:05 UTC
Thanks for the update Marc. Could you clarify if this access will still require "a process similar to that of getting the researcher right"?

The copy on the Toolserver never had access to edit summaries; when you say "suitably redacted", are you comparing it to the actual archive table or the already limited version on the Toolserver described in comment 0?
Comment 5 Marc A. Pelletier 2013-07-10 11:00:05 UTC
No, the table will be generally available and won't require extra hoops. Also, the schema will be identical to production's; for the Labs DB we have chosen to null columns rather than elide them entirely (this makes some tools easier).
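The approach described here, keeping production's schema but returning NULL for redacted columns, can be illustrated with a view. The sketch below is hypothetical: SQLite stands in for the Labs replica, the view name `archive_redacted` and the column subset are invented for illustration, and the redacted columns are the two named later in this thread (`ar_text` and `ar_comment`). It is not the actual Labs view definition.

```python
import sqlite3

SCHEMA = """
CREATE TABLE archive (
    ar_namespace INTEGER NOT NULL DEFAULT 0,
    ar_title     TEXT    NOT NULL,
    ar_user_text TEXT    NOT NULL,
    ar_timestamp TEXT    NOT NULL,
    ar_comment   TEXT    NOT NULL,
    ar_text      TEXT    NOT NULL
);

-- Same column names, in the same order, as the base table; the
-- sensitive columns are replaced with NULL rather than dropped.
CREATE VIEW archive_redacted AS
SELECT ar_namespace,
       ar_title,
       ar_user_text,
       ar_timestamp,
       NULL AS ar_comment,
       NULL AS ar_text
FROM archive;
"""


def build_demo() -> sqlite3.Connection:
    """Create the table and view, plus one hypothetical archived row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO archive VALUES (?, ?, ?, ?, ?, ?)",
        (0, "Example_page", "ExampleUser", "20130603172716",
         "hypothetical edit summary", "hypothetical page text"),
    )
    return conn


def column_names(conn: sqlite3.Connection, relation: str) -> list[str]:
    """Return the column names that SELECT * on `relation` exposes."""
    # `relation` is interpolated directly, so only pass trusted names.
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [col[0] for col in cursor.description]


if __name__ == "__main__":
    conn = build_demo()
    print(column_names(conn, "archive") == column_names(conn, "archive_redacted"))  # True
    print(conn.execute(
        "SELECT ar_title, ar_comment, ar_text FROM archive_redacted"
    ).fetchall())  # [('Example_page', None, None)]
```

Because the view exposes the same column names as the base table, a query written against production's schema runs unchanged and simply sees NULL where data is withheld; eliding the columns would instead break any query that names them.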
Comment 6 Cyberpower678 2013-10-03 18:42:08 UTC
Any updates on this?
Comment 7 Marc A. Pelletier 2013-10-31 13:24:13 UTC
Bug 49189 (a dependency) has been updated with status information by our DBA.
Comment 8 Dario Taraborelli 2013-11-22 23:50:40 UTC
I'm copying Michelle from Legal so she can help us figure out the rationale for blanking edit summaries, in response to comment 3. 

One possible reason (credit goes to Ironholds): "lazy" page deletion of content that should have been oversighted may result in a snippet of the original page being stored in the summary of that page's first revision (and accidentally exposed if the summary is not censored).
Comment 9 Oliver Keyes 2013-11-23 00:03:13 UTC
Not just oversight (OS), but revision deletion (revdel) too: it's not that uncommon to just delete a page rather than revdel each edit and then delete, since from a user's point of view both lead to the same outcome (the content is only visible to sysops).
Comment 10 mpaulson 2013-11-23 00:58:07 UTC
Legal's opinion has been accurately conveyed here. We are comfortable with putting the archive table in Labs as long as the edit summary is redacted. Dario also asked me about providing metadata. We are OK with this as long as the metadata provided does not include IP addresses that are otherwise nonpublic, or location information.
Comment 11 Sean Pringle 2013-11-26 07:28:53 UTC
The archive table should now be replicating to labs with ar_text and ar_comment redacted. The views should be ready shortly.
Comment 12 Marc A. Pelletier 2013-11-26 14:23:43 UTC
The views are now in place, and the redacted archive table should now be visible from the replicas.  Note that it may take a while for replication to "catch up", especially on the larger wikis.
