Last modified: 2014-10-26 21:09:45 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2639, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 639 - Add feature annotate/blame command, to indicate who last changed each line / word
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: History/Diffs
Version: unspecified
Hardware: All All
Importance: Low enhancement with 16 votes
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/Talk:2000s
Duplicates: 1652 1827 4796 7366 9455 10031 13927 18218 18810
Depends on:
Blocks:

Reported: 2004-10-03 21:47 UTC by Inedible Bulk
Modified: 2014-10-26 21:09 UTC (History)
17 users (show)



Attachments
Implementation of blame (8.49 KB, text/plain)
2006-02-05 02:30 UTC, Edward Z. Yang
Defines Annotation class for annotating based on revisions (8.10 KB, text/plain)
2006-02-12 05:37 UTC, Edward Z. Yang
Test suite for Annotation package. (12.20 KB, text/plain)
2006-02-12 05:38 UTC, Edward Z. Yang

Description Inedible Bulk 2004-10-03 21:47:54 UTC
Many times I have had to go through a page history revision by revision to find
out who added an offending line, or a curious line that I need to contact them
about.  As various people mentioned here (such as --TK) did not sign, it would
take a while to figure out exactly who TK was.  It would be rather nice to be
able to highlight/search a line and be told each time that line was
changed, which would allow me to easily find who added it.

This is a feature request, and as such I labeled it an enhancement, as there's no
easy way to request features.  Apologies if I did this wrong.  I also searched for
"Line" and found only a few bugs, none of which were like this.
Comment 1 Brion Vibber 2004-10-04 00:15:16 UTC
CVS has this on a line-by-line basis, so it's theoretically doable for a word-oriented check (since we have 
paragraph-oriented text, 'lines' are whole paragraphs and that's not as useful). However I suspect it's 
optimized by CVS's diff-based storage.

This would be spiffy indeed, but it's likely an expensive operation. (Particularly as some pages have 
thousands of revisions.) Something to keep in mind for the future.

Also note that when text is rearranged the results may be misleading.
Comment 2 Inedible Bulk 2004-10-05 05:11:07 UTC
I've not seen CVS in action (unless that's the test wiki).  Basically,
tracking the origins of a paragraph would be a great improvement as it is; the
line basis is just nitpicky, as I really had meant paragraph (_I guess_) to begin
with.

I just wanted to track the history of a line, being a comment by a person, which
would in itself also be a paragraph.  As words can be added to lines at any time
(and I mean non-paragraph, word-wrapped lines), and the line length can change too
(see word wrapping), line-level tracking would be very processor intensive and
only slightly more useful than paragraph history tracing.
Comment 3 Brion Vibber 2004-10-05 05:35:01 UTC
I should clarify that I'm talking about CVS itself, the 'cvs annotate' command. It gives output like this, 
marking each line with the revision number, user, and date that that line was last changed:

1.1       (eloquenc 28-Feb-04): if ( "" == $title && "delete" != $action ) {
1.58      (zhengzhu 22-Sep-04):      $wgTitle = Title::newFromText( wfMsgForContent( "mainpage" ) );
1.10      (vibber   08-Mar-04): } elseif ( $curid = $wgRequest->getInt( 'curid' ) ) {
1.1       (eloquenc 28-Feb-04):      # URLs like this are generated by RC, because rc_title isn't always accurate
1.10      (vibber   08-Mar-04):      $wgTitle = Title::newFromID( $curid );
1.1       (eloquenc 28-Feb-04): } else {
1.1       (eloquenc 28-Feb-04):      $wgTitle = Title::newFromURL( $title );
1.1       (eloquenc 28-Feb-04): }
Comment 4 Inedible Bulk 2004-10-06 03:01:15 UTC
(In reply to comment #3)
> I should clarify that I'm talking about CVS itself, the 'cvs annotate'
> command. It gives output like this, marking each line with the revision
> number, user, and date that that line was last changed:

Ah, I understand now.  Yes, this feature (in paragraph form if not line form)
would be excellent in MediaWiki/Wikipedia.
Comment 5 Brion Vibber 2005-03-07 22:30:03 UTC
*** Bug 1652 has been marked as a duplicate of this bug. ***
Comment 6 andrea m 2005-04-05 19:06:13 UTC
*** Bug 1827 has been marked as a duplicate of this bug. ***
Comment 7 Rob Church 2005-10-09 16:37:33 UTC
(In reply to comment #2)
> I've not seen the CVS in action (Unless that's the test wiki).

Just in case you weren't aware, CVS (Concurrent Versions System) is the source
control tool used by the developers. The annotate command is often used to find
out who broke what part of the code. :)
Comment 8 Edward Z. Yang 2006-01-30 00:45:19 UTC
*** Bug 4796 has been marked as a duplicate of this bug. ***
Comment 9 Edward Z. Yang 2006-01-30 01:13:00 UTC
This feature is called blame in Subversion. I don't think it's feasible on a per
sentence basis, and we shouldn't worry about getting that out first. I really
think this would be useful.

Unfortunately, it does seem to be an expensive operation (even Subversion says
so). How would it work? Hmm...

If we had delta based histories, getting a blame operation would be a simple
matter of scrolling backwards in the history in increments, matching the diffs
to current lines until all the lines had been matched, and then spitting that
out. However, we have a sort of compressed fulltext history thing, with diffs
computed on the fly (correct me if I'm wrong).

So it seems to me that the solution would be to generate these delta
histories when a blame is requested, and then keep them on file for the rest of
eternity. This, however, increases redundancy, and has its own synchronization
problems. Perhaps a move to delta compression is in order? Or has it already
happened?

:?

::is thoroughly confused, but would really like the feature::
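
The backwards walk described in this comment can be sketched in a few lines. This is only an illustration under stated assumptions, not the attached implementation: it assumes full-text revisions are available as (author, text) pairs, oldest first, and it uses Python's difflib to compute each diff on the fly. The function name blame_backwards is hypothetical.

```python
from difflib import SequenceMatcher


def blame_backwards(revisions):
    """Attribute each line of the newest revision to an author.

    revisions: list of (author, text) pairs, oldest first (assumed format).
    Returns a list of (author, line) pairs for the newest text.
    """
    if not revisions:
        return []
    newest_lines = revisions[-1][1].splitlines()
    result = [None] * len(newest_lines)
    # tracked maps a line index in the revision being inspected to the
    # index of the same line in the newest revision.
    tracked = {i: i for i in range(len(newest_lines))}
    for k in range(len(revisions) - 1, -1, -1):
        if not tracked:
            break  # every line is attributed; older revisions are not read
        author, text = revisions[k]
        if k == 0:
            # Whatever survived all the way back belongs to the first author.
            for newest_idx in tracked.values():
                result[newest_idx] = author
            break
        cur = text.splitlines()
        prev = revisions[k - 1][1].splitlines()
        matcher = SequenceMatcher(None, prev, cur, autojunk=False)
        survived = {}
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                for offset in range(j2 - j1):
                    if j1 + offset in tracked:
                        survived[i1 + offset] = tracked[j1 + offset]
        # Tracked lines that do not exist unchanged in the previous
        # revision were introduced (or last changed) by revision k.
        still_tracked = set(survived.values())
        for newest_idx in tracked.values():
            if newest_idx not in still_tracked:
                result[newest_idx] = author
        tracked = survived
    return list(zip(result, newest_lines))


history = [
    ("alice", "a\nb\nc"),
    ("bob", "a\nB\nc"),
    ("carol", "a\nB\nc\nd"),
]
for author, line in blame_backwards(history):
    print(author, line)
```

The early exit when nothing is left to attribute is what keeps the walk from reading the whole history for pages whose current text is recent; in the worst case it still diffs every pair of adjacent revisions.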
Comment 10 Brion Vibber 2006-01-30 01:43:21 UTC
At WikiSym, a guy was showing off some work he was doing on this kind of stuff. 
He was basically running the comparisons offline and building a parallel 
database which could be then queried quickly. Once built, additional diffs can 
be added in pretty fast as well, at least in theory.
Comment 11 Edward Z. Yang 2006-02-05 02:30:37 UTC
Created attachment 1367 [details]
Implementation of blame

So, what this attachment does is create a blame() function, which takes an
array of revisions and computes the diff in the form of an Annotation object.
See the SimpleTest test case: it works. It's horrible code, though, but I was
hoping to get it running on the Toolserver (unfortunately, pulling revisions
from the database is also a horribly complicated problem, albeit one that can
be bypassed).
Comment 12 Edward Z. Yang 2006-02-12 05:37:07 UTC
Created attachment 1386 [details]
Defines Annotation class for annotating based on revisions

Much cleaner code, having been rewritten. A test suite is also going to be
uploaded for it. Still needs integration and an AnnotationPrinter.
Comment 13 Edward Z. Yang 2006-02-12 05:38:32 UTC
Created attachment 1387 [details]
Test suite for Annotation package.

Test suite for the annotation package. After all, TDD is good.
Comment 14 Edward Z. Yang 2006-02-12 05:41:18 UTC
With the implementation of the Annotation in place, there are several more tasks
to do:

1. Hook this code up to a special page
2. Create a new table annotations for storing the cached annotations
3. Create a maintenance script that will munch through all pages and generate
all initial annotations
4. Create an AnnotationPrinter
5. Add a hook to edit saves that recompiles the annotation

2, 3 and 5 are necessary in order to make this sort of extension efficient
enough for a huge wiki like English Wikipedia.

Any comments???
Comment 15 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-09-19 01:59:38 UTC
*** Bug 7366 has been marked as a duplicate of this bug. ***
Comment 16 Edward Z. Yang 2006-09-19 02:20:17 UTC
I've decided to unassign myself from this bug. This is a very tricky piece of software
to implement and I don't think I'd be most qualified to do it. That's not to say
that the code isn't any good, but it still needs to be integrated with MediaWiki.
Comment 17 Kimon Berlin (gribeco) 2006-11-10 01:38:29 UTC
I really would like to know who was the *first* to introduce a given
sentence/paragraph, so I can hunt down copyright violators and kill them =)
Comment 18 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-11-10 01:39:52 UTC
That requires considerably more complexity.  You have to decide what happens
when lines are split or merged or moved, to begin with.
Comment 19 Andrew Garrett 2006-11-10 01:40:27 UTC
I think that running an annotation on a page every time it's saved would make
saving /very/ slow on pages with large histories. My suggestion would be /only/
updating the annotation for the changed lines, rather than redoing the entire
annotation.
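
A rough sketch of such an incremental update follows. It assumes the cached annotation is stored as a list of (author, line) pairs for the previous revision; that storage format and the name update_annotation are hypothetical, and Python's difflib stands in for whatever diff engine would really be used.

```python
from difflib import SequenceMatcher


def update_annotation(annotation, new_text, author):
    """Update a cached annotation for one newly saved revision.

    annotation: list of (author, line) pairs for the previous revision
                (assumed cache format).
    Only one diff, previous text against new text, is computed.
    """
    old_lines = [line for _, line in annotation]
    new_lines = new_text.splitlines()
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    updated = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines keep their existing attribution.
            updated.extend(annotation[i1:i2])
        elif tag in ("replace", "insert"):
            # New or rewritten lines are attributed to the saving author.
            updated.extend((author, line) for line in new_lines[j1:j2])
        # "delete" contributes nothing to the new annotation.
    return updated


cached = [("alice", "a"), ("alice", "b")]
print(update_annotation(cached, "a\nb2\nc", "bob"))
```

With this shape the cost per save is one diff against the previous revision, independent of how long the page history is.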
Comment 20 Thomas Bleher 2006-11-10 10:39:25 UTC
Maybe a crazy idea, but anyway: I started using git (the version control tool used 
for the Linux kernel) two weeks ago and am already amazed at its power and 
flexibility. It's very fast and has good tools for searching through history.
Maybe the whole Wikipedia history could be imported into git? After that, new page 
saves would be added as new commits; as this is very fast in git, it won't represent 
a problem for the servers.
Comment 21 Thomas Bleher 2006-11-10 23:26:51 UTC
To make the git idea more practical, it would also be possible to have a git repository for each 
Wikipedia page; git is very space efficient, so this would not be a problem (I think it would 
probably need less space than the DB) and the repositories could be stored on different servers.
Pages are effectively independent from each other, so a shared repository wouldn't have many 
advantages.
Comment 22 Rob Church 2006-11-11 13:13:23 UTC
That would require gutting MediaWiki's internals, breaking compatibility with
huge amounts of other implementations; requiring the use of another piece of
software, and *could* introduce serious performance problems, despite the "speed
of git", as it were. The current use of the database is optimised in various
places for speed and overall load balancing as it is.

A "blame" command would be nice to have, but it's going to need a sane
implementation, not a radical reorganising of literal terabytes of information.
Comment 23 Sean 2007-01-19 21:13:23 UTC
> I have had many times where I would continuously go through a history to find
> out who added an offending line, or a curious line which I need to contact them
> about.

Me too; it sucks!

But note that a full-on CVS/Subversion line-by-line "annotate"
command is more than this feature really needs to be.  All you
really need is a box where you can type some text, and click
"Find first version of this article containing this text".

The code could just look at revisions of the article in
a binary-search fashion, so it would be fast.  Here's a
quick implementation in Perl:

   http://en.wikipedia.org/wiki/User:TotoBaggins#Wikiblame

Comment 24 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-01-28 17:37:19 UTC
Binary search is unacceptable for this.  It can return incorrect results in the case of reversions.
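
To make the objection concrete, here is a small sketch of the binary search from comment 23 together with a history in which it goes wrong. The list-of-strings history format and the name first_containing_bisect are assumptions made for the illustration; this is not the linked Perl script.

```python
def first_containing_bisect(revisions, phrase):
    """Binary search for the oldest revision containing phrase.

    revisions: list of revision texts, oldest first (assumed format).
    Only valid if the phrase, once added, is never removed again.
    """
    lo, hi = 0, len(revisions) - 1
    if not revisions or phrase not in revisions[hi]:
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if phrase in revisions[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Phrase added once and never removed: the search is correct.
monotone = ["a", "a", "a claim", "a claim b"]
print(first_containing_bisect(monotone, "claim"))  # 2

# Phrase added in revision 1, page blanked in revision 2, reverted in
# revision 3: the search lands on the revert, not the original addition.
with_revert = ["intro", "intro claim", "", "intro claim", "intro claim more"]
print(first_containing_bisect(with_revert, "claim"))  # 3, although 1 added it
```

The search assumes presence of the phrase is monotone over the history; any blanking followed by a revert breaks that assumption.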
Comment 25 Rob Church 2007-03-30 08:33:29 UTC
*** Bug 9455 has been marked as a duplicate of this bug. ***
Comment 26 Steve Bennett 2007-04-02 02:12:23 UTC
I'll repost my request 9455 here, as it's rather simpler to implement than the
original request, and possibly less expensive:
---
It would be useful to be able to search in the prior revisions of a page in two
modes:
* Search backwards to find the first time when a specified piece of text appears
(ie, when it was added)
* Search backwards to find the last time that a specified piece of text appears
(ie, when it was removed)

Ideally one day it would be great to be able to click on text and see who added.
But in the meantime, it would be great to simply be able to search for a phrase
like "He was a supporter of Hitler." and to be able to leap to the revision when
that text first appeared.

(a slightly souped up version might show a condensed history consisting of
groups of revisions where the phrase appears at least once followed by groups of
revisions where it doesn't appear at all)
---
I notice that it would not be susceptible to whole paragraphs being moved around
as Brion commented. Since we would only be detecting whether the given phrase
exists or not, two successive diffs where the phrase existed (but in different
locations) would be treated the same. It ought to be less expensive as there is
no diffing involved: just a simple text search: Does the phrase exist in
revision T-1? No. Does the phrase exist in revision T-2? No. Does the phrase
exist in T-3? Yes. Stop.
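
The linear search and the condensed history described above might look roughly like this. It is a sketch only: revisions are assumed to be a list of full texts, oldest first, and all three function names are hypothetical.

```python
from itertools import groupby


def last_revision_with(revisions, phrase):
    """Walk backwards from the newest revision; return the index of the
    first revision found that contains phrase, or None."""
    for idx in range(len(revisions) - 1, -1, -1):
        if phrase in revisions[idx]:
            return idx
    return None


def added_in(revisions, phrase):
    """Return the index of the revision that started the most recent
    unbroken run of revisions containing phrase, or None."""
    idx = last_revision_with(revisions, phrase)
    if idx is None:
        return None
    while idx > 0 and phrase in revisions[idx - 1]:
        idx -= 1
    return idx


def presence_runs(revisions, phrase):
    """Condensed history: (present, first_index, last_index) per run."""
    runs = []
    start = 0
    for present, group in groupby(revisions, key=lambda text: phrase in text):
        length = len(list(group))
        runs.append((present, start, start + length - 1))
        start += length
    return runs


history = ["a", "a claim", "a", "a claim", "a claim b", "a b"]
print(last_revision_with(history, "claim"))  # 4: removed in revision 5
print(added_in(history, "claim"))            # 3: most recent (re)addition
print(presence_runs(history, "claim"))
```

Every step is a plain substring test, so no diffing is involved, but in the worst case every revision text has to be read.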
Comment 27 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-05-25 20:49:37 UTC
*** Bug 10031 has been marked as a duplicate of this bug. ***
Comment 28 Roan Kattouw 2007-10-09 14:00:19 UTC
There is an extension [1] that does this now. WONTFIX?

[1] http://www.mediawiki.org/wiki/Extension:Annotation
Comment 29 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-10-09 17:33:19 UTC
No.  This is an important feature for reasonably effective version control and should be in core if at all possible.
Comment 30 Inedible Bulk 2007-10-24 04:29:38 UTC
I was checking out the article on Noah Webster for Americanized words, and noticed that the section on it seemed to incorrectly reference American words as British, and vice versa.  I wasn't sure where the problem lay (was it specifying them wrong or had they been swapped), so I checked a slightly older version which had them correct.  It took a few clicks on "next" (as I had not realized it was so recent) to find the culprit:
http://en.wikipedia.org/w/index.php?title=Noah_Webster&diff=prev&oldid=166613821

Some users might have just thought that it was possibly old vandalism and just corrected it by hand.  The problem there, as evidenced by the edit I link, is that there was more vandalism than just the section I had noticed it in.  The benefit of a blame system shines here, where I can see which revision the edit occurred in and spot additional, previously hidden, edits.

I'm back at my bug, 3 years and 6 dupes later, and I can't really see what the exact status of this bug is.  I do like the new partial undo feature though, that is really nice.
Comment 31 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-10-25 00:30:49 UTC
The most important point for this bug is that it's not at all simple to do with a relational database system.  If we had something like git or Bazaar as a backend for revision storage, it would be trivial.  The interesting questions at this point seem to be

1a) If someone were to implement version storage for MediaWiki on top of something like git or Bazaar in a manner that doesn't sacrifice existing efficiency, is the Wikimedia Foundation willing to put in the time and effort to transfer the major projects?  Or even the minor projects, to start with?  (Probably not going to get an unambiguous "yes" here without progress on (2a).)

1b) If so, is anyone willing to do it?  (So far, no, and probably not going to be yes unless (1a) is fulfilled.)

2a) Is it possible to implement blame efficiently and scalably on top of an RDBMS?  (No evidence for a yes to this that I've seen: Ambush Commander admits that his work is not efficient enough for use right now.)

2b) If so, is anyone willing and able to do it?  (So far, no, and definitely not going to be yes unless (2a) is fulfilled.)

The picture is unlikely to change at any time in the foreseeable future, unless we get someone to step forward and put in a lot of work that may or may not end up amounting to something.  Put another way, in standard open-source fashion: if you really want it, you're going to have to write it yourself.
Comment 32 Roan Kattouw 2008-05-02 08:35:17 UTC
*** Bug 13927 has been marked as a duplicate of this bug. ***
Comment 33 Roan Kattouw 2009-05-16 17:11:54 UTC
*** Bug 18810 has been marked as a duplicate of this bug. ***
Comment 34 Chad H. 2009-07-14 17:38:09 UTC
*** Bug 18218 has been marked as a duplicate of this bug. ***
Comment 35 Jesús Martínez Novo (Ciencia Al Poder) 2011-04-21 18:53:24 UTC
I was also interested in this, but thinking about the day-to-day edits on a wiki, I think that an SVN/CVS-like blame/annotate feature is not feasible in a MediaWiki installation, especially in a public one where vandalism is common.

The annotation feature makes sense on a controlled development system where changes are not very large. But here at Wikimedia (and other public wikis) where we deal with vandalism, it's common for vandals to blank pages or large sections of a page. That defeats the whole annotation system, since all lines would be marked as changed.
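
The effect of blanking can be reproduced with a deliberately naive annotator. This sketch assumes revisions are (author, text) pairs, oldest first, and uses Python's difflib for the diffs; naive_blame is a hypothetical name and does not describe any existing extension.

```python
from difflib import SequenceMatcher


def naive_blame(revisions):
    """Fold the history forward, attributing each new or changed line
    to the author of the revision that introduced it."""
    annotation = []
    for author, text in revisions:
        old = [line for _, line in annotation]
        new = text.splitlines()
        matcher = SequenceMatcher(None, old, new, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                updated.extend(annotation[i1:i2])
            elif tag in ("replace", "insert"):
                updated.extend((author, line) for line in new[j1:j2])
        annotation = updated
    return annotation


history = [
    ("alice", "one\ntwo\nthree"),
    ("vandal", ""),                    # page blanked
    ("patroller", "one\ntwo\nthree"),  # blanking reverted
]
# Every line is now attributed to the patroller, not to alice.
print(naive_blame(history))
```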

Instead, the idea of Steve Bennett at Comment 26 (posted on Bug 9455) would be more useful here, which only needs a text or pattern search of every revision text. That could also be implemented using JavaScript, retrieving every revision text through the API and doing the search.

Bug 9455 was closed as a resolved duplicate of this one, but I think it's worth reopening it and considering implementing it if this one is not going to be implemented.
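
As a rough illustration of the API-based search, the sketch below builds a revisions query and scans a decoded response for a phrase. It is written in Python rather than JavaScript, performs no network request, and assumes the present-day MediaWiki Action API (action=query, prop=revisions, rvslots=main, formatversion=2), which is newer than this comment. Both function names are hypothetical, and a real client would also follow the continuation values the API returns.

```python
def revision_query_params(title, limit=50):
    """Query-string parameters for fetching revision texts of one page
    (assumes the current Action API; newest revisions come first)."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|user|timestamp|content",
        "rvslots": "main",
        "rvlimit": str(limit),
    }


def revisions_containing(response, phrase):
    """Return (revid, user, timestamp) for each revision in a decoded
    API response whose main-slot text contains phrase."""
    hits = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            content = rev.get("slots", {}).get("main", {}).get("content", "")
            if phrase in content:
                hits.append((rev.get("revid"), rev.get("user"), rev.get("timestamp")))
    return hits


# Hand-written sample in the assumed response shape, not real API output.
sample = {
    "query": {
        "pages": [
            {
                "title": "Example",
                "revisions": [
                    {"revid": 3, "user": "C", "timestamp": "2011-04-03T00:00:00Z",
                     "slots": {"main": {"content": "intro claim more"}}},
                    {"revid": 2, "user": "B", "timestamp": "2011-04-02T00:00:00Z",
                     "slots": {"main": {"content": "intro claim"}}},
                    {"revid": 1, "user": "A", "timestamp": "2011-04-01T00:00:00Z",
                     "slots": {"main": {"content": "intro"}}},
                ],
            }
        ]
    }
}
print(revisions_containing(sample, "claim"))
```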
Comment 36 Tristan Miller 2012-09-27 12:41:41 UTC
(In reply to comment #28)
> There is an extension [1] that does this now. WONTFIX?
> 
> [1] http://www.mediawiki.org/wiki/Extension:Annotation

The WikiTrust userscript also has this functionality:
https://de.wikipedia.org/wiki/Benutzer:NetAction/WikiTrust/WikiPraise
Comment 37 Jesús Martínez Novo (Ciencia Al Poder) 2013-08-29 17:19:43 UTC
The tool from Comment 26 is now available at
http://wikipedia.ramselehof.de/wikiblame.php
