Last modified: 2014-11-04 22:51:07 UTC

This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T21092, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 19092 - Diffs of small changes can be misleading
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: History/Diffs
Version: unspecified
Hardware: All All
Importance: Low normal with 5 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 21953
Depends on:
Blocks:
Reported: 2009-06-05 12:46 UTC by BIL
Modified: 2014-11-04 22:51 UTC

Description BIL 2009-06-05 12:46:53 UTC
When viewing the difference between two versions of an article, small changes are often shown as large ones.

For example, when a newline is added inside a paragraph, the diff viewer gets confused and the edit looks like a big change.

Another example is changing something small in a paragraph while adding a new paragraph above it, such as a headline. The slightly changed paragraph is then not recognised, and the edit looks like a big change. One example: http://en.wikipedia.org/w/index.php?title=Literal_Video_Version&diff=prev&oldid=293461052

Bug 5072 (3 years old) is related.
Comment 1 Alex Z. 2009-07-14 22:12:48 UTC
tweaked the summary to be more descriptive
Comment 2 Philippe Verdy 2009-11-20 04:37:33 UTC
The diff is not that big. A single paragraph, formatted as a single line, is split into two separate lines, and it is normal that both lines are flagged in the diff. This is the normal behavior of unified diffs, which compare full lines. The actual output in fact shows more granular differences with coloring; isn't that enough?
You seem to want MediaWiki to automatically split lines into several parts to show the differences and similarities between fragments of lines. I don't think that would be very useful (and in many cases it would just add many more differences, on very small fragments).
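
The line-based behavior described above can be reproduced with a small sketch. It uses Python's standard difflib module rather than MediaWiki's own diff code, so it only illustrates the general principle; the helper name line_diff and the sample text are made up for this example.

```python
import difflib

def line_diff(old_text, new_text):
    """Return only the changed lines of a unified diff (hypothetical helper)."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="", n=0)
    # Drop the '---', '+++' and '@@' header lines; keep the '-' and '+' lines.
    # (Good enough for this sample; content starting with '--' would need more care.)
    return [line for line in diff if not line.startswith(("---", "+++", "@@"))]

# One paragraph stored as a single line, then split in two by a newline.
old = "The quick brown fox jumps over the lazy dog. It then runs away."
new = "The quick brown fox jumps over the lazy dog.\nIt then runs away."

changes = line_diff(old, new)
for line in changes:
    print(line)
```

Although only one character changed, a comparison of whole lines reports the old line as removed and both new lines as added, which is the effect the reporter is describing.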
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-12-27 16:16:34 UTC
*** Bug 21953 has been marked as a duplicate of this bug. ***
Comment 4 Stmrlbs 2009-12-28 00:53:24 UTC
I disagree.  Being able to see the differences between revisions is important, as it keeps people accountable for their changes.  To flag a whole paragraph as being "deleted", then "added" (which is what it looks like in the revision history) because someone inserts a blank line before that paragraph is very misleading.  

The JavaScript gadget wikEdDiff by Cacycle [http://en.wikipedia.org/wiki/User:Cacycle/wikEdDiff] seems to be able to distinguish these types of changes. Perhaps someone could look at the code there and port it to the code that displays revision history.
Comment 5 Conrad Irwin 2009-12-28 02:59:08 UTC
There is no "best" diff output; it depends on personal taste. The beauty of a diff's output depends as much on the postprocessing as on the algorithm used or the parameters with which it is run.

wikEdDiff uses a different theoretical approach (based on Heckel 1978) from MediaWiki (based on Myers 1986), which may or may not give better results overall. From memory, Heckel is generally better at spotting which strings came from where in the source, but it can fail quite nastily if the source contains few unique words. In such cases it may generate very large diffs for very small changes. These failures are unintuitive, unlike the current failures, where it is easy for a human to see why the computer has been misled (though I think they would be rarer in typical use, so maybe it would be an even match). The Heckel algorithm is, to my mind, quite beautiful in a language with inbuilt hashing, so a quick proof of concept should not be hard to whip up, though optimising it well enough to actually be used in MediaWiki might be a bit of a slog. (For further incentive, I have a primitive 3-way-merge tool based on Heckel which is wonderfully fun: you can safely fix a typo in the middle of a sentence while someone else moves the sentence to a new place. If we were to change the diff algorithm, we'd likely want the merge function to follow suit, though it's not necessary.)
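
As a rough illustration of the Heckel-style matching mentioned above, here is a minimal sketch in Python, where dict and Counter provide the hashing. It is not wikEdDiff's or MediaWiki's code: the function name heckel_match is hypothetical, it works on word lists, and it leaves out the virtual begin/end anchors of the original paper as well as any output formatting.

```python
from collections import Counter

def heckel_match(old, new):
    """Map each index in `new` to its matching index in `old`, or None."""
    old_count, new_count = Counter(old), Counter(new)
    # Position of each token in `old`; only consulted for tokens unique in `old`.
    old_pos = {tok: i for i, tok in enumerate(old)}
    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    # Tokens that occur exactly once in both sequences act as anchors.
    for j, tok in enumerate(new):
        if new_count[tok] == 1 and old_count[tok] == 1:
            i = old_pos[tok]
            new_to_old[j], old_to_new[i] = i, j

    # Extend matches forward from each anchor over identical neighbours.
    for j in range(len(new) - 1):
        i = new_to_old[j]
        if (i is not None and i + 1 < len(old)
                and new_to_old[j + 1] is None and old_to_new[i + 1] is None
                and new[j + 1] == old[i + 1]):
            new_to_old[j + 1], old_to_new[i + 1] = i + 1, j + 1

    # Extend matches backward in the same way.
    for j in range(len(new) - 1, 0, -1):
        i = new_to_old[j]
        if (i is not None and i > 0
                and new_to_old[j - 1] is None and old_to_new[i - 1] is None
                and new[j - 1] == old[i - 1]):
            new_to_old[j - 1], old_to_new[i - 1] = i - 1, j - 1

    return new_to_old

# A moved phrase is tracked back to its origin because every word is unique.
moved = heckel_match("a b c d e".split(), "d e a b c".split())
# An inserted word leaves the surrounding words matched.
inserted = heckel_match("the cat sat on the mat".split(),
                        "the cat sat on the big mat".split())
# With no unique words there are no anchors, so nothing is matched at all.
repeated = heckel_match("x x x".split(), "x x x x".split())
print(moved, inserted, repeated)
```

The last case shows the failure mode described above: when the text has no unique words, the matcher finds no anchors and would report everything as changed, even though only one word was added.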
Comment 6 Dan Collins 2011-07-16 02:47:07 UTC
Conrad Irwin explained why our diffs may not be excellent. There are a few different algorithms for making diffs, and each of them has a few cases where it won't be quite right. Presumably the one we're using was selected because the devs thought it would perform best. If there's a suggestion to use a different algorithm, with justification as to why it is better, that can be opened as a new bug. Unfortunately, it's not likely that anyone has the time or skill to design a perfect diff algorithm. Closing as WONTFIX.
