Last modified: 2012-01-24 02:39:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T35331, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 33331 - Excessive punctuation highlighting in wikidiff2
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: wikidiff2
Version: unspecified
Hardware: All All
Importance: Normal major with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://de.wikipedia.org/wiki/Benutzer...
Keywords:
Depends on:
Blocks:
Reported: 2011-12-22 19:31 UTC by TMg
Modified: 2012-01-24 02:39 UTC
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description TMg 2011-12-22 19:31:16 UTC
Short: Do not colorize characters red that did not change.

Long: A few weeks ago the diff algorithm was changed in all Wikipedia projects. In earlier versions, if a single character was changed, only this single character was marked red. Now, all non-whitespace characters around this character become red. In many cases this marks so much red that it becomes impossible to see what really changed, especially when editing templates, links or images.

I think the idea was to make edited whitespace visible (it was invisible in earlier versions of the diff). If this is true, why don't you make the whitespace visible? Only the whitespace? Not the surrounding text that did not change?
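
As a rough illustration of the behaviour described above (this is not the actual wikidiff2 code), the following Python sketch diffs two versions of a template call with two hypothetical tokenizers: one that treats every run of non-whitespace characters as a single token, and one that treats each punctuation character as its own token. The patterns RUNS and WORDS, the function highlighted() and the sample text are all assumptions made up for this example.

```python
import re
from difflib import SequenceMatcher

# Two hypothetical tokenizers (illustration only, not wikidiff2's real rules).
RUNS = re.compile(r"\S+|\s+")            # every non-whitespace run is one token
WORDS = re.compile(r"\w+|\s+|[^\w\s]")   # each punctuation character is its own token


def highlighted(old, new, pattern):
    """Return the tokens of `new` that a token-level diff would mark as changed."""
    a = pattern.findall(old)
    b = pattern.findall(new)
    marked = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            marked.extend(b[j1:j2])
    return marked


old = "{{Infobox|name=Foo|year=2010}}"
new = "{{Infobox|name=Foo|year=2011}}"

print(highlighted(old, new, RUNS))   # the whole template call is marked
print(highlighted(old, new, WORDS))  # only the changed number is marked
```

With the first tokenizer the template call contains no whitespace, so it is a single token and is marked as a whole. With the second, only the token that actually differs is marked.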

Here are a few really bad examples:

http://de.wikipedia.org/w/index.php?title=Giovanni_Kessler&diff=prev&oldid=86716346
http://de.wikipedia.org/w/index.php?title=Terraria&curid=6244640&diff=97103203&oldid=97093972
http://de.wikipedia.org/w/index.php?title=Hans_Bentzien&diff=prev&oldid=78225726

I created a script to fix this issue (I know it can't work in all cases, it's just a bad hack to fix at least some of the issues):

http://de.wikipedia.org/wiki/Benutzer_Diskussion:TMg/cleanDiff.js
Comment 1 Mark A. Hershberger 2011-12-22 20:53:14 UTC
https://en.wikipedia.org/wiki/User:Cacycle/wikEdDiff is a gadget on enwiki that does something similar.  If you're interested in fixing wikidiff the source is available.
Comment 2 TMg 2011-12-22 21:26:42 UTC
Before editing source that may or may not be the cause, there are some questions: When was this changed and why? Is this a configuration issue or something in the CPP source of the extension? In which revision was this bug introduced?

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/wikidiff2/?view=log

I think it was intended to be a feature but from my point of view it's a bug. At wikipedia.org the diff was better in October 2011 (as described above) and got worse in November 2011. Why isn't it possible to simply go back to the old version (as requested in bug #32601)?

wikEdDiff is no solution (neither is my script).
Comment 3 Mark A. Hershberger 2011-12-23 01:12:50 UTC
(In reply to comment #2)
> Why isn't it possible to simply go back to the old
> version (as requested in bug #32601)?

Because bug 26038, bug 25725 and bug 27993, as well as maybe some Thai support (see r67994), depend on the new version. It is better to fix the problem that was introduced instead of re-introducing four old problems.

*** This bug has been marked as a duplicate of bug 32601 ***
Comment 4 Tim Starling 2011-12-23 01:26:58 UTC
Reopening. Thank you for the report, I didn't know there was an issue with punctuation highlighting. The word splitting algorithm was rewritten in version 1.1.0 of wikidiff2 which was deployed on November 2 per bug 27720.
Comment 5 Tim Starling 2011-12-23 05:57:41 UTC
Fix committed in r107135.
Comment 6 TMg 2011-12-23 12:06:04 UTC
Does this mean you still colorize full words in red even if only a single character changed? Why? The previous version was good, it was marking single characters only. What was the problem? Where was this change discussed?

Neither bug 26038 (replacing a dash with another dash) nor bug 25725 (removing some whitespace from the HTML output) nor bug 27993 (bad diff in the last line, as far as I understand) nor some Thai support (that's a Wikipedia with 70,000 articles) can explain why the diff algorithm was changed so drastically for all languages (for a total of over 10 million articles!).
Comment 7 Tim Starling 2011-12-27 22:22:26 UTC
(In reply to comment #6)
> Does this mean you still colorize full words in red even if only a single
> character changed? Why? The previous version was good, it was marking single
> characters only. What was the problem? Where was this change discussed?

Yes, the full word will be highlighted even if only a single character is changed. This has been the behaviour of the diff engine on Wikipedia since 2002, except in Chinese, Japanese and Thai text. Before that, a line-by-line diff was used. We've never had character-level diffs for European languages.
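
To make the distinction between the two granularities concrete, here is a minimal Python sketch that runs the same comparison once over words and once over single characters. It uses the standard difflib module rather than anything from wikidiff2, and the helper changed_spans() and the sample sentence are invented for the example.

```python
from difflib import SequenceMatcher


def changed_spans(a, b):
    """Return the slices of `b` that differ from `a`.

    Works on any two sequences: lists of words give a word-level diff,
    plain strings give a character-level diff.
    """
    spans = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            spans.append(b[j1:j2])
    return spans


old = "born in 1999"
new = "born in 1998"

word_level = changed_spans(old.split(), new.split())  # compares whole words
char_level = changed_spans(old, new)                   # compares characters

print(word_level)
print(char_level)
```

In the word-level run the whole word "1998" is reported even though only its last digit differs; in the character-level run only that digit is reported.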
Comment 8 TMg 2012-01-02 21:50:12 UTC
You are right. I'm sorry, I mixed something up. I will wait, and when the fix is live in the Wikipedia projects I will try to improve my user script as far as I can and ask other users what they think about character-level diff. I will post my results in another report. Thank you so far.
Comment 9 Jon Harald Søby 2012-01-15 18:14:42 UTC
I came here to report this same bug, but found this, and am glad that it is taken care of. Will it come live on Wikimedia wikis with 1.19?
Comment 10 Mark A. Hershberger 2012-01-15 20:29:01 UTC
(In reply to comment #9)
> I came here to report this same bug, but found this, and am glad that it is
> taken care of. Will it come live on Wikimedia wikis with 1.19?

You can check the beta site: http://beta.wmflabs.org/
Comment 11 Jon Harald Søby 2012-01-15 23:07:44 UTC
Okay, thanks. I guess it is not in 1.19, then, given the following diffs:

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Problem_reports&curid=10&diff=265&oldid=264 (would expect only ":" to be highlighted on the right)

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Problem_reports&curid=10&diff=261&oldid=260 (would expect only "337121" to be highlighted on the left)


PS! Your script is really handy, TMg!
Comment 12 TMg 2012-01-18 00:17:22 UTC
According to
http://labs.wikimedia.beta.wmflabs.org/wiki/Special:Version
that wiki is running 1.19alpha (r109243) but obviously the bug is not fixed.

All non-whitespace characters left and right of a change are highlighted including all punctuation characters.
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Talk&diff=1492&oldid=1491
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=112

And why is the space at the end of this change highlighted? The space was not changed.
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=116
Comment 13 Tim Starling 2012-01-20 00:03:41 UTC
Fix deployed. Note that wikidiff2 is deployed separately to MediaWiki, so the MediaWiki version doesn't tell you anything. Labs was not running the new version.
Comment 14 TMg 2012-01-22 13:56:00 UTC
Is this some kind of test or game to you? Breaking features and basically ignoring all complaints by marking them as invalid or fixed? Aren't you able to look at the examples? The bug is not fixed.

One of the examples looks better now:

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=116

Everything else is still broken:

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Talk&diff=1492&oldid=1491
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=112

How hard can it be to revert the single change that broke the diff algorithm?

I'm very sorry, I don't want to be personal. But the diff is an essential feature for me and this really drives me crazy.
Comment 15 TMg 2012-01-22 14:09:41 UTC
It seems you are talking about the Wikipedia projects and not about Labs. How should I know?

http://de.wikipedia.org/w/index.php?title=Wikipedia:Spielwiese&diff=98698173&oldid=98698134

I'm sorry. Resolved. Fixed.
Comment 16 Tim Starling 2012-01-22 19:25:56 UTC
Labs is new and I haven't ever logged into it or fixed anything on it. At present, it's not really my problem if something is broken on it. I'm sorry Mark gave you a link to it; anything you see there in relation to wikidiff2 is very unlikely to be related to what is going on on the main cluster.
Comment 17 Mark A. Hershberger 2012-01-23 19:20:51 UTC
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Bug33331&diff=1530&oldid=1529

yay?

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Talk&diff=1492&oldid=1491 still shows the same thing, though.  Now to see if caching is (somehow) a problem.
Comment 18 Tisza Gergő 2012-01-23 20:07:41 UTC
(In reply to comment #17)
> Now to see if caching is (somehow) a problem.

Indeed it is:
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Talk&diff=1492&oldid=1491&foo=bar
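
The check above, appending a meaningless parameter to the diff URL, can be sketched in Python as follows. This assumes the stale output is looked up under the full URL, so that any extra parameter causes a cache miss; the function name with_dummy_param() and its defaults are made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_dummy_param(url, key="foo", value="bar"):
    """Append a throwaway query parameter so a URL-keyed cache is bypassed."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = ("http://labs.wikimedia.beta.wmflabs.org/w/index.php"
       "?title=Talk&diff=1492&oldid=1491")
busted = with_dummy_param(url)
print(busted)
```

If the page rendered from the modified URL differs from the one rendered from the original URL, the original is most likely being served from a cache.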
Comment 19 Mark A. Hershberger 2012-01-24 02:39:47 UTC
looks fixed now.  Ryan Lane told me to restart memcached.
