Last modified: 2010-05-15 15:29:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2487, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 487 - HTML comments cause invisible text in tables


Summary:	HTML comments cause invisible text in tables

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser (Other open bugs)
Version:	1.3.x
Hardware:	All All

Importance:	Normal minor (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://en.wikipedia.org/w/wiki.phtml?...
Whiteboard:
Keywords:

Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2004-09-14 15:45 UTC by Angela
Modified:	2010-05-15 15:29 UTC (History)
CC List:	1 user (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Make preview more consistent with displayed article (730 bytes, patch) 2004-09-14 20:30 UTC, Wil Mahan	Details
Custom function to remove comments (1.90 KB, patch) 2004-09-16 03:31 UTC, Wil Mahan	Details
Add an attachment (proposed patch, testcase, etc.)

Description Angela 2004-09-14 15:45:40 UTC

If you use the following table, the "Weird invisible text" will be invisible
when you preview the page before saving it. It shows up normally if you save the
page and only affects the page preview. If you remove <!--bar-->, it all works
normally.

{|
|-
| valign="top" |
<!--foo--> Weird invisible text
|}

<!--bar-->

Comment 1 Brion Vibber 2004-09-14 17:53:09 UTC

I can reproduce the disappearing text, but I don't see it showing "normally if you save the page".

Comment 2 Angela 2004-09-14 18:01:59 UTC

http://en.wikipedia.org/w/wiki.phtml?title=User:Angela/Sandbox&oldid=5879312 is
a saved version of it. The text is not invisible there (which is what I meant by
showing normally). It's only when you preview the page that it's invisible.

Comment 3 Brion Vibber 2004-09-14 18:15:16 UTC

It seems to stay invisible if there's any following text:
http://meta.wikimedia.org/w/wiki.phtml?title=Sandbox&oldid=60172

Perhaps the preview/editing-related stuff triggers the same thing.

Comment 4 Wil Mahan 2004-09-14 20:30:57 UTC

Created attachment 39 [details]
Make preview more consistent with displayed article

The fact that the preview behaves differently from viewing the article is due
to the way newlines are appended to the preview text (I'm not sure why). If
this patch is applied, "Weird invisible text" shows up in the preview, as
expected.

But if you add <p><p> to the end of the testcase, you can reproduce the problem
both in the preview and in the saved article, as Brion said.

Comment 5 Brion Vibber 2004-09-14 20:46:53 UTC

The patch may be a good thing (please check that it produces correct spacing in the output with both preview before & preview after edit box options 
set), but the root problem is the vanishing text itself...

Comment 6 Wil Mahan 2004-09-14 22:49:51 UTC

Indeed, the root problem is this line in Parser.php:

 # Remove HTML comments
 $text = preg_replace( '/(\\n *<!--.*--> *(?=\\n)|<!--.*-->)/sU', '$2', $text );

The regular expression is supposed to remove comments, but it transforms this:
#### BEGIN text
{|
|-
| valign="top" |
<!--foo--> Weird invisible text
|}

<!--bar-->

<p>
#### END text

into this:

#### BEGIN text
{|
|-
| valign="top" |

<p>
#### END text

The desired result of the regex is:
#### BEGIN text
{|
|-
| valign="top" |
 Weird invisible text
|}

<p>
#### END text

The regex will need to be fixed; I'll try to take a look at this more later.

Comment 7 JeLuF 2004-09-15 05:53:45 UTC

replacing it with

$text = preg_replace( '/(\\n *<!--.*--> *|<!--.*?-->)/sU', '', $text );

works for me. I don't see why this should be anchored to an end-of-line.
Tested it with some combinations, and didn't break any trivial example.

Committed to CVS HEAD for further testing.

Comment 8 Wil Mahan 2004-09-15 06:41:50 UTC

The problem with that is if you have something like

abc
<!-- comment --> def

your code eats the newline and gives "abc def", while the original code gives

abc
 def

and the second seems more correct to me. This is really tricky.

Comment 9 JeLuF 2004-09-15 06:59:19 UTC

abc
 def

would start preformatted text. But that should only happen if the line starts
with a blank. It doesn't. It starts with a <!--. So at least the space should
also be eaten:

abc
def

But,

abc
<!-- comment -->
def

should not become

abc

def

since that would add a paragraph, which is not intended.

But:

* abc
<!-- comment --> def

currently becomes

* abc def

and this probably should not happen

Comment 10 Wil Mahan 2004-09-15 07:12:59 UTC

I was just pointing out that your code represents a change in behavior from the
original. In addition to the problem you mentioned, what about this:

abc
  <!-- comment --> def

That looks more like preformatted text, but your regexp would still make "abc
def", generating only normal text.

The only thing I have been able to convince myself is correct so far is this:

$text = preg_replace( '/\\n *<!--.*--> *(?=\\n)/sUe',
       '(strstr("$1", "-->")) ? "$0" : ""', $text );
$text = preg_replace( '/<!--.*-->/sU', '', $text );

it's slow and quite ugly, but I think at least does what we want.

Comment 11 Wil Mahan 2004-09-15 07:32:14 UTC

Err, I meant

$text = preg_replace( '/\\n *<!--(.*)--> *(?=\\n)/sUe',
        '(strstr("$1", "-->")) ? "$0" : ""', $text );
$text = preg_replace( '/<!--.*-->/sU', '', $text );

This gets your example right:

abc
<!-- comment -->
def

But another constraint is that

abc
<!-- comment -->
<!-- comment -->
def

should give the same thing; at least, that's how it works now. To make that work
you can't capture the newline following the first comment, because then the
second comment won't match the pattern and have its newline removed. But without
capturing the newline at the end of a comment, how can you know whether to
remove trailing whitespace?

In short, I'm not sure there's a good way to do this without either changing the
current behavior, or doing two separate passes as I did above.

Comment 12 Wil Mahan 2004-09-16 03:31:33 UTC

Created attachment 40 [details]
Custom function to remove comments

I got tired of trying to make a regexp do the right thing, so I wrote a custom
function to strip out the comments by hand. I hope this will be more robust
than the previous methods, and in my tests, it was faster. :)

Let me know what you think.

Comment 13 Wil Mahan 2004-09-17 08:48:24 UTC

I just realized the change JeLuF committed is very broken. Now in the text

abc <!-- test -->
blah blah blah blah
<!-- test -->

all the "blah"s get removed.

The "U" modifier makes all quantifiers ungreedy by default, so the ? acually
enables greediness. Oops!

I also think the problem with the problem with lists and preformatted text must
be addressed. My second patch would fix this, but if it's too ugly, I can look
for another solution....

Comment 14 Wil Mahan 2004-09-17 15:48:28 UTC

Sorry for the spam, but I think this will do the trick:

$text = preg_replace_callback( '/(\\n)??( *)<!--.*-->( *(?=\\1))??/sU', 
removeComment, $text );
function removeComment($m) {
        if ($m[1] === "\n")
                return '';
        else
                return $m[2] . $m[3];
}

although the custom function still seems faster for larger inputs.

Comment 15 JeLuF 2004-09-20 06:17:47 UTC

Committed the custom function to CVS HEAD.

Wikimedia Bugzilla is closed!

Search

Personal tools

Navigation

Links