Last modified: 2011-02-06 10:04:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T28639, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 26639 - Headings appear twice, instead of edit section links
Headings appear twice, instead of edit section links
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Parser (Other open bugs)
1.18.x
All All
: Normal normal (vote)
: ---
Assigned To: Daniel Friesen
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-01-09 11:13 UTC by Thomas Bleher
Modified: 2011-02-06 10:04 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Screenshot showing the buggy behaviour (31.47 KB, image/png)
2011-01-09 11:13 UTC, Thomas Bleher
Details
Screenshot showing the correct behaviour (32.86 KB, image/png)
2011-01-09 11:14 UTC, Thomas Bleher
Details
The correct html output, from phase3 r79519 (25.19 KB, text/html)
2011-01-09 11:49 UTC, Thomas Bleher
Details
The incorrect html output, from phase3 r79878 (23.38 KB, text/html)
2011-01-09 11:51 UTC, Thomas Bleher
Details
LocalSettings.php file which exhibits the problem (7.68 KB, application/x-httpd-php)
2011-01-13 07:53 UTC, Thomas Bleher
Details

Description Thomas Bleher 2011-01-09 11:13:23 UTC
Created attachment 7962 [details]
Screenshot showing the buggy behaviour

Using current phase3 trunk (r79878), headings appear twice, and section edit links are missing. It worked correctly on r79519. Therefore I suspect the parser changes made by Daniel Friesen. Attached are two screenshots that show the buggy and intended behaviour.

I should also add that I use a small Javascript function to move edit section links, like it is or was used on the german wikipedia. It is included below.

 function moveEditsection() {
   if (typeof oldEditsectionLinks == 'undefined' || oldEditsectionLinks == false) {
     var spans = document.getElementsByTagName("span");
     for(var i = 0; i < spans.length; i++) {
       if(spans[i].className == "editsection") {
         spans[i].style.fontSize = "x-small";
         spans[i].style.fontWeight = "normal";
         spans[i].style.cssFloat = "none";
         spans[i].style.marginLeft = "0px";
         spans[i].parentNode.appendChild(document.createTextNode(" "));
         spans[i].parentNode.appendChild(spans[i]);
       }
     }
   }
 }
Comment 1 Thomas Bleher 2011-01-09 11:14:03 UTC
Created attachment 7963 [details]
Screenshot showing the correct behaviour
Comment 2 Daniel Friesen 2011-01-09 11:25:51 UTC
Link to url with the issue please.

I can't identify the cause of the issue without page source.
Comment 3 Niklas Laxström 2011-01-09 11:32:05 UTC
And... does the problem occur without the script?
Comment 4 Thomas Bleher 2011-01-09 11:48:13 UTC
The page that is shown in the screenshot is http://spiele.j-crew.de/wiki/Technikworkshop . I have however already reverted to the old version because of the bug (so you won't see the issue directly, but you can examine the page source). It occured on multiple pages, so it doesn't seem too page specific.
I should also note that I had to explicitly purge the page to see the new behaviour, which I found a bit strange - after reading the commit messages, I had assumed that old parsed pages would be evicted from the cache automatically.

I downloaded the page using wget, and the headings are already doubled in the page source so it seems to be unrelated to the Javascript code I posted.

Old version of first header:
<h2><span class="editsection">[<a href="/w/index.php?title=Technikworkshop&amp;action=edit&amp;section=1" title="Abschnitt bearbeiten: Kurzbeschreibung">Bearbeiten</a>]</span> <span class="mw-headline" id="Kurzbeschreibung"><a href="/wiki/Datei:Info.png" class="image"><img alt="" src="/images/b/b3/Info.png" width="32" height="32" /></a> Kurzbeschreibung</span></h2>

New version of the first header:
<h2>Kurzbeschreibung <span class="mw-headline" id="Kurzbeschreibung"><a href="/wiki/Datei:Info.png" class="image"><img alt="" src="/images/b/b3/Info.png" width="32" height="32" /></a> Kurzbeschreibung</span></h2>


As you can see, the word "Kurzbeschreibung" is doubled.
Comment 5 Thomas Bleher 2011-01-09 11:49:52 UTC
Created attachment 7965 [details]
The correct html output, from phase3 r79519
Comment 6 Thomas Bleher 2011-01-09 11:51:03 UTC
Created attachment 7966 [details]
The incorrect html output, from phase3 r79878
Comment 7 Daniel Friesen 2011-01-09 11:56:52 UTC
Hmmm... this is strange behavior...

It seams like for some reason something is stripping the <editsection [...]>Text</editsection> down to "Text".

But everything worked fine when I coded it... parser tests to. and those included stuff in other languages.
Comment 8 Thomas Bleher 2011-01-13 07:53:40 UTC
Created attachment 7985 [details]
LocalSettings.php file which exhibits the problem

In the hope of making progress on this bug: here is a (slightly redacted for passwords and such) version of LocalSettings.php, with which I can reliably reproduce the bug. I tested without any extensions, and the bug is still there. Any ideas what could cause the bug?
Comment 9 Daniel Friesen 2011-01-13 08:13:29 UTC
(In reply to comment #8)
> Created attachment 7985 [details]
> LocalSettings.php file which exhibits the problem
> 
> In the hope of making progress on this bug: here is a (slightly redacted for
> passwords and such) version of LocalSettings.php, with which I can reliably
> reproduce the bug. I tested without any extensions, and the bug is still there.
> Any ideas what could cause the bug?

I tried some of those settings related to the parser but didn't have any luck. Try disabling tidy to see if that's causing the issue. I tried using the same setting but I'm not sure if I actually have tidy configured on my server.
Comment 10 Thomas Bleher 2011-01-13 14:56:14 UTC
(In reply to comment #9)
> Try disabling tidy to see if that's causing the issue. I tried using the same
> setting but I'm not sure if I actually have tidy configured on my server.

Yes, this is it! When disabling tidy, everything works correctly.

So the next step is probably to find out why tidy messes up the page this way...
Comment 11 Mark A. Hershberger 2011-01-22 03:43:40 UTC
Daniel, how did tidy cause this?  Can you track it down?
Comment 12 Daniel Friesen 2011-01-22 05:24:09 UTC
"how" tidy is causing the issue is fairly obvious without even browsing the code. It sounds like tidy is being run after we insert the editsection tokens, but before we replace them with real editsection tokens. The editsection tokens are made in the format <editsection page="..." section="...">...</editsection> specifically because that's the only way to get the LanguageConverter to treat the information in the exact way we want it to. However, unfortunately editsection is not a real html tag, so when tidy sees it before we replace it with editsection markup (which we want to let be customized) it decides to strip out the tag part and leave nothing but the heading hint behind, which is usually the same text as the header's text, leading to that text being displayed twice instead of having a header and an editsection link.

Now the "solution" is less obvious. Perhaps I'll experiment and see if I can use something like a mw: or mediawiki: xml/xhtml namespace and use something like <mw:editsection [...]> instead of <editsection> which is in the default xhtml/html namespace and trick tidy into leaving those tags alone.
Comment 13 Daniel Friesen 2011-01-29 11:08:19 UTC
Experimenting a little, Tidy seams to completely ignore the fact that xml and xml namespaces are valid in xhtml and decide to strip out valid namespaced xml anyways. So even when using <mw:editsection> and defining a valid xhtml namespace in the code we send to tidy, it still decides to ignore that and strip it out.

The only thing I've seen to have an effect is adding 'input-xml: yes' to the tidy.conf, however that might have other unwanted effects someone who knows tidy better should point out.
Comment 14 Thomas Bleher 2011-01-29 12:51:53 UTC
Hmm, so there seem to be several ways forward:

1) Modify tidy config and add 'input-xml: yes'
2) Remove $wgUseTidy, and use the non-tidy-mode in MediaWiki
3) Adapt the parser and the LanguageConverter so they can work together, without interfering with tidy
4) Revert the feature for now

Option 1 sounds a bit dangerous, as tidy is explicitly used to clean up faulty markup; using an XML parser that doesn't correct errors seems wrong in this context (The option is described as "This option specifies if Tidy should use the XML parser rather than the error correcting HTML parser.") That said: I haven't experimented with this config option, so I may be completely wrong on this point.

Option 2 seems quite unrealistic (see eg bug 12221 for what can happen without tidy installed). This change could break page rendering in lots of "wrong-but-worked-until-now" cases.

I do not know the code well enough to say if option 3 is viable, but from the comments so far it doesn't seem so easy.

IMHO, it would therefore be best to go for option 4 and revert the changes for now, until a better solution is found. This would at least make trunk work again in this common configuration.

Any other options/opinions?
Comment 15 Daniel Friesen 2011-01-29 13:14:13 UTC
5) Make tidy run on the parser output text each time it's fetched, instead of running on the text before saving it to the parser output

Though looking at the code we have ParserBeforeTidy and ParserAfterTidy and trying to remove tidy from Parser::parse might not be totally desirable.

6) Hack up MWTidy so that it replaces things like <mw:editsection> with something like the parser's UNIQ tokens, runs tidy (making tidy ignore the tokens) and then replaces the tokens with the original blocks.

May have some effects like making tidy not work inside anything inside a <mw:...> block, and it might fall apart with nesting, but it's probably the best option.
Comment 16 Daniel Friesen 2011-02-03 23:44:26 UTC
Are you still seeing this bug in trunk? I could swear I still have trunk installed, I don't have any special local changes to fix the bug left, but I can't reproduce the bug anymore.
Comment 17 Thomas Bleher 2011-02-04 07:18:16 UTC
Still happens with r81507.
$wgUseTidy = false --> everything is OK
$wgUseTidy = true  --> headings messed up as described
Comment 18 Daniel Friesen 2011-02-04 07:49:29 UTC
Hmmm, ok, either this has something to do with me upgrading some stuff on my server probably not or there's something wrong with the output in the pages I'm using to test on my test wiki causing XHTML errors tidy can't handle and instead results in tidy being ignored... guess I need to play with test pages.
Comment 19 Daniel Friesen 2011-02-06 01:39:24 UTC
I tried making a change in r81583 to hide these markers from tidy.

See if this fixed bug.
Comment 20 Thomas Bleher 2011-02-06 10:04:52 UTC
Thanks, that fixes the issue for me!

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links