Last modified: 2012-01-13 18:51:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T11954, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 9954 - Detect "extra whitespace" / BOM conditions
Detect "extra whitespace" / BOM conditions
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
General/Unknown (Other open bugs)
1.11.x
All All
: Normal enhancement (vote)
: ---
Assigned To: Edward Z. Yang
: patch, patch-need-review
: 10371 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-05-18 17:59 UTC by Brion Vibber
Modified: 2012-01-13 18:51 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Some work in this regard (1.87 KB, patch)
2007-06-27 17:55 UTC, Brion Vibber
Details
Full patch, i18n too! (5.96 KB, patch)
2007-06-30 04:07 UTC, Edward Z. Yang
Details
Full patch v3, with improved documentation and Rob Church's suggestion (8.24 KB, patch)
2007-06-30 16:39 UTC, Edward Z. Yang
Details
Slightly tweaked patch - relative path output and an RSS fix (9.41 KB, patch)
2007-07-05 19:15 UTC, Brion Vibber
Details

Description Brion Vibber 2007-05-18 17:59:29 UTC
It's not infrequent for people to complain of broken RSS feeds and other behavioral oddities which are caused by extra whitespace in a custimized or extension PHP source file.

Usually this is a BOM character at the start or extra line breaks at the end inserted by a "helpful" text editor.

It should in many cases be possible to detect this condition, as well as other possible problems, by  checking the value of headers_sent() somewhere in OutputPage, or perhaps even in Setup.php, before actual output is ready to start.

It should even be possible to scan all files included so far for the common problem conditions in this case, telling the user where to look and what to fix.
Comment 1 Antoine "hashar" Musso (WMF) 2007-05-18 18:00:49 UTC
Upstream issue : http://bugs.php.net/bug.php?id=22108

Maybe add a script in t/maint to check for it.
Comment 2 Brion Vibber 2007-05-18 18:03:34 UTC
The most common problems are for non-technical third-party users, so a maintenance script requiring a bunch of non-standard stuff is maybe not the best thing -- it would never be run by the target audience.
Comment 3 Antoine "hashar" Musso (WMF) 2007-05-18 19:50:28 UTC
Test script in t/maint/bom.t (r22239).
Comment 4 Brion Vibber 2007-06-26 14:24:47 UTC
*** Bug 10371 has been marked as a duplicate of this bug. ***
Comment 5 Brion Vibber 2007-06-27 17:55:39 UTC
Created attachment 3832 [details]
Some work in this regard

Fiddled about a bit with this the other day, needs some polishing up before committing.
Comment 6 Edward Z. Yang 2007-06-28 18:21:27 UTC
I was fiddling around with the patch, and noticed a few interesting things:

* The automatic enabling of gzip compression will interfere with the ability to detect whether or not checkOutputBreakage() is necessary: if it is enabled then headers will not be sent and MediaWiki will think everything is hunky-dory. We should also check ob_get_contents() for a non-false/empty string value.

* To use character escapes, you need to use double-quotes. Patch as it stands won't be able to detect BOMs.

* Using MediaWiki exceptions is not the most friendly way of bubbling the error back to clueless end-users, because they need to enabled using $wgShowExceptionDetails (the instructions are clear enough, though, so it's not too big of a problem). Furthermore, a full stack trace is given even when it is effectively useless. Is there any way to mute this?

* The current places checkOutputBreakage() is called don't get to the response fast enough to get things working. Because WebResponse sends all headers, it would probably be more appropriate to perform the check there, and then set a little variable that would prevent the check from occurring over and over again

* headers_sent() has a neat little extra two parameters it accepts, $file and $line, which can get rid of most of the futzing around with checking included files. This only works, however, if output buffering is disabled, so the code is not in vain.

I will submit a patch after confirmation that my analysis is correct.
Comment 7 Brion Vibber 2007-06-28 19:02:25 UTC
(In reply to comment #6)
> I was fiddling around with the patch, and noticed a few interesting things:
> 
> * The automatic enabling of gzip compression will interfere with the ability to
> detect whether or not checkOutputBreakage() is necessary: if it is enabled then
> headers will not be sent and MediaWiki will think everything is hunky-dory. We
> should also check ob_get_contents() for a non-false/empty string value.

Not true in my testing so far, but perhaps that was only on LocalSettings.php.

> * To use character escapes, you need to use double-quotes. Patch as it stands
> won't be able to detect BOMs.

Woops. :)

> * Using MediaWiki exceptions is not the most friendly way of bubbling the error
> back to clueless end-users, because they need to enabled using
> $wgShowExceptionDetails (the instructions are clear enough, though, so it's not
> too big of a problem).

That's just a quick way to do an error,f eel free to experiment with other display options. Further, the exception details are not required for this.

> * The current places checkOutputBreakage() is called don't get to the response
> fast enough to get things working. Because WebResponse sends all headers, it
> would probably be more appropriate to perform the check there, and then set a
> little variable that would prevent the check from occurring over and over again

Details...

> * headers_sent() has a neat little extra two parameters it accepts, $file and
> $line, which can get rid of most of the futzing around with checking included
> files. This only works, however, if output buffering is disabled, so the code
> is not in vain.

This won't necessarily always get you what you want. Also it might be nice to report multiple problems. But yes, might help. :)
Comment 8 Edward Z. Yang 2007-06-30 04:07:56 UTC
Created attachment 3853 [details]
Full patch, i18n too!

This took a lot longer than I thought it would, mainly because I ended up having to internationalize the whole thing (translators, get on it!)

A few notes:
- I added MWException::$suppressStackTrace, which is a public variable sub-classes can override in order to change the behavior of the default reports
- I added a margin to the default exception text so that it doesn't wrap around the floated image. It looks weird when that happens
- Besides the few typos I fixed, the basic algorithm is the same. The headers_sent() call will handle the few cases where our specialized checks miss things. The unknown error message should never be called
- Fallback messages are the same as the message keys. I considered duplicating the messages in the class itself, but decided against it since premature headers will almost never cause $wgLang not to be populated.
- Detailed exception reporting needs to be on to see this in its full glory. I don't know if users are smart enough to do that, but... whatever. Darwinism or something.
Comment 9 Rob Church 2007-06-30 04:27:35 UTC
A sterling effort, but I would suggest using the exception for internal purposes and *forcing* the user to see the message. Most of the people this sort of thing is going to benefit won't know to switch $wgShowExceptionDetails on, and so will never see the advice being given.
Comment 10 Edward Z. Yang 2007-06-30 16:39:47 UTC
Created attachment 3856 [details]
Full patch v3, with improved documentation and Rob Church's suggestion

Alright, following Rob Church's suggestion I've added another member variable, $alwaysShowExceptionDetails which MWException subclasses can use to override $wgShowExceptionDetails. Also, the lack of documentation on some of the functions was bothering me, so I added docblocks.

It strikes me that the FatalError exception doesn't play nice with command-line utilities: since it's HTML error message, the HTML will get output as raw text. Maybe we should add strip_tags()?
Comment 11 Brion Vibber 2007-07-05 19:15:12 UTC
Created attachment 3872 [details]
Slightly tweaked patch - relative path output and an RSS fix

Nice work! A couple notes:

First, it might be nice if we could get the message prettied up a little more before we put this in a release; we want it to be easy to read and easy to see how to fix the problem.

One quick change I made was to make the file path relative against the wiki installation path instead of absolute. This will first make it shorter -- which is nice :) -- and second it avoids exposing complete directory paths, which would worry some people as it leaks potentially private information about filesystem structure which can be used to guide attacks.

A couple paragraph breaks, some bold or teletype markup on the filename, and maybe even a link to a help page on www.mediawiki.org might do wonders here for making it ready for a panicked end-user.

Related to this, while it's fun to say "get a better editor" it's probably not very nice. ;)


Another issue I noticed is that there's a *lot* of code that calls header() directly, so wouldn't necessarily hit the check in WebResponse::header(). It may just be necessary to go fix them all, as with the one in Feed that I fixed so the RSS feed for recent changes would trigger the explanatory exception _before_ the PHP warning from header() splashed across output.
Comment 12 p858snake 2011-04-30 00:10:00 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 13 Antoine "hashar" Musso (WMF) 2011-11-10 07:16:59 UTC
At some point in time, we had a spike of people using notepad (which insert a BOM) to edit their PHP files. This is rare by now, I guess most people use a decent editor nowadays or newbies are smarter?

The worse thing in the patch, is that it adds yet another feature to maintain for minimal benefits.

I would personally close the bug and keep handling the BOM issue through support requests.
Comment 14 Antoine "hashar" Musso (WMF) 2012-01-13 18:51:36 UTC
...and closing bug report. We can handle those rare issues through the usual support channels.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links