Last modified: 2007-05-15 17:53:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T11880, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 9880 - wfMsgWikiHtml does not ensure XHTML validity
Status: RESOLVED INVALID
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: PC All
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: need-parsertest
Depends on:
Blocks: html
Reported: 2007-05-11 16:50 UTC by Benson Margulies
Modified: 2007-05-15 17:53 UTC (History)
0 users



Attachments
File with an unclosed br. (36.66 KB, text/html)
2007-05-14 12:16 UTC, Benson Margulies

Description Benson Margulies 2007-05-11 16:50:36 UTC
[Fatal Error] :43:3: The element type "br" must be terminated by the matching
end-tag "</br>".
c:\x\arstaticwiki\ar\!\!\!\صورة~!!!!ユニセフ0195.JPG_c267.html
org.xml.sax.SAXParseException: The element type "br" must be terminated by the
matching end-tag "</br>".
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :102:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :119:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :162:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Fatal Error] :44:3: The element type "br" must be terminated by the matching
end-tag "</br>".
c:\x\arstaticwiki\ar\(\2\6\صورة~(2691)_Tel_Aviv.jpg_40ef.html
org.xml.sax.SAXParseException: The element type "br" must be terminated by the
matching end-tag "</br>".
[Error] :172:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :82:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :172:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Fatal Error] :43:3: The element type "br" must be terminated by the matching
end-tag "</br>".
c:\x\arstaticwiki\ar\-\3\4\صورة~-34_sibirien_sviatoinos_bucht.JPG.JPG_07d9.html
org.xml.sax.SAXParseException: The element type "br" must be terminated by the
matching end-tag "</br>".
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :84:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :187:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
Comment 1 Brion Vibber 2007-05-11 17:55:21 UTC
1) Check with current version

2) Provide sample input to produce this

3) Compare with existing bug entries for ID issues
Comment 2 Benson Margulies 2007-05-11 18:26:58 UTC
Hmm. I'm just consuming what's coming out of the public Wikipedia site, which, I
presume, isn't running quite the current version.

So, much as I'd like to be a good citizen here, I'm not sure how to proceed.

Let me ask this question: is the claim that the current version would prevent
the unclosed br tags? Those are the big problem for me. If that's the claim, I
might be able to try an experiment to see if there is still a hole allowing
people to create them.
Comment 3 Brion Vibber 2007-05-11 18:33:32 UTC
Please provide URLs to the pages you're checking, then.
Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-05-11 22:35:31 UTC
We don't validate any IDs whatsoever, including those produced by the 
interface; that issue is known.  The invalid IDs produced there would be 
typical for those involving the portal of an Arabic-alphabet site.  See bug 
4515.

It should be completely impossible for Wikipedia, which has HTML Tidy enabled, 
to have unclosed <br> tags.  I can't see any on ar.wikipedia's Main Page or at 
[[ar:تل أبيب]] (the Tel Aviv article that you appear to have been using).
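For the ID errors quoted in the description, the spaces inside "p-المشاركة و المساعدة" look like the likely cause, since an NCName may contain neither whitespace nor colons, while Arabic letters themselves are allowed. A rough sketch of such a check follows; `looks_like_ncname` is a hypothetical helper, not a MediaWiki function, and it only approximates the real NCName grammar using Unicode character classes.

```python
import unicodedata


def looks_like_ncname(value: str) -> bool:
    """Approximate NCName test (not the full XML Namespaces grammar).

    Assumption: letters, digits, combining marks, '-', '.', and '_' are
    accepted; the first character must be a letter or '_'.
    """
    if not value:
        return False
    first = value[0]
    if not (first.isalpha() or first == "_"):
        return False
    for ch in value[1:]:
        if ch.isalnum() or ch in "-._":
            continue
        if unicodedata.category(ch).startswith("M"):  # combining marks
            continue
        return False  # whitespace, ':' and other punctuation end up here
    return True


# The ID from the error log fails because of its spaces.
print(looks_like_ncname("p-المشاركة و المساعدة"))
# Replacing the spaces with underscores passes this approximate check.
print(looks_like_ncname("p-المشاركة_و_المساعدة"))
```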
Comment 5 Benson Margulies 2007-05-14 12:15:35 UTC
I'm working from the most recent AR static dump (April). Is it likely that the
quality of the tidy processing has gone up materially since then?

I'll attach a file ... I've yet to succeed in finding a live page to match one
of the filenames. The page I've got here isn't the straight Tel Aviv page; it's
some special JPG-rights-explaining page.
Comment 6 Benson Margulies 2007-05-14 12:16:45 UTC
Created attachment 3641
File with an unclosed br.
Comment 7 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-05-15 00:54:59 UTC
I see the problem.  [[ar:MediaWiki:Sharedupload]] is at fault.  Probably we
should run its output through Tidy or Sanitizer or something (does Sanitizer
fix unclosed <br>s?), if that's not too slow.  As a site-specific workaround,
you can ask a sysop there to edit the message to begin with
<br style="clear:both" /> instead of <br style="clear:both">, or sed your files
to kill that string, but this should probably be fixed in the function itself?
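The "sed your files" workaround could be sketched roughly as below, for anyone post-processing a static dump. This is an assumption-laden illustration, not anything MediaWiki ships: `close_br_tags` is a hypothetical helper, and the regular expression assumes no attribute value contains a literal `<` or `>`.

```python
import re

# Matches <br ...> tags that are not already self-closed with "/>".
# Assumption: attribute values never contain '<' or '>'.
_UNCLOSED_BR = re.compile(r"<br\b([^<>]*?)\s*(?<!/)>", re.IGNORECASE)


def close_br_tags(html: str) -> str:
    """Rewrite unclosed <br ...> tags as self-closing <br ... />."""
    return _UNCLOSED_BR.sub(lambda m: f"<br{m.group(1)} />", html)


# The string from the Sharedupload message is rewritten in place.
print(close_br_tags('<br style="clear:both">'))
```

Tags that are already self-closed are left untouched, so running the rewrite over an entire dump more than once should be harmless.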
Comment 8 Benson Margulies 2007-05-15 01:00:13 UTC
Thank you for tracking this down from my less than informative breadcrumbs.

If these are relatively uninteresting pages, I can switch on XML parsing and
ignore pages that flunk due to this problem.
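A minimal sketch of that "parse as XML, skip what flunks" approach, assuming the pages are already loaded as strings. `is_well_formed` and `keep_parseable` are hypothetical names, and Python's standard-library parser stands in for the Java SAX parser seen in the log. One caveat: this parser does not load the XHTML DTD, so pages that use named entities such as `&nbsp;` would also be rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def keep_parseable(pages: dict[str, str]) -> dict[str, str]:
    """Drop pages that fail XML parsing, e.g. because of an unclosed <br>."""
    return {name: text for name, text in pages.items() if is_well_formed(text)}


pages = {
    "good.html": "<p>a<br />b</p>",
    "bad.html": "<p>a<br>b</p>",  # unclosed <br>, rejected by the parser
}
print(sorted(keep_parseable(pages)))
```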
Comment 9 Brion Vibber 2007-05-15 17:53:14 UTC
See the configuration settings for tidy usage; we have it disabled for UI
messages for performance reasons.
