Last modified: 2010-05-15 15:33:19 UTC
BUG MIGRATED FROM SOURCEFORGE
Originally submitted by Roger Persson (rogper) 2004-05-21 07:00
By coincidence I noticed that the greater-than (>) character in URLs is
rendered wrongly IF it occurs as the last character in the URL.
Check this extra semicolon http://sample.link/<hello> in the
Check this http://sample.link/<hello> strange thing
------------------------- Additional comments ------------------------
Date: 2004-05-28 09:35
Sender: SF user vibber
The HTML output is:
It looks like the HTML stripping is being done before external
the < and > have become "&lt;" and "&gt;". Semicolons are
legal in links; the _final_ punctuation (not followed by linkable
stripped, but the bits in the middle are considered fair game
belonging to a link so it extends up to the "&gt" but not
the final ";" (or the other ";" that follows, which
Correct behavior would be to have the link cover
then cut off at the <. This will require parsing for external
stripping HTML; perhaps another placeholder step would be useful
here (might also help the longstanding URL-within-URL bug).
Bug is present in both 1.2 and current 1.3.
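
The behavior described in this comment can be reproduced with a minimal Python sketch. This is not MediaWiki's Parser.php code: the URL pattern, the trailing-punctuation set, and the function name find_link_after_escaping are assumptions made for illustration only. It escapes the HTML first and then looks for the link, which is the order the comment identifies as the cause.

```python
import html
import re

# Hypothetical URL pattern: runs until whitespace or a literal < or >.
# Note that & and ; are accepted, since both are legal inside URLs.
URL_RE = re.compile(r"https?://[^\s<>]+")

# Hypothetical set of trailing punctuation that is not part of a link.
TRAILING_PUNCT = ",;.:!?"


def find_link_after_escaping(text):
    """Escape HTML first, then search for an external link (the buggy order)."""
    escaped = html.escape(text, quote=False)  # < and > become &lt; and &gt;
    match = URL_RE.search(escaped)
    if match is None:
        return None
    # Only the final punctuation is stripped; entities in the middle stay.
    return match.group(0).rstrip(TRAILING_PUNCT)


print(find_link_after_escaping("Check this http://sample.link/<hello> strange thing"))
# prints: http://sample.link/&lt;hello&gt
```

The link swallows the escaped entities and stops just before the final ";", which leaves the stray semicolon from the original report.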
*** Bug 308 has been marked as a duplicate of this bug. ***
Still present; added a test case to parserTests.
According to RFC 2396, '<' and '>' are disallowed within URIs, and hence I added
them to the list of prohibited characters.
Wil, right. The problem is that the conversion of < and > to &lt; and &gt; has already been done when we do the
external link parsing, and & and ; _are_ allowed in URLs.
(In reply to comment #4)
> Wil, right. The problem is that the conversion of < and > to &lt; and &gt; has
> already been done when we do the
> external link parsing, and & and ; _are_ allowed in URLs.
Oh, I see. This should now be fixed in HEAD (Parser.php revision 1.323).
Rather than replacing external links before stripping HTML tags as
you suggested before, I just added a check for '&lt;' and '&gt;'
within external links. It's not an especially elegant solution, but
I think it will fix this without meddling with the order of
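
A rough Python sketch of this kind of check follows. It is not the actual change in Parser.php revision 1.323. It assumes the check looks for the escaped forms &lt; and &gt; in already-escaped text, and the patterns and the name find_link_in_escaped are hypothetical.

```python
import re

# Hypothetical URL pattern; & and ; are accepted because URLs may contain them.
URL_RE = re.compile(r"https?://[^\s<>]+")

# Escaped forms of < and >, which must never be treated as part of a link.
ENTITY_RE = re.compile(r"&[lg]t;")


def find_link_in_escaped(escaped):
    """Find an external link in already-escaped text, cutting it at &lt; or &gt;."""
    match = URL_RE.search(escaped)
    if match is None:
        return None
    url = match.group(0)
    entity = ENTITY_RE.search(url)
    if entity is not None:
        # Stop the link where the original < or > stood.
        url = url[:entity.start()]
    # Assumed set of trailing punctuation that is not part of the link.
    return url.rstrip(",;.:!?")


print(find_link_in_escaped("Check this http://sample.link/&lt;hello&gt; strange thing"))
# prints: http://sample.link/
```

Ordinary uses of & and ; are left alone, so a URL such as http://sample.link/?a=1&b=2;c=3 is still matched in full.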
Added more test cases.
(In reply to comment #6)
> Added more test cases.
Fixed one by adding '<' and '>' back to the list of disallowed chars
(I added them earlier, but then I got nervous and undid the change.)
The two cases that still fail are due to the way disallowed
characters are treated as part of the link description; if that's
a bug, it's separate from this one, IMHO.
Issue fixed in HEAD and 1.5, all parsertests in HEAD passed successfully.