Last modified: 2010-05-15 15:33:19 UTC
BUG MIGRATED FROM SOURCEFORGE
http://sourceforge.net/tracker/index.php?func=detail&aid=957818&group_id=34373&atid=411192

Originally submitted by Roger Persson (rogper) 2004-05-21 07:00

By coincidence I noticed that the greater-than (>) character in URLs is rendered wrongly IF it occurs as the last character in the URL.

Example:
Check this extra semicolon http://sample.link/<hello> in the end
Check this http://sample.link/<hello> strange thing

Result:
http://sample.link/<hello>;
http://sample.link/<hello>

------------------------- Additional comments ------------------------

Date: 2004-05-28 09:35
Sender: SF user vibber

The HTML output is: http:// sample.link/<hello>;

It looks like the HTML stripping is being done before external links, so the < and > have become "&lt;" and "&gt;". Semicolons are actually legal in links; the _final_ punctuation (not followed by linkable chars) is stripped, but the bits in the middle are considered fair game for belonging to a link, so it extends up to the "&gt" but not including the final ";" (or the other ";" that follows, which is extraneous).

Correct behavior would be to have the link cover "http://sample.link/", then cut off at the <. This will require parsing for external links before stripping HTML; perhaps another placeholder step would be useful here (might also help the longstanding URL-within-URL bug).

Bug is present in both 1.2 and current 1.3.
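The pass ordering described in the comment above can be reproduced with a small Python sketch. This is not the MediaWiki parser: the URL pattern, the trailing-punctuation set, and the function name linkify_after_escaping are all assumptions made for illustration. It only shows how escaping HTML before detecting external links lets the entity text leak into the link.

```python
import html
import re

# Hypothetical URL pattern: scheme followed by a run of non-space characters.
# '&' and ';' are legal in URLs, so escaped entities are swallowed as well.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

# Hypothetical set of trailing punctuation peeled off the end of a match.
TRAILING = ",;.:!?"


def linkify_after_escaping(text: str) -> str:
    """Escape HTML first, then detect external links (the buggy order)."""
    escaped = html.escape(text, quote=False)

    def replace(match: re.Match) -> str:
        url = match.group(0)
        # Only the final punctuation is removed; "&lt;" in the middle stays.
        stripped = url.rstrip(TRAILING)
        tail = url[len(stripped):]
        return f'<a href="{stripped}">{stripped}</a>{tail}'

    return URL_RE.sub(replace, escaped)


print(linkify_after_escaping("Check this http://sample.link/<hello> strange thing"))
# Check this <a href="http://sample.link/&lt;hello&gt">http://sample.link/&lt;hello&gt</a>; strange thing
```

The link ends at "&gt" and the orphaned ";" is left outside it, which matches the stray semicolon in the report.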
*** Bug 308 has been marked as a duplicate of this bug. ***
Still present; added a test case to parserTests.
According to RFC 2396, '<' and '>' are disallowed within URIs, and hence I added them to the list of prohibited characters.
Wil, right. The problem is that the conversion of < and > to &lt; and &gt; has already been done when we do the external link parsing, and & and ; _are_ allowed in URLs.
(In reply to comment #4)
> Wil, right. The problem is that the conversion of < and > to &lt; and &gt; has already been done when we do the
> external link parsing, and & and ; _are_ allowed in URLs.

Oh, I see. This should now be fixed in HEAD (Parser.php revision 1.323). Rather than replacing external links before stripping HTML tags as you suggested before, I just added a check for '<' and '>' within external links. It's not an especially elegant solution, but I think it will fix this without meddling with the order of parser passes.
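A check of that kind might look roughly like the Python sketch below. It is not the code from Parser.php revision 1.323; it assumes the check looks for the escaped forms "&lt;" and "&gt;" inside an already-matched link and ends the link there. The pattern, the punctuation set, and the name linkify_with_entity_check are hypothetical.

```python
import html
import re

# Hypothetical URL pattern and trailing-punctuation set, for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"]+")
TRAILING = ",;.:!?"

# Escaped forms of '<' and '>' that should terminate an external link.
ENTITY_RE = re.compile(r"&(?:lt|gt);")


def linkify_with_entity_check(text: str) -> str:
    """Keep the pass order (escape, then link) but cut links at &lt; / &gt;."""
    escaped = html.escape(text, quote=False)

    def replace(match: re.Match) -> str:
        candidate = match.group(0)
        entity = ENTITY_RE.search(candidate)
        if entity:
            # Everything from the first escaped bracket onward is plain text.
            url = candidate[:entity.start()]
            rest = candidate[entity.start():]
        else:
            url, rest = candidate, ""
        stripped = url.rstrip(TRAILING)
        tail = url[len(stripped):] + rest
        return f'<a href="{stripped}">{stripped}</a>{tail}'

    return URL_RE.sub(replace, escaped)


print(linkify_with_entity_check("Check this http://sample.link/<hello> strange thing"))
# Check this <a href="http://sample.link/">http://sample.link/</a>&lt;hello&gt; strange thing
```

A literal "&" in the input is escaped to "&amp;" first, so a query string such as "?a=1&lt=2" is not mistaken for an escaped bracket in this sketch.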
Added more test cases.
(In reply to comment #6)
> Added more test cases.

Fixed one by adding '<' and '>' back to the list of disallowed chars (I added them earlier, but then I got nervous and undid the change). The two cases that still fail are due to the way disallowed characters are treated as part of the link description; if that's a bug, it's separate from this one, IMHO.
Issue fixed in HEAD and 1.5; all parserTests in HEAD pass.