Last modified: 2010-05-15 15:33:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T2289, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 289 - ">"-token in URL-tail parsed wrongly


Summary:	">"-token in URL-tail parsed wrongly

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	1.4.x
Hardware:	All All

Importance:	Normal normal with 1 vote (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	parser

Duplicates:	308 (view as bug list)
Depends on:
Blocks:

Reported:	2004-09-03 03:07 UTC by Timwi
Modified:	2010-05-15 15:33 UTC (History)
CC List:	1 user (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Timwi 2004-09-03 03:07:52 UTC

BUG MIGRATED FROM SOURCEFORGE
http://sourceforge.net/tracker/index.php?func=detail&aid=957818&group_id=34373&atid=411192
Originally submitted by Roger Persson (rogper)  2004-05-21 07:00


By coincidence I noticed that the greater-than (>) character in a URL is
rendered wrongly if it occurs as the last character of the URL.

Example:
Check this extra semicolon http://sample.link/<hello> in the end
Check this  http://sample.link/<hello&gt strange thing

Result:
http://sample.link/<hello>;
http://sample.link/<hello>

------------------------- Additional comments ------------------------
Date: 2004-05-28 09:35
Sender: SF user vibber

The HTML output is:
http://sample.link/&lt;hello&gt;;

It looks like the HTML stripping is being done before external links,
so the < and > have become "&lt;" and "&gt;". Semicolons are actually
legal in links; the _final_ punctuation (not followed by linkable
chars) is stripped, but the bits in the middle are considered fair
game for belonging to a link, so it extends up to the "&gt" but not
including the final ";" (or the other ";" that follows, which is
extraneous).

Correct behavior would be to have the link cover "http://sample.link/",
then cut off at the <. This will require parsing for external links
before stripping HTML; perhaps another placeholder step would be useful
here (might also help the longstanding URL-within-URL bug).

Bug is present in both 1.2 and current 1.3.
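
The ordering problem described above can be reproduced outside MediaWiki. The following is a minimal Python sketch, not the actual Parser.php code: the URL pattern, the set of trailing punctuation, and both function names are assumptions made for illustration. It shows that matching links after HTML escaping pulls "&lt;hello&gt" into the link, while matching on the raw text stops at the "<".

```python
import html
import re

# Assumed, simplified URL pattern: '<', '>' and whitespace end a link,
# but '&' and ';' are allowed, as they are in real URLs.
URL_RE = re.compile(r'https?://[^\s<>"]+')

# Assumed set of punctuation stripped only from the very end of a link.
TRAILING_PUNCT = ",;.:!?"


def links_after_escaping(wikitext):
    """Escape HTML first, then look for links (the buggy order)."""
    escaped = html.escape(wikitext, quote=False)
    return [m.group(0).rstrip(TRAILING_PUNCT) for m in URL_RE.finditer(escaped)]


def links_before_escaping(wikitext):
    """Look for links on the raw text, where '<' still ends the URL."""
    return [m.group(0).rstrip(TRAILING_PUNCT) for m in URL_RE.finditer(wikitext)]


sample = "Check http://sample.link/<hello> in the end"
print(links_after_escaping(sample))   # ['http://sample.link/&lt;hello&gt']
print(links_before_escaping(sample))  # ['http://sample.link/']
```

In the first case only the final ";" is stripped as trailing punctuation, so the link ends at "&gt" and the stray ";" is left outside it, matching the output reported above.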

Comment 1 Timwi 2004-09-03 19:41:14 UTC

*** Bug 308 has been marked as a duplicate of this bug. ***

Comment 2 Brion Vibber 2004-10-10 13:08:14 UTC

Still present; added a test case to parserTests.

Comment 3 Wil Mahan 2004-10-11 00:30:52 UTC

According to RFC 2396, '<' and '>' are disallowed within URIs, and hence I added 
them to the list of prohibited characters.

Comment 4 Brion Vibber 2004-10-11 00:32:05 UTC

Wil, right. The problem is that the conversion of < and > to &lt; and &gt;
has already been done when we do the external link parsing, and & and ;
_are_ allowed in URLs.

Comment 5 Wil Mahan 2004-10-11 17:05:35 UTC

(In reply to comment #4)
> Wil, right. The problem is that the conversion of < and > to &lt; and &gt;
> has already been done when we do the external link parsing, and & and ;
> _are_ allowed in URLs.

Oh, I see. This should now be fixed in HEAD (Parser.php revision 1.323).
Rather than replacing external links before stripping HTML tags as
you suggested before, I just added a check for '&lt;' and '&gt;'
within external links. It's not an especially elegant solution, but
I think it will fix this without meddling with the order of
parser passes.
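
The approach described in this comment, checking for '&lt;' and '&gt;' inside an already-matched external link, could look roughly like the Python sketch below. It is an illustration under assumptions, not the change made in Parser.php revision 1.323: the function name and the regular expression are hypothetical.

```python
import re

# Matches the escaped forms of '<' and '>' that HTML stripping has
# already produced by the time external links are parsed.
ESCAPED_BRACKET_RE = re.compile(r"&(?:lt|gt);")


def trim_at_escaped_bracket(url):
    """Cut a matched URL at the first '&lt;' or '&gt;' it contains.

    Other uses of '&' and ';' are left alone, since both characters
    are legal in URLs.
    """
    hit = ESCAPED_BRACKET_RE.search(url)
    return url if hit is None else url[:hit.start()]


print(trim_at_escaped_bracket("http://sample.link/&lt;hello&gt;"))
# http://sample.link/
print(trim_at_escaped_bracket("http://example.org/a?b=1&c=2;d"))
# http://example.org/a?b=1&c=2;d
```

This keeps the order of parser passes unchanged, at the cost of special-casing two entities inside the link handling.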

Comment 6 Brion Vibber 2004-10-11 18:13:49 UTC

Added more test cases.

Comment 7 Wil Mahan 2004-10-11 19:01:30 UTC

(In reply to comment #6)
> Added more test cases.

Fixed one by adding '<' and '>' back to the list of disallowed chars
(I added them earlier, but then I got nervous and undid the change.)

The two cases that still fail are due to the way disallowed
characters are treated as part of the link description; if that's
a bug, it's separate from this one, IMHO.

Comment 8 Antoine "hashar" Musso (WMF) 2005-08-24 11:52:26 UTC

Issue fixed in HEAD and 1.5, all parsertests in HEAD passed successfully.
