Last modified: 2006-05-01 20:23:38 UTC
BUG MIGRATED FROM SOURCEFORGE
http://sourceforge.net/tracker/index.php?func=detail&aid=583234&group_id=34373&atid=411192
Originally submitted by Nobody/Anonymous - nobody 2002-07-18 08:48

When a URL contains another full and unescaped URL within its query string, it is correctly parsed as a single big URL when placed directly into the text. However, if put in brackets as [URL] or [URL description], the URL-in-a-bracket parsing breaks. The brackets, URL, and description appear as plain text, and the sub-URL gets reparsed as a standalone hyperlink.

Example:

http://www.unausa.org/newindex.asp?place=http://www.unausa.org/programs/mun.asp
appears correctly as a big link to the correct, full URL.

[http://www.unausa.org/newindex.asp?place=http://www.unausa.org/programs/mun.asp]
should display as "[1]" being a link, but instead appears with brackets and full URL intact as text, but the portion "http://www.unausa.org/programs/mun.asp" is a linked URL.

Workaround: Replacing the : with %3A fixes the parsing problem.

(Of course, in this particular case only the shorter URL is actually needed, as it will be dynamically redirected to the longer URL.)

------------------------- Additional comments ------------------------
Date: 2002-07-19 21:06
Sender: SF user lcrocker

I'm lowering the priority on this since there's an easy workaround, and would require messing with some pretty stable and pretty important code, but it would be nice, so I'll leave it open.

-------------------------------------------------
Date: 2002-07-30 20:44
Sender: SF user vibber

I'm raising the priority because I've come across a case the workaround doesn't work for. (See http://www.wikipedia.com/wiki/Wikipedia%3AVillage_pump )

If the main URL is http and the sub-URL is *ftp*, the %3A fix doesn't work: all ftp URLs are parsed /after/ all http URLs, and somehow the %3A gets transformed back into a : in the 'title' field of the link... this triggers the ftp URL-checker, so:

[http://promo.net/cgi-promo/pg/t9.cgi?entry=120&full=yes&ftpsite=ftp%3A//ibiblio.org/pub/docs/books/gutenberg/ Gutenberg text]

is parsed into the horrific:

<a href='http://promo.net/cgi-promo/pg/t9.cgi?entry=120&full=yes&ftpsite=ftp%3A//ibiblio.org/pub/docs/books/gutenberg/' class='external' title="http://promo.net/cgi-promo/pg/t9.cgi?entry=120&amp;full=yes&amp;ftpsite=<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/ class='external' title="ftp://ibiblio.org/pub/docs/books/gutenberg/"> ftp://ibiblio.org/pub/docs/books/gutenberg/</a>">Gutenberg text</a>

Simple partial fix would be to *not* unescape URL-encoded bytes when producing the 'title' attribute for the link, so it remains %3A and doesn't trigger the link converter. Alternatively, find a way to not check for URLs inside HTML tags.

-------------------------------------------------
Date: 2003-01-23 00:02
Sender: SF user nichtich

URLs inside all kind of links should be treated as text. For instance: [[Sandbox|http://de.wikipedia.org]] produces a link to http://de.wikipedia.org and not to [[Sandbox]] !

-------------------------------------------------
Date: 2003-03-17 07:44
Sender: nobody
Logged In: NO

Wiki has other problems dealing with URLs that have certain characters in it like '*' or if the URL contains part of another URL. Examples:

http://example.com/*/foo/bar
[http://example.com/redir/http://www.prwatch.org some link]

As the original bug report indicated, URL escaping can be used as a workaround:

http://example.com/%2A/foo/bar
[http://example.com/redir/%68ttp://www.prwatch.org some link]

--Sheldon Rampton (sheldon.rampton@verizon.net)

-------------------------------------------------
Date: 2004-08-07 20:30
Sender: SF user timstarling

All of these problems are now fixed except nichtich's. I also added URL-encoding, it seemed to me to be more user-friendly to allow users to paste URLs in directly.

Brion's example:

[http://promo.net/cgi-promo/pg/t9.cgi? entry=120&full=yes& ftpsite=ftp%3A//ibiblio.org/pub/docs/books/gutenberg/ Gutenberg text]

This needs to become:

[http://promo.net/cgi-promo/pg/t9.cgi? entry=120&full=yes&ftpsite=ftp% 3A//ibiblio.org/pub/docs/books/gutenberg/ Gutenberg text]

This is not backwards-compatible so may require automated conversion.
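One of the SourceForge comments suggests finding "a way to not check for URLs inside HTML tags". The following is a minimal Python sketch of that idea, not MediaWiki's actual parser code: it splits already-generated markup into tags and text, and only linkifies text that sits outside tags and outside an existing anchor. The names `TAG_RE`, `URL_RE`, and `linkify_outside_tags` are hypothetical, and the sketch assumes attribute values never contain a raw `>`.

```python
import re
from html import escape

# Assumption: tags are well formed and contain no raw '>' in attribute values.
TAG_RE = re.compile(r'(<[^>]*>)')
# Simplified URL pattern for illustration only (http, https, ftp).
URL_RE = re.compile(r'\b(?:https?|ftp)://[^\s<>"]+')


def linkify_outside_tags(markup: str) -> str:
    """Turn bare URLs into links, skipping tag attributes and anchor text."""
    out = []
    in_anchor = False
    # re.split with one capturing group yields text at even indexes
    # and the captured tags at odd indexes.
    for index, piece in enumerate(TAG_RE.split(markup)):
        if index % 2 == 1:
            # A tag: copy it through untouched, so URLs inside href/title
            # attributes are never re-parsed.
            if re.match(r'<a\b', piece, re.IGNORECASE):
                in_anchor = True
            elif re.match(r'</a\s*>', piece, re.IGNORECASE):
                in_anchor = False
            out.append(piece)
        elif in_anchor:
            # Text inside an existing link: leave as-is.
            out.append(piece)
        else:
            out.append(URL_RE.sub(
                lambda m: '<a href="%s">%s</a>' % (
                    escape(m.group(0)), escape(m.group(0))),
                piece))
    return ''.join(out)
```

With this approach, a later ftp pass would leave the `title` attribute of an earlier http link alone, because the attribute lives inside a tag that is copied through unchanged.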
Had this happen to me when trying to add a link from the Internet Archive. I replaced the "h" in the second "http" with %68, which works for me. References: http://en.wikipedia.org/wiki/Smoot http://web.archive.org/web/19970806205154/%68ttp://web.mit.edu/museum/fun/smoots.html
Just for reference, I discovered an even simpler workaround for that particular problem earlier: you can leave out the second "http://" altogether from the archive.org URL, and it will still function correctly. Just in case anyone was looking for the easiest workaround.
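For anyone applying the escaping workarounds by hand, here is a small Python sketch of the two variants mentioned in this bug: replacing the nested URL's ":" with %3A, and replacing the first letter of the nested scheme with its percent-encoding (for example "h" becomes %68). The function names are hypothetical, and whether the target server accepts the escaped form is an assumption you would need to check per site.

```python
import re


def escape_nested_colon(url: str) -> str:
    """Replace '://' with '%3A//' everywhere after the first scheme."""
    head, sep, tail = url.partition('://')
    if not sep:
        return url  # no scheme at all, nothing to do
    return head + sep + tail.replace('://', '%3A//')


def escape_nested_scheme_letter(url: str) -> str:
    """Percent-encode the first letter of any nested http/https/ftp scheme."""
    head, sep, tail = url.partition('://')
    if not sep:
        return url

    def encode_first(match: re.Match) -> str:
        scheme = match.group(1)
        # '%%%02X' renders e.g. 'h' (0x68) as '%68'.
        return '%%%02X' % ord(scheme[0]) + scheme[1:]

    return head + sep + re.sub(r'(https?|ftp)(?=://)', encode_first, tail)
```

The first variant matches the workaround in the original report; the second matches the %68 form used for the archive.org link.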
*** Bug 1031 has been marked as a duplicate of this bug. ***
This seems to be fixed in both 1.4 and HEAD. Can someone else please verify this, in case I'm misunderstanding the problem?
It's marked as fixed-in-cvs (see keywords at the top). It's not yet closed since the site is still running 1.3.x
Gah, I'll get the hang of the weird way bugzilla is used here someday. I assumed if it was fixed it would be marked as FIXED, and then CLOSED once the fix is released. So very confusing.
If it were marked as "fixed", it wouldn't show up in searches. So we couldn't yell at people who open duplicates.
That's because the default query in the currently running b.w.o version of bugzilla is poor. See https://bugzilla.mozilla.org/show_bug.cgi?id=194116 for more discussion than you ever wanted on this issue.
*** Bug 1111 has been marked as a duplicate of this bug. ***
*** Bug 1129 has been marked as a duplicate of this bug. ***
*** Bug 1301 has been marked as a duplicate of this bug. ***
Changed summary to include the problem as of 1.4beta4, where:
URLURL splits link and displayed text
[URLURL] ok
[URLURL URLURL] ok
Removing fixed-in-cvs keyword, as URLURL form is still kinda broken.
Added a parser test case for the failing case.
This bug still exists, see for example http://nl.wikipedia.org/w/index.php?title=Afbeelding:Rob_van_de_Meeberg.jpg&diff=0&oldid=1876874 . Jelle Zijlstra/Ucucha
This patch works:

Index: ../includes/Parser.php
===================================================================
RCS file: /user/jiangxin/project/wiki/mediawiki/src/mediawiki/includes/Parser.php,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- ../includes/Parser.php	5 Nov 2005 09:06:37 -0000	1.2
+++ ../includes/Parser.php	5 Nov 2005 16:42:35 -0000	1.3
@@ -1127,6 +1127,17 @@
 		while ( $i < count( $bits ) ){
 			$protocol = $bits[$i++];
 			$remainder = $bits[$i++];
+			/* Fix BUG 361: URL within URL. (by johnson@worldhello.net) */
+			while ( !preg_match('/[\s]+$/', $remainder) ) {
+				if( $i < count( $bits) )
+				{
+					$remainder .= $bits[$i++];
+				}
+				else
+				{
+					break;
+				}
+			}
 			if ( preg_match( '/^('.EXT_LINK_URL_CLASS.'+)(.*)$/s', $remainder, $m ) ) {
 				# Found some characters after the protocol that look promising
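To make the idea behind the patch easier to follow, here is a rough Python sketch of the same re-joining logic. It is an illustration only, not a port of Parser.php: it assumes the text has been split on a protocol pattern with the protocols kept as separate pieces, and the names `PROTOCOL_RE` and `free_links` are hypothetical.

```python
import re

# Assumption: a simplified protocol list standing in for the parser's own.
PROTOCOL_RE = re.compile(r'(\b(?:https?://|ftp://))')


def free_links(text: str) -> list:
    """Return free URLs found in text, keeping nested URLs in one piece."""
    # Splitting with a capturing group gives:
    # [leading text, protocol, remainder, protocol, remainder, ...]
    bits = PROTOCOL_RE.split(text)
    links = []
    i = 1  # bits[0] is the text before the first protocol
    while i < len(bits):
        protocol = bits[i]
        remainder = bits[i + 1]
        i += 2
        # As in the patch: while the remainder does not end in whitespace,
        # the next protocol was glued onto this URL, so re-join the pieces.
        while not re.search(r'\s+$', remainder) and i < len(bits):
            remainder += bits[i]
            i += 1
        # The URL itself runs up to the first whitespace character.
        match = re.match(r'\S+', remainder)
        if match:
            links.append(protocol + match.group(0))
    return links
```

One consequence of testing only the end of the remainder, at least in this sketch: a second URL preceded by non-whitespace punctuation, such as `http://a.example/ (http://b.example/)`, is merged into the first remainder and is not reported as a separate link.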
I've come across something that might be related (not sure if this belongs here or as a new bug): some "illegal" URLs can break not just the link parsing, but the rendering of the whole page. Check this revision for example: http://en.wikipedia.org/w/index.php?title=Image:S%2BS.jpg&oldid=49244647 I've been able to reproduce it by copying the messed-up URL to other pages and previewing them. Sometimes the page breaks and sometimes everything parses ok, which is odd. I guess the order in which certain things appear on the page affects it somehow.
Comment 17 can be summarized as:

http://www/?http://www/ really
==header==

which renders as the incorrect:

<p><a href="http://www/?http://www/ really </p><p><h2>header</h2>" class='external free' title="http://www/?http://www/ really </p><p><h2>header</h2>" rel="nofollow">http://www/?http://www/ really </p><p><h2>header</h2></a> really </p>
All occurrences above are fixed in current trunk. The case in comment 17 is fixed by r14008. Closing bug.