Last modified: 2005-07-23 04:38:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2787, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 787 - external links being rendered when they only have one slash


Summary:	external links being rendered when they only have one slash

Status:	CLOSED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser
Version:	unspecified
Hardware:	All All

Importance:	Low minor with 1 vote
Target Milestone:	---
Assigned To:	Antoine "hashar" Musso (WMF)

URL:	http://meta.wikimedia.org/wiki/Sandbo...
Whiteboard:
Keywords:

Depends on:	431
Blocks:	2784

Reported:	2004-10-26 16:30 UTC by Rowan Collins [IMSoP]
Modified:	2005-07-23 04:38 UTC (History)
CC List:	1 user (show)



Description Rowan Collins [IMSoP] 2004-10-26 16:30:33 UTC

Ext. links with forms such as "http:/foo" and "ftp:/foo" (rather than
"http://foo" and "ftp://foo") are currently treated as valid by Parser.php, but
then not styled by the CSS (i.e. they show up light blue in monobook, but
receive no icon). The relevant RFCs appear to define the double-slash as
mandatory (e.g. ftp://ftp.rfc-editor.org/in-notes/rfc1738.txt section 3.1), so
the parser should probably simply pass over these as invalid syntax. 

Of course, things like "mailto:" will *never* have a double-slash, so it needs
to be a scheme-specific check. Perhaps it should be treated as part of the
scheme, so the list (configurable, per bug 431) would be something like
$wgValidURISchemes=array('http://', 'ftp://', 'mailto:') etc.
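As a rough sketch of the scheme-specific check suggested above: if each list entry carries its own separator, a plain prefix comparison is enough to reject "http:/foo" while still accepting "mailto:". The variable and function names here are hypothetical and are not taken from Parser.php.

```php
<?php
// Hypothetical scheme list: each entry includes its required separator.
$validUriSchemes = array('http://', 'ftp://', 'mailto:');

// Returns true if $url starts with one of the listed prefixes
// (compared case-insensitively).
function hasValidScheme($url, array $schemes) {
    foreach ($schemes as $prefix) {
        if (strncasecmp($url, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}
```

With this list, hasValidScheme('http:/foo', $validUriSchemes) is false because the single slash does not complete the 'http://' prefix.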

Comment 1 Antoine "hashar" Musso (WMF) 2004-11-13 18:35:47 UTC

The parser handles the "http:/foo" case correctly and even builds a correct link.
This way the reader will at least be able to follow the link.

The fact that the icon is missing is a good way to alert the reader that something
is wrong and needs some minor tweaking.

Leave it as it is.

Comment 2 Rowan Collins [IMSoP] 2004-11-14 19:24:11 UTC

I'm sorry, but I disagree. I'm happy to have this marked as a low-severity,
low-priority bug, but a bug it is. The current behaviour does *not* handle such
links "correctly", by any definition of "correctness" we choose.

* If we want to follow standards, "http:/example.com" should *not* be treated as
a valid URL, and therefore should *not* generate a clickable link; if there's no
link at all, somebody will soon spot that something needs fixing.

* If we want to ignore the standards, and go for an easy life, we can continue
to create clickable links for such invalid URLs, and hope that browsers
generally treat them sanely. In that case, however, we need to fix the CSS to
use the same rule as the parser, so we don't get these anomalously un-labelled
links. Saying that this gives a clue that something needs fixing is, frankly,
rather silly; either the links are valid, and should be displayed properly, or
they are invalid, and should not be rendered at all.

It's not like it would even be a complex fix, especially if someone were playing
with that code anyway, for bug 431 (like I say, just make the protocols array
contain "http://", "mailto:", etc.).

Comment 3 Antoine "hashar" Musso (WMF) 2004-11-14 20:06:11 UTC

So I would prefer that the parser stop rendering URLs with only one / .
That will be even better to trigger the error ;o)

Comment 4 Rowan Collins [IMSoP] 2004-11-14 20:36:19 UTC

(In reply to comment #3)
> So I would prefer that the parser stop rendering URLs with only one / .
> That will be even better to trigger the error ;o)

I'm not sure if you have misunderstood me, or if I am now misunderstanding you,
but just to clarify: not rendering URLs with only one '/' would in fact be
correct behaviour (in accordance with the definitions of valid URLs in the
relevant RFCs). It would also make the error easier to spot than having the PHP
treat it as OK, but the CSS treat it as 'something unknown', which is perhaps
what you meant. As I say, either way, our code should treat such links
*consistently*.

Comment 5 Ævar Arnfjörð Bjarmason 2005-05-13 15:41:35 UTC

I added a parsertest for this in REL1_4 and HEAD.

Comment 6 Antoine "hashar" Musso (WMF) 2005-07-04 00:40:34 UTC

The parser engine has a problem because http:foobar is currently considered
valid, probably to handle the mailto:foo@b.ar case. We need to rewrite
some pieces so that the protocols are defined with the // included.

Eg:
define( 'URL_PROTOCOLS', 'http|https|ftp|irc|gopher|news|mailto' );
define( 'HTTP_PROTOCOLS', 'http|https' );

Should become something like:
define( 'URL_PROTOCOLS',
'http:\/\/|https:\/\/|ftp:\/\/|irc:\/\/|gopher:\/\/|news:\/\/|mailto:' );
define( 'HTTP_PROTOCOLS', 'http:\/\/|https:\/\/' );
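To illustrate the effect of the proposed definition, here is a minimal sketch that plugs the constant into a slash-delimited regex. The helper function and its pattern are assumptions made for illustration; this is not the actual parser code.

```php
<?php
// Proposed form from the comment above: the separator is part of each entry,
// with slashes escaped because '/' is the regex delimiter used below.
define('URL_PROTOCOLS',
    'http:\/\/|https:\/\/|ftp:\/\/|irc:\/\/|gopher:\/\/|news:\/\/|mailto:');

// Hypothetical helper: true if $text begins with a listed protocol prefix
// followed by at least one non-whitespace character.
function looksLikeExternalLink($text) {
    return preg_match('/^(?:' . URL_PROTOCOLS . ')\S+/i', $text) === 1;
}
```

Under this definition "http:/foo" and "http:foobar" no longer match, while "mailto:foo@b.ar" still does.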

Comment 7 Rowan Collins [IMSoP] 2005-07-04 02:28:43 UTC

> define( 'URL_PROTOCOLS',
> 'http:\/\/|https:\/\/|ftp:\/\/|irc:\/\/|gopher:\/\/|news:\/\/|mailto:' );

Or, as suggested in bug 431, make the whole thing configurable, as
$wgUrlProtocols or some such. BTW, do you really need to escape all those
slashes? What would be the side-effect of having the following, much more
pleasant definition?

$wgUrlProtocols='http://|https://|ftp://|irc://|gopher://|news:|mailto:';

[oh, and note that 'news:' doesn't have a '//']

Comment 8 Rowan Collins [IMSoP] 2005-07-05 19:53:13 UTC

(In reply to comment #7)
> BTW, do you really need to escape all those
> slashes? 

Sorry, I was being dim there - '/' would mark the end of the regex; however, PHP
uses the same regex system as Perl, so you can actually choose your own regex
delimiters. Thus my regex fragment would work, as long as any regex built from
it used, say, "%regex%" instead of "/regex/".
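A small sketch of the alternative-delimiter idea, assuming the list is kept as one string with unescaped slashes. The surrounding pattern is hypothetical, and the approach only holds as long as no list entry contains '%'.

```php
<?php
// List as written in comment 7, with plain unescaped slashes.
$wgUrlProtocols = 'http://|https://|ftp://|irc://|gopher://|news:|mailto:';

// '%' is the delimiter here, so '/' has no special meaning in the pattern.
// Hypothetical pattern for illustration only.
$pattern = '%^(?:' . $wgUrlProtocols . ')\S+%i';

// Returns true if $text starts with one of the listed protocol prefixes.
function matchesProtocolList($text, $pattern) {
    return preg_match($pattern, $text) === 1;
}
```

If entries could contain arbitrary characters, passing each one through preg_quote() with the chosen delimiter before joining them would be the safer route.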

Comment 9 Antoine "hashar" Musso (WMF) 2005-07-08 05:54:16 UTC

> Sorry, I was being dim there - '/' would mark the end of the regex;
That's why I keep escaping slashes. We could use another delimiter though.

http://twenkill.dyndns.org/wiki/787
http://test.leuksman.com/index.php/787

I fixed the bug. Malformed URLs are no longer converted.

Comment 10 Antoine "hashar" Musso (WMF) 2005-07-08 06:12:12 UTC

Fixed the fix that stopped autonumbering :o)

Comment 11 Niklas Laxström 2005-07-09 20:30:38 UTC

There are some broken links floating around that need to be fixed now.
