Last modified: 2008-03-02 13:58:48 UTC

Bug 13218 - API pretty printer should not include double quotes in hyperlinks
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: API
Version: 1.12.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Roan Kattouw
URL: http://en.wikipedia.org/w/api.php?act...
Reported: 2008-03-02 00:10 UTC by Tyler Romeo
Modified: 2008-03-02 13:58 UTC (History)

Description Tyler Romeo 2008-03-02 00:10:31 UTC
When accessing the api.php file with the action parameter set to "sitematrix", a list of sites comes up, as expected. However, the URL for each site listed appears like this:

http://en.wikipedia.org/"

when it should appear like this:

http://en.wikipedia.org/

In other words, the quotation mark is included in the URL. When the link is followed, an error page comes up saying that you might have wanted "http://en.wikipedia.org/wiki/%22", and it redirects you there. The fix is very simple: exclude the quotation mark.
Comment 1 Tyler Romeo 2008-03-02 00:33:52 UTC
When my previous comment was recorded, the site formatted the URL correctly by removing the quotation mark from the address on line 5. However, just to reinforce my point, the API includes the quotation mark.
Comment 2 OverlordQ 2008-03-02 00:49:33 UTC
I changed it to the extension SiteMatrix since that's separate from the API itself, but digging through the code of both, I can't decide whether the error is in SiteMatrix or in formatHTML in ApiFormatBase.php.

Since I don't have access to an input file, I can't really help past that.
Comment 3 OverlordQ 2008-03-02 01:28:46 UTC
Should be able to replace lines 190 and 191 with something like:

// '#' is used as the pattern delimiter so the '/' characters need no escaping
$text = preg_replace('#(http|https|ftp|gopher)://[\w\-\.]+/#', '<a href="$0">$0</a>', $text);

since a whitelist is better than trying to list every possible invalid character.
But then again, I don't know how it'll handle the other charsets.
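
A minimal sketch of the whitelist idea above, written in Python rather than PHP so it can be run standalone. The helper name `linkify_whitelist` and the sample string are hypothetical, and this is not the code in ApiFormatBase.php; it only illustrates how a whitelist of allowed URL characters behaves on already-escaped text.

```python
import re

# Whitelist: a known scheme, "://", then only word characters, hyphens
# and dots, then a single "/". Everything else ends the match.
URL_RE = re.compile(r'(?:http|https|ftp|gopher)://[\w\-.]+/')


def linkify_whitelist(text: str) -> str:
    """Wrap each whitelisted URL in an anchor tag (hypothetical helper)."""
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text
    )


# Sample input as it might look after HTML escaping (assumed, not real API output).
sample = 'url=&quot;http://en.wikipedia.org/&quot;'
print(linkify_whitelist(sample))
```

Because `&` is not in the whitelist, the match stops before `&quot;`. The trade-off is that this pattern only links the scheme and host: anything after the first `/`, such as a path or query string, is left outside the link.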
Comment 4 Robert Leverington 2008-03-02 09:50:25 UTC
Current behaviour is as expected. Perhaps your client is including the quotation mark in an automatically generated hyperlink, but the URL is surrounded by quotation marks once, as expected. When using a client program to access the API in XML mode, you are expected to use a proper XML parser, in which case this would not be a problem. Resolving as WONTFIX.
Comment 5 Daniel Friesen 2008-03-02 10:10:04 UTC
Actually, not quite.
The issue isn't with the client, it's with the API's pretty print format.

As you'll see, this is the source output:
<a href="http://advisory.wikimedia.org/&quot;">http://advisory.wikimedia.org/&quot;</a>

The issue is that the pretty print format is including the quote inside its pretty printing when it shouldn't, because 90% of the formats always wrap the URL inside quotes.

The ending characters for the various formats appear to be:
JSON/RAW: "
XML: " and <
PHP: "
WDDX: " and <
YAML: (whitespace)
So the regex should terminate links at ["<\s] when pretty printing, so that the generated links are valid.
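
The terminator rule above can be sketched as follows, in Python rather than PHP so it runs standalone. The scheme list, the helper name `linkify` and the sample strings are assumptions made for illustration. The sketch works on raw, unescaped text purely to show where each format's URL ends; it does not produce safely escaped HTML.

```python
import re

# A URL runs from the scheme up to (but not including) the first
# double quote, "<" or whitespace character.
URL_RE = re.compile(r'(?:http|https|ftp)://[^"<\s]+')


def linkify(raw: str) -> str:
    """Wrap each URL in an anchor tag (hypothetical helper)."""
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), raw
    )


# One assumed sample per terminator listed above.
samples = {
    "json": '"url": "http://en.wikipedia.org/"',
    "xml": '<url>http://en.wikipedia.org/</url>',
    "yaml": 'url: http://en.wikipedia.org/\n',
}

for name, text in samples.items():
    print(name, linkify(text))
```

In each sample the link ends at the format's own delimiter (`"`, `<` or the newline), so the delimiter stays outside the `href`.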

But in general, I have a feeling that this kind of thing is mainly the fault of doing this with a poor method.
The best thing to do would be to go the way of <source/> and find GeSHi files for our output formats, and use them to pretty print the API when not using the actual data formats. After all, we shouldn't be treating pretty printing as if every data format used the same type of output format.

Though, it looks like the reason this is happening is that the linking in the pretty printing was primarily meant to give the help page, which shows up by default, actual clickable links. The SiteMatrix is just the only thing in the API that uses an http:// prefix, and as a result it's being linked.
Comment 6 Robert Leverington 2008-03-02 10:16:38 UTC
Sorry, my bad. This is the API's fault, changing resolution.
Comment 7 Roan Kattouw 2008-03-02 12:16:33 UTC
Changing description and component to something more accurate, assigning to self.
Comment 8 Roan Kattouw 2008-03-02 13:58:48 UTC
Fixed in r31452. The regex did include " as a terminating character, but unfortunately htmlspecialchars() has already replaced all "s with &quot; by then.
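
A minimal sketch of the ordering problem described above, in Python rather than PHP. Here `html.escape` stands in for htmlspecialchars(), and both patterns and the sample input are assumptions for illustration; the entity-aware pattern is one possible remedy and is not necessarily what r31452 does.

```python
import html
import re

# Terminates at a literal double quote, "<" or whitespace. After escaping,
# no literal quote is left in the text, so this never stops at the quote.
NAIVE_RE = re.compile(r'(?:http|https|ftp)://[^"<\s]+')

# Additionally refuses to consume the escaped forms &quot; and &lt;.
ENTITY_AWARE_RE = re.compile(
    r'(?:http|https|ftp)://(?:(?!&quot;|&lt;)[^"<\s])+'
)


def linkify(escaped: str, pattern: re.Pattern) -> str:
    """Wrap each match of `pattern` in an anchor tag (hypothetical helper)."""
    return pattern.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), escaped
    )


raw = '<site url="http://advisory.wikimedia.org/" />'
escaped = html.escape(raw)  # every " becomes &quot; before linking happens

buggy = linkify(escaped, NAIVE_RE)
fixed = linkify(escaped, ENTITY_AWARE_RE)
print(buggy)
print(fixed)
```

With the naive pattern, the trailing `&quot;` ends up inside the `href`, which reproduces the source output quoted in comment 5. With the entity-aware pattern, the link stops just before `&quot;`.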
