Last modified: 2010-05-15 15:33:46 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2637, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 637 - Image captions containing external links are broken in HEAD
Image captions containing external links are broken in HEAD
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Parser (Other open bugs)
1.4.x
PC All
: Normal normal (vote)
: ---
Assigned To: Wil Mahan
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2004-10-03 03:27 UTC by Wil Mahan
Modified: 2010-05-15 15:33 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Rowan's replaceImageLinks patch (2.92 KB, patch)
2004-10-03 03:29 UTC, Wil Mahan
Details
Wil's patch (3.53 KB, patch)
2004-10-03 03:32 UTC, Wil Mahan
Details
Reworking of Wil's concept, only different (6.93 KB, patch)
2004-10-05 00:46 UTC, Rowan Collins [IMSoP]
Details
trivial fix (952 bytes, patch)
2004-10-05 17:04 UTC, Rowan Collins [IMSoP]
Details

Description Wil Mahan 2004-10-03 03:27:21 UTC
In 1.4pre I made a change that broke image captions containing external
links. In essence, the problem is that when images are parsed before
external links, things like the image "alt" text break. I'm filing this
bug to track the possibilities for fixing it.

(Background: my change was calling replaceInternalLinks() before
replaceExternalLinks(). I did this to fix bug 2, and I think it is
best way to fix that bug. However, I'm willing to reconsider if
necessary. One possibility is to back out my change and WONTFIX bug
2, since it represents context sensitivity, and is tricky to fix,
for a rare case.)

Rowan Collins noticed this problem with image captions and made
a patch to fix it. I also made a patch using a different
approach. Both patches pass all the relevant parser test
cases; I believe they are both functionally correct. To me, the
only issue is which is better from a design standpoint.

Currently, replaceInternalLinks() is called twice; the first
pass replaces any links inside image captions, and the second
pass replaces the images with captions containing links,
themselves. With either patch, replaceInternalLinks() is instead
called only once.
Comment 1 Wil Mahan 2004-10-03 03:29:46 UTC
Created attachment 63 [details]
Rowan's replaceImageLinks patch

Rowan's patch adds a new method, replaceImageLinks(), which
called after replaceExternalLinks(). Thus replaceInternalLinks()
no longer handles images. The bug is avoided because
all images are processed after external links.

Rowan described the patch himself at
http://mail.wikipedia.org/pipermail/wikitech-l/2004-October/025647.html

I think this patch is well done, and shows a good understanding
of the problem. My main criticism is that it represents a new
parser pass, and I think we should move toward fewer passes,
not more. A counteragument is that calling replaceInternalLinks()
twice is already an extra pass, so the patch only replaces
one pass with another.
Comment 2 Wil Mahan 2004-10-03 03:32:57 UTC
Created attachment 64 [details]
Wil's patch

My patch checks every link to see if it is an image,
and if so, recursively calls replace{Internal,External}Links
on the link text. After an image is processed, it
is replaced by a placeholder, so it is not parsed further.
This should be OK, because tables, lists, etc. aren't
allowed in captions anyway.

I prefer this approach because there is only one
replaceInternalLinks() pass. Of course, any comments
or other ideas would be welcome.
Comment 3 Rowan Collins [IMSoP] 2004-10-03 18:19:52 UTC
(In reply to comment #2)
> Created an attachment (id=64) [edit]
> Wil's patch

I can see one major problem with your patch, which is that you use a text check
for the image prefix rather than Title::getNamespace(). This isn't guaranteed to
work, since a namespace can have multiple "aliases". I'll have to have a better
look to see if there are any other issues.
Comment 4 Wil Mahan 2004-10-04 00:50:11 UTC
(In reply to comment #3)
> I can see one major problem with your patch, which is that you use a text check
> for the image prefix rather than Title::getNamespace(). This isn't guaranteed to
> work, since a namespace can have multiple "aliases". I'll have to have a better
> look to see if there are any other issues.

Thanks for looking at the patch. Title::secureAndSplit() uses the same code
for checking for the Image namespace. Could you give an example of such an
"alias"?
Comment 5 Rowan Collins [IMSoP] 2004-10-04 14:33:54 UTC
(In reply to comment #4)
> Thanks for looking at the patch. Title::secureAndSplit() uses the same code
> for checking for the Image namespace. Could you give an example of such an
> "alias"?

I think that $imgpre stuff is actually redundant: the actual check for :Image:
(and also :Category:) comes within replaceInternalLinks() itself; after all, it
doesn't really constitute a different namespace, just a different link behaviour:

<code>
 $noforce = (substr($m[1], 0, 1) != ':');
 if (!$noforce) {
     # Strip off leading ':'
     $link = substr($link, 1);
 }
</code>

As for namespace aliases, I had it on good authority but couldn't find them
either, until I looked at Title::secureAndSplit(), which checks first
Namespace::getCanonicalIndex() and then $wgContLang->getNsIndex(). So a wiki
with a localised image namespace called "Bild:" will still treat "Image:" as
being the same namespace. I think the only sensible way of doing it is creating
a Title object, since somebody might want to change this behaviour further in
the future.

On the other hand, I think there are a few ways of doing what your patch does
slightly more efficiently, so I'll probably post a new one later.
Comment 6 Wil Mahan 2004-10-04 17:43:33 UTC
(In reply to comment #5)

> I think that $imgpre stuff is actually redundant: the actual check for :Image:
> (and also :Category:) comes within replaceInternalLinks() itself; after all, it
> doesn't really constitute a different namespace, just a different link behaviour:

In replaceInternalLinks(), yes, but that is not the only user of Title.
For example, when including a template, that code is necessary.
 
> As for namespace aliases, I had it on good authority but couldn't find them
> either, until I looked at Title::secureAndSplit(), which checks first
> Namespace::getCanonicalIndex() and then $wgContLang->getNsIndex(). So a wiki
> with a localised image namespace called "Bild:" will still treat "Image:" as
> being the same namespace. I think the only sensible way of doing it is creating

"Bild:" should still work with my patch. The assumption is that the localization
of "Image" is unique for a given language; I could be wrong, but I haven't seen
a counterexample.

> a Title object, since somebody might want to change this behaviour further in
> the future.

I would rather avoid creating a Title object, because I don't see a
reason to incur the overhead. However, if it is indeed necessary,
it would not be that difficult to do, and we already create the Title
for all valid links. So I don't see it as a major problem with the
patch.

> On the other hand, I think there are a few ways of doing what your patch does
> slightly more efficiently, so I'll probably post a new one later.

Indeed, I think the replaceInternalLinks() method is slow in general, and
I agree there is room for improvement. I believe correctness should come
before optimizations, though.
Comment 7 Rowan Collins [IMSoP] 2004-10-04 18:02:10 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > I think that $imgpre stuff is actually redundant: the actual check for :Image:
> > (and also :Category:) comes within replaceInternalLinks() itself; after all, it
> > doesn't really constitute a different namespace, just a different link
behaviour:
> 
> In replaceInternalLinks(), yes, but that is not the only user of Title.
> For example, when including a template, that code is necessary.

Why? If it is, then we need to fix that code. The conflict is between ":" as the
name of the MAIN namespace, and ":" as a prefix to suppress the magicality of
links to namespaces such as "Image:" and "Category:". I'm not sure why the
latter case would ever occur anywhere other than links (in the sense that if
it's not a link, it's not going to do anything "magic" anyway), which is why I
assumed that stripping the prefix in replaceInternalLinks() was sufficient to
distiguish. Can you identify any other cases where it would be needed?

> "Bild:" should still work with my patch. The assumption is that the localization
> of "Image" is unique for a given language; I could be wrong, but I haven't seen
> a counterexample.

But "Image:" will not, since you don't take into account the existence of the
"canonical names"; this isn't a big deal, but it creates an anomaly, and
disables a function unrelated to the problem you are trying to solve. The
mapping of namespace_name to namespace is potentially many-to-one; currently, a
localisation can have "Bild:" and "Image:" both mapping to the image namespace.
 
> I would rather avoid creating a Title object, because I don't see a
> reason to incur the overhead. However, if it is indeed necessary,
> it would not be that difficult to do, and we already create the Title
> for all valid links. So I don't see it as a major problem with the
> patch.

No, it's not a major problem; but I *can't* see a "correct" [non-hacky] way of
avoiding it, so it's a need for revision. Hopefully, there won't be too many
cases of something that could be an image link, but actually is just an invalid
one (those are the cases where creating a Title object gives extra overhead).
Comment 8 Rowan Collins [IMSoP] 2004-10-05 00:46:10 UTC
Created attachment 66 [details]
Reworking of Wil's concept, only different

Right, I've finally finished my patch, based on Wil's. Basically, I think this
is the better approach, because only problem links have to go through extra
processing, not all image links; this should make it more efficient.

The differences with Wil's patch are:
* do the special case code after the Title object has been created, so that we
can reliably check the namespace
* use a couple or regexes, rather than counting the number of ]]'s on a line
* try and take every opportunity to give up, to minimise the load of the
special case code

This works for me, but then that was true of the last one. I just hope there's
nothing wrong with this, cos I've been banging my head against it for hours!
Comment 9 Wil Mahan 2004-10-05 03:57:37 UTC
Your patch looks good. I committed it, with a few cosmetic changes.
Thanks!

Could you clarify the category part of your change (not creating
$nnt)? It looks ok at first glance, but presumably it was done that
way for a reason.

Also, I'm not sure stripping tags from the alt text is the best
approach, but I committed it pending a resolution to bug 368,
since it makes the parser tests work.
Comment 10 Rowan Collins [IMSoP] 2004-10-05 12:28:55 UTC
Thanks for the vote of confidence :)

The $nnt thing was just a coincidental fix I spotted: I asked on IRC, and no-one
could come up with any purpose, or indeed any difference that it would make to
anything; Brion blamed it on a "brainfart" on Magnus' part when he first checked
in the category namespace. I couldn't make anything break without it, so I
tidied it away for efficiency's sake.

As for stripping tags, it was already done somewhere else in the code; it nicely
unwraps ext.links, and used to unwrap internal links, too, I think, but now the
placeholders are comments they just disappear altogether. I've filed that as bug
648, since we may still want to generate a title using this method, even if we
eventually get an agreement on bug 368. (Come to think, it will get rid of
things like "<small>" or "<em>", too, which would otherwise get escaped and be ugly)

Meanwhile, I think I found a case where my patch is sub-optimal: in
"[[Image:foo|    [[link]]", the [[link]] should still be parsed, but at the
moment it will get eaten in the (failed) attempt to turn it into an image
caption. Not a huge problem, since that markup is essentially broken anyway, but
probably not expected behaviour.
I haven't tested this (I haven't even tested that it breaks yet!) but I think
all we need is to add the following:
 # we couldn't find the end of this imageLink, so output it raw
+$text = $this->replaceExternalLinks($text);
 $s .= $prefix . '[[' . $link . '|' . $text;
No need to do it on the next }else{ down, because we won't have eaten any "[["s
Comment 11 Rowan Collins [IMSoP] 2004-10-05 17:04:29 UTC
Created attachment 69 [details]
trivial fix

Confirmed: the patch as committed will fail to render all links after a broken
Image caption:
"[[Image:Rowan.jpeg|thumb|This is a broken caption. But [[foo|this]] is just an
ordinary link." is rendered entirely as raw text.

Adding that line (as per this tiny patch) does indeed make that link work.
Annoyingly, it will still break any other captions with links in further down
the page, but I think that's an obscure enough case that we don't need to worry
about it.
Comment 12 Wil Mahan 2004-10-06 19:24:53 UTC
(In reply to comment #11)
> Adding that line (as per this tiny patch) does indeed make that link work.
> Annoyingly, it will still break any other captions with links in further down
> the page, but I think that's an obscure enough case that we don't need to worry
> about it.

Applied, thanks. I agree that broken image captions aren't important enough to
worry about further.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links