Last modified: 2010-07-13 17:50:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T6598, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 4598 - Order of title recognition and bold / italic rendering should be reversed
Order of title recognition and bold / italic rendering should be reversed
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Parser (Other open bugs)
unspecified
All All
: High minor (vote)
: ---
Assigned To: Nobody - You can work on this!
http://de.wikipedia.org/wiki/Benutzer...
: patch, patch-need-review
: 3749 5454 6783 11707 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-01-13 19:41 UTC by lɛʁi לערי ריינהארט
Modified: 2010-07-13 17:50 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
patch swapping quote and links renderers (627 bytes, patch)
2006-05-01 20:37 UTC, Antoine "hashar" Musso (WMF)
Details
intrusive fix using new Placeholders class (18.45 KB, patch)
2006-08-17 08:15 UTC, Wil Mahan
Details
intrusive fix using new Placeholders class (18.45 KB, patch)
2006-08-17 08:17 UTC, Wil Mahan
Details
Updated fix that corrects undefined variable notices (18.97 KB, patch)
2006-08-18 08:17 UTC, Wil Mahan
Details

Description lɛʁi לערי ריינהארט 2006-01-13 19:41:30 UTC
Hallo!

This bug is about titles as
[[de:G''t]], [[foo'''bar]], [[usage of ''''']] etc.

test cases are available at
http://de.wikipedia.org/wiki/user:Gangleri/tests/G%27%27t
http://de.wikipedia.org/wiki/user:Gangleri/tests/foo%27%27%27bar
http://de.wikipedia.org/wiki/user:Gangleri/tests/usage_of_%27%27%27%27%27

I assume that bold / italic rendering should be done *after* title recognition;
that title recognition should *not* be influenced (should not depend on) by bold
/ italic rendering wiki syntax.

Please create also corresponding parser tests.

best regards reinhardt [[user:gangleri]]
Comment 1 lɛʁi לערי ריינהארט 2006-01-13 20:04:16 UTC
Bug 4601: Exact search for titles using php meta / escape characters
is about searching these titles.

MediaZilla allows wiki links to the testcases:
[[de:user:Gangleri/tests/G''t]]
[[de:user:Gangleri/tests/foo'''bar]]
[[de:user:Gangleri/tests/usage of ''''']]
Comment 2 Antoine "hashar" Musso (WMF) 2006-01-13 20:13:11 UTC
would be great to get a parsertest and backport the fix in 1.5
Comment 3 Antoine "hashar" Musso (WMF) 2006-05-01 20:37:18 UTC
Created attachment 1638 [details]
patch swapping quote and links renderers

A simple patch that swap the renderers. The parser tests looks ok.
Comment 4 Antoine "hashar" Musso (WMF) 2006-05-01 20:38:31 UTC
*** Bug 5454 has been marked as a duplicate of this bug. ***
Comment 5 Antoine "hashar" Musso (WMF) 2006-05-02 16:25:15 UTC
commited in branches/hashar@14022
Comment 6 Brion Vibber 2006-05-28 00:06:10 UTC
Parser test added on trunk in r14425.
Patch fails, breaking existing valid markup.
Comment 7 Brion Vibber 2006-07-22 22:34:30 UTC
*** Bug 6783 has been marked as a duplicate of this bug. ***
Comment 8 Wil Mahan 2006-08-17 08:15:47 UTC
Created attachment 2239 [details]
intrusive fix using new Placeholders class

To summarize the problem: we currently parse quotes before links, so something
like [[foo ''bar'']] becomes [[foo <i>bar</i>]] after parsing quotes, which is
wrong.

But if we parsed links before quotes, things like ''[[foo bar|formatting
''here'']]'' would break, because parsing links removes the whole link and
replaces it by a placeholder; we get ''<!--LINK 0-->'' with no chance to parse
all the quotes together.

One idea for fixing this is to use more flexible link placeholders that can
contain text. So ''[[foo bar|formatting ''here'']]'' becomes ''<placeholder
i="0">formatting ''here''</placeholder>'', and we can then parse the quotes.
Text is only put inside the placeholders for piped links.

This patch creates a new class called Placeholders implementing what I
described above. Parser.php currently implements several different placeholder
mechanisms, so the new class is intended to be a step toward organizing them.
With the patch, internal links are parsed before quotes.

The patch fixes the test case for this bug without breaking any of the other
parser tests. It appears to be a little slower than without the patch on pages
with lots of links, but I haven't attempted to optimize it.
Comment 9 Wil Mahan 2006-08-17 08:17:25 UTC
Created attachment 2240 [details]
intrusive fix using new Placeholders class

To summarize the problem: we currently parse quotes before links, so something
like [[foo ''bar'']] becomes [[foo <i>bar</i>]] after parsing quotes, which is
wrong.

But if we parsed links before quotes, things like ''[[foo bar|formatting
''here'']]'' would break, because parsing links removes the whole link and
replaces it by a placeholder; we get ''<!--LINK 0-->'' with no chance to parse
all the quotes together.

One idea for fixing this is to use more flexible link placeholders that can
contain text. So ''[[foo bar|formatting ''here'']]'' becomes ''<placeholder
i="0">formatting ''here''</placeholder>'', and we can then parse the quotes.
Text is only put inside the placeholders for piped links.

This patch creates a new class called Placeholders implementing what I
described above. Parser.php currently implements several different placeholder
mechanisms, so the new class is intended to be a step toward organizing them.
With the patch, internal links are parsed before quotes.

The patch fixes the test case for this bug without breaking any of the other
parser tests. It appears to be a little slower than without the patch on pages
with lots of links, but I haven't attempted to optimize it.
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-08-17 14:14:45 UTC
(In reply to comment #0)
> I assume that bold / italic rendering should be done *after* title recognition;
> that title recognition should *not* be influenced (should not depend on) by bold
> / italic rendering wiki syntax.

I'm not sure about that.  In particular, if one word in a title would be
italicized in normal prose, and another wouldn't, then something like
[[Criticism of ''Harry Potter'']] would likely be intended to be interpreted as
[[Criticism of Harry Potter|Criticism of ''Harry Potter'']], not [[Criticism of
''Harry Potter''|Criticism of ''Harry Potter'']].  Do we want to assume the
latter?  Why?  A good patch should allow either to be typed out in the full
[[link|display]] form regardless, so this should be a relatively minor point.

(I know Neapolitan wants to link to '' a lot, but those problems extend beyond
links.)

(In reply to comment #9)
> One idea for fixing this is to use more flexible link placeholders that can
> contain text. So ''[[foo bar|formatting ''here'']]'' becomes ''<placeholder
> i="0">formatting ''here''</placeholder>'', and we can then parse the quotes.
> Text is only put inside the placeholders for piped links.

Clever solution.  If we want to parse the display text, don't hide it!  I'm
surprised no bugs crop up when you shift around the parsing order like that,
though; the patch will need to be tested on a wide range of pages.
Comment 11 Wil Mahan 2006-08-17 19:05:22 UTC
(Sorry for the double post and poor formatting above.)

(In reply to comment #10)
> [[Criticism of ''Harry Potter'']] would likely be intended to be interpreted as
> [[Criticism of Harry Potter|Criticism of ''Harry Potter'']], not [[Criticism of
> ''Harry Potter''|Criticism of ''Harry Potter'']].  Do we want to assume the
> latter?  Why?  A good patch should allow either to be typed out in the full
> [[link|display]] form regardless, so this should be a relatively minor point.

With my approach it's possible to have either interpretation, but after
thinking about it I chose the latter because 1) it matched the existing parser
test case and 2) I think the former wouldn't allow straightforward linking
to an article that happened to have four quotes in the title, without putting
unwanted formatting in the link. I think it's logical and in line with
existing practice to insist that if formatting is desired within a link, it must
be a piped link. I don't feel strongly about this, though.

> surprised no bugs crop up when you shift around the parsing order like that,
> though; the patch will need to be tested on a wide range of pages.

Indeed. As I mentioned, the patch passes the existing tests. The switch in style
from <!--LINK 0--> to <placeholder i="0"></placeholder> was to resolve one issue
that came up with definition lists. We have to be careful to ignore colons within
links in cases like

;[[foo|bar: baz]]: definition

Fortunately, there was already a mechanism to skip colons within matching pairs
of tags.
Comment 12 Wil Mahan 2006-08-18 08:17:01 UTC
Created attachment 2245 [details]
Updated fix that corrects undefined variable notices
Comment 13 Chad H. 2009-07-13 19:40:55 UTC
*** Bug 3749 has been marked as a duplicate of this bug. ***
Comment 14 Alexandre Emsenhuber [IAlex] 2009-07-29 20:49:16 UTC
*** Bug 11707 has been marked as a duplicate of this bug. ***
Comment 15 Platonides 2010-07-13 17:50:27 UTC
Fixed by r68491

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links