Last modified: 2010-07-13 17:50:27 UTC
Hallo! This bug is about titles as [[de:G''t]], [[foo'''bar]], [[usage of ''''']] etc. test cases are available at http://de.wikipedia.org/wiki/user:Gangleri/tests/G%27%27t http://de.wikipedia.org/wiki/user:Gangleri/tests/foo%27%27%27bar http://de.wikipedia.org/wiki/user:Gangleri/tests/usage_of_%27%27%27%27%27 I assume that bold / italic rendering should be done *after* title recognition; that title recognition should *not* be influenced (should not depend on) by bold / italic rendering wiki syntax. Please create also corresponding parser tests. best regards reinhardt [[user:gangleri]]
Bug 4601: Exact search for titles using php meta / escape characters is about searching these titles. MediaZilla allows wiki links to the testcases: [[de:user:Gangleri/tests/G''t]] [[de:user:Gangleri/tests/foo'''bar]] [[de:user:Gangleri/tests/usage of ''''']]
would be great to get a parsertest and backport the fix in 1.5
Created attachment 1638 [details] patch swapping quote and links renderers A simple patch that swap the renderers. The parser tests looks ok.
*** Bug 5454 has been marked as a duplicate of this bug. ***
commited in branches/hashar@14022
Parser test added on trunk in r14425. Patch fails, breaking existing valid markup.
*** Bug 6783 has been marked as a duplicate of this bug. ***
Created attachment 2239 [details] intrusive fix using new Placeholders class To summarize the problem: we currently parse quotes before links, so something like [[foo ''bar'']] becomes [[foo <i>bar</i>]] after parsing quotes, which is wrong. But if we parsed links before quotes, things like ''[[foo bar|formatting ''here'']]'' would break, because parsing links removes the whole link and replaces it by a placeholder; we get ''<!--LINK 0-->'' with no chance to parse all the quotes together. One idea for fixing this is to use more flexible link placeholders that can contain text. So ''[[foo bar|formatting ''here'']]'' becomes ''<placeholder i="0">formatting ''here''</placeholder>'', and we can then parse the quotes. Text is only put inside the placeholders for piped links. This patch creates a new class called Placeholders implementing what I described above. Parser.php currently implements several different placeholder mechanisms, so the new class is intended to be a step toward organizing them. With the patch, internal links are parsed before quotes. The patch fixes the test case for this bug without breaking any of the other parser tests. It appears to be a little slower than without the patch on pages with lots of links, but I haven't attempted to optimize it.
Created attachment 2240 [details] intrusive fix using new Placeholders class To summarize the problem: we currently parse quotes before links, so something like [[foo ''bar'']] becomes [[foo <i>bar</i>]] after parsing quotes, which is wrong. But if we parsed links before quotes, things like ''[[foo bar|formatting ''here'']]'' would break, because parsing links removes the whole link and replaces it by a placeholder; we get ''<!--LINK 0-->'' with no chance to parse all the quotes together. One idea for fixing this is to use more flexible link placeholders that can contain text. So ''[[foo bar|formatting ''here'']]'' becomes ''<placeholder i="0">formatting ''here''</placeholder>'', and we can then parse the quotes. Text is only put inside the placeholders for piped links. This patch creates a new class called Placeholders implementing what I described above. Parser.php currently implements several different placeholder mechanisms, so the new class is intended to be a step toward organizing them. With the patch, internal links are parsed before quotes. The patch fixes the test case for this bug without breaking any of the other parser tests. It appears to be a little slower than without the patch on pages with lots of links, but I haven't attempted to optimize it.
(In reply to comment #0) > I assume that bold / italic rendering should be done *after* title recognition; > that title recognition should *not* be influenced (should not depend on) by bold > / italic rendering wiki syntax. I'm not sure about that. In particular, if one word in a title would be italicized in normal prose, and another wouldn't, then something like [[Criticism of ''Harry Potter'']] would likely be intended to be interpreted as [[Criticism of Harry Potter|Criticism of ''Harry Potter'']], not [[Criticism of ''Harry Potter''|Criticism of ''Harry Potter'']]. Do we want to assume the latter? Why? A good patch should allow either to be typed out in the full [[link|display]] form regardless, so this should be a relatively minor point. (I know Neapolitan wants to link to '' a lot, but those problems extend beyond links.) (In reply to comment #9) > One idea for fixing this is to use more flexible link placeholders that can > contain text. So ''[[foo bar|formatting ''here'']]'' becomes ''<placeholder > i="0">formatting ''here''</placeholder>'', and we can then parse the quotes. > Text is only put inside the placeholders for piped links. Clever solution. If we want to parse the display text, don't hide it! I'm surprised no bugs crop up when you shift around the parsing order like that, though; the patch will need to be tested on a wide range of pages.
(Sorry for the double post and poor formatting above.) (In reply to comment #10) > [[Criticism of ''Harry Potter'']] would likely be intended to be interpreted as > [[Criticism of Harry Potter|Criticism of ''Harry Potter'']], not [[Criticism of > ''Harry Potter''|Criticism of ''Harry Potter'']]. Do we want to assume the > latter? Why? A good patch should allow either to be typed out in the full > [[link|display]] form regardless, so this should be a relatively minor point. With my approach it's possible to have either interpretation, but after thinking about it I chose the latter because 1) it matched the existing parser test case and 2) I think the former wouldn't allow straightforward linking to an article that happened to have four quotes in the title, without putting unwanted formatting in the link. I think it's logical and in line with existing practice to insist that if formatting is desired within a link, it must be a piped link. I don't feel strongly about this, though. > surprised no bugs crop up when you shift around the parsing order like that, > though; the patch will need to be tested on a wide range of pages. Indeed. As I mentioned, the patch passes the existing tests. The switch in style from <!--LINK 0--> to <placeholder i="0"></placeholder> was to resolve one issue that came up with definition lists. We have to be careful to ignore colons within links in cases like ;[[foo|bar: baz]]: definition Fortunately, there was already a mechanism to skip colons within matching pairs of tags.
Created attachment 2245 [details] Updated fix that corrects undefined variable notices
*** Bug 3749 has been marked as a duplicate of this bug. ***
*** Bug 11707 has been marked as a duplicate of this bug. ***
Fixed by r68491