Last modified: 2013-09-19 02:56:53 UTC
Similar to {{urlencode: }}, I'd like a parserfunction for stripping wikimarkup and HTML from text. For instance:
    The quick brown fox --> The quick brown fox
    The [[quick]] [[brown]] [[fox]] --> The quick brown fox
    CO<sub>2</sub> --> CO2
My specific application is for generating machine-readable COinS tags from citation templates. For instance, if someone cites the book:
    title = [[Aristotle for Everybody]]: Difficult Thought Made Easy
    edition = 6<sup>th</sup> edition
which we have an article for, it shows up in the citation template with a link, which is great. But in the machine-readable citation information, it needs to become plain text:
    Aristotle for Everybody: Difficult Thought Made Easy
    6th edition
This would also be useful for templates where parameters need to be linked in one place but not in another, are linked by the template itself, but people often link their parameters by accident, etc.
It might be useful for automated linking to section anchors with markup, too?
    == Test with <sub>sub</sub> and <sup>sup</sup> ==
has the anchor #Test_with_sub_and_sup for instance. I'm sure there are many other template-related functions that would be helped by this, too.
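The requested behaviour can be illustrated with a minimal sketch. This is not MediaWiki code: `strip_markup` is a hypothetical helper, and the regex-only approach is an assumption that covers just the cases in the examples above (plain and piped links, HTML-style tags) plus bold/italic quote runs. It does not expand templates or handle nested markup.

```python
import re

# Hypothetical helper, not part of MediaWiki: a regex-only approximation.
# [[target]] or [[target|label]] -> keep the label (or the target if unpiped).
_WIKI_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]")
# Anything that looks like an opening or closing HTML tag.
_HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")
# Runs of two or more apostrophes (wikitext bold/italic markers).
_QUOTE_RUN = re.compile(r"'{2,}")


def strip_markup(text: str) -> str:
    """Return text with simple wiki links, HTML tags and quote runs removed."""
    text = _WIKI_LINK.sub(r"\1", text)
    text = _HTML_TAG.sub("", text)
    text = _QUOTE_RUN.sub("", text)
    return text


if __name__ == "__main__":
    print(strip_markup("The [[quick]] [[brown]] [[fox]]"))
    print(strip_markup("CO<sub>2</sub>"))
    print(strip_markup("[[Aristotle for Everybody]]: Difficult Thought Made Easy"))
```

A real implementation would presumably run inside the parser so that templates and parameters are expanded first; the sketch only shows the input/output contract being asked for.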
I'd be concerned about the time this might require on large chunks of text.
(In reply to comment #1)
> I'd be concerned about the time this might require on large chunks of text.
If that's a limitation, could it just be limited to short strings? Does the urlencode function have the same problem?
URL-encoding is less work.
Does a similar function already exist for section anchors?
Yes, but there'd still be the potential for some moron to shove a load of wikitext into the parser function and increase the amount of processing time. I could just be being paranoid, of course; Tim Starling's probably the best person to consult about this...
(In reply to comment #5)
> Yes, but there'd still be the potential for some moron to shove a load of
> wikitext into the parser function and increase the amount of processing time.
Yeah. The applications I'm imagining are only short snippets of text, though, so limiting it to 100 characters or so per instance would be fine. But then do you have to worry about many instances on one page?
> I could just be being paranoid, of course; Tim Starling's probably the best
> person to consult about this...
Yes, I was mentioning the urlencode and anchor name functions so that their processing time and server impact could be compared.
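The two limits discussed here (a cap on input length per call, and the worry about many calls on one page) could be combined roughly as follows. This is only a sketch under stated assumptions: `StripBudget` is a hypothetical name, the 100-character default comes from the figure floated above, and the per-page call limit of 50 is an invented placeholder. Nothing here reflects how MediaWiki actually limits parser functions.

```python
from typing import Callable


class StripBudget:
    """Caps both the size of each input and the number of calls per page.

    Hypothetical illustration; the default limits are placeholders.
    """

    def __init__(self, max_chars_per_call: int = 100, max_calls_per_page: int = 50):
        self.max_chars_per_call = max_chars_per_call
        self.max_calls_per_page = max_calls_per_page
        self.calls_used = 0

    def apply(self, text: str, strip: Callable[[str], str]) -> str:
        # Oversized input: return it untouched rather than spend time on it.
        if len(text) > self.max_chars_per_call:
            return text
        # Page budget exhausted: also fall back to the raw text.
        if self.calls_used >= self.max_calls_per_page:
            return text
        self.calls_used += 1
        return strip(text)
```

One budget object would be created per page render and shared by every call, so that many small calls cannot add up to unbounded work.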
Image alt text may be a better comparison, i.e., [[Image:wiki.png|some text]] will parse "some text" for the caption, then strip the tags for the alt text. I don't think you can strip wiki markup directly, so it would seem a bit wasteful to parse the text just to discard the results, but I don't think it would be much slower than normal parsing.
(In reply to comment #7)
> Image alt text may be a better comparison. i.e [[Image:wiki.png|some text]]
> will parse "some text" for the caption, then strip the tags for the alt text.
Oh. You mean like:
    [[Image:Ant.jpg|thumb|Here is an [[ant]] with {{carbon}}{{oxygen|2}} and 3.63×10<sup>24</sup> things]]
will have alt text of:
    Here is an ant with CO2 and 3.63×1024 things
I hadn't thought of that. So, in actuality, we already have a function that does *exactly* what I'm looking for?
We've had it for years, it's in use on a very large number of articles, multiple times each, and any moron can come along and put inordinate amounts of complex wikicode into it (http://en.wikipedia.org/wiki/User:Omegatron/Sandbox) and no one's ever complained about it causing server load problems? :-)
How easy would it be to make this into a user-accessible ParserFunction?
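The "parse first, then strip the tags" approach described for alt text can be sketched with Python's standard `html.parser` module. The assumption here is that the wikitext has already been rendered to HTML by some other step; `text_from_html` is a hypothetical helper name, and this is not how MediaWiki itself produces alt text.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def text_from_html(html: str) -> str:
    """Return the text content of an already-rendered HTML fragment."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    rendered = (
        'Here is an <a href="/wiki/Ant">ant</a> with CO<sub>2</sub> '
        "and 3.63×10<sup>24</sup> things"
    )
    print(text_from_html(rendered))
```

Note that the superscript is flattened along with everything else, which is why the alt text in the example reads "3.63×1024".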
(In reply to comment #8)
> (In reply to comment #7)
> > Image alt text may be a better comparison. i.e [[Image:wiki.png|some text]]
> > will parse "some text" for the caption, then strip the tags for the alt text.
>
> Oh. You mean like:
>
> [[Image:Ant.jpg|thumb|Here is an [[ant]] with {{carbon}}{{oxygen|2}} and
> 3.63×10<sup>24</sup> things]]
>
> will have alt text of:
>
> Here is an ant with CO2 and 3.63×1024 things
>
> I hadn't thought of that. So, in actuality, we already have a function that
> does *exactly* what I'm looking for?
>
> We've had it for years, it's in use on a very large number of articles, multiple
> times each, and any moron can come along and put inordinate amounts of complex
> wikicode into it (http://en.wikipedia.org/wiki/User:Omegatron/Sandbox) and no
> one's ever complained about it causing server load problems?
>
> :-)
Yeah, that's my thought.
> How easy would it be to make this into a user-accessible ParserFunction?
Shouldn't be too hard. I don't think a parserfunction, though, since it's harder to pass arbitrary text to them, and it would return text anyway. Something like
    <stripmarkup>Here is an [[ant]] with {{carbon}}{{oxygen|2}} and 3.63×10<sup>24</sup> things</stripmarkup>
would seem reasonable.
(In reply to comment #9)
> Shouldn't be too hard. I don't think a parserfunction, though, since it's
> harder to pass arbitrary text to them, and it would return text anyway.
I'm not sure what you mean by this, but a stripmarkup tag (or something shorter to type) would make me just as happy. Just as long as I can do things like <strip>{{{parameter}}}</strip> inside a template.
Created attachment 2831: strip markup extension
I think that's a bit simpler for adding random text, since you don't have to worry about something like a stray | terminating the argument. Here's a quick extension I just put together.
(In reply to comment #11)
> I think that's a bit simpler to add random text, since you don't have to worry
> about something like a stray | terminating the argument.
Very good point. I agree that the pseudo-html tags are better.
Changed summary from "ParserFunction for stripping HTML and wiki markup" to "Syntax for stripping HTML and wiki markup" to reflect attachment 2831.
Not to clutter up this bug, but are there plans for testing this/implementing it on en?
Note that due to bug 2257, I believe this patch would not presently work for template parameters, the intended use. Please correct me if I'm wrong.
(In reply to comment #15)
> Note that due to bug 2257, I believe this patch would not presently work for
> template parameters, the intended use. Please correct me if I'm wrong.
Most of the examples are like <strip>{{thing}}</strip>, which would work fine; but I see there is one example like <strip>{{{thing}}}</strip>, which wouldn't work with the XML tag, but should be doable with a parser function.
(In reply to comment #16)
> Most of the examples are like <strip>{{thing}}</strip>, which would work fine;
> but I see there is one example like <strip>{{{thing}}}</strip>, which wouldn't
> work with the XML tag, but should be doable with a parser function.
All of the things I want to use this for are inside templates, like the <strip>{{{thing}}}</strip> style.
*necromancy* I contribute to a third-party wiki, and we use tooltips to enhance the user experience. The problem is that tooltips are an attribute, so all wiki markup has to be processed and all resulting HTML markup stripped. This wouldn't be a problem if we weren't using complex templates and Extension:VariablesExtension. Here is an example page: https://wiki.secondlife.com/wiki/PRIM_TEXTURE It's annoying to have to supply and handle alternate text. I'd be more than willing to limit the execution time of this function if it could reduce the complexity of our code.