Last modified: 2013-09-19 02:56:53 UTC

Bug 8161 - Syntax for stripping HTML and wiki markup
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Extensions requests
Version: unspecified
Hardware: All All
Importance: Low enhancement with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://www.mediawiki.org/wiki/Extensi...
Depends on:
Blocks:
Reported: 2006-12-05 16:35 UTC by Omegatron
Modified: 2013-09-19 02:56 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
strip markup extension (1.34 KB, text/plain)
2006-12-07 16:58 UTC, Steve Sanbeg

Description Omegatron 2006-12-05 16:35:07 UTC
Similar to {{urlencode: }}, I'd like a parserfunction for stripping wikimarkup
and HTML from text.  For instance:

The quick brown fox             --> The quick brown fox
The [[quick]] [[brown]] [[fox]] --> The quick brown fox

CO<sub>2</sub>                  --> CO2

My specific application is for generating machine-readable COinS tags from
citation templates.  For instance, if someone cites the book:

title = [[Aristotle for Everybody]]: Difficult Thought Made Easy
edition = 6<sup>th</sup> edition

which we have an article for, it shows up in the citation template with a link,
which is great.  But in the machine-readable citation information, it needs to
become plain text:

Aristotle for Everybody: Difficult Thought Made Easy
6th edition
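A rough sketch of the requested behaviour, written in Python purely for illustration (a real feature would live in MediaWiki's PHP parser). `strip_markup` is a hypothetical name, and the regexes are an assumption that only covers plain `[[...]]` links, HTML tags and character entities; templates such as `{{carbon}}` are not expanded, which is why a real implementation would need the parser itself.

```python
import html
import re

# Hypothetical helper, not MediaWiki code: a rough regex approximation that
# handles only plain links, HTML tags and character entities.
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")  # [[target]] or [[target|label]]
TAG = re.compile(r"</?[A-Za-z][^>]*>")                 # <sub>, </sup>, <br />, ...


def strip_markup(text: str) -> str:
    text = LINK.sub(r"\1", text)  # keep only the visible text of each link
    text = TAG.sub("", text)      # drop tags but keep the text between them
    return html.unescape(text)    # turn entities such as &times; into characters
```

With this sketch, `strip_markup("[[Aristotle for Everybody]]: Difficult Thought Made Easy")` gives the plain title from the example above, and `strip_markup("6<sup>th</sup> edition")` gives `6th edition`.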

This would also be useful for templates where a parameter needs to be linked in
one place but not in another, or is already linked by the template itself while
people often link the parameter by accident anyway, etc.  It might be useful for
automated linking to section anchors with markup, too?

== Test with <sub>sub</sub> and <sup>sup</sup> ==

has the anchor

#Test_with_sub_and_sup

for instance.
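The anchor in this example can be approximated the same way: strip the tags, then replace spaces with underscores. The sketch below uses a hypothetical `heading_anchor` function (Python, for illustration only) and assumes that is all that is needed; MediaWiki's actual anchor encoding also escapes other special characters, which this does not attempt.

```python
import re

TAG = re.compile(r"</?[A-Za-z][^>]*>")  # any opening or closing HTML tag


def heading_anchor(heading: str) -> str:
    """Hypothetical approximation: strip tags, trim, turn spaces into underscores.

    Does not reproduce MediaWiki's escaping of other special characters.
    """
    plain = TAG.sub("", heading).strip()
    return "#" + re.sub(r" +", "_", plain)
```

Under these assumptions, `heading_anchor("Test with <sub>sub</sub> and <sup>sup</sup>")` returns `#Test_with_sub_and_sup`.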

I'm sure there are many other template-related functions that would be helped by
this, too.
Comment 1 Rob Church 2006-12-05 17:35:54 UTC
I'd be concerned about the time this might require on large chunks of text.
Comment 2 Omegatron 2006-12-05 17:44:15 UTC
(In reply to comment #1)
> I'd be concerned about the time this might require on large chunks of text.

If that's a limitation, could it just be limited to short strings?  Does the
urlencode function have the same problem?
Comment 3 Rob Church 2006-12-05 17:44:51 UTC
URL-encoding is less work.
Comment 4 Omegatron 2006-12-05 17:56:31 UTC
Does a similar function already exist for section anchors?
Comment 5 Rob Church 2006-12-05 18:41:10 UTC
Yes, but there'd still be the potential for some moron to shove a load of
wikitext into the parser function and increase the amount of processing time.

I could just be being paranoid, of course; Tim Starling's probably the best
person to consult about this...
Comment 6 Omegatron 2006-12-05 19:29:36 UTC
(In reply to comment #5)
> Yes, but there'd still be the potential for some moron to shove a load of
> wikitext into the parser function and increase the amount of processing time.

Yeah.  The applications I'm imagining are only short snippets of text, though,
so limiting it to 100 characters or so per instance would be fine.

But then do you have to worry about many instances?

> I could just be being paranoid, of course; Tim Starling's probably the best
> person to consult about this...

Yes, I was mentioning the urlencode and anchor name functions so that their
processing time and server impact could be compared.
Comment 7 Steve Sanbeg 2006-12-05 19:41:17 UTC
Image alt text may be a better comparison, i.e., [[Image:wiki.png|some text]]
will parse "some text" for the caption, then strip the tags for the alt text.

I don't think you can directly strip wiki markup, so it would seem a bit
wasteful to parse that just to discard the results, but I don't think it would
be that much slower than normal parsing.
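The alt-text approach described here has two steps: parse the wikitext, then strip the tags from the result. A minimal Python sketch of the second step only, using the standard library's `html.parser`. The input string is an assumed rendering of the caption (the link and `{{carbon}}{{oxygen|2}}` are taken to have expanded to an `<a>` element and `CO<sub>2</sub>`), not actual MediaWiki output, and `html_to_alt_text` is an illustrative name.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data only; every tag is discarded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &times; and similar entities
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_alt_text(rendered_html: str) -> str:
    parser = _TextOnly()
    parser.feed(rendered_html)
    parser.close()  # flush any text still buffered at the end of the input
    return "".join(parser.parts)


# Hypothetical parsed output for the caption in the example
rendered = ('Here is an <a href="/wiki/Ant" title="Ant">ant</a> with '
            'CO<sub>2</sub> and 3.63&times;10<sup>24</sup> things')
print(html_to_alt_text(rendered))  # Here is an ant with CO2 and 3.63×1024 things
```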
Comment 8 Omegatron 2006-12-07 06:30:56 UTC
(In reply to comment #7)
> Image alt text may be a better comparison, i.e., [[Image:wiki.png|some text]]
> will parse "some text" for the caption, then strip the tags for the alt text.

Oh.  You mean like:

[[Image:Ant.jpg|thumb|Here is an [[ant]] with {{carbon}}{{oxygen|2}} and
3.63&times;10<sup>24</sup> things]]

will have alt text of:

Here is an ant with CO2 and 3.63×1024 things

I hadn't thought of that.  So, in actuality, we already have a function that
does *exactly* what I'm looking for?

We've had it for years, it's in use on a very large number of articles, multiple
times each, and any moron can come along and put inordinate amounts of complex
wikicode into it (http://en.wikipedia.org/wiki/User:Omegatron/Sandbox) and no
one's ever complained about it causing server load problems?

:-)

How easy would it be to make this into a user-accessible ParserFunction?
Comment 9 Steve Sanbeg 2006-12-07 16:28:09 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Image alt text may be a better comparison, i.e., [[Image:wiki.png|some text]]
> > will parse "some text" for the caption, then strip the tags for the alt text.
> 
> Oh.  You mean like:
> 
> [[Image:Ant.jpg|thumb|Here is an [[ant]] with {{carbon}}{{oxygen|2}} and
> 3.63&times;10<sup>24</sup> things]]
> 
> will have alt text of:
> 
> Here is an ant with CO2 and 3.63×1024 things
> 
> I hadn't thought of that.  So, in actuality, we already have a function that
> does *exactly* what I'm looking for?
> 
> We've had it for years, it's in use on a very large number of articles, multiple
> times each, and any moron can come along and put inordinate amounts of complex
> wikicode into it (http://en.wikipedia.org/wiki/User:Omegatron/Sandbox) and no
> one's ever complained about it causing server load problems?
> 
> :-)

Yeah, that's my thought.

> 
> How easy would it be to make this into a user-accessible ParserFunction?

Shouldn't be too hard.  I don't think a parserfunction, though, since it's
harder to pass arbitrary text to them, and it would return text anyway. 
Something like

<stripmarkup>Here is an [[ant]] with {{carbon}}{{oxygen|2}} and
3.63&times;10<sup>24</sup> things</stripmarkup>

would seem reasonable.
Comment 10 Omegatron 2006-12-07 16:52:28 UTC
(In reply to comment #9)
> Shouldn't be too hard.  I don't think a parserfunction, though, since it's
> harder to pass arbitrary text to them, and it would return text anyway. 

I'm not sure what you mean by this, but a stripmarkup tag (or something shorter
to type) would make me just as happy.  Just as long as I can do things like
<strip>{{{parameter}}}</strip> inside a template.
Comment 11 Steve Sanbeg 2006-12-07 16:58:08 UTC
Created attachment 2831
strip markup extension

I think that's a bit simpler to add random text, since you don't have to worry
about something like a stray | terminating the argument.

Here's a quick extension I just put together.
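To illustrate the stray-`|` argument, a deliberately naive Python model: a parser-function-style call whose body is split on every `|`, versus a tag whose whole body is taken as-is. `{{#strip:...}}`, `naive_function_argument` and `tag_content` are hypothetical names, and MediaWiki's real preprocessor applies more rules than a plain split.

```python
import re


def naive_function_argument(call: str) -> str:
    """Return the first argument of a '{{#name:arg|...}}' call (naive model)."""
    inner = call[2:-2]                 # drop the surrounding '{{' and '}}'
    _, _, args = inner.partition(":")  # text after the function name
    return args.split("|")[0]          # every '|' starts a new argument


STRIP_TAG = re.compile(r"<strip>(.*?)</strip>", re.DOTALL)


def tag_content(text: str) -> str:
    """Return everything between <strip> and </strip>, pipes included."""
    match = STRIP_TAG.search(text)
    return match.group(1) if match else ""
```

In this model, `naive_function_argument("{{#strip:Salt | pepper}}")` returns only `"Salt "`, while `tag_content("<strip>Salt | pepper</strip>")` returns `"Salt | pepper"`.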
Comment 12 Omegatron 2006-12-07 17:10:19 UTC
(In reply to comment #11)
> I think that's a bit simpler to add random text, since you don't have to worry
> about something like a stray | terminating the argument.

Very good point.  I agree that the pseudo-html tags are better.
Comment 13 Minh Nguyễn 2007-02-01 05:03:49 UTC
Changed summary from "ParserFunction for stripping HTML and wiki markup" to
"Syntax for stripping HTML and wiki markup" to reflect Attachment #2831.
Comment 14 Omegatron 2007-04-23 23:34:02 UTC
Not to clutter up this bug, but are there plans for testing this/implementing it
on en?
Comment 15 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-04-24 00:16:21 UTC
Note that due to bug 2257, I believe this patch would not presently work for
template parameters, the intended use.  Please correct me if I'm wrong.
Comment 16 Steve Sanbeg 2007-04-24 15:12:45 UTC
(In reply to comment #15)
> Note that due to bug 2257, I believe this patch would not presently work for
> template parameters, the intended use.  Please correct me if I'm wrong.

Most of the examples are like <strip>{{thing}}</strip>, which would work fine;
but I see there is one example like <strip>{{{thing}}}</strip>, which wouldn't
work with the XML tag, but should be doable with a parser function.

Comment 17 Omegatron 2007-04-24 15:40:56 UTC
(In reply to comment #16)
> Most of the examples are like <strip>{{thing}}</strip>, which would work fine;
> but I see there is one example like <strip>{{{thing}}}</strip>, which wouldn't
> work with the XML tag, but should be doable with a parser function.

All of the things I want to use this for are inside templates, like the
<strip>{{{thing}}}</strip> style.
Comment 18 BlindWanderer 2009-05-14 12:05:12 UTC
*necromancy*
I contribute to a third-party wiki, and we use tooltips to enhance the user experience. The problem is that tooltips are an HTML attribute, so all wiki markup has to be processed and all resulting HTML markup stripped. This wouldn't be a problem if we weren't using complex templates and Extension:VariablesExtension.

Here is an example page:
https://wiki.secondlife.com/wiki/PRIM_TEXTURE

It's annoying to have to supply and handle alternate text. I'd be more than willing to limit the execution time of this function if it could reduce the complexity of our code.
