Last modified: 2014-09-26 01:06:19 UTC

Bug 6455 - Set $wgPFEnableStringFunctions = true on WMF wikis
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Highest enhancement with 86 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://www.mediawiki.org/wiki/Extensi...
Duplicates: 7654 9023 9979 10231 13895 15658 31136
Depends on: 26092
Blocks: 29087
Reported: 2006-06-26 22:04 UTC by Aryeh Gregor (not reading bugmail, please e-mail directly)
Modified: 2014-09-26 01:06 UTC
CC List: 55 users

Attachments
New version of string functions (26.59 KB, patch)
2009-03-30 11:07 UTC, Robert Rohde
Merge string functionality into ParserFunctions (11.17 KB, patch)
2009-04-06 07:44 UTC, Robert Rohde

Description Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-26 22:04:46 UTC
Uses of string-related functions would doubtless be myriad.  One in particular
that would be handy, from my point of view, is checking whether input to a
hatnote contains "[[" and "]]": if it does, then the wrapping "[[" and "]]"
could be dropped.  This is useful because not requiring [[]] in the parameter
input has become the norm on enwiki, and this makes it impossible to replace an
intended single link with "[[Link1]] or [[Link2]]" due to parsing oddness.  If
StringFunctions were installed, "[[Link1]] or [[Link2]]" could be used as the
parameter, and the template would drop the default brackets.

Okay, so perhaps that's a slightly pathetic reason, but there's no request open
for this yet, so let others add their own reasons.
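Editor's illustration: the hatnote check described in this report can be sketched as follows. This is Python rather than wikitext or the extension's PHP, and the function name format_hatnote_target is hypothetical; it only shows the intended logic, not how any template actually implements it.

```python
def format_hatnote_target(param: str) -> str:
    """Wrap a hatnote parameter in [[...]] unless it already contains links."""
    # If the caller already supplied wiki links, leave the text untouched.
    if "[[" in param and "]]" in param:
        return param
    # Otherwise apply the default brackets around the whole parameter.
    return "[[" + param + "]]"
```

With this logic a plain "Link1" would still get the default brackets, while "[[Link1]] or [[Link2]]" would pass through unchanged.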
Comment 1 Polonium 2006-10-17 23:50:49 UTC
There apparently was a problem with these functions in MediaWiki 1.8.0, but it
has been solved (see
[http://meta.wikimedia.org/wiki/Talk:StringFunctions#Update_for_1.8.0 this]).
The current version is 1.9.0, so they would need to be tested first. Since
these functions would be very useful, they should be added as soon as
possible.
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-10-21 23:25:46 UTC
*** Bug 7654 has been marked as a duplicate of this bug. ***
Comment 3 jquinn 2007-01-10 17:58:40 UTC
These would be very useful for a number of tasks, above all #strlen and
#substr. There has been much concern that functions such as repeat and
regexp-related functions could be above O(n) in input length and thus usable for
DoS attacks. However, these functions are NOT included in StringFunctions; all
the functions there are O(n) and, while they would contribute to server load,
they would not break caching, so the increase would be incremental.

Please please include these. I can think of several places where they would be
immediately useful, not just as toys - above all for any task involving
formatting, as mentioned above.
Comment 4 Polonium 2007-01-11 21:42:45 UTC
I totally agree with the above comment. The most important functions to add are
strlen and substr. Several other functions could be derived from them, and they
would be useful for many applications. Since they are DoS-safe and O(n), I cannot
see any reason not to install them. They should be installed right away (the
code already exists). Beyond this, other functions, like the
[[m:VariablesExtension|variables]] could be added.
Comment 5 Polonium 2007-01-12 20:10:25 UTC
There already is a way to find the length of a string, but it is very limited
and is a hack instead of a proper solution. See
http://en.wikipedia.org/wiki/Template:Strlen
Comment 6 Alon Lischinsky 2007-01-12 21:54:46 UTC
(In reply to comment #5)
> There already is a way to find the length of a string, but it is very limited
> and is a hack instead of a proper solution. See
> http://en.wikipedia.org/wiki/Template:Strlen

But it works only for ASCII-only strings, due to padright: and the like
measuring byte-length rather than apparent number of characters.
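Editor's illustration: the byte-length versus character-count distinction raised here can be seen in a few lines. The sketch uses Python, not the padright: parser function itself, and the sample word is an arbitrary choice.

```python
text = "na\u00efve"  # "naive" spelled with U+00EF, a two-byte character in UTF-8

char_count = len(text)                  # number of Unicode code points
byte_count = len(text.encode("utf-8"))  # number of bytes once encoded as UTF-8

print(char_count, byte_count)
```

A length trick that measures bytes would report one more than the apparent number of characters for this word, which is why it only works reliably for ASCII.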
Comment 7 jquinn 2007-01-14 00:38:44 UTC
See my comments at the end of [[m:Talk:Stringfunctions#Proposals]] for why some
of these are NOT currently safely O(n) and how that could be easily fixed.
Comment 8 Antoine "hashar" Musso (WMF) 2007-01-14 00:40:56 UTC
jquinn > please paste your comment here. The above link does not
work and might be deleted one day.
Comment 9 Polonium 2007-01-14 01:45:42 UTC
(In reply to comment #8)
> jquinn > please paste your comment here. The above link does not
> work and might be deleted one day.

The correct link is to [[Talk:StringFunctions#Proposals]]. Go and view it, it is
too long to be posted directly. Still, strlen and substr are totally O(n) and
are the most important string functions. That is why the consensus is to install
them now and install other functions later.
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-01-14 02:23:28 UTC
The quote is:

"See the page - pad is O(10^n) and replace is O(n^2). Suggest limiting the "from" and "to" values 
of replace and the "delimiter" of explode to 30 characters in length (as with pos - I would 
support anything from 10 to 30 characters, I'd vote against anything outside this range) and the 
length value of pad to 99. I'd also recommend limiting "value" or "string" for everything EXCEPT 
len and sub to some length limit in the range of .5K - 5K, preferably 1K. Even O(n) is a DOS 
attack if it is easy to make n be a 100K page templated from somewhere. And finally, even with 
all these limits, it would be easy to make a "replace" call that returned a 30K-long string - 
there needs to be a further limit targeted to the output of replace (and possibly urlencode?). 
For ease of programming (just check limits then expose the PHP, rather than building your own 
function), this could be that len(to)*len(value)/len(from) can't be over twice the limit on 
len(value) - that is, the "to" can only be twice as long as the "from" unless you're sure that 
your "value" string is a fraction of the length limit. The "len" in question is byte-length, ie 2 
for each standard-codepage unicode char and 1 for each ascii."

Pad is O(10^n)?  o_O
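Editor's illustration: a minimal sketch of the output bound proposed in the quote, assuming the suggested 1K limit on "value" and the 30-character limit on "from" and "to", all measured in bytes. The names replace_allowed, VALUE_LIMIT and ARG_LIMIT are hypothetical, and this is Python, not the extension's PHP.

```python
VALUE_LIMIT = 1024  # assumed limit on len(value), in bytes
ARG_LIMIT = 30      # assumed limit on len(from) and len(to), in bytes


def replace_allowed(value: str, old: str, new: str,
                    value_limit: int = VALUE_LIMIT,
                    arg_limit: int = ARG_LIMIT) -> bool:
    """Check the proposed rule: len(to) * len(value) / len(from) <= 2 * limit."""
    v = len(value.encode("utf-8"))
    f = len(old.encode("utf-8"))
    t = len(new.encode("utf-8"))
    # Reject empty or oversized arguments before looking at the output bound.
    if f == 0 or f > arg_limit or t > arg_limit or v > value_limit:
        return False
    # Multiply through by len(from) to stay in integer arithmetic.
    return t * v <= 2 * value_limit * f
```

The point of the rule is that the check uses only three lengths, so it can run before the replacement itself is attempted.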
Comment 11 Polonium 2007-01-14 17:07:24 UTC
When the StringFunctions are installed, string length should be measured in
characters, not bytes, so that they work with Unicode characters.
Comment 12 jquinn 2007-01-24 09:59:43 UTC
The really correct link is [[m:Talk:StringFunctions#Proposals]]. Dratted
case-sensitivity. 

Yes, pad is O(10^n) because n is the length of the argument - the number of
digits, not the value. But by the same token, you can limit n to just 2 or 3 and
still get 99 or 999 characters of padding.

Note that the issue is the memory & processor usage of the internal PHP call,
because on the level of the visible wiki the template buffer is limited anyway.
Comment 13 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-01-24 15:53:11 UTC
The maximum pad length permitted in current code is 500 characters, so it's O(1) worst-case.
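Editor's illustration: the two points above (an n-digit length argument can request up to 10^n - 1 characters, and a hard cap removes the problem) can be sketched like this. The cap of 500 is the figure quoted in the previous comment; the function names are hypothetical and the code is Python, not the extension's PHP.

```python
MAX_PAD = 500  # cap quoted in the thread; how the real code enforces it is not shown


def max_pad_request(digits: int) -> int:
    """Largest length that can be written with the given number of digits."""
    return 10 ** digits - 1


def pad_right(value: str, length_arg: str, fill: str = "0") -> str:
    """Pad value on the right, never beyond MAX_PAD characters in total."""
    requested = int(length_arg)
    target = min(requested, MAX_PAD)
    # str.ljust needs a single fill character and never truncates value.
    return value.ljust(target, fill)
```

With the cap in place, the amount of padding work no longer depends on how many digits the caller types.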
Comment 14 Juraj Simlovic 2007-01-24 16:21:37 UTC
I shall limit the length of input parameters of the features in question ASAP.
Then the possibility of a DoS attack through them will be removed.
Comment 15 Jared 2007-02-03 21:48:27 UTC
This set of Functions is very useful, and so I advocate for the approval of this
proposed change.
Comment 16 Juraj Simlovic 2007-02-04 19:06:14 UTC
All parser functions in question have finally been limited in the latest code.
Also, full support for UTF-8 was added (and other small fixes/improvements).
Comment 17 Bogumił "A_Bach" Cieniek 2007-02-04 21:24:37 UTC
Those functions will be very helpful in Wiktionary.
Comment 18 hillgentleman 2007-02-05 17:31:50 UTC
#sub would be very useful for poetry annotations. (Cf. [[w:zh:WP:VPT]]).
Comment 19 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-05 17:43:02 UTC
We're aware that it will be useful.  It needs to be reviewed by Tim or someone.  Please do not 
comment on this bug just to say you support it; there's a "vote for this bug" link near the bottom 
of the page for that that *doesn't* spam everyone who's voted for the bug plus half the developers 
with your support.

A couple of comments: 1) urlencode and urldecode are now available as core parser functions, so they 
don't need to be part of this.  2) It should fail gracefully if mbstring is not installed.  Either 
it should define workalikes for mbstring functions it uses, or fall back to non-UTF-8 versions, or 
just die if mbstring isn't installed.
Comment 20 百楽兎 2007-02-06 03:58:26 UTC
I agree with this request. I actually plan to make a template for Chinese Wikipedia and Wikisource, and this 
function will be useful.
Comment 21 Jared 2007-02-06 15:09:26 UTC
As I'm unfamiliar with this site, how long will it take for this to be 
approved? Are there steps that need to be taken? The code is already 
written, so it shouldn't take too long.
Comment 22 Juraj Simlovic 2007-02-06 20:34:47 UTC
> It should fail gracefully if mbstring is not installed.

You are right about mbstring. It is a hidden prerequisite of the extension
right now. However, if you do not have mbstring support at hand, all you have to
do is replace all occurrences of "mb_" with an empty string throughout the
source code.

All Wikipedia servers do have mbstring installed, don't they?
Comment 23 Rob Church 2007-02-06 21:17:39 UTC
(In reply to comment #22)
> now. However, if you do not have mbstring support at hand, all you have to do
> is to replace all occurrences of "mb_" with an empty string all over the
> source code..

This is not how we like our code to work; we like our code so that people don't
have to hack at it to make it work. Requiring that mbstring functions be
installed for the extension to work is acceptable.

As far as I know, mbstring is available throughout the Wikimedia cluster.
Comment 24 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-06 22:58:43 UTC
(In reply to comment #21)
> As I'm unfamiliar with this site, how long will it take for this to be 
> approved? Are there steps that need to be taken?

Basically, someone needs to get on IRC and nag Tim Starling (or Brion, but he tends to be more 
overworked).  Or you could e-mail him, I suppose.  It just needs to be brought to his attention so 
he can review it.
Comment 25 Rob Church 2007-02-18 16:10:22 UTC
*** Bug 9023 has been marked as a duplicate of this bug. ***
Comment 26 fdcn 2007-04-13 08:24:24 UTC
waiting
Comment 27 Rob Church 2007-04-13 14:56:00 UTC
(In reply to comment #26)
> waiting

I am aware that users are waiting for a response to this, but it isn't as
straightforward as might be desired; enabling an extension like this requires a
thorough review in terms of suitability for inclusion, and also a full
performance test. Some patience is therefore required; that patience might need
to stretch to several months, but that's the way of the world.

Any further comments on this bug are to be restricted to discussing the
technical issues raised above, and not, "we're waiting".
Comment 28 Juraj Simlovic 2007-04-13 18:39:03 UTC
I quickly searched the above comments, and all technical issues seem to be
resolved and closed by now, don't they? I hope that nothing has been forgotten.
If I am wrong, please bring it to my attention. Otherwise I shall just
continue to wait until it is approved, or until Tim gets back to me with new issues.
Comment 29 Raimond Spekking 2007-05-20 16:07:22 UTC
*** Bug 9979 has been marked as a duplicate of this bug. ***
Comment 30 Arath 2007-05-23 11:56:57 UTC
Another month has passed since comment #26, but the string functions have not been installed yet. What does "several months" mean?
Comment 31 Eno1 2007-05-27 14:55:16 UTC
Can someone please ask whoever needs to be asked to make these functions available on Wikipedia as a matter of priority?

There are more legitimate uses for these than you think. Most important are #sub, #pos and #len.
Comment 32 Duken 2007-05-30 17:28:00 UTC
I'm waiting for #sub and #len too, for Wiktionary Templates. It really would help me, so... I really hope you'll fix this bug. ^^'
Comment 33 Raimond Spekking 2007-06-12 19:36:16 UTC
*** Bug 10231 has been marked as a duplicate of this bug. ***
Comment 34 百楽兎 2007-09-19 06:58:20 UTC
Is this request still alive? Any new information regarding this issue?
Comment 35 Tisza Gergő 2007-11-01 14:25:25 UTC
#rpos would also be useful, especially in [[agglutinative language]]s with [[vowel harmony]] where you can't just put a number or date between two words and get a grammatically correct sentence, but need to select the matching suffix too.
Comment 36 Tim Starling 2007-11-14 04:35:29 UTC
Can someone just add these functions to the ParserFunctions extension please? 
Comment 37 Tim Starling 2007-11-15 03:25:49 UTC
On second thoughts: presumably these functions, especially substr, would have the potential to destroy strip markers, and even to capture the marker prefix and make fraudulent markers. That is a bug.
Comment 38 Juraj Simlovic 2007-11-15 15:22:52 UTC
> the potential to destroy strip markers

And how do we fix that?
Comment 39 Steve Sanbeg 2007-11-15 17:36:15 UTC
(In reply to comment #38)
> > the potential to destroy strip markers
> 
> And how do we fix that?
> 

Probably, a substr operation would be something like: get the input wiki string, parse and convert it to HTML, strip the HTML to plain text, take a substring of the text.

A substring probably isn't meaningful, or at least not simple, on anything other than plain text.
Comment 40 Ross M 2007-11-16 08:54:25 UTC
(In reply to comment #37)
> On second thoughts: presumably these functions, especially substr, would have
> the potential to destroy strip markers, and even to capture the marker prefix
> and make fraudulent markers. That is a bug.
> 

If this is the case, then surely it's not a bug with this extension, but rather with the extension parser, isn't it?  If third-party extensions should never be permitted to affect strip markers, then those markers clearly shouldn't be provided to them.
Comment 41 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-11-16 19:13:01 UTC
Some extensions may wish to fiddle with strip markers, for some reason.  This is not one of them.
Comment 42 Ross M 2007-11-17 02:01:41 UTC
After looking further through the code, I have to ask: why would having the potential to alter strip markers qualify as a bug?  As far as I can tell, creating a fraudulent marker would do nothing apart from putting some unpleasant gibberish in the output, and removing a marker altogether would simply prevent that marker's content from being displayed.  Neither of these outcomes appears to break the code in any meaningful way.  What am I missing here?
Comment 43 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-11-17 23:54:49 UTC
Input: {{#sub:<nowiki>Hello!!!!!!111</nowiki>|10}}

Expected output: Either "<nowiki>He" or "<nowiki>Hello!!!!!111</nowiki>" or "Hello!!!!!" (not sure which, but something like one of those)

Actual output: ef021ae6742-nowiki-00000013-QINU (notice the 0x7 byte at the end, for good measure!)

That looks like a bug.  Shouldn't be difficult at all to fix.
Comment 44 Tim Starling 2007-11-18 08:51:26 UTC
(In reply to comment #41)
> Some extensions may wish to fiddle with strip markers, for some reason.  This
> is not one of them.

#if, for example, "fiddles" with strip markers, by passing through the strip markers in the followed branch and discarding the ones in the non-followed branch. It's not an exotic scenario.

Yes, you could unstrip the string and then apply the substring operation, but that would cause the output to be escaped. So it would destroy the output of tags such as <gallery> -- in fact pretty much all the extension tags other than <nowiki>. And if you unstripped <nowiki>, the content escaped by <nowiki> would be erroneously parsed by the main parser stage, defeating its purpose. 

The solution is to identify strip markers, and then to skip them when you are counting characters. But that might be inefficient. 
Comment 45 Ross M 2007-11-19 01:21:11 UTC
Yes, that's inefficient.  Since PHP's mb_string methods operate in O(n) time, stepping through each string character by character would make these methods O(n^2) rather than O(n).  Once PHP6 is rolled out, we'll be able to use TextIterator to perform this operation efficiently.  But do we really need to wait until then before adding these functions?  Allowing the occasional marker to be corrupted won't break the system, and the benefits far outweigh this minor flaw.
Comment 46 Tim Starling 2007-11-21 11:50:14 UTC
You could just use the preg_match_all /./u trick, that would be O(n). Or you could split the string at the markers with preg_split(), and then loop through the fragments, counting characters with mb_strlen(). Or use an strpos() loop for marker identification instead of preg_split() to conserve memory -- maybe that'd even be faster. There's lots of ways to do it, there's no need to throw your hands up and wait for PHP 6.
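Editor's illustration: the "split at the markers, then count each fragment" option can be sketched as below. This is Python standing in for the preg_split() plus mb_strlen() combination; the marker pattern is an assumption loosely modelled on the example output in comment 43 (with a 0x7 byte at each end) and is not taken from the parser source.

```python
import re

# Assumed marker shape: \x07UNIQ<hex>-<tag>-<8 hex digits>-QINU\x07
MARKER_RE = re.compile(r"\x07UNIQ[0-9a-f]+-[a-z]+-[0-9a-f]{8}-QINU\x07")


def visible_length(text: str) -> int:
    """Count characters while skipping strip markers entirely."""
    # re.split drops the markers and returns only the text between them.
    fragments = MARKER_RE.split(text)
    # len() on a Python str counts code points, much like mb_strlen() in PHP.
    return sum(len(fragment) for fragment in fragments)
```

Each character is visited a bounded number of times, so the whole operation stays linear in the input length.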
Comment 47 Brion Vibber 2007-11-30 21:14:15 UTC
My own inclination would be to unstrip markers and do a wiki-text-escape on output... possibly with HTML tag stripping on the expanded markers first. (Eg, a big ol' table will be reduced to the text contained in it.)

Is there a reason one would want string functions to output non-plaintext or otherwise that this wouldn't be appropriate?
Comment 48 Steve Sanbeg 2007-11-30 22:06:33 UTC
(In reply to comment #47)
> My own inclination would be to unstrip markers and do a wiki-text-escape on
> output... possibly with HTML tag stripping on the expanded markers first. (Eg,
> a big ol' table will be reduced to the text contained in it.)
> 
> Is there a reason one would want string functions to output non-plaintext or
> otherwise that this wouldn't be appropriate?
> 

Yeah, that was my thought.  Transplant a few LOC from http://www.mediawiki.org/wiki/Extension:Strip_Markup, maybe implement the stripping as a degenerate case of #substr.  It would kill two bugs with one stone, and the string functions don't even imply it can do more than that.
Comment 49 Ross M 2007-11-30 23:52:34 UTC
(In reply to comment #47)
> My own inclination would be to unstrip markers and do a wiki-text-escape on
> output... possibly with HTML tag stripping on the expanded markers first. (Eg,
> a big ol' table will be reduced to the text contained in it.)
> 
> Is there a reason one would want string functions to output non-plaintext or
> otherwise that this wouldn't be appropriate?
> 

I can think of situations where one might want to splice wikiformatting into a string, or to simply trim a formatted string by a few characters.  There's no need to convert to plaintext in order to solve this bug; Tim's suggestion to overlook markers works perfectly well.

I've coded up an alpha version of these functions which employs preg_match_all to split the string into characters while keeping strip markers separate.  It's up and running at http://www.undefined.net/w/index.php?title=User:Algorithm/StringFunctions2 -- feedback is welcome.
Comment 50 Ross M 2007-12-10 10:03:03 UTC
StringFunctions 2.0 has just been released on MediaWiki.  This version completely fixes the strip marker problem, and removes the reliance on mb_string as well.  Unless there are any other flaws in the implementation, it should be ready to install.
Comment 51 Anders Einar Hilden 2008-03-14 21:07:11 UTC
4 months, what's happening? We could really need the functions in some complicated date-templates on no.wikipedia.
Comment 52 Raimond Spekking 2008-05-02 17:26:41 UTC
*** Bug 13895 has been marked as a duplicate of this bug. ***
Comment 53 Wersad Daverbelt 2008-06-11 03:56:24 UTC
Question: Who exactly is the person who could implement this, and what exactly is he waiting for?

Wikipedia could seriously use these functions.
Comment 54 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-06-11 21:58:26 UTC
(In reply to comment #53)
> Question: Who exactly is the person who could implement this

Anyone with commit access, if the course of action is (as Tim has suggested) merging them into ParserFunctions.

> and what exactly is he waiting for?

Remarkably, developers are not all automatons whose only goal in life is to serve Wikipedia template programmers.  Some may actually wish to spend their time on doing other things, maybe even not MediaWiki-related!  If you want this done, either ask nicely in recommended channels (e.g., wikitech-l or irc://irc.freenode.org/mediawiki, not Bugzilla), or get commit access and do it yourself
Comment 55 Daniel Friesen 2008-06-12 07:27:43 UTC
(In reply to comment #54)
> (In reply to comment #53)
> > Question: Who exactly is the person who could implement this
> 
> Anyone with commit access, if the course of action is (as Tim has suggested)
> merging them into ParserFunctions.

I don't think that's a very good idea. Just because someone wants StringFunctions, doesn't mean they want ParserFunctions. Those two extensions don't belong together.

If you want an extension that has both, then it should be a new extension, not a merge of one extension's functionality into the other. That's one of the many purposes of the WikiCode extension I'm working on anyway:
http://svn.nadir-point.com/viewvc/mediawiki-extensions/trunk/WikiCode/
http://wiki-tools.com/wiki/WikiCode/Drafting
Comment 56 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-06-12 13:35:43 UTC
(In reply to comment #55)
> I don't think that's a very good idea. Just because someone wants
> StringFunctions, doesn't mean they want ParserFunctions. Those two extensions
> don't belong together.

You could argue that just because someone wants {{#expr}} doesn't mean they want {{#if}}, too.  I find it very unlikely, in fact, that anyone would want things like {{#strlen}} without {{#expr}}, because as soon as you're automatically generating numbers, you almost certainly want to do basic arithmetic with them.  ParserFunctions is an extension that adds advanced computational capabilities of various sorts to MediaWiki; if you only want a subset of that functionality, tell your users not to use the rest, or manually disable it.  If you really want, some config options could be added to disable parts of it, but first I'd be interested to hear of anyone who actually *wants* such a feature.
Comment 57 Fran Rogers 2008-08-18 22:21:03 UTC
Merged the functionality of StringFunctions into ParserFunctions in r39618. :)
Comment 58 Brion Vibber 2008-08-19 18:54:30 UTC
Reverted in r39653. These functions look *extremely* inefficient, for instance reimplementing mb_strlen() by apparently splitting the entire input string into an array of individual characters and counting up the elements.
Comment 59 Juraj Simlovic 2008-08-26 23:29:00 UTC
I finally got some free time to look into it and rewrite the functions into something more efficient. Is anyone working on it right now? Anyway, any comments are welcome, sooner rather than later. I will place my rewrite at   http://www.mediawiki.org/wiki/Extension:StringFunctions/Code, since I do not have svn ci access at wikimedia.
Comment 60 Daniel Friesen 2008-08-27 06:20:30 UTC
Updates have been committed as of r40068.

However, I'm not convinced it's good enough to put into ParserFunctions. It duplicates parser code in ugly ways, and uses preg_match_all (Tim pokes me saying the _all is evil...)

I've actually been working on string handling inside of WikiCode. I too found the StringFunctions code extremely ugly. I actually implemented my own functions. They still need some tweaks; however, I believe the code is far better than that inside of StringFunctions.
Comment 61 Juraj Simlovic 2008-08-29 17:26:40 UTC
Thank you for your input. Actually, you got me hooked, so I put together a simple benchmarking script for a few different implementations of strlen: http://www.mediawiki.org/wiki/Extension:StringFunctions/Bench

It turns out that Brion was more than right about the extreme inefficiency. The preg_match_all version, which splits the input into an array of individual chars, can be more than a hundred times slower than the other implementations and more than a thousand times slower than the native mb_strlen().

Now, raising the length of the benchmark data ruled out the other implementations as well. So, here are the results for the best two only (plus the simple mb_strlen() version for comparison):

  benching 256 loops of length: 1120kB
    runLen0: 2.3116    ..using mb_strlen() only
    runLen2: 18.6722   ..using preg_match_all() through markers
    runLen4: 10.3728   ..using strpos()

So, as it turns out, pregs, even when they are not abused, take some time to process the input. On the other hand, using strpos() for counting markers seems to be quite efficient, considering the native mb_strlen() is only 5 times faster for the above bench case. I did more benches and found that when the length of the data is only 112kB, mb_strlen() is up to 10 times faster; when the length of the data is 28kB, mb_strlen() is up to 12 times faster.

Any other ideas of how to implement such a simple thing as strlen()? ;))
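Editor's illustration: the strpos()-style approach benchmarked above can be sketched as a scan that locates each marker and subtracts its length from the total. This is Python, not the benchmarked PHP (runLen4 itself is not shown in the thread); the marker delimiters are assumptions based on the example in comment 43.

```python
MARKER_START = "\x07UNIQ"   # assumed marker prefix
MARKER_END = "-QINU\x07"    # assumed marker suffix


def visible_length_scan(text: str) -> int:
    """Total length minus the length of every complete strip marker."""
    total = len(text)
    pos = 0
    while True:
        start = text.find(MARKER_START, pos)
        if start == -1:
            break  # no more markers
        end = text.find(MARKER_END, start + len(MARKER_START))
        if end == -1:
            break  # unterminated marker: treat the rest as ordinary text
        end += len(MARKER_END)
        total -= end - start  # discount the marker itself
        pos = end             # continue scanning after this marker
    return total
```

No regular expression engine and no intermediate array of fragments are involved, which is the property the benchmark credits for the better timing.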
Comment 62 Soroush 2008-09-19 20:57:43 UTC
Somebody wanted me to make a template on fa.wiktionary, and I was informed that this extension is required for it. I read about its potential usage on en.wiki and I agree with its installation on en.wiki as well.
Comment 63 Raimond Spekking 2008-09-20 14:01:43 UTC
*** Bug 15658 has been marked as a duplicate of this bug. ***
Comment 64 Mike.lifeguard 2009-03-19 15:53:20 UTC
The extension isn't ready, so I've removed the shell keyword.
Comment 65 Robert Rohde 2009-03-30 11:07:10 UTC
Created attachment 5978 [details]
New version of string functions

The attached file is a complete rewrite of the StringFunctions extension.

It implements the parser functions:

#len - string length
#pos - finding substring position
#rpos - reverse oriented #pos
#sub - fetch a substring specified by start and length
#replace - substring replacement
#explode - partition string by a delimiter and find a specific piece

The other functions, which are mostly already in the core, have been dropped.
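Editor's illustration: one plausible reading of the one-line descriptions of #rpos and #explode, sketched in Python. The zero-based indexing, the -1 "not found" value and the empty-string result for a missing piece are assumptions for the sake of the example; the attached patch may define these cases differently.

```python
def rpos(text: str, needle: str) -> int:
    """Position of the last occurrence of needle, or -1 if absent (assumed)."""
    return text.rfind(needle)


def explode_piece(text: str, delimiter: str, index: int) -> str:
    """Split text by delimiter and return the piece at a zero-based index."""
    if delimiter == "":
        return ""  # str.split rejects an empty separator, so bail out early
    pieces = text.split(delimiter)
    if 0 <= index < len(pieces):
        return pieces[index]
    return ""  # assumed behaviour when the requested piece does not exist
```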

In addition, I implemented it so that the unique markers generated by <nowiki>, <gallery>, <math>, etc. are universally stripped (this is a partial change in behavior from prior versions).  So the behavior will be more uniform and predictable than prior versions and there is no risk of partial or unexpected markers bleeding through.

Where possible, PHP's built-in multi-byte string functions are used to provide fast results.  If the mb_ functions are unavailable, their behavior is simulated in regex in order to provide a graceful (if slower) failure mode.

A global variable is used to define a hard limit for the size of a string to operate on.  I've set this to 1000 characters for now, but I haven't experimented too much to decide what is reasonable or whether different limits should be enforced for different functions.  #replace is armored against replacements that would generate strings longer than this limit.

I believe that this version of StringFunctions (or something close to it) should be suitable for implementation on WMF sites.
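Editor's illustration: a minimal sketch of the behaviour described in this comment, namely stripping markers up front, enforcing a hard length limit, and refusing a replacement whose result would exceed that limit. It is written in Python, not taken from the attachment; the marker pattern, the names strip_markers and guarded_replace, and the use of an exception to signal the error are all assumptions.

```python
import re

MAX_LENGTH = 1000  # the limit mentioned above, in characters
# Assumed marker shape; the real pattern is defined by the parser.
MARKER_RE = re.compile(r"\x07UNIQ[0-9a-f]+-[a-z]+-[0-9a-f]{8}-QINU\x07")


def strip_markers(text: str) -> str:
    """Remove every strip marker so later steps only see plain text."""
    return MARKER_RE.sub("", text)


def guarded_replace(value: str, old: str, new: str,
                    limit: int = MAX_LENGTH) -> str:
    """Replace old with new, refusing inputs or outputs above the limit."""
    value = strip_markers(value)
    if old == "" or len(value) > limit:
        raise ValueError("input rejected")
    # Predict the output size from the match count before doing any work.
    occurrences = value.count(old)
    result_length = len(value) + occurrences * (len(new) - len(old))
    if result_length > limit:
        raise ValueError("replacement would exceed the length limit")
    return value.replace(old, new)
```

Computing the result length from the match count means an oversized output is rejected before the larger string is ever built.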
Comment 66 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-03-30 13:22:51 UTC
(In reply to comment #65)
> The attached file is complete rewrite of the StringFunctions extension.

Shouldn't they just be added to ParserFunctions?

> In addition, I implemented it so that the unique markers generated by <nowiki>,
> <gallery>, <math>, etc. are universally stripped (this is a partial change in
> behavior from prior versions).  So the behavior will be more uniform and
> predictable than prior versions and there is no risk of partial or unexpected
> markers bleeding through.

I'm not sure if stripping them outright is the best solution, but I can't think of a better one.

> Where possible PHP's built-in multi-byte string functions are used provide fast
> results.  If the mb_ functions are unavailable, their behavior is simulated in
> regex in order to provide a graceful (if slower) failure mode.

We already have some of these compatibility functions in GlobalFunctions.php (mb_strlen and mb_substr).  You should use those, and add any additional ones there.
Comment 67 Ted Kandell 2009-03-30 16:34:54 UTC
A good use of the String functions would be to parse Newick tree format (Newick notation) files which is the standard way of minimally representing phylogenetic trees. Trees are now an important data structure in Wikipedia, and it's very difficult to edit these by hand and to get them to align and display properly. A simple {{newick}} template could then convert a Newick string into a properly displayed tree.

This may seem trivial compared to the other reasons, but just check the myriad ways that trees are now represented in MediaWiki. Having this template would allow trees to be created and edited in external tools and just dropped in.

I don't see any other way of parsing such a format without the String Functions.
Comment 68 Robert Rohde 2009-03-30 17:05:37 UTC
(In reply to comment #66)
> (In reply to comment #65)
> > The attached file is complete rewrite of the StringFunctions extension.
> 
> Shouldn't they just be added to ParserFunctions?

I'm happy to write it up that way instead, though I don't know which is preferred.  Given how long we and other sites have gone without working StringFunctions, it almost feels more natural to segregate them so that site operators have a choice.  

My main interest though is getting an implementation somewhere that is sufficiently reasonable that it can be used on the WMF sites.

> > Where possible PHP's built-in multi-byte string functions are used provide fast
> > results.  If the mb_ functions are unavailable, their behavior is simulated in
> > regex in order to provide a graceful (if slower) failure mode.
> 
> We already have some of these compatibility functions in GlobalFunctions.php
> (mb_strlen and mb_substr).  You should use those, and add any additional ones
> there.

Okay.  The one caveat is that my functions more or less assume they are being passed valid UTF-8 strings, and the encoding parameter for mb_strpos, etc. is not implemented.  It appears that mb_strlen in GlobalFunctions is making the same assumption, so I'll assume that is okay for MediaWiki's purposes.
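Editor's illustration: why the "valid UTF-8" assumption matters for a hand-written length fallback. One common technique, shown here in Python, counts every byte that is not a continuation byte (10xxxxxx). This is not the code in GlobalFunctions.php, which the thread does not show, and the function name utf8_length is hypothetical.

```python
def utf8_length(data: bytes) -> int:
    """Count characters in UTF-8 data by skipping continuation bytes.

    Only correct if data is valid UTF-8; malformed input is not detected.
    """
    # Continuation bytes have the bit pattern 10xxxxxx, i.e. (b & 0xC0) == 0x80.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)
```

On malformed input the count is simply whatever the byte pattern yields, which is the trade-off accepted when the encoding is assumed rather than checked.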

Comment 69 Robert Rohde 2009-03-30 19:20:18 UTC
(In reply to comment #68)
> (In reply to comment #66)
> > We already have some of these compatibility functions in GlobalFunctions.php
> > (mb_strlen and mb_substr).  You should use those, and add any additional ones
> > there.
> 
> Okay.  The one caveat is that my functions more or less assume they are being
> passed valid UTF-8 strings, and the encoding parameter for mb_strpos, etc. is
> not implemented.  It appears that mb_strlen in GlobalFunctions is making the
> same assumption, so I'll assume that is okay for Mediawiki's purposes.

Added the necessary mb_ fallbacks to GlobalFunctions in r49043.

Figuring out the merge with ParserFunctions will take more time.

I'll probably post that as an alternative patch here and let someone with more familiarity decide whether it is better to build StringFunctions as a separate stand-alone or to merge it into the ParserFunctions.

Comment 70 Robert Rohde 2009-04-06 07:44:01 UTC
Created attachment 5993 [details]
Merge string functionality into ParserFunctions

Comments suggested it may be preferable to merge string functionality into ParserFunctions.  The attached patch would accomplish that.  The logic should be the same as the other StringFunctions patch, so one should choose one patch or the other depending on whether it is preferred for StringFunctions to operate as a separate stand-alone extension or as a component of ParserFunctions.  I'm not sure which approach is preferable.  Minor tweaks were made to more or less follow the existing layout conventions in ParserFunctions.

Also note that StringFunctions and ParserFunctions were originally written under different copyleft schemes.  I asked for and received permission from the referenced authors to GPL the StringFunctions code in order to facilitate the merge.
Comment 71 Le Chat 2009-05-14 16:25:06 UTC
Sorry, I'm bemused. Every programming language I've met (admittedly that's not very many) has these string functions as absolute basic standard. How does it take three years to find a way to expose them through MW? 
Comment 72 Minh Nguyễn 2009-05-14 19:57:18 UTC
The wiki syntax (especially the subset used on Wikimedia sites) isn't quite intended as a full-fledged programming language, though it's getting to be one. Think of it more as a language for macros. Notice that there's no built-in support for iteration, either, and that's an absolute basic standard for programming languages too.
Comment 73 Le Chat 2009-05-14 21:00:54 UTC
I think you missed my point - I don't mean MW has to have something because programming languages have it, I mean if programming languages have it as standard, AND we want to have it (as we clearly do in this case), then it surely must be a pretty trivial matter to code. Surely there are standard php libraries which have all these functions? 
Comment 74 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-05-14 22:03:37 UTC
It's already implemented.  Robert has a patch, which he can commit if he likes.  He hasn't so far.
Comment 75 Roan Kattouw 2009-05-15 10:59:44 UTC
(In reply to comment #73)
> I think you missed my point - I don't mean MW has to have something because
> programming languages have it, I mean if programming languages have it as
> standard, AND we want to have it (as we clearly do in this case), then it
> surely must be a pretty trivial matter to code. Surely there are standard php
> libraries which have all these functions? 
> 

If you read the comments (granted, 74 is a lot), you'll see that there were issues with previous implementations, such as the need to use Unicode-aware string functions, the need to fall back to alternative implementations if those functions aren't available (they're a PHP extension) and the need to do all this efficiently.

Robert has attached a patch, which he could (and probably should, or maybe already has?) commit to StringFunctions in SVN; Tim or Brion can then review that and, if it passes, enable it on Wikipedia.
Comment 76 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-05-15 13:45:52 UTC
The patch is to ParserFunctions, so it wouldn't need review beyond the normal process.
Comment 77 Robert Rohde 2009-05-26 00:49:03 UTC
I made some additional tweaks to the second patch and committed it as r50997.

Comment 78 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-05-26 00:52:01 UTC
Marking FIXED, then.  Close enough to the original request.
Comment 79 Happy-melon 2009-06-19 11:51:01 UTC
The spirit of this bug is clearly "enable StringFunctions on WMF wikis".  So now we need $wgPFEnableStringFunctions = true; to be set on WMF wikis.  But the substance of this bug is not resolved.  Reopening.
Comment 80 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-06-19 16:54:26 UTC
Tim has stated pretty clearly that string functions will not be enabled on Wikimedia wikis, so I'll mark this WONTFIX.
Comment 81 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-06-19 17:36:06 UTC
(In reply to comment #80)
> Tim has stated pretty clearly that string functions will not be enabled on
> Wikimedia wikis, so I'll mark this WONTFIX.

. . . that's in r51497.  Quote from diff:

+/**
+ * Enable string functions.
+ *
+ * Set this to true if you want your users to be able to implement their own 
+ * parsers in the ugliest, most inefficient programming language known to man: 
+ * MediaWiki wikitext with ParserFunctions.
+ *
+ * WARNING: enabling this may have an adverse impact on the sanity of your users.
+ * An alternative, saner solution for embedding complex text processing in 
+ * MediaWiki templates can be found at: http://www.mediawiki.org/wiki/Extension:Lua
+ */

It's pretty clear this isn't going to be enabled on Wikimedia.
Comment 82 Tisza Gergő 2009-06-19 18:36:10 UTC
Opened bug 19298 for enabling Lua as per Tim's suggestion.
Comment 83 Kevin Norris 2009-06-24 16:21:19 UTC
Can we mark this bug as LATER instead of WONTFIX given the disagreement with Tim's decision expressed in the comments for the Lua bug?
Comment 84 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-06-24 21:42:53 UTC
The people disagreeing with Tim don't get to make decisions like this, Tim does.  So not much point.  Any WONTFIX could be revisited later, of course.
Comment 85 Robert Rohde 2009-06-24 22:27:14 UTC
(In reply to comment #84)
> The people disagreeing with Tim don't get to make decisions like this, Tim
> does.  So not much point.  Any WONTFIX could be revisited later, of course.
> 

Well, in point of fact, they are sort of disagreeing with you Aryeh, since you are the one who tagged it WONTFIX.

Tim's comments are discouraging, but it isn't clear to me that they represent a final conclusion on the subject.  That's doubly true since Brion has already said Lua won't be installed in the near-term (if ever), so Tim's preferred solution is pretty much no solution at all.

While Tim's concern for the sanity of wikicode is well-intentioned, I've yet to see any template coder (i.e. the people who would really be working with this) come forward to say that the incremental burden of enabling this would be terrible.  Given the evident desire of the community, and the fact that Tim's alternative isn't really available, I am wondering if this should be reopened and given more developer discussion.
Comment 86 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-06-24 23:07:45 UTC
I was taking Tim's statement as a fait accompli.  If you want to reopen this and maybe start a wikitech thread, go ahead, I don't agree with him.
Comment 87 Rich Farmbrough 2009-09-04 17:50:49 UTC
Hm, there seem to be facilities for manipulating strings enabled now. So this is either fixed or it is being done by an almighty kludge and probably far less efficiently than "fixing" this. See my comment to 19298.
Comment 88 Happy-melon 2009-09-04 21:17:35 UTC
(In reply to comment #87)
> either fixed or it is being done by an almighty kludge 

Oh yes.  Those templates put all other hacks to shame. But they work, and they're now very widely used.  Which demonstrates the need for this functionality to be supported *somehow*.
Comment 89 Kevin Norris 2009-09-08 22:44:42 UTC
(In reply to comment #88)
> But they work

Really?  IIRC we don't have substrings (yet)...
Comment 90 Rich Farmbrough 2009-09-12 09:44:53 UTC
I wrote {{Sub right}} and another one recently, trying to get a title case template to work.  See Category:String manipulation templates; most have been around for a while.
Comment 91 Rich Farmbrough 2010-11-17 17:41:24 UTC
This really needs some attention.  We have perfectly good templates for doing minor stuff that work, provided there are less than "X" of them on a page, where "X" is a small number.  Wontfix is not a good status for this bug.  Reopening.
Comment 92 Gurch 2010-11-17 17:58:43 UTC
(In reply to comment #91)
> This really needs some attention.  We have perfectly good templates for doing
> minor stuff that work, provided there are less than "X" of them on a page,
> where "X" is a small number.  Wontfix is not a good status for this bug. 
> Reopening.

I take issue with the description "perfectly good".

What happened was, a while back well-meaning people asked for "padleft" and "padright" string functions. The devs decided to add support for these specific functions, assuming -- foolishly -- that they wouldn't be abused within an inch of their life.

Since then, various string functions (length, string search functions, substring-based functions) have been implemented using unmaintainable, indecipherable nested MediaWiki templates IN TERMS OF PADLEFT AND PADRIGHT. This is something I didn't even know was possible and probably constitutes an interesting academic exercise... oh, except this is in production use on one of the world's busiest websites.

The algorithms involved are hideously inefficient, given the huge overhead incurred by having to parse wikitext every step of the way.
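
To make the trick concrete, here is a rough Python sketch of how a length function can be built from nothing but a padleft-style primitive and string equality. It assumes padleft behaves roughly like the MediaWiki one (pad an input up to a target length using a repeated pad string). The names padleft, str_left and str_len are illustrative only, and the real templates use a far more convoluted wikitext decision tree rather than this simple linear loop.

```python
def padleft(value: str, length: int, pad: str = "0") -> str:
    """Model of a padleft primitive: left-pad `value` to `length` characters."""
    missing = length - len(value)
    if missing <= 0 or not pad:
        return value
    # Repeat the pad string enough times, then cut it to the missing width.
    repeated = pad * (missing // len(pad) + 1)
    return repeated[:missing] + value


def str_left(text: str, count: int) -> str:
    """First `count` characters of `text`, obtained by padding the EMPTY
    string with `text` as the pad string (the core of the hack)."""
    return padleft("", count, text)


def str_len(text: str, limit: int = 500) -> int:
    """Length of `text` using only str_left and equality tests.

    The first n for which the n-character prefix equals the whole
    string is its length. `limit` is an arbitrary cap for this sketch.
    """
    for n in range(limit + 1):
        if str_left(text, n) == text:
            return n
    raise ValueError("string longer than limit")


if __name__ == "__main__":
    print(str_left("abcdef", 3))
    print(str_len("wikitext"))
```

Each probe in str_len corresponds to another round of template expansion in the wikitext version, which is where the overhead described above comes from.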

Have a look at how "str len" is implemented. This:

http://en.wikipedia.org/w/index.php?title=Template:Str_len/core&action=edit

is just part of it.

When you've finished washing your eyes out with bleach, look at the "str find" template. Note its reliance on the aforementioned "str len", as well as "str left" and various other horrendous string functions. Note that at the bottom of this hierarchy of {{{{}{}{}{}{}{}{}{{}}}} lies #padleft, #titleparts and various other functions that you wouldn't normally expect to be roped into string searching, unless you were in a batshit insane environment where they were the only primitive functions available... oh wait.

It probably takes as long to evaluate one of these string functions on a modern, top-of-the-range multicore server machine as it would to evaluate a sane implementation on a 1980s home computer. The algorithm for "str find" wouldn't even be too bad if it was implemented directly in C or something, but don't pretend that MediaWiki template syntax isn't the least efficient programming language ever created. Including several joke ones.

Come to think of it, yes, there is a really fucking good argument for enabling StringFunctions on Wikimedia wikis. And also for tracking down the people who implemented templates like [[Template:Str find]] and murdering them for crimes against programming.
Comment 93 Max Semenik 2010-11-17 18:06:08 UTC
(In reply to comment #92)

> Come to think of it, yes, there is a really fucking good argument for enabling
> StringFunctions on Wikimedia wikis. And also for tracking down the people who
> implemented templates like [[Template:Str find]] and murdering them for crimes
> against programming.

No, it's a great reason to disable #padleft and friends instead. Things ParserFunctions are (ab)used for are insane, and the more of them are there, the more insane things they allow. This spiral dive has to stop somewhere.
Comment 94 Phillip Patriakeas 2010-11-17 18:33:53 UTC
There will be a lot of angry people (and broken functionality, with no obvious way to fix, replace, or remove it) if the only currently enabled way to implement string parsing in wikicode on WMF wikis is simply disabled or removed. It is not the template coders' fault for abusing the hell out of padleft and padright, they are simply making do with the only tool they can use, and would certainly use something else if it were available (and I do mean they'd use pretty much *anything* else, as just about anything would be an improvement over the current situation). It's not like padleft and company are being blindly used either, these templates are massively optimized and however bad it is, it could be far, far worse.
Comment 95 Le Chat 2010-11-17 20:23:47 UTC
Of course the use of padleft and so on shouldn't be happening, but it's not the fault of the people who worked out those hacks. This really is a no-brainer - PLEASE *enable the efficient string functions*, and we won't be using the mind-blowingly inefficient ones any more. (Notice that the servers are still up and running in spite of the use of the inefficient hacks, so replacing them with more efficient functions will certainly not be any kind of performance hit.)
Comment 96 MZMcBride 2010-11-18 00:00:48 UTC
(In reply to comment #91)
> This really needs some attention.  We have perfectly good templates for doing
> minor stuff that work, provided there are less than "X" of them on a page,
> where "X" is a small number.  Wontfix is not a good status for this bug. 
> Reopening.

I don't have any problem with users overturning a WONTFIX with a valid reason. I've certainly done so a number of times. However, this bug as currently summarized reads "Enable StringFunctions on WMF wikis" and the most senior active sysadmin and developer has (essentially) said this is never going to happen. Re-reading comment 0 (way the hell up there), this bug was not originally about a specific extension, just about the functionality.

Either this bug should be re-closed as WONTFIX or the bug summary should be genericized. The current match-up is disingenuous and misleading.
Comment 97 Phillip Patriakeas 2010-11-18 01:57:51 UTC
Looking at this bug's history, the very first entry is Rob Church changing the summary from "Install StringFunctions" to "Install the StringFunctions extension". Unless there's missing history here (which is doubtful, since the change was made less than a half-hour after the bug report was filed), this bug was indeed originally about a specific extension, and a careful reading of comment 1 and comment 3 support this.
Comment 98 Kevin Norris 2010-11-18 13:56:20 UTC
MZ, are you seriously suggesting that the developers will completely re-implement an extension, when the concerns about the original are *not* implementation-specific?  I seriously doubt that.
Comment 99 MZMcBride 2010-11-18 20:16:23 UTC
(In reply to comment #98)
> MZ, are you seriously suggesting that the developers will completely
> re-implement an extension, when the concerns about the original are *not*
> implementation-specific?  I seriously doubt that.

I'm suggesting that the sysadmins in charge of running Wikimedia wikis have said rather unequivocally that this extension is not going to be installed. The StringFunctions extension is a means to an end. There are plenty of other ways to implement string manipulation. For years, there has been discussion of implementing a proper programming language into MediaWiki. The current preferred favorite is not Lua, but JavaScript, actually.

I don't believe that there is any legitimate objection to letting users manipulate strings. However, there are legitimate objections to enabling this extension on Wikimedia wikis. This bug is about enabling a specific extension on Wikimedia wikis. Unless there is some reason to believe this is ever going to happen, this bug should be re-closed as WONTFIX. A subsequent, generic bug should be filed about the ability to manipulate strings on Wikimedia wikis (though there's little hope of that bug being resolved anytime soon). Keeping this bug unresolved in the REOPENED state does not change the reality of the situation. It just misleads people into believing that this is still up for debate.
Comment 100 Tisza Gergő 2010-11-18 20:57:54 UTC
(In reply to comment #99)
> I don't believe that there is any legitimate objection to letting users
> manipulate strings. However, there are legitimate objections to enabling this
> extension on Wikimedia wikis.

Actually, what are those? Tim's oft-cited comment stated that StringFunctions should be deprecated in favor of Lua, but since then it was decided that Lua is an even worse option. As for a hypothetical server-side Javascript-based string manipulation extension, it has most of the drawbacks of Lua (denial-of-service vulnerability, incompatibility of Wikipedia with a MediaWiki at an average web host), with the added bonus that Lua at least exists and does not need to be implemented from scratch.

More importantly, what are the disadvantages of StringFunctions compared to the current situation? #padleft-based string manipulation is slower, less reliable, harder to understand and maintain, and more limited in its abilities. It used to be said that SF should not be enabled because then a lot of pages will depend on it, and it will be difficult to switch to a superior solution when one is found, but we already crossed that river a long time ago.
Comment 101 Victor Vasiliev 2010-11-18 21:29:42 UTC
(In reply to comment #100)
> Actually, what are those? Tim's oft-cited comment stated that StringFunctions
> should be deprecated in favor of Lua

Tim's comment was that we are not going to expand the parser functions in any way and all our further development should be concentrated at the development of sensible scripting engine instead of turning parser functions into programming language. As far as I am aware he did not change his mind about that so this bug is closed and should not be reopened unless the policy mentioned above is changed as a result of discussion among the developers (you may initiate it).
Comment 102 Krinkle 2010-11-18 21:35:27 UTC
A request once was made to implement padleft to pad left. Then more advanced functions were wanted and existing functions (ab)used to achieve it.
I can imagine developers not wanting to natively implement those now-wanted advanced functions, as it will likely lead to history repeating itself, namely some other advanced thing being wanted and implemented with these, etc.

There are way too many scripts and templates that should be and can be written as an Extension instead.

So how about opening bugs for the actual functionality wikipedians want instead of requesting functions to achieve them in templates ? The same was done with Babel, instead of creating lots and lots of templates and decentralized stuff all over the place it was written into a native Extension and everybody's happy.

I realise this is not a solution for everything though )
Comment 103 Krinkle 2010-11-18 21:36:31 UTC
mid-air collision mistake. Self-reverting status change
Comment 104 Marcus Buck 2010-11-18 21:52:35 UTC
(In reply to comment #102)
> So how about opening bugs for the actual functionality wikipedians want instead
> of requesting functions to achieve them in templates ? The same was done with
> Babel, instead of creating lots and lots of templates and decentralized stuff
> all over the place it was written into a native Extension and everybody's
> happy.

??? Have I missed some developments? The extension was created, then not reviewed by the developers and everybody is still unhappy with the old system.

And I'm suspecting the exact same thing will happen with StringFunctions...
Comment 105 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-11-18 22:08:20 UTC
Note that string function support was added to ParserFunctions proper in r50997, and disabled by default by Tim in r51497 -- a separate extension is no longer needed.  I don't know if anything has happened since June 2009 to cause him to reconsider his opinion.  I personally have thought for a long time that enabling string functions is the lesser evil here, given the givens, but it's not my call.
Comment 106 Kevin Norris 2010-11-19 02:04:16 UTC
Could Tim Starling please indicate whether his veto on this bug is still outstanding?

If it is, I intend to bring it up on [[WP:VPT]] or somewhere and make the following proposal:

"There is community consensus to enable StringFunctions; if the developers do not enable it themselves, the community hereby requests that the WMF instruct the developers to do so."

I really hate to go over your heads on this one, but it appears to be necessary.  As Aryeh said, SF is clearly "the lesser evil" and it's patently ridiculous that the nth most popular site in the world (n=whatever our current Alexa rank is) is using [[Category:String manipulation templates]] instead of native PHP, especially when the functionality to do so is available and tested.
Comment 107 msh210 2010-11-19 07:21:39 UTC
(In reply to Kevin Norris's comment #106)
> If it is, I intend to bring it up on [[WP:VPT]] or somewhere and make the
> following proposal:
> 
> "There is community consensus to enable StringFunctions; if the developers do
> not enable it themselves, the community hereby requests that the WMF instruct
> the developers to do so."

If you do bring up such a proposal on-wiki, please link to it in a comment on this bug so that people on other wikis know. Thanks.
Comment 108 Rich Farmbrough 2010-11-19 14:49:13 UTC
I'm pretty sure it would garner extensive support.  

#Pro: less server load
#Pro: less page breakage
#Pro: easier template programming
#Pro: faster page load/render times
#Pro: less obscure limits on lengths
#Pro: fewer templates which work fine in test but are useless on an actual page

#Con: If we implement a scripting language we may need to migrate some stuff - which we would anyway.

Things have moved on in four years, but we are still struggling with ancient functionality. Wikia has more powerful facilities than the WMF projects.

I'm disappointed someone re-closed the bug, it was not re-opened lightly.

Anyone in doubt as to the importance of this bug is invited to look at VP(T) where I believe almost half the threads are related to it.
Comment 109 Happy-melon 2010-11-19 15:49:52 UTC
(In reply to comment #108)
> I'm pretty sure it would garner extensive support.  
> ...
> Anyone in doubt as to the importance of this bug is invited to look at VP(T)
> where I believe almost half the threads are related to it.

You[1] are making a mistake in assuming that if the enwiki community supports a technical change then, ipso facto, that change should be implemented, irrespective of any 'big picture' considerations.  You're[1] not in Kansas any more; the consensus of the enwiki community is not sovereign here.  

> I'm disappointed someone re-closed the bug, it was not re-opened lightly.

It was re-opened mistakenly under a [[WP:BRD]] principle which just doesn't apply here.  It is perfectly acceptable to comment, where appropriate, on closed bugs; the status applies only to the bug title, not to the discussion underneath.  Tim has said that the status of the request "Set $wgPFEnableStringFunctions=true on WMF wikis" is WONTFIX; that conclusion stands until something (maybe discussion under the closed bug, maybe something else) convinces *him* or *another sysadmin of equal standing* to reconsider it.  Someone else changing the status does not somehow reshape the world to make it so.

[1] I'm speaking generally, not to anyone specifically.
Comment 110 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-11-19 16:33:30 UTC
(In reply to comment #108)
> #Pro: less server load
> #Pro: faster page load/render times

Experience has shown that people will just write pages that use up whatever the resource limits are.  They'll use the functions to write still more complicated templates, which currently they can't write because of preinclusion size limits.  It's not at all obvious it will make anything faster, it will just allow more complexity for the same length limit.

In support of this, observe that ParserFunctions was only introduced to provide a sane replacement for [[Template:Qif]], much as this bug requests that StringFunctions be enabled to replace [[Template:Str len]] and friends.  The explosion of template complexity after ParserFunctions were turned on would have been impossible (given performance limits) with template hacks.  It's a certainty that that will happen again if we enable StringFunctions, with template editing becoming even more arcane.

Maybe we should enable the string functions, but reduce preinclusion length limit, or impose other limits on template complexity.

> #Pro: less page breakage

How so?

> #Pro: easier template programming

Not if things get even more complicated to compensate, which they will.

> #Pro: less obscure limits on lengths

The limits on length will be the same, it's just people will write even more complicated templates to use up the length limits.

#Con: Templates like {{str len}} will no longer count as much against the length limit, so the effective limit will be higher and people will be able to make even more complicated and unmaintainable wikitext pages for things that should have been written in a real language to start with.


I agree that enabling string functions is the lesser evil, but it's still evil.  People shouldn't have been writing programs in wikitext to begin with, they should use proper scripts of some type -- extensions or bots or such.  Personally I'd also be okay with restricting or disabling any functions that people are abusing to emulate string functions, like padright/left, but that would be much more disruptive, and people will always find ways to abuse innocent functionality.  So unless someone is willing to implement a systematic solution like a Lua extension, we may as well resign ourselves to making template programming less painful.
Comment 111 Le Chat 2010-11-19 17:21:09 UTC
>...abuse...

It's not abuse (which would be putting good tools to bad use), this is putting bad tools to good use.

>...proper scripts...extensions...

Yes, this seems to be the vicious circle we're in... someone *has* written an extension, but what good did it do him - we're now deprived of the use of the extension, just in case someone "abuses" it by making better use of it than was anticipated. It would obviously be much much better to have non-trivial logic compiled into the software than to do it via templates, but what choice are we given?
Comment 112 Marcus Buck 2010-11-19 19:35:53 UTC
As far as I can see all of the StringFunctions are already present in template-implemented versions now. Just in an inefficient way. So any "abuse" (quotation marks because of Le Chat's good remark) would be possible already now.

Does anybody have any ideas in which direction possible "abuse" could go? I cannot think of any new class of functionality that would become possible if we allowed StringFunctions. The template-based string functions too were not enabled by ParserFunctions alone. Template-based string functions would be impossible without "padleft:" and "padright:". These two are string functions. It's clear that when you provide a single string function and simple logic, that other string functions can be emulated. That door was left open and people walked through. But if StringFunctions do not open new doors nothing bad can happen. I don't see open doors in them. If you do see them, please report.

I guess we can safely assume that when you provide functionality people will _always_ test the limits of the functionality. It doesn't matter how little or how amazingly much functionality you provide. They will test its limits. It's almost a law of nature. That's normal and we will never have success with "We provide this functionality but please don't fully utilize it".

We have to put limitations on functionality because we need to limit the computation cost and rendering time. If we replace the template-based string functions with extension-based StringFunctions we will reduce computation cost and rendering time. That's a good thing. If you want to secure that this gain will not be consumed by increased use of the functions then set limitations on how many instances of the functions can be called on a single page.
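
A per-page cap of the kind suggested here could look roughly like the following Python sketch. It is purely hypothetical: the class FunctionCallBudget, the function limited_strlen and the error text are invented for illustration and do not describe how MediaWiki or ParserFunctions actually enforce limits.

```python
class FunctionCallBudget:
    """Tracks how many string-function calls one page render has made."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, cost: int = 1) -> bool:
        """Reserve `cost` calls; return False once the page is over budget."""
        if self.used + cost > self.limit:
            return False
        self.used += cost
        return True


def limited_strlen(budget: FunctionCallBudget, text: str) -> str:
    """A string function that refuses to run once the budget is spent."""
    if not budget.charge():
        return "<error: string function limit exceeded>"
    return str(len(text))


if __name__ == "__main__":
    budget = FunctionCallBudget(limit=2)  # tiny limit, for demonstration
    for word in ["abc", "defgh", "ij"]:
        print(limited_strlen(budget, word))
```

With a counter like this the cost of the functions stays bounded per page no matter how templates are nested, which is the property being argued for.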

By the way, I'm sure there are wikis with activated StringFunctions. Are there any reports that these wikis had problems with it? If there are any open doors in them, I'm sure somebody must have discovered them already!?
Comment 113 Phillip Patriakeas 2010-11-19 20:01:14 UTC
(In reply to comment #112)
> By the way, I'm sure there are wikis with activated StringFunctions. Are there
> any reports that these wikis had problems with it? If there are any open doors
> in them, I'm sure somebody must have discovered them already!?

Wikia - *all* of Wikia - has had StringFunctions enabled for years. I've never heard about any problems they've had as a result of this.
Comment 114 Tisza Gergő 2010-11-20 11:35:35 UTC
(In reply to comment #110)
> Experience has shown that people will just write pages that use up whatever the
> resource limits are.  They'll use the functions to write still more complicated
> templates, which currently they can't write because of preinclusion size
> limits.  It's not at all obvious it will make anything faster, it will just
> allow more complexity for the same length limit.
> [...]
> Maybe we should enable the string functions, but reduce preinclusion length
> limit, or impose other limits on template complexity.

You make it sound as if complexity would be a bad thing in itself. That is not so - complex tasks require complex solutions, most of the time. MediaWiki itself has become much more complex along the years, the editing workflow became more complex, Vector was a huge jump in the complexity of the editing GUI, and so on. Everyone accepts these as necessary, so why not the same for complex templates? Seems like a bit of NIH syndrome to me (or more precisely, Not Invented By Us, because it *is* invented here, just not by the developers).

I sense a good amount of developer hubris in the debates about templates - "you should leave this stuff to us, we could do it better". Sure you could - but you could do much less of it. By the same account, we should leave writing encyclopedia articles to professionals, because they are much better at it (except that Nupedia had some 100 articles after three years). This line of thinking is completely contrary to Wikipedia philosophy. Wikipedia is about generativity, community empowerment and ultra-low barriers to entry - you can't seriously suggest that making a feature request and waiting for some developer to pick it up every time someone needs a new template would be a scalable approach.

> People shouldn't have been writing programs in wikitext to begin with, they
> should use proper scripts of some type -- extensions or bots or such. 

This gets thrown around a lot, but how those proper scripts could replace the current template system is never demonstrated. Bots are not much help with dynamic text (and do have problems of their own, like littering page histories). Extensions, as I tried to point out above, are not scalable (whatever you might think of the template syntax, it is a lot easier to learn than writing secure and scalable MediaWiki extensions, and we didn't even consider yet the epic fail of code review). The conclusion of the bug about Lua was that templates using scripts interpreted by some external tool are out of the question - they have security issues, and they would break compatibility of Wikipedia with pretty much all other MediaWiki installations. What is left then? Inventing another template language and writing another parser in PHP? IIRC Werdna actually offered to do that and was turned down, because that is still not a "proper" solution. The proper solution, apparently, is to deny the Wikimedia community a useful tool, for purely aesthetic reasons.
Comment 115 Victor Vasiliev 2010-11-20 12:20:21 UTC
(In reply to comment #114)
> What is left then? Inventing another template language and writing another
> parser in PHP? IIRC Werdna actually offered to do that and was turned down,
> because that is still not a "proper" solution.

I was working on a template scripting extension called InlineScripts. It is in Subversion and it was working last time I checked (its most severe problem was the documentation, or, to be more specific, the absence of it). It was discussed at the developers' conference in April and the only reason I stopped working on it was the lack of time.
Comment 116 Ted Kandell 2010-11-21 02:18:18 UTC
I would like to add a concrete example to this debate, an actual use case.

Many entries in Wikipedia describe some sort of phylogenetic data, from genealogies, to the phylogenies of language families, to Y and mitochondrial DNA haplogroups.

A standard way of representing such trees is through the Newick format:
http://en.wikipedia.org/wiki/Newick_format

There are all sorts of template hacks in Wikipedia to represent family trees, genetic trees, language families, and all sorts of other related information.

Wouldn't it be better to just add the standard Newick format tree representation to articles, and then use templates to display the data in various sorts of ways? The fundamental information would then be preserved in a standardized display-independent format. Also there are a large number of tools out there that can generate graphic images based on the Newick format. 

It isn't very difficult to parse a Newick format string and create a basic tree display template from it. However, all this really would need is a full set of string functions. It's true that PHP and MediaWiki weren't designed to be a kind of parser or compiler, but what sort of alternative can anyone think of?
Should we put in a request for MediaWiki developers to support the Newick format, and any number of other important display-independent representations of data widely used in Wikipedia? Who decides, and who does the work?

There is a good argument for "doing it right" and implementing a full scripting language (aside from Javascript?) but in the meantime, all sorts of important data that can't quite be represented as text is being added to Wikipedia in the form of templates. How can all the various sorts of tree data now in Wikipedia be extracted - or just redisplayed using whatever new and better display template comes along?

I don't know if this can be added as a "bug" in and of itself, but it does point out the fundamental problem. MediaWiki has text, graphic, audio, and video formats, but is missing the ability to parse certain other critical basic information storage formats that the developers never considered.
Comment 117 Gurch 2010-11-21 04:34:31 UTC
(In reply to comment #106)
> "There is community consensus to enable StringFunctions; if the developers do
> not enable it themselves, the community hereby requests that the WMF instruct
> the developers to do so."

That's not really how it works. The developers *are* WMF, or at least a subset thereof. (Or were you under the impression that volunteer devs' opinions mattered in such cases? LOL)
Comment 118 Alex Z. 2010-11-21 06:23:56 UTC
(In reply to comment #116)
> It isn't very difficult to parse a Newick format string and create a basic tree
> display template from it. However, all this really would need is a full set of
> string functions. It's true that PHP and MediaWiki wasn't designed to be a kind
> of parser or compiler, but what sort of alternative can anyone think of?
> Should we put in a request for MediaWiki developers to support the Newick
> format, and any number of other important display-independent representations
> of data widely used in Wikipedia? 
> 
...
> I don't know if this can be added as a "bug" in and of itself, but it it does
> point out the fundamental problem. MediaWiki has text, graphic, audio, and
> video formats, but is missing the ability to parse certain other critical basic
> information storage formats that the developers never considered.

This is kind of the main argument against string functions. Letting users create parsers in wikitext is pretty much exactly the kind of thing that those against it want to avoid. Wikitext is not supposed to be a programming language.[1] This is also a good example of what Aryeh was talking about in comment #110.

A well-defined language that has applications in thousands of pages is an excellent candidate for something that should be handled by an extension.

> Who decides, and who does the work?

The same person who decides whether or not to enable string functions would decide to enable a Newick extension. Anyone who knows PHP can do the work.

[1] http://lists.wikimedia.org/pipermail/wikitech-l/2009-June/043609.html
Comment 119 Kevin Norris 2010-11-21 08:08:12 UTC
(In reply to comment #117)
> (In reply to comment #106)
> > "There is community consensus to enable StringFunctions; if the developers do
> > not enable it themselves, the community hereby requests that the WMF instruct
> > the developers to do so."
> 
> That's not really how it works. The developers *are* WMF, or at least a subset
> thereof. (Or were you under the impression that volunteer devs opinions'
> mattered in such cases? LOL)

The non-volunteer devs *work for* the WMF.  If the WMF decides to listen to the community (and that's a big if), I don't think the devs can reasonably say no.

What's more, the developers are primarily responsible for making functionality the community wants available to the community.  They aren't doing that here, and that's a Bad Thing.
Comment 120 Happy-melon 2010-11-21 11:44:02 UTC
(In reply to comment #119)
> The non-volunteer devs *work for* the WMF.  If the WMF decides to listen to the
> community (and that's a big if), I don't think the devs can reasonably say no.
> 
> What's more, the developers are primarily responsible for making functionality
> the community wants available to the community.  They aren't doing that here,
> and that's a Bad Thing.

You're confusing developers (who write code for new features) with sysadmins (who manage the servers and turn features on and off).  The developers are their own community around their own project: the MediaWiki software.  That community is structured slightly differently to a wiki community (there is a clear hierarchy of authority and other different ways of doing things) but fundamentally it is a volunteer project like any of the WMF's others: developers code things that interest them.  Most developers work on areas of MediaWiki which will be of use on Wikimedia wikis, as seeing their code in action on the world's 6th largest website is the most tangible reward for their time, but neither the paid nor unpaid devs are beholden to the other WMF communities (and please remember that enwiki is just one of 800 such groups), any more than one wiki community is beholden to another.  Many developers work on parts of MediaWiki which will never be installed on Wikimedia wikis.  To say that "the developers are primarily responsible for making functionality the community wants available to the community" is arrogant and false.

The *sysadmins*, most (but not all) of whom are also active developers, are the ones who decide which components of MediaWiki are installed on WMF wikis.  There is a strict hierarchy amongst sysadmins, and most of them are WMF paid staff.  They *are* expected to take the communities' sentiments into account when making changes, and they are indeed accountable to the Foundation.  The sysadmin you're talking about here reports directly to the Foundation's CTO; the CTO reports to the CEO, and the CEO reports to the board.  The sysadmin who has made this decision is 'above' 90% of the Foundation's paid staff in the organisational hierarchy.  Where, exactly, are you planning to go to get this decision overturned?
Comment 121 Le Chat 2010-11-21 12:00:04 UTC
>Where, exactly, are you planning to go to get this
decision overturned?

Rather than initiate some kind of power battle, I think we ought simply to politely draw the sysadmin's attention to this discussion and the apparently strong arguments in favour of changing this decision, and hope that he'll now be persuaded. (If it's Tim Starling, then I've already left a note on his en.wp user page, though others may know of more effective ways of giving him a friendly poke.)
Comment 122 Max Semenik 2010-11-21 12:42:08 UTC
(In reply to comment #121)
> Rather than initiate some kind of power battle, I think we ought simply to
> politely draw the sysadmin's attention to this discussion and the apparently
> strong arguments in favour of changing this decision, and hope that he'll now
> be persuaded. (If it's Tim Starling, then I've already left a note on his en.wp
> user page, though others may know of more effective ways of giving him a
> friendly poke.)

Thinking that he doesn't know about this bug or that he is not watching it is way too naive, so all your pokes do nothing but annoy.
Comment 123 Juraj Simlovic 2010-11-22 23:22:45 UTC
(In reply to comment #122)
> > If it's Tim Starling, then I've already left a note on his en.wp user page,
> Thinking that he doesn't know about this bug or that he is not watching it is
> way too naive, so all your pokes do nothing but annoyance.

Actually, based on my experience with other big projects I am (or used to be) part of: this bug stands at 123 comments as of right now. My humble guess is that Tim no longer bothers to read this bug and has probably had it on his ignore list for a long time already. And I'd fully understand him. The decision has been made (I hope it was not taken lightly) and none of the above changes that (though it pollutes what should have been a technical discussion). The only reason I still read this bug is that it is getting funny, and not because I am interested in it as a dev.

jsimlo


ps. Yes, this comment also pollutes this bug. But I simply no longer see any cons of doing it.. :)
Comment 124 Le Chat 2010-11-23 05:13:36 UTC
>none of the above changes that 

It should change it really, as we now know that (a) there is continuing user demand for this functionality (b) nothing is happening or likely to happen towards providing it in any other sensible way than the one proposed (c) the use of the very inefficient workarounds without ill effect, the use of the proposed functions on Wikia, etc. prove that this functionality will not (as feared) damage performance. Presumably sysadmins don't have completely closed minds, and are capable of listening to users and arguments and taking a second look at past decisions...
Comment 125 MZMcBride 2010-11-23 06:16:45 UTC
(In reply to comment #124)
> It should change it really, as we now know that (a) there is continuing user
> demand for this functionality (b) nothing is happening or likely to happen
> towards providing it in any other sensible way than the one proposed (c) the
> use of the very inefficient workarounds without ill effect, the use of the
> proposed functions on Wikia, etc. prove that this functionality will not (as
> feared) damage performance. Presumably sysadmins don't have completely closed
> minds, and are capable of listening to users and arguments and taking a second
> look at past decisions...

Hahahahaha

You're obviously not very familiar with Wikimedia's software development processes. Right now, some of this ParserFunctions mess (and its use in high-use templates like "Template:Cite") causes page renderings to take upward of 30 seconds on a large article. And still nobody cares.™  If you think a bit of whining (or is it whinging?) in bug comments or attempting to rally some folks on a village pump is going to push anything forward, you're insane. You'd be better off trying to raise some money for a grant, to be honest. (Though not really; Wikimedia is apparently trying to stop accepting money with strings attached.)

If you wrote an extension that implemented JavaScript into MediaWiki templates that also doubled as donation-related software, you might be able to attract some attention to this bug before the 12th of Never. ;-)  Otherwise, it's probably best to save your energy for battles you can possibly win.
Comment 126 Le Chat 2010-11-23 07:29:13 UTC
Don't see any reason for the negativity and sarcasm concerning this bug (as in comment above) - it's just a perfectly normal and well-reasoned feature request, which will actually *reduce* these page-rendering times you mention, and will hopefully be considered on its technical merits.
Comment 127 Juraj Simlovic 2010-11-23 22:42:44 UTC
(In reply to comment #126)
> Don't see any reason for the negativity and sarcasm concerning this bug
> (as in comment above) - it's just a perfectly normal and well-reasoned
> [...] and will hopefully be considered on its technical merits.

Simply put: I do see one. No, it is not. I thought it was back then.

The long story short: I developed these StringFunctions (not all by myself of course; over time there were three of us :) because I needed them back then on my own wikis. Then someone started this bug and Tim said no. Then we, out of interest, tried to optimize the extension to be more "suitable" for the Wikimedia cluster. And again, Tim said no. Then someone managed to merge StringFunctions into ParserFunctions, which were/are installed on the Wikimedia cluster. And guess what happened: Tim said no. If it isn't clear already, Tim had his chance to reconsider.

The only thing left now is: let it go. The more comments are posted to this bug, the more it becomes an unusable kids' chat wall. No developer is likely to invest in reading through a hundred comments, even if a nugget of gold were lost somewhere within. ...Ahh, who am I kidding? This attempt at explanation is pointless anyway..
Comment 128 Phillip Patriakeas 2010-11-24 02:45:54 UTC
I've filed bug 26092 for *some* form of string parsing functionality to be enabled on WMF wikis, could we please maybe try to keep from turning it into the same mess this bug is (i.e. if you have something *useful* to contribute, by all means do, but if not, no comments saying "we need this soooo bad, the devs aren't being [fair/reasonable/humane/etc]")?

Not sure if this bug should be marked as blocking it, but it probably doesn't matter anyways since this one is closed.
Comment 129 Juraj Simlovic 2010-11-24 19:56:25 UTC
(In reply to comment #128)
> I've filed bug 26092 for

Unbelievable! :)) Yesterday, I was kinda wondering if there was any way of luring someone into creating a brand new bug as a copy of this one. Despicable me, sorry about that.. :) Of course this solves nothing, but right now I am $50 richer! And all it took was mentioning the devs' reluctance to read some hundred comments.. :)))))

ps. Perhaps I should be banned for disrupting, but it was worth it.
Comment 130 Phillip Patriakeas 2010-11-24 23:48:21 UTC
(In reply to comment #129)
> (In reply to comment #128)
> > I've filed bug 26092 for
> 
> Unbelievable! :)) Yesterday, I was kinda wondering if there was any way of
> luring someone into creating a brand new bug as a copy of this one. Despicable
> me, sorry about that.. :) Of course this solves nothing, but right now I am $50
> richer! And all it took was mentioning the devs' reluctancy to read some
> hundred of comments.. :)))))
> 
> ps. Perhaps I should be banned for disrupting, but it was worth it.

Actually, I'd been thinking about it for a while, I just finally decided to stop being lazy and do it already. =)
Comment 131 Dmitriy Sintsov 2010-12-29 08:09:35 UTC
(In reply to comment #99)
> (In reply to comment #98)
> > MZ, are you seriously suggesting that the developers will completely
> > re-implement an extension, when the concerns about the original are *not*
> > implementation-specific?  I seriously doubt that.
> 
> I'm suggesting that the sysadmins in charge of running Wikimedia wikis have
> said rather unequivocally that this extension is not going to be installed. The
> StringFunctions extension is a means to an end. There are plenty of other ways
> to implement string manipulation. For years, there has been discussion of
> implementing a proper programming language into MediaWiki. The current
> preferred favorite is not Lua, but JavaScript, actually.
> 
If JavaScript is the language of choice, there is the PHP SpiderMonkey extension. It is still not absolutely stable (only a beta); however, I know that some WMF programmers are good at C, so it is probably possible to make a few fixes. The question is how to make these scripts run at "ordinary" hosters, where there will be no such PHP extension. In that case, one might try client-side JavaScript (in the browser); however, passing function / template parameters from the server side to the client side might become too inefficient. Perhaps one might limit the JS language features to a basic subset, then run it through the PHP mod when available and slowly interpret it in PHP otherwise. Co-location (where you can compile and install the PHP mod yourself) has become more affordable in recent years, anyway.
Comment 132 Dmitriy Sintsov 2010-12-29 09:05:23 UTC
The mod can also register PHP classes in JS:
http://devzone.zend.com/article/4704

There is also interesting JavaScript-based server Jaxer:
http://jaxer.org/

It allows sharing a lot of server-side and client-side code. For example, it allows running jQuery on the server side. Things like parsers could be written in JavaScript and then used on both sides, thus minimizing code duplication.
Comment 133 MZMcBride 2010-12-29 09:10:37 UTC
(In reply to comment #132)

These are interesting, yes. However, these comments are really outside the scope of this bug. File a separate bug (if there isn't one already) or start a thread on the wikitech-l@lists.wikimedia.org mailing list if you're interested in further discussion about this.
Comment 134 Mark A. Hershberger 2011-09-24 18:02:19 UTC
*** Bug 31136 has been marked as a duplicate of this bug. ***
Comment 135 Daniel A. R. Werner 2011-11-17 13:44:38 UTC
One argument brought up a few times against string functions, that people would always go to the limits of what's possible in template programming and just write more complicated templates with string functions enabled, might be true. So why not simply scale down the limits after installing these functions?
Existing string templates can be rewritten as wrappers around string functions, so functionality wouldn't even be broken; we would have lower limits for what's possible using templates and functions, but we would have more powerful and sane functions provided. They could be used in a sane way, as they are being used right now, with less load on the servers.
Comment 136 Happy-melon 2011-11-17 16:21:12 UTC
(In reply to comment #135)
> So why not simply scale down the limits after installing these functions?
> Existing string templates can be re-written as wrappers for using string
> functions, functionality wouldn't even be broken, we would have lower limits
> for whats possible using templates and functions but we would have more
> powerful and sane functions provided. They could be used in a sane way as they
> are being used right now with less load on the servers.

Template limits are not just hit using string functions, indeed they're not even the major cause.  The citation templates used on a large article consume much more of the template resources than string functions, as well as stupid things like the innumerable {{SubatomicParticle}} calls (and their endless subtemplates) on [[List of baryons]] etc.  Reducing the template limits would break all these cases, and they're not scenarios which could be 'fixed' with proper string functions.
Comment 137 Dan Wolff 2011-11-17 16:28:16 UTC
A solution would be to define how expensive a parser function is, and set the string functions as "expensive" while not changing anything else. That way, other parser functions would work as they currently do, while we get the power of string functions; you just can't use as many of them.

(I think something like this is already in place for some parser functions - not sure though)
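
To make the "expensive" idea concrete, here is a minimal sketch in Python rather than PHP. It is not MediaWiki's actual implementation: the class name, the function name, and the error text are all made up for illustration. Each page render gets a budget, every string function call is charged against it, and calls over the limit return an error instead of doing the work.

```python
class ExpensiveBudget:
    """Hypothetical per-page counter for calls marked as expensive."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self) -> bool:
        """Record one expensive call; return True while within the limit."""
        self.used += 1
        return self.used <= self.limit


def expensive_replace(budget: ExpensiveBudget, text: str, old: str, new: str) -> str:
    """A stand-in for a string function that counts against the budget."""
    if not budget.charge():
        # Over the limit: refuse to do the work and surface an error instead.
        return "error: too many expensive string function calls"
    return text.replace(old, new)


budget = ExpensiveBudget(limit=2)
print(expensive_replace(budget, "a-b", "-", "_"))  # within the limit
print(expensive_replace(budget, "c-d", "-", "_"))  # within the limit
print(expensive_replace(budget, "e-f", "-", "_"))  # third call exceeds the limit
```

Other parser functions would simply never call `charge()`, so their behaviour stays as it is.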
Comment 138 Rich Farmbrough 2011-12-30 17:07:15 UTC
There is, but we have seen no evaluation of "expensive" - the result is that essential stuff is split over several pages...

Cite templates would benefit enormously from parser functions: instead of jumping through hoops, simple tests could be made about whether something already has a full stop at the end or not.

The bug should be changed from WONTFIX. It is being kept at that status because of a stray comment on a mailing list years ago, even though at Wikimania 2011 Tim was undecided as to which solution (parser functions, scripting language or Victor's extension) was best. 
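
As a rough illustration of the kind of test meant here, sketched in Python (the function name is invented, and this is not how any existing cite template works): append a full stop only when the text does not already end with one.

```python
def with_full_stop(text: str) -> str:
    """Return text ending in exactly one full stop, adding it only if missing."""
    stripped = text.rstrip()
    if stripped.endswith("."):
        return stripped
    return stripped + "."


print(with_full_stop("Nature"))         # a full stop is appended
print(with_full_stop("Proc. R. Soc."))  # already ends with one, left unchanged
```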

Really I have looked at all three, ANY ONE WILL DO. And if you change your mind from parser functions to one of the others, I WILL PERSONALLY MIGRATE ALL TEMPLATES TO THE NEW SOLUTION. 

I am re-opening this bug.  Please do not casually re-close it.
Comment 139 Ted Kandell 2011-12-31 03:40:25 UTC
Finally, some common sense here.

There are a huge number of templates that now do pretty much everything. My personal interest is in displaying trees and phylogenies. These are incredibly hard to edit now, not even worth it. I've tried to edit genealogical trees, and have given up, because the "presentation" is mixed up with the data. My browser would crash before I could even get part of it right by repeated experimentation. 

"Expensive"? All of these "hoops" that everyone has to go through to validate templates without any sort of parser functions really have a collective impact on MediaWiki and Wikipedia. "NO solution" is much, much worse than an attempt at a "bad solution". 

I don't think anyone even realizes the *lack* of editing by knowledgeable people that is taking place, because of the sheer difficulty in editing data that is not text or inline images. There's a price here, and it isn't whether "this or that implementation of trim()" regular expressions is more or less efficient.

It's been 5 1/2 years since this bug was first opened. 
Maybe someone can get moving on it before a decade has passed?
Comment 140 Daniel Friesen 2011-12-31 04:11:05 UTC
Are string functions "really" the solution to the difficulty of editing specialized data?

To me that sounds like a really horrible solution that won't actually solve the issue. If data really is complex then string functions sound like something that will only allow a change to 'another' string based data format that will still be too complex for the knowledgeable people to edit.

I'd like to see some of those complex data formats. I'm pretty sure that for most of them what they really need is specialized code in a proper programming language to create a format that knowledgeable people can actually understand. And perhaps even a UI on top to make that possible.
Comment 141 Jim Craigie 2011-12-31 04:34:05 UTC
String functions certainly are a solution to the problem that brought me here - attempting to construct a template to create the slightly unusual URLs used by an external site, which requires replacing each instance of a non-alphanumeric character by an underbar. Easy to do with {{#replace:}}
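
For readers who have not used {{#replace:}}: the transformation described above amounts to something like the following Python sketch. The function name is made up, and "alphanumeric" is assumed to mean ASCII letters and digits, which may not match what the external site expects.

```python
import re


def external_site_slug(title: str) -> str:
    """Replace every non-alphanumeric (ASCII) character with an underscore."""
    return re.sub(r"[^A-Za-z0-9]", "_", title)


print(external_site_slug("St. John's Wood"))
```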

I was horrified to discover that a perfectly good solution has been implemented but its activation is being blocked for reasons I still cannot understand.
Comment 142 Bawolff (Brian Wolff) 2011-12-31 04:55:28 UTC
This is pointless. Can we stop beating the dead horse already?
Comment 143 Ted Kandell 2011-12-31 05:31:37 UTC
Yes, string functions are *a* solution, one that can work right now. 
Why? How would you implement parsing of say a Newick file, or any specialized data format that you didn't know about yourself, beforehand? 

There are hundreds of such data formats. Some may be very useful for common sorts of representations in Wikipedia. Will we have to open a bug for each and every one, then hardcode a parser for it, then have someone update that parser whenever a slight change in the format comes out? Or would you rather just implement AJAX and Java instead? 

BTW, how complex is it to parse a phylogenetic tree format which merely uses nested parentheses, and then display it, when these can be copied from anywhere?
http://en.wikipedia.org/wiki/Newick_format

The point is that often data in these specialized formats *already exists* out there, somewhere, and just needs to be displayed. 

If you mean "stop beating a dead horse and just release these functions" I say yes. But if you mean "stop asking for them, you'll never ever get them, forget it  ... "
Comment 144 Ted Kandell 2011-12-31 05:38:11 UTC
Examples?
Here is the complete grammar for the "complex specialized" Newick format:

The grammar rules

Note, "|" separates alternatives.

   Tree --> Subtree ";" | Branch ";"
   Subtree --> Leaf | Internal
   Leaf --> Name
   Internal --> "(" BranchSet ")" Name
   BranchSet --> Branch | BranchSet "," Branch
   Branch --> Subtree Length
   Name --> empty | string
   Length --> empty | ":" number

Examples:

(,,(,));                               no nodes are named
(A,B,(C,D));                           leaf nodes are named
(A,B,(C,D)E)F;                         all nodes are named
(:0.1,:0.2,(:0.3,:0.4):0.5);           all but root node have a distance to parent
(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;       all have a distance to parent
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);       distances and leaf names (popular)
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;     distances and all names
((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;    a tree rooted on a leaf node (rare)
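
To give a feel for the size of the task, here is a sketch of a recursive-descent parser for the grammar above, written in Python as an ordinary program (not as a wikitext template). The `Node` class and the function names are invented for this example, and it makes simplifying assumptions: no quoted names, no comments, and no whitespace between tokens.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse one tree following the Tree/Subtree/Branch grammar quoted above."""
    text = text.strip()
    pos = 0
    delimiters = "(),:;"

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def read_token() -> str:
        # Consume characters up to the next delimiter (may be empty).
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in delimiters:
            pos += 1
        return text[start:pos]

    def parse_branch() -> Node:
        # Branch --> Subtree Length
        nonlocal pos
        node = parse_subtree()
        if peek() == ":":
            pos += 1
            node.length = float(read_token())
        return node

    def parse_subtree() -> Node:
        # Subtree --> Leaf | Internal, where Internal is "(" BranchSet ")" Name
        nonlocal pos
        children: List[Node] = []
        if peek() == "(":
            pos += 1
            children.append(parse_branch())
            while peek() == ",":
                pos += 1
                children.append(parse_branch())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        return Node(name=read_token(), children=children)

    root = parse_branch()
    if peek() != ";":
        raise ValueError(f"expected ';' at position {pos}")
    return root


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one node per line."""
    label = node.name or "(unnamed)"
    if node.length is not None:
        label += f" [{node.length}]"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"))))
```

Whether something equivalent is practical with string functions alone is exactly what is being argued in this bug; the sketch only shows how small the grammar is.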
Comment 145 Ted Kandell 2011-12-31 05:40:53 UTC
Here is an example of a current genealogical tree, using templates:

http://fr.wikipedia.org/wiki/Rachi#G.C3.A9n.C3.A9alogie

=== Généalogie ===
<center>
{{Arbre généalogique/début|style=font-size:75%;}}
{{Arbre généalogique | SAM | | | | | | | | | | | | | | |RSH| | | |SAM=Samuel|RSH='''Rachi ([[1040]]-[[1104]])'''}}
{{Arbre généalogique | |!| | | | |,|-|-|-|-|-|-|-|v|-|-|-|^|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|.}}
{{Arbre généalogique |SMH| | |RHL|-|AZR| |KVD|v|RMB| | | | | | | | |SMA| | |MRM|v|YBN||SMH=Simha ben Samuel de Vitry| RHL=Rachel ''Bellassez''| AZR=Eliézer ''Jocelyn'' | KVD=Yokheved | RMB=Meïr ben Samuel | MRM=Myriam | YBN=Judah ben Nathan | SMA=Shémaiah}}
{{Arbre généalogique | |!| | | |,|-|-|-|v|-|-|-|v|-|-|^|-|-|-|-|v|-|-|-|.| | | |!| | | | |,|-|^|-|.}}
{{Arbre généalogique |SAM|v|HAN| |SLM| |RTM|v|MRM| |RVM| |SBM|v|INC| | |YTV| |AZR|SAM=Samuel de Vitry|HAN=Hanna| SLM=Salomon |RTM=[[Rabbénou Tam]] (~[[1100]]-[[1171]])|MRM=Myriam |RVM=Isaac Rivam|SBM=Samuel [[Rashbam]] (~[[1085]]-[[1158]])|INC=?|YTV=Yom Tov de Falaise|AZR=Eléazar }}
{{Arbre généalogique | | | |!| | | | | |,|-|-|-|v|-|^|-|v|-|-|-|.| | | | | |!| | | |,|-|-|^|-|-|-|.}}
{{Arbre généalogique | | |RI| | | |ITS| |SLM| |MSH| |ISF| | | |ITS | |YHD| | | | |ISF|RI=[[Isaac ben Samuel de Dampierre|Isaac de Dampierre]] dit le Ri (~[[1120]]-[[1195]])|ITS=Isaac|SLM=Salomon|MSH=Moïse|ISF=Joseph| YHD=Judah | ITS=Isaac}}
{{Arbre généalogique | | | |!| | | | | | | | | | | | | | | | | | | | | | | | |,|-|-|^|.| | | |,|-|^|.}}
{{Arbre généalogique | | |HNN| | | | | | | | | | | | | | | | | | | | | | |ITS| |AZR|v|BLA| |LAH|HNN=Elhanan (mort [[1184]])|ITS=Isaac|AZR=Eléazar  | BLA=Bila | LAH=Léah}}
{{Arbre généalogique | | | |!| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |!| | | | | }}
{{Arbre généalogique | | |SML| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |YHD| SML=Samuel|YHD=Judah de Paris Sir Léon ([[1166]]-[[1224]])}}
{{Arbre généalogique/fin}}
</center>
Comment 146 Ted Kandell 2011-12-31 05:48:42 UTC
http://fr.wikipedia.org/wiki/Rachi#G.C3.A9n.C3.A9alogie

In the above tree, I need to add a father for Shémaiah and make Shémaiah the father of Eliézer Jocelyn. 

That should be a simple change using the above templates, right? 

No.

= "Lack of string functions"

In the Newick format it would take 1 second, and there are tools to create and edit such files. 

Now think of the hundreds of other easy-to-parse useful standard data formats ...
Comment 147 Daniel Friesen 2011-12-31 06:10:07 UTC
This Newick format and that genealogy stuff look like a perfect example of what string functions will NOT solve.

String functions are for simple text replacements and tests. What are you going to do, write a whole Newick parser in string functions? If (and that's a big if, given that we don't have variables inside WikiText) you can manage to implement Newick parsing inside of a template, that template is going to be insanely complex: minor tweaks to the template that would be sane in a normal programming language are going to become so hard they're nearly impossible. And to top it off, that template is going to be so heavy that it slows down parsing for every page you use it on (multiplied by how much you use it and how much data you input).

If we actually have a use for it, then what it sounds like we need is a real Newick parser. The same goes for whatever other formats there are for things that are in fact useful to Wikipedia. Yes, there are hundreds of formats, but when we talk about Wikipedia and implementation we only care about the ones that will output things we want on Wikipedia, and within that only the few formats we actually need. We don't have to implement parsing for dozens of formats that do the same thing when there's one format most people can use that'll work.
But I would also like to make the point that I can't consider the output of those genealogy templates acceptable. It's horrible, absolutely disgusting. A complete abuse of HTML tables in a presentational way. I don't want to see a new template that outputs the same garbage. Not only do those need a better system of inputting the information, they need a better output. Something you can't do in templates, because it likely involves building an .svg or something.

From what I see of Newick and your example, your argument also falls short. Newick seems to describe trees that only branch outwards. But that genealogy tree appears to re-connect at various points. In other words, it looks like your example tree actually CAN'T be expressed in Newick.

Frankly, it looks like you could use DOT. Wonder what happened to graphviz in all this.
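
To illustrate the point about re-connecting branches, here is a small Python sketch (function names are made up, and the three edges are only an illustrative subset of the tree in comment 145). A structure in which some node has two parents is not a tree in the Newick sense, but it can still be written out as a DOT digraph.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]  # (parent, child)


def nodes_with_multiple_parents(edges: List[Edge]) -> List[str]:
    """Return the children that appear under more than one parent."""
    counts = Counter(child for _parent, child in edges)
    return sorted(child for child, n in counts.items() if n > 1)


def to_dot(edges: List[Edge]) -> str:
    """Emit the parent-to-child edges as a DOT digraph."""
    lines = ["digraph genealogy {"]
    lines += [f'  "{parent}" -> "{child}";' for parent, child in edges]
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("Rachi", "Yokheved"),
    ("Yokheved", "Rashbam"),
    ("Meïr ben Samuel", "Rashbam"),
]
print(nodes_with_multiple_parents(edges))  # non-empty, so this is not a plain tree
print(to_dot(edges))
```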
Comment 148 John Du Hart 2011-12-31 06:11:48 UTC
I'm reclosing as WONTFIX.

It's very clear that we're going to have a new solution in the next year to handle these situations, whether it be Lua, built-in JavaScript or an extension to handle cite templates. Whatever the fix is, I think the developers have made the point that string functions simply won't be enabled.

Therefore, the bug's original request of setting $wgPFEnableStringFunctions = true on Wikimedia wikis will not happen. Hence, WONTFIX.

Please don't change this unless you are a developer.

(In reply to comment #142)
> This is pointless. Can we stop beating the dead horse already?

Agreed.
Comment 149 Rich Farmbrough 2012-02-24 15:35:22 UTC
It's not about cite templates alone.  And I'm glad you say "we're going to have a new solution in the next year" - Lua has been committed to but it was also the proposed solution back in 2009, and I think it's rather "I will believe it when I see it".

I don't think injunctions like "Please don't change this unless you are a developer." are very cool. It is quite possible that Lua will be decided against (as it was before) and then this should be re-opened. 

And the previous "WONTFIX - please don't change" was predicated on a stray remark by Tim Starling on a mailing list. At Wikimania 2011 Tim changed his mind several times on the best solution, including Lua, parser functions, and Victor's scripting extension. 

Or maybe we should change this bug to "provide some form of string handling, and soon" because otherwise we might have Lua kicking around for another 6 years and still be no further forward.
Comment 150 Happy-melon 2012-02-24 15:57:24 UTC
(In reply to comment #149)
> I don't think injunctions like "Please don't change this unless you are a
> developer." are very cool. It is quite possible that Lua will be decided
> against (as it was before) and then this should be re-opened. 

Those two comments are in no way exclusive: a developer would be best placed to know if any change occurs to the commitment to Lua.  Although as has been said before, the existence of an alternative is not a prerequisite for WONTFIXing this.
 
> Or maybe we should change this bug to "provide some form of string handling,
> and soon" because otherwise we might have Lua kicking around for another 6
> years and still be no further forward.

That would be bug 26092, of which this bug is a dependency.  Everyone knows that this is an open, important and complicated issue; if changing the title of a bug were all it took to magically untie the Gordian knot, we would have done it by now.
Comment 151 Victor Vasiliev 2012-02-25 11:09:05 UTC
(In reply to comment #149)
> I don't think injunctions like "Please don't change this unless you are a
> developer." are very cool. It is quite possible that Lua will be decided
> against (as it was before) and then this should be re-opened. 

No, a proper scripting language is certainly preferred to string functions in wikitext, and I cannot imagine what would have to happen for us to reconsider this.

> And the previous "WONTFIX - please don't change" was predicated on a stray
> remark by Tim Starling in a mail list. At WikiMania 2011 Tim changed his mind
> several times on the best solution including Lua, parser functions, Victor's
> scripting extension. 

Lua is an almost-final choice, made by consensus of WMF developers. Even if we change the language, the current plan is to develop infrastructure which is language-independent (so we can just plug in a different language backend without rewriting anything else).

> Or maybe we should change this bug to "provide some form of string handling,
> and soon" because otherwise we might have Lua kicking around for another 6
> years and still be no further forward.

It's a WMF engineering project now, and as far as I am aware, active work on it should begin shortly after the 1.19 deployment and the Git migration.
Comment 152 Dmitriy Sintsov 2012-02-27 07:55:47 UTC
Victor, I am off from my extension's developing due to various problems, however may I ask you to give an address of the page where the scripting project status will be updated, please? One of my extensions already needs a strong scripting language, and I wonder whether it is already possible to hook or bind Lua calls in a separate MW extension. Basically, I need a few custom Lua functions bound to PHP methods in the extension's code, and the ability to execute Lua scripts that use these function calls.
Comment 153 Helder 2012-02-27 09:56:41 UTC
(In reply to comment #152)
> Victor, I am off from my extension's developing due to various problems,
> however may I ask you to give an address of the page where the scripting
> project status will be updated, please?
I think that would be
https://www.mediawiki.org/wiki/Lua_scripting/status
Comment 154 MZMcBride 2013-03-25 07:01:47 UTC
Just noting here in a comment that bug 26092 ("Enable or install string parsing wikimarkup functionality on WMF wikis") has now been marked resolved/fixed. Wikimedia wikis now all have proper string functions (but not StringFunctions) via [[mw:Extension:Scribunto]] and [[mw:Lua]]. :-)

Thanks to Brad Jorsch, Tim Starling, Victor Vasiliev, and many others (including the template writers who are now embarking on a massive upgrade) for all of their past, present, and future work on this. It seems we're now on the cusp of greatly reducing parse times of pages, which is really awesome.

Related bugs:

* bug 26786 – Add functionality (in an extension or MediaWiki) and implement to make English Wikipedia's [[Template:Cite]] work faster

* bug 19262 – Pages with a high number of templates suffer extremely slow rendering or read timeout for logged in users
