Last modified: 2011-04-15 10:08:18 UTC
Created attachment 5251 [details] unified diff file This patch allows more number formats to be detected properly when sorting tables. Specifically, numbers beginning with "-" or "+", or containing "e" or "E" are handled properly. This patch requires that the multiple "if" statements in this particular section also be changed to "if" and "if else" statements. Otherwise, the regexp for numeric values will falsely detect some values that are actually dates.
Do all browsers support all of these formats? If some browsers don't support exponential formats, it might be best not to let those through. If they've all supported them since the beginning, though, which is quite possible, it's fine.
Javascript is based on the ECMA standard. It shouldn't matter what browser you use. This has nothing to do with character entities or text encoding or anything like that.
I just want to re-emphasize that this patch requires some changes made in one of the earlier patches (and not included here) in order to work.
(In reply to comment #2) > Javascript is based on the ECMA standard. It shouldn't matter what browser you > use. This has nothing to do with character entities or text encoding or > anything like that. Yeah, theory is a wonderful thing . . . it's probably correct on this point, I'm just being paranoid. (In reply to comment #3) > I just want to re-emphasize that this patch requires some changes made in one > of the earlier patches (and not included here) in order to work. Can you post a full patch that will implement this by itself? Or at least indicate exactly which other patch is needed, and which lines of that patch?
Created attachment 5256 [details] unified diff file Sure, here you go.
The code (the new and the original) searches for the percent sign (%) if it exists. I was wondering if there were any other symbols that should be searched for? I can't think of any off the top of my head, besides maybe the degrees symbol (°). Actually, the plus-minus symbol (±) might be a good thing to catch, too. The ¼ ½ and ¾ symbols could easily be replaced with decimal values. ² and ³ are a bit tougher. They would need to be evaluated as expressions in addition to being replaced.
The tricky thing is if these ANSI codes represent something different in other character encodings.
Created attachment 5257 [details] unified diff file Added support for additional mathematics/science symbols as outlined in my previous post.
Created attachment 5259 [details] unified diff file Slight tweak.
(In reply to comment #8) > Added support for additional mathematics/science symbols as outlined in my > previous post. I think this goes much too far. We don't need to write a mini-parser here. The treatment of ± is outright wrong, too: it makes no sense to try sorting such a thing relative to numbers known to be positive or negative. I'm going to look at attachment 5256 [details] instead: >- sortfn = ts_sort_caseinsensitive; >+ var sortfn = ts_sort_caseinsensitive; > if (itm.match(/^\d\d[\/. -][a-zA-Z]{3}[\/. -]\d\d\d\d$/)) >- sortfn = ts_sort_date; >- if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d\d\d$/)) >- sortfn = ts_sort_date; >- if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d$/)) >- sortfn = ts_sort_date; >- if (itm.match(/^[\u00a3$\u20ac]/)) // pound dollar euro >- sortfn = ts_sort_currency; >- if (itm.match(/^[\d.,]+\%?$/)) >- sortfn = ts_sort_numeric; >+ var sortfn = ts_sort_date; >+ else if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d\d\d$/)) >+ var sortfn = ts_sort_date; >+ else if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d$/)) >+ var sortfn = ts_sort_date; >+ else if (itm.match(/^[\u00a3$\u20ac]/)) // pound dollar euro >+ var sortfn = ts_sort_currency; >+ else if (itm.match(/^[\d.,-+eE]+\%?$/)) >+ var sortfn = ts_sort_numeric; Why are you adding "var" before each sortfn assignment? That should only be needed on the initial assignment. Even if not, *please* keep one thing to one patch. Same for if -> else if. The substantive change here *seems* to be one line, but I'd prefer not to have to look through ten times as many lines to find it: >- if (itm.match(/^[\d.,]+\%?$/)) >+ if (itm.match(/^[\d.,-+eE]+\%?$/)) Note that this seems incorrect, "-" has a special meaning here and should be at the end of the character class. I've committed this line's change as r40348, with the character class reordered so "-" is at the end. I've also committed the stylistic changes *separately* as r40349 -- except with only the first variable declaration marked local, not all of them. If there's some reason to mark all of them with "var" separately, let me know.
(In reply to comment #10) > (In reply to comment #8) > > Added support for additional mathematics/science symbols as outlined in my > > previous post. > I think this goes much too far. We don't need to write a mini-parser here. > The treatment of ± is outright wrong, too: it makes no sense to try sorting > such a thing relative to numbers known to be positive or negative. I don't think it's overkill. There are only a few ANSI characters used in mathematics that aren't already understood by javascript. As for the ± symbol, I wouldn't expect it to present a problem the way it is typically used. > I'm going to look at attachment 5256 [details] instead: > >- sortfn = ts_sort_caseinsensitive; > >+ var sortfn = ts_sort_caseinsensitive; > > if (itm.match(/^\d\d[\/. -][a-zA-Z]{3}[\/. -]\d\d\d\d$/)) > >- sortfn = ts_sort_date; > >- if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d\d\d$/)) > >- sortfn = ts_sort_date; > >- if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d$/)) > >- sortfn = ts_sort_date; > >- if (itm.match(/^[\u00a3$\u20ac]/)) // pound dollar euro > >- sortfn = ts_sort_currency; > >- if (itm.match(/^[\d.,]+\%?$/)) > >- sortfn = ts_sort_numeric; > >+ var sortfn = ts_sort_date; > >+ else if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d\d\d$/)) > >+ var sortfn = ts_sort_date; > >+ else if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d$/)) > >+ var sortfn = ts_sort_date; > >+ else if (itm.match(/^[\u00a3$\u20ac]/)) // pound dollar euro > >+ var sortfn = ts_sort_currency; > >+ else if (itm.match(/^[\d.,-+eE]+\%?$/)) > >+ var sortfn = ts_sort_numeric; > Why are you adding "var" before each sortfn assignment? That should only be > needed on the initial assignment. Even if not, *please* keep one thing to one > patch. Same for if -> else if. The substantive change here *seems* to be one > line, but I'd prefer not to have to look through ten times as many lines to > find it: First of all, the statements are preceded by var because they *are* the first instance. There's no sense in defining it an extra time just to eliminate some characters when it's not necessary. Secondly you specifically asked me to include the additional code needed for this patch to work. I quote: > Can you post a full patch that will implement this by itself? Or at least > indicate exactly which other patch is needed, and which lines of that patch? > >- if (itm.match(/^[\d.,]+\%?$/)) > >+ if (itm.match(/^[\d.,-+eE]+\%?$/)) > Note that this seems incorrect, "-" has a special meaning here and should be at > the end of the character class. I've committed this line's change as r40348, > with the character class reordered so "-" is at the end. Good catch. > I've also committed the stylistic changes *separately* as r40349 -- except with > only the first variable declaration marked local, not all of them. If there's > some reason to mark all of them with "var" separately, let me know. As I explained, the "var"s are necessary, otherwise you are creating a *global* variable within each condition. This is unneeded additional overhead.
(In reply to comment #11) > (In reply to comment #10) > > Why are you adding "var" before each sortfn assignment? That should only be > > needed on the initial assignment. Even if not, *please* keep one thing to one > > patch. Same for if -> else if. The substantive change here *seems* to be one > > line, but I'd prefer not to have to look through ten times as many lines to > > find it: > First of all, the statements are preceded by var because they *are* the first > instance. There's no sense in defining it an extra time just to eliminate some > characters when it's not necessary. Secondly you specifically asked me to > include the additional code needed for this patch to work. I quote: Ah, I see my error now. The block of code is supposed to look like this: >+ if (itm.match(/^\d\d[\/. -][a-zA-Z]{3}[\/. -]\d\d\d\d$/)) >+ var sortfn = ts_sort_date; >+ else if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d\d\d$/)) >+ var sortfn = ts_sort_date; >+ else if (itm.match(/^\d\d[\/.-]\d\d[\/.-]\d\d$/)) >+ var sortfn = ts_sort_date; >+ else if (itm.match(/^[\u00a3$\u20ac]/)) // pound dollar euro >+ var sortfn = ts_sort_currency; >+ else if (itm.match(/^[\d.,+eE-]+\%?$/)) >+ var sortfn = ts_sort_numeric; >+ else >+ var sortfn = ts_sort_caseinsensitive;
(In reply to comment #11) > I don't think it's overkill. There are only a few ANSI characters used in > mathematics that aren't already understood by javascript. Why just [[Latin-1]] characters (more proper term than "[[ANSI]]") and not Unicode generally? It's a slippery slope here, especially ² and ³. When is anyone ever going to square tabulated data? ± might be worth it, but Unicode fractions are rarely used, and if you support any you should support all, not just the Latin-1 ones. % might be useful, but it's already treated more or less correctly by just stripping it: the only issue would be if people mix percents and normal numbers in the same table, which is weird. > First of all, the statements are preceded by var because they *are* the first > instance. There's no sense in defining it an extra time just to eliminate some > characters when it's not necessary. No, they aren't the first instance. To wit: var sortfn = ts_sort_caseinsensitive; That's executed unconditionally at the start. > Secondly you specifically asked me to > include the additional code needed for this patch to work. I quote: By that I meant include additional code needed for this patch to *work*, not additional code needed for the patch to *apply* in its unduly modified form. Of course your original patch was one line, but I didn't remember that when I wrote my last comment, what with the other ten patches by you I've been looking at. I thought you meant a meaningful dependency, not a formatting dependency, which is why I said it should be included in the patch. Patches should be *formatted* to apply against current trunk. (In reply to comment #12) > Ah, I see my error now. The block of code is supposed to look like this: Which is one line plus 4*5 = 20 characters longer. :)
(In reply to comment #13) > (In reply to comment #11) > > I don't think it's overkill. There are only a few ANSI characters used in > > mathematics that aren't already understood by javascript. > Why just [[Latin-1]] characters (more proper term than "[[ANSI]]") and not > Unicode generally? It's a slippery slope here, especially ² and ³. When is > anyone ever going to square tabulated data? ± might be worth it, but Unicode > fractions are rarely used, and if you support any you should support all, not > just the Latin-1 ones. % might be useful, but it's already treated more or > less correctly by just stripping it: the only issue would be if people mix > percents and normal numbers in the same table, which is weird. Yes, the powers of 2 and 3 were maybe overkill (and lead to some false results). I'll post a trimmed-down version. The fractions ½ ¼ ¾ require only simple substitutions, no evaluation of any kind. Percentages are also an easy case. > > First of all, the statements are preceded by var because they *are* the first > > instance. There's no sense in defining it an extra time just to eliminate some > > characters when it's not necessary. > No, they aren't the first instance. To wit: > var sortfn = ts_sort_caseinsensitive; > That's executed unconditionally at the start. Hence my correction. > > Secondly you specifically asked me to > > include the additional code needed for this patch to work. I quote: > By that I meant include additional code needed for this patch to *work*, not > additional code needed for the patch to *apply* in its unduly modified form. > Of course your original patch was one line, but I didn't remember that when I > wrote my last comment, what with the other ten patches by you I've been looking > at. I thought you meant a meaningful dependency, not a formatting dependency, > which is why I said it should be included in the patch. Patches should be > *formatted* to apply against current trunk. The changes *were* required in order for it to work. In its original form, the numerical search might have picked up a date when it wasn't supposed to. > > Ah, I see my error now. The block of code is supposed to look like this: > Which is one line plus 4*5 = 20 characters longer. :) You're missing the point. Declaring the variable at the beginning results in an additional assignment and an (expensive) additional lookup in global scope in 5 out of the 6 cases.
Created attachment 5271 [details] unified diff file Scaled-down version of the last patch.
Also, you retained a bug from the previous version when you committed the change. In /^[\d.,eE+-]+\%?$/ the e and E may appear at the beginning of the text, which is not a number.
Created attachment 5272 [details] unified diff file The previous versions of the code were picking up a lot of (admittedly unlikely) false positives. Hopefully I've eliminated them. I'll also upload the test script I've been using just in case.
Created attachment 5273 [details] unified diff file Forgot something in the last version.
Created attachment 5274 [details] test script Here's the test script I've been using to eliminate errors.
Created attachment 5275 [details] unified diff file Cought something else.
Created attachment 5276 [details] test script Update to the test script.
Please make all new patches against SVN trunk. Your current patches wouldn't have applied (even if I wanted to apply them) because of r40348 and r40349. Before you make changes to submit a patch, right-click on wikibits.js or the directory and choose "Update" or something like that, to make sure the copy you're patching against is up-to-date. (In reply to comment #14) > The fractions ½ ¼ ¾ require only > simple substitutions, no evaluation of any kind. How about the fractions ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞? And you need to consider "1 ½", "1-½", etc. Nobody even uses these, [[WP:MOS]] and most other guides recommend against them. It adds bloat for little gain here. > The changes *were* required in order for it to work. In its original form, the > numerical search might have picked up a date when it wasn't supposed to. Well, you should *say* these things, so that I don't make incorrect assumptions about what your patch is supposed to do. It wasn't obvious. > You're missing the point. Declaring the variable at the beginning results in an > additional assignment and an (expensive) additional lookup in global scope in 5 > out of the 6 cases. You're missing the point. *Performance does not matter here.* This piece of code is run *once* for every time the user clicks the button. You're talking probably under a microsecond's difference for each *click*. The physical process of clicking the mouse button takes thousands of times as long. Stop worrying about performance when you don't need to. It makes code uglier and harder to understand and maintain. (In reply to comment #16) > Also, you retained a bug from the previous version when you committed the > change. > > In /^[\d.,eE+-]+\%?$/ the e and E may appear at the beginning of the text, > which is not a number. There are a lot more possible problems than that, in fact. + and - can also appear anywhere, you can have multiple dots, etc. It doesn't support hex. I've created a new regex based on the ECMAScript standard: http://www.ecma-international.org/cgi-bin/counters/unicounter.pl?name=Ecma-262&deliver=http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf I've committed it in r40379 (with a fix in r40382), so there should be nothing incorrect detected as numbers anymore.
(In reply to comment #22) > Please make all new patches against SVN trunk. Your current patches wouldn't > have applied (even if I wanted to apply them) because of r40348 and r40349. > Before you make changes to submit a patch, right-click on wikibits.js or the > directory and choose "Update" or something like that, to make sure the copy > you're patching against is up-to-date. > (In reply to comment #14) > > The fractions ½ ¼ ¾ require only > > simple substitutions, no evaluation of any kind. > How about the fractions ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞? And > you need to consider "1 ½", "1-½", etc. Nobody even uses these, [[WP:MOS]] > and most other guides recommend against them. It adds bloat for little gain > here. It seems to be used quite frequently, and is even included in the JavaScript character insertion toolbox of the edit page. > > The changes *were* required in order for it to work. In its original form, the > > numerical search might have picked up a date when it wasn't supposed to. > Well, you should *say* these things, so that I don't make incorrect assumptions > about what your patch is supposed to do. It wasn't obvious. I did say so in message #5. > > You're missing the point. Declaring the variable at the beginning results in an > > additional assignment and an (expensive) additional lookup in global scope in 5 > > out of the 6 cases. > You're missing the point. *Performance does not matter here.* This piece of > code is run *once* for every time the user clicks the button. You're talking > probably under a microsecond's difference for each *click*. The physical > process of clicking the mouse button takes thousands of times as long. > Stop worrying about performance when you don't need to. It makes code uglier > and harder to understand and maintain. Might I remind you that my previous optimizations were similarly labeled trivial, yet I was able to achieve a twenty-fold increase in javascript performance. Also, lag in user interface elements is particularly insidious. > (In reply to comment #16) > > Also, you retained a bug from the previous version when you committed the > > change. > > > > In /^[\d.,eE+-]+\%?$/ the e and E may appear at the beginning of the text, > > which is not a number. > There are a lot more possible problems than that, in fact. + and - can also > appear anywhere, you can have multiple dots, etc. It doesn't support hex. Yes, hence my subsequent patches. I feel your lack of keeping current with the latest changes to be suspect. > I've created a new regex based on the ECMAScript standard: > http://www.ecma-international.org/cgi-bin/counters/unicounter.pl?name=Ecma-262&deliver=http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf > I've committed it in r40379 (with a fix in r40382), so there should be nothing > incorrect detected as numbers anymore. There are several issues with your changes: 1 commas can appear at the beginning of a number 2 multiple commas may appear in a sequence or separated by fewer than three digits 3 multiple periods may appear in a single number 4 commas may appear after a period 5 sequences surrounded by parentheses are stored as matches when they don't need to be All of which I have resolved in my latest changes.
(In reply to comment #22) > Please make all new patches against SVN trunk. Your current patches wouldn't > have applied (even if I wanted to apply them) because of r40348 and r40349. > Before you make changes to submit a patch, right-click on wikibits.js or the > directory and choose "Update" or something like that, to make sure the copy > you're patching against is up-to-date. > (In reply to comment #14) > > The fractions ½ ¼ ¾ require only > > simple substitutions, no evaluation of any kind. > How about the fractions ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞? And > you need to consider "1 ½", "1-½", etc. Nobody even uses these, [[WP:MOS]] > and most other guides recommend against them. It adds bloat for little gain > here. Forgot to add a link to support the frequency of the character's usage: http://www.google.com/search?hl=en&q=site%3Aen.wikipedia.org+%C2%BD&aq=f&oq=site%3Aen.wikipedia.org+%C2%BD%22 Also, I don't think "1 ½" "1-½" are valid. > There are a lot more possible problems than that, in fact. + and - can also > appear anywhere, you can have multiple dots, etc. It doesn't support hex. > I've created a new regex based on the ECMAScript standard: > http://www.ecma-international.org/cgi-bin/counters/unicounter.pl?name=Ecma-262&deliver=http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf > I've committed it in r40379 (with a fix in r40382), so there should be nothing > incorrect detected as numbers anymore. Having three separate regexps sharing multiple sequences also slows down the code.
(In reply to comment #23) > It seems to be used quite frequently, and is even included in the JavaScript > character insertion toolbox of the edit page. It still adds complication . . . I'm not sure. I'd like some more opinions on this. > I did say so in message #5. Providing a patch that also made changes that clearly weren't necessary for this to work (adding var), and where you didn't explain why changing "if" to "else if" was actually necessary -- which wasn't obvious at all. > Might I remind you that my previous optimizations were similarly labeled > trivial, yet I was able to achieve a twenty-fold increase in javascript > performance. Also, lag in user interface elements is particularly insidious. I'll reserve judgment on your other optimizations (not *all* of which anyone called trivial, only some of them) until I've reviewed them. Microseconds or less per click is not insidious, it's invisible. > Yes, hence my subsequent patches. I feel your lack of keeping current with the > latest changes to be suspect. "Suspect"? I'm a volunteer here. I'm kindly setting aside my own time to review your patches out of the goodness of my heart. It's pretty rude of you to question my commitment when I have no obligation to pay attention to you at all, and when in fact most patches to MediaWiki go totally unreviewed and are ignored for months or years. Your subsequent patches were totally unexplained, so I didn't realize what exactly you had done with them. If you had *pointed out* what problems they fixed, I might not have seen the need to make that comment. Besides, your subsequent patches make mistakes that mine didn't. They consider, for instance, the single string "." to be a number. > 1 commas can appear at the beginning of a number Yeah, I spotted that when re-reviewing one of your patches above. I've fixed that in r40611. > 2 multiple commas may appear in a sequence or separated by fewer than three > digits I'm not sure if that's good or bad. Is sorting something like "40,52" as the number 4052 better or worse than sorting it as a string? Sorting "," as 0 or NaN or whatever it evaluates to is definitely wrong, and I've fixed that. > 3 multiple periods may appear in a single number How? I don't see how that's possible with the regexes I committed. > 4 commas may appear after a period It doesn't hurt anything. I guess it would simplify the regex a bit if that didn't happen, but the commas will be stripped before evaluation anyway. > 5 sequences surrounded by parentheses are stored as matches when they don't > need to be Chucking in extra "?:" everywhere makes the code uglier for no good reason. (In reply to comment #24) > Also, I don't think "1 ½" "1-½" are valid. Why not? > Having three separate regexps sharing multiple sequences also slows down the > code. I'm simply not going to consider any claims of slowing down anything unless you can provide either benchmarks or a convincing theoretical argument for why the slowdown might be nontrivial. Some microoptimizations are worth making, but most aren't except in performance-critical code paths.
(In reply to comment #25) > (In reply to comment #23) > > I did say so in message #5. > Providing a patch that also made changes that clearly weren't necessary for > this to work (adding var), and where you didn't explain why changing "if" to > "else if" was actually necessary -- which wasn't obvious at all. I already explained earlier in this thread that the "var"s are necessary, otherwise you end up possibly overriding the regexps for the dates. > > Yes, hence my subsequent patches. I feel your lack of keeping current with the > > latest changes to be suspect. > "Suspect"? I'm a volunteer here. I'm kindly setting aside my own time to > review your patches out of the goodness of my heart. It's pretty rude of you > to question my commitment when I have no obligation to pay attention to you at > all, and when in fact most patches to MediaWiki go totally unreviewed and are > ignored for months or years. I am also a volunteer, and am no less obligated to submit patches to the software. Don't act like you're doing me a favor. > Your subsequent patches were totally unexplained, so I didn't realize what > exactly you had done with them. If you had *pointed out* what problems they > fixed, I might not have seen the need to make that comment. Each of my patches is accompanied by a comment as to the nature of the changes. They're easily enough compared with previous versions using the "diff" command that exists next to the link. > Besides, your > subsequent patches make mistakes that mine didn't. They consider, for > instance, the single string "." to be a number. No you're wrong. The only period in the regexp is escaped, and therefore only matches a literal dot. > > 1 commas can appear at the beginning of a number > Yeah, I spotted that when re-reviewing one of your patches above. I've fixed > that in r40611. > > 2 multiple commas may appear in a sequence or separated by fewer than three > > digits > I'm not sure if that's good or bad. Is sorting something like "40,52" as the > number 4052 better or worse than sorting it as a string? Sorting "," as 0 or > NaN or whatever it evaluates to is definitely wrong, and I've fixed that. It's been a couple of days since I worked on it, so I'm a bit hazy on why this might be undesirable. > > 3 multiple periods may appear in a single number > How? I don't see how that's possible with the regexes I committed. My mistake. > > 4 commas may appear after a period > It doesn't hurt anything. I guess it would simplify the regex a bit if that > didn't happen, but the commas will be stripped before evaluation anyway. > > 5 sequences surrounded by parentheses are stored as matches when they don't > > need to be Most of the speed benefits (see below) are a direct result of this optimization. > Chucking in extra "?:" everywhere makes the code uglier for no good reason. > (In reply to comment #24) > > Also, I don't think "1 ½" "1-½" are valid. > Why not? Authors would omit the space simply for the sake of readability. The minus sign in this case indicates and expression not a number, and expressions should not be picked up by the regexp. > > Having three separate regexps sharing multiple sequences also slows down the > > code. > I'm simply not going to consider any claims of slowing down anything unless you > can provide either benchmarks or a convincing theoretical argument for why the > slowdown might be nontrivial. Some microoptimizations are worth making, but > most aren't except in performance-critical code paths. I'll attach the demonstration file where you can compare my and your regular expressions side by side. After five trials the results were: 2237ms for the current method 1996ms for the current regexp plus the "?:" at the beginning of the parentheses 1943ms for the method in my latest patch
Created attachment 5309 [details] javascript demonstration Comparison of three methods of writing regular expressions. The regular expressions are tested for speed versus an identical input string. The results are sent to standard output.
(In reply to comment #26) > Each of my patches is accompanied by a comment as to the nature of the changes. > They're easily enough compared with previous versions using the "diff" command > that exists next to the link. Not an informative comment. Things like "Forgot something in the last version" and "Caught something else" don't tell me what the actual difference is. In any event, there's no point bickering about this. I'm telling you how to best get your patches reviewed. If you don't follow that advice -- although mostly you have followed it so far with regard to splitting patches up and so forth, and I thank you for that -- then you're going to drive away reviewers, including me, who are uninterested in spending the time to decipher, cut pieces out of, or reformat your patches when they could just write their own, or ignore the problem entirely. > > Besides, your > > subsequent patches make mistakes that mine didn't. They consider, for > > instance, the single string "." to be a number. > > No you're wrong. The only period in the regexp is escaped, and therefore only > matches a literal dot. Yes, that's my point. The literal string "." is counted as a number. Look at the regex: /^[+-]?(?:\d{1,3}(?:\,\d{3})*)*\.?\d*(?:[eE][+-]?\d+(?:\,\d{3})*)?[%\u00b0\u00bc\u00bd\u00be]?$/ This can match nothing: [+-]?(?:\d{1,3}(?:\,\d{3})*)* This can also match nothing: \d*(?:[eE][+-]?\d+(?:\,\d{3})*)?[%\u00b0\u00bc\u00bd\u00be]? The part in between is this: \.? and that can match the literal string ".". For that matter, your pattern also matches the empty string, since every piece is followed by either ? or *. > > > Also, I don't think "1 ½" "1-½" are valid. > > Why not? > > Authors would omit the space simply for the sake of readability. The minus sign > in this case indicates and expression not a number, and expressions should not > be picked up by the regexp. "1 ½" is the same as "1½", which is the same as "1-½" (the hyphen/dash there is not a minus sign, it just separates the fractional part from the integer part). Googling "1 ½" shows a mix of the three formats: http://www.google.com/search?q="1+½" > I'll attach the demonstration file where you can compare my and your regular > expressions side by side. After five trials the results were: > > 2237ms for the current method > 1996ms for the current regexp plus the "?:" at the beginning of the parentheses > 1943ms for the method in my latest patch Yes, if you *run the code a hundred thousand times*. It's not run 100,000 times. It's run *once* when the user clicks the sort button. The difference is 22.37 *microseconds* for my code and 19.43 *microseconds* for your code, per click. So in other words, correct me if I'm mistaking how this code is used, you're making changes to the code that save the user three microseconds when they click on the sort button. Am I misconstruing how often this code runs, or do you get my point that it makes absolutely no difference to the speed of the application? If I'm not wrong, we're basically down to the following differences between your version and mine: * Your patch treats fractions and percent signs more intelligently. I'm not sure if we really want to go in this direction. It's possibly scope creep, and there's no end to it. I'd like input from some more people on this, I'm not going to commit it by myself. * You're a little more picky about where commas go. Again, I'm not sure if this is needed or not. I won't do anything without further input from others. * Your version uses regexes that save maybe two microseconds per click. Unless I'm totally mistaken on how many times that code is executed, in which case I retract my remarks, I hope you agree we don't need to worry about complicating the regex over this.
(In reply to comment #28) > Yes, that's my point. The literal string "." is counted as a number. Look at > the regex: > /^[+-]?(?:\d{1,3}(?:\,\d{3})*)*\.?\d*(?:[eE][+-]?\d+(?:\,\d{3})*)?[%\u00b0\u00bc\u00bd\u00be]?$/ > This can match nothing: > [+-]?(?:\d{1,3}(?:\,\d{3})*)* > This can also match nothing: > \d*(?:[eE][+-]?\d+(?:\,\d{3})*)?[%\u00b0\u00bc\u00bd\u00be]? > The part in between is this: > \.? > and that can match the literal string ".". For that matter, your pattern also > matches the empty string, since every piece is followed by either ? or *. I've been trying to find a way to fix this, but the only solutions I've been able to come up with involve slapping additional tests onto the end, such as " && (itm != '') ", which I'd rather not do. > > > > Also, I don't think "1 ½" "1-½" are valid. > > > Why not? > > > > Authors would omit the space simply for the sake of readability. The minus sign > > in this case indicates and expression not a number, and expressions should not > > be picked up by the regexp. > "1 ½" is the same as "1½", which is the same as "1-½" (the hyphen/dash there > is not a minus sign, it just separates the fractional part from the integer > part). Googling "1 ½" shows a mix of the three formats: > http://www.google.com/search?q="1+½" OK, but the same might be said of the percent sign that exists in the current revision; and, at least the version with the space in between (the most common of the two) could be caught by simply adding an "\s*" beforehand. > * Your version uses regexes that save maybe two microseconds per click. Unless > I'm totally mistaken on how many times that code is executed, in which case I > retract my remarks, I hope you agree we don't need to worry about complicating > the regex over this. No, it's not a lot of time, but I like to be thorough.
Being "thorough" should not equate to writing less readable code. The extra characters are not needed and so are just clutter. Regardless, as I said, I'm not going to commit any further changes to the things brought up here (complexity creep) unless other developers are in favor. So I have nothing more to say here unless a third party comments.
Stuff like e/E and -/+ is fixed in r86088 (I think e/E was patched in the meantime as well) I agree that ½ is unneccessary