Last modified: 2011-04-14 15:11:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T17677, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 15677 - New tool to delimit scientific notation


Summary:	New tool to delimit scientific notation

Status:	NEW

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Low enhancement with 1 vote (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2008-09-21 17:11 UTC by Greg L
Modified:	2011-04-14 15:11 UTC (History)
CC List:	2 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Example code that would be generated (1.93 KB, text/plain) 2008-09-22 00:07 UTC, Greg L	Details
RTF file showing example code (2.44 KB, text/rtf) 2008-09-22 00:13 UTC, Greg L	Details

Description Greg L 2008-09-21 17:11:47 UTC

This is a continuation of Bugzilla 13025 and should be considered its replacement.

The template [[Template:Val]] has to use math-based methods to parse the string in order to count digits and place gaps between them. This technique is prone to rounding errors. Even though the template has an error-checking ability, it still generates improper strings. For instance, {{val|6.02214184||e=23}} will generate 6.022 141 839 × 10^23 (note the “39” instead of the “4”).
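The class of failure described above can be reproduced outside MediaWiki. The following is only a sketch in Python, assuming ordinary binary floating-point arithmetic; the function names are hypothetical and it does not reproduce the actual {{val}} template logic. It contrasts extracting fractional digits by arithmetic with extracting them by reading characters.

```python
def fraction_digits_by_math(value: float, places: int) -> str:
    """Extract fractional digits arithmetically: scale by a power of ten, then truncate.

    Assumes a non-negative value. Truncation exposes any binary rounding error.
    """
    scaled = int(value * 10**places)
    whole = int(value) * 10**places
    return str(scaled - whole).zfill(places)


def fraction_digits_by_string(text: str) -> str:
    """Extract fractional digits by reading characters; no arithmetic is involved."""
    _, _, fraction = text.partition(".")
    return fraction


print(fraction_digits_by_math(0.29, 2))    # prints 28, because 0.29 * 100 is 28.999999999999996
print(fraction_digits_by_string("0.29"))   # prints 29
```

The string-based version returns exactly the digits the editor typed, which is the behavior requested in this bug.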

Note how this tool is used (with work-arounds) here on [[Kilogram]]

http://en.wikipedia.org/wiki/Kilogram#Proposed_future_definitions

What is desperately needed are new parser functions that permit simple counting of characters to delimit numeric strings, rather than math-based parser functions. This would allow the creation of a new magic word named {{delimitnum}}. Note that there is already a template with the same “delimitnum” name. However, it is even more prone to rounding errors than {{val}} because it has no error checking. {{Delimitnum}} should be replaced by a parser function as proposed herein.

Delimitnum’s functionality is largely described here:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Grouping_of_digits_after_the_decimal_point_.28next_attempt.29

It was extensively discussed and voted upon on WT:MOSNUM here:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Continuing_Discussion.2C_specifically_regarding_latest_nutshell_proposal

It was well received on WT:MOS here, where its functionality was tweaked:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style/Archive_97#Exponential_notation

Here, in a nutshell, is how the new replacement delimitnum magic word should work:

The magic word would parse as follows:

{{delimitnum: (value) | (uncertainty) | (base–ten exponent) | (unit symbol) }}

It would use span-based tags (e.g. <span style="margin-left:0.25em">) to space out the digits without actually generating a separate character, so values can be copied and pasted into programs like Excel, where they would be treated as true numbers. It would replace hand-coded strings such as this:

6.022<span style="margin-left:0.25em">461<span style="margin-left:0.2em">79</span></span>(30)&thinsp;×&thinsp;10<sup>23</sup>&nbsp;kg
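As a rough illustration of how the four parameters might be assembled into such a string, here is a minimal Python sketch. The function name `assemble` and the rule that empty parameters are simply omitted are assumptions made for this example; the report only specifies the parameter order and the hand-coded output shown above. The value is passed in already delimited.

```python
def assemble(value_html: str, uncertainty: str = "",
             exponent: str = "", unit: str = "") -> str:
    """Join the pieces in the order value, (uncertainty), x 10^exponent, unit.

    Hypothetical helper: empty parameters are skipped (an assumption).
    """
    out = value_html
    if uncertainty:
        out += f"({uncertainty})"                       # concise form, e.g. (30)
    if exponent:
        out += f"&thinsp;×&thinsp;10<sup>{exponent}</sup>"
    if unit:
        out += f"&nbsp;{unit}"                          # non-breaking space before the unit
    return out


print(assemble("6.02246179", "30", "23", "kg"))
# prints 6.02246179(30)&thinsp;×&thinsp;10<sup>23</sup>&nbsp;kg
```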

The parsing and spacing logic would be as follows:

Q1: Are there five or more undelimited digits remaining after the decimal marker? No=Stop / Yes=Advance three digits and prepare to add span gap. Goto Q2.
Q2: Is the span gap to be added following the digit “1”? No=Add a span gap of 0.25 em and then goto Q1 / Yes=Add a span gap of 0.2 em and then goto Q1.
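The Q1/Q2 logic above can be sketched with plain character counting. This Python version is an illustration only, not proposed parser code; the function name `delimit_fraction` is hypothetical, and the input is assumed to be a plain decimal string that is not validated.

```python
def delimit_fraction(value: str) -> str:
    """Insert nested span gaps into the digits after the decimal marker."""
    integer, marker, fraction = value.partition(".")
    if not marker:
        return value                      # no decimal marker, nothing to delimit

    # Q1: while five or more undelimited digits remain, advance three digits.
    groups = []
    rest = fraction
    while len(rest) >= 5:
        groups.append(rest[:3])
        rest = rest[3:]
    groups.append(rest)                   # final group of up to four digits

    # Q2: the gap is 0.2 em after the digit 1, otherwise 0.25 em.
    html = groups[0]
    for previous, group in zip(groups, groups[1:]):
        gap = "0.2em" if previous.endswith("1") else "0.25em"
        html += f'<span style="margin-left:{gap}">{group}'
    html += "</span>" * (len(groups) - 1)  # close every span that was opened
    return f"{integer}.{html}"


print(delimit_fraction("6.02246179"))
# prints 6.022<span style="margin-left:0.25em">461<span style="margin-left:0.2em">79</span></span>
```

For the value 6.02246179 this reproduces the span structure of the hand-coded string shown earlier. Because the digits are never converted to a number, no rounding can occur.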

The exact em widths chosen above produce the best-looking results on the widest range of computing platforms. Some browsers resolve widths to 0.05 em; others do not, and of those some round up while others do not. These characteristics are exploited to our advantage. Note also that the span gap following the digit 1 is narrower (0.2 em) than the gap used after the other digits (0.25 em).

As mentioned above, details can be found here:

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Grouping_of_digits_after_the_decimal_point_.28next_attempt.29

Note, however, that since the above thread was archived, the spaces on each side of the “×” (multiply) sign have changed to thin spaces (&thinsp;) per the above-cited discussions on WT:MOS.

Comment 1 Greg L 2008-09-22 00:07:42 UTC

Created attachment 5356 [details]
Example code that would be generated

This is a text file for placement into any Wikipedia test page, showing example output code that would be generated.

Comment 2 Greg L 2008-09-22 00:13:30 UTC

Created attachment 5357 [details]
RTF file showing example code

This attached rtf file shows the Wiki-code that would be generated using various input options. Paste into any Wikipedia page.

Comment 3 Greg L 2008-11-04 00:43:26 UTC

Also, it would be very nice if delimitnum did what {{val}} currently does with negative exponents: if an editor types a hyphen-minus (character 45) from the keyboard (-), the rendered version should substitute Wikipedia’s minus sign (−) from the “insert” menu.
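A minimal Python sketch of that substitution, assuming the exponent arrives as a string and that only a leading sign needs replacing; the function name `normalize_exponent` is hypothetical:

```python
HYPHEN_MINUS = "-"        # character 45, as typed on the keyboard
MINUS_SIGN = "\u2212"     # the true minus sign (U+2212)


def normalize_exponent(exponent: str) -> str:
    """Replace a leading keyboard hyphen with the true minus sign."""
    exponent = exponent.strip()
    if exponent.startswith(HYPHEN_MINUS):
        return MINUS_SIGN + exponent[1:]
    return exponent


print(normalize_exponent("-23"))   # prints −23 with U+2212 as the first character
```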

Comment 4 Greg L 2008-11-08 22:24:15 UTC

Per conversation with Werdna…

http://en.wikipedia.org/w/index.php?title=User_talk:Werdna&oldid=250362962

a separate parser function to count characters cannot be expected. Accordingly, the only practical solution is an entire magic word. The most straightforward solution would be to rewrite {{val}} so it retains its current functionality but no longer relies upon math to parse the numbers. The best solution would be to also do the same for {{delimitnum}}.

Comment 5 Alex Z. 2008-11-19 00:59:29 UTC

I think this is a lot more complicated than it sounds on first read. It would need to take into consideration different styles of number formatting by language - http://en.wikipedia.org/wiki/Decimal_separator#Arabic_numeral_system - The way I see it, the main options are:

Option 1:
* Create several different formatting options and maintain a list of which style goes with which languages, possibly with a switch option in the function to use a non-default style
** Advantages: Easier on users, easier to code than option 2
** Disadvantages: Requires more maintenance and more initial work in determining which style goes with which languages
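As a rough illustration of Option 1, the style list could be a lookup table with an override switch. This is a Python sketch only; the style names, the table contents, and the function `decimal_marker` are all hypothetical and cover nothing beyond the decimal marker.

```python
from typing import Optional

# Hypothetical style table: names and entries are illustrative, not MediaWiki data.
STYLES = {
    "point": {"decimal": "."},
    "comma": {"decimal": ","},
}

# Hypothetical mapping of language codes to a default style.
LANGUAGE_STYLE = {"en": "point", "de": "comma", "fr": "comma"}
DEFAULT_STYLE = "point"


def decimal_marker(language: str, style: Optional[str] = None) -> str:
    """Return the decimal marker for a language, or for an explicitly chosen style."""
    chosen = style or LANGUAGE_STYLE.get(language, DEFAULT_STYLE)
    return STYLES[chosen]["decimal"]


print(decimal_marker("de"))                  # prints ,
print(decimal_marker("de", style="point"))   # prints . because the switch overrides the default
```

The maintenance cost mentioned above shows up here as the need to keep `LANGUAGE_STYLE` complete and correct for every supported language.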

Option 2:
* Create a fancy syntax for the function to tell the parser function how to format it, similar to the {{#time:}} function syntax, though probably much more complicated
** Advantages: Doesn't require localization, more flexible
** Disadvantages: Harder for users to use, may have issues similar to StringFunctions (bug:6455), harder to code than option 1

Option 3:
* Create a separate extension to add the function (as opposed to putting it in the ParserFunctions extension or core) that would only format numbers in the English style (at least initially)
** Advantages: Could eventually become like option 1 without needing to be fully localized immediately, easier to code
** Disadvantages: Somewhat unfair to other languages until support for them is added, slightly more overhead code

IMO, option 3 would be the best. Option 2 would likely be far too complicated to be useful outside of complex templates. And since option 3 requires the least work, it would be the most likely to actually get done.
