Last modified: 2012-01-08 20:16:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 33571 - PHP floor( log10() ) gives wrong results and need a replacement
PHP floor( log10() ) gives wrong results and need a replacement
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
General/Unknown (Other open bugs)
1.20.x
All All
: Unprioritized normal (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on: 29169
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-06 22:44 UTC by Antoine "hashar" Musso (WMF)
Modified: 2012-01-08 20:16 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
test of the C functions (435 bytes, text/plain)
2012-01-07 14:22 UTC, Platonides
Details

Description Antoine "hashar" Musso (WMF) 2012-01-06 22:44:19 UTC
Language::formatBitrate( 1000000000000000 ) gives '1,000Tbps' when it should be '1Pbps'.

The end result is 'languages/LanguageTest.php' test suite gives a failure:

1) LanguageTest::testFormatBitrate with data set #5 (1000000000000000, '1Pbps', '1 petabit per second')
formatBitrate('1000000000000000'): 1 petabit per second
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'1Pbps'
+'1,000Tbps'


The root cause is using floor() on a float returned by log(). That might be rounded down to the wrong integer :-/

As an example on http://codepad.org/VBkuWR8m

print floor(log10(1000000000000000)) . "\n";
// outputs 15, correct
print floor(log(1000000000000000,10)) . "\n";
// outputs 14 (WRONG)

print floor(log10(1000)) . "\n";
// outputs 3, correct
print floor(log( 1000, 10 )) . "\n";
// outputs 2 (WRONG)


On a Mac with PHP:
PHP 5.3.6 with Suhosin-Patch (cli) (built: Sep  8 2011 19:34:00) 
Copyright (c) 1997-2011 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2011 Zend Technologies
    with Xdebug v2.1.2, Copyright (c) 2002-2011, by Derick Rethans

print floor(log10(1000000000000000)) . "\n";
// outputs 14 (WRONG)
print floor(log(1000000000000000,10)) . "\n";
// outputs 15, correct

print floor(log10(1000)) . "\n";
// outputs 3, correct
print floor(log( 1000, 10 )) . "\n";
// outputs 3, correct


r108284 used log( foo, 10) instead of log10() but that broke the continuous integration system which use Ubuntu :D
Comment 1 Platonides 2012-01-06 23:27:43 UTC
That's not specific to Mac OS.
Comment 2 Sam Reed (reedy) 2012-01-06 23:33:00 UTC
(In reply to comment #1)
> That's not specific to Mac OS.

It seems to only do it on Mac OS... 5.3.8 on windows locally and 5.3.6 on ubuntu don't do it...
Comment 3 Platonides 2012-01-06 23:38:16 UTC
I see it in 5.3.8, 5.2.13, 4.4.9... (x86_64) probably depends of the arithmetic processor.
Is that windows/ubunto Intel or AMD?

Maybe we could always add an appropiate epsilon.

print floor(log(1000000000000000,10)+0.000000000000001);
15
print floor(log(1000000000000000,10)+0.0000000000000001);
14
Comment 4 Sam Reed (reedy) 2012-01-06 23:40:14 UTC
(In reply to comment #3)
> I see it in 5.3.8, 5.2.13, 4.4.9... (x86_64) probably depends of the arithmetic
> processor.
> Is that windows/ubunto Intel or AMD?
> 
> Maybe we could always add an appropiate epsilon.
> 
> print floor(log(1000000000000000,10)+0.000000000000001);
> 15
> print floor(log(1000000000000000,10)+0.0000000000000001);
> 14

Mines Windows 7 x64 Intel



reedy@gallium:~$ uname -a
Linux gallium 2.6.32-35-server #78-Ubuntu SMP Tue Oct 11 16:26:12 UTC 2011 x86_64 GNU/Linux


reedy@gallium:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 30
model name      : Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
Comment 5 Antoine "hashar" Musso (WMF) 2012-01-07 09:39:49 UTC
PHP log() and log10() are not reliable:

php -r 'printf("%.32f\n", log(pow(10,15),10) );'
15.00000000000000355271367880050093

php -r 'printf("%.32f\n", log10(pow(10,15)) );'
14.99999999999999822364316059974954

Both should give exactly 15.


php -r 'printf("%.32f\n", log(pow(10,15)-1,10) );'
15.00000000000000177635683940025046

That last one should give 14.99<something>


So I think it is safer to reimplements floor(log10()) by avoiding float type entirely.
Comment 6 Platonides 2012-01-07 14:22:56 UTC
Created attachment 9821 [details]
test of the C functions

php -r 'printf("%.32f\n", log(pow(10,15)-1,10) );'
14.99999999999999822364316059974954

php -r 'printf("%.32f\n", log10(pow(10,15)) );'
15.00000000000000000000000000000000

php -r 'printf("%.32f\n", log(pow(10,15)-1,10) );'
14.99999999999999822364316059974954



It's more or less normal to have a few differences between machines in floating point behavior http://c-faq.com/fp/strangefp.html

log(x), log10(x) and floor(x) are simple wrappers to the C functions. However, log(x,y) is log(x) / log(10), which is where I am getting the difference.

I think we could instead of floor use round(x, 0, PHP_ROUND_HALF_DOWN);
i.e. round(log(1000000000000000,10), 0, PHP_ROUND_HALF_DOWN) it has a number of checks to get the appropiate value.
Comment 7 Brion Vibber 2012-01-08 20:05:45 UTC
Isn't it pretty normal for floating point numbers to not exactly match decimal representations?

Sounds like a comparison somewhere is assuming that numbers will be exact decimal matches when they actually aren't, and needs to be more liberal in its comparisons.
Comment 8 Brion Vibber 2012-01-08 20:16:05 UTC
Fixed in r108363.

Worked around log10 by simply duplicating/tweaking the very similar code in the function immediately below, which does the same thing with 1024 instead of 1000 unit sizes, uses only simple division, and passes the test cases.

Might want to merge those two into a common function and pass it its common parameters.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links