ES Decimal status
Sam Ruby wrote:
Description of ES 3.1m decimal support
number and decimal are distinct primitive data types. The former is based on IEEE 754 binary 64 floating point, the latter is based on IEEE 754r decimal 128 floating point.
[...]
Conversion from number to decimal is precise and will round trip.
Conversion of number to decimal is not precise.
From esdiscuss/2008-August/007251:
Decimal is wider, but while every number expressible as a binary64 will map to a unique decimal128 value and will round-trip back to the same binary64 number[3], the decimal128 value to which a binary64 number maps does not in general denote exactly the same point on the real number line as the original binary64 number[4].
[3] speleotrove.com/decimal/decifaq6.html#binapprox
[4] speleotrove.com/decimal/decifaq6.html#bindigits
I'm also not sure whether the round-trip property covers all NaNs (preserving payload), denormal, or unnormal values.
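[Editorial aside, not part of the thread: the round-trip property can be sketched with Python's decimal module, which implements the same General Decimal Arithmetic specification; a Context with prec=34 approximates decimal128's 34-digit significand, ignoring its exponent limits.]

```python
from decimal import Decimal, Context

ctx = Context(prec=34)    # decimal128 carries 34 significant digits

x = 0.1                   # binary64 value; not exactly one tenth
d = ctx.plus(Decimal(x))  # Decimal(x) is exact; plus() rounds to 34 digits
print(d)                  # 0.1000000000000000055511151231257827
print(float(d) == x)      # True: round-trips back to the same binary64
```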
Sam Ruby wrote:
On Fri, Sep 12, 2008 at 4:02 PM, David-Sarah Hopwood <david.hopwood at industrial-designers.co.uk> wrote:
Sam Ruby wrote:
Description of ES 3.1m decimal support
number and decimal are distinct primitive data types. The former is based on IEEE 754 binary 64 floating point, the latter is based on IEEE 754r decimal 128 floating point.
[...]
Conversion from number to decimal is precise and will round trip.
Conversion of number to decimal is not precise.
I chose my words carefully :-)
I agree that decimal has a precision greater than number, but you said that "Conversion from number to decimal is precise" without qualification. In the case of a deterministic conversion rather than a measurement, I think it is quite unclear to use the term "precise" in this context (as opposed to "repeatable" or "reproducible"), and I don't think that the spec should express it in that way.
I'm also not sure whether the round-trip property covers all NaNs (preserving payload), denormal, or unnormal values.
NaN payloads should be preserved (decimal has quite a few more bits).
The results, however, wouldn't preserve the "normalness" (or lack thereof) of the original input.
Thanks for the clarification. Does IEEE 754r (or the ES3.1m interpretation/profile of it) specify that NaN payloads shall be preserved in a way that round-trips?
On Fri, Sep 12, 2008 at 7:01 PM, David-Sarah Hopwood <david.hopwood at industrial-designers.co.uk> wrote:
Sam Ruby wrote:
On Fri, Sep 12, 2008 at 4:02 PM, David-Sarah Hopwood <david.hopwood at industrial-designers.co.uk> wrote:
Sam Ruby wrote:
Description of ES 3.1m decimal support
number and decimal are distinct primitive data types. The former is based on IEEE 754 binary 64 floating point, the latter is based on IEEE 754r decimal 128 floating point.
[...]
Conversion from number to decimal is precise and will round trip.
Conversion of number to decimal is not precise.
I chose my words carefully :-)
I agree that decimal has a precision greater than number, but you said that "Conversion from number to decimal is precise" without qualification. In the case of a deterministic conversion rather than a measurement, I think it is quite unclear to use the term "precise" in this context (as opposed to "repeatable" or "reproducible"), and I don't think that the spec should express it in that way.
This was just my summary. When people see Decimal(1.1) produces 1.100000000000000088817841970012523m, they are likely to react "boy that's... um... precise".
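[Editorial aside: Sam's value can be reproduced with Python's decimal module, which follows the same Cowlishaw arithmetic specification; a 34-digit context stands in for decimal128.]

```python
from decimal import Decimal, Context

ctx = Context(prec=34)      # decimal128 precision
d = ctx.plus(Decimal(1.1))  # exact binary64 value of 1.1, rounded to 34 digits
print(d)                    # 1.100000000000000088817841970012523
```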
Independent of how it is described, do you or anybody see any specific results in the following that seems inconsistent with ECMAScript:
intertwingly.net/stories/2008/09/12/estest.html
Any other tests I should add?
I'm also not sure whether the round-trip property covers all NaNs (preserving payload), denormal, or unnormal values.
NaN payloads should be preserved (decimal has quite a few more bits).
The results, however, wouldn't preserve the "normalness" (or lack thereof) of the original input.
Thanks for the clarification. Does IEEE 754r (or the ES3.1m interpretation/profile of it) specify that NaN payloads shall be preserved in a way that round-trips?
My understanding is that IEEE 754r merely encourages the payloads to be preserved, but I haven't looked recently. Whether we decide to go beyond the requirements of the specification is up to us.
-- David-Sarah Hopwood
- Sam Ruby
Conversion from number to decimal is precise and will round trip.
Conversion of number to decimal is not precise.
I chose my words carefully :-) en.wikipedia.org/wiki/Accuracy_and_precision
I agree that decimal has a precision greater than number, but you said that "Conversion from number to decimal is precise" without qualification.
In the case of a deterministic conversion rather than a measurement, I think it is quite unclear to use the term "precise" in this context (as opposed to "repeatable" or "reproducible"), and I don't think that the spec should express it in that way.
This was just my summary. When people see Decimal(1.1) produces 1.100000000000000088817841970012523m, they are likely to react "boy that's... um... precise".
:-). 'correctly rounded' would be a good way to describe it.
I'm also not sure whether the round-trip property covers all NaNs (preserving payload), denormal, or unnormal values.
NaN payloads should be preserved (decimal has quite a few more bits).
The results, however, wouldn't preserve the "normalness" (or lack thereof) of the original input.
Thanks for the clarification. Does IEEE 754r (or the ES3.1m interpretation/profile of it) specify that NaN payloads shall be preserved in a way that round-trips?
My understanding is that IEEE 754r merely encourages the payloads to be preserved, but I haven't looked recently. Whether we decide to go beyond the requirements of the specification is up to us.
IEEE 754-2008 could only go as far as using Shoulds for propagation of payloads, because of the varied implementations of binary formats (different ways of distinguishing signaling NaNs, for example). The relevant subclause is:
6.2.3 NaN propagation
An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.
If two or more inputs are NaN, then the payload of the resulting NaN should be identical to the payload of one of the input NaNs if representable in the destination format. This standard does not specify which of the input NaNs will provide the payload.
Conversion of a quiet NaN from a narrower format to a wider format in the same radix, and then back to the same narrower format, should not change the quiet NaN payload in any way except to make it canonical.
Conversion of a quiet NaN to a floating-point format of the same or a different radix that does not allow the payload to be preserved, shall return a quiet NaN that should provide some language-defined diagnostic information.
There should be means to read and write payloads from and to external character sequences (see 5.12.1).
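[Editorial aside: Python's decimal module follows the standard's "should" here and does propagate quiet-NaN payloads through arithmetic, which can serve as a sketch of the behaviour under discussion.]

```python
from decimal import Decimal, Context

ctx = Context(prec=34)
nan = Decimal('NaN123')         # quiet NaN carrying diagnostic payload 123
out = ctx.add(nan, Decimal(1))  # quiet NaNs propagate without trapping
print(out)                      # NaN123: payload preserved through the add
```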
Mike
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
There is still a blocking issue that's been discussed a lot but left off the issues here:
- Treatment of cohorts in the default conversion of decimal to string. Despite objections, I still get the following results:
1/3m - 1/3m produces 0e-34
1e12m - 1e12m produces 0e+12
1/.1m produces 1e+1
Waldemar
There is still a blocking issue that's been discussed a lot but left off the issues here:
- Treatment of cohorts in the default conversion of decimal to string. Despite objections, I still get the following results:
1/3m - 1/3m produces 0e-34
1e12m - 1e12m produces 0e+12
1/.1m produces 1e+1
I'm still not clear what your objections are -- we can discuss on Thursday, but the intent here is that the toString conversion be reversible (i.e., from the string representation one can recreate the decimal128 from which it was derived). So I think you are really complaining about the rule for division (which is 'the best yet' that anyone has come up with, as it matches the rule for multiplication) and the rule for subtraction (which preserves the smaller exponent). Hence 1.23-1.22 -> 0.01 and 1.23-1.23 -> 0.00. The exponent is not data-dependent unless there is rounding.
Mike
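[Editorial aside: Waldemar's three results, and the subtraction/division exponent rules Mike describes, check out against Python's decimal module, which implements the same arithmetic specification; prec=34 mimics decimal128, and Python prints an uppercase E where the thread shows lowercase.]

```python
from decimal import Decimal, Context

ctx = Context(prec=34)

third = ctx.divide(Decimal(1), Decimal(3))               # exponent -34
print(ctx.subtract(third, third))                        # 0E-34
print(ctx.subtract(Decimal('1E+12'), Decimal('1E+12')))  # 0E+12
print(ctx.divide(Decimal(1), Decimal('0.1')))            # 1E+1

# Subtraction preserves the smaller exponent:
print(Decimal('1.23') - Decimal('1.22'))                 # 0.01
print(Decimal('1.23') - Decimal('1.23'))                 # 0.00
```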
Mike Cowlishaw wrote:
There is still a blocking issue that's been discussed a lot but left off the issues here:
- Treatment of cohorts in the default conversion of decimal to string. Despite objections, I still get the following results:
1/3m - 1/3m produces 0e-34
1e12m - 1e12m produces 0e+12
1/.1m produces 1e+1
I'm still not clear what your objections are -- we can discuss on Thursday, but the intent here is that the toString conversion be reversible (i.e., from the string representation one can recreate the decimal128 from which it was derived). So I think you are really complaining about the rule for division (which is 'the best yet' that anyone has come up with, as it matches the rule for multiplication) and the rule for subtraction (which preserves the smaller exponent). Hence 1.23-1.22 -> 0.01 and 1.23-1.23 -> 0.00. The exponent is not data-dependent unless there is rounding.
Mike
No; I'm complaining about the bizarre results that toString produces which are counter to its existing behavior. I don't care how many extra zeroes division sticks at the end of the number because I should never see them. This has been hashed out many times and also breaks things such as arrays if left uncorrected.
Note that Douglas's plan to have a decimal-only version of ECMAScript cannot work as long as this toString misbehavior continues.
Waldemar
Waldemar Horwat wrote:
Mike Cowlishaw wrote:
There is still a blocking issue that's been discussed a lot but left off the issues here:
- Treatment of cohorts in the default conversion of decimal to string. Despite objections, I still get the following results:
1/3m - 1/3m produces 0e-34
1e12m - 1e12m produces 0e+12
1/.1m produces 1e+1
I'm still not clear what your objections are -- we can discuss on Thursday, but the intent here is that the toString conversion be reversible (i.e., from the string representation one can recreate the decimal128 from which it was derived). So I think you are really complaining about the rule for division (which is 'the best yet' that anyone has come up with, as it matches the rule for multiplication) and the rule for subtraction (which preserves the smaller exponent). Hence 1.23-1.22 -> 0.01 and 1.23-1.23 -> 0.00. The exponent is not data-dependent unless there is rounding.
Mike
No; I'm complaining about the bizarre results that toString produces which are counter to its existing behavior. I don't care how many extra zeroes division sticks at the end of the number because I should never see them. This has been hashed out many times and also breaks things such as arrays if left uncorrected.
Agreed. It's fine to have an operation that maps decimal128 values to strings reversibly, but '.toString()' should not be that operation.
Mike Cowlishaw wrote:
Waldemar Horwat wrote:
Mike Cowlishaw wrote:
There is still a blocking issue that's been discussed a lot but left off the issues here:
- Treatment of cohorts in the default conversion of decimal to string. [...]
I'm still not clear what your objections are -- we can discuss on Thursday, but the intent here is that the toString conversion be reversible [...]
No; I'm complaining about the bizarre results that toString produces which are counter to its existing behavior. I don't care how many extra zeroes division sticks at the end of the number because I should never see them. This has been hashed out many times and also breaks things such as arrays if left uncorrected.
Agreed. It's fine to have an operation that maps decimal128 values to strings reversibly, but '.toString()' should not be that operation.
I don't really have an opinion on how the operation should be spelled.
That matters, because .toString() is used implicitly in primitive operations, unlike most other methods (.valueOf() aside).
But it would seem nice, not bizarre, for the default conversion from a decimal number to a string in ES to work the same as in Java and in the equivalent functions and methods for C, C++, Python, many other languages, and in existing decimal packages.
Java, C and C++ do not have implicit conversions that call .toString(), and in particular they don't call it on the index in an array indexing operation.
David-Sarah Hopwood wrote:
Mike Cowlishaw wrote:
Waldemar Horwat wrote:
Mike Cowlishaw wrote:
There is still a blocking issue that's been discussed a lot but left off the issues here:
- Treatment of cohorts in the default conversion of decimal to string. [...]
I'm still not clear what your objections are -- we can discuss on Thursday, but the intent here is that the toString conversion be reversible [...]
No; I'm complaining about the bizarre results that toString produces which are counter to its existing behavior. I don't care how many extra zeroes division sticks at the end of the number because I should never see them. This has been hashed out many times and also breaks things such as arrays if left uncorrected.
Agreed. It's fine to have an operation that maps decimal128 values to strings reversibly, but '.toString()' should not be that operation.
I don't really have an opinion on how the operation should be spelled.
That matters, because .toString() is used implicitly in primitive operations, unlike most other methods (.valueOf() aside).
Understood. I meant that if the function had to be provided under another name that is OK, but not ideal.
But it would seem nice, not bizarre, for the default conversion from a decimal number to a string in ES to work the same as in Java and in the equivalent functions and methods for C, C++, Python, many other languages, and in existing decimal packages.
Java, C and C++ do not have implicit conversions that call .toString(),
Java certainly does (the concatenation operator, and many common methods such as println). And in the new C extensions, printf does the equivalent on decimal data and gives the same string results by default.
and in particular they don't call it on the index in an array indexing operation.
This is true. But that in itself is not the problem. Currently, should a programmer write:
a[1] = "first"
a[1.000] = "second"
it's assumed that the second case was an accidental typo and they really did not mean to type the extra '.000'. The problem occurs at that point, on the conversion from a decimal (ASCII/Unicode/whatever) string in the program to an internal representation. When the internal representation cannot preserve the distinction (as with binary doubles) there's not much that can be done about it. But a decimal internal representation can preserve the distinction, and so it should - 1m and 1.000m differ in the same way as "1" and "1.000". They are distinguishable, but when interpreted as a number, they are considered equal.
Mike
On Sep 24, 2008, at 3:33 AM, Mike Cowlishaw wrote:
and in particular they don't call it on the index in an array indexing operation.
This is true. But that in itself is not the problem. Currently, should a programmer write:
a[1] = "first"
a[1.000] = "second"
it's assumed that the second case was an accidental typo and they really did not mean to type the extra '.000'. The problem occurs at that point, on the conversion from a decimal (ASCII/Unicode/whatever) string in the program to an internal representation. When the internal representation cannot preserve the distinction (as with binary doubles) there's not much that can be done about it. But a decimal internal representation can preserve the distinction, and so it should - 1m and 1.000m differ in the same way as "1" and "1.000". They are distinguishable, but when interpreted as a number, they are considered equal.
I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?
I would agree with Waldemar that it is a serious problem. Not so much for literals as for values that end up with varying numbers of trailing zeroes depending on how they were computed, even though they are numerically the same. Certainly it seems it would make arrays unusable for someone trying to use decimal numbers only.
- Maciej
Maciej Stachowiak wrote:
On Sep 24, 2008, at 3:33 AM, Mike Cowlishaw wrote:
and in particular they don't call it on the index in an array indexing operation.
This is true. But that in itself is not the problem. Currently, should a programmer write:
a[1] = "first"
a[1.000] = "second"
it's assumed that the second case was an accidental typo and they really did not mean to type the extra '.000'. The problem occurs at that point, on the conversion from a decimal (ASCII/Unicode/whatever) string in the program to an internal representation. When the internal representation cannot preserve the distinction (as with binary doubles) there's not much that can be done about it. But a decimal internal representation can preserve the distinction, and so it should - 1m and 1.000m differ in the same way as "1" and "1.000". They are distinguishable, but when interpreted as a number, they are considered equal.
I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?
I would agree with Waldemar that it is a serious problem. Not so much for literals as for values that end up with varying numbers of trailing zeroes depending on how they were computed, even though they are numerically the same. Certainly it seems it would make arrays unusable for someone trying to use decimal numbers only.
"broken", "unusable". Given superlatives such as these, one would think that code which would change in behavior would be abundant, and readily identified.
- Maciej
- Sam Ruby
On Sep 24, 2008, at 7:31 AM, Sam Ruby wrote:
Maciej Stachowiak wrote:
On Sep 24, 2008, at 3:33 AM, Mike Cowlishaw wrote:
and in particular they don't call it on the index in an array indexing operation.
This is true. But that in itself is not the problem. Currently, should a programmer write:
a[1] = "first"
a[1.000] = "second"
it's assumed that the second case was an accidental typo and they really did not mean to type the extra '.000'. The problem occurs at that point, on the conversion from a decimal (ASCII/Unicode/whatever) string in the program to an internal representation. When the internal representation cannot preserve the distinction (as with binary doubles) there's not much that can be done about it. But a decimal internal representation can preserve the distinction, and so it should - 1m and 1.000m differ in the same way as "1" and "1.000". They are distinguishable, but when interpreted as a number, they are considered equal.
I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?
I would agree with Waldemar that it is a serious problem. Not so much for literals as for values that end up with varying numbers of trailing zeroes depending on how they were computed, even though they are numerically the same. Certainly it seems it would make arrays unusable for someone trying to use decimal numbers only.
"broken", "unusable". Given superlatives such as these, one would think that code which would change in behavior would be abundant, and readily identified.
I would not expect there to be a wide body of existing code using the decimal extension to ECMAScript, let alone trying to use it for all browsers. Such code would not work at all in today's browsers, and has probably been written by specialist experts, so I am not sure studying it would show anything interesting.
- Maciej
Maciej Stachowiak wrote:
You probably meant to send this to the list.
Oops. Resending.
Maciej wrote:
"I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?"
This is not quite true as you can see here:
var a = [];
a[1] = "foo";
a[1.00] = "bar";
WScript.Echo("length: "+a.length + "\n["+ a.join()+"]")
length: 2
[,bar]
Firefox (3.0.2) does the same
On Sep 24, 2008, at 8:41 AM, Michael wrote:
Maciej wrote: “I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?”
This is not quite true as you can see here:
var a = []; a[1] = "foo"; a[1.00] = "bar";
WScript.Echo("length: "+a.length + "\n["+ a.join()+"]")
length: 2 [,bar]
Firefox (3.0.2) does the same
It seems to me that your test case proves that 1 and 1.00 refer to the same property, as I described. (The reason the array is length 2 is that there is an implicit unset 0 property.)
- Maciej
On Sep 24, 2008, at 8:28 AM, Sam Ruby wrote:
My apologies. That wasn't the question I was intending.
Can you identify code that today depends on numeric binary 64 floating point which makes use of operations such as unrounded division and depends on trailing zeros being truncated to compute array indexes?
I would think that such code would be more affected by factors such as the increased precision and the fact that 1.2-1.1 produces 0.09999999999999987 than on the presence or absence of any trailing zeros.
But given the continued use of words such as "broken" and "unusable", I'm wondering if I'm missing something obvious.
The only thing that might be called obvious here is the invariant (modulo teeny tiny numbers) that people have pointed out (a === b => o[a] is o[b]). Such JS invariants tend to be required by web content. Postel's Law means you accept everything that flies in ES1, and have trouble being less liberal in ES2+. Web authors crawl the feature vector space and find all the edges, so at least what you did not accept in v1 becomes law. But these are generalizations from experience with invariants such as typeof x == "object" && !x => x === null and typeof x == typeof y => (x == y <=> x === y).
Beyond this conservatism in breaking invariants based on experience, it turns out that % and / results do flow into array indexes. From SunSpider's 3d-raytrace.js (which came from some other benchmark suite, IIRC):
// Triangle intersection using barycentric coord method
function Triangle(p1, p2, p3) {
    var edge1 = sub(p3, p1);
    var edge2 = sub(p2, p1);
    var normal = cross(edge1, edge2);
    if (Math.abs(normal[0]) > Math.abs(normal[1]))
        if (Math.abs(normal[0]) > Math.abs(normal[2]))
            this.axis = 0;
        else
            this.axis = 2;
    else if (Math.abs(normal[1]) > Math.abs(normal[2]))
        this.axis = 1;
    else
        this.axis = 2;
    var u = (this.axis + 1) % 3;
    var v = (this.axis + 2) % 3;
    var u1 = edge1[u];
    var v1 = edge1[v];
    . . .
}
Triangle.prototype.intersect = function(orig, dir, near, far) {
    var u = (this.axis + 1) % 3;
    var v = (this.axis + 2) % 3;
    var d = dir[this.axis] + this.nu * dir[u] + this.nv * dir[v];
    . . .
}
but the operands of % are integers here. So long as decimal and double don't change the results from being integral, these use-cases should be ok (if slower).
The concern remains, though. Not due to power-of-five problems that would lead to 0.09999999999999987 or the like, but from cases where a number was spelled with extra trailing zeros (in external data, e.g. a spreadsheet) but fits in an integer, or otherwise can be expressed exactly using powers of two. The burden of proof here is on the invariant breaker :-/.
On Sep 24, 2008, at 8:28 AM, Sam Ruby wrote:
My apologies. That wasn't the question I was intending.
Can you identify code that today depends on numeric binary 64 floating point which makes use of operations such as unrounded division and depends on trailing zeros being truncated to compute array indexes?
I would think that such code would be more affected by factors such as the increased precision and the fact that 1.2-1.1 produces 0.09999999999999987 than on the presence or absence of any trailing zeros.
I don't see how the result of 1.2 - 1.1 would affect array indexing. Fractional values are not used as array indices.
But given the continued use of words such as "broken" and "unusable", I'm wondering if I'm missing something obvious.
In ECMAScript all property names are nominally strings. The generic property access syntax is also used for array indexing. The official spec-level explanation of this is that the subscript inside the square brackets is converted via ToString, and then looked up as any normal string-named property. In practice implementations optimize the storage of and access to such properties so there is no actual conversion between integers and strings going on.
Given this, it seems pretty clear to me that the sanity of ECMAScript array indexing depends on the fact that a number equal to an integer will stringify as that integer.
In other languages, array indexing converts to integer rather than nominally to string, so this issue does not arise.
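[Editorial aside: the invariant at stake, that equal numbers stringify identically so string-keyed lookup behaves like array indexing, can be sketched in Python; the dict keyed by strings is an assumption standing in for an ES object's property table.]

```python
from decimal import Decimal

# With binary doubles, equal values produce the same string:
assert 1.0 == 1.000 and str(1.0) == str(1.000)

# With scale-preserving decimals, equal cohort members do not:
a, b = Decimal('1'), Decimal('1.000')
assert a == b                              # numerically equal ...
assert (str(a), str(b)) == ('1', '1.000')  # ... but different strings

props = {}            # stand-in for an ES object's string-named properties
props[str(a)] = "first"
props[str(b)] = "second"
print(len(props))     # 2: a == b no longer implies o[a] is o[b]
```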
I decline to accept the burden of proof on this; those who wish to alter the behavior of ECMAScript array indexing should be the ones presenting research on existing code. Though I will grant that the burden of proof here is smaller than for truly compatibility-breaking changes, since existing code will not be directly affected.
I would think it is pretty likely there is code out there that does things like multiply by 0.5 (faster than dividing by 2, right?) and use the result as an array index. But maybe you can show otherwise.
- Maciej
On Sep 24, 2008, at 8:41 AM, Michael wrote:
Maciej wrote:
“I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?”
This is not quite true as you can see here:
var a = [];
a[1] = "foo";
a[1.00] = "bar";
WScript.Echo("length: "+a.length + "\n["+ a.join()+"]")
length: 2
[,bar]
This shows that a[1] and a[1.00] refer to the same property as Maciej said (a[1.000] also refers to a[1]). What "This" did you mean "is not quite true"?
The issue is not how a value is spelled using a literal. It's that decimal as proposed remembers its "scale" based on its spelling, and scale affects the toString() result. That's different from the case with number (double) in JS today.
Maciej wrote: "I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?"
I would say this is indeed a problem. I'm of the opinion that decimal128 and binary64 should behave identically in as many areas as possible, and that since cohorts are strictly equal to each other, roundtripping does not need to preserve scale, only cohort. Breaking the stringification equality of strictly equal numbers in one representation is a much bigger issue than roundtripping preserving scale. A .toScaleString or something like it should be added for the case of needing a roundtrip where the scale is preserved.
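[Editorial aside: the split proposed here, a cohort-canonical default stringification plus a separate scale-preserving form (the ".toScaleString" name above is the poster's hypothetical), has an analogue in Python's decimal module, where normalize() picks the canonical cohort member while str() preserves scale.]

```python
from decimal import Decimal, Context

ctx = Context(prec=34)
x = ctx.subtract(Decimal('1E+12'), Decimal('1E+12'))

print(x)                             # 0E+12: scale-preserving, round-trips
print(x.normalize())                 # 0: canonical member of the cohort
print(Decimal('1.000').normalize())  # 1
```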
2008/9/24 Michael <Michael at lanex.com>:
This is not quite true as you can see here:
var a = []; a[1] = "foo"; a[1.00] = "bar";
WScript.Echo("length: "+a.length + "\n["+ a.join()+"]")
length: 2 [,bar]
And the length property has exactly nothing to do with what has been discussed here.
and in particular they don't call it on the index in an array indexing operation.
This is true. But that in itself is not the problem. Currently, should a programmer write:
a[1] = "first"
a[1.000] = "second"
it's assumed that the second case was an accidental typo and they really did not mean to type the extra '.000'. The problem occurs at that point, on the conversion from a decimal (ASCII/Unicode/whatever) string in the program to an internal representation. When the internal representation cannot preserve the distinction (as with binary doubles) there's not much that can be done about it. But a decimal internal representation can preserve the distinction, and so it should - 1m and 1.000m differ in the same way as "1" and "1.000". They are distinguishable, but when interpreted as a number, they are considered equal.
I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?
Absolutely not a problem ... many languages (and ES itself) which index 'arrays' by strings treat the index "1.000" as different from "1", and this is not considered a problem. It's desirable, in fact, when the index is (for example) sections in a document: "3.20" and "3.2" are not the same section.
If the programmer's model is an array indexed by integers, they will use 1, 2, etc. and will never use 1.000. It all works.
I would agree with Waldemar that it is a serious problem. Not so much for literals as for values that end up with varying numbers of trailing zeroes depending on how they were computed, even though they are numerically the same. Certainly it seems it would make arrays unusable for someone trying to use decimal numbers only.
All I can say is that it has never been a problem in languages such as Rexx and EXEC 2, which have had exactly that behaviour for almost 30 years. I do not recall a single problem report about that behaviour.
This is, no doubt, because if one is treating array indexes as a set of integers you use integer operations on those indexes (almost exclusively +, -, and *). If one does use a divide, it would be carefully chosen to produce an integer result; anything which produced a result without an exponent of 0 would always be more likely to give a non-zero fraction that .0, .00, .000, etc. -- and those non-zero ones would fail rapidly.
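[Editorial aside: Mike's observation, that integer-style index arithmetic keeps an exponent of 0 while an uncareful divide leaks a nonzero exponent, can be checked with Python's decimal module, which uses the same preferred-exponent rules.]

```python
from decimal import Decimal

# +, -, * on exponent-0 operands keep exponent 0:
assert str(Decimal('2') + Decimal('3')) == '5'
assert str(Decimal('7') * Decimal('6')) == '42'
# exact division takes the preferred exponent exp(x) - exp(y):
assert str(Decimal('10') / Decimal('2')) == '5'      # 0 - 0 = 0
assert str(Decimal('1') / Decimal('0.1')) == '1E+1'  # 0 - (-1) = +1 leaks out
```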
Mike
On Sep 24, 2008, at 9:01 AM, Mike Cowlishaw wrote:
I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?
Absolutely not a problem ... many languages (and ES itself) which index 'arrays' by strings treat the index "1.000" as different from "1", and this is not considered a problem.
But they do not treat 1.000 as an index differently from 1. Explicit string indexes, whether literally expressed or computed, are not the issue here.
This is, no doubt, because if one is treating array indexes as a set of integers you use integer operations on those indexes (almost exclusively +, -, and *). If one does use a divide,
Maciej pointed out reciprocal multiplication as strength-reduced division; this is done often enough in graphics and other low-level code.
it would be carefully chosen to produce an integer result;
But what if scale is preserved?
anything which produced a result without an exponent of 0 would always be more likely to give a non-zero fraction that .0, .00, .000, etc. -- and those non-zero ones would fail rapidly.
Sorry, I didn't follow this ("that" should be "than"?).
Brendan Eich wrote:
On Sep 24, 2008, at 8:28 AM, Sam Ruby wrote:
My apologies. That wasn't the question I was intending.
Can you identify code that today depends on numeric binary 64 floating point which makes use of operations such as unrounded division and depends on trailing zeros being truncated to compute array indexes?
I would think that such code would be more affected by factors such as the increased precision and the fact that 1.2-1.1 produces 0.09999999999999987 than on the presence or absence of any trailing zeros.
But given the continued use of words such as "broken" and "unusable", I'm wondering if I'm missing something obvious.
The only thing that might be called obvious here is the invariant (modulo teeny tiny numbers) that people have pointed out (a === b => o[a] is o[b]). Such JS invariants tend to be required by web content. Postel's Law means you accept everything that flies in ES1, and have trouble being less liberal in ES2+. Web authors crawl the feature vector space and find all the edges, so at least what you did not accept in v1 becomes law. But these are generalizations from experience with invariants such as typeof x == "object" && !x => x === null and typeof x == typeof y => (x == y <=> x === y).
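The invariants Brendan cites can be spot-checked directly; a minimal sketch (helper names are mine) encoding both implications:

```javascript
// Invariant 1: typeof x == "object" && !x  implies  x === null
function impliesNull(x) {
  if (typeof x == "object" && !x) return x === null;
  return true; // invariant claims nothing for other values
}

// Invariant 2: typeof x == typeof y  implies  (x == y <=> x === y)
function eqAgrees(x, y) {
  if (typeof x == typeof y) return (x == y) === (x === y);
  return true; // loose and strict equality may differ across types
}
```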
Beyond this conservatism in breaking invariants based on experience, it turns out that % and / results do flow into array indexes. From SunSpider's 3d-raytrace.js (which came from some other benchmark suite, IIRC):
// Triangle intersection using barycentric coord method
function Triangle(p1, p2, p3) {
    var edge1 = sub(p3, p1);
    var edge2 = sub(p2, p1);
    var normal = cross(edge1, edge2);
    if (Math.abs(normal[0]) > Math.abs(normal[1]))
        if (Math.abs(normal[0]) > Math.abs(normal[2]))
            this.axis = 0;
        else
            this.axis = 2;
    else
        if (Math.abs(normal[1]) > Math.abs(normal[2]))
            this.axis = 1;
        else
            this.axis = 2;
    var u = (this.axis + 1) % 3;
    var v = (this.axis + 2) % 3;
    var u1 = edge1[u];
    var v1 = edge1[v];
    . . .
}
Triangle.prototype.intersect = function(orig, dir, near, far) {
    var u = (this.axis + 1) % 3;
    var v = (this.axis + 2) % 3;
    var d = dir[this.axis] + this.nu * dir[u] + this.nv * dir[v];
    . . .
}
but the operands of % are integers here. So long as decimal and double don't change the results from being integral, these use-cases should be ok (if slower).
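The reason these use-cases stay safe can be shown concretely: with small integral operands, % in binary64 is exact, so the computed index stringifies with no fractional part. A sketch of the SunSpider pattern:

```javascript
// (axis + 1) % 3 with an integer axis always yields exactly 0, 1 or 2,
// so ToString gives "0", "1" or "2": canonical array indexes, no fraction.
function rotatedAxes(axis) {
  var u = (axis + 1) % 3;
  var v = (axis + 2) % 3;
  return [u, v];
}
```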
The concern remains, though. Not due to power-of-five problems that would lead to 0.09999999999999987 or the like, but from cases where a number was spelled with extra trailing zeros (in external data, e.g. a spreadsheet) but fits in an integer, or otherwise can be expressed exactly using powers of two. The burden of proof here is on the invariant breaker :-/.
I fully appreciate the need for a high bar here.
The problem here is that there are two invariants. === to a high degree of accuracy today is an eq operator. But people have argued against 1.2 !== 1.20, because of another invariant:
a == b && typeof(a) === typeof(b) implies a === b
We can't satisfy both. My initial preference was that 1.2 !== 1.20, but since we are not aware of code that uses fractional indexes, yet are aware of code that does generic typeof and equality testing, I would think that the latter would have a higher weight.
/be
- Sam Ruby
On Sep 24, 2008, at 9:01 AM, Mike Cowlishaw wrote:
I would agree with Waldemar that it is a serious problem. Not so much for literals as for values that end up with varying numbers of trailing zeroes depending on how they were computed, even though they are numerically the same. Certainly it seems it would make arrays unusable for someone trying to use decimal numbers only.
All I can say is that it has never been a problem in languages such as Rexx and EXEC 2, which have had exactly that behaviour for almost 30 years. I do not recall a single problem report about that behavior.
I'm not familiar with Rexx, but my cursory reading of the Wikipedia
article on it seems to indicate Rexx does not have arrays, but rather
"compound variables" which have a different behavior and syntax. It
seems to say that stem.1.0 implies simulation of a multidimensional
array, not an array index with more precision (and similarly
stem.1.0.0). It also seems that stem..1 would not refer to the same
location as stem.0.1 (perhaps it is a syntax error?).
I'm sure you know way more about Rexx than I do, but from basic
research it seems that its array-like feature does not look or act
like ECMAScript arrays, so I am not sure the experience is relevant. I
think most people would expect behavior more like C or Java or Perl or
Python arrays than like Rexx compound variables.
- Maciej
On Sep 24, 2008, at 9:17 AM, Brendan Eich wrote:
On Sep 24, 2008, at 9:01 AM, Mike Cowlishaw wrote:
Absolutely not a problem ... many languages (and ES itself) which index 'arrays' by strings treat the index "1.000" as different from "1",
and this is not considered a problem.
But they do not treat 1.000 as an index differently from 1. Explicit string indexes, whether literally expressed or computed, are not the issue here.
I should have written "explicitly string-typed indexes", or perhaps
"intentionally string-type indexes".
So far, intentions aside, people writing JS can and arguably do count
on any integer result of an index computation being equivalent to
that index expressed literally as the === integer constant with no
trailing zeroes after the decimal point.
This a === b => o[a] is o[b] invariant (ignore the tiny number
exceptions; I agree they're undesirable spec bugs) is what folks on
the list are concerned about breaking, for integral values.
Fractional index values and strings consisting of numeric literals
with and without trailing zeroes are different use-cases, not of
concern.
On Sep 24, 2008, at 9:18 AM, Sam Ruby wrote:
[over-top-citing trimmed] The concern remains, though. Not due to power-of-five problems that would lead to 0.09999999999999987 or the like, but from cases where a number was spelled with extra trailing zeros (in external data, e.g. a spreadsheet) but fits in an integer, or otherwise can be expressed exactly using powers of two. The burden of proof here is on the invariant breaker :-/.
I fully appreciate the need for a high bar here.
Sure, so maybe that bar height is why people talk about "broken".
It's a presumption of "guilt", alien to Anglo-Saxon jurisprudence.
The Napoleonic ECMAScript code prevails here. Beware! :-P.
The problem here is that there are two invariants. === to a high
degree of accuracy today is an eq operator. But people have argued
against 1.2 !== 1.20, because of another invariant:
a == b && typeof(a) === typeof(b) implies a === b
We can't satisfy both.
I do not think === is eq -- it's hard to argue degree when kind is
the issue, as others have pointed out. Any hashcode addition would
want eq, or force people to invent it at tedious and inefficient length.
Past posts here have mixed up cohort and -0 vs. 0, but we've been
educated thanks to MFC. Really, === breaks down on NaN and the
zeroes, and there's no way to rationalize it as eq. I noted how Guy
Steele helped us get past one broken equality operator, in order to
add === and !== in ES1, and we talked about eq then. It still looms
in the future, Harmony or (some argue) 3.1.
My initial preference was that 1.2 !== 1.20, but since we are not aware of code that uses fractional indexes, yet are aware of code that does generic typeof and equality testing, I would think that the latter would have a higher weight.
Ignoring === as faux eq, the only issue here is d.toString() for
decimal d: should it preserve scale and stringify trailing zeroes and
funny exponents?
Brendan Eich wrote:
This a === b => o[a] is o[b] invariant (ignore the tiny number
exceptions; I agree they're undesirable spec bugs) is what folks on
the list are concerned about breaking, for integral values.
Fractional index values and strings consisting of numeric literals
with and without trailing zeroes are different use-cases, not of
concern.
This is most helpful. It would suggest that 1.20 is not a significant concern, but 1e+2 is a potential concern. (I'd also suggest that values with an absolute value less than 2**53 are not a concern)
Short of numeric literals explicitly expressed in such a manner, multiplication won't tend to produce such values, but division by values such as 0.1 may.
My intuition continues to be that such occurrences would be exceedingly rare, particularly given the use case of array indexes.
/be
- Sam Ruby
This is, no doubt, because if one is treating array indexes as a set of integers you use integer operations on those indexes (almost exclusively +, -, and *).
Mike;
The claim that using only integers and only multiplication will save array indexing is false. Under your current proposal exactly the same problems occur even if you use exclusively integers. For example,
a[10m*10m*10m*10m*10m*10m]
and
a[1e6m]
refer to different array elements under your proposal.
Sorry about that. For some reason I misread the post as stating that they did not refer to the same property (hence showing an example demonstrating that, in fact, they did).
On Wednesday, September 24, 2008 10:56 AM, Brendan Eich wrote (Re: ES Decimal Status):
On Sep 24, 2008, at 8:41 AM, Michael wrote:
Maciej wrote:
"I'm not sure what you are getting at. a[1] and a[1.000] refer to the
same property in ECMAScript, but a[1m] and a[1.000m] would not. Are
you saying this isn't a problem?"
This is not quite true as you can see here:
var a = [];
a[1] = "foo";
a[1.00] = "bar";
WScript.Echo("length: "+a.length + "\n["+ a.join()+"]")
length: 2
[,bar]
This shows that a[1] and a[1.00] refer to the same property as Maciej said (a[1.000] also refers to a[1]).
What "This" did you mean "is not quite true"?
The issue is not how a value is spelled using a literal. It's that decimal as proposed remembers its "scale" based on its spelling, and scale affects toString() result. That's different from the case with number (double) in JS today.
2008/9/24 Maciej Stachowiak <mjs at apple.com>:
On Sep 24, 2008, at 8:41 AM, Michael wrote:
Maciej wrote:
"I'm not sure what you are getting at. a[1] and a[1.000] refer to the
same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?"
This is not quite true as you can see here:
var a = [];
a[1] = "foo";
a[1.00] = "bar";
WScript.Echo("length: "+a.length + "\n["+ a.join()+"]")
length: 2
[,bar]
Firefox (3.0.2) does the same
It seems to me that your test case proves that 1 and 1.00 refer to the same property, as I described. (The reason the array is length 2 is that there is an implicit unset 0 property.)
The reason a.length is 2 is because of the way [[Put]] works on Array.
If ToUint32(P) is >= the value of the Array's length property, the
length property gets set to ToUint32(P) + 1.
There's no "implicit unset 0 property".
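Garrett's description of the length rule is easy to demonstrate:

```javascript
// Sketch of the Array [[Put]] length rule: if P is an array index and
// ToUint32(P) >= length, length becomes ToUint32(P) + 1.
var a = [];
a[1] = "bar"; // ToUint32("1") = 1 >= length 0, so length becomes 2

// Nothing was ever stored at index 0: it is a hole, not an "unset property".
var hasZero = a.hasOwnProperty(0);
var summary = "length: " + a.length + " [" + a.join() + "]";
```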
This is just a diversion from the real issue, which (if I understand right) is how decimals get converted -- what Decimal.prototype.toString does.
Garrett
[Was on airplanes since my last post, arrived late in SeaTac ... will try and consolidate replies on this thread into one e-mail :-)]
I'm not sure what you are getting at. a[1] and a[1.000] refer to the same property in ECMAScript, but a[1m] and a[1.000m] would not. Are you saying this isn't a problem?
Absolutely not a problem ... many languages (and ES itself) which index 'arrays' by strings treat the index "1.000" as different from "1", and this is not considered a problem.
But they do not treat 1.000 as an index differently from 1. Explicit string indexes, whether literally expressed or computed, are not the issue here. [and later:] I should have written "explicitly string-typed indexes", or perhaps "intentionally string-type indexes".
So far, intentions aside, people writing JS can and arguably do count on any integer result of an index computation being equivalent to that index expressed literally as the === integer constant with no trailing zeroes after the decimal point.
This a === b => o[a] is o[b] invariant (ignore the tiny number exceptions; I agree they're undesirable spec bugs) is what folks on the list are concerned about breaking, for integral values. Fractional index values and strings consisting of numeric literals with and without trailing zeroes are different use-cases, not of concern.
OK, and also liorean says:
I'm of the opinion that decimal128 and binary64 should behave
identically in as many areas as possible.
That's a valid model. I suppose I see strings and decimals as being 'closer' in concept, and in both "what you see is what you get". But for arrays, I see the problem. In that case 'reduce to shortest form', that is, strip trailing zeros, might be the right thing to do for decimals used as array indices. That function is in Sam's implementation (it's called 'reduce').
This is, no doubt, because if one is treating array indexes as a set of integers you use integer operations on those indexes (almost exclusively +, -, and *). If one does use a divide,
Maciej pointed out reciprocal multiplication as strength-reduced division; this is done often enough in graphics and other low-level code.
Hmm, his example was dividing an integer by 2. Why would one multiply by 0.5 for that when the compiler would convert to a shift?
it would be carefully chosen to produce an integer result;
But what if scale is preserved?
My use of terminology without defining it was sloppy, sorry. In the decimal context, I mean by 'integer' a decimal float whose exponent is 0. (That is, the significand is the integer.) With the unnormalized decimal float representation, that set of floats is the decimal integer 'type' (a separate integer type, binary or decimal, is unnecessary). Hence the scale being preserved is 0, which is what one wants.
anything which produced a result without an exponent of 0 would always be more likely to give a non-zero fraction that .0, .00, .000, etc. -- and those non-zero ones would fail rapidly.
Sorry, I didn't follow this ("that" should be "than"?).
Yes, 'than'. Was trying to answer in a 20-minute slot at airport -- I should have waited ...
All I can say is that is has never been a problem in languages such as Rexx and EXEC 2 which have had exactly that behaviour for almost 30 years. I do not recall a single problem report about that behavior.
I'm not familiar with Rexx, but my cursory reading of the Wikipedia article on it seems to indicate Rexx does not have arrays, but rather "compound variables" which have a different behavior and syntax. It seems to say that stem.1.0 implies simulation of a multidimensional array, not an array index with more precision (and similarly stem.1.0.0). It also seems that stem..1 would not refer to the same location as stem.0.1 (perhaps it is a syntax error?).
They are essentially string-indexed arrays/vectors. The syntax is a bit odd (square brackets were not available on all keyboards at the time) but stem.foo.bar is effectively stem[foo, bar] where foo and bar are any strings (and that syntax is allowed/used in NetRexx).
I'm sure you know way more about Rexx than I do, but from basic research it seems that its array-like feature does not look or act like ECMAScript arrays, so I am not sure the experience is relevant. I think most people would expect behavior more like C or Java or Perl or Python arrays than like Rexx compound variables.
I think the ES arrays are very similar -- but I agree, when only integer values are used, the arrays look like C arrays to many people (more on this below in response to another comment).
Waldemar wrote:
This is, no doubt, because if one is treating array indexes as a set of integers you use integer operations on those indexes (almost exclusively +, -, and *).
Mike;
The claim that using only integers and only multiplication will save array indexing is false. Under your current proposal exactly the same problems occur even if you use exclusively integers. For example,
a[10m*10m*10m*10m*10m*10m]
and
a[1e6m]
refer to different array elements under your proposal.
The problem here was my ill-defined use of 'integer', sorry. 1e+6m is not an 'integer' in the sense that I meant (although it has the same value as one). If one uses only values of the form iiiiiim (where i is a digit, not dot or 'e'), then multiplying them, adding them, subtracting them, or using divideInteger on them will always result in an integer with exponent 0 -- a value of the same form. i.e., one can carry out conventional 'integer arithmetic' safely, and the results will be integers. (divideInteger is an explicit integer-divide, but it can also be effected by a regular divide and a quantize with appropriate rounding).
The result will also be an 'integer' for regular division of j/k if k is an integer and j is a multiple of k (as would be the case when calculating array indices).
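Mike's closure argument can be illustrated with a toy model of unnormalized decimals as coefficient/exponent pairs. This is only a sketch of the exponent arithmetic (dec, mul, add are my names), not Sam's implementation or the full 754r rules with coefficient limits and rounding:

```javascript
// Toy model: value = coeff * 10^exp (coefficients kept small enough to be
// exact in binary64 for this illustration).
function dec(coeff, exp) { return { coeff: coeff, exp: exp }; }

// Exact-case preferred exponents in the 754r style:
function mul(a, b) { return dec(a.coeff * b.coeff, a.exp + b.exp); }
function add(a, b) {                       // align to the smaller exponent
  var e = Math.min(a.exp, b.exp);
  return dec(a.coeff * Math.pow(10, a.exp - e) +
             b.coeff * Math.pow(10, b.exp - e), e);
}

// "Integers" in Mike's sense are values with exp === 0; multiplying them
// keeps the exponent at 0:
var ten = dec(10, 0);
var million = mul(mul(mul(mul(mul(ten, ten), ten), ten), ten), ten);

// The literal 1e+6m, by contrast, carries exponent 6: same numeric value,
// different member of the cohort.
var e6 = dec(1, 6);
```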
Brendan summed up:
Ignoring === as faux eq, the only issue here is d.toString() for decimal
d: should it preserve scale and stringify trailing zeroes and funny exponents?
Are there any other cases like array indices where toString of a number is used in a way such that "1.000" is materially different than "1"? Certainly toString could reduce, and there could be a differently-spelled operation to produce the 'nice' string, but the principle that toString shows you exactly what you have got is a good one. (And it would be goodness for ES to behave in the same way as other languages' toString for decimals, too.)
In particular, when dealing with currency values, the number of decimal places does correspond to the quantum (e.g., whether the value is cents, mils, etc., and similarly for positive exponents, it indicates that one is dealing in (say) $millions).
If the 'business logic' calculates a value rounded to the nearest cent then the default toString will display that correctly without any formatting being necessary (and if formatting were applied then if the business logic were later changed to round to three places, the display logic would still round to two places and hence give an incorrect result). In short: the act of converting to a string, for display, inclusion in a web page, etc., should not obscure the underlying data. If there's some path in the logic that forgot to quantize, for example, one wants to see that ASAP, not have it hidden by display formatting.
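To make the quantize point concrete, here is a hedged sketch using a toy representation (value = coeff * 10^exp; the quantize name follows 754r, but the code below uses Math.round rather than the spec's selectable rounding modes, and handles only the small exact cases):

```javascript
// If the business logic quantizes to cents, the scale travels with the value,
// so a scale-preserving toString shows exactly what was computed.
function quantize(d, exp) {
  var shift = d.exp - exp; // rescale the coefficient to the requested exponent
  return { coeff: Math.round(d.coeff * Math.pow(10, shift)), exp: exp };
}

function toStringKeepingScale(d) {
  if (d.exp >= 0) return String(d.coeff * Math.pow(10, d.exp));
  var s = String(Math.abs(d.coeff));
  while (s.length < -d.exp + 1) s = "0" + s; // pad so a digit precedes the point
  var i = s.length + d.exp;
  return (d.coeff < 0 ? "-" : "") + s.slice(0, i) + "." + s.slice(i);
}

var computed = { coeff: 123456, exp: -4 }; // 12.3456, e.g. price * rate
var cents = quantize(computed, -2);        // rounded to the nearest cent
```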
Mike
On Thu, Sep 25, 2008 at 10:24 AM, Mike Cowlishaw <MFC at uk.ibm.com> wrote:
OK, and also liorean says:
I'm of the opinion that decimal128 and binary64 should behave identically in as many areas as possible.
That's a valid model. I suppose I see strings and decimals as being 'closer' in concept, and in both "what you see is what you get". But for arrays, I see the problem. In that case 'reduce to shortest form', that is, strip trailing zeros, might be the right thing to do for decimals used as array indices. That function is in Sam's implementation (it's called 'reduce').
Reduce is subtly different. Decimal.reduce(1000m) produces 1e+3m. I believe that what is desired is that foo[1e+3m] be the same slot as foo["1000"]. But as Brendan clarified yesterday, I think that's only necessary for decimal values that happen to be integers of 16 digits or fewer (16-digit integers being the upper bound for integers which can be exactly stored using a binary64 floating point representation).
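The gap Sam describes (reduce yields 1e+3m, while array access wants "1000") can be sketched as follows, using a toy coefficient/exponent representation; the function names are illustrative, not proposed API:

```javascript
// reduce(): strip trailing zeros from the coefficient, raising the exponent,
// so 1000m (coeff 1000, exp 0) becomes 1e+3m (coeff 1, exp 3).
function reduce(d) {
  var c = d.coeff, e = d.exp;
  while (c !== 0 && c % 10 === 0) { c /= 10; e += 1; }
  return { coeff: c, exp: e };
}

// indexString(): what array access would want -- plain digits for
// integer-valued decimals, however the exponent happens to be spelled.
function indexString(d) {
  if (d.exp >= 0) return String(d.coeff * Math.pow(10, d.exp));
  return null; // fractional scale: not an array-index candidate in this sketch
}

var thousand = { coeff: 1000, exp: 0 };  // 1000m
var reduced = reduce(thousand);          // the 1e+3m cohort member
```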
Brendan summed up:
Ignoring === as faux eq, the only issue here is d.toString() for decimal d: should it preserve scale and stringify trailing zeroes and funny exponents?
Are there any other cases like array indices where toString of a number is used in a way such that "1.000" is materially different than "1"? Certainly toString could reduce, and there could be a differently-spelled operation to produce the 'nice' string, but the principle that toString shows you exactly what you have got is a good one. (And it would be goodness for ES to behave in the same way as other languages' toString for decimals, too.)
In particular, when dealing with currency values, the number of decimal places does correspond to the quantum (e.g., whether the value is cents, mils, etc., and similarly for positive exponents, it indicates that one is dealing in (say) $millions).
If the 'business logic' calculates a value rounded to the nearest cent then the default toString will display that correctly without any formatting being necessary (and if formatting were applied then if the business logic were later changed to round to three places, the display logic would still round to two places and hence give an incorrect result). In short: the act of converting to a string, for display, inclusion in a web page, etc., should not obscure the underlying data. If there's some path in the logic that forgot to quantize, for example, one wants to see that ASAP, not have it hidden by display formatting.
The issue is that ToString is the basis both for toString (the method) and for the way that operations such as array indexing work. My intuition is that business logic also rarely requires scientific notation for small integers.
I believe that we have a solution that everybody might not find ideal, but hopefully can live with, which I will outline below with examples:
typeof(1m) === "decimal"
1.1m === 1.10m
(1.10m).toString() === "1.10"
(1e+3m).toString() === "1000"
Additionally, there will be another method exposed, say toSciString, which will produce a value which will round trip correctly, using scientific notation when necessary.
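A sketch of the proposed split, using a toy coefficient/exponent representation (toSciString here emits a plain coefficient-exponent form that round-trips exactly; the real method's exact format is not specified in this thread, and both function names are illustrative):

```javascript
// toSciString: lossless, exponent always spelled out, so 1e+3m round-trips.
function toSciStringSketch(d) {
  return String(d.coeff) + "E" + (d.exp >= 0 ? "+" : "") + d.exp;
}

// toString: keep the scale for fractions ("1.10"), but expand non-negative
// exponents to plain digits ("1000" rather than "1E+3") so array indexing works.
function toStringSketch(d) {
  if (d.exp >= 0) return String(d.coeff * Math.pow(10, d.exp));
  var s = String(Math.abs(d.coeff));
  while (s.length < -d.exp + 1) s = "0" + s;
  var i = s.length + d.exp;
  return (d.coeff < 0 ? "-" : "") + s.slice(0, i) + "." + s.slice(i);
}

var e3 = { coeff: 1, exp: 3 };        // 1e+3m
var oneTen = { coeff: 110, exp: -2 }; // 1.10m
```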
- Sam Ruby
I believe that we have a solution that everybody might not find ideal, but hopefully can live with, which I will outline below with examples:
typeof(1m) === "decimal"
1.1m === 1.10m
(1.10m).toString() === "1.10"
(1e+3m).toString() === "1000"
Additionally, there will be another method exposed, say toSciString, which will produce a value which will round trip correctly, using scientific notation when necessary.
This doesn't satisfy any of the criteria we want. It now breaks both arrays and round-tripping.
Waldemar
Preface:
Description of ES 3.1m decimal support
An implementation you can reference
Open questions
Decimal literals. I believe that they would be a big win at an almost negligible cost. But I fully recognize that no backwards-incompatible syntax change has been introduced into the language to date.
Selection of which named methods to include. Clearly functions like quantum and compareTotal are needed. Not so clear are functions that duplicate infix and prefix operators, such as add and plus. I've heard arguments on both sides, and don't have a strong opinion on this subject myself. After we verify consensus on the broader approach described in this email, I can present a list of potential candidate methods, grouped into categories. We should be able to quickly sort through this list. This effort could be done either on the mailing list or in the F2F meeting in Redmond.
Whether the named methods are to be static or instance methods. I've heard arguments both ways, and could go either way on this. Frankly, the infix operators capture the 80% use case. Instance methods feel more OO, and some (like toString and valueOf) are required anyway. Static methods may be more consistent with Math.abs, and are asserted to be more suited to code generators and optimization, though I must admit that I never could quite follow this argument.
Should there be a “use decimal”? To me, this feels like something that could be added later. The fact that all 15 digit integers convert exactly, as well as a few common fractions such as .5 and .25, greatly reduces the need. The strategy of doing precise conversions also will tend to highlight when mixed operations occur, and will do so in a non-fatal way.
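The conversion facts behind this judgment are checkable with today's binary64 numbers:

```javascript
// Every integer up to 2^53 converts exactly (so all 15-digit integers do),
// and binary fractions like .5 and .25 are exact, while most decimal
// fractions (e.g. 0.1) have no finite binary expansion.
var limit = Math.pow(2, 53);                    // 9007199254740992
var allFifteenDigit = 999999999999999 < limit;  // largest 15-digit integer fits
var halvesExact = (0.5 + 0.25) === 0.75;        // powers of two: exact
var tenthInexact = (0.1 + 0.2) !== 0.3;         // 0.1 is not exact in binary64
```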
Approaches not selected:
Decimal as a library only. Reason: usability concerns and concerns about the evolution of the language. More specifically, the behaviors of operators like === and + need to be specified. Throwing exceptions would not merely be developer unfriendly, it would likely be perceived as causing existing libraries to break. And whatever was standardized would make later support for such operators a breaking change – at the very least it would require an opt-in.
Attempting to round binary 64 values to the nearest decimal 128 value. Such approaches are fragile (e.g., 1.2-1.1) and tend to hide rather than reveal errors.
Decimal being either a “subclass” or “form” of number. Turned out to be too confusing, potentially breaking, and in general larger in scope than simply providing a separate type and wrapper class.
Type(3m) being “object”. Reason: the only false values for objects should be null, and it is highly desirable that 0.0m be considered false.
Method naming based on Java's BigDecimal class. This was my original approach, as it was initially felt that IEEE 754 was too low-level. This turned out not to be the case, and there are some conflicts (e.g. valueOf is a static method with a single argument on BigDecimal).
Having a separate context class or storing application state in Decimal class. The former is unnecessary namespace pollution in a language with a simple syntax for object literals, and the latter is against the policy of this working group.