G.2.1 Model of Floating Point Arithmetic
1
In the strict mode, the predefined operations of
a floating point type shall satisfy the accuracy requirements specified
here and shall avoid or signal overflow in the situations described.
This behavior is presented in terms of a model of floating point arithmetic
that builds on the concept of the canonical form (see A.5.3).
Static Semantics
2
Associated with each floating point type is an infinite
set of model numbers. The model numbers of a type are used to define
the accuracy requirements that have to be satisfied by certain predefined
operations of the type; through certain attributes of the model numbers,
they are also used to explain the meaning of a user-declared floating
point type declaration. The model numbers of a derived type are those
of the parent type; the model numbers of a subtype are those of its type.
3
{model number} The model numbers of a floating point type T are zero
and all the values expressible in the canonical form (for the type T),
in which mantissa has T'Model_Mantissa digits and exponent has a value
greater than or equal to T'Model_Emin. (These attributes are defined in
G.2.2.)
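As a minimal sketch of this definition, the following Ada program tests whether a value is a model number of Float by checking the canonical form directly. It assumes that Long_Float has the same machine radix as Float and more mantissa digits, so that candidate values can be held exactly. The procedure name and the helper Is_Model_Number are hypothetical, and the printed attribute values depend on the implementation.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Model_Number_Demo is
   --  Assumption: Long_Float has the same radix as Float and more digits,
   --  so the candidate values below are held exactly.
   pragma Assert (Long_Float'Machine_Radix = Float'Machine_Radix);

   --  Hypothetical helper: V is a model number of Float if it is zero, or
   --  if its canonical exponent is at least Float'Model_Emin and its
   --  mantissa fits in Float'Model_Mantissa radix digits.
   function Is_Model_Number (V : Long_Float) return Boolean is
      E      : Integer;
      Scaled : Long_Float;
   begin
      if V = 0.0 then
         return True;
      end if;
      E := Long_Float'Exponent (V);
      --  Exact shift placing Model_Mantissa digits before the radix point.
      Scaled := Long_Float'Scaling (V, Float'Model_Mantissa - E);
      return E >= Float'Model_Emin
        and then Scaled = Long_Float'Truncation (Scaled);
   end Is_Model_Number;

   --  Smallest positive model number: Machine_Radix ** (Model_Emin - 1).
   Tiny : constant Long_Float := Float'Model_Small;
begin
   Put_Line ("Machine_Radix  =" & Integer'Image (Float'Machine_Radix));
   Put_Line ("Model_Mantissa =" & Integer'Image (Float'Model_Mantissa));
   Put_Line ("Model_Emin     =" & Integer'Image (Float'Model_Emin));
   Put_Line ("1.0        : " & Boolean'Image (Is_Model_Number (1.0)));
   Put_Line ("1.0 / 3.0  : " & Boolean'Image (Is_Model_Number (1.0 / 3.0)));
   Put_Line ("Tiny       : " & Boolean'Image (Is_Model_Number (Tiny)));
   Put_Line ("Tiny / 2.0 : " & Boolean'Image (Is_Model_Number (Tiny / 2.0)));
end Model_Number_Demo;
```

On binary hardware satisfying these assumptions, 1.0 and Tiny pass the test. Tiny / 2.0 fails because its exponent is below Float'Model_Emin, and 1.0 / 3.0 fails because its mantissa needs more digits than Float'Model_Mantissa.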
3.a
Discussion: The model is capable of describing
the behavior of most existing hardware that has a mantissa-exponent representation.
As applied to a type T, it is parameterized by the values of T'Machine_Radix,
T'Model_Mantissa, T'Model_Emin, T'Safe_First, and T'Safe_Last. The values
of these attributes are determined by how, and how well, the hardware
behaves. They in turn determine the set of model numbers and the safe
range of the type, which figure in the accuracy and range (overflow avoidance)
requirements.
3.b
In hardware that is free of arithmetic anomalies,
T'Model_Mantissa, T'Model_Emin, T'Safe_First, and T'Safe_Last will yield
the same values as T'Machine_Mantissa, T'Machine_Emin, T'Base'First,
and T'Base'Last, respectively, and the model numbers in the safe range
of the type T will coincide with the machine numbers of the type T. In
less perfect hardware, the model-oriented attributes cannot have these
optimal values, since the hardware (and therefore the implementation)
by definition cannot meet the stringent demands of the resulting model;
in this case, the values yielded by the model-oriented attributes have
to be made more conservative (i.e., penalized), with the result that the
model numbers are more widely separated than the machine numbers, and
the safe range is a subrange of the base range.
The implementation will then be able to conform to the requirements of
the weaker model defined by the sparser set of model numbers and the
smaller safe range.
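The comparison described in this note can be made directly for Float. The following sketch, with a hypothetical procedure name, reports whether each model-oriented attribute takes its optimal value. Which flags are True depends on the implementation and the hardware.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Clean_Hardware_Check is
   --  Each flag is True when a model-oriented attribute of Float has the
   --  optimal value that hardware free of arithmetic anomalies allows.
   Mantissa_OK : constant Boolean :=
     Float'Model_Mantissa = Float'Machine_Mantissa;
   Emin_OK     : constant Boolean := Float'Model_Emin = Float'Machine_Emin;
   First_OK    : constant Boolean := Float'Safe_First = Float'Base'First;
   Last_OK     : constant Boolean := Float'Safe_Last = Float'Base'Last;
begin
   Put_Line ("Model_Mantissa = Machine_Mantissa: " & Boolean'Image (Mantissa_OK));
   Put_Line ("Model_Emin = Machine_Emin:         " & Boolean'Image (Emin_OK));
   Put_Line ("Safe_First = Base'First:           " & Boolean'Image (First_OK));
   Put_Line ("Safe_Last = Base'Last:             " & Boolean'Image (Last_OK));
   if Mantissa_OK and Emin_OK and First_OK and Last_OK then
      Put_Line ("Model-oriented attributes take their optimal values.");
   else
      Put_Line ("Model-oriented attributes are penalized.");
   end if;
end Clean_Hardware_Check;
```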
4
{model interval} A model interval of a floating point type is any
interval whose bounds are model numbers of the type. {model interval
(associated with a value)} The model interval of a type T associated
with a value v is the smallest model interval of T that includes v.
(The model interval associated with a model number of a type consists
of that number only.)
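A model interval can be computed directly from the canonical form. The following sketch returns the model interval of Float associated with a value v held in Long_Float. It assumes, as a hypothetical setup, that Long_Float has the same radix as Float and enough extra digits to represent the values of interest exactly. The type Interval and the function Model_Interval are illustrative names, not predefined entities.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Model_Interval_Demo is
   --  Assumption: Long_Float has the same radix as Float and enough extra
   --  digits to hold the values of interest exactly.
   pragma Assert (Long_Float'Machine_Radix = Float'Machine_Radix);

   type Interval is record
      Lo, Hi : Long_Float;
   end record;

   --  Smallest model interval of Float that includes V, built from the
   --  canonical form: zero, plus values whose mantissa has
   --  Float'Model_Mantissa digits and whose exponent is >= Float'Model_Emin.
   function Model_Interval (V : Long_Float) return Interval is
      M      : constant Integer := Float'Model_Mantissa;
      Small  : constant Long_Float := Float'Model_Small;
      E      : Integer;
      Scaled : Long_Float;
   begin
      if V = 0.0 then
         return (0.0, 0.0);
      end if;
      E := Long_Float'Exponent (V);
      if E < Float'Model_Emin then
         --  0 < abs V < Small: the neighbours are zero and +/-Small.
         if V > 0.0 then
            return (0.0, Small);
         else
            return (-Small, 0.0);
         end if;
      end if;
      --  Put M digits before the radix point, round down and up to whole
      --  numbers, and scale back; each step is exact.
      Scaled := Long_Float'Scaling (V, M - E);
      return (Long_Float'Scaling (Long_Float'Floor (Scaled), E - M),
              Long_Float'Scaling (Long_Float'Ceiling (Scaled), E - M));
   end Model_Interval;

   procedure Show (V : Long_Float) is
      I : constant Interval := Model_Interval (V);
   begin
      Put_Line (Long_Float'Image (V) & " lies in [" & Long_Float'Image (I.Lo)
                & "," & Long_Float'Image (I.Hi) & " ]");
   end Show;
begin
   Show (1.0);        --  a model number: the interval is [1.0, 1.0]
   Show (1.0 / 3.0);  --  not a model number of Float on binary hardware
   Show (-0.1);
end Model_Interval_Demo;
```

For a model number such as 1.0 the function returns a degenerate interval, as the parenthetical remark above requires. For a value such as 1.0 / 3.0 it returns the two adjacent model numbers that bracket it.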
Implementation Requirements
5
The accuracy requirements for the evaluation of certain
predefined operations of floating point types are as follows.
5.a
Discussion: This subclause does not cover
the accuracy of an operation of a static expression; such operations
have to be evaluated exactly (see 4.9). It also does not cover the
accuracy of the predefined attributes of a floating point subtype that
yield a value of the type; such operations also yield exact results
(see 3.5.8 and A.5.3).
6
{operand interval} An operand interval is the model interval, of the
type specified for the operand of an operation, associated with the
value of the operand.
7
For any predefined
arithmetic operation that yields a result of a floating point type T,
the required bounds on the result are given by a model interval of T
(called the result interval) defined in terms of the operand values
as follows:
8
- {result interval (for the evaluation of a predefined arithmetic
operation)} The result interval is the smallest model interval of T
that includes the minimum and the maximum of all the values obtained by
applying the (exact) mathematical operation to values arbitrarily
selected from the respective operand intervals.
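For multiplication, the rule can be applied mechanically. Because x * y is linear in each operand separately, the minimum and maximum over the operand intervals occur at the interval corners. The following sketch computes the result interval of X * Y for two Float operands and checks that the delivered product lies in it. It includes a helper, Model_Interval, that builds the model interval of Float associated with a value from the canonical form. It assumes that Long_Float shares Float's radix and has at least twice Float'Model_Mantissa digits, so the corner products are exact. All names are illustrative.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Result_Interval_Demo is
   --  Assumptions: Long_Float shares Float's radix and has at least twice
   --  Float'Model_Mantissa digits, so products of model numbers of Float
   --  are exact in Long_Float.
   pragma Assert (Long_Float'Machine_Radix = Float'Machine_Radix);
   pragma Assert (Long_Float'Machine_Mantissa >= 2 * Float'Model_Mantissa);

   type Interval is record
      Lo, Hi : Long_Float;
   end record;

   --  Smallest model interval of Float that includes V, built from the
   --  canonical form (Float'Model_Mantissa digits, exponent at least
   --  Float'Model_Emin, plus zero).
   function Model_Interval (V : Long_Float) return Interval is
      M      : constant Integer := Float'Model_Mantissa;
      Small  : constant Long_Float := Float'Model_Small;
      E      : Integer;
      Scaled : Long_Float;
   begin
      if V = 0.0 then
         return (0.0, 0.0);
      end if;
      E := Long_Float'Exponent (V);
      if E < Float'Model_Emin then
         if V > 0.0 then
            return (0.0, Small);
         else
            return (-Small, 0.0);
         end if;
      end if;
      Scaled := Long_Float'Scaling (V, M - E);
      return (Long_Float'Scaling (Long_Float'Floor (Scaled), E - M),
              Long_Float'Scaling (Long_Float'Ceiling (Scaled), E - M));
   end Model_Interval;

   --  Result interval of a multiplication: x * y is linear in each operand
   --  separately, so its minimum and maximum over the operand intervals
   --  occur at the corners; widen them to the enclosing model interval.
   function Product_Interval (A, B : Interval) return Interval is
      P1 : constant Long_Float := A.Lo * B.Lo;
      P2 : constant Long_Float := A.Lo * B.Hi;
      P3 : constant Long_Float := A.Hi * B.Lo;
      P4 : constant Long_Float := A.Hi * B.Hi;
      Least : constant Long_Float :=
        Long_Float'Min (Long_Float'Min (P1, P2), Long_Float'Min (P3, P4));
      Greatest : constant Long_Float :=
        Long_Float'Max (Long_Float'Max (P1, P2), Long_Float'Max (P3, P4));
   begin
      return (Model_Interval (Least).Lo, Model_Interval (Greatest).Hi);
   end Product_Interval;

   X : Float := 0.1;  --  variables, so that X * Y is evaluated at run time
   Y : Float := 3.0;
   Delivered : Float;
   R : Interval;
begin
   Delivered := X * Y;
   --  Operand intervals are the model intervals associated with X and Y.
   R := Product_Interval (Model_Interval (Long_Float (X)),
                          Model_Interval (Long_Float (Y)));
   Put_Line ("Result interval: [" & Long_Float'Image (R.Lo) & ","
             & Long_Float'Image (R.Hi) & " ]");
   Put_Line ("Delivered value:" & Float'Image (Delivered));
   Put_Line ("Delivered value in result interval: "
             & Boolean'Image (Long_Float (Delivered) in R.Lo .. R.Hi));
end Result_Interval_Demo;
```

Both bounds of this result interval lie well inside the safe range, so a conforming implementation in the strict mode has to deliver a product within it.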
9
The result interval of an exponentiation is obtained
by applying the above rule to the sequence of multiplications defined
by the exponent, assuming arbitrary association of the factors, and to
the final division in the case of a negative exponent.
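One reading of this rule, written out for X**4 with operand interval I_X, is sketched below. The notation is an assumption of this sketch and is not defined by the standard. mi(S) stands for the smallest model interval of T that includes the set S. I · J stands for the set of exact products x y with x chosen from I and y chosen independently from J. Each intermediate result interval serves as an operand interval of the next multiplication, so the two associations shown need not yield the same interval. The dividend 1.0 of the final division is a model number, so its operand interval is [1.0, 1.0].

```latex
\begin{aligned}
&((X \cdot X) \cdot X) \cdot X: &&
  I_2 = \operatorname{mi}(I_X \cdot I_X),\;
  I_3 = \operatorname{mi}(I_2 \cdot I_X),\;
  I_4 = \operatorname{mi}(I_3 \cdot I_X) \\
&(X \cdot X) \cdot (X \cdot X): &&
  I_2 = \operatorname{mi}(I_X \cdot I_X),\;
  I_4' = \operatorname{mi}(I_2 \cdot I_2) \\
&X^{-4} = 1.0 / X^{4}: &&
  I_{-4} = \operatorname{mi}\bigl(\{\, 1/p : p \in I_4 \,\}\bigr)
\end{aligned}
```

Here I_4 in the last line stands for the result interval of whichever association is used for the positive power.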
10
The result interval of a conversion of a numeric
value to a floating point type T is the model interval of T associated
with the operand value, except when the source expression is of a fixed
point type with a small that is not a power of T'Machine_Radix
or is a fixed point multiplication or division either of whose operands
has a small that is not a power of T'Machine_Radix; in these cases,
the result interval is implementation defined.
10.a
Implementation defined: The result interval
in certain cases of fixed-to-float conversion.
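The distinction drawn in paragraph 10 can be seen in the following sketch, which assumes binary hardware (Float'Machine_Radix = 2). The type and procedure names are hypothetical. Binary_Fixed has a small of 2.0**(-8), a power of the radix, so Float (B) has to lie in the model interval associated with the value of B. Cents has a small of 0.01, so the result interval of Float (C) is implementation defined.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Fixed_To_Float_Demo is
   --  Small defaults to 2.0**(-8), a power of two: on binary hardware the
   --  conversion result lies in the model interval of the operand value.
   type Binary_Fixed is delta 2.0 ** (-8) range -100.0 .. 100.0;

   --  A small of 0.01 is not a power of the radix: the result interval of
   --  the conversion is implementation defined.
   type Cents is delta 0.01 range -100.0 .. 100.0;
   for Cents'Small use 0.01;

   B : Binary_Fixed := 1.5;
   C : Cents := 0.1;
begin
   Put_Line ("Float'Machine_Radix =" & Integer'Image (Float'Machine_Radix));
   Put_Line ("Float (B) =" & Float'Image (Float (B)));
   Put_Line ("Float (C) =" & Float'Image (Float (C)));
end Fixed_To_Float_Demo;
```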
11
{Overflow_Check [partial]} {check, language-defined (Overflow_Check)}
For any of the foregoing operations, the implementation shall deliver a
value that belongs to the result interval when both bounds of the result
interval are in the safe range of the result type T, as determined by
the values of T'Safe_First and T'Safe_Last; otherwise,
12
- {Constraint_Error (raised by failure of run-time check)} if
T'Machine_Overflows is True, the implementation shall either deliver
a value that belongs to the result interval or raise Constraint_Error;
13
- if T'Machine_Overflows is False, the
result is implementation defined.
13.a
Implementation defined: The result of
a floating point arithmetic operation in overflow situations, when the
Machine_Overflows attribute of the result type is False.
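The following sketch, with hypothetical names, performs an operation whose result interval lies outside the safe range of Float and reports what happened. Which branch runs depends on Float'Machine_Overflows and on the implementation. When that attribute is False, the value delivered is implementation defined. On IEEE hardware it is typically an infinity, but the program does not rely on that.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Overflow_Demo is
   --  Hypothetical helper; the call keeps the product a run-time operation.
   function Twice (X : Float'Base) return Float'Base is
   begin
      return X * 2.0;
   end Twice;

   Big : Float'Base := Float'Safe_Last;  --  upper bound of the safe range
   R   : Float'Base;
begin
   Put_Line ("Float'Machine_Overflows = "
             & Boolean'Image (Float'Machine_Overflows));
   --  The result interval of Big * 2.0 lies outside the safe range.
   R := Twice (Big);
   --  Reached when no exception is raised; if Machine_Overflows is False,
   --  the value of R is implementation defined.
   Put_Line ("Result in safe range: "
             & Boolean'Image (R in Float'Safe_First .. Float'Safe_Last));
exception
   when Constraint_Error =>
      --  The exception permitted when Float'Machine_Overflows is True.
      Put_Line ("Constraint_Error raised");
end Overflow_Demo;
```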
14
For any predefined relation on operands of a floating
point type T, the implementation may deliver any value (i.e., either
True or False) obtained by applying the (exact) mathematical comparison
to values arbitrarily chosen from the respective operand intervals.
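The freedom granted by paragraph 14 can be made concrete for the relation "<". The following sketch, with hypothetical names, takes two operand intervals (bounds stored as Long_Float) and reports which results the model permits. True is permitted if some value from the first interval is below some value from the second. False is permitted if some value from the first interval is at or above some value from the second. When the intervals overlap, both results are permitted.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Relation_Demo is
   type Interval is record
      Lo, Hi : Long_Float;
   end record;

   --  For "X < Y" with X drawn from A and Y drawn from B:
   --  True is permitted if some x in A is below some y in B.
   function True_Permitted (A, B : Interval) return Boolean is
   begin
      return A.Lo < B.Hi;
   end True_Permitted;

   --  False is permitted if some x in A is at or above some y in B.
   function False_Permitted (A, B : Interval) return Boolean is
   begin
      return A.Hi >= B.Lo;
   end False_Permitted;

   procedure Report (A, B : Interval) is
   begin
      Put_Line ("True permitted: " & Boolean'Image (True_Permitted (A, B))
                & ", False permitted: " & Boolean'Image (False_Permitted (A, B)));
   end Report;
begin
   --  Disjoint intervals: only one answer is allowed.
   Report ((1.0, 1.0), (2.0, 2.0));
   --  Overlapping intervals: either answer conforms to the model.
   Report ((0.0, 1.0), (0.5, 2.0));
end Relation_Demo;
```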
15
The result of a membership test is defined in terms
of comparisons of the operand value with the lower and upper bounds of
the given range or type mark (the usual rules apply to these comparisons).
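As a small illustration with hypothetical names, the following sketch evaluates a membership test against a floating point subtype alongside the two comparisons with its bounds that define it. For a value well inside the range, as here, both forms have to yield True, because the operand and the bounds are model numbers and their operand intervals therefore do not overlap.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Membership_Demo is
   subtype Unit_Interval is Float range 0.0 .. 1.0;

   X : Float := 0.75;  --  variable, so the tests happen at run time

   --  The membership test and the two comparisons that define it.
   By_Membership  : constant Boolean := X in Unit_Interval;
   By_Comparisons : constant Boolean :=
     Unit_Interval'First <= X and X <= Unit_Interval'Last;
begin
   Put_Line ("X in Unit_Interval: " & Boolean'Image (By_Membership));
   Put_Line ("First <= X <= Last: " & Boolean'Image (By_Comparisons));
end Membership_Demo;
```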
Implementation Permissions
16
If the underlying floating point hardware implements
division as multiplication by a reciprocal, the result interval for division
(and exponentiation by a negative exponent) is implementation defined.
16.a
Implementation defined: The result interval
for division (or exponentiation by a negative exponent), when the floating
point hardware implements division as multiplication by a reciprocal.
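The difference between the two ways of evaluating a quotient can be observed with a sketch like the following. It compares X / Y with X * (1.0 / Y) for a range of small integral operands. The second form rounds twice and can therefore differ from the correctly rounded quotient that IEEE hardware delivers for X / Y. The procedure and function names are hypothetical, and the number of mismatches reported depends on the hardware.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Reciprocal_Division_Demo is
   --  Quotient computed by the division operator.
   function Direct (X, Y : Float) return Float is
   begin
      return X / Y;
   end Direct;

   --  Quotient computed as multiplication by a reciprocal (two roundings).
   function Via_Reciprocal (X, Y : Float) return Float is
   begin
      return X * (1.0 / Y);
   end Via_Reciprocal;

   Mismatches : Natural := 0;
begin
   for I in 1 .. 1000 loop
      for J in 1 .. 100 loop
         if Direct (Float (I), Float (J)) /= Via_Reciprocal (Float (I), Float (J))
         then
            Mismatches := Mismatches + 1;
         end if;
      end loop;
   end loop;
   Put_Line ("Quotients that differ:" & Natural'Image (Mismatches));
end Reciprocal_Division_Demo;
```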
Wording Changes from Ada 83
16.b
The Ada 95 model numbers of a floating point
type that are in the safe range of the type are comparable to the Ada
83 safe numbers of the type. There is no analog of the Ada 83 model numbers.
The Ada 95 model numbers, when not restricted to the safe range, are
an infinite set.
Inconsistencies With Ada 83
16.c
{inconsistencies with Ada 83} Giving
the model numbers the hardware radix, instead of always a radix of two,
allows (in conjunction with other changes) some borderline declared types
to be represented with less precision than in Ada 83 (i.e., with single
precision, whereas Ada 83 would have used double precision). Because
the lower precision satisfies the requirements of the model (and did
so in Ada 83 as well), this change is viewed as a desirable correction
of an anomaly, rather than a worrisome inconsistency. (Of course, the
wider representation chosen in Ada 83 also remains eligible for selection
in Ada 95.)
16.d
As an example of this phenomenon, assume that
Float is represented in single precision and that a double precision
type is also available. Also assume hexadecimal hardware with clean properties,
for example certain IBM hardware. Then,
16.e
type T is digits Float'Digits range -Float'Last .. Float'Last;
16.f
results in T being represented in double precision
in Ada 83 and in single precision in Ada 95. The latter is intuitively
correct; the former is counterintuitive. The reason why the double precision
type is used in Ada 83 is that Float has model and safe numbers (in Ada
83) with 21 binary digits in their mantissas, as is required to model
the hypothesized hexadecimal hardware using a binary radix; thus Float'Last,
which is not a model number, is slightly outside the range of safe numbers
of the single precision type, making that type ineligible for selection
as the representation of T even though it provides adequate precision.
In Ada 95, Float'Last (the same value as before) is a model number and
is in the safe range of Float on the hypothesized hardware, making Float
eligible for the representation of T.
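On a given implementation, the representation selected for T can be inspected through attributes of its base subtype. The following sketch, with a hypothetical procedure name, prints the precision of the base type of T beside those of Float and Long_Float. The values printed depend on the implementation, and nothing here assumes hexadecimal hardware.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Representation_Of_T is
   type T is digits Float'Digits range -Float'Last .. Float'Last;
begin
   --  The precision of the base type chosen for T shows which predefined
   --  floating point type the implementation selected.
   Put_Line ("T'Base'Digits               =" & Integer'Image (T'Base'Digits));
   Put_Line ("T'Machine_Mantissa          =" & Integer'Image (T'Machine_Mantissa));
   Put_Line ("Float'Machine_Mantissa      =" & Integer'Image (Float'Machine_Mantissa));
   Put_Line ("Long_Float'Machine_Mantissa =" & Integer'Image (Long_Float'Machine_Mantissa));
   Put_Line ("T'Size                      =" & Integer'Image (T'Size));
end Representation_Of_T;
```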
Extensions to Ada 83
16.g
{extensions to Ada 83} Giving
the model numbers the hardware radix allows for practical implementations
on decimal hardware.
Wording Changes from Ada 83
16.h
The wording of the model of floating point arithmetic
has been simplified to a large extent.