floating-pointdecimalada# Ada: Convert float to decimal

This post is linked to this one Ada 2005 access type. The goal is to use **Ada decimal type to get similar results as to hand (and calculator) computations in which 6 decimal places have been used in each intermediate step.**

As can be seen from the table below, the **values obtained with the Ada code starts to differ from the hand calculation in the last digit when further iterations with the Euler method are taken**.

One of the issues with the Ada code was with the line in the main code **diff.adb**: *return 2 * Real(X*Y)*; It doesn't matter if I leave it as *return 2 * X * Y* as well.

The differential equation (O.D.E.) is being solved using the basic Euler method (which is an approximate method which is not that accurate).
The D.E. is dy/dx = 2*x*y. The initial condition is at y0(x=x0=1) = 1. The analytical solution is y = e^((x^2)-1). The objective is to obtain y(x=1.5).

We start with the point (x0,y0) = (1,1). We use a step size h = 0.1 i.e. x is increased with each iteration in the Euler method to 1.1, 1.2, 1.3,..etc. and the corresponding value of y (the variable whose solution is being sought) is determined from the Euler algorithm which is:

y(n) = y(n-1) + h * f(x(n-1), y(n-1))

Here y(n-1) when we start the algorithm is y(0) = 1. Also x(n-1) is our starting x(0) = 1. The function f is the derivative function dy/dx given above as dy/dx = 2*x*y.

Briefly, h * f(x(n-1), y(n-1)) is the "horizontal distance between two successive x values" multiplied by the gradient. The gradient formula is dy/dx = delta y /delta x which gives delta y or (the change in y) as

delta y = delta x * dy/dx.

In the Euler formula h is the delta x and dy/dx is the gradient. So h * f(x(n-1), y(n-1)) gives delta y which is the change in the value of y i.e. delta y. This change in y is then added to the previous value of y. The Euler method is basically a first order Taylor approximation with a small change in x. A gradient line is drawn to the curve and the next value of the solution variable y is on this tangent line at the successive value of x i.e. xnew = xold + h where h is the step.

The table next shows the solution values for the variable y by the Euler method when calculated by hand (and calculator), by my Ada code and finally in the last column the exact solution.

x | y (hand) | Ada code | y (exact) |
---|---|---|---|

1.1 | 1.200000 | 1.200000 | 1.233678 |

1.2 | 1.464000 | 1.464000 | 1.552707 |

1.3 | 1.815360 | 1.815360 | 1.993716 |

1.4 | 2.287354 | 2.287353 | 2.611696 |

1.5 | 2.927813 | 2.927811 | 3.490343 |

By hand and calculator for instance, y(x=1.1) i.e y(1) at x = x(1) is calculated as y(x=1.1) = y(0) + h * f(x=1,y=1) = 1 + 0.1 * (2 * 1* 1) = 1.200000 to 6 d.p.

y(2) is calculated at x = x(2) as y(x=1.2) = y(1) + h * f(x=1.1,y=1.200000) = 1.200000 + 0.1 * (2 * 1.1* 1.200000) = 1.464000 to 6 d.p.

y(3) is calculated at x = x(3) as y(x=1.3) = y(2) + h * f(x=1.2,y=1.464000) = 1.464000 + 0.1 * (2 * 1.2* 1.464000) = 1.815360 to 6 d.p.

y(4) is calculated at x = x(4) as y(x=1.4) = y(3) + h * f(x=1.3,y=1.815360) = 1.815360 + 0.1 * (2 * 1.3* 1.815360) = 2.287354 to 6 d.p.

y(5) is calculated at x = x(5) as y(x=1.5) = y(4) + h * f(x=1.4,y=2.287354) = 2.287354 + 0.1 * (2 * 1.4* 2.287354) = 2.927813 to 6 d.p.

Now I want to modify the codes so that they work with a fixed number of decimal places which is 6 here after the decimal place.

The main code is **diff.adb**:

```
with Ada.Text_IO;
with Euler;
procedure Diff is
type Real is delta 0.000001 digits 9;
type Vector is array(Integer range <>) of Real;
type Ptr is access function (X: Real; Y: Real) return Real;
package Real_IO is new Ada.Text_IO.Decimal_IO(Num => Real);
use Real_IO;
procedure Solve is new Euler(Decimal_Type => Real, Vector_Type => Vector, Function_Ptr => Ptr);
function Maths_Func(X: Real; Y: Real) return Real is
begin
return 2 * Real(X*Y);
end Maths_Func;
Answer: Vector(1..6);
begin
Solve(F => Maths_Func'Access, Initial_Value => 1.0, Increment => 0.1, Result => Answer);
for N in Answer'Range loop
Put(1.0 + 0.1 * Real(N-1), Exp => 0);
Put( Answer(N), Exp => 0);
Ada.Text_IO.New_Line;
end loop;
end Diff;
```

Then comes **euler.ads**:

```
generic
type Decimal_Type is delta <> digits <>;
type Vector_Type is array(Integer range <>) of Decimal_Type;
type Function_Ptr is access function (X: Decimal_Type; Y: Decimal_Type) return Decimal_Type;
procedure Euler(
F: in Function_Ptr; Initial_Value, Increment: in Decimal_Type; Result: out Vector_Type);
```

and the package body **euler.adb**

```
procedure Euler
(F : in Function_Ptr; Initial_Value, Increment : in Decimal_Type; Result : out Vector_Type)
is
Step : constant Decimal_Type := Increment;
Current_X : Decimal_Type := 1.0;
begin
Result (Result'First) := Initial_Value;
for N in Result'First + 1 .. Result'Last loop
Result (N) := Result (N - 1) + Step * F(Current_X, Result (N - 1));
Current_X := Current_X + Step;
end loop;
end Euler;
```

On compilation, I get the messages pointing to **diff.adb**:

**type cannot be determined from context**

**explicit conversion to result type required**

for the line **return 2.0 times X times Y**;

Perhaps the **2.0** is causing the trouble here. How to convert this Float number to Decimal?

I believe that further down in **diff.adb**, I will get the same issue with the line:

```
Solve(F => Maths_Func'Access, Initial_Value => 1.0, Increment => 0.1, Result => Answer);
```

for it contains Floating point numbers as well.

The compilation was done on Windows with the 32-bit GNAT community edition of year 2011. Why 2011? This is because I like the IDE better for that year rather than the pale ones which come in the recent years.

**The revised codes based on trashgod codes which work are given next:**

The main file **diff.adb**

```
with Ada.Numerics.Generic_Elementary_Functions; use Ada.Numerics;
with Ada.Text_IO; use Ada.Text_IO;
with Euler;
procedure Diff is
type Real is digits 7;
type Vector is array (Positive range <>) of Real;
type Ptr is access function (X : Real; Y : Real) return Real;
type Round_Ptr is access function (V : Real) return Real;
procedure Solve is new Euler (Float_Type => Real, Vector => Vector, Function_Ptr => Ptr, Function_Round_Ptr => Round_Ptr);
package Real_Functions is new Generic_Elementary_Functions (Real);
use Real_Functions;
package Real_IO is new Ada.Text_IO.Float_IO (Real);
use Real_IO;
function DFDX (X, Y : Real) return Real is (2.0 * X * Y);
function F (X : Real) return Real is (Exp (X**2.0 - 1.0));
function Round (V : in Real) return Real is (Real'Rounding (1.0E6 * V) / 1.0E6);
XI : constant Real := 1.0;
YI : constant Real := 1.0;
Step : constant Real := 0.1;
Result : Vector (Positive'First .. 6); --11 if step = 0.05
X_Value : Real;
begin
Solve (DFDX'Access, Round'Access, XI, YI, Step, Result);
Put_line(" x calc exact delta");
for N in Result'Range loop
X_Value := 1.0 + Step * Real (N - 1);
Put (X_Value, Exp => 0);
Put (" ");
Put (Result (N), Exp => 0);
Put (" ");
Put (F (X_Value), Exp => 0);
Put (" ");
Put (Result (N) - F (X_Value), Exp => 0);
Ada.Text_IO.New_Line;
end loop;
end Diff;
```

The file **euler.ads**

```
generic
type Float_Type is digits <>;
type Vector is array (Positive range <>) of Float_Type;
type Function_Ptr is access function (X, Y : Float_Type) return Float_Type;
type Function_Round_Ptr is access function (V : Float_Type) return Float_Type;
procedure Euler
(DFDX : in Function_Ptr; Round : Function_Round_Ptr; XI, YI, Step : in Float_Type; Result : out Vector);
```

The file **euler.adb**

```
procedure Euler
(DFDX : in Function_Ptr; Round : Function_Round_Ptr; XI, YI, Step : in Float_Type; Result : out Vector)
is
H : constant Float_Type := Step;
X : Float_Type := XI;
begin
Result (Result'First) := YI;
for N in Result'First + 1 .. Result'Last loop
Result (N) := Round(Result (N - 1)) + Round(H * DFDX (X, Result (N - 1)));
X := X + Step;
end loop;
end Euler;
```

giving the output with **step h = 0.1 **

x | calc (Ada) | exact | delta |
---|---|---|---|

1.1 | 1.200000 | 1.233678 | 1.233678 |

1.2 | 1.464000 | 1.552707 | -0.033678 |

1.3 | 1.815360 | 1.993716 | -0.088707 |

1.4 | 2.287354 | 2.611696 | -0.178356 |

1.5 | 2.927813 | 3.490343 | -0.562530 |

The **calc (Ada)** results agree with **hand (and calculator)** computations.

Solution

First, `2.0`

is a real numeric literal. It has type `universal_real`

and may be implicitly converted to any real type, including your type `Real`

. So this problem has nothing to do with floating <-> decimal conversions.

Canonically, fixed-point types are represented by an integer; the equivalent value is the stored integer multiplied by the delta for the type.* So for a value `X`

there is an integer `X'`

such that `X = X' * Real'delta`

. Fixed point types have some issues with multiplication and division: an expression such as `X * Y`

is equivalent to `(X' * Real'delta) * (Y' * Real'delta) = X' * Y' * Real'delta ** 2`

, which has no obvious type. Thus all calls to '"*"' invoke the operation on type `universl_fixed`

yielding `universl_fixed`

and requiring either a context that defines the result type or an explicit conversion to the result type.

The multiplying operators are defined in ARM 4.5.5. Note Para 19.a/2, "we require an explicit conversion to be inserted in expressions like A * B * C if A, B, and C are each of some fixed point type." So you have to explicitly convert the result of one of the multiplications to Real; the context will then define the result type of the other multiplication.

But this canonical definition also allows a loophole for multiplication by integer values: `I * X`

, where `I`

is some integer type, is equivalent to `(I * X') * Real'delta`

which has type Real (provided `I * X'`

is in range). These are defined in Para 14. So for this case, your return expression could be `2 * X * Y`

, which should work. No explicit conversion should be required.

In the general case, you will require a conversion to the desired type: `Real (2.7 * X) * Y`

.

To get the same results from your code that you do by hand, they need to do the same thing. By hand, you're calculating a value to a large number of significant digits, then rounding it to 6 places. For your code to do this, you'd need something like

```
with Ada.Text_IO;
procedure Diff is
type Real is digits 9;
type Vector is array (Positive range <>) of Real;
package Real_IO is new Ada.Text_IO.Float_IO (Num => Real);
function F (X : Real; Y : Real) return Real is
(2.0 * X * Y);
function Round (V : in Real) return Real is
(Real'Rounding (1.0E6 * V) / 1.0E6);
Step : constant := 0.1;
Current_X : Real := 1.0;
Answer : Vector (1 .. 6);
begin -- Diff
Answer (1) := 1.0;
Solve : for N in 2 .. Answer'Last loop
Answer (N) := Round (Answer (N - 1) + Step * F (Current_X, Answer (N - 1) ) );
Current_X := Current_X + Step;
end loop Solve;
Print : for N in Answer'Range loop
Real_IO.Put (Item => 1.0 + 0.1 * Real (N - 1), Aft => 1, Exp => 0);
Real_IO.Put (Item => Answer (N), Aft => 8, Exp => 0);
Ada.Text_IO.New_Line;
end loop Print;
end Diff;
```

I've used `digits 9`

to allow for the extra significant digits presumably present in your hand calculations before rounding. I've output with `Aft => 8`

to show that the rounding works; you'd want to use `Aft => 6`

to avoid the extra zeroes.

This outputs

```
1.0 1.00000000
1.1 1.20000000
1.2 1.46400000
1.3 1.81536000
1.4 2.28735400
1.5 2.92781300
```

The lesson here is that decimal fixed-point is not a drop-in substitute for floating-point with rounding.

*Technically by the *small* for the type, but for decimal types this is equal to the delta.

- Instruction FYL2XP1
- Why can we convert float in binary by multiply 2?
- What does gcc's ffast-math actually do?
- In Python, can I round all printed floats with a single command?
- What kind of optimizations are included in -funsafe-math-optimizations?
- sin and cos give unexpected results for well-known angles
- bash printf broken for floats
- Clang float infinity divided by infinity produces different result with optimization flag
- Find which equally-sized square contains a given coordinate
- How to convert Decimal to Double in C#?
- Maximum float value in php
- Java Implementation of Math.exp()
- How to parse float with two decimal places in javascript?
- Python JSON serialize a Decimal object
- Float values behaving differently across the release and debug builds
- Rounding when ULP 1 - Java and C++
- ValueError: could not convert string to float: id
- Sort multiple NSArray at the same time - iOS
- Product of two exact represented doubles. Is exact?
- Purpose of struct.unpack
- Python round a float to nearest 0.05 or to multiple of another float
- Why fmt.Println formats float64 values with different number of decimal places differently?
- Is there a reliable way in JavaScript to obtain the number of decimal places of an arbitrary number?
- Extract fractional part of double *efficiently* in C
- How do I use floating-point arithmetic in bash?
- Is sys.float_info machine specific?
- Is floating point expression contraction allowed in C++?
- Double floating point operations using four IEEE rounding modes implemented in terms of round to nearest ties to even
- How to convert torch tensor to float?
- What is the maximum precision of the double and float datatypes in MySQL