floating-pointdecimalada

Ada: Convert float to decimal


This post is linked to this one Ada 2005 access type. The goal is to use Ada decimal type to get similar results as to hand (and calculator) computations in which 6 decimal places have been used in each intermediate step.

As can be seen from the table below, the values obtained with the Ada code starts to differ from the hand calculation in the last digit when further iterations with the Euler method are taken.

One of the issues with the Ada code was with the line in the main code diff.adb: return 2 * Real(XY)*; It doesn't matter if I leave it as return 2 * X * Y as well.

The differential equation (O.D.E.) is being solved using the basic Euler method (which is an approximate method which is not that accurate). The D.E. is dy/dx = 2xy. The initial condition is at y0(x=x0=1) = 1. The analytical solution is y = e^((x^2)-1). The objective is to obtain y(x=1.5).

We start with the point (x0,y0) = (1,1). We use a step size h = 0.1 i.e. x is increased with each iteration in the Euler method to 1.1, 1.2, 1.3,..etc. and the corresponding value of y (the variable whose solution is being sought) is determined from the Euler algorithm which is:

y(n) = y(n-1) + h * f(x(n-1), y(n-1))

Here y(n-1) when we start the algorithm is y(0) = 1. Also x(n-1) is our starting x(0) = 1. The function f is the derivative function dy/dx given above as dy/dx = 2xy.

Briefly, h * f(x(n-1), y(n-1)) is the "horizontal distance between two successive x values" multiplied by the gradient. The gradient formula is dy/dx = delta y /delta x which gives delta y or (the change in y) as

delta y = delta x * dy/dx.

In the Euler formula h is the delta x and dy/dx is the gradient. So h * f(x(n-1), y(n-1)) gives delta y which is the change in the value of y i.e. delta y. This change in y is then added to the previous value of y. The Euler method is basically a first order Taylor approximation with a small change in x. A gradient line is drawn to the curve and the next value of the solution variable y is on this tangent line at the successive value of x i.e. xnew = xold + h where h is the step.

The table next shows the solution values for the variable y by the Euler method when calculated by hand (and calculator), by my Ada code and finally in the last column the exact solution.

x y (hand) Ada code y (exact)
1.1 1.200000 1.200000 1.233678
1.2 1.464000 1.464000 1.552707
1.3 1.815360 1.815360 1.993716
1.4 2.287354 2.287353 2.611696
1.5 2.927813 2.927811 3.490343

By hand and calculator for instance, y(x=1.1) i.e y(1) at x = x(1) is calculated as y(x=1.1) = y(0) + h * f(x=1,y=1) = 1 + 0.1 * (2 * 1* 1) = 1.200000 to 6 d.p.

y(2) is calculated at x = x(2) as y(x=1.2) = y(1) + h * f(x=1.1,y=1.200000) = 1.200000 + 0.1 * (2 * 1.1* 1.200000) = 1.464000 to 6 d.p.

y(3) is calculated at x = x(3) as y(x=1.3) = y(2) + h * f(x=1.2,y=1.464000) = 1.464000 + 0.1 * (2 * 1.2* 1.464000) = 1.815360 to 6 d.p.

y(4) is calculated at x = x(4) as y(x=1.4) = y(3) + h * f(x=1.3,y=1.815360) = 1.815360 + 0.1 * (2 * 1.3* 1.815360) = 2.287354 to 6 d.p.

y(5) is calculated at x = x(5) as y(x=1.5) = y(4) + h * f(x=1.4,y=2.287354) = 2.287354 + 0.1 * (2 * 1.4* 2.287354) = 2.927813 to 6 d.p.

Now I want to modify the codes so that they work with a fixed number of decimal places which is 6 here after the decimal place.

The main code is diff.adb:

with Ada.Text_IO;
with Euler;
procedure Diff is

  type Real is delta 0.000001 digits 9;
  type Vector is array(Integer range <>) of Real;
  type Ptr is access function (X: Real; Y: Real) return Real;
   
  package Real_IO is new Ada.Text_IO.Decimal_IO(Num => Real); 
  use Real_IO; 

  procedure Solve is new Euler(Decimal_Type => Real, Vector_Type => Vector, Function_Ptr => Ptr);

  function Maths_Func(X: Real; Y: Real) return Real is
  begin
    return 2 * Real(X*Y);
  end Maths_Func;

  
  Answer: Vector(1..6);

begin
  Solve(F => Maths_Func'Access, Initial_Value => 1.0, Increment => 0.1, Result => Answer);
  for N in Answer'Range loop
    Put(1.0 + 0.1 * Real(N-1), Exp => 0);
    Put( Answer(N), Exp => 0);
    Ada.Text_IO.New_Line;
  end loop;
end Diff;

Then comes euler.ads:

generic
  type Decimal_Type is delta <> digits <>;
  type Vector_Type is array(Integer range <>) of Decimal_Type;
  type Function_Ptr is access function (X: Decimal_Type; Y: Decimal_Type) return Decimal_Type;
procedure Euler(
  F: in Function_Ptr; Initial_Value, Increment: in Decimal_Type; Result: out Vector_Type);

and the package body euler.adb

procedure Euler
  (F : in Function_Ptr; Initial_Value, Increment : in Decimal_Type; Result : out Vector_Type)
is
   Step : constant Decimal_Type := Increment;
   Current_X : Decimal_Type := 1.0;

begin
   Result (Result'First) := Initial_Value;
   for N in Result'First + 1 .. Result'Last loop
      Result (N) := Result (N - 1) + Step * F(Current_X, Result (N - 1));
      Current_X := Current_X + Step;
   end loop;
end Euler;

On compilation, I get the messages pointing to diff.adb:

type cannot be determined from context

explicit conversion to result type required

for the line return 2.0 times X times Y;

Perhaps the 2.0 is causing the trouble here. How to convert this Float number to Decimal?

I believe that further down in diff.adb, I will get the same issue with the line:

Solve(F => Maths_Func'Access, Initial_Value => 1.0, Increment => 0.1, Result => Answer);

for it contains Floating point numbers as well.

The compilation was done on Windows with the 32-bit GNAT community edition of year 2011. Why 2011? This is because I like the IDE better for that year rather than the pale ones which come in the recent years.

The revised codes based on trashgod codes which work are given next:

The main file diff.adb

with Ada.Numerics.Generic_Elementary_Functions; use Ada.Numerics;
with Ada.Text_IO;                               use Ada.Text_IO;
with Euler;

procedure Diff is

   type Real is digits 7;
   type Vector is array (Positive range <>) of Real;
   type Ptr is access function (X : Real; Y : Real) return Real;
   type Round_Ptr is access function (V : Real) return Real;

   procedure Solve is new Euler (Float_Type => Real, Vector => Vector, Function_Ptr => Ptr, Function_Round_Ptr => Round_Ptr);
   package Real_Functions is new Generic_Elementary_Functions (Real);
   use Real_Functions;
   package Real_IO is new Ada.Text_IO.Float_IO (Real);
   use Real_IO;

   function DFDX (X, Y : Real) return Real is (2.0 * X * Y);
   function F (X : Real) return Real is (Exp (X**2.0 - 1.0));
   function Round (V : in Real) return Real is (Real'Rounding (1.0E6 * V) / 1.0E6);

   XI      : constant Real := 1.0;
   YI      : constant Real := 1.0;
   Step    : constant Real := 0.1;
   Result  : Vector (Positive'First .. 6); --11 if step = 0.05
   X_Value : Real;

begin
   Solve (DFDX'Access, Round'Access, XI, YI, Step, Result);
   Put_line("        x      calc     exact     delta");
   for N in Result'Range loop
      X_Value := 1.0 + Step * Real (N - 1);
      Put (X_Value, Exp => 0);
      Put (" ");
      Put (Result (N), Exp => 0);
      Put (" ");
      Put (F (X_Value), Exp => 0);
      Put (" ");
      Put (Result (N) - F (X_Value), Exp => 0);
      Ada.Text_IO.New_Line;
   end loop;
end Diff;

The file euler.ads

generic
   type Float_Type is digits <>;
   type Vector is array (Positive range <>) of Float_Type;
   type Function_Ptr is access function (X, Y : Float_Type) return Float_Type;
   type Function_Round_Ptr is access function (V : Float_Type) return Float_Type;
procedure Euler
  (DFDX : in Function_Ptr; Round : Function_Round_Ptr; XI, YI, Step : in Float_Type; Result : out Vector);

The file euler.adb

procedure Euler
  (DFDX : in Function_Ptr; Round : Function_Round_Ptr; XI, YI, Step : in Float_Type; Result : out Vector)
is
   H : constant Float_Type := Step;
   X : Float_Type          := XI;
begin
   Result (Result'First) := YI;
   for N in Result'First + 1 .. Result'Last loop
       Result (N) :=  Round(Result (N - 1)) + Round(H * DFDX (X, Result (N - 1)));
       X          := X + Step;
   end loop;
end Euler;

giving the output with **step h = 0.1 **

x calc (Ada) exact delta
1.1 1.200000 1.233678 1.233678
1.2 1.464000 1.552707 -0.033678
1.3 1.815360 1.993716 -0.088707
1.4 2.287354 2.611696 -0.178356
1.5 2.927813 3.490343 -0.562530

The calc (Ada) results agree with hand (and calculator) computations.


Solution

  • First, 2.0 is a real numeric literal. It has type universal_real and may be implicitly converted to any real type, including your type Real. So this problem has nothing to do with floating <-> decimal conversions.

    Canonically, fixed-point types are represented by an integer; the equivalent value is the stored integer multiplied by the delta for the type.* So for a value X there is an integer X' such that X = X' * Real'delta. Fixed point types have some issues with multiplication and division: an expression such as X * Y is equivalent to (X' * Real'delta) * (Y' * Real'delta) = X' * Y' * Real'delta ** 2, which has no obvious type. Thus all calls to '"*"' invoke the operation on type universl_fixed yielding universl_fixed and requiring either a context that defines the result type or an explicit conversion to the result type.

    The multiplying operators are defined in ARM 4.5.5. Note Para 19.a/2, "we require an explicit conversion to be inserted in expressions like A * B * C if A, B, and C are each of some fixed point type." So you have to explicitly convert the result of one of the multiplications to Real; the context will then define the result type of the other multiplication.

    But this canonical definition also allows a loophole for multiplication by integer values: I * X, where I is some integer type, is equivalent to (I * X') * Real'delta which has type Real (provided I * X' is in range). These are defined in Para 14. So for this case, your return expression could be 2 * X * Y, which should work. No explicit conversion should be required.

    In the general case, you will require a conversion to the desired type: Real (2.7 * X) * Y.

    To get the same results from your code that you do by hand, they need to do the same thing. By hand, you're calculating a value to a large number of significant digits, then rounding it to 6 places. For your code to do this, you'd need something like

    with Ada.Text_IO;
    procedure Diff is
       type Real is digits 9;
       type Vector is array (Positive range <>) of Real;
       
       package Real_IO is new Ada.Text_IO.Float_IO (Num => Real); 
    
       function F (X : Real; Y : Real) return Real is
          (2.0 * X * Y);
          
       function Round (V : in Real) return Real is
          (Real'Rounding (1.0E6 * V) / 1.0E6);
         
       Step : constant := 0.1;
      
       Current_X : Real := 1.0;
       Answer    : Vector (1 .. 6);
    begin -- Diff
      Answer (1) := 1.0;
      
       Solve : for N in 2 .. Answer'Last loop
          Answer (N) := Round (Answer (N - 1) + Step * F (Current_X, Answer (N - 1) ) );
          Current_X := Current_X + Step;
       end loop Solve;
       
       Print : for N in Answer'Range loop
          Real_IO.Put (Item => 1.0 + 0.1 * Real (N - 1), Aft => 1, Exp => 0);
          Real_IO.Put (Item => Answer (N), Aft => 8, Exp => 0);
          Ada.Text_IO.New_Line;
       end loop Print;
    end Diff;
    

    I've used digits 9 to allow for the extra significant digits presumably present in your hand calculations before rounding. I've output with Aft => 8 to show that the rounding works; you'd want to use Aft => 6 to avoid the extra zeroes.

    This outputs

     1.0 1.00000000
     1.1 1.20000000
     1.2 1.46400000
     1.3 1.81536000
     1.4 2.28735400
     1.5 2.92781300
    

    The lesson here is that decimal fixed-point is not a drop-in substitute for floating-point with rounding.

    *Technically by the small for the type, but for decimal types this is equal to the delta.