I'm trying to implement, for learning purposes, the core part of an algorithm that converts decimal strings to 64-bit floating-point numbers.
I'm using the explanation from this page as a guide: https://www.exploringbinary.com/correct-decimal-to-floating-point-using-big-integers
I'm doing this in Ruby because Ruby's Integer type is implemented as big integers.
This is what I have so far:
# frozen_string_literal: true

def create_float sign, exponent, mantissa
  v = String.new
  v << sign.to_s(2)
  v << exponent.to_s(2).rjust(11, '0')
  v << mantissa.to_s(2).rjust(52, '0')
  [v].pack('B*').unpack1('G')
end

def get_scale value
  scale = 0
  if value >= (2**53)
    while value >= (2**53)
      value /= 2
      scale += 1
    end
  else
    while value < (2**52)
      value *= 2
      scale += 1
    end
  end
  scale
end

def to_double value, exponent
  if exponent >= 0
    e = (10**exponent)
    t = value * e
    s = get_scale t
    q = t.div(2**s)
    r = t - q * 2**s
    z = (52 + s) + 1023
  else
    e = (10**-exponent)
    s = get_scale value.div(e)
    t = value * (2**s)
    q = t / e
    r = t - q * e
    z = (52 + -s) + 1023
  end
  h = e / 2
  q += 1 if r > h || r == h && q.odd?
  m = q - (2**52)
  puts "T: #{t}"
  puts "S: #{s}"
  puts "Q: #{q}"
  puts "R: #{r}"
  puts "H: #{h}"
  puts "Z: #{z}"
  puts "M: #{m}"
  create_float 0, z, m
end

# Expected: 1.7976931348623157e+308
# Actual: 1.348269851146737e+308
puts to_double(17_976_931_348_623_158, 292)
The algorithm works fine for the numbers used as examples on the page (3.14159 and 1.2345678901234567e22) but fails for 1.7976931348623158e308.
I think that my problem may have to do with the rounding part. A q of 9007199254740992 will fail, but a q of 9007199254740991 will give me the correct answer.
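Your error is here:
h = e / 2
h is meant to represent half of the denominator. In the exponent >= 0 branch you divide by 2**s, not by e, so in that branch this should be
h = 2**s / 2
In the negative-exponent branch the denominator is e, so h = e / 2 is already correct there.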
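Here is a rough sketch of how that could look in your code, with the halfway point taken from whichever divisor the branch actually used (2**s when exponent >= 0, 10**-exponent otherwise). The variable name divisor is my own, and create_float is rewritten to pack the bits as a single integer rather than a string. The carry step after rounding is my addition and not something the linked page's excerpt in your question covers, so treat it as an assumption. Like your original, the sketch assumes value * 10**exponent is at least 2**53 in the first branch and that value.div(10**-exponent) is at least 1 in the second.

```ruby
# Pack a sign bit, an 11-bit biased exponent, and a 52-bit fraction
# into a big-endian IEEE 754 double.
def create_float(sign, exponent, mantissa)
  bits = (sign << 63) | (exponent << 52) | mantissa
  [bits].pack('Q>').unpack1('G')
end

# Number of halvings (value >= 2**53) or doublings (value < 2**52)
# needed to bring value into [2**52, 2**53). Assumes value > 0.
def get_scale(value)
  scale = 0
  if value >= 2**53
    while value >= 2**53
      value /= 2
      scale += 1
    end
  else
    while value < 2**52
      value *= 2
      scale += 1
    end
  end
  scale
end

def to_double(value, exponent)
  if exponent >= 0
    t = value * 10**exponent
    s = get_scale(t)
    divisor = 2**s           # denominator in this branch (hypothetical name)
    z = 52 + s + 1023
  else
    divisor = 10**-exponent  # denominator in this branch
    s = get_scale(value.div(divisor))
    t = value * 2**s
    z = 52 - s + 1023
  end

  q, r = t.divmod(divisor)
  h = divisor / 2            # half of the divisor that was actually used

  # Round half to even.
  q += 1 if r > h || (r == h && q.odd?)

  # Assumption (my addition): if rounding carried q up to 2**53,
  # renormalize so the fraction still fits in 52 bits.
  if q == 2**53
    q = 2**52
    z += 1
  end

  create_float(0, z, q - 2**52)
end

puts to_double(17_976_931_348_623_158, 292) # => 1.7976931348623157e+308
```

With h = e / 2 in the first branch, r looked larger than half, q was rounded up to 2**53, and q - 2**52 no longer fit in the 52-bit field, which is why a q of 9007199254740992 gave you the wrong bits.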
A few other notes, not exhaustive, meant to be helpful not critical:
q = t.div(2**s)
r = t - q * 2**s
Can be simplified to
quotient, remainder = t.divmod(2 ** s)
because divmod returns an Array of [quotient, modulus].
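A quick way to convince yourself the two forms agree, using small made-up values for t and s (they are only for illustration, not taken from your run):

```ruby
t = 1_000_003
s = 4

# One call returns both parts.
quotient, remainder = t.divmod(2**s)

# The two-step version from the question.
q = t.div(2**s)
r = t - q * 2**s

puts quotient   # => 62500
puts remainder  # => 3
puts quotient == q && remainder == r # => true
```

For the non-negative integers used here the results are identical, so the swap does not change behavior.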