Package org.djutils.float128
Class Float128
- java.lang.Object
- java.lang.Number
- org.djutils.float128.Float128
- All Implemented Interfaces:
Serializable
public class Float128 extends Number
Float128 stores immutable floating-point values with a 16-bit signed exponent, a 120-bit fraction, and one sign bit. It provides arithmetic for addition, subtraction, multiplication and division, as well as several Math operators such as signum and abs. The fraction follows the IEEE-754 convention, which means that the leading '1' is not stored in the fraction.
Copyright (c) 2020-2021 Delft University of Technology, Jaffalaan 5, 2628 BX Delft, the Netherlands. All rights reserved. For project information, see https://djutils.org. The DJUTILS project is distributed under a three-clause BSD-style license, which can be found at https://djutils.org/docs/license.html.
- Author:
- Alexander Verbraeck, Peter Knoppers
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors
Constructor Description
Float128(double d)
Create a Float128 based on a double.
-
Method Summary
Modifier and Type Method Description
static String
doubleTotring(double d)
A test for a toString() method for a double.
double
doubleValue()
boolean
equals(Object obj)
float
floatValue()
int
hashCode()
int
intValue()
boolean
isFinite()
Return whether the stored value is finite.
boolean
isInfinite()
Return whether the stored value is infinite.
boolean
isNaN()
Return whether the stored value is NaN.
boolean
isPositive()
Return whether the stored value is positive.
boolean
isZero()
Return whether the stored value is a signed zero.
long
longValue()
static void
main(String[] args)
Test code.
static Float128
of(double d)
Create a Float128 from this double with a significand precision of 52 bits.
static Float128
of(String sd)
Create a Float128 represented by this String with a significand precision up to 120 bits.
Float128
plus(double value)
Add a double value to this value.
Float128
plus(Float128 value)
Add a Float128 value to this value.
protected void
shift(long[] v, int bits)
Shift the bits to the right for the variable v.
String
toBinaryString()
Return the binary string representation of this Float128 value.
String
toPaddedBinaryString()
Return the binary string representation of this Float128 value.
String
toString()
-
Methods inherited from class java.lang.Number
byteValue, shortValue
-
Constructor Detail
-
Float128
public Float128(double d)
Create a Float128 based on a double. The IEEE-754 double is built up as follows:
- bit 63 [0x8000_0000_0000_0000L]: sign bit (1-bit)
- bits 62-52 [0x7ff0_0000_0000_0000L]: exponent (11-bit), stored as the base-2 exponent value + 1023 (for a significand of the form 1.f).
- - exponent 0x000 and fraction == 0: signed zero
- - exponent 0x000 and fraction != 0: subnormal value (gradual underflow)
- - exponent 0x7ff and fraction == 0: infinity
- - exponent 0x7ff and fraction != 0: NaN
- bits 51-0 [0x000f_ffff_ffff_ffffL]: fraction (52-bit)
- Parameters:
d
- double; the double to store
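The bit layout listed above can be inspected with plain JDK calls. The following is a minimal sketch, not the constructor's actual implementation: the class DoubleBits and its method names are hypothetical, and it assumes the standard IEEE-754 reading in which a normalized double equals 1.f x 2^(e - 1023), with e the stored 11-bit exponent.

```java
/** Hypothetical helper (not part of Float128): splits a double into its IEEE-754 fields. */
final class DoubleBits {
    static final long SIGN_MASK = 0x8000_0000_0000_0000L;
    static final long EXPONENT_MASK = 0x7ff0_0000_0000_0000L;
    static final long FRACTION_MASK = 0x000f_ffff_ffff_ffffL;

    /** True when bit 63 (the sign bit) is set; this also holds for -0.0. */
    static boolean isNegative(double d) {
        return (Double.doubleToRawLongBits(d) & SIGN_MASK) != 0;
    }

    /** The 11-bit exponent exactly as stored in bits 62-52 (still biased). */
    static int storedExponent(double d) {
        return (int) ((Double.doubleToRawLongBits(d) & EXPONENT_MASK) >>> 52);
    }

    /** The 52 fraction bits, without the implicit leading 1. */
    static long fraction(double d) {
        return Double.doubleToRawLongBits(d) & FRACTION_MASK;
    }

    /** Classifies the value using the special exponent patterns 0x000 and 0x7ff. */
    static String classify(double d) {
        int e = storedExponent(d);
        long f = fraction(d);
        if (e == 0x000) {
            return f == 0 ? "zero" : "subnormal";
        }
        if (e == 0x7ff) {
            return f == 0 ? "infinity" : "NaN";
        }
        return "normal";
    }

    public static void main(String[] args) {
        double d = -10.0; // -1.25 x 2^3
        System.out.println(isNegative(d));      // true
        System.out.println(storedExponent(d));  // 1026, i.e. 3 + 1023
        System.out.println(Long.toBinaryString(fraction(d)));
        System.out.println(classify(Double.MIN_VALUE));
    }
}
```

Double.doubleToRawLongBits is used rather than Double.doubleToLongBits so that NaN payloads are reported as stored instead of being collapsed to the canonical NaN.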
-
-
Method Detail
-
plus
public Float128 plus(Float128 value)
Add a Float128 value to this value. Addition works as follows: suppose you add 10 and 100 (decimal).
v1 = 10 = 0x(1)0100000000p3 and v2 = 100 = 0x(1)1001000000p6, with the digits written in binary. The digits shown are the fraction bits behind the initial (1); that (1) is the bit before the binary point, which is part of the Float128 in bit 60.
Shift the lowest value (including the leading 1) 3 bits to the right so that both values have exponent 6, and add:
0x(0)0010100000p6
0x(1)1001000000p6
-----------------+
0x(1)1011100000p6
The last number indeed represents the value 110.
- Parameters:
value
- Float128; the value to add
- Returns:
- Float128; the sum of this Float128 and the given value
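The alignment step in the worked example for plus(Float128) can be reproduced with ordinary long arithmetic. This is only a sketch under stated assumptions, not the Float128 implementation: the class AlignedAdd is hypothetical, significands carry an explicit leading 1 followed by 10 fraction bits (as in the example), both operands are positive and normalized, and bits shifted out during alignment are simply truncated.

```java
/** Hypothetical illustration of exponent alignment; not the Float128 implementation. */
final class AlignedAdd {
    /** Number of fraction bits behind the leading 1 (assumption: 10, as in the example). */
    static final int FRACTION_BITS = 10;

    /**
     * Adds two positive normalized values given as (significand, exponent) pairs.
     * Returns a two-element array {significand, exponent} of the sum.
     */
    static long[] add(long sigA, int expA, long sigB, int expB) {
        // Make A the operand with the larger exponent.
        if (expA < expB) {
            long tmpSig = sigA; sigA = sigB; sigB = tmpSig;
            int tmpExp = expA; expA = expB; expB = tmpExp;
        }
        // Shift the smaller operand (including its leading 1) to the right.
        int shift = expA - expB;
        long aligned = shift < 64 ? sigB >>> shift : 0L;
        long sum = sigA + aligned;
        int exp = expA;
        // A carry out of the leading-1 position needs one renormalization step.
        if ((sum >>> (FRACTION_BITS + 1)) != 0) {
            sum >>>= 1;
            exp++;
        }
        return new long[] {sum, exp};
    }

    /** Converts (significand, exponent) back to a double for checking. */
    static double toDouble(long sig, int exp) {
        return Math.scalb((double) sig, exp - FRACTION_BITS);
    }

    public static void main(String[] args) {
        long ten = 0b1_0100000000L;     // 1.0100000000b x 2^3
        long hundred = 0b1_1001000000L; // 1.1001000000b x 2^6
        long[] r = add(ten, 3, hundred, 6);
        System.out.println(Long.toBinaryString(r[0]) + "p" + r[1]); // 11011100000p6
        System.out.println(toDouble(r[0], (int) r[1]));             // 110.0
    }
}
```

A full implementation would also have to handle signs, rounding of the bits that fall off during the shift, and the special values (zero, infinity, NaN).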
-
shift
protected void shift(long[] v, int bits)
Shift the bits to the right for the variable v.
- Parameters:
v
- long[]; the variable stored as two longs
bits
- int; the number of bits to shift 'down'. bits has to be >= 0.
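A right shift across two longs could look like the sketch below. It is not the library's code: the class Shift128 is hypothetical, and it assumes that v[0] holds the high 64 bits and v[1] the low 64 bits, which this page does not state. Because Java masks shift counts on a long to the range 0..63, counts of 64 and more have to be handled separately.

```java
/** Hypothetical sketch of a logical right shift over a 128-bit value stored as two longs. */
final class Shift128 {
    /**
     * Shifts v to the right in place.
     * Assumption: v[0] is the high word and v[1] is the low word.
     */
    static void shiftRight(long[] v, int bits) {
        if (bits < 0) {
            throw new IllegalArgumentException("bits has to be >= 0");
        }
        if (bits == 0) {
            return;
        }
        if (bits >= 128) {
            // Everything is shifted out.
            v[0] = 0L;
            v[1] = 0L;
        } else if (bits >= 64) {
            // The high word moves entirely into the low word.
            v[1] = v[0] >>> (bits - 64);
            v[0] = 0L;
        } else {
            // The lowest 'bits' bits of the high word cross over into the low word.
            v[1] = (v[1] >>> bits) | (v[0] << (64 - bits));
            v[0] >>>= bits;
        }
    }

    public static void main(String[] args) {
        long[] v = {1L, 0L};
        shiftRight(v, 1);
        System.out.println(Long.toHexString(v[0]) + " " + Long.toHexString(v[1])); // 0 8000000000000000
    }
}
```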
-
plus
public Float128 plus(double value)
Add a double value to this value.
- Parameters:
value
- double; the value to add
- Returns:
- Float128; the sum of this Float128 and the given value
-
floatValue
public float floatValue()
- Specified by:
floatValue
in class Number
-
doubleValue
public double doubleValue()
- Specified by:
doubleValue
in class Number
-
isZero
public boolean isZero()
Return whether the stored value is a signed zero.
- Returns:
- boolean; whether the stored value is signed zero
-
isNaN
public boolean isNaN()
Return whether the stored value is NaN.
- Returns:
- boolean; whether the stored value is NaN
-
isInfinite
public boolean isInfinite()
Return whether the stored value is infinite.
- Returns:
- boolean; whether the stored value is infinite
-
isFinite
public boolean isFinite()
Return whether the stored value is finite.
- Returns:
- boolean; whether the stored value is finite
-
isPositive
public boolean isPositive()
Return whether the stored value is positive.
- Returns:
- boolean; whether the stored value is positive
-
toPaddedBinaryString
public String toPaddedBinaryString()
Return the binary string representation of this Float128 value.
- Returns:
- String; the binary string representation of this Float128 value
-
toBinaryString
public String toBinaryString()
Return the binary string representation of this Float128 value.
- Returns:
- String; the binary string representation of this Float128 value
-
doubleTotring
public static String doubleTotring(double d)
A test for a toString() method for a double.
- Parameters:
d
- double; the value
- Returns:
- String; the decimal 17-digit scientific notation String representation of the double
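For comparison, a 17-digit scientific notation string can also be produced with the JDK alone. The sketch below is not how doubleTotring is implemented (that is not documented here); the class Scientific17 is hypothetical. It converts the double to its exact decimal expansion with BigDecimal and then formats one digit before and 16 digits after the decimal point.

```java
import java.math.BigDecimal;
import java.util.Locale;

/** Hypothetical sketch: 17 significant digits in scientific notation using only the JDK. */
final class Scientific17 {
    static String format(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            // BigDecimal cannot represent these values.
            return Double.toString(d);
        }
        // new BigDecimal(double) is exact; the formatter rounds to 1 + 16 digits.
        return String.format(Locale.ROOT, "%.16e", new BigDecimal(d));
    }

    public static void main(String[] args) {
        System.out.println(format(10.0)); // 1.0000000000000000e+01
        System.out.println(format(0.1));
    }
}
```

Going through BigDecimal matters because formatting a double directly with "%.16e" starts from the shortest digit string that identifies the double and pads it with zeros, so the trailing digits do not reflect the exact stored value.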
-
of
public static Float128 of(double d)
Create a Float128 from this double with a significand precision of 52 bits.
- Parameters:
d
- double; the double value
- Returns:
- Float128; a Float128 from this double with a significand precision of 52 bits
-
of
public static Float128 of(String sd)
Create a Float128 represented by this String with a significand precision up to 120 bits. Up to 39 significant digits will be used to represent this value as a Float128. Currently, only scientific notation is parsed; regular notation will follow.
- Parameters:
sd
- String; a String representation of a double value
- Returns:
- Float128; a Float128 from this string representation with a significand precision up to 120 bits
-
main
public static void main(String[] args)
Test code.
- Parameters:
args
- String[]; not used
-
-