# Improving floating point

Read a post this week in Reddit pointing to an article that was from The Next Platform (New approach could sink floating point computation). It was all about changing IEEE floating point format to something better called posits, which was designed by noted computer architect, John Gustafson, et al, (see their paper Beating floating point at its own game: Posit arithmetic, for more info).

But first please take our new poll:

The problems with standard floating point have been known since they were first defined, in 1985 by the IEEE. As you may recall, an IEEE 754 floating point number has three parts a sign, an exponent and a mantissa (fraction or significand part). Both the exponent and mantissa can be negative.

## IEEE defined floating point numbers

The IEEE 754 standard defines the following formats (see Floating point Floating -point arithmetic, for more info)

• Half precision floating point, (added in 2008), has 1 sign bit (for the significand or mantissa), 5 exponent bits (indicating 2**-62 to 2**+64) and 10 significand bits for a total of 16 bits.
• Single precision floating point, has 1 sign bit, 8 exponent bits (indicating 2**-126 to 2**+128) and 23 significand bits for a total of32 bits.
• Double precision floating point, has 1 sign bit, 11 exponent bits (2**-1022 to 2**+1024) and 52 significand bits.
• Quadrouple precision floating point, has 1 sign bit, 15 exponent bits (2**-16,382 to 2**+16,384) and 112 significand bits.

I believe Half precision was introduced to help speed up AI deep learning training and inferencing.

Some problems with the IEEE standard include, it supports -0 and +0 which have different representations and -∞ and +∞ as well as can be used to represent a number of unique, Not-a-Numbers or NaNs which are illegal floating point numbers. So when performing IEEE standard floating point arithmetic, one needs to check to see if a result is a NaN which would make it an illegal result, and must be wary when comparing numbers such as -0, +0 and -∞ , +∞. because, sigh, they are not equal.

## Posits to the rescue

It’s all a bit technical (read the paper to find out) but posits don’t support -0 and +0, just 0 and there’s no -∞ or +∞ in posits either, just ∞. Posits also allow for a variable number of exponent bits (which are encoded into Regime scale factor bits [whose value is determined by a useed factor] and Exponent scale factor bits) which means that the number of significand bits can also vary.

So, with a 32 bit, single precision Posit, the number range represented can be quite a bit larger than single precision floating point. Indeed, with the approach put forward by Gustafson, a single 32 bit posit has more numeric range than a single precision IEEE 754 float and about as 1/2 as much range as double precision IEEE floating point number but only uses 32 bits.

Presently, there’s no commercial hardware implementations of posits, but there’s a lot of interest. Mostly because, the same number of bits can represent a lot more numeric range than equivalently sized IEEE 754 floats. And for HPC environments, AI deep learning applications, scientific computing, etc. having more numeric range (or precision), in less space, means they can jam more data in the same storage, transfer more data over the same networking bandwidth and save more numbers in limited amounts of DRAM.

Although, commercial implementations do not exist, there’s been some FPGA simulations of posit floating point arithmetic. Those simulations have shown it to be more energy efficient than IEEE 754 floating point arithmetic for the same number of bits. So, you need to add better energy efficiency to the advantages of posit arithmetic.

Is it any wonder that HPC/big science (weather prediction, Square Kilometer Array, energy simulations, etc.) and many AI hardware accelerator chip designers are examining posits as a potential way to boost precision, reduce storage/memory footprint and reduce energy consumption.

~~~~

Yet, standards have a way of persisting. Just look at how long the QWERTY keyboard has lasted. It was originally designed in the 1870’s to slow down typing and reduce jamming, when typewriters were mechanical devices. But ever since 1934, when the DVORAK keyboard was patented, there’s been much better layouts for keyboards. And there’s no arguing that the DVORAK keyboard is better for typing on non-mechanical typewriters. Yet today, I know of no computer vendor that ships DVORAK labeled keyboards. Once a standard becomes set, it’s very hard to dislodge.