| |
| <HTML> |
| |
| <HEAD> |
| <TITLE>Berkeley TestFloat General Documentation</TITLE> |
| </HEAD> |
| |
| <BODY> |
| |
| <H1>Berkeley TestFloat Release 3: General Documentation</H1> |
| |
| <P> |
| John R. Hauser<BR> |
| 2014 ______<BR> |
| </P> |
| |
| |
| <H2>Contents</H2> |
| |
| <PRE> |
| Introduction |
| Limitations |
| Acknowledgments and License |
| What TestFloat Does |
| Executing TestFloat |
| Operations Tested by TestFloat |
|    Conversion Operations |
|    Basic Arithmetic Operations |
|    Fused Multiply-Add Operations |
|    Remainder Operations |
|    Round-to-Integer Operations |
|    Comparison Operations |
| Interpreting TestFloat Output |
| Variations Allowed by the IEEE Floating-Point Standard |
|    Underflow |
|    NaNs |
|    Conversions to Integer |
| Contact Information |
| </PRE> |
| |
| |
| <H2>1. Introduction</H2> |
| |
| <P> |
| Berkeley TestFloat is a small collection of programs for testing that an |
| implementation of binary floating-point conforms to the IEEE Standard for |
| Floating-Point Arithmetic. |
| All operations required by the original 1985 version of the IEEE Floating-Point |
| Standard can be tested, except for conversions to and from decimal. |
| The following binary formats can be tested: <NOBR>32-bit</NOBR> |
| single-precision, <NOBR>64-bit</NOBR> double-precision, <NOBR>80-bit</NOBR> |
| double-extended-precision, and/or <NOBR>128-bit</NOBR> quadruple-precision. |
| TestFloat cannot test decimal floating-point. |
| </P> |
| |
| <P> |
| Included in the TestFloat package are the <CODE>testsoftfloat</CODE> and |
| <CODE>timesoftfloat</CODE> programs for testing the Berkeley SoftFloat software |
| implementation of floating-point and for measuring its speed. |
| Information about SoftFloat can be found at the SoftFloat Web page, |
| <A HREF="http://www.jhauser.us/arithmetic/SoftFloat.html"><CODE>http://www.jhauser.us/arithmetic/SoftFloat.html</CODE></A>. |
| The <CODE>testsoftfloat</CODE> and <CODE>timesoftfloat</CODE> programs are |
| expected to be of interest only to people compiling the SoftFloat sources. |
| </P> |
| |
| <P> |
| This document explains how to use the TestFloat programs. |
| It does not attempt to define or explain much of the IEEE Floating-Point |
| Standard. |
| Details about the standard are available elsewhere. |
| </P> |
| |
| <P> |
| The current version of TestFloat is <NOBR>Release 3</NOBR>. |
| The set of TestFloat programs, as well as the programs' arguments and |
| behavior, has changed somewhat compared with earlier TestFloat releases. |
| </P> |
| |
| |
| <H2>2. Limitations</H2> |
| |
| <P> |
| TestFloat output is not always easily interpreted. |
| Detailed knowledge of the IEEE Floating-Point Standard and its vagaries is |
| needed to use TestFloat responsibly. |
| </P> |
| |
| <P> |
| TestFloat performs relatively simple tests designed to check the fundamental |
| soundness of the floating-point under test. |
| TestFloat may also at times manage to find rarer and more subtle bugs, but it |
| will probably only find such bugs by chance. |
| Software that purposefully seeks out various kinds of subtle floating-point |
| bugs can be found through links posted on the TestFloat Web page |
| (<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></A>). |
| </P> |
| |
| |
| <H2>3. Acknowledgments and License</H2> |
| |
| <P> |
| The TestFloat package was written by me, <NOBR>John R.</NOBR> Hauser. |
| <NOBR>Release 3</NOBR> of TestFloat is a completely new implementation |
| supplanting earlier releases. |
| This project was done in the employ of the University of California, Berkeley, |
| within the Department of Electrical Engineering and Computer Sciences, first |
| for the Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab. |
| The work was officially overseen by Prof. Krste Asanovic, with funding provided |
| by these sources: |
| <BLOCKQUOTE> |
| <TABLE> |
| <TR> |
| <TD><NOBR>Par Lab:</NOBR></TD> |
| <TD> |
| Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery |
| (Award #DIG07-10227), with additional support from Par Lab affiliates Nokia, |
| NVIDIA, Oracle, and Samsung. |
| </TD> |
| </TR> |
| <TR> |
| <TD><NOBR>ASPIRE Lab:</NOBR></TD> |
| <TD> |
| DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from |
| ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA, |
| Oracle, and Samsung. |
| </TD> |
| </TR> |
| </TABLE> |
| </BLOCKQUOTE> |
| </P> |
| |
| <P> |
| The following applies to the whole of TestFloat <NOBR>Release 3</NOBR> as well |
| as to each source file individually. |
| </P> |
| |
| <P> |
| Copyright 2011, 2012, 2013, 2014 The Regents of the University of California |
| (Regents). |
| All Rights Reserved. |
| Redistribution and use in source and binary forms, with or without |
| modification, are permitted provided that the following conditions are met: |
| </P> |
| |
| <P> |
| Redistributions of source code must retain the above copyright notice, this |
| list of conditions, and the following two paragraphs of disclaimer. |
| Redistributions in binary form must reproduce the above copyright notice, this |
| list of conditions, and the following two paragraphs of disclaimer in the |
| documentation and/or other materials provided with the distribution. |
| Neither the name of the Regents nor the names of its contributors may be used |
| to endorse or promote products derived from this software without specific |
| prior written permission. |
| </P> |
| |
| <P> |
| IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, |
| INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF |
| THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS BEEN |
| ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| </P> |
| |
| <P> |
| REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, |
| THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. |
| THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS |
| PROVIDED "<NOBR>AS IS</NOBR>". |
| REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, |
| ENHANCEMENTS, OR MODIFICATIONS. |
| </P> |
| |
| |
| <H2>4. What TestFloat Does</H2> |
| |
| <P> |
| TestFloat is designed to test a floating-point implementation by comparing its |
| behavior with that of TestFloat's own internal floating-point implemented in |
| software. |
| For each operation to be tested, the TestFloat programs can generate a large |
| number of test cases, made up of simple pattern tests intermixed with weighted |
| random inputs. |
| The cases generated should be adequate for testing carry chain propagations, |
| plus the rounding of addition, subtraction, multiplication, and simple |
| operations like conversions. |
| TestFloat makes a point of checking all boundary cases of the arithmetic, |
| including underflows, overflows, invalid operations, subnormal inputs, zeros |
| (positive and negative), infinities, and NaNs. |
| For the interesting operations like addition and multiplication, millions of |
| test cases may be checked. |
| </P> |
| |
| <P> |
| TestFloat is not remarkably good at testing difficult rounding cases for |
| division and square root. |
| It also makes no attempt to find bugs specific to SRT division and the like |
| (such as the infamous Pentium division bug). |
| Software that tests for such failures can be found through links on the |
| TestFloat Web page, |
| <A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></A>. |
| </P> |
| |
| <P> |
| NOTE!<BR> |
| It is the responsibility of the user to verify that the discrepancies TestFloat |
| finds actually represent faults in the implementation being tested. |
| Advice to help with this task is provided later in this document. |
| Furthermore, even if TestFloat finds no fault with a floating-point |
| implementation, that in no way guarantees that the implementation is bug-free. |
| </P> |
| |
| <P> |
| For each operation, TestFloat can test all five rounding modes defined by the |
| IEEE Floating-Point Standard. |
| TestFloat verifies not only that the numeric results of an operation are |
| correct, but also that the proper floating-point exception flags are raised. |
| All five exception flags are tested, including the <I>inexact</I> flag. |
| TestFloat does not attempt to verify that the floating-point exception flags |
| are actually implemented as sticky flags. |
| </P> |
| |
| <P> |
| For the <NOBR>80-bit</NOBR> double-extended-precision format, TestFloat can |
| test the addition, subtraction, multiplication, division, and square root |
| operations at all three of the standard rounding precisions. |
| The rounding precision can be set to <NOBR>32 bits</NOBR>, equivalent to |
| single-precision; to <NOBR>64 bits</NOBR>, equivalent to double-precision; or |
| to the full <NOBR>80 bits</NOBR> of the double-extended-precision format. |
| Rounding precision control can be applied only to the double-extended-precision |
| format and only for the five basic arithmetic operations: addition, |
| subtraction, multiplication, division, and square root. |
| Other operations can be tested only at full precision. |
| </P> |
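| <P> |
| As a rough illustration of what rounding precision control changes, the |
| hypothetical Python helper below rounds a 64-bit <CODE>extF80</CODE> |
| significand to fewer bits. |
| The rounding precisions named 32, 64, and 80 bits correspond to significands of |
| 24, 53, and 64 bits. |
| This is only a sketch under stated assumptions. |
| It models round-to-nearest-even alone and ignores the exponent range, subnormal |
| results, and exception flags. |
| </P> |

```python
def round_significand(sig64: int, precision: int) -> int:
    """Round a 64-bit extF80 significand to `precision` bits, ties to even.

    Hypothetical helper for illustration: exponent range, subnormal results,
    and exception flags are ignored. Low bits are cleared in place, and the
    result equals 1 << 64 when rounding carries out of the top bit (a real
    implementation would then increment the exponent).
    """
    if precision not in (24, 53, 64):
        raise ValueError("rounding precision must be 24, 53, or 64 bits")
    drop = 64 - precision                 # number of low bits to discard
    if drop == 0:
        return sig64
    kept = sig64 >> drop
    remainder = sig64 & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1                         # round up; exact ties go to even
    return kept << drop
```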
| |
| <P> |
| As a rule, TestFloat is not particular about the bit patterns of NaNs that |
| appear as operation results. |
| Any NaN is considered as good a result as another. |
| This laxness can be overridden so that TestFloat checks for particular bit |
| patterns within NaN results. |
| See <NOBR>section 8</NOBR> below, <I>Variations Allowed by the IEEE |
| Floating-Point Standard</I>, plus the <CODE>-checkNaNs</CODE> option documented |
| for programs <CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE>. |
| </P> |
| |
| <P> |
| TestFloat normally compares an implementation of floating-point against the |
| Berkeley SoftFloat software implementation of floating-point, also created by |
| me. |
| The SoftFloat functions are linked into each TestFloat program's executable. |
| Information about SoftFloat can be found at the Web page |
| <A HREF="http://www.jhauser.us/arithmetic/SoftFloat.html"><CODE>http://www.jhauser.us/arithmetic/SoftFloat.html</CODE></A>. |
| </P> |
| |
| <P> |
| For testing SoftFloat itself, the TestFloat package includes a |
| <CODE>testsoftfloat</CODE> program that compares SoftFloat's floating-point |
| against <EM>another</EM> software floating-point implementation. |
| The second software floating-point is simpler and slower than SoftFloat, and is |
| completely independent of SoftFloat. |
| Although the second software floating-point cannot be guaranteed to be |
| bug-free, the chance that it would mimic any of SoftFloat's bugs is low. |
| Consequently, an error in one or the other floating-point version should appear |
| as an unexpected difference between the two implementations. |
| Note that testing SoftFloat should be necessary only when compiling a new |
| TestFloat executable or when compiling SoftFloat for some other reason. |
| </P> |
| |
| |
| <H2>5. Executing TestFloat</H2> |
| |
| <P> |
| The TestFloat package consists of five programs, all intended to be executed |
| from a command-line interpreter: |
| <BLOCKQUOTE> |
| <TABLE> |
| <TR> |
| <TD> |
| <A HREF="testfloat_gen.html"><CODE>testfloat_gen</CODE></A><CODE> </CODE> |
| </TD> |
| <TD> |
| Generates test cases for a specific floating-point operation. |
| </TD> |
| </TR> |
| <TR> |
| <TD><A HREF="testfloat_ver.html"><CODE>testfloat_ver</CODE></A></TD> |
| <TD> |
| Verifies whether the results from executing a floating-point operation are as |
| expected. |
| </TD> |
| </TR> |
| <TR> |
| <TD><A HREF="testfloat.html"><CODE>testfloat</CODE></A></TD> |
| <TD> |
| An all-in-one program that generates test cases, executes floating-point |
| operations, and verifies whether the results match expectations. |
| </TD> |
| </TR> |
| <TR> |
| <TD> |
| <A HREF="testsoftfloat.html"><CODE>testsoftfloat</CODE></A><CODE> </CODE> |
| </TD> |
| <TD> |
| Like <CODE>testfloat</CODE>, but for testing SoftFloat. |
| </TD> |
| </TR> |
| <TR> |
| <TD> |
| <A HREF="timesoftfloat.html"><CODE>timesoftfloat</CODE></A><CODE> </CODE> |
| </TD> |
| <TD> |
| A program for measuring the speed of SoftFloat (included in the TestFloat |
| package for convenience). |
| </TD> |
| </TR> |
| </TABLE> |
| </BLOCKQUOTE> |
| Each program has its own page of documentation that can be opened through the |
| links in the table above. |
| </P> |
| |
| <P> |
| To test a floating-point implementation other than SoftFloat, one of three |
| different methods can be used. |
| The first method pipes output from <CODE>testfloat_gen</CODE> to a program |
| that: |
| <NOBR>(a) reads</NOBR> the incoming test cases, <NOBR>(b) invokes</NOBR> the |
| floating-point operation being tested, and <NOBR>(c) writes</NOBR> the |
| operation results to output. |
| These results can then be piped to <CODE>testfloat_ver</CODE> to be checked for |
| correctness. |
| Assuming a vertical bar (<CODE>|</CODE>) indicates a pipe between programs, the |
| complete process could be written as a single command like so: |
| <PRE> |
| testfloat_gen ... <type> | <program-that-invokes-op> | testfloat_ver ... <function> |
| </PRE> |
| The program in the middle is not supplied by TestFloat but must be created |
| independently. |
| If for some reason this program cannot take command-line arguments, the |
| <CODE>-prefix</CODE> option of <CODE>testfloat_gen</CODE> can communicate |
| parameters through the pipe. |
| </P> |
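| <P> |
| To make the first method concrete, here is a minimal sketch of a middle |
| program for testing <CODE>f64_add</CODE>, written in Python. |
| Python's own double-precision addition stands in for the floating-point under |
| test, which limits the sketch to the <CODE>near_even</CODE> rounding mode. |
| Several details are assumptions to be checked against the |
| <CODE>testfloat_gen</CODE> and <CODE>testfloat_ver</CODE> pages: |
| the line layout (bare hexadecimal operands separated by spaces, with the result |
| and a two-digit hexadecimal flags field appended) and the flag bit assignments. |
| All function names are hypothetical. |
| The exception flags are reconstructed after the fact. |
| Underflow is never reported, because a tiny sum of two doubles is always exact. |
| </P> |

```python
import math
import struct
import sys
from fractions import Fraction

# Assumed flag bit assignments; verify against the testfloat_gen page.
FLAG_INEXACT, FLAG_UNDERFLOW, FLAG_OVERFLOW, FLAG_INFINITE, FLAG_INVALID = (
    0x01, 0x02, 0x04, 0x08, 0x10)


def bits_to_f64(bits):
    """Reinterpret a 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def f64_to_bits(x):
    """Reinterpret a double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_signaling_nan64(bits):
    """Exponent all ones, fraction nonzero, quiet bit (bit 51) clear."""
    fraction = bits & ((1 << 52) - 1)
    return (((bits >> 52) & 0x7FF) == 0x7FF and fraction != 0
            and not (fraction >> 51) & 1)


def f64_add_under_test(a_bits, b_bits):
    """Hypothetical stand-in for the operation under test (near_even only)."""
    a, b = bits_to_f64(a_bits), bits_to_f64(b_bits)
    result = a + b
    flags = 0
    if is_signaling_nan64(a_bits) or is_signaling_nan64(b_bits):
        flags |= FLAG_INVALID
    elif math.isnan(result) and not (math.isnan(a) or math.isnan(b)):
        flags |= FLAG_INVALID                      # e.g. +inf + -inf
    elif math.isinf(result) and math.isfinite(a) and math.isfinite(b):
        flags |= FLAG_OVERFLOW | FLAG_INEXACT      # finite operands overflowed
    elif math.isfinite(result) and Fraction(a) + Fraction(b) != Fraction(result):
        flags |= FLAG_INEXACT                      # rounding changed the value
    return f64_to_bits(result), flags


def process_line(line):
    """Echo the two operands, then append the result and the flags."""
    a_hex, b_hex = line.split()[:2]
    result_bits, flags = f64_add_under_test(int(a_hex, 16), int(b_hex, 16))
    return f"{a_hex} {b_hex} {result_bits:016X} {flags:02X}"


def main():
    """Filter standard input to standard output, one test case per line."""
    for line in sys.stdin:
        if line.strip():
            print(process_line(line))
```

| <P> |
| Calling <CODE>main()</CODE> under an <CODE>if __name__ == "__main__":</CODE> |
| guard would let the script sit between <CODE>testfloat_gen</CODE> and |
| <CODE>testfloat_ver</CODE> in a pipeline like the one shown above. |
| </P> |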
| |
| <P> |
| A second method for running TestFloat is similar but has |
| <CODE>testfloat_gen</CODE> supply not only the test inputs but also the |
| expected results for each case. |
| With this additional information, the job done by <CODE>testfloat_ver</CODE> |
| can be folded into the invoking program to give the following command: |
| <PRE> |
| testfloat_gen ... <function> | <program-that-invokes-op-and-compares-results> |
| </PRE> |
| Again, the program that actually invokes the floating-point operation is not |
| supplied by TestFloat but must be created independently. |
| Depending on circumstances, it may be preferable either to let |
| <CODE>testfloat_ver</CODE> check and report suspected errors (first method) or |
| to include this step in the invoking program (second method). |
| </P> |
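| <P> |
| A sketch of the second method's combined program might look like the |
| following Python. |
| The assumed input layout appends the expected result and expected flags, in |
| hexadecimal, to each case's operands. |
| This layout should be checked against the <CODE>testfloat_gen</CODE> page. |
| <CODE>f64_mul_under_test</CODE> is a hypothetical stand-in that reports no |
| exception flags. |
| Following TestFloat's default behavior described in section 4, any NaN result |
| is accepted when a NaN is expected. |
| Mismatches are written in the same <CODE>=></CODE> form that TestFloat's own |
| programs use. |
| </P> |

```python
import struct


def bits_to_f64(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def f64_to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_nan64(bits):
    """Exponent all ones and a nonzero fraction."""
    return ((bits >> 52) & 0x7FF) == 0x7FF and (bits & ((1 << 52) - 1)) != 0


def f64_mul_under_test(a_bits, b_bits):
    """Hypothetical stand-in for the operation under test (no flags)."""
    return f64_to_bits(bits_to_f64(a_bits) * bits_to_f64(b_bits)), 0


def check_line(line, op, num_inputs):
    """Return None if the case passes, otherwise a one-line error report."""
    fields = [int(field, 16) for field in line.split()]
    inputs = fields[:num_inputs]
    expected_bits, expected_flags = fields[num_inputs], fields[num_inputs + 1]
    observed_bits, observed_flags = op(*inputs)
    # Any NaN is as good as any other NaN, as in TestFloat's default mode.
    value_ok = observed_bits == expected_bits or (
        is_nan64(observed_bits) and is_nan64(expected_bits))
    if value_ok and observed_flags == expected_flags:
        return None
    return (f"{' '.join(line.split()[:num_inputs])} => "
            f"{observed_bits:016X} {observed_flags:02X} "
            f"expected: {expected_bits:016X} {expected_flags:02X}")
```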
| |
| <P> |
| The third way to use TestFloat is the all-in-one <CODE>testfloat</CODE> |
| program. |
| This program can perform all the steps of creating test cases, invoking the |
| floating-point operation, checking the results, and reporting suspected errors. |
| However, for this to be possible, <CODE>testfloat</CODE> must be compiled to |
| contain the method for invoking the floating-point operations to test. |
| Each build of <CODE>testfloat</CODE> is therefore capable of testing |
| <EM>only</EM> the floating-point implementation it was built to invoke. |
| To test a new implementation of floating-point, a new <CODE>testfloat</CODE> |
| must be created, linked to that specific implementation. |
| By comparison, the <CODE>testfloat_gen</CODE> and <CODE>testfloat_ver</CODE> |
| programs are entirely generic; |
| one instance is usable for testing any floating-point implementation, because |
| implementation-specific details are segregated in the custom program that |
| follows <CODE>testfloat_gen</CODE>. |
| </P> |
| |
| <P> |
| Program <CODE>testsoftfloat</CODE> is another all-in-one program specifically |
| for testing SoftFloat. |
| </P> |
| |
| <P> |
| Programs <CODE>testfloat_ver</CODE>, <CODE>testfloat</CODE>, and |
| <CODE>testsoftfloat</CODE> all report status and error information in a common |
| way. |
| As it executes, each of these programs writes status information to the |
| standard error output, which should be the screen by default. |
| In order for this status to be displayed properly, the standard error stream |
| should not be redirected to a file. |
| Any discrepancies that are found are written to the standard output stream, |
| which is easily redirected to a file if desired. |
| Unless redirected, reported errors will appear intermixed with the ongoing |
| status information in the output. |
| </P> |
| |
| |
| <H2>6. Operations Tested by TestFloat</H2> |
| |
| <P> |
| TestFloat can test all operations required by the original 1985 IEEE |
| Floating-Point Standard except for conversions to and from decimal. |
| These operations are: |
| <UL> |
| <LI> |
| conversions among the supported floating-point formats, and also between |
| integers (<NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR>, signed and unsigned) and |
| any of the floating-point formats; |
| <LI> |
| for each floating-point format, the usual addition, subtraction, |
| multiplication, division, and square root operations; |
| <LI> |
| for each format, the floating-point remainder operation defined by the IEEE |
| Standard; |
| <LI> |
| for each format, a “round to integer” operation that rounds to an integer |
| value in the same format; and |
| <LI> |
| comparisons between two values in the same floating-point format. |
| </UL> |
| In addition, TestFloat can test |
| <UL> |
| <LI> |
| for each floating-point format except <NOBR>80-bit</NOBR> |
| double-extended-precision, the fused multiply-add operation defined by the 2008 |
| IEEE Standard. |
| </UL> |
| </P> |
| |
| <P> |
| More information about all these operations is given below. |
| In the operation names used by TestFloat, <NOBR>32-bit</NOBR> single-precision |
| is called <CODE>f32</CODE>, <NOBR>64-bit</NOBR> double-precision is |
| <CODE>f64</CODE>, <NOBR>80-bit</NOBR> double-extended-precision is |
| <CODE>extF80</CODE>, and <NOBR>128-bit</NOBR> quadruple-precision is |
| <CODE>f128</CODE>. |
| TestFloat generally uses the same names for operations as Berkeley SoftFloat, |
| except that TestFloat's names never include the <CODE>M</CODE> that SoftFloat |
| uses to indicate that values are passed through pointers. |
| </P> |
| |
| <H3>6.1. Conversion Operations</H3> |
| |
| <P> |
| All conversions among the floating-point formats and all conversions between a |
| floating-point format and <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> integers |
| can be tested. |
| The conversion operations are: |
| <PRE> |
| ui32_to_f32 ui64_to_f32 i32_to_f32 i64_to_f32 |
| ui32_to_f64 ui64_to_f64 i32_to_f64 i64_to_f64 |
| ui32_to_extF80 ui64_to_extF80 i32_to_extF80 i64_to_extF80 |
| ui32_to_f128 ui64_to_f128 i32_to_f128 i64_to_f128 |
| |
| f32_to_ui32 f64_to_ui32 extF80_to_ui32 f128_to_ui32 |
| f32_to_ui64 f64_to_ui64 extF80_to_ui64 f128_to_ui64 |
| f32_to_i32 f64_to_i32 extF80_to_i32 f128_to_i32 |
| f32_to_i64 f64_to_i64 extF80_to_i64 f128_to_i64 |
| |
| f32_to_f64 f64_to_f32 extF80_to_f32 f128_to_f32 |
| f32_to_extF80 f64_to_extF80 extF80_to_f64 f128_to_f64 |
| f32_to_f128 f64_to_f128 extF80_to_f128 f128_to_extF80 |
| </PRE> |
| Abbreviations <CODE>ui32</CODE> and <CODE>ui64</CODE> indicate |
| <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> unsigned integer types, while |
| <CODE>i32</CODE> and <CODE>i64</CODE> indicate their signed counterparts. |
| These conversions all round according to the current rounding mode as relevant. |
| Conversions from a smaller to a larger floating-point format are always exact |
| and so require no rounding. |
| Likewise, conversions from <NOBR>32-bit</NOBR> integers to <NOBR>64-bit</NOBR> |
| double-precision or to any larger floating-point format are also exact, as are |
| conversions from <NOBR>64-bit</NOBR> integers to <NOBR>80-bit</NOBR> |
| double-extended-precision and <NOBR>128-bit</NOBR> quadruple-precision. |
| </P> |
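| <P> |
| The exactness claims above follow from significand widths, counting the leading |
| bit: |
| 24 bits for single-precision, 53 for double-precision, 64 for |
| double-extended-precision, and 113 for quadruple-precision. |
| An integer converts exactly when its significant bits fit. |
| A small Python sketch checks this rule; the helper name is hypothetical. |
| </P> |

```python
# Significand precision in bits, counting the leading (hidden or explicit) bit.
PRECISION = {"f32": 24, "f64": 53, "extF80": 64, "f128": 113}


def int_converts_exactly(value, fmt):
    """True if integer `value` is exactly representable in format `fmt`.

    The exponent range is not checked; no 32- or 64-bit integer exceeds it.
    """
    magnitude = abs(value)
    if magnitude == 0:
        return True
    while magnitude % 2 == 0:        # trailing zero bits cost no precision
        magnitude //= 2
    return magnitude.bit_length() <= PRECISION[fmt]
```

| <P> |
| Python's own floats, which are double-precision, agree with this rule. |
| For instance, <CODE>int(float(2**63 - 1))</CODE> yields 2<SUP>63</SUP>, |
| showing that a <NOBR>64-bit</NOBR> integer can be rounded when it is converted |
| to double-precision. |
| </P> |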
| |
| <P> |
| For the all-in-one <CODE>testfloat</CODE> program, this list of conversion |
| operations requires amendment. |
| For <CODE>testfloat</CODE> only, conversions to an integer type have names that |
| explicitly specify the rounding mode and treatment of inexactness. |
| Thus, instead of |
| <PRE> |
| <float>_to_<int> |
| </PRE> |
| as listed above, operations converting to integer type have names of these |
| forms: |
| <PRE> |
| <float>_to_<int>_r_<round> |
| <float>_to_<int>_rx_<round> |
| </PRE> |
| The <CODE><round></CODE> component is one of ‘<CODE>near_even</CODE>’, |
| ‘<CODE>near_maxMag</CODE>’, ‘<CODE>minMag</CODE>’, ‘<CODE>min</CODE>’, or |
| ‘<CODE>max</CODE>’, choosing the rounding mode. |
| Any other indication of rounding mode is ignored. |
| The operations with ‘<CODE>_r_</CODE>’ in their names never raise the |
| <I>inexact</I> exception, while those with ‘<CODE>_rx_</CODE>’ raise the |
| <I>inexact</I> exception whenever the result is not exact. |
| </P> |
| |
| <P> |
| TestFloat assumes that conversions from floating-point to an integer type |
| should raise the <I>invalid</I> exception if the input cannot be rounded to an |
| integer representable by the result format. |
| In such a circumstance, if the result type is an unsigned integer, TestFloat |
| expects the result of the operation to be the type's largest integer value. |
| If the result type is a signed integer and conversion overflows, TestFloat |
| expects the result to be the largest-magnitude integer with the same sign as |
| the input. |
| Lastly, when a NaN is converted to a signed integer type, TestFloat allows |
| either the largest positive or the largest-magnitude negative integer to be |
| returned. |
| Conversions to integer types are expected never to raise the <I>overflow</I> |
| exception. |
| </P> |
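| <P> |
| The expectations above can be summarized in a hypothetical Python model of a |
| conversion from double-precision to a signed <NOBR>32-bit</NOBR> integer. |
| It is a sketch, not TestFloat code, and it covers only the <CODE>minMag</CODE> |
| rounding mode (truncation toward zero). |
| A flag selects between the <CODE>_r_</CODE> and <CODE>_rx_</CODE> treatments |
| of inexactness. |
| Out-of-range inputs produce the saturated result with <I>invalid</I>, and |
| <I>overflow</I> is never raised. |
| For a NaN it returns the largest positive integer, one of the two results |
| TestFloat accepts. |
| </P> |

```python
import math

I32_MIN, I32_MAX = -(2**31), 2**31 - 1


def f64_to_i32_min_mag(x, raise_inexact):
    """Model of a double-to-int32 conversion rounding toward zero (minMag).

    raise_inexact=False mimics the `_r_minMag` form, True the `_rx_minMag`
    form.  Returns (result, set of raised exception names).
    """
    if math.isnan(x):
        return I32_MAX, {"invalid"}       # I32_MIN would also be accepted
    if math.isinf(x):
        return (I32_MAX if x > 0 else I32_MIN), {"invalid"}
    rounded = math.trunc(x)               # Python ints cannot overflow
    if rounded > I32_MAX:
        return I32_MAX, {"invalid"}       # saturate; never "overflow"
    if rounded < I32_MIN:
        return I32_MIN, {"invalid"}
    flags = set()
    if raise_inexact and rounded != x:
        flags.add("inexact")
    return rounded, flags
```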
| |
| <H3>6.2. Basic Arithmetic Operations</H3> |
| |
| <P> |
| The following standard arithmetic operations can be tested: |
| <PRE> |
| f32_add f32_sub f32_mul f32_div f32_sqrt |
| f64_add f64_sub f64_mul f64_div f64_sqrt |
| extF80_add extF80_sub extF80_mul extF80_div extF80_sqrt |
| f128_add f128_sub f128_mul f128_div f128_sqrt |
| </PRE> |
| The double-extended-precision (<CODE>extF80</CODE>) operations can be rounded |
| to reduced precision under rounding precision control. |
| </P> |
| |
| <H3>6.3. Fused Multiply-Add Operations</H3> |
| |
| <P> |
| For all floating-point formats except <NOBR>80-bit</NOBR> |
| double-extended-precision, TestFloat can test the fused multiply-add operation |
| defined by the 2008 IEEE Floating-Point Standard. |
| The fused multiply-add operations are: |
| <PRE> |
| f32_mulAdd |
| f64_mulAdd |
| f128_mulAdd |
| </PRE> |
| </P> |
| |
| <P> |
| If one of the multiplication operands is infinite and the other is zero, |
| TestFloat expects the fused multiply-add operation to raise the <I>invalid</I> |
| exception even if the third operand is a NaN. |
| </P> |
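| <P> |
| The point of a fused multiply-add is that <I>a</I>×<I>b</I>+<I>c</I> is |
| rounded once rather than twice. |
| A Python sketch can model <CODE>f64_mulAdd</CODE> for finite operands. |
| It computes the exact value with <CODE>fractions.Fraction</CODE> and rounds it |
| once, since converting a <CODE>Fraction</CODE> to <CODE>float</CODE> rounds to |
| nearest-even. |
| The helper <CODE>expects_invalid</CODE> encodes the rule just stated. |
| Both helpers are illustrative assumptions, not TestFloat code. |
| The model ignores overflow, infinities, and NaNs in the arithmetic itself. |
| </P> |

```python
import math
from fractions import Fraction


def fused_mul_add(a, b, c):
    """a*b + c rounded once (finite operands and result only).

    Converting a Fraction to float rounds to nearest, ties to even.  The
    sign of an exact zero result is not modeled.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def expects_invalid(a, b, c):
    """TestFloat's stated rule: inf times zero is invalid even if c is a NaN.

    Other invalid cases (for example inf*x + -inf, or signaling NaN
    operands) are left out of this sketch.
    """
    return (math.isinf(a) and b == 0) or (a == 0 and math.isinf(b))


a = 1 + 2**-27
c = -(1 + 2**-26)
fused = fused_mul_add(a, a, c)   # exact a*a + c is 2**-54, kept by one rounding
unfused = a * a + c              # a*a first rounds to 1 + 2**-26, giving 0.0
```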
| |
| <H3>6.4. Remainder Operations</H3> |
| |
| <P> |
| For each format, TestFloat can test the IEEE Standard's remainder operation. |
| These operations are: |
| <PRE> |
| f32_rem |
| f64_rem |
| extF80_rem |
| f128_rem |
| </PRE> |
| The remainder operations are always exact and so require no rounding. |
| </P> |
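| <P> |
| The IEEE remainder differs from the C-style <CODE>fmod</CODE>. |
| It subtracts a multiple of the divisor chosen by rounding the quotient to the |
| nearest integer, with ties going to even, rather than by truncating it. |
| Python's <CODE>math.remainder</CODE> implements the IEEE operation for doubles, |
| so a short sketch can show both the difference and the exactness. |
| The helper <CODE>ieee_remainder_exact</CODE> is hypothetical. |
| </P> |

```python
import math
from fractions import Fraction


def ieee_remainder_exact(x, y):
    """Exact x - n*y, where n is x/y rounded to the nearest integer (ties even)."""
    n = round(Fraction(x) / Fraction(y))     # Fraction rounds half to even
    return Fraction(x) - n * Fraction(y)


x, y = 5.0, 3.0
r_ieee = math.remainder(x, y)   # n = 2 (5/3 rounds to 2), so 5 - 6 = -1.0
r_fmod = math.fmod(x, y)        # n = 1 (5/3 truncates to 1), so 5 - 3 = 2.0
```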
| |
| <H3>6.5. Round-to-Integer Operations</H3> |
| |
| <P> |
| For each format, TestFloat can test the IEEE Standard's round-to-integer |
| operation. |
| For most TestFloat programs, these operations are: |
| <PRE> |
| f32_roundToInt |
| f64_roundToInt |
| extF80_roundToInt |
| f128_roundToInt |
| </PRE> |
| </P> |
| |
| <P> |
| Just as for conversions to integer types (<NOBR>section 6.1</NOBR> above), the |
| all-in-one <CODE>testfloat</CODE> program is again an exception. |
| For <CODE>testfloat</CODE> only, the round-to-integer operations have names of |
| these forms: |
| <PRE> |
| <float>_roundToInt_r_<round> |
| <float>_roundToInt_x |
| </PRE> |
| For the ‘<CODE>_r_</CODE>’ versions, the <I>inexact</I> exception is never |
| raised, and the <CODE><round></CODE> component specifies the rounding |
| mode as one of ‘<CODE>near_even</CODE>’, ‘<CODE>near_maxMag</CODE>’, |
| ‘<CODE>minMag</CODE>’, ‘<CODE>min</CODE>’, or ‘<CODE>max</CODE>’. |
| The usual indication of rounding mode is ignored. |
| In contrast, the ‘<CODE>_x</CODE>’ versions accept the usual indication of |
| rounding mode and raise the <I>inexact</I> exception whenever the result is not |
| exact. |
| This irregular system follows the IEEE Standard's precise specification for the |
| round-to-integer operations. |
| </P> |
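| <P> |
| The hypothetical Python helper below shows how each of the five |
| <CODE><round></CODE> modes maps a finite value to an integral value. |
| It is a sketch, not TestFloat code. |
| It works on exact <CODE>Fraction</CODE> values to avoid double rounding. |
| It keeps the sign of zero results, as the IEEE Standard requires. |
| Whether <I>inexact</I> would be raised corresponds to whether the result |
| differs from the input, which matters only for the <CODE>_x</CODE> variants. |
| </P> |

```python
import math
from fractions import Fraction


def round_to_int(x, mode):
    """Round a finite double to an integral double in the named mode."""
    q = Fraction(x)
    if mode == "near_even":
        n = round(q)                               # ties to even
    elif mode == "near_maxMag":
        n = math.floor(abs(q) + Fraction(1, 2))    # ties away from zero
        n = n if q >= 0 else -n
    elif mode == "minMag":
        n = math.trunc(q)                          # toward zero
    elif mode == "min":
        n = math.floor(q)                          # toward -infinity
    elif mode == "max":
        n = math.ceil(q)                           # toward +infinity
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return math.copysign(float(n), x)              # keep the sign of zeros
```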
| |
| <H3>6.6. Comparison Operations</H3> |
| |
| <P> |
| The following floating-point comparison operations can be tested: |
| <PRE> |
| f32_eq f32_le f32_lt |
| f64_eq f64_le f64_lt |
| extF80_eq extF80_le extF80_lt |
| f128_eq f128_le f128_lt |
| </PRE> |
| The abbreviation <CODE>eq</CODE> stands for “equal” (=), <CODE>le</CODE> |
| stands for “less than or equal” (≤), and <CODE>lt</CODE> stands for |
| “less than” (<). |
| </P> |
| |
| <P> |
| The IEEE Standard specifies that, by default, the less-than-or-equal and |
| less-than comparisons raise the <I>invalid</I> exception if either input is any |
| kind of NaN. |
| The equality comparisons, on the other hand, are defined by default to raise |
| the <I>invalid</I> exception only for signaling NaNs, not for quiet NaNs. |
| For completeness, the following additional operations can be tested if |
| supported: |
| <PRE> |
| f32_eq_signaling f32_le_quiet f32_lt_quiet |
| f64_eq_signaling f64_le_quiet f64_lt_quiet |
| extF80_eq_signaling extF80_le_quiet extF80_lt_quiet |
| f128_eq_signaling f128_le_quiet f128_lt_quiet |
| </PRE> |
| The <CODE>signaling</CODE> equality comparisons are identical to the standard |
| operations except that the <I>invalid</I> exception should be raised for any |
| NaN input. |
| Similarly, the <CODE>quiet</CODE> comparison operations should be identical to |
| their counterparts except that the <I>invalid</I> exception is not raised for |
| quiet NaNs. |
| </P> |
| |
| <P> |
| Obviously, no comparison operations ever require rounding. |
| Any rounding mode is ignored. |
| </P> |
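| <P> |
| The <I>invalid</I>-flag rules above fit in a short Python model of the |
| double-precision comparisons. |
| The helper <CODE>f64_compare</CODE> and its operation strings (for example |
| <CODE>"le_quiet"</CODE>) are hypothetical. |
| Operands are passed as raw 64-bit encodings so that signaling NaNs can be told |
| apart from quiet ones. |
| </P> |

```python
import struct


def bits_to_f64(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_nan64(bits):
    return ((bits >> 52) & 0x7FF) == 0x7FF and (bits & ((1 << 52) - 1)) != 0


def is_signaling_nan64(bits):
    return is_nan64(bits) and not (bits >> 51) & 1   # quiet bit clear


def f64_compare(op, a_bits, b_bits):
    """Return (result, invalid) for eq, le, lt and their variants."""
    any_nan = is_nan64(a_bits) or is_nan64(b_bits)
    any_snan = is_signaling_nan64(a_bits) or is_signaling_nan64(b_bits)
    a, b = bits_to_f64(a_bits), bits_to_f64(b_bits)
    base, _, variant = op.partition("_")
    if base == "eq":
        result = a == b
        quiet_nan_is_invalid = variant == "signaling"
    elif base in ("le", "lt"):
        result = a <= b if base == "le" else a < b
        quiet_nan_is_invalid = variant != "quiet"
    else:
        raise ValueError(f"unknown comparison: {op}")
    # Signaling NaNs are invalid for every comparison.
    invalid = any_snan or (any_nan and quiet_nan_is_invalid)
    return result, invalid
```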
| |
| |
| <H2>7. Interpreting TestFloat Output</H2> |
| |
| <P> |
| The “errors” reported by TestFloat programs may or may not really represent |
| errors in the system being tested. |
| For each test case tried, the results from the floating-point implementation |
| being tested could differ from the expected results for several reasons: |
| <UL> |
| <LI> |
| The IEEE Floating-Point Standard allows for some variation in how conforming |
| floating-point behaves. |
| Two implementations can sometimes give different results without either being |
| incorrect. |
| <LI> |
| The trusted floating-point emulation could be faulty. |
| This could be because there is a bug in the way the emulation is coded, or |
| because a mistake was made when the code was compiled for the current system. |
| <LI> |
| The TestFloat program may not work properly, reporting differences that do not |
| exist. |
| <LI> |
| Lastly, the floating-point being tested could actually be faulty. |
| </UL> |
| It is the responsibility of the user to determine the causes for the |
| discrepancies that are reported. |
| Making this determination can require detailed knowledge about the IEEE |
| Standard. |
| Assuming TestFloat is working properly, any differences found will be due to |
| either the first or last of the reasons above. |
| Variations in the IEEE Standard that could lead to false error reports are |
| discussed in <NOBR>section 8</NOBR>, <I>Variations Allowed by the IEEE |
| Floating-Point Standard</I>. |
| </P> |
| |
| <P> |
| For each reported error (or apparent error), a line of text is written to the |
| default output. |
| If a line would be longer than 79 characters, it is divided. |
| The first part of each error line begins in the leftmost column, and any |
| subsequent “continuation” lines are indented with a tab. |
| </P> |
| |
| <P> |
| Each error reported is of the form: |
| <PRE> |
| <inputs> => <observed-output> expected: <expected-output> |
| </PRE> |
| The <CODE><inputs></CODE> are the inputs to the operation. |
| Each output (observed and expected) is shown as a pair: the result value |
| first, followed by the exception flags. |
| </P> |
| |
| <P> |
| For example, two typical error lines could be: |
| <PRE> |
| 800.7FFF00 87F.000100 => 001.000000 ...ux expected: 001.000000 ....x |
| 081.000004 000.1FFFFF => 001.000000 ...ux expected: 001.000000 ....x |
| </PRE> |
| In the first line, the inputs are <CODE>800.7FFF00</CODE> and |
| <CODE>87F.000100</CODE>, and the observed result is <CODE>001.000000</CODE> |
| with flags <CODE>...ux</CODE>. |
| The expected result from the trusted emulation has the same value but |
| different flags, <CODE>....x</CODE>. |
| Items such as <CODE>800.7FFF00</CODE> composed of hexadecimal digits and a |
| single period represent floating-point values (here <NOBR>32-bit</NOBR> |
| single-precision). |
| The two instances above were reported as errors because the exception flag |
| results differ. |
| </P> |
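| <P> |
| When a report is long, it can help to process the error lines mechanically. |
| The Python sketch below does two things. |
| First, it rejoins tab-indented continuation lines, assuming lines are broken |
| between fields. |
| Second, it splits each error into inputs, observed output, and expected |
| output. |
| It assumes that the five flag positions stand, in order, for <I>invalid</I>, |
| <I>infinite</I> (divide-by-zero), <I>overflow</I>, <I>underflow</I>, and |
| <I>inexact</I>. |
| That order is consistent with the <CODE>u</CODE> and <CODE>x</CODE> in the |
| example above, but it remains an assumption. |
| The function names are hypothetical. |
| </P> |

```python
# Assumed meaning of the five flag positions, left to right.
FLAG_NAMES = ("invalid", "infinite", "overflow", "underflow", "inexact")


def join_continuations(report):
    """Rejoin error lines whose continuation parts start with a tab."""
    lines = []
    for raw in report.splitlines():
        if raw.startswith("\t") and lines:
            lines[-1] += " " + raw.strip()
        elif raw.strip():
            lines.append(raw.rstrip())
    return lines


def decode_flags(flag_text):
    """Map a five-character flag field such as '...ux' to flag names."""
    return {name for name, ch in zip(FLAG_NAMES, flag_text) if ch != "."}


def parse_error_line(line):
    """Split '<inputs> => <value> <flags> expected: <value> <flags>'."""
    inputs_part, outputs_part = line.split("=>")
    observed_part, expected_part = outputs_part.split("expected:")
    observed_value, observed_flags = observed_part.split()
    expected_value, expected_flags = expected_part.split()
    return {
        "inputs": inputs_part.split(),
        "observed": (observed_value, decode_flags(observed_flags)),
        "expected": (expected_value, decode_flags(expected_flags)),
    }
```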
| |
| <P> |
| Aside from the exception flags, there are nine data types that may be |
| represented. |
| Four are floating-point types: <NOBR>32-bit</NOBR> single-precision, |
| <NOBR>64-bit</NOBR> double-precision, <NOBR>80-bit</NOBR> |
| double-extended-precision, and <NOBR>128-bit</NOBR> quadruple-precision. |
| The remaining five types are <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> |
| unsigned integers, <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> two's-complement |
| signed integers, and Boolean values (the results of comparison operations). |
| Boolean values are represented as a single character, either a <CODE>0</CODE> |
| or a <CODE>1</CODE>. |
| <NOBR>32-bit</NOBR> integers are represented as 8 hexadecimal digits. |
| Thus, for a signed <NOBR>32-bit</NOBR> integer, <CODE>FFFFFFFF</CODE> is -1, |
| and <CODE>7FFFFFFF</CODE> is the largest positive value. |
| <NOBR>64-bit</NOBR> integers are the same except with 16 hexadecimal digits. |
| </P> |
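| <P> |
| Reading a signed integer field back as a number is a matter of two's-complement |
| interpretation. |
| The hypothetical helper below makes the rule concrete. |
| For unsigned fields, <CODE>int(text, 16)</CODE> alone suffices. |
| </P> |

```python
def hex_to_signed(text):
    """Interpret 8 or 16 hex digits as a two's-complement signed integer."""
    width = 4 * len(text)                 # 32 or 64 bits
    value = int(text, 16)
    # If the top bit is set, the field encodes value - 2**width.
    return value - (1 << width) if value >> (width - 1) else value
```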
| |
| <P> |
| Floating-point values are written in a correspondingly primitive form. |
| Values of the <NOBR>64-bit</NOBR> double-precision format are represented by 16 |
| hexadecimal digits that give the raw bits of the floating-point encoding. |
| A period separates the 3rd and 4th hexadecimal digits to mark the division |
| between the exponent bits and fraction bits. |
| Some notable <NOBR>64-bit</NOBR> double-precision values include: |
| <PRE> |
| 000.0000000000000 +0 |
| 3FF.0000000000000 1 |
| 400.0000000000000 2 |
| 7FF.0000000000000 +infinity |
| |
| 800.0000000000000 -0 |
| BFF.0000000000000 -1 |
| C00.0000000000000 -2 |
| FFF.0000000000000 -infinity |
| |
| 3FE.FFFFFFFFFFFFF largest representable number less than +1 |
| </PRE> |
| The following categories are easily distinguished (assuming the |
| <CODE>x</CODE>s are not all 0): |
| <PRE> |
| 000.xxxxxxxxxxxxx positive subnormal (denormalized) numbers |
| 7FF.xxxxxxxxxxxxx positive NaNs |
| 800.xxxxxxxxxxxxx negative subnormal numbers |
| FFF.xxxxxxxxxxxxx negative NaNs |
| </PRE> |
| </P> |
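| <P> |
| The double-precision layout is easy to reproduce. |
| The first three hexadecimal digits hold the sign bit and the |
| <NOBR>11-bit</NOBR> exponent, and the remaining 13 hold the |
| <NOBR>52-bit</NOBR> fraction. |
| The hypothetical Python helper below writes a Python float, which is a |
| <NOBR>64-bit</NOBR> double on IEEE platforms, in this form. |
| </P> |

```python
import struct


def format_f64(x):
    """Write a double as TestFloat shows it: sign+exponent '.' fraction."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return f"{bits >> 52:03X}.{bits & ((1 << 52) - 1):013X}"
```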
| |
| <P> |
| <NOBR>128-bit</NOBR> quadruple-precision values are written the same except |
| with 4 hexadecimal digits for the sign and exponent and 28 for the fraction. |
| Notable values include: |
| <PRE> |
| 0000.0000000000000000000000000000 +0 |
| 3FFF.0000000000000000000000000000 1 |
| 4000.0000000000000000000000000000 2 |
| 7FFF.0000000000000000000000000000 +infinity |
| |
| 8000.0000000000000000000000000000 -0 |
| BFFF.0000000000000000000000000000 -1 |
| C000.0000000000000000000000000000 -2 |
| FFFF.0000000000000000000000000000 -infinity |
| |
| 3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF largest representable number less than +1 |
| </PRE> |
| </P> |
| |
| <P> |
| <NOBR>80-bit</NOBR> double-extended-precision values are a little unusual in |
| that the leading bit of precision is not hidden as with other formats. |
| When correctly encoded, the leading significand bit of an <NOBR>80-bit</NOBR> |
| double-extended-precision value will be 0 if the value is zero or subnormal, |
| and will be 1 otherwise. |
| Hence, the same values listed above appear in <NOBR>80-bit</NOBR> |
| double-extended-precision as follows (note the leading <CODE>8</CODE> digit in |
| the significands): |
| <PRE> |
| 0000.0000000000000000 +0 |
| 3FFF.8000000000000000 1 |
| 4000.8000000000000000 2 |
| 7FFF.8000000000000000 +infinity |
| |
| 8000.0000000000000000 -0 |
| BFFF.8000000000000000 -1 |
| C000.8000000000000000 -2 |
| FFFF.8000000000000000 -infinity |
| |
| 3FFE.FFFFFFFFFFFFFFFF largest representable number less than +1 |
| </PRE> |
| </P> |
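| |
| <P> |
| The explicit leading bit can be checked mechanically. |
| In this sketch the hypothetical <CODE>f80_bits</CODE> struct holds the 16 |
| sign-and-exponent bits and the <NOBR>64-bit</NOBR> significand separately, and |
| <CODE>f80_leading_bit_ok</CODE> tests the rule stated above: the leading bit is |
| 0 exactly when the exponent field is 0 (zero or subnormal). |
| </P> |

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for an 80-bit double-extended encoding. */
typedef struct { uint16_t signExp; uint64_t sig; } f80_bits;

/* Write 4 digits of sign and exponent, a period, then the 16-digit
   significand, including its explicit leading bit. */
void format_f80(f80_bits v, char out[22])
{
    snprintf(out, 22, "%04X.%016" PRIX64, (unsigned) v.signExp, v.sig);
}

/* True if the leading significand bit agrees with the exponent field:
   0 for zero/subnormal (exponent field 0), 1 for everything else. */
bool f80_leading_bit_ok(f80_bits v)
{
    bool expIsZero  = (v.signExp & 0x7FFF) == 0;
    bool leadingBit = (v.sig >> 63) != 0;
    return leadingBit != expIsZero;
}
```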
| |
| <P> |
| The representation of <NOBR>32-bit</NOBR> single-precision values is unusual |
| for a different reason. |
| Because the subfields of standard <NOBR>32-bit</NOBR> single-precision do not |
| fall on neat <NOBR>4-bit</NOBR> boundaries, TestFloat writes single-precision |
| values in a slightly altered form: |
| 9 hexadecimal digits instead of 8, with a period separating the 3rd and 4th |
| hexadecimal digits. |
| Broken out into bits, the 9 hexadecimal digits cover the <NOBR>32-bit</NOBR> |
| single-precision subfields as follows: |
| <PRE> |
| x000 .... .... . .... .... .... .... .... .... sign (1 bit) |
| .... xxxx xxxx . .... .... .... .... .... .... exponent (8 bits) |
| .... .... .... . 0xxx xxxx xxxx xxxx xxxx xxxx fraction (23 bits) |
| </PRE> |
| As shown in this schematic, the first hexadecimal digit contains only the sign, |
| and will be either <CODE>0</CODE> <NOBR>or <CODE>8</CODE></NOBR>. |
| The next two digits give the biased exponent as an <NOBR>8-bit</NOBR> integer. |
| This is followed by a period and 6 hexadecimal digits of fraction. |
| The most significant hexadecimal digit of the fraction can be at most |
| <NOBR>a <CODE>7</CODE></NOBR>. |
| </P> |
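| |
| <P> |
| A sketch of this 9-digit notation in C, assuming <CODE>float</CODE> is the IEEE |
| <NOBR>32-bit</NOBR> format; <CODE>format_f32</CODE> is a hypothetical helper, |
| not TestFloat's own output routine: |
| </P> |

```c
#include <assert.h>
#include <float.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a float in TestFloat's 9-digit
   single-precision notation.  Assumes float is the IEEE 32-bit format. */
void format_f32(float f, char out[11])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    unsigned sign = (unsigned) (bits >> 31);          /* 1 bit   */
    unsigned exp  = (unsigned) (bits >> 23) & 0xFF;   /* 8 bits  */
    uint32_t frac = bits & UINT32_C(0x7FFFFF);        /* 23 bits */
    /* One digit holding only the sign (0 or 8), two digits of biased
       exponent, a period, then six digits of fraction (first digit <= 7). */
    snprintf(out, 11, "%X%02X.%06" PRIX32, sign ? 8u : 0u, exp, frac);
}
```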
| |
| <P> |
| Notable single-precision values include: |
| <PRE> |
| 000.000000 +0 |
| 07F.000000 1 |
| 080.000000 2 |
| 0FF.000000 +infinity |
| |
| 800.000000 -0 |
| 87F.000000 -1 |
| 880.000000 -2 |
| 8FF.000000 -infinity |
| |
| 07E.7FFFFF largest representable number less than +1 |
| </PRE> |
| Again, certain categories are easily distinguished (assuming the |
| <CODE>x</CODE>s are not all 0): |
| <PRE> |
| 000.xxxxxx positive subnormal (denormalized) numbers |
| 0FF.xxxxxx positive NaNs |
| 800.xxxxxx negative subnormal numbers |
| 8FF.xxxxxx negative NaNs |
| </PRE> |
| </P> |
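| |
| <P> |
| Going the other way, a hypothetical parser can turn this notation back into the |
| raw <NOBR>32-bit</NOBR> encoding. |
| It rejects text whose first digit is not <CODE>0</CODE> <NOBR>or |
| <CODE>8</CODE></NOBR>, or whose leading fraction digit exceeds <CODE>7</CODE>: |
| </P> |

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical parser for "SEE.FFFFFF": on success, store the raw 32-bit
   single-precision encoding in *bits and return true. */
bool parse_f32(const char *s, uint32_t *bits)
{
    if (strlen(s) != 10 || s[3] != '.') return false;
    uint64_t v = 0;                          /* 9 digits = 36 bits */
    for (int i = 0; i < 10; i++) {
        if (i == 3) continue;                /* skip the period */
        int d = hex_digit(s[i]);
        if (d < 0) return false;
        v = (v << 4) | (uint64_t) d;
    }
    unsigned signDigit = (unsigned) (v >> 32);         /* first digit */
    uint32_t exp  = (uint32_t) (v >> 24) & 0xFF;       /* next two    */
    uint32_t frac = (uint32_t) v & UINT32_C(0xFFFFFF); /* last six    */
    if ((signDigit != 0 && signDigit != 8) || frac > UINT32_C(0x7FFFFF))
        return false;
    *bits = ((uint32_t) (signDigit >> 3) << 31) | (exp << 23) | frac;
    return true;
}
```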
| |
| <P> |
| Lastly, exception flag values are represented by five characters, one character |
| per flag. |
| Each flag is written as a letter if the operation set it, or as a period |
| (<CODE>.</CODE>) if it did not. |
| The letter used to indicate a set flag depends on the flag: |
| <PRE> |
| v invalid exception |
| i infinite exception ("divide by zero") |
| o overflow exception |
| u underflow exception |
| x inexact exception |
| </PRE> |
| For example, the notation <CODE>...ux</CODE> indicates that the |
| <I>underflow</I> and <I>inexact</I> exception flags were set and that the other |
| three flags (<I>invalid</I>, <I>infinite</I>, and <I>overflow</I>) were not |
| set. |
| The exception flags are always written following the value returned as the |
| result of the operation. |
| </P> |
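| |
| <P> |
| In C, the five flags correspond to the <CODE>&lt;fenv.h&gt;</CODE> macros |
| <CODE>FE_INVALID</CODE>, <CODE>FE_DIVBYZERO</CODE>, <CODE>FE_OVERFLOW</CODE>, |
| <CODE>FE_UNDERFLOW</CODE>, and <CODE>FE_INEXACT</CODE>. |
| The hypothetical helper below renders such a flag set in TestFloat's notation; |
| a caller could obtain the set from <CODE>fetestexcept(FE_ALL_EXCEPT)</CODE> |
| after an operation. |
| </P> |

```c
#include <assert.h>
#include <fenv.h>
#include <string.h>

/* Hypothetical helper: render a set of <fenv.h> exception flags in
   TestFloat's five-character notation, in the order v, i, o, u, x.
   A set flag is shown as its letter, a clear flag as a period. */
void format_flags(int flags, char out[6])
{
    static const struct { int mask; char letter; } table[5] = {
        { FE_INVALID,   'v' },
        { FE_DIVBYZERO, 'i' },   /* "infinite" (divide by zero) */
        { FE_OVERFLOW,  'o' },
        { FE_UNDERFLOW, 'u' },
        { FE_INEXACT,   'x' },
    };
    for (int k = 0; k < 5; k++)
        out[k] = (flags & table[k].mask) ? table[k].letter : '.';
    out[5] = '\0';
}
```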
| |
| |
| <H2>8. Variations Allowed by the IEEE Floating-Point Standard</H2> |
| |
| <P> |
| The IEEE Floating-Point Standard admits some variation among conforming |
| implementations. |
| Because TestFloat expects the two implementations being compared to deliver |
| bit-for-bit identical results under most circumstances, this leeway in the |
| standard can result in false errors being reported if the two implementations |
| do not make the same choices everywhere the standard provides an option. |
| </P> |
| |
| <H3>8.1. Underflow</H3> |
| |
| <P> |
| The standard specifies that the <I>underflow</I> exception flag is to be raised |
| when two conditions are met simultaneously: |
| <NOBR>(1) <I>tininess</I></NOBR> and <NOBR>(2) <I>loss of accuracy</I></NOBR>. |
| </P> |
| |
| <P> |
| A result is tiny when its magnitude is nonzero yet smaller than any normalized |
| floating-point number. |
| The standard allows tininess to be determined either before or after a result |
| is rounded to the destination precision. |
| If tininess is detected before rounding, some borderline cases will be flagged |
| as underflows even though the result after rounding actually lies within the |
| normal floating-point range. |
| By detecting tininess after rounding, a system can avoid some unnecessary |
| signaling of underflow. |
| All the TestFloat programs support options <CODE>-tininessbefore</CODE> and |
| <CODE>-tininessafter</CODE> to control whether TestFloat expects tininess on |
| underflow to be detected before or after rounding. |
| One or the other is selected as the default when TestFloat is compiled, but |
| these command options allow the default to be overridden. |
| </P> |
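| |
| <P> |
| The following C sketch constructs one such borderline case. |
| The operands are illustrative and not taken from TestFloat. |
| The exact product lies just below the smallest normal number but rounds up to |
| it, so a system that detects tininess before rounding raises <I>underflow</I> |
| here and one that detects it after rounding does not. |
| Both deliver the same result and raise <I>inexact</I>. |
| Reading the flags afterward, for example with <CODE>fetestexcept</CODE>, would |
| show which choice a given system makes. |
| </P> |

```c
#include <assert.h>
#include <float.h>

/* Illustrative operands (chosen for this example, not from TestFloat):
     x = DBL_MIN * (1 + 2^-52)   the next double above DBL_MIN
     y = 1 - 2^-52               slightly less than 1
   The exact product is DBL_MIN * (1 - 2^-104): nonzero and below DBL_MIN,
   so it is tiny when tininess is detected before rounding.  Rounding that
   value to 53 significant bits (ignoring the exponent range) gives exactly
   DBL_MIN, so it is not tiny when tininess is detected after rounding. */
double borderline_product(void)
{
    /* volatile forces the multiplication to happen at run time, under the
       current rounding mode (round-to-nearest-even by default). */
    volatile double x = DBL_MIN * (1.0 + DBL_EPSILON);
    volatile double y = 1.0 - DBL_EPSILON;
    return x * y;   /* delivered result: DBL_MIN, with inexact raised */
}
```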
| |
| <P> |
| Loss of accuracy occurs when the subnormal format is not sufficient to |
| represent an underflowed result accurately. |
| The original 1985 version of the IEEE Standard allowed loss of accuracy to be |
| detected either as an <I>inexact result</I> or as a |
| <I>denormalization loss</I>; |
| however, few if any systems ever chose the latter. |
| The 2008 revision of the standard requires that loss of accuracy be detected as |
| an inexact result, and TestFloat can test only for this case. |
| </P> |
| |
| <H3>8.2. NaNs</H3> |
| |
| <P> |
| The IEEE Standard gives the floating-point formats a large number of NaN |
| encodings and specifies that NaNs are to be returned as results under certain |
| conditions. |
| However, the standard allows an implementation almost complete freedom over |
| <EM>which</EM> NaN to return in each situation. |
| </P> |
| |
| <P> |
| By default, TestFloat does not check the bit patterns of NaN results. |
| When the result of an operation should be a NaN, any NaN is considered as good |
| as another. |
| This laxness can be overridden with the <CODE>-checkNaNs</CODE> option of the |
| <CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE> programs. |
| In order for this option to be sensible, TestFloat must have been compiled so |
| that its internal floating-point implementation (SoftFloat) generates the |
| proper NaN results for the system being tested. |
| </P> |
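| |
| <P> |
| The default comparison can be modeled as follows. |
| This is a sketch under stated assumptions, not TestFloat's code: results are |
| compared as raw <NOBR>64-bit</NOBR> encodings, and the hypothetical |
| <CODE>checkNaNs</CODE> parameter plays the role of the <CODE>-checkNaNs</CODE> |
| option. |
| </P> |

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a binary64 bit pattern encodes a NaN: exponent field all ones
   and a nonzero fraction (the 7FF.xxx and FFF.xxx categories). */
static bool is_nan_bits(uint64_t b)
{
    const uint64_t expMask  = UINT64_C(0x7FF) << 52;
    const uint64_t fracMask = (UINT64_C(1) << 52) - 1;
    return (b & expMask) == expMask && (b & fracMask) != 0;
}

/* Hypothetical comparison: by default any NaN matches any other NaN;
   with checkNaNs set, NaNs must match bit for bit like other results. */
bool results_match(uint64_t expected, uint64_t actual, bool checkNaNs)
{
    if (!checkNaNs && is_nan_bits(expected) && is_nan_bits(actual))
        return true;
    return expected == actual;
}
```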
| |
| <H3>8.3. Conversions to Integer</H3> |
| |
| <P> |
| Conversion of a floating-point value to an integer format will fail if the |
| source value is a NaN or if it is too large. |
| The IEEE Standard does not specify what value should be returned as the integer |
| result in these cases. |
| Moreover, the standard allows such cases to be signaled either by raising the |
| <I>invalid</I> exception or by some other, unspecified mechanism. |
| </P> |
| |
| <P> |
| TestFloat assumes that conversions to integer will raise the <I>invalid</I> |
| exception if the source value cannot be rounded to a representable integer. |
| In such cases, TestFloat expects the result value to be the largest-magnitude |
| positive or negative integer as detailed earlier in <NOBR>section 6.1</NOBR>, |
| <I>Conversion Operations</I>. |
| The current version of TestFloat provides no means to alter these expectations. |
| </P> |
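| |
| <P> |
| A sketch of this expectation for a double-to-<NOBR>32-bit</NOBR> conversion, |
| assuming rounding toward zero (as a C cast does) and assuming an out-of-range |
| result takes the sign of the source. |
| The hypothetical function below reports whether <I>invalid</I> is expected |
| and, for finite sources, which integer. |
| It deliberately leaves the integer expected for a NaN source to |
| <NOBR>section 6.1</NOBR>. |
| </P> |

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: returns true if the invalid exception is expected
   for converting x to int32 with truncation.  For finite x, *expected
   receives the expected integer result. */
bool expect_invalid_f64_to_i32(double x, int32_t *expected)
{
    if (isnan(x)) {
        /* The expected integer for a NaN source is given in section 6.1;
           this sketch does not model it and leaves *expected unchanged. */
        return true;
    }
    /* Truncation gives a representable int32 iff -2^31 - 1 < x < 2^31. */
    if (x >= 2147483648.0)  { *expected = INT32_MAX; return true; }
    if (x <= -2147483649.0) { *expected = INT32_MIN; return true; }
    *expected = (int32_t) x;   /* in range, so the cast is well defined */
    return false;
}
```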
| |
| |
| <H2>9. Contact Information</H2> |
| |
| <P> |
| At the time of this writing, the most up-to-date information about TestFloat |
| and the latest release can be found at the Web page |
| <A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></A>. |
| </P> |
| |
| |
| </BODY> |
| |