Article 33666 of comp.arch:
Xref: mri-gw comp.arch:33666 comp.benchmarks:4646
Path: mri-gw!psinntp!psinntp!news.intercon.com!howland.reston.ans.net!cs.utexas.edu!uunet!validgh!dgh
From: dgh@validgh.com (David G. Hough at validgh)
Newsgroups: comp.arch,comp.benchmarks
Subject: IEEE 754 traps, hardware traps, and performance
Message-ID: <645@validgh.com>
Date: 12 Jan 94 03:23:08 GMT
Followup-To: poster
Organization: validgh, PO Box 20370, San Jose, CA 95160
Lines: 122

Some recent postings have confused a couple of intertwined issues relating to IEEE 754 floating-point arithmetic "architecture" and "implementation", particularly with respect to high-performance RISC CPUs.

IEEE 754 defines five classes of user-level "exceptions". For each exception, the default "nonstop" behavior is to continue with a standard-defined result. But all 754 implementations are required to provide means to detect synchronously whether these exceptions have occurred. The usual implementation is by a set of hardware status bits that can be set and reset from user mode. In SunOS, a function ieee_flags(3m) can be used for that purpose. Another function ieee_retrospective(3m) can be invoked at the end of a program to print out whether any exceptional behavior arose; it is invoked automatically for Fortran programs.

IEEE 754 encourages provision of five corresponding "traps" that cause asynchronous branches to user-mode trap handlers when the corresponding exception arises. Such handlers are supposed to have access to the instruction that trapped and to its operands, and should be able to provide alternative numerical results. For that purpose, SPARC implementations provide, at some cost, a floating-point instruction queue which records the PC of floating-point instructions that have not yet completed.

Implementations providing IEEE 754 traps are to run with traps DISABLED by default. Library functions or command line options may be used to enable one or more of the traps.
In SunOS, a function ieee_handler(3m) and the command-line option "-fnonstd" can be used to enable SIGFPE to occur on an IEEE exception. Optional signal handler data structures can be used with some programming effort to get the instruction and data. Substituting recomputed results from user mode is trickier and requires patching sigtramp to avoid its habit of restoring floating-point registers after SIGFPE or other signals. SunPro compilers for SunOS 4.x provide a patched sigtramp for that purpose. There is no convenient programming interface to user-mode IEEE 754 traps due to lack of standardization, but premature standardization in this area is fraught with subtle performance hazards.

IEEE 754 can be implemented in software, or hardware, or most typically by a combination. An efficient RISC implementation of IEEE 754 presents interesting hardware design problems, the most interesting of which is how to implement subnormal operands and results of multiplication; these usually arise from untrapped underflows. Thus SPARC implementations prior to SuperSPARC and MicroSPARC generated hardware traps to supervisor mode - a different concept from the IEEE 754 traps mentioned above - when subnormal operands or results were encountered. The supervisor mode code in the kernel was responsible for identifying the trapping instruction, decoding it, fetching the operands, recomputing the correct IEEE 754 result, and placing it in the intended destination. This code has been in place since the first release of SunOS for SPARC.

However, a few programs generate subnormal operands or results frequently, or more typically, zero results from underflows frequently. There is no great problem in treating underflows all the way to zero correctly entirely in hardware, and trapping only when subnormal non-zero operands or results are encountered, but early SPARC implementations generally trapped all underflows for recomputation.
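
To make the subnormal case concrete, the following C sketch shows a multiplication whose untrapped underflow produces a subnormal result, next to a software imitation of what a flush-to-zero treatment would return instead. Both function names are hypothetical, and flush_to_zero only mimics the visible result of such a nonstandard mode; it says nothing about how any SPARC hardware does it, and keeping the sign of the zero is an assumption of this sketch.

```c
#include <float.h>
#include <math.h>

/*
 * Halve the smallest normal double. Under standard IEEE 754 gradual
 * underflow the product is a non-zero subnormal number, not zero.
 */
double half_of_smallest_normal(void)
{
    volatile double x = DBL_MIN;   /* smallest positive normal double */
    volatile double y = x * 0.5;   /* underflows gradually */
    return y;
}

/*
 * Hypothetical software imitation of a nonstandard mode that treats
 * subnormal values as zeros. Assumption: the zero keeps the sign of v.
 */
double flush_to_zero(double v)
{
    if (fpclassify(v) == FP_SUBNORMAL)
        return (v < 0.0) ? -0.0 : 0.0;
    return v;
}
```

On a conforming implementation, half_of_smallest_normal() returns a positive value that fpclassify reports as FP_SUBNORMAL, and doubling it recovers DBL_MIN exactly. Passing the same value through flush_to_zero gives zero, which is the information a nonstandard mode discards in exchange for never taking the hardware trap.
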
This made a very few programs run extremely slowly, so those SPARC hardware implementations gained an alternative nonstandard mode in which all subnormal operands and results are treated as zeros, entirely in hardware. Current implementations of SuperSPARC and MicroSPARC do not require a nonstandard mode since subnormal operands and results are handled entirely in hardware with satisfactory performance compared to the normal case.

I mention SPARC and SunOS in the foregoing since I'm very familiar with those specifics, but most of the RISC Unix workstation vendors could tell similar stories.

The recent thread in comp.arch and comp.benchmarks relates to the initial ALPHA implementations and how they fail to fully conform to IEEE 754. The following is my understanding of the current situation; perhaps somebody from DEC will correct any misunderstandings:

1) Current ALPHA chips, like most RISC CPUs, do not handle subnormal operands and results, and cause a hardware trap to the kernel. However, presumably due to a rush to market, DEC's operating system kernels supporting ALPHA do not recompute the correct IEEE 754 subnormal result, providing zero instead; this is supposed to be fixed in future releases, and thus is not a permanent phenomenon. So standard IEEE 754 behavior with respect to subnormal operands and results is not currently available. Of course, from the point of view of DEC's migrating VAX customers, that doesn't matter since their codes have always operated without subnormal operands or results.

2) Current ALPHA chips, unlike many RISC CPUs, do not support precise user-mode traps on IEEE floating-point exceptions. Traps are available, which can be used when abort or long jump is an adequate exception response. But it may not be possible to determine precisely which instruction caused the trap, or to insert a substitute result and continue, and so these traps fall short of the IEEE 754-defined trapping capability.
However, there are, or could be, compiler options that cause synchronization instructions to be issued after each possibly trapping floating-point instruction. Then a user-mode SIGFPE handler would know that it could only be invoked because of an exception encountered in the immediately previous floating-point instruction. Obviously such compiler options cause the normal non-exceptional case to run more slowly.

Unlike item 1) above, however, imprecise floating-point traps seem more likely than not to prevail in future high-performance CPU designs.

What could be done instead? W. Kahan, the father of IEEE 754 arithmetic, believes that hardware facilities to support pre-substitution are essential: prior to a possibly exceptional instruction, one would specify the result to be used in case the (rare) exception arose, so that no asynchronous control flow would be necessary, and the common unexceptional case would proceed at full speed.

So far, however, I have been somewhat intimidated by the amount of hardware and software support required to fully implement such a facility, especially in light of the workstation customer tendency to make buying decisions on the basis of least-common-denominator hardware capabilities, in order to avoid vendor lock-in, and the PC hardware and software vendor tendency to provide broken floating-point hardware support, or none.

But perhaps this is not surprising, since Kahan defines an EXCEPTION as any situation in which no matter how you handle it by default, somebody will TAKE EXCEPTION to your decision!

--
David Hough
dgh@validgh.com
consultant on system correctness and performance evaluation and IEEE 754 binary floating-point arithmetic - send for business announcement