dc.contributor.author: Timmons, Nicholas
dc.date.accessioned: 2021-11-23T22:21:31Z
dc.date.available: 2021-11-23T22:21:31Z
dc.date.submitted: 2021-04-01
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/330997
dc.description.abstract: The four arithmetic floating-point operations (+, −, ÷ and ×) have been precisely specified in IEEE-754 since 1985. The situation for floating-point mathematical libraries, and even for some hardware operations such as fused multiply-add, is more nuanced: opinions vary on which standards should be followed, and on when some error is acceptable and when results must be correctly rounded. Deterministic, correctly-rounded elementary mathematical functions are important in many applications. Other applications tolerate some level of error and would benefit from less accurate, better-performing approximations. We found that, although IEEE-754 (2008 and 2019 only) specifies that ‘recommended functions’ such as sin, cos or log should be correctly rounded, the mathematical libraries available through standard interfaces in popular programming languages provide neither correct rounding nor maximally performing approximations. This is partly due to the differing accuracy requirements for these functions in the conflicting standards provided for some languages, such as C. This dissertation explores the current methods used to implement mathematical functions, shows the error present in them, and demonstrates methods to produce both low-cost correctly-rounded solutions and better approximations for specific use cases. We first explore the error within existing mathematical libraries and examine how it affects existing applications and the development of programming language standards.
We then make two contributions that address the accuracy and standards-conformance problems we found: 1) an approach for a correctly-rounded 32-bit implementation of the elementary functions with minimal additional performance cost on modern hardware; and 2) an approach for developing a better-performing, incorrectly-rounded solution for use when some error is acceptable and conformance with the IEEE-754 standard is not required. For the latter contribution, we introduce a tool for semi-automated, generic code sensitivity analysis and approximation. Next, we target the creation of approximations for the standard activation functions used in neural networks. Having identified that significant time is spent computing the activation functions, we generate approximations with different levels of error and better performance characteristics. These functions are then tested in standard neural networks to determine whether the approximations have any detrimental effect on the output of the network. We show that, for many networks and activation functions, very coarse approximations are suitable replacements that train the networks equally well at a lower overall time cost. This dissertation makes original contributions to the area of approximate computing. We demonstrate new approaches to safe approximation and justify approximate computation generally by showing that existing mathematical libraries already suffer the downsides of approximation and latent error without fully exploiting the optimisation space made available by their existing tolerance to that error, and by showing that correctly-rounded solutions are possible without a significant performance impact for many 32-bit mathematical functions.
dc.rights: All Rights Reserved
dc.rights.uri: https://www.rioxx.net/licenses/all-rights-reserved/
dc.subject: programming language
dc.subject: optimisation
dc.subject: approximation
dc.subject: mathematical functions
dc.subject: elementary functions
dc.title: Software-based approximate computing for mathematical functions
dc.type: Thesis
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: Doctor of Philosophy (PhD)
dc.publisher.institution: University of Cambridge
dc.identifier.doi: 10.17863/CAM.78442
rioxxterms.licenseref.uri: https://www.rioxx.net/licenses/all-rights-reserved/
dc.contributor.orcid: Timmons, Nicholas [0000-0002-3668-9810]
rioxxterms.type: Thesis
dc.publisher.college: Downing
dc.type.qualificationtitle: PhD in Computer Science
cam.supervisor: Rice, Andrew
cam.supervisor.orcid: Rice, Andrew [0000-0002-4677-8032]


Files in this item

There are no files associated with this item.
