## Intel Underestimates Error Bounds by 1.3 quintillion (2014)

Intel's manuals for their x86/x64 processors clearly state that the fsin instruction (calculating the trigonometric sine) has a maximum error, in round-to-nearest mode, of one unit in the last place…

Article word count: 2520

HN Discussion: https://news.ycombinator.com/item?id=17736357

Posted by sgillen (karma: 579). Post stats: Points: 141 - Comments: 26 - 2018-08-10T20:07:01Z

#HackerNews #2014 #bounds #error #intel #quintillion #underestimates

**Article content:**

**Catastrophic cancellation, enshrined in silicon**

Calculating sin starts with range reduction: subtracting the nearest multiple of pi (or pi/2) from the input. That subtraction requires a high-precision value of pi; here are the leading bits of its mantissa:

```
C90FDAA2 2168C234 C4C6628B 80DC1CD1 29024E08 8A67CC74
```

But Intel doesn't use a 128-bit approximation to pi. Their approximation has just 66 bits. Here it is, taken from their manuals:

```
C90FDAA2 2168C234 C
```

Therein lies the problem. When doing range reduction from double-precision (53-bit mantissa) pi the results will have about 13 bits of precision (66 minus 53), for an error of up to 2^40 ULPs (53 minus 13). The measured error is lower at approximately 164 billion ULPs (~37 bits), but that's still large. When doing range reduction from the 80-bit long-double format there will be almost complete cancellation and the worst-case error is [8]1.376 quintillion ULPs.

Oops.

**Actually calculating sin**

*[9]Not pi, just an approximation*

After range reduction the next step is to calculate sin. For numbers that are sufficiently close to pi this is trivial: the range-reduced number is the answer. Therefore, for double(pi) the error in the range reduction is the error in fsin, and fsin is inaccurate near pi, contradicting Intel's documentation. QED.

Note that sin(double(pi)) should not return zero. The sine of pi is, mathematically, zero, but since float/double/long-double can't represent pi it would actually be incorrect for sin() to return zero when passed an approximation to pi.

**Frustration**

It is surprising that fsin is so inaccurate. I could perhaps forgive it for being inaccurate for extremely large inputs (which it is) but it is hard to forgive it for being so inaccurate on pi which is, ultimately, a very 'normal' input to the sin() function. Ah, the age-old decisions that we're then stuck with for decades.

I also find Intel's documentation frustrating. In their most recent Volume 2: Instruction Set Reference they say that the range for fsin is -2^63 to +2^63. Then, in Volume 3: System Programming Guide, section 22.18.8 (Transcendental Instructions) they say "results will have a worst case error of less than 1 ulp when rounding to the nearest-even". They reiterate this in section 8.3.10 (Transcendental Instruction Accuracy) of Volume 1: Basic Architecture.

Section 8.3.8 (Pi) of the most recent Volume 1: Basic Architecture documentation from Intel discusses the 66-bit approximation for pi that the x87 CPU uses, and says that it has been "chosen to guarantee no loss of significance in a source operand", which is simply not true. This optimistic documentation has consequences. In 2001 an up-and-coming Linux kernel programmer [10]believed this documentation and used it to argue against adding an fsin wrapper function to glibc.

Similar problems apply to fcos near pi/2. See [11]this blog post for more details and pretty pictures.

Intel has known for years that these instructions are [12]not as accurate as promised. They are now [13]making updates to their documentation. Updating the instruction is not a realistic option.

I ran my tests on the 2.19 version of glibc (Ubuntu 14.04) and found that it no longer uses fsin. Clearly the glibc developers are aware of the problems with fsin and have stopped using it. That's good, because g++ sometimes [14]calculates sin at compile time, and the compile-time version is far more accurate than fsin. On glibc 2.15 this mismatch leads to some bizarre inconsistencies, and it can make [15]floating-point determinism challenging.

**Is it a bug?**

When Intel's fdiv instruction was [16]found to be flawed it was clearly a bug. The [17]IEEE-754 spec says exactly how fdiv should be calculated (perfectly rounded) but imposes no restrictions on fsin. Intel's implementation of fsin is clearly weak, but for compatibility reasons it probably can't be changed, so it can't really be considered a bug at this point.

However, the documentation is buggy. By claiming near-perfect accuracy across a huge domain and then only providing it in a tiny domain, Intel has misled developers and caused them to make poor decisions. [18]Scott Duplichan's research shows that at long-double precision fsin fails to meet its guarantees at inputs as small as 3.01626.

For double-precision the guarantees are not met for numbers as small as about 3.14157. Another way of looking at this is that there are tens of billions of doubles near pi where the double-precision result from fsin fails to meet Intel's precision guarantees.
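The "tens of billions" figure is easy to sanity-check. For positive doubles the IEEE-754 bit pattern increases monotonically with the value, so subtracting two bit patterns counts the doubles between them. A sketch (helper names mine):

```c
#include <stdint.h>
#include <string.h>

// Reinterpret a double's bits as a 64-bit integer. For positive doubles this
// integer is monotonic in the value, so differences count representable values.
uint64_t double_bits(double d) {
  uint64_t u;
  memcpy(&u, &d, sizeof(u));
  return u;
}

// How many doubles lie between 3.14157 and double(pi)?
uint64_t doubles_from_3_14157_to_pi(void) {
  return double_bits(3.14159265358979323846) - double_bits(3.14157);
}
```

This comes out around 5e10 doubles, consistent with "tens of billions".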

The absolute error in the range I was looking at was fairly constant at about 4e-21, which is quite small. For many purposes this will not matter. However the relative error can get quite large, and for the domains where that matters the misleading documentation can easily be a problem.

I'm not the first to discover this by any means, but many people are not aware of this quirk. While it is getting increasingly difficult to actually end up using the fsin instruction, it still shows up occasionally, especially when using older versions of glibc. And I think this behavior is interesting, so I'm sharing.

That's it. You can stop reading here if you want. The rest is just variations and details on the theme.

**How I discovered this**

I ran across this problem when I was planning to do a tweet version of my favorite pi/sin() identity:

```c
double pi_d = 3.14159265358979323846;
printf("pi = %.33f\n   + %.33f\n", pi_d, sin(pi_d));
```

I compiled this with g++, confident that it would work. But before sending the tweet I checked the results (I manually added the two rows of numbers) and was confused when I got the wrong answer: I was expecting 33 digits of pi and I only got 21.

I then printed the integer representation of sin(pi_d) in hex and it was 0x3ca1a60000000000. That many zeroes is very strange for a transcendental function.
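Printing the raw bit pattern of a double is a generally useful trick for spotting results like this. A sketch of how to do it (names mine):

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Get the raw IEEE-754 bit pattern of a double.
uint64_t double_to_bits(double d) {
  uint64_t u;
  memcpy(&u, &d, sizeof(u));
  return u;
}

void PrintSinPattern(void) {
  // With a correctly rounded sin() this prints 0x3CA1A62633145C07: the same
  // leading bits as the fsin result, but with the low bits fully populated
  // instead of the suspicious run of zeroes in 0x3CA1A60000000000.
  printf("0x%016llX\n",
         (unsigned long long)double_to_bits(sin(3.14159265358979323846)));
}
```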

At first I thought that the x87 rounding mode had been set to float for some reason, although in hindsight the result did not even have float accuracy. When I discovered that the code worked on my desktop Linux machine I assumed that the virtual machine I'd been using for testing must have a bug.

It took a bit of testing, experimentation, and stepping into various sin() functions to realize that the 'bug' was in the actual fsin instruction. It's hidden on 64-bit Linux (my desktop Linux machine is 64-bit) and with VC++ because they both have a hand-written sin() function that doesn't use fsin.

I experimented a bit with comparing sin() to fsin with VC++ to characterize the problem and eventually, with help from [19]Scott Duplichan's exposé on this topic, I recognized the problem as a symptom of catastrophic cancellation in the range reduction. I'm sure I would have uncovered the truth faster if Intel's documentation had not filled me with unjustified optimism.

Amusingly enough, if I'd followed my usual practice and declared pi_d as const then the code would have worked as expected and I would never have discovered the problem with fsin!

**Code!**

If you want to try this technique for calculating pi but you don't like manually adding 34-digit numbers then you can try this very crude code to do it for you:

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

void PiAddition() {
  double pi_d = 3.14159265358979323846;
  double sin_d = sin(pi_d);
  printf("pi = %.33f\n + %.33f\n", pi_d, sin_d);
  char pi_s[100];
  char sin_s[100];
  char result_s[100] = {};
  snprintf(pi_s, sizeof(pi_s), "%.33f", pi_d);
  snprintf(sin_s, sizeof(sin_s), "%.33f", sin_d);
  int carry = 0;
  // Add the two decimal strings digit by digit, right to left.
  for (int i = (int)strlen(pi_s) - 1; i >= 0; --i) {
    result_s[i] = pi_s[i];
    if (pi_s[i] != '.') {
      char d = pi_s[i] + sin_s[i] + carry - '0' * 2;
      carry = d > 9;
      result_s[i] = d % 10 + '0';
    }
  }
  printf(" = %s\n", result_s);
}
```

The code makes lots of assumptions (numbers aligned, both numbers positive) but it does demonstrate the technique quite nicely. Here are some results:

With g++ on 32-bit Linux with glibc 2.15:

```
pi = 3.141592653589793115997963468544185
   + 0.000000000000000122460635382237726
   = 3.141592653589793238458598850781911
```

With VC++ 2013 Update 3:

```
pi = 3.141592653589793100000000000000000
   + 0.000000000000000122464679914735320
   = 3.141592653589793222464679914735320
```

With VC++ Dev 14 CTP 3 and g++ on 64-bit Linux or glibc 2.19:

```
pi = 3.141592653589793115997963468544185
   + 0.000000000000000122464679914735321
   = 3.141592653589793238462643383279506
```

In other words, if the sixth digit of sin(double(pi)) is four then you have the correct value for sin(), but if it is zero then you have the incorrect fsin version.

**Compile-time versus run-time sin**

g++ tries to do as much math as possible at compile time, which further complicates this messy situation. If g++ detects that pi_d is constant then it will calculate sin(pi_d) at compile time, [20]using MPFR. In the sample code above, if pi_d is declared as 'const' then the results are quite different. This difference between compile-time and run-time sin() in g++ with glibc 2.15 can be seen with this code and its non-unity result:

```c
const double pi_d = 3.14159265358979323846;
const int zero = argc / 99;
printf("%f\n", sin(pi_d + zero) / sin(pi_d));
```

```
0.999967
```

Or, this equally awesome code that also prints 0.999967:

```c
const double pi_d = 3.14159265358979323846;
double pi_d2 = pi_d;
printf("%f\n", sin(pi_d2) / sin(pi_d));
```

It's almost a shame that these goofy results are going away in glibc 2.19.

**Sine ~zero == ~zero**

Some years ago I was optimizing all of the trigonometric functions for an embedded system. I was making them all run about five times faster and my goal was to get identical results. However I had to abandon that goal when I realized that many of the functions, including sin(), gave incorrect answers on denormal inputs. Knowing that the sine of a denormal is the denormal helped me realize that the original versions were wrong, and I changed my goal to be identical results except when the old results were wrong.

**66 == 68**

Intel describes their internal value for pi as having 66 bits, but it arguably has 68 bits because the next three bits are all zero, so their internal 66-bit value is a correctly rounded 68-bit approximation of pi. That's why some of the results for fsin are slightly more accurate than you might initially expect.
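This can be sanity-checked against the 192 bits of pi quoted at the top of the article. Intel's value is the first 66 of those bits, and the three bits of pi that follow sit in the nibbles 0xC and 0x4 of "C4C6628B". A small check (constants copied from those hex digits):

```c
// Intel's value is C90FDAA2 2168C234 (64 bits) plus the top two bits, "11",
// of pi's next nibble 0xC (binary 1100). Pi then continues ...C4C6628B..., so
// the three bits after bit 66 are "00" (the rest of 0xC) and "0" (the top
// bit of 0x4): all zero, making the 66-bit value a correctly rounded 68-bit pi.
int next_three_pi_bits_are_zero(void) {
  const unsigned nibble17 = 0xC;  // bits 65-68 of pi's mantissa
  const unsigned nibble18 = 0x4;  // bits 69-72 of pi's mantissa
  return (nibble17 & 0x3) == 0 && (nibble18 & 0x8) == 0;
}
```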

**Sin() is trivial (for almost half of input values)**

For small enough doubles, the best answer for sin(x) is x. That is, for numbers smaller than about 1.49e-8, the closest double to the sine of x is actually x itself. I learned this from the [21]glibc source code for sin(). Since half of all doubles have a magnitude below 1.0, and since doubles have enormous range, it turns out that those numbers with magnitudes below 1.49e-8 represent about 48.6% of all doubles. In the glibc sin() source code it tests for the top half of the integer representation of abs(x) being less than 0x3e500000, roughly equivalent to <= 1.49e-8, and that represents 48.6% of the entire range for positive doubles (0x80000000). The 'missing' 1.4% is [22]the numbers between 1.49e-8 and 1.

So, if asked to write a sin() function I will volunteer to do the easy (almost) half and leave the rest for somebody else.

**References**

Details of discrepancies in several of these instructions: [23]http://notabs.org/fpuaccuracy/index.htm

The Inquirer mocking Intel: [24]http://www.theinquirer.net/inquirer/news/1023715/pentium-10-years-old-real-soon-now

Intel acknowledging that using a software implementation of sin()/cos() might be wise: [25]https://software.intel.com/en-us/forums/topic/289702

Linus Torvalds believing the documentation: [26]http://gcc.gnu.org/ml/gcc/2001-08/msg00679.html

The Intel documentation has caused confusion, seen in this pair of bugs: [27]https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43490 and [28]https://sourceware.org/bugzilla/show_bug.cgi?id=11422

100,000 digits of pi: [29]http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html

Range reduction done right: [30]http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf

gcc 4.3 improvements, including using MPFR for transcendental functions: [31]http://gcc.gnu.org/gcc-4.3/changes.html

Intel's blog post discussing fixes to fsin documentation: [32]https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software

Java bug report that covers this: [33]http://bugs.java.com/view_bug.do?bug_id=4306749

Nice explanation of another extreme fsin error: [34]http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/cl54afb

Three reddit discussions of this blog post: [35]programming, [36]math, and [37]hardware

Hacker News discussion: [38]https://news.ycombinator.com/item?id=8434128

Slashdot discussion: [39]here

Calculating pi [40]using sin() in a Google search ([41]seen on reddit)

Calculating 60+ digits of pi using sin() in a [42]Wolfram Alpha search

Visible links

 1. https://randomascii.files.wordpress.com/2014/10/image.png
 2. http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
 3. https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software
 4. http://blogs.msdn.com/b/vcblog/archive/2014/06/18/crt-features-fixes-and-breaking-changes-in-visual-studio-14-ctp1.aspx
 5. http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf
 6. https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
 7. http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
 8. http://notabs.org/fpuaccuracy/fpu-examples.htm
 9. https://randomascii.files.wordpress.com/2014/10/image1.png
10. http://gcc.gnu.org/ml/gcc/2001-08/msg00679.html
11. http://notabs.org/fpuaccuracy/index.htm
12. https://software.intel.com/en-us/forums/topic/289702
13. https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software
14. http://gcc.gnu.org/gcc-4.3/changes.html
15. https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/
16. http://en.wikipedia.org/wiki/Pentium_FDIV_bug
17. http://en.wikipedia.org/wiki/IEEE_floating_point
18. http://notabs.org/fpuaccuracy/fpu-examples.htm
19. http://notabs.org/fpuaccuracy/index.htm
20. http://gcc.gnu.org/gcc-4.3/changes.html
21. https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281
22. https://twitter.com/jcalsbeek/status/522614184194760704
23. http://notabs.org/fpuaccuracy/index.htm
24. http://www.theinquirer.net/inquirer/news/1023715/pentium-10-years-old-real-soon-now
25. https://software.intel.com/en-us/forums/topic/289702
26. http://gcc.gnu.org/ml/gcc/2001-08/msg00679.html
27. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43490
28. https://sourceware.org/bugzilla/show_bug.cgi?id=11422
29. http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
30. http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf
31. http://gcc.gnu.org/gcc-4.3/changes.html
32. https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software
33. http://bugs.java.com/view_bug.do?bug_id=4306749
34. http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/cl54afb
35. http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/
36. http://www.reddit.com/r/math/comments/2ishlb/intel_underestimates_error_bounds_by_13/
37. http://www.reddit.com/r/hardware/comments/2iyou6/intel_underestimates_processor_error_bounds_by_13/
38. https://news.ycombinator.com/item?id=8434128
39. http://hardware.slashdot.org/story/14/10/10/193217/where-intel-processors-fail-at-math-again
40. https://www.google.com/search?q=3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%29&oq=3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%29
41. http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/cl54x74
42. http://www.wolframalpha.com/input/?i=3%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29+%2B+sin%283%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29%29%2Bsin%283%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29+%2B+sin%283%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29%29%29

Intel\u2019s manuals for their x86\/x64 processor clearly state that the fsin instruction (calculating the trigonometric sine) has a maximum error, in round-to-nearest mode, of one unit in the last place\u2026

\nArticle word count: 2520 <\/p>\n\nHN Discussion: https:\/\/news.ycombinator.com\/item?id=17736357<\/a>

\nPosted by sgillen<\/a> (karma: 579)Post stats: Points: 141 - Comments: 26 - 2018-08-10T20:07:01Z<\/em> <\/p>\n\n

#HackerNews<\/a> #2014<\/a> #bounds<\/a> #error<\/a> #intel<\/a> #quintillion<\/a> #underestimates<\/a> <\/p>\n\n

Article content:<\/strong> <\/p>\n\n

Catastrophic cancellation, enshrined in silicon<\/p>\n\n`C90FDAA2 2168C234 C4C6628B 80DC1CD1 29024E08 8A67CC74\n<\/code><\/pre>\n\n`

But Intel doesn\u2019t use a 128-bit approximation to pi. Their approximation has just 66 bits. Here it is, taken from their manuals:<\/p>\n\n

`C90FDAA2 2168C234 C\n<\/code><\/pre>\n\n`

Therein lies the problem. When doing range reduction from double-precision (53-bit mantissa) pi the results will have about 13 bits of precision (66 minus 53), for an error of up to 2^40 ULPs (53 minus 13). The measured error is lower at approximately 164 billion ULPs (~37 bits), but that\u2019s still large. When doing range reduction from the 80-bit long-double format there will be almost complete cancellation and the worst-case error is [8]1.376 quintillion ULPs.<\/p>\n\n

Oops.<\/p>\n\n

Actually calculating sin<\/p>\n\n

[9]Not pi - just an approximationAfter range reduction the next step is to calculate sin. For numbers that are sufficiently close to pi this is trivial \u2013 the range-reduced number is the answer. Therefore, for double(pi) the error in the range reduction is the error in fsin, and fsin is inaccurate near pi, contradicting Intel\u2019s documentation. QED.<\/p>\n\n

Note that sin(double(pi)) should not return zero. The sine of pi is, mathematically, zero, but since float\/double\/long-double can\u2019t represent pi it would actually be incorrect for sin() to return zero when passed an approximation to pi.<\/p>\n\n

Frustration<\/p>\n\n

It is surprising that fsin is so inaccurate. I could perhaps forgive it for being inaccurate for extremely large inputs (which it is) but it is hard to forgive it for being so inaccurate on pi which is, ultimately, a very \u2018normal\u2019 input to the sin() function. Ah, the age-old decisions that we\u2019re then stuck with for decades.<\/p>\n\n

I also find Intel\u2019s documentation frustrating. In their most recent Volume 2: Instruction Set Reference they say that the range for fsin is -2^63 to +2^63. Then, in Volume 3: System Programming Guide, section 22.18.8 (Transcendental Instructions) they say \u201cresults will have a worst case error of less than 1 ulp when rounding to the nearest-even\u201d. They reiterate this in section 8.3.10 (Transcendental Instruction Accuracy) of Volume 1: Basic Architecture.<\/p>\n\n

Section 8.3.8 (Pi) of the most recent Volume 1: Basic Architecture documentation from Intel discusses the 66-bit approximation for pi that the x87 CPU uses, and says that it has been \u201cchosen to guarantee no loss of significance in a source operand\u201d, which is simply not true. This optimistic documentation has consequences. In 2001 an up-and-coming Linux kernel programmer [10]believed this documentation and used it to argue against adding an fsin wrapper function to glibc.<\/p>\n\n

Similar problems apply to fcos near pi\/2. See [11]this blog post for more details and pretty pictures.<\/p>\n\n

Intel has known for years that these instructions are [12]not as accurate as promised. They are now [13]making updates to their documentation. Updating the instruction is not a realistic option.<\/p>\n\n

I ran my tests on the 2.19 version of glibc (Ubuntu 14.04) and found that it no longer uses fsin. Clearly the glibc developers are aware of the problems with fsin and have stopped using it. That\u2019s good because g++ sometimes [14]calculates sin at compile-time, and the compile-time version is far more accurate than fsin, which makes for some bizarre inconsistencies when using glibc 2.15, which can make [15]floating-point determinism challenging.<\/p>\n\n

Is it a bug?<\/p>\n\n

When Intel\u2019s fdiv instruction was [16]found to be flawed it was clearly a bug. The [17]IEEE-754 spec says exactly how fdiv should be calculated (perfectly rounded) but imposes no restrictions on fsin. Intel\u2019s implementation of fsin is clearly weak, but for compatibility reasons it probably can\u2019t be changed so it can\u2019t really be considered a bug at this point.<\/p>\n\n

However the documentation is buggy. By claiming near-perfect accuracy across a huge domain and then only providing it in a tiny domain Intel has misled developers and caused them to make poor decisions. [18]Scott Duplichan\u2019s research shows that at long-double precision the fsin fails to meet its guarantees at numbers as small as 3.01626.<\/p>\n\n

For double-precision the guarantees are not met for numbers as small as about 3.14157. Another way of looking at this is that there are tens of billions of doubles near pi where the double-precision result from fsin fails to meet Intel\u2019s precision guarantees.<\/p>\n\n

The absolute error in the range I was looking at was fairly constant at about 4e-21, which is quite small. For many purposes this will not matter. However the relative error can get quite large, and for the domains where that matters the misleading documentation can easily be a problem.<\/p>\n\n

I\u2019m not the first to discover this by any means, but many people are not aware of this quirk. While it is getting increasingly difficult to actually end up using the fsin instruction, it still shows up occasionally, especially when using older versions of glibc. And, I think this behavior is interesting, so I\u2019m sharing.<\/p>\n\n

That\u2019s it. You can stop reading here if you want. The rest is just variations and details on the theme.<\/p>\n\n

How I discovered this<\/p>\n\n

I ran across this problem when I was planning to do a tweet version of my favorite pi\/sin() identity<\/p>\n\n

`double pi_d = 3.14159265358979323846;\n printf(\u201cpi = %.33f\\n\u00a0\u00a0 + %.33f\\n\u201d, pi_d, sin(pi_d));\n<\/code><\/pre>\n\n`

I compiled this with g++, confident that it would work. But before sending the tweet I checked the results \u2013 I manually added the two rows of numbers \u2013 and I was confused when I got the wrong answer \u2013 I was expecting 33 digits of pi and I only got 21.<\/p>\n\n

I then printed the integer representation of sin(pi_d) in hex and it was 0x3ca1a60000000000. That many zeroes is very strange for a transcendental function.<\/p>\n\n

At first I thought that the x87 rounding mode had been set to float for some reason, although in hindsight the result did not even have float accuracy. When I discovered that the code worked on my desktop Linux machine I assumed that the virtual machine I\u2019d been using for testing must have a bug.<\/p>\n\n

It took a bit of testing and experimentation and stepping into various sin() functions to realize that the \u2018bug\u2019 was in the actual fsin instruction. It\u2019s hidden on 64-bit Linux (my desktop Linux machine is 64-bit) and with VC++ because they both have a hand-written sin() function that doesn\u2019t use fsin.<\/p>\n\n

I experimented a bit with comparing sin() to fsin with VC++ to characterize the problem and eventually, with help from [19]Scott Duplichan\u2019s expose on this topic, I recognized the problem as being a symptom of catastrophic cancellation in the range reduction. I\u2019m sure I would have uncovered the truth faster if Intel\u2019s documentation had not filled me with unjustified optimism.<\/p>\n\n

Amusingly enough, if I\u2019d followed my usual practice and declared pi_d as const then the code would have worked as expected and I would never have discovered the problem with fsin!<\/p>\n\n

Code!<\/p>\n\n

If you want to try this technique for calculating pi but you don\u2019t like manually adding 34-digit numbers then you can try this very crude code to do it for you:<\/p>\n\n

`void PiAddition() { \u00a0\u00a0double pi_d = 3.14159265358979323846; \u00a0\u00a0double sin_d = sin(pi_d);\n\n \u00a0\u00a0printf(\u201cpi = %.33f\\n + %.33f\\n\u201d, pi_d, sin_d);\n\n \u00a0\u00a0char pi_s[100]; char sin_s[100]; char result_s[100] = {}; \u00a0\u00a0snprintf(pi_s, sizeof(pi_s), \u201c%.33f\u201d, pi_d); \u00a0\u00a0snprintf(sin_s, sizeof(sin_s), \u201c%.33f\u201d, sin_d); \u00a0\u00a0int carry = 0; \u00a0\u00a0for (int i = strlen(pi_s) \u2013 1; i >= 0; \u2013i) { \u00a0\u00a0\u00a0\u00a0result_s[i] = pi_s[i]; \u00a0\u00a0\u00a0\u00a0if (pi_s[i] != \u2018.\u2019) { \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0char d = pi_s[i] + sin_s[i] + carry \u2013 \u20180\u2019 * 2; \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0carry = d > 9; \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0result_s[i] = d % 10 + \u20180\u2019; \u00a0\u00a0\u00a0\u00a0} \u00a0\u00a0} \u00a0\u00a0printf(\u201d = %s\\n\u201d, result_s);\n\n }\n<\/code><\/pre>\n\n`

The code makes lots of assumptions (numbers aligned, both numbers positive) but it does demonstrate the technique quite nicely. Here are some results:<\/p>\n\n

With g++ on 32-bit Linux with glibc 2.15:<\/p>\n\n

`pi = 3.141592653589793115997963468544185\n \u00a0\u00a0\u00a0+ 0.000000000000000122460635382237726\n \u00a0\u00a0\u00a0= 3.141592653589793238458598850781911\n<\/code><\/pre>\n\n`

With VC++ 2013 Update 3:<\/p>\n\n

`pi = 3.141592653589793100000000000000000 \u00a0\u00a0\u00a0+ 0.000000000000000122464679914735320\n\n \u00a0\u00a0\u00a0= 3.141592653589793222464679914735320\n<\/code><\/pre>\n\n`

With VC++ Dev 14 CTP 3 and g++ on 64-bit Linux or glibc 2.19:<\/p>\n\n

`pi = 3.141592653589793115997963468544185\n \u00a0\u00a0\u00a0+ 0.000000000000000122464679914735321\n \u00a0\u00a0\u00a0= 3.141592653589793238462643383279506\n<\/code><\/pre>\n\n`

In other words, if the sixth digit of sin(double(pi)) is four then you have the correct value for sin(), but if it is zero then you have the incorrect fsin version.<\/p>\n\n

Compile-time versus run-time sin<\/p>\n\n

g++ tries to do as much math as possible at compile time, which further complicates this messy situation. If g++ detects that pi_d is constant then it will calculate sin(pi_d) at compile time, [20]using MPFR. In the sample code above if pi_d is declared as \u2018const\u2019 then the results are quite different. This difference between compile-time and run-time sin() in g++ with glibc 2.15 can be seen with this code and its non-unity result:<\/p>\n\n

`const double pi_d = 3.14159265358979323846; const int zero = argc \/ 99;\n\n printf(\u201c%f\\n\u201d, sin(pi_d + zero) \/ sin(pi_d));\n\n 0.999967\n<\/code><\/pre>\n\n`

Or, this equally awesome code that also prints 0.999967:<\/p>\n\n

`const double pi_d = 3.14159265358979323846; double pi_d2 = pi_d;\n\n printf(\u201c%f\\n\u201d, sin(pi_d2) \/ sin(pi_d));\n<\/code><\/pre>\n\n`

It\u2019s almost a shame that these goofy results are going away in glibc 2.19.<\/p>\n\n

Sine ~zero == ~zero<\/p>\n\n

Some years ago I was optimizing all of the trigonometric functions for an embedded system. I was making them all run about five times faster and my goal was to get identical results. However I had to abandon that goal when I realized that many of the functions, including sin(), gave incorrect answers on denormal inputs. Knowing that the sine of a denormal is the denormal helped me realized that the original versions were wrong, and I changed my goal to be identical results except when the old results were wrong.<\/p>\n\n

66 == 68<\/p>\n\n

Intel describes their internal value for pi as having 66 bits, but it arguably has 68 bits because the next three bits are all zero, so their internal 66-bit value is a correctly rounded 68-bit approximation of pi. That\u2019s why some of the results for fsin are slightly more accurate than you might initially expect.<\/p>\n\n

Sin() is trivial (for almost half of input values)<\/p>\n\n

For small enough doubles, the best answer for sin(x) is x. That is, for numbers smaller than about 1.49e-8, the closest double to the sine of x is actually x itself. I learned this from the [21]glibc source code for sin(). Since half of all doubles have a magnitude below 1.0, and since doubles have enormous range, it turns out that those numbers with magnitudes below 1.49e-8 represent about 48.6% of all doubles. In the glibc sin() source code it tests for the top-half of the integer representation of abs(x) being less than 0x3e500000, roughly equivalent to <= 1.49e-8, and that represents 48.6% of the entire range for positive doubles (0x80000000). The \u2018missing\u2019 1.4% is [22]the numbers between 1.49e-8 and 1.<\/p>\n\n

So, if asked to write a sin() function I will volunteer to do the easy (almost) half and leave the rest for somebody else.<\/p>\n\n

References<\/p>\n\n

Details of discrepancies in several of these instructions: [23]http://notabs.org/fpuaccuracy/index.htm

The Inquirer mocking Intel: [24]http://www.theinquirer.net/inquirer/news/1023715/pentium-10-years-old-real-soon-now

Intel acknowledging that using a software implementation of sin()/cos() might be wise: [25]https://software.intel.com/en-us/forums/topic/289702

Linus Torvalds believing the documentation: [26]http://gcc.gnu.org/ml/gcc/2001-08/msg00679.html

The Intel documentation has caused confusion, seen in this pair of bugs: [27]https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43490 and [28]https://sourceware.org/bugzilla/show_bug.cgi?id=11422

100,000 digits of pi: [29]http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html

Range reduction done right: [30]http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf

gcc 4.3 improvements, including using MPFR for transcendental functions: [31]http://gcc.gnu.org/gcc-4.3/changes.html

Intel's blog post discussing fixes to the fsin documentation: [32]https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software

Java bug report that covers this: [33]http://bugs.java.com/view_bug.do?bug_id=4306749

Nice explanation of another extreme fsin error: [34]http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/cl54afb

Three reddit discussions of this blog post: [35]programming, [36]math, and [37]hardware

Hacker News discussion: [38]https://news.ycombinator.com/item?id=8434128

Slashdot discussion: [39]here

Calculating pi [40]using sin() in a Google search ([41]seen on reddit)

Calculating 60+ digits of pi using sin() in a [42]Wolfram Alpha search

Visible links

 1. https://randomascii.files.wordpress.com/2014/10/image.png
 2. http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
 3. https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software
 4. http://blogs.msdn.com/b/vcblog/archive/2014/06/18/crt-features-fixes-and-breaking-changes-in-visual-studio-14-ctp1.aspx
 5. http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf
 6. https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
 7. http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
 8. http://notabs.org/fpuaccuracy/fpu-examples.htm
 9. https://randomascii.files.wordpress.com/2014/10/image1.png
 10. http://gcc.gnu.org/ml/gcc/2001-08/msg00679.html
 11. http://notabs.org/fpuaccuracy/index.htm
 12. https://software.intel.com/en-us/forums/topic/289702
 13. https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software
 14. http://gcc.gnu.org/gcc-4.3/changes.html
 15. https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/
 16. http://en.wikipedia.org/wiki/Pentium_FDIV_bug
 17. http://en.wikipedia.org/wiki/IEEE_floating_point
 18. http://notabs.org/fpuaccuracy/fpu-examples.htm
 19. http://notabs.org/fpuaccuracy/index.htm
 20. http://gcc.gnu.org/gcc-4.3/changes.html
 21. https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281
 22. https://twitter.com/jcalsbeek/status/522614184194760704
 23. http://notabs.org/fpuaccuracy/index.htm
 24. http://www.theinquirer.net/inquirer/news/1023715/pentium-10-years-old-real-soon-now
 25. https://software.intel.com/en-us/forums/topic/289702
 26. http://gcc.gnu.org/ml/gcc/2001-08/msg00679.html
 27. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43490
 28. https://sourceware.org/bugzilla/show_bug.cgi?id=11422
 29. http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
 30. http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf
 31. http://gcc.gnu.org/gcc-4.3/changes.html
 32. https://software.intel.com/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software
 33. http://bugs.java.com/view_bug.do?bug_id=4306749
 34. http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/cl54afb
 35. http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/
 36. http://www.reddit.com/r/math/comments/2ishlb/intel_underestimates_error_bounds_by_13/
 37. http://www.reddit.com/r/hardware/comments/2iyou6/intel_underestimates_processor_error_bounds_by_13/
 38. https://news.ycombinator.com/item?id=8434128
 39. http://hardware.slashdot.org/story/14/10/10/193217/where-intel-processors-fail-at-math-again
 40. https://www.google.com/search?q=3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%29&oq=3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%2B+sin%28+3+%2B+sin%28+3+%29+%29+%29
 41. http://www.reddit.com/r/programming/comments/2is4nj/intel_underestimates_error_bounds_by_13/cl54x74
 42. http://www.wolframalpha.com/input/?i=3%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29+%2B+sin%283%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29%29%2Bsin%283%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29+%2B+sin%283%2Bsin%283%29+%2B+sin%283%2Bsin%283%29%29%29%29