Better function approximations: Taylor vs. Remez
You may have once crossed this particular piece of magic:
The right part is the Taylor series of sin around 0. It converges very quickly to the actual value of sin(a). This allows a computer to compute the sine of a number with arbitrary precision.
And when I say it’s magic, it’s because it is! Some functions, called the entire functions, can be computed everywhere using one single formula! Other functions may require a different formula for different intervals; they are the analytic functions, a superset of the entire functions. In general, Taylor series are an extremely powerful tool to compute the value of a given function with very high accuracy, because for several common functions such as sin, tan or exp the terms of the series are easy to compute and, when implemented on a computer, can even be stored in a table at compile time.
Approximating sin with Taylor series
This is how one would approximate sin using 7 terms of its Taylor series on the [-π/2, π/2] interval. The more terms, the better the precision, but we’ll stop at 7 for now:
static double taylorsin(double x) { static const double a0 = 1.0, a1 = -1.666666666666666666666666666666e-1, /* -1/3! */ a2 = 8.333333333333333333333333333333e-3, /* 1/5! */ a3 = -1.984126984126984126984126984126e-4, /* -1/7! */ a4 = 2.755731922398589065255731922398e-6, /* 1/9! */ a5 = -2.505210838544171877505210838544e-8, /* -1/11! */ a6 = 1.605904383682161459939237717015e-10; /* 1/13! */ double x2 = x * x; return x * (a0 + x2 * (a1 + x2 * (a2 + x2 * (a3 + x2 * (a4 + x2 * (a5 + x2 * a6)))))); }
And you may think…
“Oh wow that is awesome! So simple for such a difficult function. Also, since I read your masterpiece about polynomial evaluation I know how to improve that function so that it is very fast!”
Well, actually, no.
If you are approximating a function over an interval using its Taylor series then either you or the person who taught you is a fucking idiot because a Taylor series approximates a function near a fucking point, not over a fucking interval, and if you don’t understand why it’s important then please read on because that shit is gonna blow your mind.
Error measurement
Let’s have a look at how much error our approximation introduces. The formula for the absolute error is simple:
And this is how it looks like over our interval:
You can see that the error skyrockets near the edges of the [-π/2, π/2] interval.
“Well the usual way to fix this is to split the interval in two or more parts, and use a different Taylor series for each interval.”
Oh, really? Well, let’s see the error on [-π/4, π/4] instead:
I see no difference! The error is indeed smaller, but again, it becomes extremely large at the edges of the interval. And before you start suggesting reducing the interval even more, here is the error on [-π/8, π/8] now:
I hope this makes it clear that:
- the further from the centre of the interval, the larger the error
- the error distribution is very unbalanced
- the maximum error on [-π/2, π/2] is about 6.63e-10
And now I am going to show you why that maximum error value is pathetic.
A better approximation
Consider this new function:
static double minimaxsin(double x) { static const double a0 = 1.0, a1 = -1.666666666640169148537065260055e-1, a2 = 8.333333316490113523036717102793e-3, a3 = -1.984126600659171392655484413285e-4, a4 = 2.755690114917374804474016589137e-6, a5 = -2.502845227292692953118686710787e-8, a6 = 1.538730635926417598443354215485e-10; double x2 = x * x; return x * (a0 + x2 * (a1 + x2 * (a2 + x2 * (a3 + x2 * (a4 + x2 * (a5 + x2 * a6)))))); }
It doesn’t look very different, right? Right. The values a0 to a6 are slightly different, but the rest of the code is strictly the same.
Yet what a difference it makes! Look at this error curve:
That new function makes it obvious that:
- the error distribution is better spread over the interval
- the maximum error on [-π/2, π/2] is about 4.96e-14
Check that last figure again. The new maximum error isn’t 10% better, or maybe twice as good. It is more than ten thousand times smaller!!
The minimax polynomial
The above coefficients describe a minimax polynomial: that is, the polynomial that minimises a given error when approximating a given function. I will not go into the mathematical details, but just remember this: if the function is sufficiently well-suited (as sin, tan, exp etc. are), then the minimax polynomial can be found.
The problem? It’s hard to find. The most popular algorithm to find it is the Remez exchange algorithm, and few people really seem to understand how it works (or there would be a lot less Taylor series). I am not going to explain it right now. Usually you need professional math tools such as Maple or Mathematica if you want to compute a minimax polynomial. The Boost library is a notable exception, though.
But you saw the results, so stop using Taylor series. Spending some time finding the minimax polynomial is definitely worth it. This is why I am working on a Remez framework that I will make public and free for everyone to use, modify and do what the fuck they want. In the meantime, if you have functions to numerically approximate, or Taylor-based implementations that you would like to improve, let me know in the comments! This will be great use cases for me.
Attachments (4)
- error-pi-2.png (10.0 KB) - added by sam 5 years ago.
- error-pi-4.png (7.3 KB) - added by sam 5 years ago.
- error-pi-8.png (7.3 KB) - added by sam 5 years ago.
- remez-error.png (25.2 KB) - added by sam 5 years ago.
Download all attachments as: .zip
Comments
Hi, Sam,
I searched high and low for high precision (not actually arbitrary, 32 digits would be enough for me now) approximation coeffs to the usual bunch of functions (trigonometric, hyperbolic and their inverses) and I found your blog after I have read the rest of the whole Net... :-) I don't have Mathematica or Maple (SciLab yes but it doesn't seem to have support for this application of Remez). I could compile Boost if you tell me there is no other alternative but do you tell me? :-) I don't think your framework is mature enough to use, isn't it?
Bye,
Hi, Sam,
I visited a friend with a Matlab and, dear me, it can't do more than binary double, amounting to some 16 digits. Sheesh...
Bye,
@Gábor: Well you will be happy to learn that I made a first release of LolRemez. I hope it can prove helpful!
Definitely, thanks, very interested, I'll check it out first thing in the next year... :-D
I'm also interested in the real package you use there. Right now I use 128-bit decimal floats (based on the DecNumber code) and I'm quite satisfied with it (actually, this is not gaming, speed is not of the utmost importance here, but it seems to be rather optimized maybe even for that). Functions are mostly Taylor now, because I needed to address other issues more urgently but the tangent family, as obvious, is very poor. I already started to calculate some simpler Chebyshev expansions but now I'll make an effort to use your Remez solution.
With the order above 20, I started to get real printouts like 10.000000000000000000000-1-3-4-2-8-5-6-2-6-6-2-1-9-6-6-50-9e-1. Now I modify my real package to work with your Remez implementation. And I don't really like the >> operator, by the way, I don't think it's elegant to have that with a real... :-D
Bye, Gábor
@Gábor: thanks for the feedback. The printout issues were a bug that is fixed in changelist [1121]. I agree about the >> operator; I removed it in changelist [1122] and made the integer division and multiplication detect when the operand is a power of two.
These changes will be in LolRemez 0.2 but you can probably apply them manually if you need them and don’t want to wait.
Yes, it started to behave very nicely now. If only the same could be said of inverse trigonometric functions... Those pesky near-one values ruin everything, as usual... :-)
@Gábor: usually inverse trigonometric functions use the following identity:
The downside is that you need to compute a square root. The upside is that the arcsin polynomial coefficients don't change, so there is little memory access overhead.
@Sam, I only want to get arctan now, the other two can be calculated then using the half angle formula. And I also want to experiment with the Euler arctan approximation formula now. With factorials and powers it seems to offer better convergence. I didn't have time to work on it yesterday but I'll check it out now.
Actually, even with simple sin and cos, in this precision range, Remez is not that much better than the naive Taylor. I could reach my machine epsilon with a 25th order Remez while naive Taylor requires some 30-35 iterations. Still some gain, of course, but not *that* spectacular. :-) But I do stress the precision range, the relative gain is nicer if you only use, say, double binary FP. But sin and cos behave themselves, of course, not like their inverse functions.
I have all other functions in the decNumber package, both logs, exp, pow, sqrt, so there is no problem there.
@Sam, actually, I solved everything based on Euler for atan, thanks, it works just fine for my purposes now.
@Gábor: I am sorry but I realised that an accuracy of 1e-18 was hardcoded in LolRemez 0.1. Usually this isn’t a problem because people target float or double. But if your machine epsilon is smaller, I recommend you use LolRemez 0.2 instead. Also, convergence is now significantly faster. Sorry for the inconvenience!
By the way, can I quote you on “I could reach my machine epsilon with a 25th order Remez while naive Taylor requires some 30-35 iterations. Still some gain, of course, but not *that* spectacular.”? I find this quite funny :-)
@Sam. Don't. :-)) I wouldn't be that liberal with it if I was writing games or any other piece of software that needs to call it repeatedly, of course. But in this very particular case, it is calculated once per user input and, frankly, 5 ms versus 10 ms isn't a difference you'd really notice in this case. So, I only wanted to stress that unlike with sin/cos, doing something to avoid Taylor with atan was really indispensable because even 10,000 iterations (for a macheps of 1e-34) couldn't make it. That's still a bit larger for a margin, isn't it? :-)
But, actually, 1e-34 is crazy, of course. No real need for it, either, just pure marketing... :-)
Hey Sam, you're a lifesaver! I've been searching the net for function approximation software for two days now - a lot a paper and a lot of talk - but not a lot of people seem to actually have the guts to do it! Thanks! I'm actually just using your code just to generate coefficients for waveshaping functions for real-time audio (atm) where approximation is a godsend since 4x Oversampling is used in general. Even with the odd bug here and there - it's still functional - and this is coming from a math and programming noob who has only started learning C++ two weeks ago. And yeah - The Taylor Series sucks balls!
Regards Andrew Ainslie South Africa
Found your website through the kvraudio developer forum. Heya, I tried lolremez0.2 and tanh(x) with 8 coefficients error 1.5266e-4 then I tried tanh's identity sinh(x)/cosh(x)- error 8.1442e-5...
Random programming thing - how do I add numbers to the formula? i.e. sin(2*x)? Getting compile errors here if I add numbers - probably missing something obvious...
@Andrew: I think not everything is implemented for mixed real/float operations, but if 2*x doesn't work, x*2 will probably.
Until I have implemented every operator, you can just cast the number to real: sin(real(2) * x). Also check out the bottom of real.h, it has predefined values for π, e, etc.
I'll check your tanh result to see what may cause the difference.
Hey Sam! Thanks for the help: x*2 does work.
Haven't tried casting the number yet. Also did a little experiment: tried the Gudermannian function two ways:
R_PI_2*(atan(tanh((x*R_2_SQRTPI)/2))) - this didn't work
then I tried:
(atan(tanh((x*1.1283791)/2)))*1.5707963 - this did work
Inanyway - thanks for the great software! It has already saved me a great deal of cycles. Looking forward to future development.
Regards Andrew
@Andrew: real is a class, not a namespace. In order to use R_PI_2 and the like, you need to write real::R_PI_2.
Thanks for your great software! I wanted to replace some Taylor-polynomials in my code for a long time, but always was too lazy to implement a Remez algorithm myself ... Your code really works like a charm, very easy to approximate own functions.
Anyways I have some small suggestions:
Make the output routine independent from the calculation routine and maybe make it possible to get the coefficients manually. I had to adapt your code, to print output as "{ a0, a1, ...}" to directly copy'n'paste it to C++.
Furthermore, your program might be slightly faster, if you don't calculate the inverse of your matrix explicitly, but do matrix-decomposition and solving (not really critical, as no-one cares if it runs in 10 or 15 secs ...)
Also what might be nice is support for rational approximations as well. If one needs to approximate atan(y/x) there is a division anyways, so rational polynomials might be better.
@ChriSopht: thanks for the suggestions. I'll make it easier to gather and format the coefficients in the next release.
Also, I think it's easy to understand the reason why I did not bother to improve the matrix inversion: the running time of one inversion is less than two milliseconds! The search for local extrema is clearly what needs to be improved first, and I will work on that.
I really would like to support rational approximations, too. The biggest problem is the increased instability: the algorithm is not guaranteed to converge at all.
@sam: W.r.t runtime: ok, I never bother to benchmark your program, maybe it will only start being an issue if the polynomial gets very big.
W.r.t rational approximations, maybe the best way would be to make it some way interactive -- similar to the way boost's minimax optimizer does it. The basic idea they apply is to let the user manually skew the initial critical points towards or away from zero -- of course this only makes sense in an interactive version. Actually, I also like that they make it possible to decide for number of coefficients and precision of calculation at runtime. A big disadvantage of boost is at the moment that the weight is limited to relative or absolute and only the first coefficient can (directly) be pinned to zero (or to any other constant by shifting the function). Of course tricks such as y=sqrt(x) and sin(y)/y can be used as well, however "relative weight" then means relative to that modified function, not the original.
Hi Sam,
I will take the bait on "If you are approximating a function over an interval using its Taylor series..." part. As far as I remember, I was taught to do that, with a single additional step of approximating the error term as a power of x or sum of few of those, and approximating it using Chebyschev polynomial. I would say this is the Taylorian baseline to compare against. (I was not doing it for some time, but I may try that over weekend for 7 terms of sinus just to have the feeling)
Hi, interesting thread. For the specific case of the arctangent the following two rational approximations might be of your interest:
[This is for 0 <= x < Infinity. When x < 0 just take -atan_approx(-x)]
Second order:
atan_approx(x) = (Pi/2)*(b*x + x*x)/(1 + 2*b*x + x*x) where b = 0.596227, with a maximum approximation error of 0.1620º
Third order:
atan_approx(x) = (Pi/2)*(c*x + x*x + x*x*x)/(1 + (c+1)*x + (c+1)x*x + x*x*x) where c = (1 + sqrt(17))/8 and the maximum approximation error is 0.00811º
@mz Interesting, thanks. I would love for LolRemez to handle rational approximations, too, but the Remez algorithm is no longer guaranteed to converge in that case. I'll need to first find the relevant scientific papers — and there aren't many, unfortunately.
Nice post. I have been playing with your impl to approximate the normalized sinc function. I am breaking the intervals at the zeros, seems to be the best approach. Thanks for the work, it's been educational/helpful!
aren't the numbers 6e-10, 2e-14, 1e-19 very small? what am I missing here?
Wrong comparison!!!
Your first example is: with 6 coefficients a1 to a6, the Taylor series gets you worse error at the ends, worse by a factor of 10,000. In other words, you're choosing the number of coefficients and comparing the worst-case errors.
Cool.
What you SHOULD be comparing is: given a target worst-case error (e.g. 6.0E-10), how many coefficients are needed for each of the two methods (Taylor v. minimax)?
I assume the number of coefficients gives a pretty strong indication of the execution time. In other words, you should not be giving the computer a time limit and asking what it can do; you should be giving it a worst tolerable error and asking how long it will take.
It's also worth noting that, in specifying the interval of interest, you'd better not try to use the resulting minimax polynomial outside that interval, because the polynomial shoots off like crazy. The higher the order, the more quickly it shoots off. It's worth making that explicit because it's not obvious from your graph of minimax error.
In this case of sine and cosine, I have found substantially reduced computation time with the following recipe:
sin 2x = 2 sinx.cosx and cos 2x = 1 - 2(sinx)^{2 }
The above formula for cos2x is better from an error-propagation point of view than cos 2x = -1 + 2.(cosx)^{2 }
By optimising the size of the initial power series v. the number of halvings/doublings, you get more accuracy with less computation. That's because you have two processes here (doubling/halving and power series) letting you choose how much work to perform in each, instead of only one process (power series). That gives you freedom to find the right balance which gets the smallest error for any quantity of computational work.
For small required accuracies (10 decimal digits) there's not much in it, but when you get to larger numbers of digits, this method gets far more powerful. The general trend is that the amount of computation required (i.e. number of multiplies) is proportional to the square root of the number of digits required.
None of the above prevents you reducing the angle to the 0 - 45 degrees range (or the equivalent in radians) and of course you'd need an accurate value of pi for that.