Playing with the CPU pipeline
This article will show how basic knowledge of a modern CPU’s instruction pipeline can help microoptimise code at very little cost, using a real world example: the approximation of a trigonometric function. All this without necessarily having to look at lines of assembly code.
The code used for this article is included in the attached file.
Evaluating polynomials
Who needs polynomials anyway? We’re writing games, not a computer algebra system, after all. But wait! Taylor series are an excellent mathematical tool for approximating certain classes of functions. For instance, this is the Taylor series of sin(x) near x = 0:
Truncating the series at the 15th power will compute sin(x) with an absolute error no greater than 1e11 in the range [π/2; π/2], and 2e16 in the range [π/4; π/4].
However, a better approximation known as the minimax polynomial (probably featured in an upcoming article) will give a maximum absolute error of about 2e16 on the whole [π/2; π/2] range:
static double a0 = +1.0; static double a1 = 1.666666666666580809419428987894207e1; static double a2 = +8.333333333262716094425037738346873e3; static double a3 = 1.984126982005911439283646346964929e4; static double a4 = +2.755731607338689220657382272783309e6; static double a5 = 2.505185130214293595900283001271652e8; static double a6 = +1.604729591825977403374012010065495e10; static double a7 = 7.364589573262279913270651228486670e13; double sin1(double x) { return a0 * x + a1 * x * x * x + a2 * x * x * x * x * x + a3 * x * x * x * x * x * x * x + a4 * x * x * x * x * x * x * x * x * x + a5 * x * x * x * x * x * x * x * x * x * x * x + a6 * x * x * x * x * x * x * x * x * x * x * x * x * x + a7 * x * x * x * x * x * x * x * x * x * x * x * x * x * x * x; }
That is 64 multiplications and 7 additions, though compiler options such as GCC’s ffastmath
will help factor the expression in order to perform fewer operations.
It is possible to help the CPU by noticing that a term such as x^9 can be computed in only one operation if x^2 and x^7 are already known, leading to the following code:
double sin2(double x) { double ret, y = x, x2 = x * x; ret = a0 * y; y *= x2; ret += a1 * y; y *= x2; ret += a2 * y; y *= x2; ret += a3 * y; y *= x2; ret += a4 * y; y *= x2; ret += a5 * y; y *= x2; ret += a6 * y; y *= x2; ret += a7 * y; return ret; }
That is now only 16 multiplications and 7 additions. But it is possible to do even better using the Horner form of a polynomial evaluation:
Leading to the following code:
double sin3(double x) { double x2 = x * x; return x * (a0 + x2 * (a1 + x2 * (a2 + x2 * (a3 + x2 * (a4 + x2 * (a5 + x2 * (a6 + x2 * a7))))))); }
We are down to 9 multiplications and 7 additions. There is probably no way to be faster, is there? Let’s see…
Timings
Here are the timings in nanoseconds for the above code, compared with the glibc’s sin() function. The test CPU is an Intel® Core™ i72620M CPU at 2.70GHz. The functions were compiled using O3 ffastmath
:
function  sin  sin1  sin2  sin3

nanoseconds per call  22.518  16.406  16.658  25.276 
Wait, what? Our superbly elegant function, performing only 9 multiplications, is actually slower than the 64multiplication version? Which itself is as fast as the 16multiplication one? Surely we overlooked something.
That’s right. We ignored the CPU pipeline.
The instruction pipeline
In order to execute an instruction, such as “add A and B into C”, a CPU needs to do at least the following:
 fetch the instruction (ie. read it from the program’s memory)
 decode the instruction
 read the instruction’s operands (ie. A and B)
 execute the instruction
 write the result in memory or in registers (in our case, C)
On a modern Intel® CPU, the execution step only accounts for 1/10 or even 1/16 of the total execution time. The idea behind pipelining is simple: while executing an instruction, the CPU can often already read the operands for the next one.
But there is a problem with this strategy: if the next instruction depends on the result of the current one, the CPU cannot read the next operands yet. This is called a readafterwrite hazard, and most usually causes a pipeline stall: the CPU just does nothing until it can carry on.
For the sake of simplicity, imagine the CPU’s pipeline depth is 3. At a given time, it can fetch, execute and finish one instruction:
●  instruction is being fetched, executed or finished 
○  instruction could start, but needs to wait for the result of a previous instruction 
This is how the CPU would execute A = (a + b) * (c + d)
:
time →  total: 7  
1  B = a + b  ●  ●  ●  
2  C = c + d  ●  ●  ●  
3  A = B * C  ○  ○  ●  ●  ● 
The c + d
operation can be started very early because it does not depend on the result of a + b
. This is called instructionlevel parallelism. However, the final B * C
operation needs to wait for all previous instructions to finish.
Since every operation in sin3()
depends on the previous one, this is how it would execute that function:
time →  total: 48  
1  x2 = x * x  ●  ●  ●  
2  A = a7 * x2  ○  ○  ●  ●  ●  
3  A += a6  ○  ○  ●  ●  ●  
4  A *= x2  ○  ○  ●  ●  ●  
5  A += a5  ○  ○  ●  ●  ●  
6  A *= x2  ○  ○  ●  ●  ●  
7  A += a4  ○  ○  ●  ●  ●  
8  A *= x2  ○  ○  ●  ●  ●  
9  A += a3  ○  ○  ●  ●  ●  
10  A *= x2  ○  ○  ●  ●  ●  
11  A += a2  ○  ○  ●  ●  ●  
12  A *= x2  ○  ○  ●  ●  ●  
13  A += a1  ○  ○  ●  ●  ●  
14  A *= x2  ○  ○  ●  ●  ●  
15  A += a0  ○  ○  ●  ●  ●  
16  A *= x  ○  ○  ●  ●  ● 
These 9 multiplications and 7 additions are done in 48 units of time. No instructionlevel parallelism is possible because each instruction needs to wait for the previous one to finish.
The secret behind sin2()
’s performance is that the large number of independent operations allows the compiler to reorganise the computation so that the instructions can be scheduled in a much more efficient way. This is roughly how GCC compiled it:
time →  total: 30  
1  x2 = x * x  ●  ●  ●  
2  A = a7 * x2  ○  ○  ●  ●  ●  
3  x3 = x2 * x  ●  ●  ●  
4  A += a6  ○  ●  ●  ●  
5  B = a1 * x3  ●  ●  ●  
6  x5 = x3 * x2  ●  ●  ●  
7  A *= x2  ●  ●  ●  
8  C = a2 * x5  ○  ●  ●  ●  
9  B += x  ●  ●  ●  
10  x7 = x5 * x2  ●  ●  ●  
11  A += a5  ●  ●  ●  
12  D = a3 * x7  ●  ●  ●  
13  B += C  ●  ●  ●  
14  x9 = x7 * x2  ●  ●  ●  
15  B += D  ○  ●  ●  ●  
16  E = a4 * x9  ●  ●  ●  
17  x11 = x9 * x2  ●  ●  ●  
18  B += E  ○  ●  ●  ●  
19  A *= x11  ●  ●  ●  
20  A += B  ○  ○  ●  ●  ● 
These 13 multiplications and 7 additions are executed in 30 units of time instead of 48 for the previous version. The compiler has been rather clever here: the number of ○’s is kept small.
Note that 30 / 48 = 0.625
, and the ratio between sin2
and sin3
’s timings is 16.658 / 25.276 = 0.659
. Reality matches theory pretty well!
Going further
We have seen that increasing the number of operations in order to break dependencies between CPU instructions allowed to help the compiler perform better optimisations taking advantage of the CPU pipeline. But that was at the cost of 40% more multiplications. Maybe there is a way to improve the scheduling without adding so many instructions?
Luckily there are other ways to evaluate a polynomial.
EvenOdd form and similar schemes
Consider our 8th order polynomial:
Separating the odd and even coefficients, it can be rewritten as:
Which using Horner’s form yields to:
This polynomial evaluation scheme is called the EvenOdd scheme. It only has 9 multiplications and 7 additions (only one multiplication more than the optimal case). It results in the following C code:
double sin4(double x) { double x2 = x * x; double x4 = x2 * x2; double A = a0 + x4 * (a2 + x4 * (a4 + x4 * a6)); double B = a1 + x4 * (a3 + x4 * (a5 + x4 * a7)); return x * (A + x2 * B); }
And this is the expected scheduling:
time →  total: 33  
1  x2 = x * x  ●  ●  ●  
2  x4 = x2 * x2  ○  ○  ●  ●  ●  
3  B = a7 * x4  ○  ○  ●  ●  ●  
4  A = a6 * x4  ●  ●  ●  
5  B += a5  ○  ●  ●  ●  
6  A += a4  ●  ●  ●  
7  B *= x4  ○  ●  ●  ●  
8  A *= x4  ●  ●  ●  
9  B += a3  ○  ●  ●  ●  
10  A += a2  ●  ●  ●  
11  B *= x4  ○  ●  ●  ●  
12  A *= x4  ●  ●  ●  
13  B += a1  ○  ●  ●  ●  
14  A += a0  ●  ●  ●  
15  B *= x2  ○  ●  ●  ●  
16  A += B  ○  ○  ●  ●  ●  
17  A *= x  ○  ○  ●  ●  ● 
Still not good enough, but we’re certainly onto something here. Let’s try another decomposition for the polynomial:
And using Horner’s form again:
Resulting in the following code:
double sin5(double x) { double x2 = x * x; double x4 = x2 * x2; double x6 = x4 * x2; double A = a0 + x6 * (a3 + x6 * a6); double B = a1 + x6 * (a4 + x6 * a7); double C = a2 + x6 * a5; return x * (A + x2 * B + x4 * C); }
And the following scheduling:
time →  total: 31  
1  x2 = x * x  ●  ●  ●  
2  x4 = x2 * x2  ○  ○  ●  ●  ●  
3  x6 = x4 * x2  ○  ○  ●  ●  ●  
4  B = x6 * a7  ○  ○  ●  ●  ●  
5  A = x6 * a6  ●  ●  ●  
6  C = x6 * a5  ●  ●  ●  
7  B += a4  ●  ●  ●  
8  A += a3  ●  ●  ●  
9  C += a2  ●  ●  ●  
10  B *= x6  ●  ●  ●  
11  A *= x6  ●  ●  ●  
12  C *= x4  ●  ●  ●  
13  B += a1  ●  ●  ●  
14  A += a0  ●  ●  ●  
15  B *= x2  ○  ●  ●  ●  
16  A += C  ●  ●  ●  
17  A += B  ○  ○  ●  ●  ●  
18  A *= x  ○  ○  ●  ●  ● 
One more instruction and two units of time better. That’s slightly better, but still not as good as we would like. One problem is that a lot of time is lost waiting for the value x^{6} to be ready. We need to find computations to do in the meantime to avoid pipeline stalls.
HighLow form
Instead of splitting the polynomial into its even and odd coefficients, we split it into its high and low coefficients:
And again using Horner’s form:
The corresponding code is now:
double sin6(double x) { double x2 = x * x; double x4 = x2 * x2; double x8 = x4 * x4; double A = a0 + x2 * (a1 + x2 * (a2 + x2 * a3)); double B = a4 + x2 * (a5 + x2 * (a6 + x2 * a7)); return x * (A + x8 * B); }
And the expected scheduling:
time →  total: 30  
1  x2 = x * x  ●  ●  ●  
2  B = x2 * a7  ○  ○  ●  ●  ●  
3  A = x2 * a3  ●  ●  ●  
4  x4 = x2 * x2  ●  ●  ●  
5  B += a6  ●  ●  ●  
6  A += a2  ●  ●  ●  
7  x8 = x4 * x4  ●  ●  ●  
8  B *= x2  ●  ●  ●  
9  A *= x2  ●  ●  ●  
10  B += a5  ○  ●  ●  ●  
11  A += a1  ●  ●  ●  
12  B *= x2  ○  ●  ●  ●  
13  A *= x2  ●  ●  ●  
14  B += a4  ○  ●  ●  ●  
15  A += a0  ●  ●  ●  
16  B *= x8  ○  ●  ●  ●  
17  A += B  ○  ○  ●  ●  ●  
18  A *= x  ○  ○  ●  ●  ● 
Finally! We now schedule as well as GCC, and with 11 multiplications instead of 13. Still no real performance gain, though.
Pushing the limits
Can we do better? Probably. Remember that each ○ in the above table is a pipeline stall, and any instruction we would insert there would be basically free.
Note the last instruction, A *= x. It causes a stall because it needs to wait for the final value of A, but it would not be necessary if A and B had been multiplied by x beforehands.
Here is a way to do it (bold instructions indicate a new instruction or a modified one):
time →  total: 27  
1  x2 = x * x  ●  ●  ●  
2  B = x2 * a7  ○  ○  ●  ●  ●  
3  A = x2 * a3  ●  ●  ●  
4  x4 = x2 * x2  ●  ●  ●  
5  B += a6  ●  ●  ●  
6  A += a2  ●  ●  ●  
7  x8 = x4 * x4  ●  ●  ●  
8  B *= x2  ●  ●  ●  
9  A *= x2  ●  ●  ●  
10  x3 = x2 * x  ●  ●  ●  
11  B += a5  ●  ●  ●  
12  A += a1  ●  ●  ●  
13  C = a0 * x  ●  ●  ●  
14  B *= x2  ●  ●  ●  
15  A *= x3  ●  ●  ●  
16  x9 = x8 * x  ●  ●  ●  
17  B += a4  ●  ●  ●  
18  A += C  ●  ●  ●  
19  B *= x9  ○  ●  ●  ●  
20  A += B  ○  ○  ●  ●  ● 
Excellent! Just as many instructions as GCC, but now with fewer pipeline stalls. I don’t know whether this scheduling is optimal for the (incorrect) assumption of a 3stage pipeline, but it does look pretty good. Also, loading a0, a1 etc. from memory hasn't been covered for the sake of simplicity.
Anyway, we just need to write the code corresponding to this behaviour, and hope the compiler understands what we need:
double sin7(double x) { double x2 = x * x; double x3 = x2 * x; double x4 = x2 * x2; double x8 = x4 * x4; double x9 = x8 * x; double A = x3 * (a1 + x2 * (a2 + x2 * a3)); double B = a4 + x2 * (a5 + x2 * (a6 + x2 * a7)); double C = a0 * x; return A + C + x9 * B; }
Conclusion
It’s time to check the results! Here they are, for all the functions covered in this article:
function  sin  sin1  sin2  sin3  sin4  sin5  sin6  sin7

nanoseconds per call  22.518  16.406  16.658  25.276  18.666  18.582  16.366  17.470 
Damn. All these efforts to understand and refactor a function, and our best effort actually performs amongst the worst!
What did we miss? Actually, this time, nothing. The problem is that GCC didn't understand what we were trying to say in sin7()
and proceeded with its own optimisation ideas. Compiling with O3
instead of O3 ffastmath
gives a totally different set of timings:
function  sin  sin1  sin2  sin3  sin4  sin5  sin6  sin7

nanoseconds per call  22.497  30.250  19.865  25.279  18.587  18.958  16.362  15.891 
There. We win eventually!
There is a way to still use ffastmath
yet prevent GCC from trying to be too clever. This might be preferable because we do not want to lose the benefits of ffastmath
in other places. By using an architecturespecific assembly construct, we can mark temporary variables as used, effectively telling GCC that the variable needs to be really computed and not optimised away:
double sin7(double x) { double x2 = x * x; double x3 = x2 * x; double x4 = x2 * x2; double x8 = x4 * x4; double x9 = x8 * x; #if defined __x86_64__ __asm__("" : "+x" (x3), "+x" (x9)); #elif defined __powerpc__  defined __powerpc64__ __asm__("" : "+f" (x3), "+f" (x9)); #else __asm__("" : "+m" (x3), "+m" (x9)); /* Out of luck :( */ #endif double A = x3 * (a1 + x2 * (a2 + x2 * a3)); double B = a4 + x2 * (a5 + x2 * (a6 + x2 * a7)); double C = a0 * x; return A + C + x9 * B; }
This works on the x86_64 architecture, where "+x"
indicates the SSE registers commonly used for floating point calculations, and on the PowerPC, where "+f"
can be used. This approach is not portable and it is not clear what should be used on other platforms. Using "+m"
is generic but often means a useless store into memory; however, on x86 it is still a noticeable gain.
And our final results, this time with the full O3 ffastmath
optimisation flags:
function  sin  sin1  sin2  sin3  sin4  sin5  sin6  sin7

nanoseconds per call  22.522  16.411  16.663  25.277  18.628  18.588  16.365  15.617 
The code used for this article is included in the attached file.
Attachments (1)
 poly.cpp (5.5 KB)  added by 7 years ago.
Download all attachments as: .zip
Comments
You totally forgot about using SIMD!
Very instructive. Thanks!
Note: Mac OS X users, you can use
clang++ stdlib=libc++ poly.cpp
to compile the attached file.Heck yeah this is exactly what I ndeeed.
Please post more of these gems.. Someone mentioned SIMD (SSE or AVX or whatever version).. wouldn't loading and unloading from that format incur its own hit?
Please either use 1/10 or onetenth, not "one 10th".
Can you use ffastmath and fnounsafemathoptimizations to avoid GCC reordering your code? Regrouping of floating point arithmetic is an "unsafe" optimization since it can change the result (due to different rounding).
what is this gcc version? If your processor is capable of, and gcc is recent enough(>4.6, I think) the ftreevectorizer option is included into O3, and it should kick in while calculating polynomials.
Just tested on a sandy bridge with gcc 4.8, indded it does turn sin7 into simd instructions.
OMG It is holly cucumber
Nice! Just Superb explanation. Especially for undergraduates.
The things I take away from this article are:
1) If you just focus on writing clean and understandable code, the compiler will usually do a pretty good job at optimizing it.
2) If you do decide to handoptimize your code, be sure to carefully measure the results because it could just as well have the opposite effect.
I've seen a lot of situations where people tried to outsmart a compiler only to make things worse in both performance and maintainability. Microoptimizations such as these are rarely needed in practice and should always be very carefully considered.
Excellent post  thanks!
Simply by switching compiler from gcc to clang, I got a factor 2.5 of speedup.
Clang 3.4
g++ 4.8.1
Also worth noting the processor specified uses simultaneous multithreading, which might have skewed the results a bit.
Very informative report but I think it is based on a wrong assumption: it is not simply the pipeline. The i7 (and many more before that!) has a particular type of runtime optimization in the form of a dual pipeline with registers renaming  which is why the execution needs to wait for the previous result to be available.
It would be interesting to evaluate the impact of Hyperthreading (Linux typically divides the dual pipelines to run more processes at the same time, loosing on this particular type of optimization). Also, this should be evaluated on an ARM core, which is very likely not to have this extra boost and probably won't show this discrepancy. I think the comment is interesting but not across the board.
Great post! I have fetured it in the last embedded systems weekly issue.
I hope you'll like it
Truncating the series at the 15th power will compute sin(x) with an absolute error no greater than 1e11 in the range [π/2; π/2], and 2e16 in the range [π/4; π/4].
Did you really mean 1e11, or you wanted to mean 2e11, since 1e11 really do not mean anything which is equal to 1 anyway. This might be a typo.
kongkon
@anonymous: I am using the E notation here, so 1e11 should be understood as 1×10^{11}.
Pure showingoff a useless skill. I prefer to optimize UNTIL it is necessary enough.  but most of time it turns that Compiler did good enough.
Hello...
See also https://en.wikipedia.org/wiki/Estrin%27s_scheme
現金網 http://king168.net
http://v9onlinepharm.com/ , and all other CNS stimulants are contraindicated for patients who have undergone CABG.
Tinedol  an effective remedy for fungal foot odor and itching.
wh0cd896123 <a href=http://paroxetine.reisen/>paroxetine hcl 20mg</a> <a href=http://arimidexformen.pro/>arimidex pills</a> <a href=http://allopurinol.systems/>allopurinol</a>
wh0cd59278 <a href=http://effexor.us.org/>Effexor 150mg</a> <a href=http://buyinderal.us.org/>buy inderal online</a> <a href=http://diclofenac75mg.us.com/>Diclofenac Sod</a>
wh0cd516347 <a href=http://doxycycline100mg.reisen/>doxycycline 100mg</a> <a href=http://advairdiskus.pro/>advair</a> <a href=http://buyelimite.store/>cheap elimite</a> <a href=http://buyvermox.store/>vermox</a> <a href=http://buyphenergan.shop/>phenergan generic</a> <a href=http://amoxicillin.work/>amoxicillin prescription</a> <a href=http://buylasix.reisen/>visit website</a> <a href=http://buysildenafil.reisen/>where to buy sildenafil citrate</a>
wh0cd708146 <a href=http://amoxicillin875mg.us.com/>amoxicillin 875 mg</a> <a href=http://claritingeneric.us.com/>CLARITIN 10MG</a> <a href=http://amoxicillin.us.org/>purchase amoxicillin</a>
wh0cd928567 <a href=http://kytril.world/>kytril</a> <a href=http://elocon.world/>elocon</a> <a href=http://vermox.live/>mebendazole tablets</a>
wh0cd899967 <a href=http://buysildenafil.reisen/>cheap sildenafil citrate</a> <a href=http://eloconcreamforsale.pro/>elocon</a> <a href=http://clomid.tools/>clomid medicine</a> <a href=http://buyallopurinol.reisen/>cost of allopurinol</a> <a href=http://buybupropion.store/>discover more</a> <a href=http://diflucanoverthecounter.pro/>order diflucan online</a> <a href=http://prednisone20mg.pro/>prednisone 20mg</a>
wh0cd312157 <a href=http://buycafergot.reisen/>cafergot</a> <a href=http://buyvpxl.shop/>vpxl online</a> <a href=http://acyclovir800mg.pro/>acyclovir 800 mg tablet</a> <a href=http://buybuspar.shop/>buspar</a> <a href=http://buybuspar.reisen/>buspirone 5 mg</a> <a href=http://lipitorgeneric.pro/>lipitor</a> <a href=http://buyalbuterol.work/>albuterol</a> <a href=http://tretinoin.fund/>full report</a>
wh0cd944796 <a href=http://proscar.reisen/>proscar</a> <a href=http://buspar.store/>buspar</a> <a href=http://elimite.work/>generic elimite</a>
wh0cd34492 <a href=http://lasix.biz/>Lasix Online</a>
wh0cd299762 <a href=http://doxycyclineprice.us.com/>doxycycline</a> <a href=http://synthroid.us.org/>more information</a>
is cbd marijuana legal <a href=https://cbdpillssale.com/>best cbd capsules reviews</a> cbd pills legal in all 50 states cannabis cbd capsules for sale <a href=https://cbdcapsulesonline.org/>best cannabis cbd products for fibromyalgia</a> cbd marijuana stocks cannabis cbd oil for pain <a href=https://cbdpillsonline.com/>side effects of cbd pills</a> cbd cannabidiol cbd pills reviews <a href="https://cbdpillssale.com/">marijuana cbds and seizures</a> marijuana cbd colorado cbd pills <a href="https://cbdcapsulesonline.org/">cbd cannabis oil online</a> dosage for cbd capsules cannabidiol cbd oil for stress <a href="https://cbdpillsonline.com/">cannabidiol cbd pills for sale</a> cbd marijuana strains
allyauto com <a href="http://albanyriverrats.com/">auto owners insurance reviews</a> health insurance cost <a href=http://albanyriverrats.com/>car auto insurance quotes</a> what is the cheapest car insurance
wh0cd43047 <a href=http://suhagra.us.com/>suhagra</a> <a href=http://retina24h.us.org/>RetinA</a>
wh0cd38328 <a href=http://anafranilprice.us.com/>buy anafranil online</a>
wh0cd969591 <a href=http://naprosyn.world/>naprosyn 375 mg tab</a> <a href=http://septilin.reisen/>septilin</a> <a href=http://dipyridamole.reisen/>dipyridamole</a> <a href=http://coreg.reisen/>coreg hypertension</a>
wh0cd540258 <a href=http://buyadvair.reisen/>advair</a> <a href=http://zoloft.news/>zoloft</a> <a href=http://buytretinoin.reisen/>tretinoin cream</a>
wh0cd556483 <a href=http://onlinefurosemide.us.com/>more hints</a>
wh0cd483000 <a href=http://genericzoloft.us.com/>continue reading</a>
wh0cd866602 <a href=http://zebeta.world/>generic zebeta</a> <a href=http://florinef.world/>florinef</a> <a href=http://pilex.reisen/>pilex</a> <a href=http://clozaril.reisen/>clozaril</a> <a href=http://lasuna.world/>generic lasuna</a> <a href=http://cardizem.world/>cardizem 120 mg</a> <a href=http://diamox.reisen/>diamox</a> <a href=http://lotensin.world/>lotensin generic</a>
wh0cd972536 <a href=http://lasixmedication.us.com/>buy lasix</a> <a href=http://lipitor.us.com/>lipitor sales</a>
This article is showing very well how basic knowledge of a modern CPU’s instruction pipeline can help microoptimise code at very little cost, using a real world example. Thanks for it and carry on it. To get electrical safety matting Click here http://electricalmattingco.co.uk/