What is SSE?
SSE = Streaming SIMD Extensions. It is an extension to the IA-32 instruction set that speeds up SIMD operations.
What the heck is SIMD anyway?
SIMD = Single Instruction Multiple Data. Example: you want to add 1.0 to every element of a floating point array - so your instruction (Add 1.0) is the same for each element, but the data (the elements of the array) are different - hence multiple data and single instruction.
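To make that concrete, here is a minimal sketch in C of the "add 1.0 to every element" example, using the SSE intrinsics from xmmintrin.h. The function name add_one is made up for illustration, and the code assumes nothing about how the array is aligned, so it uses the unaligned load/store forms. Where the compiler does not report SSE support, it falls back to a plain element-by-element loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: adds 1.0f to each of the n floats in data. */
void add_one(float *data, size_t n)
{
    size_t i = 0;

    assert(data != NULL || n == 0);

#if HAVE_SSE
    const __m128 ones = _mm_set1_ps(1.0f);    /* {1.0, 1.0, 1.0, 1.0} */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);    /* load 4 floats */
        v = _mm_add_ps(v, ones);              /* one instruction, four additions */
        _mm_storeu_ps(data + i, v);           /* store 4 floats back */
    }
#endif

    /* Leftover elements (or all of them without SSE), one at a time. */
    for (; i < n; ++i)
        data[i] += 1.0f;
}
```

Calling add_one(a, 6) on a six-element array handles the first four elements with a single SSE add and the last two with the ordinary loop.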
Why can't I just use a loop?
Loops are slow. That's why.
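mov ecx, nElements    ; loop counter (esi is assumed to point at the array already)
addOneLoop:
mov eax, [esi]        ; load the current element
add eax, 1            ; the actual work (an integer add here)
mov [esi], eax        ; store the result back
add esi, 4            ; advance to the next 4-byte element
sub ecx, 1            ; decrement the counter
jnz addOneLoop        ; repeat until the counter reaches zero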
Assuming that all elements of the array are in L1 cache (a bogus assumption, but it helps simplify the explanation), the processor will spend far more time doing the book-keeping for the loop (decrementing the counter, incrementing the pointer, executing the branch) than actually doing the addition. In fact, the add eax, 1* is the fastest instruction in the loop, because the immediate operand is a nice small constant and the other operand is a register. And if your looping condition is more complicated than the one shown, or if you have a branching construct inside the loop, branch misprediction alone can slow your code by a factor of around 2/3!
It turns out that this kind of loop is really common in certain applications (signal and image processing, for instance), and so AMD and Intel support the SSE instruction set, which accelerates these algorithms.
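If assembly is not your thing, the loop corresponds roughly to the C below, with each statement tagged by the instruction it stands for. This is only a sketch: the name add_one_scalar is made up, the elements are taken to be 32-bit unsigned integers (the assembly does an integer add), and, like the assembly, it assumes there is at least one element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C equivalent of the assembly loop.
   p plays the role of esi, remaining plays the role of ecx. */
void add_one_scalar(uint32_t *p, size_t n)
{
    size_t remaining = n;          /* mov ecx, nElements */

    assert(p != NULL && n > 0);    /* the assembly makes the same assumption */

    do {
        uint32_t x = *p;           /* mov eax, [esi]   load                      */
        x += 1;                    /* add eax, 1       the real work             */
        *p = x;                    /* mov [esi], eax   store                     */
        p += 1;                    /* add esi, 4       book-keeping: pointer     */
        remaining -= 1;            /* sub ecx, 1       book-keeping: counter     */
    } while (remaining != 0);      /* jnz addOneLoop   book-keeping: branch      */
}
```

Of the six steps in the loop body, only one is the addition itself; three exist purely to keep the loop going.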
Must be complicated, right?
Not really. It is quite simple, and we'll go into the details over the next few posts. Let me know if you have any questions/comments.
The Eager Student's Question Corner: (Or maybe lame section name corner :-)
* Any guesses as to why I used "add eax, 1" instead of "inc eax"?
1 comment:
Eager student's answer: To save carry?
Hmm. You really should let your old readers know you've restarted your blog. I'd stopped checking after that retirement notice of yours.