The key point of SSE is that AMD and Intel have put 8 new 128-bit registers (XMM0 to XMM7) on their processors; 64-bit mode doubles that to 16. With SSE2 on board, you can view each register as:
- 16 8-bit integers
- 8 16-bit integers
- 4 32-bit integers
- 4 32-bit single precision floating point numbers (float)
- 2 64-bit double precision floating point numbers (double)
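To make the "same register, different views" idea concrete, here is a minimal sketch in C using the SSE2 intrinsics from `<emmintrin.h>`. The function names `add_as_bytes` and `add_as_dwords` are made up for illustration. Both load the same 16 bytes; only the add instruction decides where the lane boundaries are.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi8, _mm_add_epi32 */
#include <stdint.h>

/* Hypothetical helper: treat the 128 bits as sixteen 8-bit lanes.
   A carry out of one byte is dropped; it never reaches the next byte. */
static void add_as_bytes(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Hypothetical helper: treat the same 128 bits as four 32-bit lanes.
   Carries now propagate across the bytes inside each 32-bit lane. */
static void add_as_dwords(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Feed both functions a buffer whose first byte is 0xFF and another whose first byte is 0x01: the byte-wise add wraps to 0x00 and leaves the second byte alone, while the 32-bit add carries into the second byte (x86 is little-endian).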
One big feature (especially for C/C++ programmers) that came with SSE was faster double/float to integer conversion instructions. When coding against the x87 floating point stack, the code to convert a floating point number to an integer in C would work roughly like this:
- (a) Store the FPU rounding mode
- (b) Set the rounding mode to truncate [note that this is required by the C standard]
- (c) Convert double to integer
- (d) Restore the old FPU rounding mode
The whole circus was rather expensive. SSE gives 2 sets of fast instructions to convert doubles/floats (and the tuples of 4 floats and 2 doubles) to integers. The instruction itself picks the behaviour: the CVTT* forms always truncate, while the CVT* forms round according to the current MXCSR rounding mode (round-to-nearest by default), so no messing with the FPU state is required. In fact, the so-called SSE acceleration that the VC++ 8 compiler pretends to do is mainly limited to using the faster convert instructions!
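Here is a rough sketch of the two sets of conversion instructions, again using SSE2 intrinsics from `<emmintrin.h>`. The wrapper names (`double_to_int_trunc`, `double_to_int_round`, `floats_to_ints_trunc`) are my own, and the rounding variant assumes nobody has changed the MXCSR rounding mode from its default.

```c
#include <emmintrin.h>  /* SSE2 intrinsics for scalar and packed conversions */

/* CVTTSD2SI: always truncates toward zero, which is what a C cast does. */
static int double_to_int_trunc(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* CVTSD2SI: rounds using the current MXCSR rounding mode.
   Assumption: the mode is the default, round-to-nearest-even. */
static int double_to_int_round(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* CVTTPS2DQ: converts a tuple of 4 floats to 4 ints in one go, truncating. */
static void floats_to_ints_trunc(const float in[4], int out[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(v));
}
```

With the default rounding mode, `double_to_int_trunc(2.7)` gives 2 while `double_to_int_round(2.7)` gives 3. Halfway cases go to the nearest even integer, so `double_to_int_round(2.5)` gives 2.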
Besides the fast math there are some other benefits. SSE finally got rid of the floating point register stack: the XMM registers are addressed directly, so the new floating point code is easier to write and a little faster at times. There are also some pretty cool instructions aimed at specific problems. An example is the CRC32 instruction (it computes CRC-32C) that is available with SSE4.2!
Anyway, that's it for now. Next time, I promise more code and less gyan. Let me know if you have questions/comments.