Wednesday, February 15, 2012

Calculating the Length of a 3D Vector using SSE

This article explains how to calculate the length of a single 3D float vector stored in an SSE register. The length, or norm, of a vector is defined as the square root of the dot product of the vector with itself:
|v| = length3(v) = sqrt(v.x^2 + v.y^2 + v.z^2)
A single SSE register can hold a 3D vector (the highest 32 bits are unused). A previous article showed how to load a struct containing 3 float values into an SSE register. You can also use the _mm_setr_ps(x, y, z, 0) intrinsic. SSE4.1 introduced the DPPS instruction (accessible via the _mm_dp_ps intrinsic), which calculates the dot product of up to four float values. We will now use this intrinsic to calculate the length of a 3D vector with a minimal number of instructions.
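As a minimal sketch of the idea, the function below takes the dot product of the register with itself using _mm_dp_ps and then takes a scalar square root of the lowest lane. The name length3 follows the formula above; the SSE41_FN macro is a hypothetical convenience that enables SSE4.1 per function on GCC and Clang (other compilers may need SSE4.1 enabled through their own options). The mask 0x71 is one reasonable choice: it uses only the x, y and z lanes and writes the sum to the lowest lane.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_dp_ps */

/* Hypothetical helper macro: on GCC/Clang, enable SSE4.1 for the marked
   function only, so the file also builds without -msse4.1. */
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

/* Length of the 3D vector stored in lanes 0..2 (x, y, z) of v.
   Lane 3 is ignored, so it does not have to be zero. */
SSE41_FN static float length3(__m128 v)
{
    /* Mask 0x71:
       high nibble 0111 -> multiply lanes 0, 1 and 2 (x, y, z) only
       low nibble  0001 -> store the sum in lane 0, zero the other lanes */
    __m128 dot = _mm_dp_ps(v, v, 0x71);

    /* Scalar square root of lane 0, then extract it as a float. */
    return _mm_cvtss_f32(_mm_sqrt_ss(dot));
}
```

For example, length3(_mm_setr_ps(3.0f, 4.0f, 0.0f, 0.0f)) evaluates to 5.0f. Because the mask excludes the highest lane, whatever value is stored there does not affect the result.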