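The atomic guarantee of CAS is costly: on typical hardware the processor must take exclusive ownership of the cache line for every attempt, even one that fails. So the mutex is first checked with an ordinary read, and CAS is attempted only when that read finds it unlocked: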
void spin_lock(int *mutex)
{
    while (1) {
        if (*mutex == 0) {
            // the mutex was at least momentarily unlocked
            // (CAS is assumed to return the value it found, so 0 means success)
            if (!CAS(mutex, 0, 1))
                break; // we have locked the mutex
            // some other thread beat us to it, so try again
        }
    }
}
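For comparison, here is a minimal sketch of the same check-then-CAS idea written with C11 atomics. The names spin_mutex_t, spin_lock_c11 and spin_unlock_c11 are hypothetical and chosen only for illustration. atomic_compare_exchange_weak_explicit stands in for the CAS primitive above; unlike that CAS, it returns true on success rather than the old value. The unlock routine is not part of the original snippet and is included only so the sketch is complete.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical type name: 0 means unlocked, 1 means locked. */
typedef atomic_int spin_mutex_t;

static void spin_lock_c11(spin_mutex_t *mutex)
{
    for (;;) {
        /* Cheap check first: an atomic load, not a read-modify-write. */
        if (atomic_load_explicit(mutex, memory_order_relaxed) == 0) {
            int expected = 0;
            /* The mutex looked free, so now pay for the CAS. */
            if (atomic_compare_exchange_weak_explicit(
                    mutex, &expected, 1,
                    memory_order_acquire, memory_order_relaxed))
                return; /* we have locked the mutex */
            /* Another thread beat us to it (or the weak CAS failed
               spuriously), so try again. */
        }
    }
}

static void spin_unlock_c11(spin_mutex_t *mutex)
{
    /* Sanity check for debugging: only a held lock should be released. */
    assert(atomic_load_explicit(mutex, memory_order_relaxed) == 1);
    atomic_store_explicit(mutex, 0, memory_order_release);
}
```

Using an atomic load for the first check, rather than a plain read of an int, also keeps the compiler from hoisting the read out of the loop. The acquire and release orderings are one reasonable choice for a lock, not the only possible one.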