You probably think you know everything about what the Greatest Common Divisor is and how to calculate it quickly! Well, that’s most probably the case, but just in case you do not remember all the details then here’s a short post about the GCD! (All the code is available here)

Simple GCD

Let’s start with the basic procedure everybody learn in the first week of any introduction to computer science class:

ll safe_gcd(ll u,ll v) {
if( v == 0 ) {
return u;
}
return safe_gcd( v, u % v );
}


I call this the safe_gcd since is very easy to remember and hard to screw up when coding. This code implements the original algorithm described in the book Elements from Euclid which is supposedly being one of his inventions, although there are people which believe that it might be already know some two hundred years before Euclid, or at least by a certain Eudoxus about 100 years before Euclid.. But that’s not that relevant after all..

The procedure works perfectly and it’s simple to code, an iterative implementation follows immediately from the recursive code:

ll safe_gcd_it(ll u,ll v) {
while(v) {
u %= v;
v ^= u; //This is just a swap!
u ^= v;
v ^= u;
}
return u;
}


This algorithm works perfectly and has served humanity for hundreds of years, it will require at most $\approx 5\times\log_{10} a$ steps to calculate the GCD of a,b where $a>b$. Things get’s interesting if one attempt to make some improvement on this basic algorithm.

Optimized simple GCD

One possible optimization designed to reduce the amount of step required to calculate the GCD is obtained by noting that given the division algorithm: $a=b \times q+r$ where $0 < r \leq b$ ($r= a \bmod b$) if a and b are integers with $b \neq 0$ then there is a unique pair of integers q and r such that $\frac{-|b|}{2} < r \leq \frac{|b|}{2}$, which in turn means that if $\frac{|b|}{2} < r \leq b$ then $a=b \times q+r$ can be substituted by $a=b \times (q+1)+(r-b)$ or $a = b \times (q - 1) + (r + b)$ depending on b being greater or smaller than zero (Make sure to always keep the absolute value of r – b)

Let’s see how this work calculating the GCD of two very big numbers $a=4582854257124982585,b=991075736100441490$

This is list of steps performed by the standard algorithm:

$1: 4582854257124982585,991075736100441490\newline 2: 991075736100441490,618551312723216625\newline 3: 618551312723216625,372524423377224865\newline 4: 372524423377224865,246026889345991760\newline 5: 246026889345991760,126497534031233105\newline 6: 126497534031233105,119529355314758655\newline 7: 119529355314758655,6968178716474450\newline 8: 6968178716474450,1070317134693005\newline 9: 1070317134693005,546275908316420\newline 10: 546275908316420,524041226376585\newline 11: 524041226376585,22234681939835\newline 12: 22234681939835,12643541760380\newline 13: 12643541760380,9591140179455\newline 14: 9591140179455,3052401580925\newline 15: 3052401580925,433935436680\newline 16: 433935436680,14853524165\newline 17: 14853524165,3183235895\newline 18: 3183235895,2120580585\newline 19: 2120580585,1062655310\newline 20: 1062655310,1057925275\newline 21: 1057925275,4730035\newline 22: 4730035,3127470\newline 23: 3127470,1602565\newline 24: 1602565,1524905\newline 25: 1524905,77660\newline 26: 77660,49365\newline 27: 49365,28295\newline 28: 28295,21070\newline 29: 21070,7225\newline 30: 7225,6620\newline 31: 6620,605\newline 32: 605,570\newline 33: 570,35\newline 34: 35,10\newline 35: 10,5\newline 36: 5,0\newline$

And this is the list of steps performed by the improved algorithm:

$1: 4582854257124982585,991075736100441490\newline 2: 991075736100441490,372524423377224865\newline 3: 372524423377224865,126497534031233105\newline 4: 126497534031233105,6968178716474450\newline 5: 6968178716474450,1070317134693005\newline 6: 1070317134693005,524041226376585\newline 7: 524041226376585,22234681939835\newline 8: 22234681939835,9591140179455\newline 9: 9591140179455,3052401580925\newline 10: 3052401580925,433935436680\newline 11: 433935436680,14853524165\newline 12: 14853524165,3183235895\newline 13: 3183235895,1062655310\newline 14: 1062655310,4730035\newline 15: 4730035,1602565\newline 16: 1602565,77660\newline 17: 77660,28295\newline 18: 28295,7225\newline 19: 7225,605\newline 20: 605,35\newline 21: 35,10\newline 22: 10,5\newline 23: 5,0\newline$

Well, looks good! from the 36 steps required to calculate the GCD with the standard algorithm to just 23 with the improved one, on average this implementation will be faster of a factor of $\log_2 \phi \approx 0.694$ where $\phi = \frac{1 + \sqrt{5}}{2}$, but keep in mind that the increased amount of operations which the computer need to perform during each iteration will reduce slightly the advantage of this procedure over the basic implementation, let’s see the code:

ll optimized_gcd(ll u,ll v) {
if( v == 0 ) {
return std::abs(u);
}
const ll t = u % v;
if( t > std::abs(v) / 2 ) {
if( v > 0 )
return optimized_gcd( v , std::abs(t - v) );
return optimized_gcd( v, t + v );
}
return optimized_gcd( v , t );
}


As you can see there’s some more computing here, but at least on paper this is faster the the basic algorithm, see the last paragraph for some benchmarks results.

Binary GCD

In 1961 Josef Stein discovered a cleaver way of improving the GCD algorithm by exploiting four simple facts about positive integers and GCD:

1. a and b are both even then $\gcd(a,b)=2\times gcd(\frac{a}{2},\frac{b}{2})$
2. a is even and b is odd then $\gcd(a,b)=gcd(\frac{a}{2},b)$
3. from Euclid’s algorithm: $\gcd(a,b)=\gcd(a-b,b)$
4. if a and b are both even then $a-b$ is even and $|a-b| < \max(a,b)$

For more details you might have a look at Wikipedia or much preferably to the venerable Knuth bible. Given those facts a very basic implementation of the binary algorithm is the following:

ll binary_gcd(ll u,ll v) {
if(!u)
return v;
if(!v)
return u;
if(u==v)
return u;
ll k{ 0 };
while( u % 2 == 0 && v % 2 == 0 )//Rule n. 1
u /= 2 , v /= 2, ++k;
ll t{ u - v };//Rule n. 3
while( t ) {
while( t % 2 == 0 )//Rule n. 2
t /= 2;
if( t < 0 )//Rule n. 4 (Replacing the larger of the two)
v = -t;
else
u = t;
t = u - v;//Rule n. 3
}
return std::pow(2,k)*u;//From rule n. 1, remember we've reduced u and v by 2^k
}


This is a little more complex implementation that the previous one, but should be reasonable faster in real applications even if the asymptotic bound is the same as for the original GCD algorithm. After a short evaluation you might notice that this is exactly the same code as the safe_gcd_it with one additional loop for exploiting the rule n. 1/2 and few additional operations for the rule 4 and 3.

Conclusion

To verify which of those algorithms is faster let’s run a benchmark, you might test this code by yourself cloning it from my repo here.

To test those algorithms I’ll generate a certain amount of random pair of numbers {a,b} which then I’ll use to feed each of the functions we have seen here, the time required to execute the procedures is taken by the venerable chrono::high_resolution_clock functionality.

Running this simple bench program on a set of 5000000 pair of numbers with the code compiled with -O3 using the latest clang, the output is the following:

Generating data
Running benchmarks
Results:
binary_gcd          2037ms         1
optimized_gcd       2192ms   1.07609
safe_gcd_it         2265ms   1.11193
safe_gcd            2294ms   1.12617


Looks like the safe and simple implementation of the GCD algorithm is the slower one! The faster algorithm seems to be the one discovered by Stein, that is somehow expected since even from the theoretical standpoint the average complexity of the binary GCD is out-performing the basic GCD.

I attempted a second test changing the input data, this time on a sequence of increasing integers $0...n-1$ where $n=2\times n$ where n is the amount of input. This time the sequence will not contain very big numbers but at most and thus the difference between two a,b would be no at most n, whereas in the previous test the difference might have been at most numeric_limits<long long>::max(), the output is the following:

Generating data
Running benchmarks
Results:
binary_gcd          838ms         1
optimized_gcd       893ms   1.06563
safe_gcd            928ms    1.1074
safe_gcd_it         935ms   1.11575


In this particular execution safe_gcd_it is slower than safe_gcd, but running this experiment multiple times you’ll notice that those two functions frequently switch between themselves the last two position of the result list (Are always slower than the other algorithms)

I ran this latest test with the same amount of input (5000000) but as you can see now the execution time for all of them is lower than the previous test, the reason is that is much less frequent to encounter extreme values of input where $a\ll b$ or vice-versa, which means than on average the amount of steps will be smaller (For all the algorithms).

But still, seems that the binary version of the GCD algorithm is the faster, followed by the optimized Euclid algorithm, that’s nice to know.