Sunday, July 14, 2013

A multiply built with adds and bitshifts

Let's be multiplication machines! In our toolbox, let's say we only have adds and bitshifts. We're wonderful machines that can add any two numbers together just as easily as any other two numbers. It's just as easy to do \(1+1\) as it is to do \(5391 + 213492\). We're also binary thinkers and can easily move bits left and right to effectively multiply and divide by 2. Isn't life wonderful? But we don't want to just multiply by 2's. We want to be able to multiply any two arbitrary numbers.
How do we calculate something like \(m \times n\) for any \(m\) and \(n?\)
Or maybe since using concrete numbers is nice, let's say \(m=13\) and \(n=127\).

Going way back to when we were but incy wincy adders just learning how to multiply, we learned that multiplication is shorthand for repeated addition. So to calculate \(13 \times 127 \), we could either add 13 to itself 127 times, or add 127 to itself 13 times:
\(13 \times 127 = 13 + 13 + 13 + ... \) repeat until we've added 127 13s,
or
\(13 \times 127 = 127 + 127 + 127 + ... \) repeat until we've added 13 127s.
The first way of calculating \(13 \times 127 \) takes 126 additions, and the second way takes only 12, so we'll prefer the second method. So now we know that \(m \times n \) can be done in \(O(\min{(m,n)})\) time, that is, linear in either \(m\) or \(n\), whichever is smaller. An implementation of this in C would look something like this:


/* the usual MIN/MAX macros, written out so the snippet stands alone */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int multiply_by_adds(int m, int n)
{
   int iterations = MIN(m, n);  /* add the smaller number of times... */
   int base = MAX(m, n);        /* ...the larger number to itself */
   int result = 0;
   for (int i = 0; i < iterations; i++) {
      result += base;
   }
   return result;
}


But can we do better?

Yes we can! We have bitshifts in our toolbox. What do we know about bitshifts? We know that shifting left is the same as multiplying by powers of two. Each shift left increases the power of two by which we multiply.

In binary,
\[127 = 001111111.\]
\[127 \times 1 = 127 = 001111111 = 127<<0\]
\[127 \times 2 = 254 = 011111110 = 127<<1\]
\[127 \times 4 = 508 = 111111100 = 127<<2\]
etc...

Conveniently, binary representation exactly expresses a number as a sum of powers of two! This representation is really cool because it decomposes our weird numbers into summands that the computer can easily manipulate.
\[13 = 8 + 4 + 1 = 2^3 + 2^2 + 2^0 = 1000 + 0100 + 0001 = 1101.\]

Therefore, we can rewrite
\[13 \times 127 = (8 + 4 + 1) \times 127 = \]
\[(8 \times 127) + (4 \times 127) + (1 \times 127) =\]
\[(127<<3) + (127<<2) + (127<<0)  = \]
\[1016 + 508 + 127  = 1651 \]
and we're done! No need for silly repeated additions of \(127\).

By leveraging the binary representation of our numbers, we've written an algorithm that requires one shift and one addition for each power of two that composes the smaller of \(m\) and \(n\). There is a slight nuance given the way integers are represented with a fixed number of bits on our computers: in practice we can simply iterate through each bit of the representation, which makes the whole thing run in constant time. An implementation in C would look something like this:

#include <limits.h>  /* for CHAR_BIT */

#define BITS_IN_INT ((int)(sizeof(int) * CHAR_BIT))

int multiply_by_shifts_adds(int m, int n)
{
   int multiplier = MIN(m, n);
   int base = MAX(m, n);
   int result = 0;
   for (int i = 0; i < BITS_IN_INT; i++) {
      /* if bit i of the multiplier is set, add base * 2^i */
      if ((0x1 << i) & multiplier) result += base << i;
   }
   return result;
}

However, iterating through all of the bits is pretty wasteful. If many of the high bits are 0, we should iterate only up to the highest set bit. Then we have an algorithm that runs in \(\log(\min{(m,n)})\) time, roughly bounded above by the constant time it takes to iterate through all the bits of the integer representation. An implementation in C, using a GCC builtin, would look something like this:

int multiply_by_shifts_adds_opt(int m, int n)
{
   int multiplier = MIN(m, n);
   int base = MAX(m, n);
   int result = 0;
   if (multiplier == 0) return 0;  /* __builtin_clz(0) is undefined */
   int lead_zeros = __builtin_clz(multiplier); /* GCC builtin: count leading zero bits */
   int highest_pow = BITS_IN_INT - lead_zeros - 1; /* position of the highest set bit */
   for (int i = highest_pow; i >= 0; i--) {
      if ((0x1 << i) & multiplier) result += base << i;
   }
   return result;
}

Full github code here.

Timing results for the three multiplier implementations discussed above. As predicted, the implementation with just adds grows linearly, the implementation with shifts and adds that walks through all the bits of our integer representation runs in constant time, and the optimized implementation that walks through only the minimum number of bits runs in logarithmic time.

Of course we would never implement something like this in practice in C. All higher-level languages have multiplication as a basic operator, but this exercise gave us a glimpse of the power of the binary representation that allows computers to do computations quickly. Multiply away, computers!
