Monday, July 29, 2013

Noisy Dynamical Systems

Today we're going to play with a nonlinear dynamical system and noise. Code for singing along is posted on github. Our system is described, in general, by
\[\dot{x} = f(x),\]
which says that the rate of change of \(x\) is just some arbitrary function \(f(\cdot)\) of \(x\). In block diagram format, the system looks like this:

block diagram of system
System block diagram. \(1/s\) is an integrator.  The derivative of \(x\) is its input, and \(x\) is its output.
Specifically, let's let our nonlinear function take the form 
\[f(x) = e^{-x}-1.\]

To start analyzing this system, we'll plot the phase portrait, which shows us how \(\dot{x}\) depends on \(x\).
Phase portrait of our dynamical system
The phase portrait for our nonlinear system tells us how the system changes. We'll be running this system from several different initial conditions, indicated by the black-to-red dots. Each dot's position along the x-axis is its initial condition, and its position along the y-axis is its initial derivative.
From the phase portrait, we can tell that the system's state, or value of \(x\), increases when the state is negative and decreases when the state is positive. Therefore, we can predict that the system will stabilize at 0.
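The simulation code isn't reproduced here (it's in the github repo linked above); as a rough stand-in, here is a minimal forward-Euler sketch in C of the noiseless system. The step size, horizon, and initial conditions are arbitrary choices of mine, not values taken from that code.

#include <math.h>
#include <stdio.h>

/* the nonlinearity: f(x) = e^(-x) - 1 */
static double f(double x) { return exp(-x) - 1.0; }

int main(void)
{
   const double dt = 0.01;                      /* integration step (arbitrary) */
   const int steps = 1000;                      /* i.e., 10 time units          */
   const double x0s[] = {-1.5, -0.5, 0.5, 1.5}; /* a few initial conditions     */

   for (int k = 0; k < 4; k++) {
      double x = x0s[k];
      for (int i = 0; i < steps; i++)
         x += f(x) * dt;                        /* forward-Euler update         */
      printf("x0 = %+.1f  ->  x(10) = %+.5f\n", x0s[k], x);
   }
   return 0;
}

Every initial condition should drift toward \(x = 0\), matching the prediction from the phase portrait.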

Do we see this stable point?
System state trajectories over time for initial states scattered between -1.5 and 1.5.

Yes! Great!

What if we add some noise? 
\[\dot{x} = f(x) + u \]
\[u \sim \mathcal{N}(0, \sigma^{2})\]
That is, we've added Gaussian white noise of variance \(\sigma^{2}\) to our system.
Our system block diagram now looks like this: 
Block diagram of system with noise added to input of integrator.
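To show roughly what one noisy trajectory involves, here is a hedged C sketch (not the original code) that draws a fresh Gaussian sample each step via Box-Muller and injects it at the integrator's input. The per-step handling of the noise is a simplification, so the exact scaling may not match the original simulation.

#include <math.h>
#include <stdlib.h>

/* crude Box-Muller sample from N(0, 1) */
static double randn(void)
{
   double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0); /* stay away from log(0) */
   double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
   return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

static double f(double x) { return exp(-x) - 1.0; }

/* one noisy trajectory: noise enters at the integrator's input */
static double simulate_noisy(double x0, double sigma, double dt, int steps)
{
   double x = x0;
   for (int i = 0; i < steps; i++) {
      double u = sigma * randn();   /* u ~ N(0, sigma^2)             */
      x += (f(x) + u) * dt;         /* Euler step of xdot = f(x) + u */
   }
   return x;
}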
Do we still see the system go to the same stable point?

System simulation with noise added to the input of the integrator.  Noise is Gaussian white noise, with the value of \(\sigma\) indicated above each subplot.

It looks like the system goes to the same stable point at low levels of noise, but the system gets pretty swamped with noise at higher variances, which makes it hard to tell whether it settles around the predicted stable point. Running the simulation many times and averaging across trials helps us see the long-term behavior more clearly:
Each line is the average of 25 trials.
That's better. It looks like the system settles, on average, at the predicted state for low noise levels, but at higher levels of noise, it settles, on average, at a higher state value. If we do the same averaging over trials but with much larger noise, we see this effect more clearly:

With very large noise, the system no longer averages out anywhere close to the stable point predicted without noise, and at the highest noise levels it doesn't even look like it reaches its average stable value by the end of the simulation. Why does the stable point shift?

Does the average stable point change because we are now feeding a low-pass-filtered version of the noise into the nonlinearity? I'm not sure of an intuitive way to understand this yet.

Now let's tweak the system a little:

\[\dot{x} = f(x+u)\]

Now, noise is passed through the nonlinearity before integration.

Block diagram of system with noise added to the input to the nonlinearity.
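In code, the only change from the sketch above is where the noise sample enters. Reusing f() and randn() from before, one trajectory of this system might look like:

/* one noisy trajectory: noise now enters the nonlinearity, xdot = f(x + u) */
static double simulate_noisy_nonlin(double x0, double sigma, double dt, int steps)
{
   double x = x0;
   for (int i = 0; i < steps; i++) {
      double u = sigma * randn();   /* u ~ N(0, sigma^2)           */
      x += f(x + u) * dt;           /* noise perturbs f's argument */
   }
   return x;
}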
Running the simulation for this system yields:

System simulation with noise added to the input to the nonlinearity.  Each line is a single trial, not averaged. 
Look carefully at the value that the system settles at for each level of noise. The trajectories are less noisy, but the fixed point's position is far more sensitive to the level of noise!

Think about it this way. If you are the system and you are at a state \(x\), what do you see? Because of the noise, you see everything around you. So if we take the noise distribution and average each point of the phase portrait over that noise (i.e. convolve our system function \(f(\cdot)\) with the pdf (probability density function) of our noise), we get a better sense of the fixed points of our system:
Phase portrait of the system after convolving the system function, \(f\), with the noise distribution pdf.
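One way to produce that smoothed phase portrait numerically, without doing the convolution explicitly, is to Monte Carlo average \(f(x+u)\) over many noise samples at each point \(x\). Here is a sketch, again reusing f() and randn() from above (the grid and sample count are arbitrary):

#include <stdio.h>

/* estimate E[f(x + u)] on a grid of x values by averaging over noise samples */
static void smoothed_phase_portrait(double sigma, int samples)
{
   for (double x = -2.0; x <= 2.0; x += 0.1) {
      double acc = 0.0;
      for (int i = 0; i < samples; i++)
         acc += f(x + sigma * randn());
      printf("x = %+.2f   E[f(x+u)] ~= %+.4f\n", x, acc / samples);
   }
}

The zero crossing of this averaged curve is where the noisy system tends to settle.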

Another way of analyzing the effect of noise is via the Taylor series expansion of \(f(\cdot)\). With noise, \(\dot{x}=f(x+u)\), and
\[f(x+u)=f(x)+f'(x)u+\frac{1}{2}f''(x)u^{2}+\frac{1}{3!}f^{(3)}(x)u^3+...\]
Taking the expected value,
\[E[f(x+u)]=f(x)+f'(x)E[u]+\frac{1}{2}f''(x)E[u^{2}]+\frac{1}{3!}f^{(3)}(x)E[u^3]+...\]
Since we're assuming \(u\) is symmetric about 0, the expected values of the odd powers of \(u\) are also 0, so
\[E[f(x+u)]=f(x)+\frac{1}{2}f''(x)E[u^{2}]+\frac{1}{4!}f^{(4)}(x)E[u^4]+...\]
Before moving forward, we can see here why the phase portrait changes with noise. When noise enters the nonlinearity, \(\dot{x}\) is no longer just \(f(\cdot)\), but \(f(\cdot)\) summed with its derivatives weighted by the higher-order moments of \(u\). In our example case,
\[f(x)=e^{-x}-1\]
\[f'(x)=-e^{-x}\]
\[f''(x)=e^{-x}\]
\[f^{(3)}(x)=-e^{-x}\]
\[f^{(4)}(x)=e^{-x}\]
\[\vdots\]
Notice a pattern? All of the odd-order derivatives are the same, and all of the even-order derivatives are the same! So for our example,
\[E[f(x+u)]=e^{-x}-1+\frac{1}{2}e^{-x}E[u^{2}]+\frac{1}{4!}e^{-x}E[u^4]+...\]
\[E[f(x+u)]=e^{-x}-1+e^{-x}\left(\frac{\sigma^{2}}{2}+\frac{3\sigma^{4}}{4!}+...\right)\]

This is neat! We can analytically see that noise changes the phase portrait.
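In fact, for this particular \(f\) the series can be summed in closed form. The even moments of a zero-mean Gaussian are \(E[u^{2k}]=(2k-1)!!\,\sigma^{2k}\), so the parenthesized series is exactly \(e^{\sigma^{2}/2}-1\), and
\[E[f(x+u)]=e^{-x+\sigma^{2}/2}-1.\]
Setting this to zero puts the averaged fixed point at
\[x^{*}=\frac{\sigma^{2}}{2},\]
which moves upward with the noise level, in line with how the settling value in the simulations depends on \(\sigma\) when noise enters the nonlinearity.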


What's different between the two noise cases?
Both noise cases change the dynamics of the system, but the system is more sensitive to noise entering at the input of the nonlinearity.

Keep in mind that in reality we can see both kinds of noise:
Block diagram of the system with noise added to both the input of the integrator and the input of the nonlinearity.
If you can control its distribution, noise can even become another knob to tune your system!

Sunday, July 14, 2013

A multiply built with adds and bitshifts

Let's be multiplication machines! In our toolbox, let's say we only have adds and bitshifts. We're wonderful machines that can add any two numbers together just as easily as any other two numbers. It's just as easy to do \(1+1\) as it is to do \(5391 + 213492\). We're also binary thinkers and can easily move bits left and right to effectively multiply and divide by 2. Isn't life wonderful? But we don't want to just multiply by 2's. We want to be able to multiply any two arbitrary numbers.
How do we calculate something like \(m \times n\) for any \(m\) and \(n?\)
Or maybe since using concrete numbers is nice, let's say \(m=13\) and \(n=127\).

Going way back to when we were but incy wincy adders just learning how to multiply, we learned that multiplication is shorthand for repeated addition, so to calculate \(13 \times 127 \), we could either add 13 to itself 127 times, or add 127 to itself 13 times
\(13 \times 127 = 13 + 13 + 13 + ... \) repeat until we've added 127 13s.
or
\(13 \times 127 = 127 + 127 + 127 + ... \) repeat until we've added 13 127s.
The first way of calculating \(13 \times 127\) takes 126 operations, and the second way takes about 12 operations, so we'll prefer the second method. So now we know that \(m \times n\) can be done in \(\min{(m,n)}\) time, that is, linear in whichever of \(m\) or \(n\) is smaller. An implementation of this in C would look something like this:


/* MIN, MAX (and BITS_IN_INT below) are assumed to be defined as usual, e.g. in the full github code. */
int multiply_by_adds(int m, int n)
{
   int iterations = MIN(m, n);   /* add the larger operand the smaller number of times */
   int base = MAX(m, n);
   int result = 0;
   for (int i = 0; i < iterations; i++) {
      result += base;            /* one addition per iteration */
   }
   return result;
}


But can we do better?

Yes we can! We have bitshifts in our toolbox. What do we know about bitshifts? We know that shifting left is the same as multiplying by powers of two. Each shift left increases the power of two by which we multiply.

In binary,
\[127 = 001111111.\]
\[127 \times 1 = 127 = 001111111 = 127<<0\]
\[127 \times 2 = 254 = 011111110 = 127<<1\]
\[127 \times 4 = 508 = 111111100 = 127<<2\]
etc...

Conveniently, binary representation exactly expresses a number as the sum of powers of two! This representation is really cool because it decomposes our weird numbers into pieces that the computer can easily manipulate.
\[13 = 8 + 4 + 1 = 2^3 + 2^2 + 2^0 = 1000 + 0100 + 0001 = 1101.\]

Therefore, we can rewrite
\[13 \times 127 = (8 + 4 + 1) \times 127 = \]
\[(8 \times 127) + (4 \times 127) + (1 \times 127) =\]
\[(127<<3) + (127<<2) + (127<<0)  = \]
\[1016 + 508 + 127  = 1651 \]
and we're done! No need for silly repeated additions of \(127\).

By leveraging the binary representation of our numbers, we've written an algorithm that requires one shift and one addition for each power of two that composes \(m\) or \(n\). There is a slight nuance given the way integers are represented with a fixed number of bits on our computers: in practice, we can iterate through each bit of the representation and do this in constant time. An implementation in C would look something like this:

int multiply_by_shifts_adds(int m, int n)
{
   int multiplier = MIN(m, n);
   int base = MAX(m, n);
   int result = 0;
   for (int i = 0; i < BITS_IN_INT; i++) {
      /* if bit i of the multiplier is set, add base * 2^i */
      if (0x1 << i & multiplier) result += base << i;
   }
   return result;
}

However, iterating through all of the bits is pretty wasteful. If many of the bits are 0, we should just iterate through the minimum number of bits. Then we have an algorithm that runs in \(\log(\min{(m,n)})\) time, roughly bounded above by the constant time it takes to iterate through all the bits of the integer representation. An implementation in C, using the gcc compiler, would look something like this:

int multiply_by_shifts_adds_opt(int m, int n)
{
   int multiplier = MIN(m, n);
   int base = MAX(m, n);
   int result = 0;
   if (multiplier == 0) return 0;
   int lead_zeros = __builtin_clz(multiplier);     /* builtin gcc function           */
   int highest_pow = BITS_IN_INT - lead_zeros - 1; /* index of the highest set bit   */
   for (int i = highest_pow; i >= 0; i--) {
      /* same as before, but only walk down from the highest set bit */
      if (0x1 << i & multiplier) result += base << i;
   }
   return result;
}

Full github code here.

Timing results for the three multiplier implementations discussed above. As predicted, the implementation with just adds grows linearly, the implementation with shifts and adds that works through all the bits of our integer representation runs in constant time, and the optimized implementation with shifts and adds that works through the minimum number of bits runs in logarithmic time.
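The timing code itself lives in the linked repo; as a rough sketch, a harness in C using clock() might look like this (the operand range and repetition count are arbitrary, and operands are kept small enough that the product fits in an int):

#include <stdio.h>
#include <time.h>

/* time one multiplier implementation over a range of operand sizes */
static void time_multiplier(const char *name, int (*mul)(int, int))
{
   for (int n = 1; n <= (1 << 15); n <<= 3) {
      clock_t start = clock();
      volatile int sink = 0;        /* volatile keeps the calls from being optimized away */
      for (int rep = 0; rep < 1000; rep++)
         sink = mul(n, n);          /* min(m, n) = n, so the cost tracks n */
      double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
      printf("%-28s n = %6d   %.6f s\n", name, n, secs);
   }
}

Calling this with each of multiply_by_adds, multiply_by_shifts_adds, and multiply_by_shifts_adds_opt should reproduce the qualitative trends in the figure.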

Of course, we would never implement something like this in practice at the level of C. All higher-level languages have multiplication as a basic operator, but this exercise gave us a glimpse of the power of the binary representation that allows computers to do computations quickly. Multiply away, computers!