
Friday, August 16, 2013

The importance of high spike rates


My lab uses the Neural Engineering Framework (NEF) as a principled method of implementing computations on our neuromorphic hardware. The NEF uses the spike rates of neurons to represent values and to calculate the weights connecting neurons. A fundamental question is then: how do we estimate the spike rate of a neuron? Real neurons can't just tell you their spike rate (although simulated ones certainly can), so you have to estimate it from the spikes that the neurons output.

A simple, reasonable, first approach to estimating neural spike rate would be to just count spikes over a time window and divide the number of spikes by the length of the window. This works in the simple case where you can observe for as long as you like and the spike rate you're estimating doesn't change. In that case, as you make the observation window longer and longer, the spike rate estimate gets closer and closer to the actual spike rate.

Unfortunately, we don't have the luxury of time, and our systems work in dynamic environments where variables are constantly changing. Using a box window is also limited by quantization: a spike is either in the window or not, so your spike rate estimate will be quantized to multiples of \(1/T\), where \(T\) is the window length. As \(T\to\infty\), the quantization step shrinks and the estimate can take on any value, but in operation we typically limit \(T\) to 100 ms, so with a box window our spike rate estimates would be quantized to multiples of 10 Hz, which is not very precise.

A better way to estimate spike rates for calculating the weights to connect neurons in our network would be to see spike rates in a manner similar to how the neurons will see them. 

How does a neuron see a spike rate? Well, spikes interact with a neuron through the neuron's synapse, which is a low pass filter in our neuron model. The synapse converts the spikes (impulses) into a current that is injected into the neuron's soma. This current can be interpreted as the spike rate the neuron sees.

Therefore, to put ourselves in the perspective of the neuron when collecting spike rate data to train the network, we can use a low pass filter, with time constant matched to the neural synapses, to estimate the spike rate instead of counting spikes over a time window. For Poisson and uniform spike inputs, this method looks something like this:
Spike rate estimation for various rates over time. Blue traces are the filter estimate when input spikes are generated uniformly. Red traces are the filter estimate when input spikes are generated by a Poisson process. The filter time constant is 100 ms. Spike rate estimates are normalized to the actual firing rate, so the correct estimate on the y-axis is 1. Vertical dotted lines denote the first 5 time constants of time. Horizontal dotted lines denote the exponential progression toward the steady state value.
A few things to note from the plot:
  • As spike rates increase, our estimate gets tighter to the actual spike rate
  • A uniform spike input is understandably less noisy than a Poisson spike input.
  • It takes time for the synapse to reach its steady state spike rate estimate. Note how the traces follow the saturating exponential function indicated by the dotted line. 
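For concreteness, here's a minimal sketch of this kind of synaptic estimate (not the post's actual code, which is linked below). It assumes a 1 ms simulation step, a 100 ms filter time constant, and a 100 Hz input rate:

```python
import numpy as np

dt = 0.001     # simulation time step (s) -- assumed value
tau = 0.1      # synaptic (filter) time constant (s), 100 ms as in the figure
T = 1.0        # total simulated time (s)
rate = 100.0   # actual firing rate (Hz) -- example value
t = np.arange(0, T, dt)

# Uniform (regular) spike train: one spike every 1/rate seconds.
uniform_spikes = np.zeros_like(t)
uniform_spikes[np.arange(len(t)) % int(round(1 / (rate * dt))) == 0] = 1.0

# Poisson spike train: in each bin, a spike occurs with probability rate*dt.
rng = np.random.default_rng(0)
poisson_spikes = (rng.random(len(t)) < rate * dt).astype(float)

def synaptic_estimate(spikes):
    """First-order low-pass (exponential) filter of a spike train.
    Each spike is treated as a unit-area impulse, so the output has
    units of Hz and settles near the true rate after ~5 time constants."""
    est = np.zeros_like(spikes)
    decay = np.exp(-dt / tau)
    for i in range(1, len(spikes)):
        est[i] = decay * est[i - 1] + (1 - decay) * spikes[i] / dt
    return est

uniform_est = synaptic_estimate(uniform_spikes)
poisson_est = synaptic_estimate(poisson_spikes)

# Compare the estimates after the filter has settled (t > 5*tau).
settled = t > 5 * tau
print("uniform:", uniform_est[settled].mean(), "+/-", uniform_est[settled].std())
print("poisson:", poisson_est[settled].mean(), "+/-", poisson_est[settled].std())
```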
Why does it make sense that as spike rate increases, the estimate gets tighter to the actual spike rate? Consider the Poisson spike process, where the number of spikes seen in a window of time is described by a Poisson distribution, in which the mean equals the variance, \(\mu=\sigma^{2}.\) As the rate of the Poisson distribution increases, it looks more and more like a Gaussian distribution. For a Gaussian distribution, about 68% of the probability is within 1 standard deviation of the mean. Looking at the ratio between the standard deviation and the mean,
\[\frac{\sqrt{\sigma^{2}}}{\mu}=\frac{\sqrt{\mu}}{\mu}=\frac{1}{\sqrt{\mu}},\]
we see that the relative spread around the mean falls off as \(1/\sqrt{\mu}\), which explains why the estimate gets tighter to the actual spike rate at higher rates.
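A quick numerical check of this \(1/\sqrt{\mu}\) scaling (a sketch with arbitrary example means):

```python
import numpy as np

rng = np.random.default_rng(0)
for mu in [1, 10, 100, 1000]:               # mean spike count in the observation window
    counts = rng.poisson(mu, size=100_000)  # simulated window spike counts
    print(f"mu={mu:5d}  std/mean={counts.std() / counts.mean():.3f}"
          f"  1/sqrt(mu)={1 / np.sqrt(mu):.3f}")
```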

It's intuitive that a Poisson spike input would be more noisy than a uniform input. But how much more noisy? Looking at the distribution of spike rate estimate values collected over time after allowing the system to settle (i.e. run for 5 time constants) gives us an idea of what the synapse estimates as the rate:
Distribution (i.e. histogram) of spike rate estimates with a uniform spike input. The x-axis is the spike rate estimate, and the y-axis is normalized so the histogram's area is 1.
Distribution (i.e. histogram) of spike rate estimates with a Poisson spike input. The x-axis is the spike rate estimate, and the y-axis is normalized so the histogram's area is 1.
Of note from these plots:
  • The estimate of the Poisson spike rate begins to look Gaussian as the rate increases.
  • The estimate of the uniform spike rate begins to look uniform as the rate increases. Surprise surprise!
  • The x-axis scaling gets tighter with high firing rates, reflecting that our estimate gets better and better.
Code for this post found here.

Monday, July 29, 2013

Noisy Dynamical Systems

Today we're going to play with a nonlinear dynamical system and noise. Code for singing along is posted on github. Our system is described, in general, by
\[\dot{x} = f(x),\]
which says that the rate of change of \(x\) is just some arbitrary function \(f(\cdot)\) of \(x\). In block diagram format, the system looks like this:

System block diagram. \(1/s\) is an integrator.  The derivative of \(x\) is its input, and \(x\) is its output.
Specifically, let's let our nonlinear function take the form 
\[f(x) = e^{-x}-1.\]

To start analyzing this system, we'll plot the phase portrait, which shows us how \(\dot{x}\) depends on \(x\).
The phase portrait for our nonlinear system tells us how the system changes. We'll be running this system from several different initial conditions indicated with the black to red  dots. Their positions along the x-axis are their initial condition, and their positions along the y-axis are their initial derivatives. 
From the phase portrait, we can tell that the system's state, or value of \(x\), increases when the state is negative and decreases when the state is positive. Therefore, we can predict that the system will stabilize at 0.
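A minimal forward-Euler sketch of this simulation (the 1 ms step, 2 s run, and seven initial conditions are arbitrary choices here; the post's actual code is on GitHub):

```python
import numpy as np

def f(x):
    return np.exp(-x) - 1.0

dt = 0.001                              # integration step (s)
t = np.arange(0, 2.0, dt)               # 2 s simulation
x0s = np.linspace(-1.5, 1.5, 7)         # initial conditions

trajectories = np.zeros((len(x0s), len(t)))
trajectories[:, 0] = x0s
for i in range(1, len(t)):
    x = trajectories[:, i - 1]
    trajectories[:, i] = x + dt * f(x)  # forward-Euler update

print(trajectories[:, -1])              # all trajectories end near the fixed point at 0
```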

Do we see this stable point?
System state trajectory over time for initial states scattered between -1.5 and 1.5.

Yes! Great!

What if we add some noise? 
\[\dot{x} = f(x) + u, \qquad u \sim \mathcal{N}(0, \sigma^{2})\]
That is, we've added Gaussian white noise of variance \(\sigma^{2}\) to our system.
Our system block diagram now looks like this: 
Block diagram of system with noise added to input of integrator.
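In simulation terms, this just means adding a Gaussian draw to each Euler update. Here's a self-contained sketch, assuming a fresh draw of \(u\) every time step (a naive discretization whose effective noise power depends on the step size); the flag for the second noise case discussed below is included so the function can be reused later:

```python
import numpy as np

def f(x):
    return np.exp(-x) - 1.0

def simulate(x0, sigma, noise_into_f=False, dt=0.001, T=2.0, seed=0):
    """Forward-Euler simulation of the noisy system.
    noise_into_f=False: dx/dt = f(x) + u   (noise added at the integrator's input)
    noise_into_f=True:  dx/dt = f(x + u)   (noise passed through the nonlinearity)
    u is drawn fresh from N(0, sigma^2) at every time step."""
    rng = np.random.default_rng(seed)
    x = np.empty(int(T / dt))
    x[0] = x0
    for i in range(1, len(x)):
        u = sigma * rng.standard_normal()
        if noise_into_f:
            x[i] = x[i - 1] + dt * f(x[i - 1] + u)
        else:
            x[i] = x[i - 1] + dt * (f(x[i - 1]) + u)
    return x

# Single trials with noise at the integrator's input, for a few noise levels:
for sigma in [0.1, 0.5, 2.0]:
    x = simulate(x0=1.0, sigma=sigma)
    print(sigma, x[-1])   # final state for each noise level
```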
Do we still see the system go to the same stable point?

System simulation with noise added to the input of the integrator.  Noise is  Gaussian white noise with variance indicated by \(\sigma\) above each subplot.

It looks like the system goes to the same stable point at low levels of noise, but the system gets pretty swamped with noise at higher variances, which makes it hard to tell if the system settles around the predicted stable point. Running the simulation many times and averaging across trials helps us see the long-term behavior more clearly:
Each line is the average of 25 trials.
That's better. It looks like the system settles, on average, at the predicted state for low noise levels, but at higher levels of noise, the system seems to be averaging at a higher state value. If we do the same averaging over trials but with much larger noise, we see this effect more clearly:

The system no longer averages out anywhere close to the stable point predicted without noise. For very high levels of noise, the system doesn't even appear to reach its average stable value by the end of the simulation. Why does the stable point shift?

Does the average stable point change because we are now inputting a low pass filtered version of the noise into the nonlinearity? I'm not sure of an intuitive way of understanding this yet.

Now let's tweak the system a little:

\[\dot{x} = f(x+u)\]

Now, noise is passed through the nonlinearity before integration.

Our system block diagram now looks like this:
Block diagram of system with noise added to the input to the nonlinearity.
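With the simulate sketch from above, this case is just the noise_into_f=True variant:

```python
# Single trials with noise passed through the nonlinearity before integration:
for sigma in [0.1, 0.5, 2.0]:
    x = simulate(x0=1.0, sigma=sigma, noise_into_f=True)
    print(sigma, x[-1])   # the settling value shifts away from 0 as sigma grows
```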
Running the simulation for this system yields:

System simulation with noise added to the input to the nonlinearity.  Each line is a single trial, not averaged. 
Look carefully at the value that the system settles at for each level of noise. The system is less noisy, but the fixed point's position is far more sensitive to the level of noise!

Think about it this way: if you are the system sitting at state \(x\), what do you see? Because of the noise, you see a whole neighborhood of states around you. So if we average each point of the phase portrait over our noise (i.e. convolve our system function \(f(\cdot)\) with the pdf (probability density function) of our noise), we get a better sense of the fixed points of our system:
Phase portrait of the system after convolving the system function, \(f\), with the noise distribution pdf.
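A minimal sketch of that smoothing (the grid spacing, range, and example \(\sigma\) are arbitrary choices):

```python
import numpy as np

def f(x):
    return np.exp(-x) - 1.0

sigma = 1.0                                    # noise standard deviation (example value)
dx = 0.01
x = np.arange(-6, 6, dx)                       # state grid
u = np.arange(-4 * sigma, 4 * sigma + dx, dx)  # noise grid covering +/- 4 sigma
pdf = np.exp(-u**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Effective phase portrait: E[f(x+u)] is the convolution of f with the noise pdf.
f_smoothed = np.convolve(f(x), pdf * dx, mode="same")

# Find the zero crossing of the smoothed f away from the grid edges,
# where the convolution's implicit zero padding distorts the result.
interior = (x > -2) & (x < 2)
print(x[interior][np.argmin(np.abs(f_smoothed[interior]))])
```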

Another way of analyzing the effect of noise is via the Taylor series expansion of \(f(\cdot)\). With noise, \(\dot{x}=f(x+u)\), and
\[f(x+u)=f(x)+f'(x)u+\frac{1}{2}f''(x)u^{2}+\frac{1}{3!}f^{(3)}(x)u^3+...\]
Taking the expected value,
\[E[f(x+u)]=f(x)+f'(x)E[u]+\frac{1}{2}f''(x)E[u^{2}]+\frac{1}{3!}f^{(3)}(x)E[u^3]+...\]
Since we're assuming \(u\) is symmetric about 0, the expected values of the odd powers of \(u\) are also 0, so
\[E[f(x+u)]=f(x)+\frac{1}{2}f''(x)E[u^{2}]+\frac{1}{4!}f^{(4)}(x)E[u^4]+...\].
Before moving forward, we can see here why the phase portrait changes with noise. When noise enters the nonlinearity, \(\dot{x}\) is no longer just \(f(\cdot)\), but \(f(\cdot)\) summed with its derivatives weighted by the higher-order moments of \(u\). In our example case,
\[f(x)=e^{-x}-1\]
\[f'(x)=-e^{-x}\]
\[f''(x)=e^{-x}\]
\[f^{(3)}(x)=-e^{-x}\]
\[f^{(4)}(x)=e^{-x}\]
\[\vdots\]
Notice a pattern? All of the odd ordered derivatives are the same, and all of the even ordered derivatives are the same! So for our example,
\[E[f(x+u)]=e^{-x}-1+\frac{1}{2}e^{-x}E[u^{2}]+\frac{1}{4!}e^{-x}E[u^4]+...\]
\[E[f(x+u)]=e^{-x}-1+e^{-x}\left(\frac{\sigma^{2}}{2}+\frac{3\sigma^{4}}{4!}+...\right)\]
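For Gaussian \(u\), the even moments are \(E[u^{2k}]=(2k-1)!!\,\sigma^{2k}\), so the parenthesized series actually has a closed form:
\[E[f(x+u)]=e^{-x}-1+e^{-x}\sum_{k=1}^{\infty}\frac{(2k-1)!!\,\sigma^{2k}}{(2k)!}=e^{-x}-1+e^{-x}\left(e^{\sigma^{2}/2}-1\right)=e^{\sigma^{2}/2-x}-1,\]
which crosses zero at \(x=\sigma^{2}/2\), so the effective fixed point moves up linearly with the noise variance.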

This is neat! We can analytically see that noise changes the phase portrait.


What's different between the two noise cases?
Both noise cases change the dynamics of the system, but the system is more sensitive to noise entering the nonlinearity.

Keep in mind that in reality we may well see both kinds of noise at once:
Block diagram of the system with noise added at both points.
If you can control its distribution, noise can even become another knob to tune your system!