Wednesday, September 10, 2014

egalitarian naming convention in marriage and for children


Progressive people are plagued with naming conundrums when getting married and having kids. Who takes whose name, and does that imply something about the union?

If each person has <first name> <middle name> <last name>, each spouse can take the other's last name as a replacement middle name or as a new second middle name.

A daughter's name will be
<first name> <father's last name> <mother's last name>
A son's name will be
<first name> <mother's last name> <father's last name>

The daughter and son naming conventions can be switched, but knowing the convention through generations makes it possible to trace family lineage either matrilineally or patrilineally.

As DNA is recombined with the conception of a child, so names are recombined.

Tuesday, July 15, 2014

Discretization

Discretization of a differential equation

Discretization is a key step in simulating a differential equation on a computer. There are many ways to discretize a continuous function:
  • forward Euler's method
  • backward Euler's method
  • bilinear transform
  • impulse invariance
  • and many more.
Here we'll consider Euler's method and a few variations on impulse invariance.
Consider the following differential equation:
$$\tau\dot{x}=-x+u$$
We need to approximate $x(t)$ and $\dot{x}$ in terms of $x[k]$, where $x[k]$ is the $k$th sample of $x(t)$. That is
$$x[k]=x(t_k)$$
where $t_k = k \Delta t$ for sample period $\Delta t$. We'd like to find an update equation that describes $x$ over the discrete time steps. That is, we'd like to find an equation of the form
$$ x[k] = f(x\text{ in the past}) + g(\text{input}) $$

Euler's method

Euler's method approximates the differential operator with a difference equation. This is equivalent to truncating the Taylor series expansion of $x(t)$ after the first-order term. Among the various versions of the Euler method, the most common are the forward difference and the backward difference approximations.
\begin{align} \dot{x}(t_k)& \approx \frac{x(t_{k+1})-x(t_{k})}{\Delta t} \quad \text{forward Euler} \\ \dot{x}(t_k)& \approx \frac{x(t_{k})-x(t_{k-1})}{\Delta t} \quad \text{backward Euler} \end{align}
or in discrete notation
\begin{align} \dot{x}(t_k)& \approx \frac{x[k+1]-x[k]}{\Delta t} \quad \text{forward Euler} \\ \dot{x}(t_k)& \approx \frac{x[k]-x[k-1]}{\Delta t} \quad \text{backward Euler} \end{align}

Discretization using the forward Euler's method

Substituting the forward Euler approximation into the differential equation under consideration, we have
\begin{align} \tau\frac{x[k+1]-x[k]}{\Delta t} &= -x[k] + u[k] \\ \frac{\tau}{\Delta t}x[k+1]-\frac{\tau}{\Delta t}x[k] &= -x[k] + u[k] \\ \frac{\tau}{\Delta t}x[k+1] &= \left(\frac{\tau}{\Delta t}-1\right)x[k] + u[k] \\ \frac{\tau}{\Delta t}x[k+1] &= \left(\frac{\tau-\Delta t}{\Delta t}\right)x[k] + u[k] \\ x[k+1] &= \left(\frac{\tau-\Delta t}{\tau}\right)x[k] + \frac{\Delta t}{\tau}u[k] \\ x[k+1] &= \alpha x[k] + (1-\alpha) u[k] \qquad \text{where }\alpha\triangleq\left(\frac{\tau-\Delta t}{\tau}\right) \end{align}

Discretization using the backward Euler's method

With the backward Euler approximation, we have
\begin{align} \tau\frac{x[k]-x[k-1]}{\Delta t} &= -x[k] + u[k] \\ \frac{\tau}{\Delta t}x[k]-\frac{\tau}{\Delta t}x[k-1] &= -x[k] + u[k] \\ \left(\frac{\tau}{\Delta t}+1\right)x[k] &= \frac{\tau}{\Delta t}x[k-1] + u[k] \\ \left(\frac{\tau+\Delta t}{\Delta t}\right)x[k] &= \frac{\tau}{\Delta t}x[k-1] + u[k] \\ x[k] &= \left(\frac{\tau}{\tau+\Delta t}\right)x[k-1] + \left(\frac{\Delta t}{\tau+\Delta t}\right)u[k] \\ x[k] &= \alpha x[k-1] + (1-\alpha) u[k] \qquad \text{where }\alpha\triangleq\left(\frac{\tau}{\tau+\Delta t}\right) \end{align}
Note that the update equations have the same form for both the forward and backward Euler methods: each is a weighted sum of the previous state and the input.
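
To make the two update equations concrete, here is a minimal C sketch that steps both recursions for a constant input; the values of $\tau$, $\Delta t$, and $u$ are illustrative choices of mine, not tied to any particular system.

#include <stdio.h>

int main(void)
{
   const double tau = 0.1;   /* time constant (s), illustrative */
   const double dt  = 0.01;  /* sample period (s), illustrative */
   const double u   = 1.0;   /* constant input */

   double x_fwd = 0.0;       /* forward Euler state */
   double x_bwd = 0.0;       /* backward Euler state */

   for (int k = 0; k < 50; k++) {
      /* forward Euler: x[k+1] = ((tau - dt)/tau) x[k] + (dt/tau) u[k] */
      x_fwd = ((tau - dt) / tau) * x_fwd + (dt / tau) * u;

      /* backward Euler: x[k] = (tau/(tau + dt)) x[k-1] + (dt/(tau + dt)) u[k] */
      x_bwd = (tau / (tau + dt)) * x_bwd + (dt / (tau + dt)) * u;

      printf("k=%2d  forward=%.4f  backward=%.4f\n", k + 1, x_fwd, x_bwd);
   }
   return 0;
}

Both traces settle toward 1, the steady state of the continuous system for a constant unit input.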

Impulse invariance

The basic idea of impulse invariance is to discretize a continuous system by sampling the impulse response, finding the transfer function, and then finding the corresponding update equation. That is, for continuous impulse response $h(t)$, the discrete impulse response will be given by
$$h[n]=h(n \Delta t)\Delta t.$$
Note that this formula is not consistent when a causal continuous-time impulse response has a discontinuity at $t=0$ (See here), but for now we'll assume that the correction has a negligible effect (its effect is proportional to the step size) and continue with the uncorrected version for simplicity. We'll look at the corrected version later. For a continuous transfer function with $N$ poles
$$H(s)=\sum_{k=1}^N\frac{A_k}{s-s_k},$$
the discrete transfer function will be given by
$$H(z)=\Delta t\sum_{k=1}^N\frac{A_k}{1-e^{s_k\Delta t}z^{-1}}.$$
Our differential equation $\tau \dot{x}=-x+u$ has impulse response
$$h(t) = \frac{1}{\tau}e^{-t/\tau}u(t)$$
where $u(t)$ here denotes the unit step function (not to be confused with the input $u$). Taking the Laplace transform of the differential equation gives the corresponding transfer function:
\begin{align} \tau sX(s) &= -X(s)+U(s) \\ (\tau s+1)X(s) &= U(s) \\ \frac{X(s)}{U(s)} &= \frac{1}{\tau s+1} \end{align}
\begin{align} H(s) &= \frac{X(s)}{U(s)} \\ &= \frac{1}{\tau s+1} \\ &= \frac{1/\tau}{s+1/\tau} \end{align}
This means that the discrete transfer function will be
$$H(z)=\frac{\Delta t/\tau}{1-e^{-\Delta t/\tau}z^{-1}}.$$
From the transfer function, we can derive the update equation.
\begin{align} H(z) &= \frac{X(z)}{U(z)} \\ \frac{\Delta t/\tau}{1-e^{-\Delta t/\tau}z^{-1}} &= \frac{X(z)}{U(z)} \\ \left(1-e^{-\Delta t/\tau}z^{-1}\right)X(z) &= \frac{\Delta t}{\tau}U(z) \\ X(z) &= e^{-\Delta t/\tau}z^{-1}X(z) + \frac{\Delta t}{\tau}U(z) \\ x[k] &= e^{-\Delta t/\tau}x[k-1]+\frac{\Delta t}{\tau}u[k] \end{align}
See again that the difference equation takes the same form: a weighted sum of the previous state and the input. In addition, if we use the Taylor series expansion of the exponential, we can see the connection between the impulse invariance approximation and the forward Euler method.
\begin{align} e^{-\Delta t/\tau} &= 1-\frac{\Delta t}{\tau}+ \frac{1}{2!}\left(\frac{\Delta t}{\tau}\right)^2- \frac{1}{3!}\left(\frac{\Delta t}{\tau}\right)^3+\ldots \\ &\approx 1-\frac{\Delta t}{\tau} \\ &= \frac{\tau-\Delta t}{\tau} \end{align}
Substituting the Taylor series approximation, we get $$x[k]=\frac{\tau - \Delta t}{\tau}x[k-1]+\frac{\Delta t}{\tau}u[k]$$ which is the same as the forward Euler approximation, apart from the input being sampled at step $k$ rather than $k-1$!
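
To see that connection numerically, a small C sketch (again with illustrative $\tau$ and $\Delta t$ of my own choosing) can print the impulse-invariance decay coefficient $e^{-\Delta t/\tau}$ next to the forward Euler coefficient $(\tau-\Delta t)/\tau$ and run the impulse-invariant update against a discrete impulse:

#include <stdio.h>
#include <math.h>

int main(void)
{
   const double tau = 0.1;   /* time constant (s), illustrative */
   const double dt  = 0.01;  /* sample period (s), illustrative */

   /* decay coefficients: exact vs. first-order Taylor approximation */
   printf("exp(-dt/tau) = %.6f   (tau - dt)/tau = %.6f\n",
          exp(-dt / tau), (tau - dt) / tau);

   /* impulse response of the impulse-invariant update:
      x[k] = e^{-dt/tau} x[k-1] + (dt/tau) u[k], with u[0] = 1/dt */
   double x = 0.0;
   for (int k = 0; k < 10; k++) {
      double u = (k == 0) ? 1.0 / dt : 0.0;
      x = exp(-dt / tau) * x + (dt / tau) * u;
      printf("k=%d  x[k]=%.4f\n", k, x);
   }
   return 0;
}

With $\Delta t$ ten times smaller than $\tau$, the two coefficients differ by about half a percent.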

Impulse invariance, corrected

For a continuous transfer function
$$H(s)=\sum_{k=1}^N\frac{A_k}{s-s_k},$$
the corrected impulse invariance is given by
$$ h[n] = \Delta t\left(h(n\Delta t)-\frac{1}{2}h(0)\delta[n]\right) $$
which consequently means that
$$ H(z) = \Delta t\sum_{k=1}^{N}\frac{A_k}{1-e^{s_k\Delta t}z^{-1}} - \frac{\Delta t}{2}\sum_{k=1}^{N}A_k. $$
Recall that our differential equation has a transfer function
$$ H(s)=\frac{1/\tau}{s+1/\tau}, $$
so our discrete transfer function will be given by
\begin{align} H(z) &= \frac{\Delta t/\tau}{1-e^{-\Delta t/\tau}z^{-1}} - \frac{\Delta t/\tau}{2} \\ &= \frac{\Delta t}{\tau}\left(\frac{1}{1-e^{-\Delta t/\tau}z^{-1}} - \frac{1}{2}\right) \\ &= \frac{\Delta t}{2\tau}\left(\frac{2-\left(1-e^{-\Delta t/\tau}z^{-1}\right)}{1-e^{-\Delta t/\tau}z^{-1}}\right) \\ &= \frac{\Delta t}{2\tau}\left(\frac{1+e^{-\Delta t/\tau}z^{-1}}{1-e^{-\Delta t/\tau}z^{-1}}\right) \\ \end{align}
Our update equation will then follow as
\begin{align} \frac{X(z)}{U(z)} &= H(z) \\ &= \frac{\Delta t}{2\tau}\left(\frac{1+e^{-\Delta t/\tau}z^{-1}}{1-e^{-\Delta t/\tau}z^{-1}}\right) \\ X(z)\left(1-e^{-\Delta t/\tau}z^{-1}\right) &= \frac{\Delta t}{2\tau}\left(1+e^{-\Delta t/\tau}z^{-1}\right)U(z) \\ &\Updownarrow \\ x[k]-e^{-\Delta t/\tau}x[k-1] &= \frac{\Delta t}{2\tau}u[k]+\frac{\Delta t}{2\tau}e^{-\Delta t/\tau}u[k-1] \\ x[k] &= e^{-\Delta t/\tau}x[k-1] + \frac{\Delta t}{2\tau}u[k]+\frac{\Delta t}{2\tau}e^{-\Delta t/\tau}u[k-1]. \\ \end{align}
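
The corrected update only needs the previous input sample in addition to the previous state. Here is a minimal C sketch of stepping it against a discrete impulse, again with illustrative parameters:

#include <stdio.h>
#include <math.h>

int main(void)
{
   const double tau = 0.1, dt = 0.01;   /* illustrative values */
   const double a = exp(-dt / tau);     /* pole location e^{-dt/tau} */

   double x = 0.0, u_prev = 0.0;
   for (int k = 0; k < 10; k++) {
      double u = (k == 0) ? 1.0 / dt : 0.0;   /* discrete impulse of height 1/dt */
      /* x[k] = a x[k-1] + (dt/(2 tau)) u[k] + (dt/(2 tau)) a u[k-1] */
      x = a * x + (dt / (2.0 * tau)) * u + (dt / (2.0 * tau)) * a * u_prev;
      u_prev = u;
      printf("k=%d  x[k]=%.4f\n", k, x);
   }
   return 0;
}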

Conservation of area

Sometimes we care about preserving the area of the continuous function being discretized. The area, as the integral of the function over time, often has the physical interpretation of energy. Here we look at whether the various discretization techniques preserve the area of an impulse in their impulse responses. An impulse is defined as having unit area. We typically discretize the impulse as a single-time-step pulse of height $\frac{1}{\Delta t}$ at $u[0]$, so that with a time step of $\Delta t$, the area is $\frac{1}{\Delta t}\Delta t=1$.

Forward Euler's method area

Recall that the update equation for the forward Euler's method is
$$ x[k+1] = \left(\frac{\tau-\Delta t}{\tau}\right)x[k] + \frac{\Delta t}{\tau}u[k]. $$
Given an impulse of height $\frac{1}{\Delta t}$, the impulse response will then have area
\begin{align} A &= \Delta t\left(\frac{1}{\tau} + \frac{1}{\tau}\left(\frac{\tau-\Delta t}{\tau}\right) + \frac{1}{\tau}\left(\frac{\tau-\Delta t}{\tau}\right)^2 + \ldots\right) \\ &= \frac{\Delta t}{\tau}\sum_{k=0}^{\infty}\left(\frac{\tau-\Delta t}{\tau}\right)^k \\ &= \frac{\Delta t}{\tau}\frac{1}{1-\frac{\tau-\Delta t}{\tau}} \\ &= \frac{\Delta t}{\tau}\frac{\tau}{\tau-(\tau-\Delta t)} \\ &= \frac{\Delta t}{\Delta t} \\ &= 1. \end{align}
The unit area is preserved!

Backward Euler's method area

Recall that the update equation for the backward Euler's method is
$$ x[k] = \left(\frac{\tau}{\tau+\Delta t}\right)x[k-1] + \left(\frac{\Delta t}{\tau+\Delta t}\right)u[k] $$
Given an impulse of height $\frac{1}{\Delta t}$, the impulse response will then have area
\begin{align} A &= \Delta t\left(\frac{1}{\tau+\Delta t} + \frac{1}{\tau+\Delta t}\frac{\tau}{\tau+\Delta t} + \frac{1}{\tau+\Delta t}\left(\frac{\tau}{\tau+\Delta t}\right)^2 + \ldots \right) \\ &= \frac{\Delta t}{\tau+\Delta t}\sum_{k=0}^{\infty}\left(\frac{\tau}{\tau+\Delta t}\right)^k \\ &= \frac{\Delta t}{\tau+\Delta t}\frac{1}{1-\frac{\tau}{\tau+\Delta t}} \\ &= \frac{\Delta t}{\tau+\Delta t}\frac{\tau+\Delta t}{\tau+\Delta t-\tau} \\ &= \frac{\Delta t}{\Delta t} \\ &= 1. \end{align}
The unit area is preserved!

Impulse invariance area

Recall that the update equation for impulse invariance is
$$ x[k] = e^{-\Delta t/\tau}x[k-1]+\frac{\Delta t}{\tau}u[k] $$
Given an impulse of height $\frac{1}{\Delta t}$, the impulse response will then have area
\begin{align} A &= \Delta t\left(\frac{1}{\tau} + \frac{1}{\tau}e^{-\Delta t/\tau} + \frac{1}{\tau}e^{-2\Delta t/\tau} + \ldots \right) \\ &= \frac{\Delta t}{\tau}\sum_{k=0}^{\infty}\left(e^{-\Delta t/\tau}\right)^k \\ &= \frac{\Delta t}{\tau}\frac{1}{1-e^{-\Delta t/\tau}} \\ &\neq 1. \end{align}
The unit area is not preserved... In fact, taking the Taylor series expansion, we see that the area is greater than one:
\begin{align} A &= \frac{\Delta t}{\tau}\frac{1}{1-e^{-\Delta t/\tau}} \\ &= \frac{\Delta t}{\tau}\frac{1}{1-\left(1-\frac{\Delta t}{\tau}+ \frac{\Delta t^2}{2\tau^2}- \frac{\Delta t^3}{6\tau^3}+\ldots\right)} \\ &= \frac{\Delta t}{\tau}\frac{1}{\frac{\Delta t}{\tau}- \frac{\Delta t^2}{2\tau^2}+ \frac{\Delta t^3}{6\tau^3}-\ldots} \\ &> 1 \end{align}
because
$$ \frac{\Delta t}{\tau}-\frac{\Delta t^2}{2\tau^2}+ \frac{\Delta t^3}{6\tau^3}-\ldots < \frac{\Delta t}{\tau} $$

Impulse invariance, corrected, area

Recall that the update equation for corrected impulse invariance is
$$ x[k] = e^{-\Delta t/\tau}x[k-1] + \frac{\Delta t}{2\tau}u[k]+\frac{\Delta t}{2\tau}e^{-\Delta t/\tau}u[k-1] $$
Given an impulse of height $\frac{1}{\Delta t}$, the impulse response will then have area
\begin{align} A &= \Delta t\left(\overbrace{\frac{1}{2\tau}}^{k=0} + \overbrace{\frac{1}{2\tau}e^{-\Delta t/\tau} + \frac{1}{2\tau}e^{-\Delta t/\tau}}^{k=1} + \overbrace{\left(\frac{1}{2\tau}e^{-\Delta t/\tau} + \frac{1}{2\tau}e^{-\Delta t/\tau}\right)e^{-\Delta t/\tau}}^{k=2} + \ldots \right) \\ &= \Delta t\left(\frac{1}{2\tau} + \frac{1}{\tau}e^{-\Delta t/\tau} + \frac{1}{\tau}e^{-2\Delta t/\tau} + \ldots \right) \\ &= \frac{\Delta t}{2\tau}+\frac{\Delta t}{\tau}\sum_{k=1}^{\infty}\left(e^{-\Delta t/\tau}\right)^k \\ &= \frac{\Delta t}{\tau}\left(\frac{1}{2}+\sum_{k=1}^{\infty}\left(e^{-\Delta t/\tau}\right)^k\right) \\ &= \frac{\Delta t}{\tau}\left(\frac{1}{2}+\frac{e^{-\Delta t/\tau}}{1-e^{-\Delta t/\tau}}\right) \\ &= \frac{\Delta t}{\tau}\left(\frac{1-e^{-\Delta t/\tau}+2e^{-\Delta t/\tau}}{2\left(1-e^{-\Delta t/\tau}\right)}\right) \\ &= \frac{\Delta t}{2\tau}\frac{1+e^{-\Delta t/\tau}}{1-e^{-\Delta t/\tau}} \\ &\neq 1 \end{align}
Again, the unit area is not preserved...

Area preserving impulse response discretization

There is a modification to the impulse response update equation that will preserve area. If we let
$$ x[k] = e^{-\Delta t/\tau}x[k-1]+\left(1-e^{-\Delta t/\tau}\right)u[k] $$
then the unit area of an impulse will be preserved. To prove this,
\begin{align} A &= \Delta t\left(\frac{1-e^{-\Delta t/\tau}}{\Delta t} + \frac{1-e^{-\Delta t/\tau}}{\Delta t}e^{-\Delta t/\tau} + \frac{1-e^{-\Delta t/\tau}}{\Delta t}e^{-2\Delta t/\tau} + \ldots \right) \\ &= \left(1-e^{-\Delta t/\tau}\right)\sum_{k=0}^{\infty}\left(e^{-\Delta t/\tau}\right)^k \\ &= \left(1-e^{-\Delta t/\tau}\right)\frac{1}{1-e^{-\Delta t/\tau}} \\ &= 1. \end{align}
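
All of the area results above are easy to check numerically. The C sketch below (with illustrative $\tau$ and $\Delta t$, and the sums truncated once the responses have decayed) feeds a discrete impulse of height $1/\Delta t$ into each update and accumulates $\Delta t$ times the response:

#include <stdio.h>
#include <math.h>

int main(void)
{
   const double tau = 0.1, dt = 0.01;   /* illustrative values */
   const double a = exp(-dt / tau);
   const int    N = 10000;              /* enough steps for the responses to die out */

   double x_f = 0, x_b = 0, x_ii = 0, x_c = 0, x_ap = 0, u_prev = 0;
   double A_f = 0, A_b = 0, A_ii = 0, A_c = 0, A_ap = 0;

   for (int k = 0; k < N; k++) {
      double u = (k == 0) ? 1.0 / dt : 0.0;      /* discrete impulse */

      x_f  = ((tau - dt) / tau) * x_f + (dt / tau) * u;          /* forward Euler */
      x_b  = (tau / (tau + dt)) * x_b + (dt / (tau + dt)) * u;   /* backward Euler */
      x_ii = a * x_ii + (dt / tau) * u;                          /* impulse invariance */
      x_c  = a * x_c + (dt / (2 * tau)) * (u + a * u_prev);      /* corrected imp. inv. */
      x_ap = a * x_ap + (1.0 - a) * u;                           /* area preserving */

      A_f += dt * x_f;  A_b += dt * x_b;  A_ii += dt * x_ii;
      A_c += dt * x_c;  A_ap += dt * x_ap;
      u_prev = u;
   }

   printf("forward Euler        area = %.4f\n", A_f);
   printf("backward Euler       area = %.4f\n", A_b);
   printf("impulse invariance   area = %.4f\n", A_ii);
   printf("corrected imp. inv.  area = %.4f\n", A_c);
   printf("area preserving      area = %.4f\n", A_ap);
   return 0;
}

With $\Delta t/\tau = 0.1$, the printed areas come out to 1 for the Euler and area-preserving updates, about 1.05 for plain impulse invariance, and about 1.001 for the corrected version, matching the expressions derived above.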

Wednesday, August 28, 2013

unit circle under different norms

Using the p-norm defined as
\[||x||_{p} = \left(\sum_{i=1}^{n} |x_{i}|^{p}\right)^{1/p}\]
for \(p>0\), we can look at what a unit circle is for each norm. Note that for \(0<p<1\) the formula above is not actually a norm, as it violates the triangle inequality. Regardless, let's look at what a unit circle looks like in 2D for a few of the norms:
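
The curves themselves are easy to generate. Here is a rough C sketch (a hypothetical standalone program, not the code behind the figure) that prints points on the upper half of each "unit circle" by solving \(|x_{1}|^{p}+|x_{2}|^{p}=1\) for \(x_{2}\):

#include <stdio.h>
#include <math.h>

/* upper-half point on the p-"norm" unit circle: |x1|^p + |x2|^p = 1 */
static double unit_circle_x2(double x1, double p)
{
   return pow(1.0 - pow(fabs(x1), p), 1.0 / p);
}

int main(void)
{
   const double ps[] = {0.5, 1.0, 2.0, 4.0};   /* a few example orders */
   const int n_pts = 21;

   for (size_t i = 0; i < sizeof(ps) / sizeof(ps[0]); i++) {
      printf("p = %.1f\n", ps[i]);
      for (int j = 0; j < n_pts; j++) {
         double x1 = -1.0 + 2.0 * j / (n_pts - 1);
         printf("  (% .2f, % .2f)\n", x1, unit_circle_x2(x1, ps[i]));
      }
   }
   return 0;
}

Mirroring the printed points across the x-axis gives the full curve for each value of \(p\).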
Isn't it cool how the corners of the unit circle get pushed out as the order of the norm increases! 


Friday, August 16, 2013

The importance of high spike rates


My lab uses the Neural Engineering Framework (NEF) as a principled method of implementing computations on our neuromorphic hardware. NEF uses the spike rates of neurons to represent values and calculate the weights connecting neurons. A fundamental question is then: how do we estimate the spike rate of a neuron? Real neurons can't just tell you their spike rate (although simulated ones certainly can), so you have to estimate the spike rate from the spikes that the neurons output.

A simple, reasonable first approach to estimating the spike rate would be to count spikes over a time window and divide the count by the window length. This works in the simple case where you can observe for as long as you like and the spike rate doesn't change. In that case, as you make the observation window longer and longer, the spike rate estimate gets closer to the actual spike rate.

Unfortunately, we don't have the luxury of time, and our systems work in dynamic environments where variables are constantly changing. Using a box window is also limited by quantization: a spike is either in the window or not, so your spike rate estimate will be quantized to multiples of \(1/T\), where \(T\) is the window length. As \(T\to\infty\), the estimate could take on any value, but we typically limit \(T\) to 100ms in operation, so with a box window our spike rate estimates would be quantized to multiples of 10Hz, which is not very precise.

A better way to estimate spike rates for calculating the weights to connect neurons in our network would be to see spike rates in a manner similar to how the neurons will see them. 

How does a neuron see a spike rate? Spikes interact with a neuron through the neuron's synapse, which is a low-pass filter in our neuron model. The synapse converts the spikes (impulses) into a current that is injected into the neuron's soma. It is this current that can be interpreted as what the neuron sees as the spike rate.

Therefore, to put ourselves in the perspective of the neuron when collecting spike rate data for training the network, we can use a low-pass filter, with time constant matched to the neural synapses, to estimate the spike rate instead of counting spikes over a time window. For Poisson and uniform spike inputs, this method looks something like this:
Spike rate estimation for various rates over time. Blue traces are filter estimate when input spikes are generated uniformly. Red traces are filter estimate when input spikes are generated from a Poisson process. Filter time constant is 100ms. Spike rate estimates are normalized to the actual firing rate, so the correct estimate on the y-axis is 1. Vertical dotted lines denote the first 5 time constants of time. Horizontal dotted lines denote exponential progression towards steady state value.
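
As a concrete sketch of this filtering approach (a hypothetical standalone program, separate from the code linked at the end of the post; the rate, time constant, and Bernoulli-per-bin approximation to Poisson spiking are illustrative assumptions), the estimate is just a first-order low-pass filter driven by the spike train:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
   const double rate = 100.0;    /* true firing rate (Hz), illustrative */
   const double tau  = 0.1;      /* synaptic time constant (s) */
   const double dt   = 1e-4;     /* simulation time step (s) */
   const double T    = 1.0;      /* simulation length (s) */
   const double a    = exp(-dt / tau);

   double estimate = 0.0;        /* filtered spike rate estimate (Hz) */
   srand(0);

   for (int k = 0; k < (int)(T / dt); k++) {
      /* Bernoulli approximation to a Poisson process: spike w.p. rate*dt */
      int spike = ((double)rand() / RAND_MAX) < rate * dt;

      /* first-order low-pass of the impulse train (impulses of area 1, height 1/dt) */
      estimate = a * estimate + (1.0 - a) * (spike / dt);

      if ((k + 1) % 1000 == 0)
         printf("t=%.1f s  estimate=%.1f Hz (true %.0f Hz)\n",
                (k + 1) * dt, estimate, rate);
   }
   return 0;
}

After a few time constants the printed estimate hovers around the true 100 Hz, with noticeable Poisson-like jitter.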
A few things to note from the plot:
  • As spike rates increase, our estimate gets tighter to the actual spike rate
  • A uniform spike input is understandably less noisy than a Poisson spike input.
  • It takes time for the synapse to reach its steady state spike rate estimate. Note how the traces follow the saturating exponential function indicated by the dotted line. 
Why does it make sense that as the spike rate increases, the estimate gets tighter to the actual spike rate? Consider the Poisson spike process, where the number of spikes seen in a window of time is described by a Poisson distribution, in which the mean equals the variance, \(\mu=\sigma^{2}\). As the rate of the Poisson distribution increases, it looks more and more like a Gaussian distribution. For a Gaussian distribution, about 68% of the probability is within one standard deviation of the mean. Looking at the ratio between the standard deviation and the mean,
\[\frac{\sqrt{\sigma^{2}}}{\mu}=\frac{\sqrt{\mu}}{\mu}\]
\[ = \frac{1}{\sqrt{\mu}}\]
we see that as the mean rate increases, the spread around the mean drops with the square root of the mean, which explains why the estimate gets tighter to the actual spike rate with higher rates.

It's intuitive that a Poisson spike input would be noisier than a uniform input. But how much noisier? Looking at the distribution of spike rate estimate values collected over time after allowing the system to settle (i.e., run for 5 time constants) gives us an idea of what the synapse estimates as the rate:
Distribution (i.e. histogram) of spike rate estimates with a uniform spike input. x-axis is spike rate estimate, and y-axis is normalized counts so the histogram area is 1.
Distribution (i.e. histogram) of spike rate estimates with a Poisson spike input. x-axis is spike rate estimate, and y-axis is normalized counts so the histogram area is 1.
Of note from these plots:
  • The estimate of the Poisson spike rate begins to look Gaussian as the rate increases.
  • The estimate of the uniform spike rate begins to look uniform as the rate increases. Surprise surprise!
  • The x-axis scaling gets tighter with high firing rates, reflecting that our estimate gets better and better.
Code for this post found here.

Monday, July 29, 2013

Noisy Dynamical Systems

Today we're going to play with a nonlinear dynamical system and noise. Code for singing along is posted on github. Our system is described, in general, by
\[\dot{x} = f(x),\]
which says that the rate of change of \(x\) is just some arbitrary function \(f(\cdot)\) of \(x\). In block diagram format, the system looks like this:

block diagram of system
System block diagram. \(1/s\) is an integrator.  The derivative of \(x\) is its input, and \(x\) is its output.
Specifically, let's let our nonlinear function take the form 
\[f(x) = e^{-x}-1.\]

To start analyzing this system, we'll plot the phase portrait, which shows us how \(\dot{x}\) depends on \(x\).
Phase portrait of our dynamical system
The phase portrait for our nonlinear system tells us how the system changes. We'll be running this system from several different initial conditions, indicated by the black-to-red dots. Their positions along the x-axis are their initial conditions, and their positions along the y-axis are their initial derivatives.
From the phase portrait, we can tell that the system's state, or value of \(x\), increases when the state is negative and decreases when the state is positive. Therefore, we can predict that the system will stabilize at 0.

Do we see this stable point?
System state trajectory over time for initial states scattered between -1.5 and 1.5.

Yes! Great!
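
For reference, here is a minimal forward-Euler sketch of this simulation; the step size, duration, and initial conditions are my own choices and not necessarily those used for the plots.

#include <stdio.h>
#include <math.h>

static double f(double x) { return exp(-x) - 1.0; }   /* the nonlinearity */

int main(void)
{
   const double dt = 0.01;     /* step size, illustrative */
   const int    N  = 1000;     /* number of steps (10 s of simulated time) */
   const double x0[] = {-1.5, -0.75, 0.0, 0.75, 1.5};  /* initial conditions */

   for (size_t i = 0; i < sizeof(x0) / sizeof(x0[0]); i++) {
      double x = x0[i];
      for (int k = 0; k < N; k++)
         x += dt * f(x);                 /* forward Euler step of xdot = f(x) */
      printf("x(0) = % .2f  ->  x(T) = % .4f\n", x0[i], x);
   }
   return 0;
}

Every initial condition ends up essentially at 0, the predicted stable point.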

What if we add some noise? 
\[\dot{x} = f(x) + u \]
\[u \sim \mathcal{N}(0, \sigma^{2})\]
That is, we've added Gaussian white noise of variance \(\sigma^{2}\) to our system.
Our system block diagram now looks like this: 
Block diagram of system with noise added to input of integrator.
Do we still see the system go to the same stable point?

System simulation with noise added to the input of the integrator.  Noise is  Gaussian white noise with variance indicated by \(\sigma\) above each subplot.

It looks like the system goes to the same stable point at low levels of noise, but the system gets pretty swamped with noise at higher variances, which makes it hard to tell whether it settles around the predicted stable point. Running the simulation many times and averaging across trials helps us see the long-term behavior more clearly:
Each line is the average of 25 trials.
That's better. It looks like the system settles, on average, at the predicted state for low noise levels, but at higher levels of noise, the system seems to be averaging at a higher state value. If we do the same averaging over trials but with much larger noise, we see this effect more clearly:

The system no longer averages out anywhere close to the predicted stable point without noise. The system doesn't even look like it reaches its average stable value by the end of the simulation for very high levels of noise. Why does the stable point shift?

Does the average stable point change because we are now inputting a low pass filtered version of the noise into the nonlinearity? I'm not sure of an intuitive way of understanding this yet.

Now let's tweak the system a little:

\[\dot{x} = f(x+u)\]

Now, noise is passed through the nonlinearity before integration.

Block diagram of system with noise added to the input to the nonlinearity.
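
In update-equation form, the two cases differ only in where the Gaussian sample enters each Euler step. The sketch below treats the noise as an independent draw per step, as in the discrete-time simulations here, and uses a simple Box-Muller sampler; the step size and noise level are illustrative.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double f(double x) { return exp(-x) - 1.0; }

/* standard normal sample via the Box-Muller transform */
static double gauss(void)
{
   double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);   /* avoid log(0) */
   double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
   return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

int main(void)
{
   const double dt = 0.01, sigma = 0.5;   /* illustrative step size and noise level */
   const int N = 10000;
   double x_a = 1.0, x_b = 1.0;           /* same initial condition for both cases */
   srand(0);

   for (int k = 0; k < N; k++) {
      x_a += dt * (f(x_a) + sigma * gauss());   /* noise added after the nonlinearity */
      x_b += dt * f(x_b + sigma * gauss());     /* noise added before the nonlinearity */
   }
   printf("noise into integrator:    x = %.3f\n", x_a);
   printf("noise into nonlinearity:  x = %.3f\n", x_b);
   return 0;
}

A single trial like this is noisy, but averaging many such trials should reproduce the behavior in the plots: both cases shift the settling value, and the second is far more sensitive to the noise level.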
Running the simulation for this system yields:

System simulation with noise added to the input to the nonlinearity.  Each line is a single trial, not averaged. 
Look carefully at the value that the system settles at for each level of noise. The system is less noisy, but the fixed-point position is far more sensitive to the level of noise!

Think about it this way. If you are the system and you are at a state x, what do you see? Because of the noise, you see everything around you. So if we take each point of the phase portrait and average it with our noise (i.e., convolve our system function \(f(\cdot)\) with the pdf (probability density function) of our noise), we get a better sense of the fixed points of our system:
Phase portrait of the system after convolving the system function, \(f\), with the noise distribution pdf.

Another way of analyzing the effect of noise is via the Taylor series expansion of \(f(\cdot)\). With noise, \(\dot{x}=f(x+u)\), and
\[f(x+u)=f(x)+f'(x)u+\frac{1}{2}f''(x)u^{2}+\frac{1}{3!}f^{(3)}(x)u^3+...\]
Taking the expected value,
\[E[f(x+u)]=f(x)+f'(x)E[u]+\frac{1}{2}f''(x)E[u^{2}]+\frac{1}{3!}f^{(3)}(x)E[u^3]+...\]
Since we're assuming \(u\) is symmetric about 0, the expected values of the odd powers of \(u\) are 0, so
\[E[f(x+u)]=f(x)+\frac{1}{2}f''(x)E[u^{2}]+\frac{1}{4!}f^{(4)}(x)E[u^4]+...\].
Before moving forward, we can see here why the phase portrait changes with noise. When noise enters the nonlinearity, \(\dot{x}\) is no longer just \(f(\cdot)\), but \(f(\cdot)\) summed with its even-order derivatives weighted by the even moments of \(u\). In our example case,
\[f(x)=e^{-x}-1\]
\[f'(x)=-e^{-x}\]
\[f''(x)=e^{-x}\]
\[f^{(3)}(x)=-e^{-x}\]
\[f^{(4)}(x)=e^{-x}\]
\[\vdots\]
Notice a pattern? All of the odd ordered derivatives are the same, and all of the even ordered derivatives are the same! So for our example,
\[E[f(x+u)]=e^{-x}-1+\frac{1}{2}e^{-x}E[u^{2}]+\frac{1}{4!}e^{-x}E[u^4]+...\]
Using the Gaussian even moments \(E[u^{2}]=\sigma^{2}\) and \(E[u^{4}]=3\sigma^{4}\),
\[E[f(x+u)]=e^{-x}-1+e^{-x}\left(\frac{\sigma^{2}}{2}+\frac{3\sigma^{4}}{4!}+...\right)\]

This is neat! We can analytically see that noise changes the phase portrait.
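
A quick Monte Carlo sanity check of that series (with an illustrative state \(x\) and noise level \(\sigma\), and a Box-Muller sampler) might look like:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double f(double x) { return exp(-x) - 1.0; }

/* standard normal sample via the Box-Muller transform */
static double gauss(void)
{
   double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
   double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
   return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

int main(void)
{
   const double x = 0.5, sigma = 0.3;   /* illustrative state and noise level */
   const long   N = 1000000;            /* Monte Carlo samples */
   double sum = 0.0;
   srand(0);

   for (long i = 0; i < N; i++)
      sum += f(x + sigma * gauss());

   /* series from above, truncated after the sigma^4 term */
   double series = f(x) + exp(-x) * (sigma * sigma / 2.0
                                     + 3.0 * pow(sigma, 4) / 24.0);

   printf("Monte Carlo E[f(x+u)] = %.5f\n", sum / N);
   printf("truncated series      = %.5f\n", series);
   return 0;
}

With these numbers the two printed values should agree to roughly three decimal places.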


What's different between the two noise cases?
Both noise cases change the dynamics of the system, but the system is more sensitive to noise at the input to the nonlinearity.

Keep in mind that in reality we can see both kinds of noise:
Block diagram of system with noise added to both the input of the nonlinearity and the input of the integrator.
If you can control its distribution, noise can even become another knob to tune your system!

Sunday, July 14, 2013

A multiply built with adds and bitshifts

Let's be multiplication machines! In our toolbox, let's say we only have adds and bitshifts. We're wonderful machines that can add any two numbers together just as easily as any other two numbers. It's just as easy to do \(1+1\) as it is to do \(5391 + 213492\). We're also binary thinkers and can easily move bits left and right to effectively multiply and divide by 2. Isn't life wonderful? But we don't want to just multiply by 2's. We want to be able to multiply by any two, arbitrary numbers.
How do we calculate something like \(m \times n\) for any \(m\) and \(n?\)
Or maybe since using concrete numbers is nice, let's say \(m=13\) and \(n=127\).

Going way back to when we were but incy wincy adders just learning how to multiply, we learned that multiplication is shorthand for repeated addition, so to calculate \(13 \times 127 \), we could either add 13 to itself 127 times, or add 127 to itself 13 times
\(13 \times 127 = 13 + 13 + 13 + ... \) repeat until we've added 127 13s.
or
\(13 \times 127 = 127 + 127 + 127 + ... \) repeat until we've added 13 127s.
The first way of calculating \(13 \times 127 \) takes 126 operations, and the second way takes 12 operations, so we'll prefer the second method. So now we know that \(m \times n \) can be done in \(\min{(m,n)}\) time, that is, linear in either \(m\) or \(n\), whichever is smaller. An implementation of this in c would look something like this:


int multiply_by_adds(int m, int n)
{
   int iterations = MIN(m, n);
   int base = MAX(m, n);
   int result = 0;
   for (int i=0; i < iterations; i++) {
      result += base;
   }  
   return result;
}


But can we do better?

Yes we can! We have bitshifts in our toolbox. What do we know about bitshifts? We know that shifting left is the same as multiplying by powers of two. Each shift left increases the power of two by which we multiply.

In binary,
\[127 = 001111111.\]
\[127 \times 1 = 127 = 001111111 = 127<<0\]
\[127 \times 2 = 254 = 011111110 = 127<<1\]
\[127 \times 4 = 508 = 111111100 = 127<<2\]
etc...

Conveniently, binary representation exactly expresses a number as a sum of powers of two! This representation is really cool because it decomposes our weird numbers into factors that the computer can easily manipulate.
\[13 = 8 + 4 + 1 = 2^3 + 2^2 + 2^0 = 1000 + 0100 + 0001 = 1101.\]

Therefore, we can rewrite
\[13 \times 127 = (8 + 4 + 1) \times 127 = \]
\[(8 \times 127) + (4 \times 127) + (1 \times 127) =\]
\[(127<<3) + (127<<2) + (127<<0)  = \]
\[1016 + 508 + 127  = 1651 \]
and we're done! no need for silly repeated additions of \(127\).

By leveraging the binary representation of our numbers, we've written an algorithm that requires one shift and one addition for each power of two in the binary expansion of the smaller factor. There is a slight nuance given that integers are represented with a fixed number of bits on our computers, so in practice we can iterate through each bit of the representation to do this in constant time. An implementation in c would look something like this:

int multiply_by_shifts_adds(int m, int n)
{
   int multiplier = MIN(m, n);
   int base = MAX(m, n);
   int result = 0;
   for (int i = 0; i < BITS_IN_INT; i++) {
      if (0x1 << i & multiplier) result += base << i;
   }
   return result;
}

However, iterating through all of the bits is pretty wasteful. If many of the bits are 0, we should iterate through only the minimum number of bits. Then we have an algorithm that runs in \(\log(\min{(m,n)})\) time, roughly bounded by the constant time it takes to iterate through all the bits of the integer representation. An implementation in c, using the gcc compiler, would look something like this:

int multiply_by_shifts_adds_opt(int m, int n)
{
   int multiplier = MIN(m, n);
   int base = MAX(m, n);
   int result = 0;
   if (multiplier == 0) return 0;
   int lead_zeros = __builtin_clz(multiplier); // builtin gcc function
   int highest_pow = BITS_IN_INT - lead_zeros - 1;
   for (int i = highest_pow; i >= 0 ; i--) {
      if (0x1 << i & multiplier) result += base << i;
   }
   return result;
}
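
The snippets above lean on MIN, MAX, and BITS_IN_INT without defining them. One plausible set of definitions, plus a tiny test harness for non-negative inputs, might look like the following (the macros and prototypes go above the functions in the same file):

#include <stdio.h>
#include <limits.h>

#define MIN(a, b)   ((a) < (b) ? (a) : (b))
#define MAX(a, b)   ((a) > (b) ? (a) : (b))
#define BITS_IN_INT ((int)(sizeof(int) * CHAR_BIT))

/* the three implementations from above */
int multiply_by_adds(int m, int n);
int multiply_by_shifts_adds(int m, int n);
int multiply_by_shifts_adds_opt(int m, int n);

int main(void)
{
   printf("%d\n", multiply_by_adds(13, 127));             /* 1651 */
   printf("%d\n", multiply_by_shifts_adds(13, 127));      /* 1651 */
   printf("%d\n", multiply_by_shifts_adds_opt(13, 127));  /* 1651 */
   return 0;
}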

Full github code here.

Timing results for the three multiplier implementations discussed above. As predicted, the implementation with just adds increases linearly, the implementation with shifts and adds that works through all the bits of our integer representation runs in constant time, and the optimized implementation using shifts and adds that works through the minimum number of bits runs in logarithmic time.

Of course we would never implement something like this in practice at the level of c. All higher level languages have multiplication as a basic operator, but this exercise gave us a glimpse of the power of the binary representation that allows computers to do computations quickly. Multiply away, computers!

Sunday, April 14, 2013

Mounting Stanford AFS Ubuntu

Stanford uses AFS to host filesystem space for students. Stanford also uses Kerberos for authentication. The combination of the two can be maddening to work with remotely.

These directions are for people running Ubuntu who would like to mount the Stanford AFS to access files as if they were local. Restating 1nfty's directions:
  1. Install kerberos and afs related components
    sudo apt-get install krb5-user krb5-clients openafs-krb5 openafs-client
  2. Make sure /afs exists. If not, create it.
  3. Reboot to find /afs directory filled with pointers to afs volumes all over the world.
  4. Authenticate through kerberos and obtain AFS token by sequentially running
    kinit
    aklog
  5. Make soft link of your user directory to somewhere more convenient
    ln -s /afs/ir.stanford.edu/[path_to_afs_home_directory] ~/afs
These directions did not work straight out of the box for me. I'm pretty sure Narasimham's directions were all I needed, but I'm not sure. Stanford's directions were pretty bloated, and I couldn't make much sense of them. If (when) I have to go through the procedure again, I'll document it more thoroughly and clean this post up.

Once everything is all set up, don't forget to authenticate before actually trying to access your files:
kinit
aklog