Sunday, August 19, 2012

Partial Derivative Logistic Regression Cost Function

Logistic regression is used for classification problems.  As Andrew said, it's a bit confusing given the "regression" in the name.

The logistic regression cost function for a single training example is given by:
\[
\text{Cost} (h_\theta (x),y) =
\begin{cases}
 -\log(h_\theta (x)) \quad &\text{if } y=1 \\
 -\log(1-h_\theta (x)) \quad &\text{if } y=0
\end{cases}
\]

where \(h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}\) is the logistic function.

Since \(y \in \{0,1\}\) only, we can reduce the cost function to an equivalent, single equation.
\[
\text{Cost} (h_\theta (x),y) =  -y\log(h_\theta (x)) - (1-y)\log(1-h_\theta (x))
\]
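
Checking the two cases confirms the equivalence: when \(y=1\) the second term vanishes and the cost is \(-\log(h_\theta(x))\); when \(y=0\) the first term vanishes and the cost is \(-\log(1-h_\theta(x))\).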

This leads to the overall cost function for the logistic regression:
\[
J(\theta) = -\frac{1}{m} \left[\sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta (x^{(i)}))\right]
\]
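
For reference, here is a minimal NumPy sketch of the hypothesis and this cost function. The names X, y, and theta are illustrative: X is assumed to be an m-by-n design matrix with an intercept column already included, y a 0/1 label vector, and theta the parameter vector.

import numpy as np

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    m = len(y)
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every training example
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m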

Our goal is to find \(\min_\theta J(\theta)\). To do so, we use gradient descent, but we first need to find the partial derivatives \(\frac{\partial}{\partial \theta_j} J(\theta)\).

We're going to make use of a neat property of the logistic function:
\begin{align}
g'(z) &= \frac{d}{dz} \frac{1}{1+e^{-z}} = \frac{1}{(1+e^{-z})^2}e^{-z} \\
 &= \frac{1+e^{-z}-1}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}-\frac{1}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}(1-\frac{1}{1+e^{-z}}) \\
 &= g(z) (1-g(z))
\end{align}
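
Combined with the chain rule, and noting that \(\frac{\partial}{\partial \theta_j}\theta^Tx = x_j\), this gives the derivative of the hypothesis with respect to a single parameter:
\[
\frac{\partial}{\partial \theta_j} h_\theta(x) = h_\theta(x)\left(1-h_\theta(x)\right)x_j
\]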

So, applying this to our cost function:
\begin{align}
\frac{\partial}{\partial \theta_j} J(\theta) &= -\frac{1}{m} \left [\frac{\partial}{\partial \theta_j} \sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta (x^{(i)})) \right] \\
 &= -\frac{1}{m} \left [ \sum_{i=1}^m y^{(i)}\frac{1}{h_\theta(x^{(i)})}\frac{\partial}{\partial \theta_j}h_\theta(x^{(i)}) + (1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})}\left (-\frac{\partial}{\partial \theta_j}h_\theta (x^{(i)})\right) \right]
\end{align}


Using the chain rule and the derivative of the logistic function from above, we see that
\begin{align}
\frac{\partial}{\partial \theta_j} J(\theta) &=-\frac{1}{m} \left [ \sum_{i=1}^m y^{(i)}\frac{x_j^{(i)}}{h_\theta(x^{(i)})}h_\theta(x^{(i)})(1-h_\theta(x^{(i)})) - (1-y^{(i)})\frac{x_j^{(i)}}{1-h_\theta(x^{(i)})}h_\theta (x^{(i)})(1-h_\theta(x^{(i)})) \right] \\
&=  -\frac{1}{m} \left [ \sum_{i=1}^m y^{(i)}x_j^{(i)}(1-h_\theta(x^{(i)})) - (1-y^{(i)})x_j^{(i)}h_\theta (x^{(i)}) \right] \\
&=  -\frac{1}{m} \left [ \sum_{i=1}^m y^{(i)}x_j^{(i)} - x_j^{(i)}h_\theta (x^{(i)}) \right] \\
\frac{\partial}{\partial \theta_j} J(\theta) &=  \frac{1}{m} \sum_{i=1}^m  (h_\theta (x^{(i)}) -  y^{(i)}) x_j^{(i)}
\end{align}
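
Stacking the examples, with \(X\) denoting the \(m \times n\) design matrix whose rows are the \((x^{(i)})^T\) and \(y\) the label vector (notation not used above, introduced here only for convenience), the whole gradient can be written as
\[
\nabla_\theta J(\theta) = \frac{1}{m} X^T\left(g(X\theta) - y\right)
\]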

This formula can now be used in gradient descent.
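
Each gradient descent step updates every \(\theta_j\) simultaneously with some learning rate \(\alpha\):
\[
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m  (h_\theta (x^{(i)}) -  y^{(i)}) x_j^{(i)}
\]

Here is a minimal NumPy sketch of that update loop. As before, X, y, and theta are illustrative names, with X assumed to include an intercept column; alpha and num_iters are placeholder hyperparameters.

import numpy as np

def gradient_descent(X, y, theta, alpha=0.1, num_iters=1000):
    # Batch gradient descent for logistic regression.
    # Each iteration applies, for every j at once:
    #   theta_j := theta_j - alpha * (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
    m = len(y)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x^(i)) for all examples
        grad = X.T @ (h - y) / m                 # vector of partial derivatives dJ/dtheta_j
        theta = theta - alpha * grad
    return theta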


Saturday, August 4, 2012

Cleaning Up Unwanted Data Files from a Git Repository

I recently made the mistake of merging unwanted data files into the lab git repository, and unfortunately these files weren't discovered until after they had also been pushed to the remote repository.

To clean up the git repository, I used the git filter-branch command, force pushed the changes to the remote, and then had collaborators rebase their local copies of the contaminated branch.

Using git filter-branch:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch [filename]' -- [commit to begin with]^..
rm -Rf .git/refs/original   # remove the backup refs that filter-branch leaves behind
rm -Rf .git/logs            # remove the reflogs, which would otherwise keep the old commits reachable
git gc                      # garbage-collect the now-unreferenced objects

Force pushing the changes:
git push --force origin [branch name]

Have collaborators rebase their local branches:
git fetch (NOT PULL!!!!!)
git rebase --onto origin/[branch] [branch] [local branch]
(run the rebase for [branch] itself and for every local branch derived from it)

The last command checks out [local branch], takes the commits that are in [local branch] but not in [branch], and replays them on top of origin/[branch], so the local branch ends up based on the rewritten history.

In addition, I learned how to remove files matching a wildcard pattern through all subdirectories -- you need to use the escape character \

e.g., to git rm all .txt files from a directory and all of its subdirectories:
git rm ./\*.txt
Without the escape character, the * wildcard is expanded by the shell and only matches files in the current directory. With the escape character, git rm interprets the *.txt pattern itself and also matches files in subdirectories.