Saturday, August 4, 2012

cleaning up unwanted data files from a git repository

I recently made the mistake of merging unwanted data files into the lab git repository, and unfortunately these files weren't discovered until after they had also been pushed to the remote repository.

To clean up the git repository, I used the git filter-branch command, force pushed the changes to the remote, and then had collaborators rebase their local copies of the contaminated branch.

Using git filter-branch:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch [filename]' -- [commit to begin with]^..
rm -Rf .git/refs/original
rm -Rf .git/logs
git gc
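
To check the effect before touching a shared repository, here is a minimal sketch that runs the filter-branch step in a throwaway repository. The file names big.dat and notes.txt are hypothetical, it rewrites everything reachable from HEAD rather than starting at a particular commit, and it assumes git is installed. FILTER_BRANCH_SQUELCH_WARNING is only there to quiet the warning that newer git versions print.

```shell
#!/bin/sh
# Sketch: purge a file from history in a scratch repository.
# big.dat and notes.txt are hypothetical names used for illustration.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # quiets the warning in newer git versions

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "keep me" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "unwanted" > big.dat
git add big.dat
git commit -q -m "add data file by mistake"

echo "more notes" >> notes.txt
git commit -q -am "edit notes"

# Rewrite every commit reachable from HEAD, dropping big.dat from each index.
git filter-branch -f --index-filter \
    'git rm --cached --ignore-unmatch big.dat' -- HEAD >/dev/null 2>&1

# Count the commits on the rewritten branch that still touch big.dat.
remaining=$(git log --oneline -- big.dat | wc -l | tr -d ' ')
echo "commits touching big.dat: $remaining"
```

The old commits are still reachable through refs/original and the reflog at this point, which is why the rm and git gc steps above follow the rewrite.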

Force pushing the changes:
git push --force origin [branch name]

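Have collaborators rebase their local branches:
git fetch (NOT PULL!!!!!)
git rebase --onto origin/[branch] [branch] [local branch]
Here [local branch] is [branch] itself or anything derived from [branch]. Rebase the derived branches first and [branch] last, because the command relies on the local [branch] still pointing at the old history.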

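The last command checks out [local branch] and resets it to origin/[branch].  It then takes the commits that are in [local branch] but not in [branch] and reapplies them on top.  When [local branch] is [branch] itself there are no such commits, so the result is the same as a hard reset to origin/[branch].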
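
The rebase can be tried out in a scratch repository first. In this rough sketch, local branches stand in for the real ones: clean plays the role of origin/[branch], old plays the stale local [branch], and feature is a branch derived from it. All of these names, and the file names, are hypothetical.

```shell
#!/bin/sh
# Sketch: move a derived branch off a contaminated history with rebase --onto.
# Branch names (old, clean, feature) and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "notes" > notes.txt
git add notes.txt
git commit -q -m "add notes"
git branch -M old                  # the contaminated history will live here

# 'clean' stands in for origin/[branch]: the same work without the data file.
git checkout -q -b clean
echo "more notes" >> notes.txt
git commit -q -am "edit notes"

# 'old' stands in for the stale local [branch].
git checkout -q old
echo "unwanted" > big.dat
git add big.dat
git commit -q -m "add data file by mistake"
echo "more notes" >> notes.txt
git commit -q -am "edit notes"

# 'feature' is a local branch derived from the stale branch.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Replay only the commits in old..feature on top of the clean history.
git rebase -q --onto clean old feature

commits=$(git rev-list --count feature)
touching=$(git log --oneline feature -- big.dat | wc -l | tr -d ' ')
echo "feature has $commits commits, $touching of them touch big.dat"
```

Only the single "feature work" commit is replayed, so feature should end up with three commits and none that touch big.dat.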

In addition, I learned how to use wildcards (shell-style patterns, not regex) to match files through subdirectories: you need to escape the wildcard with the escape character \

e.g. to git rm all .txt files from a directory and all its subdirectories:
git rm ./\*.txt
Without the escape character, the * wildcard is expanded by the shell, which only matches files in the current directory.  With the escape character, git rm receives the pattern itself and matches *.txt through the subdirectories as well.
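
The difference can be seen in a scratch repository. This is a small sketch with hypothetical file names; it counts what the shell expands the unescaped pattern to, then lets git rm interpret the escaped one.

```shell
#!/bin/sh
# Sketch: shell expansion vs. git's own wildcard matching.
# All file and directory names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir -p sub/deep
echo a > a.txt
echo b > sub/b.txt
echo c > sub/deep/c.txt
echo k > keep.md
git add .
git commit -q -m "add files"

# Unescaped: the shell expands the pattern before any command sees it,
# and it only looks in the current directory.
set -- ./*.txt
shell_matches=$#
echo "shell expands ./*.txt to $shell_matches path(s): $*"

# Escaped: git rm receives the literal pattern and matches it itself,
# including files in subdirectories.
git rm -q ./\*.txt
echo "still tracked after git rm:"
git ls-files
```

Quoting the pattern ('./*.txt') has the same effect as the backslash, since either way the shell passes the * through untouched.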
