Now, when did I send that patch again?

Let's say you've got a customer with some code that was pulled from your source repository at some arbitrary point in the past, and now you want to give them a few bugfixes.

So, you just pull out your SCM toolbox, and diff from the release point to now, and send them a patch, right? Waitaminute.... what if you don't have any record of when you gave them a code drop? Uh-oh.

If we had been thinking ahead, we would have tagged our tree when we gave them a code drop. But we didn't. Thankfully, we've moved to git, which lets us find a way out of this mess...

First, grab that file you were looking to update, and pull it down to a system that contains your git tree. Then, run:

$ git hash-object your-file
9846d4480ae4a6fc6a2d3685d752b3c7c7f4d64c

This gives you the git hash for that file. Now, we can search our git tree for it, and find where that particular hash exists. Git didn't provide any direct support for doing this when I wrote this (newer releases have git log --find-object, which gets you part of the way), but we can script the git tools to do it easily enough.
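In case the hash looks like magic: for an ordinary SHA-1 repository, it is the SHA-1 of a small header ("blob", the size in bytes, a NUL byte) followed by the file contents. Here's a minimal sketch that reproduces it without git. The function name blob_hash is made up for illustration, and it assumes sha1sum from coreutils is installed and that no content filters (such as line-ending conversion) apply to the file.

```shell
#!/bin/sh
# blob_hash FILE: recompute the blob ID by hand (hypothetical helper).
# Assumes a SHA-1 repository and that sha1sum is available.
blob_hash() {
    # Size of the file in bytes; tr strips padding some wc versions add.
    size=$(wc -c < "$1" | tr -d ' ')
    # Header "blob <size>" + NUL byte, then the raw file contents.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Running blob_hash your-file should print the same ID as git hash-object your-file under those assumptions.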

The first thing we need to do is fetch a list of candidate commits. We do this with git rev-list like so:

$ git rev-list --since="6 months ago" HEAD
d1691ca727347c82f736d6c8d7b73a583960f437
4b401430b1554a86e8ed30badbdf3ef976a3515a
...  ...  ...

This gives us the IDs of all commits reachable from HEAD over the last 6 months. From each of these commit objects, we can extract the tree of files with:

$ git cat-file commit d1691ca727347c82f736d6c8d7b73a583960f437
tree 09c2cb6c41011cf461a34d886cfd07a877b4be53
parent 4b8309de2751998f5fb57ec3e0bfe6c8069d09d6
author Dave O'Neill <dmo@roaringpenguin.com> 1188317338 -0400
committer Dave O'Neill <dmo@roaringpenguin.com> 1188317338 -0400

Some big long commit message was here.

All we care about at this point is the 'tree' line. This gives us a starting point to list the contents of our tree object -- the files (or "blobs" in gitspeak). We do that with this command:
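If you'd rather not pick the 'tree' line out of the commit text, git rev-parse can resolve a commit to its tree using the ^{tree} suffix. A small sketch, where tree_of is a hypothetical helper name and not anything git ships:

```shell
#!/bin/sh
# tree_of COMMIT: print the ID of the tree object a commit points at.
# (hypothetical helper; run it from inside the repository)
tree_of() {
    # "^{tree}" asks git to peel the commit down to its tree;
    # --verify makes rev-parse fail cleanly on a bad argument.
    git rev-parse --verify "$1^{tree}"
}
```

For the commit above, tree_of d1691ca727347c82f736d6c8d7b73a583960f437 should print the same ID that appears on its 'tree' line.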

$ git ls-tree -r 09c2cb6c41011cf461a34d886cfd07a877b4be53
100644 blob 2e6e14fdeca600263dcd92ccf7535da56b8b3c5e    .gitignore
100644 blob 177974045858e4e19c43f57199b1c30c581a739f    MANIFEST
100644 blob b3c1ddf8b30721c297d3243112d2cc79796750b4    Makefile.PL
... ... ...

Now, we just need to loop over each 'blob' line in the output, and compare its hash against the one we obtained from git hash-object above to find the matching file.
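For a single commit, that comparison can be sketched with awk instead of a shell loop. The name paths_with_blob is made up for illustration. It relies on git ls-tree separating the path from the other fields with a tab, and it prints unusual filenames in whatever quoted form git emits them.

```shell
#!/bin/sh
# paths_with_blob COMMIT HASH: print every path in COMMIT whose blob ID
# equals HASH. (hypothetical helper; run it from inside the repository)
paths_with_blob() {
    git ls-tree -r "$1" | awk -F'\t' -v want="$2" '
        # $1 is "<mode> <type> <hash>", $2 is the path.
        { split($1, meta, " ") }
        meta[2] == "blob" && meta[3] == want { print $2 }'
}
```

Calling paths_with_blob with a commit ID and the hash from git hash-object lists the matching files in that one commit, or nothing if there are none.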

Rolling it all together gives us a nice shell script that takes a filename and gives you back the IDs of the commits containing that exact version of the file:

#!/bin/sh

filename=$1

# Blob ID of the file we got back from the customer
want=$(git hash-object "$filename")

git rev-list --since="6 months ago" HEAD | while read -r commit ; do
    # Take the tree ID from the commit header only, never from the message
    treeish=$(git cat-file commit "$commit" | awk '$1 == "tree" {print $2; exit}')
    git ls-tree -r "$treeish" | while read -r perm type hash path; do
        if test "$want" = "$hash"; then
            echo "matched $path in commit $commit"
        fi
    done
done

Of course, much more could be done to make that script nicer and more friendly, like outputting more information about the commit (nearest tag, for example, or the commit message), but it's not necessary for my purposes.
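If you did want the friendlier output, here's a rough sketch of what could replace the echo line. The helper name describe_commit is hypothetical. It uses git describe to name the commit relative to the nearest earlier tag, falling back to an abbreviated hash when the repository has no tags, and git log to fetch the first line of the commit message.

```shell
#!/bin/sh
# describe_commit COMMIT: print "<commit> (<nearest tag>) <subject>".
# (hypothetical helper; run it from inside the repository)
describe_commit() {
    # Nearest tag reachable from the commit; --always falls back to a
    # short hash if there are no tags at all.
    tag=$(git describe --tags --always "$1")
    # First line of the commit message.
    subject=$(git log -1 --format=%s "$1")
    printf '%s (%s) %s\n' "$1" "$tag" "$subject"
}
```

Inside the loop you would call describe_commit "$commit" whenever the hashes match.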

Update 1: Thanks to Bart for helping me figure out how to do this.

Update 2: It's been pointed out to me by mountie that this can be made even simpler by removing the git cat-file call, as git ls-tree takes anything tree-ish, including a commit. So... the revised version:

#!/bin/sh

filename=$1

# Blob ID of the file we got back from the customer
want=$(git hash-object "$filename")

git rev-list --since="6 months ago" HEAD | while read -r commit ; do
    # git ls-tree accepts the commit directly, no tree lookup needed
    git ls-tree -r "$commit" | while read -r perm type hash path; do
        if test "$want" = "$hash"; then
            echo "matched $path in commit $commit"
        fi
    done
done
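
For what it's worth, Git releases from 2.16 onward have a --find-object option on git log that covers part of this job. It is not a drop-in replacement for the script: as far as I understand it, it reports the commits where the blob appeared or disappeared, not every commit whose tree contains it. A hedged sketch, with commits_touching_blob as a made-up name:

```shell
#!/bin/sh
# commits_touching_blob HASH: list commits on any branch that added or
# removed the given blob. (hypothetical helper; assumes Git 2.16 or newer)
commits_touching_blob() {
    # One line per commit: full ID, author date (YYYY-MM-DD), subject.
    git log --all --find-object="$1" --format='%H %ad %s' --date=short
}
```

You would feed it the output of git hash-object your-file. The oldest commit it lists is where that version of the file first showed up.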