Thursday, August 1, 2013

Importing lots of files into git-annex

If, like me, you find git-annex very appealing but have been managing your files without it for a while, here is a little script that may help with the transition. I've just converted my MP3 library to git-annex, but now I must convert all the copies of the library on my other computers so that they track one (or more) of them. The trick is to teach git-annex about the files that already exist, since re-downloading a 20 GB library through an 80 KB/s DSL upload link (about three days of continuous transfer) doesn't sound like much fun. Here is how I did it.

First, convert one copy of the library into a git-annex repository. Then, on another computer, move the directory containing the files out of the way to a backup location, and git clone the first repository in its place. Then, from inside the newly created clone, run:


~ ./import path/to/old/backup/copy

where import is the following script, saved as an executable file (chmod +x import):


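#! /bin/bash
# Usage: ./import path/to/old/backup/copy
# Run from the top level of the freshly cloned git-annex repository.

src="$1"
if [ ! -d "$src" ]; then
    echo "usage: $0 path/to/old/backup/copy" >&2
    exit 1
fi

# Annexed files are symlinks into .git/annex/objects. Walk every symlink
# outside .git; NUL-separated names survive spaces, newlines and globs.
find . -path ./.git -prune -o -type l -print0 |
while IFS= read -r -d '' f; do
    if [ -e "$f" ]; then
        echo "File $f ok"       # link target exists: content already present
        continue
    fi
    link=$(readlink "$f")
    case "$link" in
        *.git/annex/objects/*) ;;
        *) continue ;;          # not a git-annex symlink, leave it alone
    esac
    # Turn ../../.git/annex/objects/... into .git/annex/objects/...,
    # i.e. a path relative to the repository root (the current directory).
    tg=".git/annex/${link#*.git/annex/}"
    if [ -r "$src/$f" ]; then
        mkdir -p "$(dirname "$tg")"
        # -l makes hard links instead of copies, so the backup and the
        # repository must be on the same filesystem; drop -l to copy.
        cp -avl "$src/$f" "$tg"
    fi
done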
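The one non-obvious step in the script is turning the symlink's target, which is relative to the directory the link lives in, into a path relative to the repository root. Here is a minimal sketch of that rewrite as a standalone function. The name annex_object_path and the sample key are made up for illustration. It also shows why stripping only up to the first .git/annex/ is safer than a greedy sed 's/.*git/.git/'. If the key happens to contain the letters "git", as a WORM key built from a file name like digital.mp3 would, the greedy pattern cuts at the wrong place.

```bash
#!/bin/bash
# Hypothetical helper: map a git-annex symlink target (relative to the
# symlink's own directory) to the object path relative to the repository root.
annex_object_path() {
    local link="$1"
    case "$link" in
        *.git/annex/objects/*)
            # Strip the shortest prefix ending in ".git/annex/" (all the ../ parts).
            printf '%s\n' ".git/annex/${link#*.git/annex/}"
            ;;
        *)
            return 1    # not a git-annex object link
            ;;
    esac
}

# Made-up target of a link two directories deep, whose key contains "git":
link='../../.git/annex/objects/Xx/Yy/WORM-s3-m1--digital.mp3/WORM-s3-m1--digital.mp3'

annex_object_path "$link"
# prints .git/annex/objects/Xx/Yy/WORM-s3-m1--digital.mp3/WORM-s3-m1--digital.mp3

# The greedy sed cuts at the last "git" (inside "digital") instead:
printf '%s\n' "$link" | sed 's/.*git/.git/'
# prints .gital.mp3
```

The case guard means symlinks that do not point into the annex are reported as failures rather than rewritten into a bogus path.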

Once this is done, run the following so that git-annex checks the imported files and records that this repository now has their content:

~ git annex fsck
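Files whose content was not found in the backup remain dangling symlinks after the import. Here is a minimal sketch for listing them. It assumes GNU find, whose -xtype l test is true only for symlinks whose target does not exist, and it must be run from the repository root. The name list_missing is a hypothetical helper. Whatever it prints still has to be fetched from another repository, for example with git annex get.

```bash
#!/bin/bash
# Hypothetical helper: list annexed files whose content is still missing,
# i.e. symlinks whose target does not exist. Run from the repository root.
# Assumes GNU find for -xtype.
list_missing() {
    # Skip .git itself; with the default -P, -xtype l matches dangling links.
    find . -path ./.git -prune -o -xtype l -print
}
```

For example, list_missing | wc -l gives a quick count of the files still to fetch, assuming no file names contain newlines.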

Of course, there is no warranty! It saved me a lot of download time, but it could irretrievably damage your data, put your dog in danger, or set your house on fire, so use it with care!

2 comments:

Justin A said...

git-annex can do this for you:

http://git-annex.branchable.com/tips/recover_data_from_lost+found/

Vincent Fourmond said...

That's nice; I spent some time looking for a feature like that before giving up and coming up with the script.