Git-annex

From Thomas-Krenn-Wiki

Git-annex manages files in a git repository without checking their contents directly into the git repository. This seems paradoxical at first, but it spares git from having to manage very large files. Only the file names and associated metadata are stored directly in the git repository; the file contents themselves are stored in a separate directory and managed by git-annex.

Git-annex supports different usage scenarios and provides safety features. It can ensure that a minimum number of copies of each file exists across the repositories: before a file's contents are removed, git-annex checks how many copies exist, so a file cannot be lost accidentally. Furthermore, the complete contents of a file no longer need to be present on every system; they can be retrieved from other systems with git annex get when needed.

Installation

On Ubuntu, git-annex can be installed from the package repositories or manually:

  1. From the repositories (note that the packaged git-annex version lags behind the current development version):
:~$ sudo apt-get install git-annex
  2. Manually
    • git-annex is available as a pre-compiled standalone archive.[1] The archive is unpacked on the target system and the binaries are used directly:
:~$ wget 'http://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz'
:~$ tar xzf git-annex-standalone-amd64.tar.gz
:~$ PATH="$PATH:$HOME/downloads/git-annex.linux"
:~$ git-annex version
git-annex version: 4.20131003-gbe0b734
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV Inotify DBus XMPP Feeds Quvi TDFA
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SHA256 SHA1 SHA512 SHA224 SHA384 WORM URL
remote types: git gcrypt S3 bup directory rsync web webdav glacier hook

The PATH change can also be made permanent on Ubuntu (see Permanently setting environment variables on Ubuntu) by adding the following line to the file .pam_environment in the home directory. The path must point to the directory where git-annex.linux was unpacked, here ~/Applications:

PATH DEFAULT=${PATH}:${HOME}/Applications/git-annex.linux
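If this line is to be added by a script, for example on several machines, a minimal sketch like the following appends it only once. The function name add_pam_path is hypothetical, and the directory ~/Applications/git-annex.linux is taken from the line above; adjust it if git-annex was unpacked elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_pam_path FILE LINE
# Appends LINE to FILE unless FILE already contains exactly that line.
add_pam_path() {
    local file="$1" line="$2"
    touch -- "$file"
    # If the file does not end with a newline, add one so LINE starts on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 -- "$file")" ]; then
        printf '\n' >> "$file"
    fi
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (single quotes keep ${PATH} and ${HOME} literal in the file):
#   add_pam_path "$HOME/.pam_environment" \
#       'PATH DEFAULT=${PATH}:${HOME}/Applications/git-annex.linux'
```

Running the function a second time leaves the file unchanged, so it is safe to call from a provisioning script.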

Internal Structures

In the so-called indirect mode, git-annex stores each file name as a symbolic link that points to the file's actual content:[2]

:~/annex$ ls -la
[...]
debhelper-slides.pdf -> .git/annex/objects/32/64/SHA256E-s1988981--8aaa02dda217bbabd79a11a5f93fdd4ca8ae4e723c86b4bb91c69d4095a84006.pdf/SHA256E-s1988981--8aaa02dda217bbabd79a11a5f93fdd4ca8ae4e723c86b4bb91c69d4095a84006.pdf

If the contents of a file are available locally, the symbolic link is valid. Otherwise the link is broken (it points to a file that does not exist), and the contents must first be retrieved from another repository with git annex get.
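The rule above (valid link means the contents are here, broken link means they are not) can be checked mechanically across a working tree. The following sketch uses only find and the shell's -e test, not git-annex itself; the function name annex_status is hypothetical, and file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: annex_status [DIR]
# Prints "present PATH" for every symbolic link under DIR (default: .) whose
# target exists and "missing PATH" for every broken link. The .git directory
# is skipped.
annex_status() {
    local dir="${1:-.}"
    # Without -L, -type l matches every symlink; -prune skips .git entirely.
    find "$dir" -name .git -prune -o -type l -print | sort |
    while IFS= read -r link; do
        if [ -e "$link" ]; then        # -e follows the link to its target
            printf 'present %s\n' "$link"
        else
            printf 'missing %s\n' "$link"
        fi
    done
}

# Usage, inside the annex:
#   annex_status
```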

Modifying Files in Indirect Mode

To edit a file in the git-annex repository, the file must first be unlocked. Unlocking replaces the symbolic link with a regular copy of the content, so the annexed original stays protected against accidental changes or loss (git-annex additionally tracks whether numcopies copies of the file exist in other repositories):

:~/annex$ git annex unlock debhelper-slides.pdf
unlock debhelper-slides.pdf (copying...) ok
:~/annex$ ls -la
[...]
-rw-r--r--  1 tktest tktest 1988981 Oct  9 12:30 debhelper-slides.pdf

After unlocking, the file can be edited. A subsequent commit turns it back into a symbolic link; git-annex does this through its pre-commit hook.
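The unlock, edit and commit cycle described above could be wrapped in a small shell function. This is a sketch assuming indirect mode and a file that is currently locked (a symbolic link); the function name edit_annexed and the commit message are hypothetical, and the editor is taken from $EDITOR with vi as a fallback.

```shell
#!/usr/bin/env bash
# Hypothetical helper: edit_annexed FILE
# Unlocks a locked annexed file, opens it in an editor and commits the change.
edit_annexed() {
    local file="$1"
    # In indirect mode a locked annexed file is a symbolic link; refuse anything else.
    if [ ! -L "$file" ]; then
        printf 'edit_annexed: %s is not a locked annexed file\n' "$file" >&2
        return 1
    fi
    # Replace the link with a writable copy of the content.
    git annex unlock "$file" || return 1
    # Edit the now-regular file.
    "${EDITOR:-vi}" "$file" || return 1
    # On commit, git-annex's hook adds the new content to the annex
    # and turns the file back into a symbolic link.
    git commit -m "Edit $file" -- "$file"
}

# Usage:
#   edit_annexed debhelper-slides.pdf
```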

Direct Mode

In addition to indirect mode, direct mode offers the convenience of editing files directly, without unlocking them first. In return, some of the safety features that git-annex normally provides are not available. For this reason, git-annex repositories generally start in indirect mode; the exception is repositories created through the web interface of the git-annex assistant. A repository can be switched between the two modes with the commands git annex direct and git annex indirect:

:~/annex$ git annex direct
commit  
# On branch master
nothing to commit (working directory clean)
ok
direct debhelper-slides.pdf ok
direct git-pkg-2011.pdf ok
direct  ok

Create and Manage Your First Repository

Alice and Bob synchronize their repositories via SSH.

In the following example, Alice and Bob exchange data directly. Each of them has a git-annex repository in which the data is managed, and since they can reach each other directly via SSH, they proceed with git-annex as follows:

  • Alice adds a file to git-annex on her system and updates the metadata with git annex sync.
  • Bob runs git annex sync to synchronize as well. At first he only receives a broken symbolic link for the file.
  • Bob runs git annex get to fetch the file's contents from Alice's system.
  • Alternatively, Alice actively copies the file to Bob with git annex copy.
  • Bob then runs git annex sync to take over the files Alice copied to him.
  • Each file must remain present in at least one repository. As long as one of them holds the contents, the other can drop its copy with git annex drop; the file name (the symbolic link) is kept.

git-annex Repo Alice

Alice creates a git-annex repository named annex in her home directory:

:~/annex$ git init
Initialized empty Git repository in /home/alice/annex/.git/
:~/annex$ git annex init "Alice"
init Alice ok
(Recording state in git...)

She adds the data that she wants to manage with git-annex:

:~/annex$ cp ~/Downloads/debhelper-slides.pdf .
:~/annex$ git annex add .
add debhelper-slides.pdf (checksum...) ok
(Recording state in git...)
:~/annex$ git commit -a -m "Added slides"
[master (root-commit) 761a810] Added slides
 1 file changed, 1 insertion(+)
 create mode 120000 debhelper-slides.pdf

To synchronize with Bob, she adds Bob's repository as a git remote:[3]

:~/annex$ git remote add bob ssh://bob@192.168.56.104/home/bob/annex

git-annex Repo Bob

Bob clones Alice's existing repository and initializes git-annex in it, using his name as the description:

:~/annex$ git clone ssh://alice@192.168.56.1/home/alice/annex .
Cloning into '.'...
remote: Counting objects: 13, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 13 (delta 2), reused 0 (delta 0)
Receiving objects: 100% (13/13), done.
Resolving deltas: 100% (2/2), done.
:~/annex$ git annex init "Bob"
init Bob ok
(Recording state in git...)

He also adds Alice's repository as a remote:

:~/annex$ git remote add alice ssh://alice@192.168.56.1/home/alice/annex

Synchronizing Alice and Bob

Using git annex sync, Alice and Bob can update each other's repositories and share their changes. Here Alice synchronizes with Bob:

:~/annex$ git annex sync bob
commit  
ok
pull bob 
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 0), reused 1 (delta 0)
Unpacking objects: 100% (5/5), done.
From ssh://192.168.56.104/home/bob/annex
 * [new branch]      git-annex  -> bob/git-annex
 * [new branch]      master     -> bob/master
ok
(merging bob/git-annex into git-annex...)
(Recording state in git...)
push bob 
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 435 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To ssh://bob@192.168.56.104/home/bob/annex
 * [new branch]      git-annex -> synced/git-annex
 * [new branch]      master -> synced/master
ok

When Alice has added new files, Bob first receives a broken symbolic link without the actual file data:

:~/annex$ git annex sync alice
(merging synced/git-annex origin/git-annex into git-annex...)
commit  
ok
pull alice 
From ssh://192.168.56.1/home/alice/annex
 * [new branch]      git-annex  -> alice/git-annex
 * [new branch]      master     -> alice/master
 * [new branch]      synced/master -> alice/synced/master
ok

The following find command lists broken symbolic links:

:~/annex$ find -L . -type l
./git-pkg-2011.pdf

With git annex get, he can then retrieve the file contents from Alice:

:~/annex$ git annex get .
get debhelper-slides.pdf (from alice...) 
SHA256E-s1988981--8aaa02dda217bbabd79a11a5f93fdd4ca8ae4e723c86b4bb91c69d4095a84006.pdf
     1988981 100%   24.63MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 1989376 bytes  265254.13 bytes/sec
total size is 1988981  speedup is 1.00
ok
(Recording state in git...)

Alice Copies the Data to Bob

:~/annex$ cp ~/Downloads/git-pkg-2011.pdf .
:~/annex$ git annex add .
add git-pkg-2011.pdf (checksum...) ok
(Recording state in git...)
:~/annex$ git commit -a -m "Added tutorial"
[master 50c8091] Added tutorial
 1 file changed, 1 insertion(+)
 create mode 120000 git-pkg-2011.pdf
:~/annex$ git annex copy . --to bob
copy debhelper-slides.pdf (checking bob...) ok
copy git-pkg-2011.pdf (checking bob...) (to bob...) 
SHA256E-s359984--e87901d377b5c31377a87eb07a28cd133b07feed380f869867abb04bc85d3e47.pdf
      359984 100%   52.01MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 360173 bytes  received 31 bytes  720408.00 bytes/sec
total size is 359984  speedup is 1.00
ok
(Recording state in git...)

When Bob now synchronizes, the file contents are already there. Without the git annex copy, Bob would only have found a broken symbolic link:

:~/annex$ git annex sync
commit  
ok
pull origin
:~/annex$ ls
debhelper-slides.pdf  git-pkg-2011.pdf

Since Alice copied the file, it is now present in both Alice's and Bob's repositories:

:~/annex$ git annex whereis .
whereis debhelper-slides.pdf (2 copies) 
  	de5e57a3-4517-4a05-84ee-60708bbd9d3b -- here (Bob)
   	e3d44122-8756-4f1c-aa5b-5ecdfe01bc4b -- origin (Alice)
ok
whereis git-pkg-2011.pdf (2 copies) 
  	de5e57a3-4517-4a05-84ee-60708bbd9d3b -- here (Bob)
   	e3d44122-8756-4f1c-aa5b-5ecdfe01bc4b -- origin (Alice)
ok
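Because git annex whereis reports the number of copies per file, its output can be used to spot files that exist in too few repositories, in the spirit of the copy check mentioned at the beginning. The following sketch parses the "whereis FILE (N copies)" lines in the format shown above; the function name low_copies is hypothetical, and other output formats are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: low_copies MIN
# Reads `git annex whereis` output on stdin and prints the name of every
# file reported with fewer than MIN copies.
low_copies() {
    awk -v min="$1" '
        /^whereis / {
            # Find the "(N copy)" / "(N copies)" marker after the file name.
            if (match($0, /\([0-9]+ cop(y|ies)\)/)) {
                n = substr($0, RSTART + 1, RLENGTH - 1) + 0  # "2 copies)" -> 2
                file = substr($0, 9, RSTART - 10)            # between "whereis " and " ("
                if (n < min + 0) print file
            }
        }'
}

# Usage: list files that exist in fewer than two repositories
#   git annex whereis . | low_copies 2
```

Lines that do not start with "whereis " (the repository UUID lines and "ok") are ignored.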

References

  1. git-annex installation packages (git-annex.branchable.com)
  2. git annex direct mode (git-annex.branchable.com)
  3. git-remote Manual Page (kernel.org)

Author: Georg Schönberger

Related articles

Git Server Configuration
Git-annex Archive with Git-annex Assistant
Git-annex Repository on an External Hard Drive