Everyday GIT Hints

Sometimes you just find yourself in a jam. There is a firewall that just does not like you, you don't want to pollute your configuration, or the grass is just greener on someone else's machine... If you do not use GIT in rather complex scenarios every day, you will probably not have come across all the hundreds of options and possibilities that it provides.

This article gives an overview of a few options that I have found useful and some scenarios that baffled me for some time before I was able to solve them (most of the time the same way you did: by googling them)...

Note: this collection of hints is not meant as a tutorial for GIT newbies; I assume a general familiarity with GIT. Once you are familiar with GIT, this article may provide a few hints on how to optimize your work.

Useful Options

GIT has three configuration locations - all of them use the same syntax and offer the same options. Not all of those options are useful in all of these configuration files, though...

System Config is usually found in /etc/gitconfig (or in the etc directory relative to where you installed GIT). Usually this file does not exist - it can be used by administrators to set defaults for all users of the system. Usually there is no good reason to do that.

Global Config is the configuration for the current user and is consequently found in the user's home directory (Linux: $HOME/.gitconfig, Windows: %USERPROFILE%/.gitconfig). This is where most options listed below belong - this file influences how GIT handles all the repositories of that user.

Local Config is the configuration for the local repository. Inside each repository it is found in .git/config . It usually contains settings for local branches, remote repositories tracked by this repository, etc. It is also the right place to override global options if they are just not right for this one repository.
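To see the override in action, here is a small sketch. It assumes a throwaway home directory so that your real configuration is not touched; the names "Global Name" and "Local Name" and all paths are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: a value in .git/config wins over the same option in the global config.
tmp=$(mktemp -d)
export HOME="$tmp/home"        # throwaway home, so the real ~/.gitconfig stays untouched
export GIT_CONFIG_NOSYSTEM=1   # ignore the system config for this demo
unset XDG_CONFIG_HOME
mkdir -p "$HOME"

git config --global user.name "Global Name"       # written to $HOME/.gitconfig
git init -q "$tmp/repo"
git -C "$tmp/repo" config user.name "Local Name"  # written to repo/.git/config

global_name=$(git config --global user.name)
effective_name=$(git -C "$tmp/repo" config user.name)
echo "global:    $global_name"
echo "effective: $effective_name"
```

Inside the repository the local value is the one GIT reports; everywhere else the global one still applies.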

There are two ways of changing configuration: you can use the git config command or you can edit the config files directly - don't be afraid of the latter: Git is quite tolerant towards manual edits and can handle a lot of quirks. The main difference between those two options is the notation - on the command line an option may be called user.name, while in the config file it is stored as a section [user] with the option name = ....
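The two notations can be compared with a scratch file. This sketch uses git config --file to write to a temporary file instead of one of the three real locations; the name "Example User" is just a placeholder.

```shell
#!/bin/sh
# Sketch: dotted names on the command line become [section] / key pairs in the file.
cfg=$(mktemp)

git config --file "$cfg" user.name "Example User"
git config --file "$cfg" alias.st status

cat "$cfg"                                   # shows the [user] and [alias] sections
name=$(git config --file "$cfg" user.name)   # reading uses the dotted name again
echo "user.name is: $name"
```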

My own $HOME/.gitconfig looks like this:

[user]
        name = Konrad Rosenbaum
        email = kon...@...
[diff]
        renames = copies
[merge]
        defaultToUpstream = true
[alias]
        co = checkout
        ci = commit
        st = status
        stat = status
[core]
        editor = nano

Option	Description
user.name, user.email	These are pretty much the first options you should set globally (for the user) - they contain what is recorded as committer when you create new revisions. You will probably not like the automatically set values.
diff.renames	Git does not track files - it tracks content. However, when you want to display file history, it has to translate what it tracks into something that you understand as a user. If this is set to "false" it treats a new file name as something separate (this was the default before Git 2.9; newer versions default to "true"). If you set this to "true" it will detect when a file has been renamed (same content, different name). If you set it to "copies" it will also detect if a file has been copied (same content, two file names).
merge.defaultToUpstream	Normally Git behaves quite dumb. If you ask it to pull or merge it will always ask you from which remote you want to pull or merge. With this option set to true both commands will default to the remote that the current branch is tracking (i.e. its upstream branch, the one it was cloned/copied from).
alias.*	Unlike Subversion (or, god forbid, CVS), Git insists that you spell out sub-commands completely - it will not try to fill in the blanks (e.g. when you type `svn com` Subversion will infer that you actually meant `commit`; Git insists on `git commit`). You can use aliases to create shortcuts for existing commands or you can create elaborate scripts that are called instead.
core.editor	The core.editor option sets which program is called to edit commit messages - set it to your favorite plain text editor (I prefer GNU nano, others may want notepad++). When GIT needs an editor it tries these in order: 1) the one specified in the environment variable GIT_EDITOR, 2) the one set in this option, 3) the one specified in the environment variable VISUAL, 4) the one specified in the environment variable EDITOR, 5) the default that was compiled into GIT (normally vi; some distributions use `editor` instead, which is usually an alias for vi)
core.pager	This option sets which program is used for displaying long outputs (commit logs, blame/bless output, etc.). You can set it to any program that can read from stdin. When GIT needs a pager it tries these in order: 1) the one specified in the environment variable GIT_PAGER, 2) the one set in this option, 3) the one specified in the environment variable PAGER, 4) it tries to call `pager` directly (which is usually an alias for more or less).
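As a quick illustration of aliases, the sketch below defines two of them and uses them in a scratch repository. It again assumes a throwaway home directory; the alias "last" and the identity are my own inventions for this example and are not part of the configuration shown above.

```shell
#!/bin/sh
# Sketch: define aliases globally, then use them like built-in sub-commands.
tmp=$(mktemp -d)
export HOME="$tmp/home"        # throwaway home, so the real ~/.gitconfig stays untouched
export GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
mkdir -p "$HOME"

git config --global user.name "Example User"
git config --global user.email "user@example.org"
git config --global alias.ci commit                  # plain shortcut
git config --global alias.last "log -1 --format=%s"  # shortcut with fixed arguments

git init -q "$tmp/repo"
cd "$tmp/repo"
echo hello > file.txt
git add file.txt
git ci -q -m "first commit"    # expands to: git commit -q -m ...
last_subject=$(git last)       # expands to: git log -1 --format=%s
echo "last commit: $last_subject"
```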

Fetch and Pull without Remotes

Let's start easy: this is just a reminder after mindlessly setting up hundreds of repositories and their remotes -- you do not have to do that for a one off pull...

Normally you give git pull no parameter to pull from the upstream of the current branch, or you give it the alias name of a remote to pull from. Optionally you can also give it a branch from which to pull. What gets forgotten is: you can give it a URL instead of the configured name of a remote - this URL does not need to belong to a configured remote.

Some examples:

git pull
retrieves the upstream remote and branch of the current branch. Merges it to the current branch.
git pull origin
retrieves the named remote (origin) and a branch that has the same name as the current branch. Merges it.
git pull origin master
retrieves the named remote (origin) and the named branch (master) and merges it with the current branch. The specified branch can also be a tag or any other refspec (refs/...).
git pull git://example.org/home/git/example.git master
retrieves from the specified URL instead of a configured remote. The branch is optional and can also be any refspec or tag.
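The last variant can be tried without any network. In this sketch a local path stands in for the URL; the branch name main, the identity and all paths are assumptions made for the demonstration.

```shell
#!/bin/sh
# Sketch: pull from a path/URL that is not configured as a remote.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

# "Upstream" repository with one commit on a branch called main.
git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/main
echo one > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -q -m "first"

# Receiving repository: cloned once, then the remote is removed again.
git clone -q "$tmp/src" "$tmp/dst"
git -C "$tmp/dst" remote remove origin

# New work appears upstream ...
echo two >> "$tmp/src/file.txt"
git -C "$tmp/src" commit -q -a -m "second"

# ... and is pulled by path; no remote is configured at this point.
git -C "$tmp/dst" pull -q --ff-only "$tmp/src" main

remotes=$(git -C "$tmp/dst" remote)
src_head=$(git -C "$tmp/src" rev-parse HEAD)
dst_head=$(git -C "$tmp/dst" rev-parse HEAD)
echo "configured remotes: '$remotes'"
echo "src: $src_head"
echo "dst: $dst_head"
```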

The fetch command takes the same kind of parameters. The difference is that it does not automatically merge with the current branch - instead it saves the retrieved revision as the temporary ref FETCH_HEAD - you can check it out, merge it, whatever you need...

For example:

git fetch git://example.org/home/git/example.git master
git checkout -b newbranch FETCH_HEAD

This first fetches the master branch from the given GIT URL (git fetch ...) and then checks it out (git checkout ...) into a new local branch (-b newbranch).
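The same two steps can be replayed locally. This is only a sketch: a local path replaces the git:// URL, and the branch name main, the identity and the paths are assumptions.

```shell
#!/bin/sh
# Sketch: fetch by path into FETCH_HEAD, then create a branch from it.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

# Source repository with a single commit on main.
git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/main
echo hello > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -q -m "first"

# Empty receiving repository without any remotes.
git init -q "$tmp/dst"
git -C "$tmp/dst" fetch -q "$tmp/src" main               # result is stored as FETCH_HEAD
git -C "$tmp/dst" checkout -q -b newbranch FETCH_HEAD    # nothing is merged, just checked out

branch=$(git -C "$tmp/dst" symbolic-ref --short HEAD)
content=$(cat "$tmp/dst/file.txt")
echo "on branch $branch, file.txt contains: $content"
```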

Moving and Copying Repositories

This question comes up every now and again: how can you move/copy a GIT repository from A to B?

The answer is as hard to find as it is easy: just copy it with whatever tools are locally available. The .git directory with all the meta data and packed copies of revisions has the exact same format no matter what the operating system is or on what path it finds itself.

I even regularly copy local checkouts from a host computer to a VM running a different operating system or vice versa - just using Windows Explorer, Midnight Commander, cp, copy or whatever is at hand. The only thing you have to be aware of is a possible change in line ending conventions and/or a few left-over build-files from the original directory. There are two very easy GIT commands available to rectify any problems:

git clean -dfx
removes all files that are not known to GIT, meaning all remnants of earlier builds are removed (unless someone accidentally checked them in).
git reset --hard
resets all files known to GIT to the exact state that you would expect if you checked them out freshly. This removes any local changes that have not been checked in yet (so if you need them: check in before you copy!) and it corrects line endings to the local convention of the operating system (make sure you execute this command from the target OS, otherwise the wrong convention may be used).
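A sketch of the whole copy-and-clean-up procedure, using cp as the copy tool. The file names (main.c, main.o), the identity and the paths are made up; the left-over build file and the uncommitted edit are simulated.

```shell
#!/bin/sh
# Sketch: copy a checkout with plain cp, then bring the copy into a pristine state.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

git init -q "$tmp/orig"
echo 'int main(void) { return 0; }' > "$tmp/orig/main.c"
git -C "$tmp/orig" add main.c
git -C "$tmp/orig" commit -q -m "initial"

# Simulate a build remnant and an uncommitted local edit.
echo junk > "$tmp/orig/main.o"
echo '/* local edit */' >> "$tmp/orig/main.c"

# Copy with whatever tool is at hand.
cp -R "$tmp/orig" "$tmp/copy"

git -C "$tmp/copy" clean -dfxq     # remove everything GIT does not know about
git -C "$tmp/copy" reset -q --hard # discard uncommitted changes to known files

status=$(git -C "$tmp/copy" status --porcelain)
echo "status of copy: '$status'"
```

An empty status means the copy is indistinguishable from a fresh checkout. Keep in mind that the edit to main.c is gone in the copy, which is exactly the warning given above.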

It's in the Mail!

From time to time you'll find yourself isolated enough that you have a need to send GIT commits either as e-mail or have to transport them via sneakernet (i.e. stored on a USB stick). There is an official way and a better way to send patches via e-mail or as file (attachment). The official way is to reformat each commit as a patch-mail and then re-apply each of those at the target site. The less official way is to use bundle files.

The official way first: the sending developer uses git format-patch to transform a range of commits. This creates one file for each commit that is formatted like an email in standard Unix mailbox format (RFC822 format). If you are on a Unix box and you are a commandline wizard you can feed those files directly to sendmail and send them on their merry way. The receiving developer stores those mails in a Unix mailbox (or maildir folder) and uses git am to apply each of them to his/her current GIT tree. The main properties of this procedure are:

it is easy to use if you are a Unix commandline guru and want to script it
the receiver can decide whether he/she wants to have the same history or whether to apply the patch at a different parent commit
it is very likely that the Commit-ID of each commit changes after applying it (the mail-transports and -clients may have changed some formatting or line-ends, history is probably different,...)
you may have to change settings of your mail client for this to work at all (GMail, Thunderbird, KMail) or it may just not be possible to get it to cooperate (Outlook)

Some project maintainers may like this procedure because it is easy to change history, others may loathe it for exactly the same reason. There are plenty of examples on how to use this on the net, so I won't spend too much time here...
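For completeness, here is a minimal sketch of that round trip without any actual mail transport: the patch files are simply handed over through a directory. Branch name, identity, file names and paths are assumptions for the demonstration.

```shell
#!/bin/sh
# Sketch: export one commit with format-patch, apply it elsewhere with am.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

# Sender and receiver start from the same base commit.
git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/main
echo base > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -q -m "base"
git clone -q "$tmp/src" "$tmp/dst"

# The sender adds a commit and turns it into a mail-formatted patch file.
echo feature >> "$tmp/src/file.txt"
git -C "$tmp/src" commit -q -a -m "add feature"
git -C "$tmp/src" format-patch -q -o "$tmp/patches" HEAD~1

# The receiver applies every patch file as a new commit.
git -C "$tmp/dst" am -q "$tmp/patches"/*.patch

subject=$(git -C "$tmp/dst" log -1 --format=%s)
src_tree=$(git -C "$tmp/src" rev-parse 'HEAD^{tree}')
dst_tree=$(git -C "$tmp/dst" rev-parse 'HEAD^{tree}')
echo "applied: $subject"
```

The sketch compares the resulting trees rather than the commit IDs, since the latter are not guaranteed to survive this procedure.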

Now the "better" way: bundles. Bundles are an export format of GIT - they contain commits and references that can be used to recreate the repository from which they came. Bundles can be restricted to only contain the stuff needed to get the difference between two commits. The bundle file can then be mailed as a binary attachment or be transported via USB stick.

if you do not want to include the currently checked out version (HEAD), check out whatever is the latest one you want to send; for example if you want to go 3 steps back: git checkout HEAD~3
before you do this: commit or stash your current uncommitted work, so that you are free to jump around in your repository
check (e.g. with git log) what the latest version is that you know for sure that the receiver already has - it does not hurt (except in terms of a few kB) to include more commits than necessary
create the bundle (e.g. if you know that the receiver already has version a1b2c3d):
git bundle create mybundle.git a1b2c3d..HEAD
it is important that the last component is either a branch name, tag, or implicit refspec (such as HEAD) - because GIT refuses to create the bundle otherwise
restore your tree to its previous state (e.g. if you worked on the master branch with git checkout master and/or if you stashed some changes with git stash apply)

This leaves you with a bundle file on its way to the receiver and the repository back in its original state. The receiver then uses the bundle file as if it were a normal repository path or URL:

make sure the receiving repository is in a clean state (no uncommitted changes)
pull from the bundle: git pull mybundle.git HEAD

Pulling from the bundle works the same way as pulling from a repository - it merges the content of that bundle with your local branch. If you fed something different than "HEAD" into the bundle you have to use the correct name - e.g. if you created it with a1b2c3d..master you have to use "master" instead of "HEAD" in the pull command. A few things can happen:

if everything goes exceedingly well: your local branch will show the HEAD revision from the bundle as your new HEAD and you'll see the same history as on the remote site
if your local repository already knew a few of the ancestors stored in the bundle this is no problem: the pull command will just skip them and start with the first unknown revision
if you already did some local development: as long as there is a common ancestor the pull command will merge the two lines of development - you'll see a merge-commit as your new HEAD and you'll see both the local history and the bundle's history in your new merged history
if there are some revisions missing from the bundle that do not exist locally, pull will refuse to merge - you'll have to redo the bundle and include a common ancestor this time
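The sender and receiver steps can be sketched together in one script. Everything here is local: the receiver is a clone that already has the first commit, and the branch name main, the identity and the paths are assumptions. git bundle verify is an extra step I added to check that the receiver has the commits the bundle depends on.

```shell
#!/bin/sh
# Sketch: ship only the commits the receiver is missing as a bundle file.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/main
echo one > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -q -m "first"

# The receiver already has the first commit.
git clone -q "$tmp/src" "$tmp/dst"
base=$(git -C "$tmp/src" rev-parse HEAD)

# The sender continues working.
echo two >> "$tmp/src/file.txt"
git -C "$tmp/src" commit -q -a -m "second"
echo three >> "$tmp/src/file.txt"
git -C "$tmp/src" commit -q -a -m "third"

# Sender: bundle everything after the commit the receiver is known to have.
git -C "$tmp/src" bundle create "$tmp/mybundle.git" "$base..HEAD"

# Receiver: check the prerequisites, then pull from the file like from a repository.
git -C "$tmp/dst" bundle verify "$tmp/mybundle.git"
git -C "$tmp/dst" pull -q --ff-only "$tmp/mybundle.git" HEAD

src_head=$(git -C "$tmp/src" rev-parse HEAD)
dst_head=$(git -C "$tmp/dst" rev-parse HEAD)
echo "src: $src_head"
echo "dst: $dst_head"
```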

A note for power users: you can store multiple branches in a bundle, simply by mentioning them during creation of the bundle. Since you can use a bundle the same way that you can use a normal repository you can do this to store changes in several branches and merge them separately by using multiple pull commands.
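A sketch of such a multi-branch bundle follows. The branch names main and feature, the identity and the paths are assumptions; the second branch is taken over with fetch plus checkout here, a pull would work as well if the branch already existed locally.

```shell
#!/bin/sh
# Sketch: one bundle carrying two branches, consumed one branch at a time.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/main
echo one > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -q -m "first"
git clone -q "$tmp/src" "$tmp/dst"

echo two >> "$tmp/src/file.txt"
git -C "$tmp/src" commit -q -a -m "second on main"
git -C "$tmp/src" checkout -q -b feature
echo three >> "$tmp/src/file.txt"
git -C "$tmp/src" commit -q -a -m "third on feature"

# Mention both branches when creating the bundle.
git -C "$tmp/src" bundle create "$tmp/multi.bundle" main feature
git -C "$tmp/src" bundle list-heads "$tmp/multi.bundle"

# Receiver: update main, then take over feature as a new local branch.
git -C "$tmp/dst" pull -q --ff-only "$tmp/multi.bundle" main
git -C "$tmp/dst" fetch -q "$tmp/multi.bundle" feature
git -C "$tmp/dst" checkout -q -b feature FETCH_HEAD

src_main=$(git -C "$tmp/src" rev-parse main)
dst_main=$(git -C "$tmp/dst" rev-parse main)
src_feature=$(git -C "$tmp/src" rev-parse feature)
dst_feature=$(git -C "$tmp/dst" rev-parse feature)
```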

GIT Backup

The first thing to do is: relax - you probably already have plenty of backups lying around. Every clone of a repository is a backup that can restore everything that it contains to a fresh "central" repository. So if you have one central repository and five developers who are working locally - then you have five backups of that central repository. Since developers are kind of specific about what they do, you may miss a branch or two though - so let's solve this problem...

Direct copy: since the GIT repository format is the same regardless of platform and location on the file system - the easiest way is to simply use the default backup facilities of your operating system. If there is a time at which developers usually do not push to the repository (e.g. at night) you can safely copy the repositories into your backup.

Otherwise it might be a good idea to block write access to the repositories during backup (read access does not change anything) - normally nothing bad will happen, but depending on the file order and speed of the backup system the backup may contain corrupted refspecs (branches pointing to commits missing from the backup). Finding running push operations is rather simple: just scan for processes with the name git-receive-pack or (depending on the GIT installation) git with the parameter receive-pack. There are two ways of blocking new push operations: either stop all network services that allow writing GIT access (this probably includes SSH) or use a hook to stop the updates. Each repository to be included in backups would then need a pre-receive hook that returns with an error, for example:

#!/bin/sh

if test -f /git/backup-running ; then
  echo "Sorry, Backups are running. Try again later."
  exit 1
else
  exit 0
fi

This example would imply a backup script like this one:

#!/bin/bash

# make sure GIT does not write
touch /git/backup-running

# wait for up to 2 minutes for receive-pack to terminate,
# otherwise do not panic too much...
for i in $(seq 1 12) ; do
  # a push is still running if either process name variant is found
  if pgrep git-receive-pack >/dev/null || pgrep -f "git receive-pack" >/dev/null ; then
    sleep 10
  else
    break
  fi
done

# run backup
tar cfz /backups/gitrepos.tgz /git/repositories

# done
rm -f /git/backup-running
In fact the gitolite environment uses a similar technique disabling write access when a file called .gitolite.down exists in the main directory.

To restore repositories you simply copy them back from the backup.

Using GIT itself: as mentioned above - each clone of a GIT repository is itself a backup. This can be used to make consistent backups without disabling write access during the backup process. For backups bare repositories are enough - we do not need a checkout. Both the clone and push commands have a --mirror parameter that can be used to create exact replicas of repositories. For example if all our repositories are directories whose names end in .git and exist in /git and we want to copy them to a disk mounted to /git_backup we can use a not overly complicated backup script:

#!/bin/sh

# find all repositories and iterate over them
cd /git || exit 1
for repo in $(find . -type d -name '*.git') ; do

   # make sure the target exists
   if test ! -d "/git_backup/$repo" ; then
     git clone --mirror "$repo" "/git_backup/$repo"
   fi

   # mirror to it
   ( cd "$repo" && git push --mirror "/git_backup/$repo" )
done

To restore from this backup you simply copy the repositories back to the original path - either with operating system functionality or with the exact same clone command as used above. In any case you will end up with a bare repository - if your original was a normal repository with checkouts do these steps:

enter the repository
create a .git directory
move all files and sub-directories into .git
execute git config --local --bool core.bare false - this converts it to non-bare
execute git reset --hard to get a clean checkout of the master branch
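The conversion steps can be tried on a scratch mirror first. In this sketch the "restored backup" is simply a fresh mirror clone of a small test repository; the branch name main, the identity and the paths are assumptions.

```shell
#!/bin/sh
# Sketch: turn a restored bare repository into a normal one with a checkout.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

# Stand-in for the backup: a mirror clone of a small repository.
git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/main
echo hello > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -q -m "first"
git clone -q --mirror "$tmp/src" "$tmp/restored"

cd "$tmp/restored"                          # enter the repository
mkdir .git                                  # create a .git directory
mv -- * .git/                               # '*' does not match .git itself
git config --local --bool core.bare false   # convert to non-bare
git reset -q --hard                         # get a clean checkout

is_bare=$(git rev-parse --is-bare-repository)
restored_file=$(cat file.txt)
echo "bare: $is_bare, file.txt contains: $restored_file"
```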

Using bundles: bundles can be used in a very similar way to mirrored repositories, but they are easier to handle, since they are simple files. On the other hand bundles may be a bit more demanding to create, since they are recreated each time, while repositories only sync their differences.

#!/bin/sh

# find all repositories and iterate over them
cd /git || exit 1
for repo in $(find . -type d -name '*.git') ; do
   # create bundle
   ( cd "$repo" && git bundle create "/git_backup/$repo.bundle" --all )
done

The --all option tells the bundle command to include all branches (even those under remotes/*) and all tags. You can restore the repository with the clone command above:

git clone --mirror /git_backup/myrepo.git.bundle /git/myrepo.git

Again, you'll end up with a repository that is an exact copy of the original and is a bare repository. Use the above recipe to convert it to a non-bare repository if you need it to be non-bare. If you only need parts of the backup then you can use the pull command shown in the "It's in the Mail" section above.
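A bundle backup and its restore can be rehearsed on a small scale before trusting it with real data. This sketch uses made-up paths, an assumed branch layout (main and feature) and a placeholder identity, and compares the branch heads of the original with those of the restored repository.

```shell
#!/bin/sh
# Sketch: back up a repository into one bundle file and restore it as a mirror.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

git init -q "$tmp/myrepo"
git -C "$tmp/myrepo" symbolic-ref HEAD refs/heads/main
echo one > "$tmp/myrepo/file.txt"
git -C "$tmp/myrepo" add file.txt
git -C "$tmp/myrepo" commit -q -m "first"
git -C "$tmp/myrepo" branch feature

# Backup: all branches and tags go into a single file.
git -C "$tmp/myrepo" bundle create "$tmp/myrepo.bundle" --all

# Restore: clone from the bundle file as if it were a repository.
git clone -q --mirror "$tmp/myrepo.bundle" "$tmp/restored.git"

orig_refs=$(git -C "$tmp/myrepo" for-each-ref --format='%(refname) %(objectname)' refs/heads)
new_refs=$(git -C "$tmp/restored.git" for-each-ref --format='%(refname) %(objectname)' refs/heads)
echo "$new_refs"
```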

Combinations: once you have copied the original repositories into clones or bundles you can also use operating system methods for backup without the need of blocking write access. The downside is of course that you have to create an extra copy of the data.

I Need a Shrink!

GIT is very efficient when it comes to disk space usage, so it is usually not necessary to force it to conserve more space. However, here are a few hints on how to do this if you desire to try.

Local clones: not an additional command, but a nice feature to know about. If you clone one local path to another and both paths are on the same disk then GIT will use hardlinks instead of copying each object in the .git directory. This saves most of the space necessary to make a complete copy of the repository and it is a bit faster than copying. After the clone those two repositories will diverge and new objects (changed files, commits, tags, ...) will not be shared, even when pulled from the other local repository. Note: this only works if you use path names, it does not work for file:///path/to/repo style URLs.
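If you are curious whether the objects are really shared, you can count the object files that have more than one link. This is a sketch with made-up paths and identity; it assumes that both directories live on the same file system and that this file system supports hardlinks, otherwise the count will simply be zero.

```shell
#!/bin/sh
# Sketch: clone by path and look for hardlinked object files in the clone.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

git init -q "$tmp/src"
echo hello > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -q -m "first"

# Clone by plain path (not by file:// URL).
git clone -q "$tmp/src" "$tmp/linked"

# Object files with a link count above 1 are shared with the original.
linked_count=$(find "$tmp/linked/.git/objects" -type f -links +1 | wc -l)
echo "object files shared via hardlinks: $linked_count"

src_head=$(git -C "$tmp/src" rev-parse HEAD)
linked_head=$(git -C "$tmp/linked" rev-parse HEAD)
```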

Unifying shared objects: if two repositories are on the same disk you can ask GIT to try to share their disk space. The command git relink /path/to/repo1 /path/to/repo2 will compare all objects stored in those repositories and if they are identical save space by replacing one copy with a hardlink. Note that the relink command was removed in Git 2.12, so this only works with older versions.

Cleaning the checkout: the git clean -dfx command lets you clear away everything in a checkout that is not known to GIT. Be careful with this command: if you simply forgot to add a file to GIT before you call this command it will be lost. This can however be useful to create a "clean slate" before recompiling a big project. The options are:

f	since the clean command is dangerous it requires the user to use force, -f stands for force
d	tells GIT to remove unknown directories as well as unknown files
x	normally GIT completely ignores files that match the .gitignore file - in the case of clean this means it forgets to remove them, -x tells GIT to also remove files matching .gitignore
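Since clean is dangerous, it may be worth previewing what it would do. The sketch below uses the -n (dry run) option, which is not part of the list above, and then runs clean without -x to show the difference; file names, identity and paths are made up.

```shell
#!/bin/sh
# Sketch: preview a clean with -n, then see what -x adds by leaving it out.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

git init -q "$tmp/repo"
cd "$tmp/repo"
echo '*.o' > .gitignore
echo tracked > tracked.txt
git add .gitignore tracked.txt
git commit -q -m "first"

# Untracked directory, ignored file, untracked file.
mkdir build
echo x > build/out.bin
echo x > main.o
echo x > notes.txt

# Dry run: only lists what -dfx would remove, deletes nothing.
preview=$(git clean -dnx)
echo "$preview"

# Without -x the ignored file main.o survives.
git clean -dfq
ls -A
```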

Toss the garbage: over time GIT repositories accumulate garbage: abandoned commits, forgotten files, branches that are no longer tracked locally (but whose commits are still stored locally), etc. GIT automatically removes this garbage periodically, but you can force this by calling git gc manually. If you work on an inefficient file system it may also help to call git repack - this command moves all objects into pack files (files containing many objects in a very space efficient format) - note that this may clash with the relink command mentioned above, since packs are harder to share.
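To get a feeling for the effect, compare the output of git count-objects before and after a manual gc. This is a sketch on a tiny scratch repository with made-up paths and identity; real savings obviously depend on the repository.

```shell
#!/bin/sh
# Sketch: count loose objects, run gc, count again.
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

git init -q "$tmp/repo"
for n in 1 2 3 ; do
  echo "line $n" >> "$tmp/repo/file.txt"
  git -C "$tmp/repo" add file.txt
  git -C "$tmp/repo" commit -q -m "commit $n"
done

before=$(git -C "$tmp/repo" count-objects)   # loose (unpacked) objects and their size
git -C "$tmp/repo" gc -q                     # packs objects and drops the loose copies
after=$(git -C "$tmp/repo" count-objects)

loose_before=${before%% *}
loose_after=${after%% *}
echo "before gc: $before"
echo "after gc:  $after"
```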