Keeping $HOME (or /etc) in Subversion

(1.1 or newer recommended)


General Approach

Keeping your $HOME in Subversion accomplishes three primary goals.
  • Synchronization: First, it allows you to easily synchronize common files between computers or accounts. Subversion is not the only tool for this, but it is the most appropriate for some types of use, such as $HOME. For other uses, I recommend unison, which is a great tool for synchronizing large files such as music. It also helps with trees which don't benefit from history, or which can change a great deal in a short time, such as Maildirs.
  • Backups: Second, it acts as a distributed backup mechanism. The best backup systems have three properties: They are easy, so you actually do the backups regularly. They are spread across several machines or locations, so a given disaster doesn't knock them out. And they are tested on a regular basis, to make sure the backup works. You will get all three, as a side effect of using subversion. Just make sure to "svn commit" every day, and you can sleep safely with the assurance that you won't lose your data. You should still probably back up the repository, so you won't lose your history, but that is merely an extra precaution.
  • History: Third (and probably least important), it provides revision control so you can retrieve older versions. This can be useful for the same reasons it helps with source code, but mostly I find I use it to "safely" delete files without worrying that I might need them again someday. If I change my mind later, I can still get the file back. It helps me keep my $HOME clean.
Additionally, the approach outlined here attacks a common problem. Though you will have a lot of common files on different machines, you will also have files unique to each machine. This makes matters much more complicated. You may have a large set of files on your primary desktop, with a subset of those on your notebook, and even fewer on machines owned by clients. So, you will need a way to specify which of the common files you want on each machine. Also, each machine may have configuration which is completely unique, and you will need a way to store that sort of data.

Scope

The focus here is on the actual home directory, such as /home/selene/. Most of the files there will be "dot files", or configuration for various programs. Thus, the approach should apply equally well to /etc/, because it has a similar purpose, layout, and problem set.

To keep other subdirectories in subversion (a useful practice), you should probably create separate repositories. For example, I keep $HOME/work/ in its own repository, and $HOME/src/ in another, and otherwise create repositories (or modules within repositories) as appropriate to handle new content and projects. This is a matter of style, though. Some people prefer to keep everything in the same repository, which has various benefits and drawbacks.

For other directories, I have several repositories in an "etc" group, quite a few repositories in a "src" group, and a few web sites in a "www" group. I put my $HOME into a separate repository called "home", which is much more complex than the others. The rest of this article focuses on the "home" repository.

Initial Setup and Layout

The standard subversion repository layout works well for $HOME data. It uses three main directories: trunk, branches, and tags. Each one will become useful, in that order, as your $HOME repository spans more and more machines. For one or two machines, you may only need the trunk. For a small collection of three or more, the branches become useful. And, for dozens to hundreds of machines, you may want to tag specific versions for deployment. I will focus on a small handful of machines here, using branches and the trunk.

Subversion does not require that you use trunk, branches, and tags for your top-level repository entries. You could just as easily call them modules, hosts, and versions. Feel free to do so; I use the former terms here mostly due to convention.

Inside of the trunk, you should set up modules which can be checked out individually. Examples include shell config, email config, X11 config, and so on. Then, in the branches, set up one branch for each machine (or each machine class) you plan to use. These branches will access common files via the svn:externals mechanism (see below). As for tags, you probably won't need them. They are used to mark particular versions which are more important than the rest -- milestones of a sort. This is great for source code releases, but most people don't want to "release" their home directory.
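
As a rough sketch of the layout just described, the helper below only prints the directory URLs that would need to exist; it does not touch any repository. The function name home_repo_dirs is made up for this example, the base URL is the placeholder used elsewhere in this article, and the module names come from my trunk example.

```shell
# Print the directory URLs for a trunk/branches/tags layout,
# followed by one URL per trunk module.
# Usage: home_repo_dirs BASE_URL MODULE...
# (home_repo_dirs is a hypothetical helper, not part of svn.)
home_repo_dirs() {
    base=$1
    shift
    for top in trunk branches tags; do
        printf '%s/%s\n' "$base" "$top"
    done
    for module in "$@"; do
        printf '%s/trunk/%s\n' "$base" "$module"
    done
}

home_repo_dirs https://svn.example.com/home X bin sh vim
```

Parents are printed before children, so the list can be reviewed and then fed to svn mkdir one URL at a time, for example with: home_repo_dirs ... | while read -r url; do svn mkdir -m "initial layout" "$url"; done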

An "external" item is like a symlink, except that it references another part of the repository (or a completely different repository). Due to a limitation in the Subversion versions this article was written for, each "external" item must be a directory; linking to individual files only became possible later, with the file externals added in Subversion 1.6. As a consequence of this, I use a lot of symlinks.

Before getting into details, you should know that things will probably get messy. People tend to make a mess of their $HOME anyway, and the problem is compounded by the wide variety of approaches to configuring programs. Subversion helps reduce the mess, but not enough to make it particularly tidy.

Trunk Layout

The following is a subset of /trunk in my $HOME repository:
X/
X/.Xmodmap
X/.Xresources
bin/
bin/myscript1
bin/myscript2
bin/local -> ../../.local/bin
sh/
sh/.zshrc
vim/
vim/.gvimrc
vim/.vimrc
vim/.vimrc-color
Notation:
  • -> represents a symlink.
  • external -> represents a svn:externals property.
This example trunk contains four modules: X, bin, sh, and vim. I have taken my "dotfiles" and grouped them into several modules so that I can incorporate them into branches individually.

The bin/local entry is somewhat special. It assumes that there will be a .local/bin directory in each branch, and links to it. This allows me to place bin and bin/local in my $PATH, and get to both common scripts (in bin) and branch-specific scripts (in bin/local).
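
The $PATH part of that arrangement might look something like the fragment below. It is only a sketch: add_to_path is a name I made up for the example, and it assumes $HOME/bin and $HOME/bin/local are laid out as described above.

```shell
# Sketch of a PATH fragment for a shell rc file (sh, bash, or zsh).
# add_to_path is a hypothetical helper: it appends a directory to
# PATH only if that directory is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path "$HOME/bin"          # shared scripts (from the trunk)
add_to_path "$HOME/bin/local"    # host-specific scripts (from the branch)
export PATH
```

Because the helper checks for duplicates, the rc file can be sourced repeatedly without $PATH growing each time.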

Branch Layout

The following is an example branch, /branches/host1:
.common/
.common/sh/ external -> /trunk/sh/
.common/vim/ external -> /trunk/vim/
.local/
.vimrc -> .common/vim/.vimrc
.vimrc-color -> .common/vim/.vimrc-color
.zshrc -> .common/sh/.zshrc
.zshrc.local
In .common/, this host has two modules checked out: sh, and vim. Several symlinks are then used to put the module contents into the actual $HOME directory. The last file, .zshrc.local, is just a regular file. It is still under revision control, but is only visible on this particular host.

The externals may not be the most intuitive thing to configure, so I'll show an example of how to set up /branches/host1. This is how you could initially create the branch:
	% cd
	% svn mkdir https://svn.example.com/home/branches/host1 -m "new host"
	Committed revision 513.
	% svn co https://svn.example.com/home/branches/host1 .
	Checked out revision 513.
	% svn mkdir .common .local
	A         .common
	A         .local
	% cat > ext <<EOF
	sh	https://svn.example.com/home/trunk/sh
	vim	https://svn.example.com/home/trunk/vim
	EOF
	% svn propset svn:externals -F ext .common
	property 'svn:externals' set on '.common'
	% rm ext
	% svn up
	Fetching external item into '.common/sh'
	...
	Fetching external item into '.common/vim'
	...
	Updated to revision 513.
	% ln -s .common/vim/.vimrc ; svn add .vimrc
	A         .vimrc
	% ln -s .common/vim/.vimrc-color ; svn add .vimrc-color
	A         .vimrc-color
	% ln -s .common/sh/.zshrc ; svn add .zshrc
	A         .zshrc
	% touch .zshrc.local ; svn add .zshrc.local
	A         .zshrc.local
	% svn ci -m "import basics into host1"
	...
	Committed revision 514. 
Or, if you already have a minimal, generic host branch, you can also create new branches by copying the generic branch, instead of starting from scratch. This is done with commands such as:
	% svn cp https://svn.example.com/home/branches/minimal \
		https://svn.example.com/home/branches/newbranch
	% svn co https://svn.example.com/home/branches/newbranch ~ 
Then customize the new branch however you like.

Note that in the Subversion versions this article was written for, all svn:externals entries must be complete URLs; they cannot be relative paths within a repository. (Relative external URLs only arrived later, in Subversion 1.5.) Also, you would normally want to use svn propedit instead of svn propset, but the latter was easier to demonstrate.
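
Since every entry repeats the same trunk URL, the externals file from the host1 example can also be generated rather than typed. This is just a sketch, and externals_for is a hypothetical helper name; it prints the same two-column "directory, URL" format used in the ext file above.

```shell
# Print svn:externals lines ("dir<TAB>URL", one per module).
# Usage: externals_for TRUNK_URL MODULE...
# (externals_for is a made-up name for this example.)
externals_for() {
    trunk=$1
    shift
    for module in "$@"; do
        printf '%s\t%s/%s\n' "$module" "$trunk" "$module"
    done
}

externals_for https://svn.example.com/home/trunk sh vim
```

Redirect the output to a file and pass it to svn propset svn:externals -F, exactly as in the host1 walk-through.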

This is another branch, /branches/host2. It is more complex, and somewhat more realistic:
.common/
.common/X/ external -> /trunk/X/
.common/bin/ external -> /trunk/bin/
.common/sh/ external -> /trunk/sh/
.common/vim/ external -> /trunk/vim/
.local/
.local/bin/
.local/bin/myscript1
.local/bin/myscript2
.Xmodmap -> .common/X/.Xmodmap
.Xresources -> .common/X/.Xresources
.Xresources-local
.Xsession
.gvimrc -> .common/vim/.gvimrc
.gvimrc-local
.vimrc -> .common/vim/.vimrc
.vimrc-color -> .common/vim/.vimrc-color
.zshrc -> .common/sh/.zshrc
.zshrc.local
bin -> .common/bin/
todo.txt
This host has four modules checked out instead of two. A real host would probably have at least a dozen, making for a much more complicated setup.
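
With a dozen modules, creating the symlinks by hand gets tedious. The function below is a minimal sketch of one way to automate it for dotfile modules such as vim or sh (not for bin, which is linked as a whole directory). link_module is a name I made up, it is written for sh or bash, and it is not the script I actually use.

```shell
# Link every entry of a checked-out module into a target directory.
# Existing files or links in the target are never overwritten.
# Usage: link_module MODULE_DIR TARGET_DIR
#   e.g. link_module "$HOME/.common/vim" "$HOME"
# (link_module is a hypothetical helper.)
link_module() {
    module_dir=$1
    target_dir=$2
    # Use a relative link target when the module lives under the target.
    case $module_dir in
        "$target_dir"/*) prefix=${module_dir#"$target_dir"/} ;;
        *) prefix=$module_dir ;;
    esac
    for entry in "$module_dir"/.[!.]* "$module_dir"/*; do
        [ -e "$entry" ] || continue          # glob matched nothing
        name=${entry##*/}
        [ "$name" = ".svn" ] && continue     # skip svn metadata
        if [ -e "$target_dir/$name" ] || [ -L "$target_dir/$name" ]; then
            continue                         # leave existing entries alone
        fi
        ln -s "$prefix/$name" "$target_dir/$name"
    done
}
```

The new links still need an svn add and a commit afterwards, just like the ones created by hand in the host1 example.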

Other than the increased number of files, something to notice here is the convention of using a .local or -local suffix to indicate files which are host-specific. Most programs will let you include .foo.local (or similar) to incorporate another file in the configuration. This helps keep things organized, so you can put all the common settings in one place, and only configure further when you need something to be different from the rest of the machines.
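
For shell configuration, the include can be as small as the sketch below, placed at the end of the shared rc file. The helper name source_if_present is hypothetical; the file name matches the .zshrc.local used in the branch examples.

```shell
# Source a host-specific file if it exists and is readable;
# otherwise do nothing, so hosts without local settings still work.
# Usage: source_if_present FILE
# (source_if_present is a made-up helper name.)
source_if_present() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}

# At the end of the shared .zshrc: pull in this host's overrides.
source_if_present "$HOME/.zshrc.local"
```

Sourcing the local file last means a host can override anything the shared file set earlier.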

One other notable detail is the arrangement of bin, .common/bin, and .local/bin. Sometimes it is desirable to put a host-specific item in a directory which otherwise contains only shared files. One way to do that is to place a symlink in the common area (trunk) which links back to a local (branch) file, and expect each branch to have that file. That's what I've done with host-specific scripts. I created bin/ as a symlink to .common/bin/, which contains a symlink .common/bin/local -> .local/bin/. Then I can put both bin/ and bin/local/ in my $PATH, knowing that shared scripts will be in bin/ and only host-specific ones will be in bin/local/. This could also be accomplished by simply using bin/ (external) and lbin/ (local), but I include the more complicated setup here for the purposes of illustration.
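
A relative symlink is resolved from the directory that physically contains it, so the target stored in the trunk depends on where the module ends up in the branch. The sketch below rebuilds the arrangement in a scratch directory, assuming the bin module is checked out at .common/bin; at that depth the link has to climb two levels to reach .local/bin. The scratch directory and the script name hostscript are made up for the demonstration.

```shell
# Rebuild the bin / .common/bin / .local/bin arrangement in a scratch
# directory to check that the links resolve as intended.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/.common/bin" "$demo_home/.local/bin"

# A host-specific script (hypothetical name).
printf '#!/bin/sh\necho host-specific\n' > "$demo_home/.local/bin/hostscript"

# In the trunk module: bin/local points back out of .common/bin.
ln -s ../../.local/bin "$demo_home/.common/bin/local"
# In the branch: bin is a link to the shared module.
ln -s .common/bin "$demo_home/bin"

# Listing bin/local through both links shows the host-specific script.
ls "$demo_home/bin/local/"
```

If the module were checked out at a different depth, the number of ".." components in the trunk symlink would have to change to match.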

Daily Use

I use a small handful of Subversion commands on a daily basis, to update files and commit changes. A typical work session may look something like this:
	% svn up
	Fetching external item into '.common/bin'
	External at revision 514.

	At revision 514.
	% vim todo.txt
	% svn stat -q    # [q]uiet; does not show unversioned files
	M         todo.txt
	M         .zshrc.local

	Performing status on external item at '.common/bin'
	A         newscript
	% svn ci todo.txt -m "finished project foo"
	Committed revision 515.
	% svn ci .zshrc.local -m "added new mplayer alias"
	Committed revision 516.
	% svn ci .common/bin/newscript -m "makes some random task easier"
	Committed revision 517.
	% svn stat -N    # [N]o recursion; only current directory.
	?         tempfile
	% rm tempfile
	
I also use svn diff a lot, before checking in changes. I like to see exactly what changed before I write a commit message.

Any time I access a new host, I either make a new branch for it, or check out a copy of an existing branch. I use one "generic" branch for general shell usage on untrusted machines, and that branch gets checked out onto lots of different hosts. However, I also have three branches which are only used on one host each. My desktop, notebook, and home router all get their own host branch.

Backups

Using Subversion on multiple hosts provides an automatic backup of shared files, simply because the files exist on more than one host. However, if your repository server died, you would be left with only the latest revision of each file -- you'd lose your history information, and have to rebuild the repositories (which takes a while).

So, an additional layer of backups is highly recommended. If you have more than one hard drive on your repository server, you can simply use svnadmin hotcopy to back up your repositories from one drive onto another.
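
A drive-to-drive copy along those lines could be sketched as follows. The paths in REPO_PARENT and BACKUP_ROOT and the function names are assumptions for the example, not my actual setup; the only svn-specific call is svnadmin hotcopy, which takes a source repository path and a destination path.

```shell
# Sketch: copy every repository under one directory to a dated
# folder on another drive with "svnadmin hotcopy".
# REPO_PARENT and BACKUP_ROOT are hypothetical paths.
REPO_PARENT=${REPO_PARENT:-/srv/svn}
BACKUP_ROOT=${BACKUP_ROOT:-/mnt/backup/svn}

# Build the destination path for one repository and one date.
# Usage: backup_dest REPO_PATH DATE
backup_dest() {
    printf '%s/%s/%s\n' "$BACKUP_ROOT" "$2" "${1##*/}"
}

backup_all() {
    today=$(date +%Y-%m-%d)
    for repo in "$REPO_PARENT"/*; do
        # Only directories that look like repositories (they contain
        # a "format" file at the top level).
        [ -f "$repo/format" ] || continue
        dest=$(backup_dest "$repo" "$today")
        mkdir -p "${dest%/*}"
        svnadmin hotcopy "$repo" "$dest"
    done
}
```

Nothing runs until backup_all is called, so the fragment can live in a script that cron invokes once a day.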

Or, for more safety, you can copy your repositories to a separate host every night. There are many ways to do this, including "svk" (a layer on top of svn which provides BitKeeper-like push and pull features, among other things) and shell scripts which push or pull the repositories. (TODO: add links to various backup solutions!)

Personally, I'm using rsnapshot to back up my repositories. This is not recommended, though, because copying a live repo is not guaranteed to work. However, when using fsfs repositories, it works well enough for my purposes. There is always a small chance that rsync will copy a half-finished transaction, leaving the backed-up repo in an unusable state, but this is easily remedied if necessary.

BTW, do not try to use rsync, tar, or other similar tools to back up an svn repo which uses the bdb back end. It usually won't work.

Exceptions

I don't keep everything in Subversion. It works well for lots of things, but not everything. Here are some of the reasons I don't keep some files in svn:
  • Too big: I don't keep my music or video collections in svn, because they're simply too large. I don't have enough disk space to store all my anime on hard drives, much less three or more copies of it (one in the repository, plus two in each working copy: the file itself and the pristine copy svn keeps alongside it). Besides, the revision features of svn would be of very limited value for these files.

    I find rsnapshot useful for doing backups of large file collections, and unison useful for synchronizing the files between hosts. For example, I have a zsh alias set up to make unison synchronize all music by a specified artist, so I can copy parts of my music collection to my notebook and back without having to copy the whole set.
  • Too many: I don't keep my email in svn either. I use Maildirs, which means one file per message. I receive at least 50 messages per day, on a slow day. Keeping it in svn would require either manually adding all those messages, or making a script to do it for me. And even worse, it would make moving messages around more difficult. My email client could probably be configured to "svn mv" files instead of using its regular move method, but that seems like more effort than it's worth. And again, history is of limited value here. I don't go back and edit messages later, so the files never change.
  • Not valuable enough: I don't keep system files or temp files in Subversion. There's no point in keeping /usr/bin/emacs under revision control, for example. And I have quite a few "scratch" files lying around which never make their way into svn. I don't bother tracking any files I don't care about.
If you're going to keep your files in Subversion, it's important to know what not to save.
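
For the large-file case in the first bullet, a per-artist unison helper could look roughly like this. It is a sketch rather than my actual alias: the two roots and the function name sync_artist are assumptions, and the only unison feature it relies on is the -path preference, which restricts a run to one subdirectory of both roots.

```shell
# Sketch: synchronize one artist's directory between two music trees.
# MUSIC_LOCAL and MUSIC_REMOTE are hypothetical roots.
MUSIC_LOCAL=${MUSIC_LOCAL:-$HOME/music}
MUSIC_REMOTE=${MUSIC_REMOTE:-ssh://notebook/music}

# Usage: sync_artist "Artist Name"
# (sync_artist is a made-up name for this example.)
sync_artist() {
    # -path limits the run to the given subdirectory of both roots.
    unison "$MUSIC_LOCAL" "$MUSIC_REMOTE" -path "$1"
}
```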

Misc

Don't expect to put $HOME in svn in one day. If you have a lot of files, it may take quite a while to sort through all of them. I've had the same /home filesystem since 1996, and started keeping it in svn during 2004. Eight years of heavy use made quite a mess, so it has taken months to get everything organized. And it is not something you do once and never worry about again; organization is a continuous process, and you will need to spend time maintaining it.

I find the following shell aliases useful for making svn easier:
  • svex: aliased to svn pe svn:externals
  • svig: aliased to svn pe svn:ignore
  • svst: aliased to svn stat -q | grep . | grep -v '^Perf'
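
In an rc file, those three could be written along the lines below. The first two are plain aliases; svst is shown as a function so the filtering step (svst_filter, a name made up here) can be tried on its own without a working copy.

```shell
# Property-editing shortcuts ("pe" is svn's short name for propedit).
alias svex='svn pe svn:externals'
alias svig='svn pe svn:ignore'

# Drop blank lines and the "Performing status on external item"
# headers from svn status output.
# (svst_filter is a hypothetical helper name.)
svst_filter() {
    grep . | grep -v '^Perf'
}

# Quiet status for the given paths (or the current directory).
svst() {
    svn stat -q "$@" | svst_filter
}
```

The pattern is quoted because an unquoted ^ can be treated as a glob character by zsh, depending on its options.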
I suggest keeping a set of projects to work on when you are bored, feeling unmotivated, or simply don't know what to do next. These should be things which are relatively easy, but perhaps time-consuming. Maintaining your $HOME repository is a good example. It can give you something useful to do between other tasks, but is unimportant enough that it doesn't matter if you let it slide for weeks or months at a time. Another task I treat in this manner is upgrading packages. It's a good idea to do periodically, but I only do it when I'm procrastinating or have nothing better to do.

The main reason for using subversion 1.1 here is that it supports symlinks. Otherwise, older versions such as 1.0 will work fine. Older versions just won't give you versioned symlinks, which is somewhat of a problem for complex $HOME layouts.

Some programs have difficulty with symlinked config files. X-Chat, for example, will delete its config file and then create a new one when it saves its settings. I haven't found a way to deal with this other than making some custom scripts to fix the files periodically.
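
A repair script for that situation might be built around something like the function below. This is a minimal sketch under my own assumptions, not the script I run: relink is a made-up name, the link path must contain a directory component, and the program's freshly written file is assumed to be the copy worth keeping.

```shell
# If a program replaced a symlink with a regular file, move the new
# contents back into the shared copy and restore the link.
# Usage: relink LINK_PATH TARGET
#   LINK_PATH  where the symlink should be (must include a directory)
#   TARGET     the link target, as it would be passed to ln -s
# (relink is a hypothetical helper.)
relink() {
    link=$1
    target=$2
    dir=${link%/*}
    if [ -L "$link" ] || [ ! -f "$link" ]; then
        return 0                  # still a link, or nothing to fix
    fi
    # Resolve a relative target from the link's own directory.
    case $target in
        /*) real=$target ;;
        *)  real=$dir/$target ;;
    esac
    mv "$link" "$real" && ln -s "$target" "$link"
}
```

After a repair, svn stat shows the shared file as modified, so the program's changes can be reviewed with svn diff before they are committed.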

This article may seem very similar to Joey Hess' article on the same topic. That's because my setup was inspired by his earlier article about keeping $HOME in CVS. I used some of the ideas in his first article, but used Subversion instead of CVS. At the time, there was not an article about what I did, so I wrote one. That was on 2004-10-01, though I have since updated it. Joey published a similar article on 2005-01-06, which I didn't see until 2005-04-15. Hopefully that explains any overlap.
Last modified: November 19, 2010 @ 2:52 MST
Copyright (C) 1996-2024 Selene ToyKeeper