Backing up TBs of data on Mac OS X

The problem

I recently had to increase my disk storage capacity and took the opportunity to completely revise my backup strategy. I used to rely only on Apple’s Time Machine with one (big) disk sitting right next to my computer, but that turned out not to be sufficient for a truly convenient and secure backup.

The main problem is that I have several “layers” of files to back up, with different sizes, backup frequencies and levels of importance:

  • OS and system files that run my machine: a few hundred GB, ideally backed up every day, but not critical,
  • documents and scripts resulting from my research activities: a few GB, highly important (worth years of research), to be backed up as often as possible and in physically separate locations,
  • large data files (research products, produced by scripts or downloaded from elsewhere): TBs of data, backed up once in a while, and not critical.

It would be convenient to use a different Time Machine configuration for each layer, but I’m not aware of such flexibility. Fortunately, Mac OS X comes with two very convenient tools: rsync and launchd. The former synchronises files between two distant machines and the latter automatically schedules the process.

So I came up with a twofold backup strategy:

  • Time Machine backs up my system files and documents/scripts to a local disk every hour,
  • an automated bash script using rsync synchronises my documents/scripts and large data files to a distant disk every night.

I use three local disks:

  • HD1 (1TB): contains system files + documents/scripts
  • HD2 (1TB): for HD1 backup
  • data (4TB): to store large data files

and a 6 TB disk on a distant machine (the lab server, in a different building).

Backing up system files + documents/scripts

The first step is to put the large data files on the “data” disk (/Volumes/data) and create a symbolic link in the “HD1” home directory (/Volumes/HD1/Users/coupon):

ln -s /Volumes/data /Volumes/HD1/Users/coupon/data

HD1 is then backed up with Time Machine onto HD2, with the “data” folder excluded: to do this, go to Time Machine Preferences -> Options and click “+” to add the “data” folder to the exclusion list.
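The same exclusion can also be set from the command line with tmutil (shipped with OS X 10.7 and later); the path below assumes the layout described above:

```shell
# Exclude the "data" symlink target from Time Machine backups.
# -p makes it a fixed-path exclusion (stored in Time Machine's
# preferences rather than as an attribute on the folder itself).
sudo tmutil addexclusion -p /Volumes/HD1/Users/coupon/data
```

Running `tmutil isexcluded /Volumes/HD1/Users/coupon/data` afterwards confirms the exclusion took effect.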

This way I benefit from Time Machine’s nice hourly backups, in case I need to restore a previous (recent) version of a document I’m currently working on. It also makes it possible to quickly restore the whole system if HD1 crashes and needs to be replaced.

Backing up large files

Now we need to address the two remaining issues: backing up the large data files and the documents/scripts to a physically separate location. To do that, I wrote a bash script wrapping the rsync command (“ojingo” is the name of my local machine):

#!/bin/bash
# This script synchronises ojingo's home directory with a
# distant machine using rsync.
# Excluded directories are listed in $HOME/local/bin/backup_ojingo_exclude.list
# IMPORTANT: make sure to use 'dir' instead of 'dir/' for rsync SRC

# Options
MYNAME=`basename $0`             # this script's name
USER=`whoami`                    # user name
LOG=/tmp/${MYNAME}.log           # log file to keep track of stderr and stdout messages
EMAIL=user@example.com           # address to send the log file to if an error occurs (placeholder: set your own)
DIST_MACHINE=distant.machine     # distant machine host name (placeholder: set your own)
DIST_DIR=/distant/path           # exported directory on the distant machine (placeholder: set your own)
MOUNT_DIR=/local/path            # mounted directory on local machine
BKP_DIR=$MOUNT_DIR/backup_ojingo # where the backup goes

# ---------------------------------------------- #
# Help message
# ---------------------------------------------- #
if [ $# -lt 1 ]; then
    echo ""
    echo "      Backup data from ojingo"
    echo ""
    echo "Usage: $MYNAME option"
    echo "mount: mount distant machine [requires root privileges]"
    echo "backup: backup data from ojingo"
    echo ""
    exit 1
fi

# ---------------------------------------------- #
# Option "mount": mount distant machine
# and create BKP_DIR as user
# ---------------------------------------------- #
if [ "$1" == "mount" ]; then

    mkdir -p $MOUNT_DIR

    # create $BKP_DIR only if mount procedure generates no error
    sudo mount -o resvport -t nfs $DIST_MACHINE:$DIST_DIR $MOUNT_DIR && sudo -u $USER mkdir -p $BKP_DIR
fi

# ---------------------------------------------- #
# Option "backup": back up ojingo's home directory
# ---------------------------------------------- #
if [ "$1" == "backup" ]; then

    # Set list of directories to back up in $HOME
    DIR=( '.' )

    # Set error variable to 0
    ERR=0

    # Check if disk is properly mounted and if $BKP_DIR exists.
    # Otherwise send an alert by email and stop here
    if [ ! -d $BKP_DIR ]; then
        echo "$BKP_DIR is not mounted" | mail -s "Ojingo's backup has failed" $EMAIL
        exit 1
    fi

    # Start loop over directories to back up. Redirect messages to LOG file
    echo "Started backup on" `date` > $LOG
    for (( i=0; i<${#DIR[@]}; i++ )); do

        # Synchronise local and distant directories
        rsync -avzcupL --delete --progress --exclude-from=$HOME/local/bin/backup_ojingo_exclude.list \
              $HOME/${DIR[i]} $BKP_DIR >> $LOG 2>&1 || ERR=1
    done

    # If an error was generated during the rsync command, send email
    if (( $ERR )); then
        echo "Error occurred on" `date` >> $LOG
        mail -s "Ojingo's backup has failed" $EMAIL < $LOG
        exit 1
    fi

    echo "Ended backup with no errors on" `date` >> $LOG
fi

Here the NFS protocol is used, so that once the distant disk is mounted on my local machine (option “mount”), no other action is needed during the synchronisation process. Of course, one may use other protocols, such as scp with SSH key pairs.
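For reference, here is a hypothetical sketch of the SSH route with rsync itself (the host name and remote path are placeholders, not the ones used above); it needs a passphrase-less key pair but no mount step:

```shell
# Same mirroring options as in the script, but pushed over SSH
# (user@distant.machine and the remote path are placeholders)
rsync -avzcupL --delete \
      --exclude-from=$HOME/local/bin/backup_ojingo_exclude.list \
      -e ssh $HOME/. user@distant.machine:/distant/path/backup_ojingo
```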

Then backing up my entire home directory becomes as simple as running this command:

backup

All stderr and stdout messages are redirected to a log file, which is best checked every time the script runs (in the morning, if it runs every night; see below). Also, in case an error occurs during the process, an alert is automatically sent to me by email, so I know if the backup did not go through.

The option --exclude-from=$HOME/local/bin/backup_ojingo_exclude.list means that all files or directories listed in this file are excluded from the backup. I use it to exclude, for example, the Applications, Music, and Pictures folders (these are backed up by Time Machine anyway and are not critically important on my professional machine).
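The exclude list is a plain text file with one pattern per line, relative to the rsync source. An illustrative version could be created like this (the entries are examples; adapt them to your own setup):

```shell
# Create an example exclude list for rsync's --exclude-from option
# (folder names below are illustrative, not a prescribed set)
mkdir -p $HOME/local/bin
cat > $HOME/local/bin/backup_ojingo_exclude.list <<'EOF'
Applications
Music
Pictures
Library
.Trash
EOF
```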

The rsync options -avzcupL --delete are set to “mirror” the local file structure: if a file is deleted on the local machine, it is deleted on the distant machine as well. The -L option makes rsync follow symbolic links and back up the data they point to. This is especially important because, if you remember, the data folder is a symbolic link pointing to the “data” disk.

Finally, I need to run this script automatically every night. For this I use launchd (for those coming from Linux, it is the equivalent of cron). Before going on, I invite you to read this tutorial if you’re not familiar with launchd.
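For Linux users, the cron equivalent of this schedule would be a single crontab entry (the script name and path below are assumptions, inferred from the exclude-list location used earlier):

```shell
# crontab -e entry: run the backup every night at 3am
# (script path is an assumption for illustration)
0 3 * * * $HOME/local/bin/backup_ojingo backup
```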

Since I have root access to my Mac, I had two options: running the launchd job as user or as root. I chose root, so that the script runs even when I’m not logged in.

Here is my launchd script that runs every night at 3am:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Label and script path below are placeholders: use your own -->
    <key>Label</key>
    <string>com.user.backup_ojingo</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/coupon/local/bin/backup_ojingo</string>
        <string>backup</string>
    </array>
    <key>UserName</key>
    <string>coupon</string>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>

which I put in /Library/LaunchDaemons/ as root.

The UserName key makes launchd run the script as the given user (so that $HOME = /Users/coupon, etc.). For details about the other keys, see here.

Finally, to load the script, one has to run the following command (still as root):

launchctl load  /Library/LaunchDaemons/

and to test it, one may simply run:

launchctl start

If the computer is asleep at the scheduled time, launchd will run the job when the computer wakes up. To avoid that, one may schedule the computer to wake up automatically right before, in System Preferences -> Energy Saver, or simply set the computer to never sleep (which is what I do).

That’s it!


4 thoughts on “Backing up TBs of data on Mac OS X”

    1. I use a similar strategy for backing up over 10TBs of data with rsync. It takes about an hour to run through all of the directories that I am backing up, but anything that has changed or been updated is copied over to the backup location. I use cron instead of launchd to trigger it and then I have a bash script that captures the tail of the log and emails it to me in the morning so I can see if any errors happened overnight.
