Backing up TBs of data on Mac OS X

The problem

I recently had to increase my disk storage capacity, and I took this opportunity to completely revise my backup strategy. I used to rely only on Apple’s Time Machine with one (big) disk sitting right next to my computer, but that turned out to be insufficient for a truly convenient and secure backup.

The main problem is that I have several “layers” of files to back up, with different sizes, backup frequencies and levels of importance:

  • the OS and system files that run my machine: a few hundred GB, ideally backed up every day, but not critical,
  • documents and scripts resulting from my research activities: a few GB, highly important (worth years of research), which need to be backed up as often as possible and in physically separate locations,
  • large data files (research products, produced by scripts or downloaded from elsewhere): TBs of data, backed up once in a while, and not critical.

Ideally, I would use a different Time Machine configuration for each layer, but I’m not aware of such flexibility. Fortunately, Mac OS X comes with two very convenient tools: rsync and launchd. The former synchronises files between two distant machines and the latter automatically schedules the process.

So I came up with a twofold backup strategy:

  • Time Machine backs up my system files and documents/scripts to a local disk every hour,
  • an automated bash script using rsync synchronises my documents/scripts and large data files to a distant disk every night.

I use three local disks:

  • HD1 (1TB): contains system files + documents/scripts
  • HD2 (1TB): for HD1 backup
  • data (4TB): to store large data files

and a 6 TB disk on a distant machine (the lab server, in a different building).

Backing up system files + documents/scripts

The first step is to put the large data files on the “data” disk (/Volumes/data) and to create a symbolic link in the home directory on “HD1” (/Volumes/HD1/Users/coupon):
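The commands themselves were lost from the page; below is a minimal sketch of the idea, assuming the paths given above (the guard simply makes the snippet a no-op on a machine without these volumes):

```shell
# Link the "data" disk into the home directory on HD1.
# Paths are those described in the text; guard so this is harmless
# to paste on a machine where the volumes do not exist.
if [ -d /Volumes/data ] && [ -d /Volumes/HD1/Users/coupon ]; then
    ln -s /Volumes/data /Volumes/HD1/Users/coupon/data
fi
```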

HD1 is then backed up with Time Machine onto HD2, with the “data” folder excluded: to do this I went to Time Machine Preferences -> Options and clicked “+” to add the “data” folder to the exclusion list.

This way I can take advantage of Time Machine’s hourly backups, in case I need to restore a recent version of a document I’m currently working on. It also makes it possible to quickly restore the whole system if HD1 crashes and needs to be replaced.

Backing up large files

Now we need to address the two remaining issues: backing up the data and the documents/scripts to a physically separate location. To do that I wrote a bash script wrapping the rsync command (“ojingo” is the name of my local machine):
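The script itself was lost from the page; here is a hypothetical reconstruction consistent with the description. The server name, NFS export, mount point, script filename and email address are placeholders of my own, not the author’s actual values:

```shell
#!/bin/bash
# backup_ojingo.sh -- hypothetical reconstruction of the wrapper script.
# SERVER, EXPORT, MNT and the email address are placeholders.

SERVER="lab-server.example.org"        # distant machine (assumption)
EXPORT="/export/backup"                # NFS export on the server (assumption)
MNT="/Volumes/backup"                  # local mount point (assumption)
LOG="$HOME/backup_ojingo.log"
EXCLUDE="$HOME/local/bin/backup_ojingo_exclude.list"

do_mount() {
    # Mount the distant disk over NFS so rsync can treat it as local.
    mkdir -p "$MNT"
    mount -t nfs "$SERVER:$EXPORT" "$MNT"
}

do_sync() {
    # Mirror the home directory; -L follows the "data" symlink so the
    # large files on /Volumes/data are backed up too.  All output goes
    # to the log; on failure, the log is emailed as an alert.
    rsync -avzcupL --delete --exclude-from="$EXCLUDE" \
        "$HOME/" "$MNT/ojingo/" > "$LOG" 2>&1 ||
        mail -s "[ojingo] backup FAILED" me@example.org < "$LOG"
}

case "${1:-}" in
    mount) do_mount ;;
    sync)  do_sync ;;
    "")    : ;;    # no argument: do nothing (handy when sourcing the file)
    *)     echo "usage: $0 {mount|sync}" >&2; exit 1 ;;
esac
```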

Here the NFS protocol is used, so that once the distant disk is mounted on my local machine (option “mount”), no further action is needed during the synchronisation. Of course, one may use other protocols, such as scp with SSH key pairs.

Then backing up my entire home directory becomes as simple as running this command:
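The command did not survive on the page; presumably it is just an invocation of the wrapper script with its sync option. The script name, location and option name below are assumptions:

```shell
# Assumed script name/location and "sync" option; the guard makes the
# snippet a no-op where the script is not installed.
SCRIPT="$HOME/local/bin/backup_ojingo.sh"
if [ -x "$SCRIPT" ]; then
    "$SCRIPT" sync
fi
```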

All stderr and stdout messages are redirected to a log file, which is worth checking every time the script runs (in the morning, if it runs every night; see below). Also, if an error occurs during the process, an alert is automatically emailed to me, so I know if the backup did not go through.

The option --exclude-from=$HOME/local/bin/backup_ojingo_exclude.list means that all files and directories listed in that file are excluded from the backup. I use it to exclude e.g. the Applications, Music, or Pictures folders (these are backed up by Time Machine anyway and are not critically important on my professional machine).
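For reference, such an exclusion list is simply one pattern per line; the entries below match the folders mentioned above (rsync treats lines starting with # or ; as comments):

```
# backup_ojingo_exclude.list -- one file or directory pattern per line
Applications
Music
Pictures
```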

The rsync options -avzcupL --delete make the distant copy “mirror” the local file structure: if a file is deleted on the local machine, it is deleted on the distant machine as well. The -L option tells rsync to follow symbolic links and back up the data they point to. This is especially important because, if you remember, the data folder is a symbolic link pointing to the “data” disk.

Finally, I need to run this script automatically every night. For this I use launchd (for those working on Linux, it is the equivalent of cron). Before going on, I invite you to read this tutorial if you’re not familiar with launchd.

Since I have root access to my Mac, I had two options: running the launchd job as my user or as root. I chose root, so that the job runs even when I’m not logged in.

Here is my launchd script that runs every night at 3am:
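The plist itself was lost from the page; below is a minimal sketch consistent with the description. The Label, filename and program path are my assumptions; the UserName key and the 3am schedule follow the text:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Label and program path are assumptions, not the author's values -->
    <key>Label</key>
    <string>com.coupon.backup_ojingo</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/coupon/local/bin/backup_ojingo.sh</string>
        <string>sync</string>
    </array>
    <!-- Run the script as the regular user, not root -->
    <key>UserName</key>
    <string>coupon</string>
    <!-- Fire every night at 3am -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
```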

which I put in /Library/LaunchDaemons/ as root.

The UserName key allows the script to run as “user” (so that $HOME = /Users/coupon, etc.). For details about the other options see here.

Finally, to load the script, one has to run the following command (still as root):
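The command was lost from the page; loading a daemon is done with launchctl load. The plist filename below is an assumption (the directory is the one stated in the text), and the guard keeps the snippet harmless on non-macOS systems:

```shell
# Plist filename is an assumption; launchctl exists only on macOS,
# hence the guard.
PLIST="/Library/LaunchDaemons/com.coupon.backup_ojingo.plist"
if command -v launchctl >/dev/null 2>&1; then
    sudo launchctl load "$PLIST"
fi
```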

and to test it, one may simply run:
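Presumably this was launchctl start with the job’s label (the label, assumed below, must match the Label key inside the plist file):

```shell
# Job label is an assumption; guard keeps this a no-op off macOS.
LABEL="com.coupon.backup_ojingo"
if command -v launchctl >/dev/null 2>&1; then
    sudo launchctl start "$LABEL"
fi
```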

If the computer is asleep at the scheduled time, launchd will run the job when it wakes up. To avoid that, one may schedule the computer to wake up automatically right beforehand in System Preferences -> Energy Saver, or simply set the computer to never sleep (which is the case for mine).

That’s it!

4 Responses to “Backing up TBs of data on Mac OS X”

  • Note that I haven’t tested all features yet.

    • I use a similar strategy for backing up over 10TBs of data with rsync. It takes about an hour to run through all of the directories that I am backing up, but anything that has changed or been updated is copied over to the backup location. I use cron instead of launchd to trigger it and then I have a bash script that captures the tail of the log and emails it to me in the morning so I can see if any errors happened overnight.

  • Secret user
    3 years ago

    Would be good to know if it works!

  • There are many 1 TB tape backup systems, many with very high speeds, assuming you can feed them data fast enough. I have to wonder though.. 20 TB for a single person?
