rsync: How to efficiently mirror websites, directories, and filesystems.


From the manpage of rsync(1):

rsync is a program that behaves in much the same way that rcp does, but has many more options and uses the rsync remote-update protocol to greatly speed up file transfers when the destination file is being updated. The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package.

rsync is not just any file transfer program. It is an intelligent one, widely used to mirror websites, directories, and entire filesystems. What sets rsync apart from other file transfer programs, like rcp and scp, is that it skips files that have not changed and, for files that have, sends only the parts that differ rather than the whole file.
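The checksum-search algorithm mentioned above depends on a cheap "rolling" checksum that can slide along a file one byte at a time. Below is a simplified Python sketch of the weak checksum described in the rsync technical report. The function names are hypothetical, and real rsync confirms every candidate match with a stronger hash before trusting it. The point is that moving the window by one byte costs a constant amount of work instead of re-summing the whole block.

```python
from collections.abc import Iterator

M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) for a block from scratch: O(len(block)) work."""
    n = len(block)
    a = sum(block) % M
    # b weights each byte by its distance from the end of the block: n, n-1, ..., 1
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def rolling_checksums(data: bytes, block_len: int) -> Iterator[tuple[int, int]]:
    """Yield the weak checksum of every block_len-byte window of data."""
    if len(data) < block_len:
        return
    a, b = weak_checksum(data[:block_len])
    yield a, b
    for k in range(len(data) - block_len):
        a, b = roll(a, b, data[k], data[k + block_len], block_len)
        yield a, b


def find_block(data: bytes, block: bytes) -> int:
    """Return the first offset whose weak checksum matches block's, or -1.

    Real rsync would then compare a strong hash before accepting the match.
    """
    target = weak_checksum(block)
    for offset, checksum in enumerate(rolling_checksums(data, len(block))):
        if checksum == target:
            return offset
    return -1
```

The usual combined value is a + 2**16 * b. The receiver sends these checksums for each of its blocks, and the sender rolls through its own copy of the file looking for them.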

At work, I do all my development inside a virtual machine running Slackware. Any code I write at work, and more importantly as part of work, I keep in a separate directory cleverly named “work” inside the /home directory. With no backup server in place yet, and fearing the day the host OS would crash or the virtual machine image would get corrupted, I brought my home laptop to work to mirror at least the work directory. I used EverythingLinux.org’s easy-to-follow tutorial to set up rsync on the virtual machine, and I call the rsync client from the laptop to back up the work directory.

While I strongly suggest reading both the manpage and the referenced tutorial thoroughly, I will nonetheless list the steps to quickly get rsync up and transferring files.

My environment is laid out like this: I want to copy my work directory, /home/work/, from a Slackware box at 192.168.1.10 over to my laptop, also running Slackware, at 192.168.1.247. rsync is installed on both systems. First, I need to set up the rsync daemon on 192.168.1.10 by creating rsyncd.conf, the file from which the rsync daemon reads its configuration settings. The rsyncd.conf(5) manpage thoroughly documents all the configuration parameters and also contains useful examples. I set up /etc/rsyncd.conf as follows:

motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsyncd.lock

[work]
       path = /home/work
       comment = Code repository from Work
       uid = ayaz
       gid = ayaz
       read only = yes
       list = yes
       auth users = ayaz
       secrets file = /etc/rsyncd.scrt

A brief explanation is in order. “path” points to the directory to be mirrored. “uid” and “gid” are the user and group the daemon switches to while transferring files for this module. I don’t want /home/work to be altered by any client through rsync, so I have made the module read only. I have also set “ayaz” as the only user allowed to connect to the module. A username:password pair for that user is stored in plain text in the “secrets file”. If anonymous rsync is desired, the “auth users” and, consequently, the “secrets file” directives should be removed.
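The secrets file holds passwords in plain text, so rsyncd is picky about it. With the default “strict modes = yes”, the daemon refuses to use a secrets file that other users can read. Below is a minimal Python sketch of creating one with owner-only (0600) permissions. The helper name and the example password are hypothetical, and on a real server you would write /etc/rsyncd.scrt as root.

```python
import os


def write_secrets_file(path: str, credentials: dict[str, str]) -> None:
    """Write one user:password line per entry, readable only by the file's owner."""
    # Create the file with mode 0600 so a new file is never readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # The mode above is ignored for a pre-existing file, so tighten it before writing.
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as f:
        for user, password in credentials.items():
            f.write(f"{user}:{password}\n")
```

For example, write_secrets_file("/etc/rsyncd.scrt", {"ayaz": "s3cret"}) would produce the single line "ayaz:s3cret".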

These are a small subset of the options the rsync daemon supports; the rsyncd.conf(5) manpage documents every one of them in detail. Finally, to start the daemon, I run rsync with the “--daemon” flag, as root, since the daemon binds a privileged port and switches to the configured “uid” and “gid”. The rsync daemon then detaches into the background and listens on TCP port 873 by default.
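A quick way to confirm the daemon is listening is to connect to port 873 and read the first line it sends. As far as I know, an rsync daemon opens every connection with a greeting of the form “@RSYNCD: <protocol version>”. The sketch below relies on that assumption, and its function name is hypothetical.

```python
import socket


def rsync_daemon_greeting(host: str, port: int = 873, timeout: float = 5.0) -> str:
    """Connect to host:port and return the rsync daemon's first line (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read one byte at a time until the end of the first line, EOF, or 1 KiB.
        while not data.endswith(b"\n") and len(data) < 1024:
            chunk = sock.recv(1)
            if not chunk:
                break
            data += chunk
    line = data.decode("ascii", errors="replace").strip()
    # Only the prefix is checked; newer daemons may append more fields after the version.
    if not line.startswith("@RSYNCD:"):
        raise ValueError(f"unexpected greeting: {line!r}")
    return line
```

Calling rsync_daemon_greeting("192.168.1.10") from the laptop should return the greeting line. An exception means nothing is listening, or something other than rsync answered.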

Now, over to the client side, the laptop. I create a directory, /home/work, and change its user and group ownership to match the “uid” and “gid” set in the rsync daemon’s config file. Running the rsync client is simple, but before running it I’d again suggest a thorough read of rsync(1) to understand the various switches it supports and the different ways it can be used.

I call rsync like this:

$ rsync -avrzogtp --rsh=ssh --exclude "*~" --exclude "linux/" 192.168.1.10:/home/work/ /home/work/
Here’s a quick description of the switches used. “-a” turns on archive mode, which is shorthand for “-rlptgoD”, so the separate “-r”, “-o”, “-g”, “-t”, and “-p” switches are redundant, though harmless. “-v” turns on verbose output. “-r” makes rsync recurse into directories. “-z” compresses data during the transfer. “-p”, “-o”, and “-g” preserve, in that order, the permissions, owner, and group of the files and directories being copied (preserving the owner only takes effect when the receiving rsync runs as root). “-t” preserves modification times. I don’t want to send data in plain text over the wire, so “--rsh=ssh” tells rsync to use ssh as its transport (recent versions of rsync use ssh by default anyway). Like tar, rsync supports the “--exclude” switch. I tell it to exclude any file whose name ends in a “~” character, a habit Vim rather stubbornly has, and to exclude the entire “linux/” directory. Finally, I specify the source host and source directory, followed by the local directory the data should be copied into. The trailing slash on the source means the contents of /home/work/ are copied, rather than a “work” directory being created inside the destination.
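Why “*~” and not, say, “*.~”? Below is a rough Python model of how rsync matches a simple exclude pattern. A pattern with no slash is compared against the last component of each path, and a trailing slash restricts the pattern to directories. This is a simplified, hypothetical sketch built on fnmatch, not rsync's actual matcher. It deliberately rejects patterns containing an interior or leading slash.

```python
from collections.abc import Sequence
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(path: str, is_dir: bool, patterns: Sequence[str]) -> bool:
    """Rough model of rsync exclude matching for slash-free patterns (hypothetical helper)."""
    name = PurePosixPath(path).name  # slash-free patterns match the last path component
    for pattern in patterns:
        dir_only = pattern.endswith("/")  # a trailing slash means: match directories only
        pattern = pattern.rstrip("/")
        if "/" in pattern:
            raise ValueError(f"this sketch does not handle {pattern!r}")
        if dir_only and not is_dir:
            continue
        if fnmatchcase(name, pattern):
            return True
    return False
```

A Vim backup such as “main.c~” ends in a bare “~”, so “*~” catches it. “*.~” would only match names ending in “.~”. Likewise, “linux/” skips a directory named linux at any depth but leaves a plain file of that name alone.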

And that’s it. The first time around, I was nervous about running rsync. Just to be sure it wasn’t going to do anything crappy to my laptop’s filesystem (which, shamefully, isn’t backed up itself yet), I ran rsync on my laptop with the “-n” switch. It does a dry run: it only prints a harmless list of the files it would copy from the source system and then quits, without changing anything. Again, read the manpage for many more options.
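If you script the backup (from cron, say), you can build the command as an argument list and skip the shell entirely. A pattern like “*~” then reaches rsync untouched instead of being glob-expanded by the shell. The Python helper below is hypothetical, not part of rsync. It assembles the same transfer and toggles “--dry-run”, the long form of “-n”, with a flag.

```python
from collections.abc import Sequence
from typing import Optional


def build_rsync_command(
    source: str,
    dest: str,
    excludes: Sequence[str] = (),
    dry_run: bool = False,
    rsh: Optional[str] = "ssh",
) -> list[str]:
    """Assemble an rsync argv list (hypothetical helper, not part of rsync)."""
    cmd = ["rsync", "-avz"]  # -a already implies -rlptgoD
    if dry_run:
        cmd.append("--dry-run")  # same as -n: report what would be copied, change nothing
    if rsh is not None:
        cmd.append(f"--rsh={rsh}")  # remote-shell transport for host:path sources
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    cmd += [source, dest]
    return cmd
```

Passing the result to subprocess.run(cmd, check=True) runs the transfer, and shlex.join(cmd) shows the equivalent shell line. To talk to the daemon's [work] module directly on port 873 instead of going over ssh, use a source like “ayaz@192.168.1.10::work/” with rsh=None.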

12 thoughts on “rsync: How to efficiently mirror websites, directories, and filesystems.”

  1. That rsync command listed here doesn’t appear to use the rsyncd server. If you wanted to use the rsyncd server, there would need to be an additional colon after the host.

    For example:
    rsync -avrzogtp --rsh=ssh --exclude "*~" --exclude "linux/" 192.168.1.10::work /home/work/

  2. Thanks for the comment, Paul.

    If I have understood the following section of the rsync man page correctly, the variant of the command with the single colon *does* indeed contact the rsyncd daemon, but not directly. Instead, it uses a remote-shell program as a transport:

    There are two different ways for rsync to contact a
    remote system: using a remote-shell program as the
    transport (such as ssh or rsh) or contacting an rsync daemon
    directly via TCP. The remote-shell transport is used
    whenever the source or destination path contains a single
    colon (:) separator after a host specification. Contacting
    an rsync daemon directly happens when the source or
    destination path contains a double colon (::) separator after
    a host specification, OR when an rsync:// URL is
    specified (see also the “USING RSYNC-DAEMON FEATURES VIA A
    REMOTE-SHELL CONNECTION” section for an exception to this
    latter rule).

    As a special case, if a single source arg is specified
    without a destination, the files are listed in an output
    format similar to “ls -l”.

    As expected, if neither the source or destination path
    specify a remote host, the copy occurs locally (see
    also the --list-only option).

  3. I’m afraid of accidentally copying my backup server onto our web server. Where exactly do you run this rsync from: the host or the guest? By host I mean the server you want to back up (the web server), and by guest the backup server.

    Also, if I run rsync from the root, /, can it mirror our web server?
    (Also, should I exclude the files specific to each machine, like the hostname, the interfaces file under /etc/networking/, and the whole /sys directory? Or should I write over everything?)

    Thank You !

