[sllug-members]: easy backups

Scott Patten scott at pattens.net
Mon May 1 21:32:36 MDT 2006


Bart Whiteley wrote:
> On Mon, 2006-05-01 at 18:41 -0600, Mac Newbold wrote:
>   
>> Today at 6:05pm, Bart Whiteley said:
>>
>>     
>>> On Mon, 2006-05-01 at 17:50 -0600, Scott Patten wrote:
>>>       
>>>> I'd like to add a second vote for rdiff-backup.  It combines the
>>>> advantages of diff with those of rsync.  It will very efficiently
>>>> create a copy of a directory and also keep compressed backups of the
>>>> changes that occurred since the last backup.  When you want to
>>>> restore a file, you just copy it.  When you want to restore an older
>>>> copy, you issue an rdiff-backup command that decompresses and
>>>> recombines the files to give you the version that you are after.
>>>>         
>>> What are the advantages of rdiff-backup over
>>> http://www.mikerubel.org/computers/rsync_snapshots/ ?
>>>       
>> I'll answer a different question and say that rsnapshot has the advantage 
>> that every file in any of your snapshots is a complete file, usually a 
>> hard-link to identical versions of the same file. You can easily search 
>> your backups with grep or whatever you normally like to use, and restoring 
>> a certain version is as easy as doing a cp (I like cp -p too). The file 
>> permissions and ownership are perfectly preserved. The biggest advantage 
>> I've found in that is knowing that I can quickly and very easily do large 
>> sweeping restores in case of emergency. Once I know which version I want 
>> (like the most recent backup) it is extremely easy to get it and do 
>> whatever you want with it. Since it uses rsync, it also only transfers the 
>> files that have actually changed, and you can turn on compression for 
>> remote backups, so that it saves on bandwidth and time. (It doesn't store 
>> the files compressed, just gzips them during transfer over the network.)
>>
>> I suspect that an rdiff backup to a remote server would have to transfer 
>> at least the smaller of the two file versions to find the difference 
>> between them, and if so, it wouldn't provide any bandwidth/speed savings 
>> over an rsync/rsnapshot.
>>
>> The place it could provide an advantage is in the space required for a 
>> backup, especially if you have lots of large files that get small changes 
>> (like log files for example). The diffs could also be bigger than the file 
>> itself, though, if a majority of the file changed, since it shows the old 
>> way and the new way in the diff. And generally diffs are ridiculously bad 
>> with binary files, like images, compressed files, non-text documents (like 
>> the MS Office formats), etc.
>>
>> The rdiff method can make it more fragile too. This method also adds more 
>> complications for restoring or examining files in your backups, since 
>> they're not whole files, just patches, basically. To restore, you need to 
>> have a valid version of every patch in the dependency chain. I don't know 
>> how they do it in rdiff-backup, but if the full backup was on April 1st, 
>> and you wanted the version from April 27th, you might need 27 patch files, 
>> and if you didn't have all of them, perfect and without any corruption, 
>> you wouldn't be able to recover the file fully. If they do fulls monthly, 
>> an incremental to the full weekly, and an incremental to the weekly each 
>> day, then you'll only need 3 patches for everything to get back perfectly. 
>> All of this is avoided by storing only whole file versions with 
>> rsync-based methods.
>>
>>     
>
> That's what I was thinking.  As I read his description of rdiff-backup,
> I could only think of ways in which it was inferior to rsnapshot-like
> solutions.  I was just wondering if I missed something...
>   
Man, you guys are amazing.  At least I read your article.  rdiff-backup
is in most respects rsnapshot-like.

If you're insane about speed then use rsnapshot.  If you're concerned 
about space or your time then try rdiff-backup.  By the way, 
rdiff-backup can save old versions as whole files, as diffs or as 
gzipped versions of the files or diffs.
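
To make the whole-files-versus-diffs trade-off concrete, here is a minimal
Python sketch of a reverse-diff store: the newest version is always kept
whole, and each backup run adds one diff that rebuilds the previous version
from the newer one. It uses difflib from the standard library as a stand-in
for rdiff-backup's real delta format, and the names (make_delta,
apply_delta, ReverseDiffStore) are made up for illustration, so treat it as
a sketch of the idea rather than of rdiff-backup's implementation.

```python
import difflib


def make_delta(base, target):
    """Describe how to rebuild `target` from `base` (both lists of lines).

    Only lines that are missing from `base` are stored in the delta.
    """
    ops = []
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse base[i1:i2]
        elif j2 > j1:                             # 'replace' or 'insert'
            ops.append(("data", target[j1:j2]))   # lines only in target
        # 'delete' needs no entry: those base lines are simply skipped
    return ops


def apply_delta(base, ops):
    """Rebuild the target version from `base` and a delta from make_delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDiffStore:
    """Hypothetical store: whole mirror plus a chain of reverse diffs."""

    def __init__(self, lines):
        self.mirror = list(lines)   # newest version, always a whole file
        self.increments = []        # oldest first

    def backup(self, new_lines):
        new_lines = list(new_lines)
        # Reverse diff: the base is the NEW version, the target the OLD one.
        self.increments.append(make_delta(new_lines, self.mirror))
        self.mirror = new_lines

    def restore(self, steps_back=0):
        if not 0 <= steps_back <= len(self.increments):
            raise ValueError("no such version")
        lines = self.mirror
        start = len(self.increments) - steps_back
        # Walk backwards in time, newest increment first.
        for ops in reversed(self.increments[start:]):
            lines = apply_delta(lines, ops)
        return lines
```

With this layout the most recent backup never depends on any diff; only
older versions need the chain of increments between them and the mirror.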


From the rdiff-backup site:

    * *rsync <http://rsync.samba.org>* - the inspiration for
      rdiff-backup. Although rsync and rdiff-backup do not share any
      code, rdiff-backup uses the rsync algorithm, invented by rsync
      author Andrew Tridgell.

      Compared to rdiff-backup, rsync is faster, so it is often the
      better choice when pure mirroring is required. Also rdiff-backup
      does not have a separate server like rsyncd (instead it relies on
      ssh-based networking and authentication).

      However, rdiff-backup uses much less memory than rsync on large
      directories. Second, by itself rsync only mirrors and does not
      keep incremental information (but see below). Third, rsync may not
      preserve all the information you want to back up. For instance, if
      you don't run rsync as root, you will lose all ownership
      information. Fourth, rdiff-backup has a number of extra features,
      like extended attribute and ACL support, detailed file statistics,
      and SHA1 checksums.

    * *rsync-based scripts* - Because rsync does not save incremental
      information, it is usually inappropriate for backing up. There are
      several utilities which use the rsync binary, but keep old data by
      using rsync's --link-dest option and rotating the destination
      directory.

      Compared to rdiff-backup, these are usually faster but use more
      memory and disk space. They make each increment appear as a
      separate complete directory, which is a neat feature. On the other
      hand, these will usually be missing the features that are missing
      from rsync (see above).

      Here are various programs which use the rsync strategy:

          o *Mike Rubel's rsync snapshots
            <http://www.mikerubel.org/computers/rsync_snapshots/>* - the
            original rsync hardlinking script
          o *rsnapshot <http://www.rsnapshot.org/>* - based off Mike
            Rubel's article
          o *Dirvish <http://www.dirvish.com/>* - perhaps the most
            feature-filled of these programs
          o *Backup Buddy
            <http://www.effortlessis.com/backupbuddy/others.php>* - a
            simple, easy-to-use option
          o *rsback <http://www.pollux.franken.de/hjb/rsback/>*
          o *Snapback2 <http://www.perusion.com/misc/Snapback2/>*
          o *rsync-incr <http://colas.nahaboo.net/software/rsync-incr/>*
          o *rsyncbackup <http://rsyncbackup.erlang.no/>*
          o *ccollect <http://linux.schottelius.org/ccollect/>*
          o *RIBS backup <http://www.ribs-backup.org/>*
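
As a rough illustration of the hard-link strategy these scripts share, here
is a minimal Python sketch. It is not how rsync --link-dest is implemented:
the function name snapshot and its arguments are hypothetical, and it
ignores directory metadata, symlinks and special files.

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`.

    `previous` is the directory of the last snapshot, or None for the first run.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = None
            if previous is not None:
                old = os.path.normpath(os.path.join(previous, rel, name))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: share the existing data
            else:
                shutil.copy2(src, dst)   # new or changed: store a full copy
```

Calling snapshot(src, "snap.1", previous="snap.0") after an earlier
snapshot(src, "snap.0") gives two complete directory trees in which
unchanged files occupy disk space only once.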


  Features

For many people, hard disks provide the form of persistent storage that 
is most readily available and cheapest per MB. I think that rdiff-backup 
is often the best way to back up one hard drive to another.

    *

      *Easy to use:* In most cases, the command

           rdiff-backup dir1 dir2
          

      will work out-of-the-box to back up dir1 to dir2.

           rdiff-backup dir1 user@system::/dir2
          

      will back up dir1 to dir2 on a different system (provided
      rdiff-backup is installed on both systems). rdiff-backup also
      comes with a lot of up-to-date documentation
      <http://www.nongnu.org/rdiff-backup/docs.html>.
    *

      *Creates mirror:* rdiff-backup makes the backup directory into an
      almost exact copy of the source directory (the only difference is
      one extra subdirectory on the backup side). If you delete a file
      from the source directory you can simply copy it from the backup
      directory, use "find" or "locate" to find the file, or use any
      other familiar utility. Also, if the two directories are on
      different disks, you can recover almost immediately if the disk
      containing the source directory crashes, just by mounting the
      backup directory where the source directory used to be.

    *

      *Keeps increments:* Normally, with a mirror, any changes made to
      the source directory are immediately sent to the backup directory,
      and old changes are lost. rdiff-backup saves those changes in the
      form of reverse diffs, so you can recover the older form of the file.

      For instance, suppose last week you deleted half of some document,
      thinking that what you had written was garbage. Yesterday, your
      backup event ran, saving these changes. Today you realize that you
      were on to something and want what you deleted back. If you just
      mirrored, you would be out of luck, since the copy on your mirror
      would be the newer one. With rdiff-backup, the newer version would
      indeed be present, but in a special directory (rdiff-backup-data/)
      there would be a file that recorded this change. Running
      rdiff-backup on this file recovers the version from a week ago.

    *

      *Preserves all information:* Whether you restore from the mirror
      directory or from an earlier incremental backup, rdiff-backup will
      reproduce your files exactly as they were. Files missing at the
      time of backup will also be missing after the restore. Files hard
      linked when backed up will be hard linked after the restore.
      rdiff-backup also preserves permissions, user and group ownership,
      modification time, device files, fifos, and symlinks.

      Sometimes it is impossible for the information to be replicated
      exactly on the destination. For instance, ownership cannot usually
      be replicated without root access at the destination; Windows file
      systems may not be case sensitive and have no ownership at all.
      rdiff-backup records file metadata in a separate file so that all
      information is preserved even if the destination file system is
      missing features.

    *

      *Space efficient:* Suppose you have a large database file that
      changes a little bit every day. A normal incremental backup would
      keep saving copy after copy of this database, wasting a lot of
      space. rdiff-backup uses librsync, which implements the same
      efficient diffing algorithm that rsync uses. It works on binary
      files as well as text, so only a fraction of the data in your
      database would be saved in each incremental backup.

    *

      *Bandwidth efficient:* rdiff-backup depends on librsync, and thus
      uses the same diffing algorithm as rsync (rsync and rdiff-backup
      strictly speaking do not share any code however). As a result,
      when writing to a remote location, rdiff-backup will only
      send diffs over and can use much less bandwidth than, say, ftp or scp.

      For instance, suppose you slightly alter large file A to make
      large file A', and A is still on the remote system. When
      rdiff-backup is run, it will only send over the diff A->A' (in
      order to "copy" A' to the remote system). Neither A nor A' needs
      to be sent in its entirety.

    *

      *Transparent data format:* Except for recording the hard link
      structure of old data sets, rdiff-backup doesn't absolutely
      require any data files formatted specifically for rdiff-backup. So
      if you want to stop using rdiff-backup in the future, you won't be
      stuck with any undecipherable files in some strange format. As
      noted above, the mirror directory will just be a copy of the
      source directory as it was when rdiff-backup was last run. Earlier
      states of your files are saved either 1) as plain copies, 2) in
      diff form as produced by rdiff, or 3) as a gzipped version of 1
      or 2.

    *

      *Filesystem feature autodetection:* People use rdiff-backup in
      many different environments. The filesystem they want to back up
      may be on Linux, Windows, or Mac. It may or may not be case
      sensitive, support characters like ":", have resource forks,
      extended attributes, or access control lists. Moreover, the file
      system they are backing up to may or may not support these features.

      rdiff-backup tries to handle these situations automatically
      without the need for switches like --acl --ea --no-ownership, etc.
      When run, it tests both the source and destination filesystems
      to see which features each supports, such as case sensitivity,
      changing uid/gid ownership, resource forks, extended attributes,
      or access control lists. To see the results of this testing, run
      rdiff-backup with verbosity 4 or higher, as in |-v4|.

    *

      *Mac OS X resource fork support:* On Mac OS X systems,
      rdiff-backup will back up the resource forks which store, for
      instance, Finder information. Most Unix backup programs would only
      back up the data forks and discard the resource forks.

    *

      *ACL and EA support:* If rdiff-backup can find the pylibacl
      <http://pylibacl.sourceforge.net/> and pyxattr
      <http://pyxattr.sourceforge.net/> modules, and if the file system
      supports these features, rdiff-backup will preserve Access Control
      Lists and user-level Extended Attributes.

    *

      *Keeps statistics:* After each session rdiff-backup writes summary
      statistics to a text file. You can inspect these to see how large
      your repository is, how fast it is growing, how much space
      rdiff-backup is saving you, and more. Here is an example
      |session_statistics| file:

      StartTime 1124018521.00 (Sun Aug 14 06:22:01 2005)
      EndTime 1124019454.64 (Sun Aug 14 06:37:34 2005)
      ElapsedTime 933.64 (15 minutes 33.64 seconds)
      SourceFiles 975715
      SourceFileSize 13078345389 (12.2 GB)
      MirrorFiles 975604
      MirrorFileSize 13076177922 (12.2 GB)
      NewFiles 119
      NewFileSize 1103075 (1.05 MB)
      DeletedFiles 8
      DeletedFileSize 190653 (186 KB)
      ChangedFiles 2032
      ChangedSourceSize 395324417 (377 MB)
      ChangedMirrorSize 394069372 (376 MB)
      IncrementFiles 2233
      IncrementFileSize 6098156 (5.82 MB)
      TotalDestinationSizeChange 8265623 (7.88 MB)
      Errors 80
          

      rdiff-backup also saves very detailed statistics in a
      |file_statistics| file. This file is also in (compressed) text
      form, but is usually too voluminous to read manually.
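
Because session_statistics is plain text, it is easy to post-process. Here
is a minimal Python sketch, assuming every line has the "Name value
(human-readable value)" layout shown in the example above. The function
name parse_session_statistics is hypothetical, and the sample lines are
copied from that example.

```python
def parse_session_statistics(text):
    """Return {name: number} from 'Name value (human readable)' lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        name, raw = fields[0], fields[1]
        try:
            stats[name] = float(raw) if "." in raw else int(raw)
        except ValueError:
            continue  # skip lines that do not match the assumed layout
    return stats


# Sample lines taken from the session_statistics example above.
SAMPLE = """\
ElapsedTime 933.64 (15 minutes 33.64 seconds)
SourceFileSize 13078345389 (12.2 GB)
ChangedSourceSize 395324417 (377 MB)
IncrementFileSize 6098156 (5.82 MB)
Errors 80
"""

stats = parse_session_statistics(SAMPLE)
# How much increment data was stored relative to the data that changed.
ratio = stats["IncrementFileSize"] / stats["ChangedSourceSize"]
print(f"increments are {ratio:.1%} of the changed source data")
```

For the sample session, the increments amount to only a small fraction of
the 377 MB of source data that changed, which is the space saving the
"Space efficient" item describes.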


