The subtlety of backups is often overlooked. What counts as a backup depends on its purpose, and no single backup scheme can satisfy every requirement.
Some backups serve the purposes of system administrators. The goal of these backups is to provide a mirror-image restore of the file system as it was at certain planned checkpoints in time. This does not require a block-by-block copy, only that the restored file system be sufficiently identical to the original.
Backups that serve the purposes of users need only restore the “heart and soul” of the data, not the technical details of its representation or storage. The restored file can differ from the original in ways the user never sees, and this will be of no consequence to the user.
One example is the sparse file. Most Unix systems allow a file to span byte offsets at which the file has no content. The classic Unix file system uses a tree of pointers to data blocks, and a null pointer signifies a missing data block. If one reads such a file as a stream, the file system supplies zeros as the contents of the missing blocks. In fact, another view of sparse files is that no byte of the file is missing; rather, a block of all zeros is represented by a null pointer instead of a pointer to data.
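The two viewpoints can be seen side by side in a short sketch. It assumes a POSIX shell with dd, wc, tr, du, and mktemp available, and a file system that supports holes; make_sparse is a hypothetical helper name introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: create a file with a hole, then look at it two ways.
# make_sparse is a hypothetical helper, not a standard tool.
make_sparse() {
    # Seek 131072 bytes past the start, then write a single zero byte.
    # Nothing is written before that offset, so the file system may leave a hole.
    dd if=/dev/zero of="$1" bs=1 count=1 seek=131072 2>/dev/null
}

tmpdir=$(mktemp -d)
make_sparse "$tmpdir/sparse.yes"

# Logical size in bytes, as a reader of the stream sees it: 131073.
wc -c < "$tmpdir/sparse.yes"

# Number of non-zero bytes in the stream: 0, because the hole reads back as zeros.
tr -d '\000' < "$tmpdir/sparse.yes" | wc -c

# Allocated space in 1024-byte units. Where holes are supported this is
# typically far smaller than the logical size; the exact figure varies.
du -k "$tmpdir/sparse.yes"

rm -rf "$tmpdir"
```

The first two numbers are what a user sees; the third is what a system administrator has to budget for.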
While a user might not care if her sparse file were recovered as a non-sparse file, a system admin might care very much. If sparse files are not recovered as sparse, the recovered files might not fit back onto the medium from which they came. There might be other unforeseen consequences.
The following demonstrates sparse files, and also shows that a plain cp cannot be relied on for a backup that must preserve them (nor can tar, cpio, or scp, for that matter):
$ uname -rs
$ dd if=/dev/zero of=sparse.yes bs=1 count=1 seek=128K
1+0 records in
1+0 records out
1 bytes transferred in 0.000039 secs (25732 bytes/sec)
$ cp sparse.yes sparse.no
$ ls -ls sparse*
130 -rw-r--r-- 1 burt user 131073 Jul 11 19:47 sparse.no
18 -rw-r--r-- 1 burt user 131073 Jul 11 19:46 sparse.yes
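The check that ls -ls makes by eye can be scripted, which may be handy when verifying a restore. The following is a minimal sketch, not a definitive tool: logical_bytes, allocated_kb, and report are hypothetical helper names. The last step assumes GNU coreutils, whose cp accepts --sparse=always; that is an assumption about the environment and not necessarily true of the system in the transcript above, so the script falls back to a message where the option is rejected.

```shell
#!/bin/sh
# Sketch: compare logical size with allocated space to see whether a copy
# kept the holes. All three function names below are hypothetical helpers.

# Size in bytes as seen by a program reading the file as a stream.
logical_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Space actually allocated, in 1024-byte units, as reported by du.
allocated_kb() {
    du -k "$1" | cut -f1
}

# Print both figures for each file named on the command line.
report() {
    for f in "$@"; do
        printf '%s: %s bytes logical, %s KiB allocated\n' \
            "$f" "$(logical_bytes "$f")" "$(allocated_kb "$f")"
    done
}

tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/sparse.yes" bs=1 count=1 seek=131072 2>/dev/null
cp "$tmpdir/sparse.yes" "$tmpdir/sparse.no"
report "$tmpdir/sparse.yes" "$tmpdir/sparse.no"

# Assumption: GNU coreutils. Its cp accepts --sparse=always, which asks cp
# to create holes for runs of zero bytes. Other cp implementations may
# reject the option, in which case the else branch runs.
if cp --sparse=always "$tmpdir/sparse.yes" "$tmpdir/sparse.gnu" 2>/dev/null; then
    report "$tmpdir/sparse.gnu"
else
    echo "cp --sparse=always is not supported here"
fi

rm -rf "$tmpdir"
```

Comparing allocated space with logical size is only a heuristic. Small files are rounded up to whole blocks, and a file system that compresses data can report a low allocation for a file that has no holes at all.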