[Gb-ccb] Another idea for restoring onto a live root filesystem
Martin Shepherd
mcs at astro.caltech.edu
Wed Nov 30 21:31:18 EST 2005
Regarding backing up, restoring and keeping all of the CCB
microdrives in sync.
It looks as though there may be a safe way of restoring a backup of a
root filesystem, without having to attach a CDROM drive, or boot from
the network.
When Linux boots, it loads a filesystem image called initrd (Initial
Ram Disk) into a temporary RAM disk. This image contains a minimal
root filesystem, which includes an initialization program. Said
initialization program uses utilities in the minimal root filesystem
until it is ready to mount the real root filesystem. It then tells the
kernel to switch to using the real root filesystem.
What I am thinking of doing is creating a custom initrd image that
would do all of these steps except the last one, so that it would
continue to run Linux from the RAM disk version of the root
filesystem. I would try to include sufficient utilities in the
RAM-disk root filesystem to allow ssh connections, and to allow dump
and restore to be run. This would then enable safe backups/restores of
the hard-disk root filesystem, over the network.
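As a rough illustration of the precondition this scheme relies on, the sketch below checks that the partition to be restored is not the one the running system uses as "/". It is only a sketch under stated assumptions: the device names (/dev/ram0, /dev/hda1), the /mnt/root mount point and the sample mount tables are hypothetical, and the text is assumed to follow the usual /proc/mounts layout of "device mountpoint fstype options ...". Some kernels report the root device as /dev/root, which this sketch does not try to resolve.

```python
def devices_mounted_at(mount_point, mounts_text):
    """Return the devices that /proc/mounts-style text lists for mount_point."""
    devices = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 is the mount point.
        if len(fields) >= 2 and fields[1] == mount_point:
            devices.append(fields[0])
    return devices


def safe_to_restore(target_device, mounts_text):
    """True if target_device is not what the running system uses as "/"."""
    return target_device not in devices_mounted_at("/", mounts_text)


# Hypothetical mount table when running from the RAM-disk root filesystem.
RAMDISK_MOUNTS = """\
/dev/ram0 / ext2 rw 0 0
proc /proc proc rw 0 0
/dev/hda1 /mnt/root ext3 rw 0 0
"""

# Hypothetical mount table after a normal boot from the hard disk.
DISK_MOUNTS = """\
/dev/hda1 / ext3 rw 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    print(safe_to_restore("/dev/hda1", RAMDISK_MOUNTS))
    print(safe_to_restore("/dev/hda1", DISK_MOUNTS))
```

On a real system the text would come from reading /proc/mounts, and a restore script could refuse to proceed when the check fails.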
I have found an example of creating a custom initrd image in the
"Preparing boot files" section of the following web page. In the case
of this web page, the custom initrd image is designed to be loaded
from the network. But I don't see any reason why it couldn't be loaded
from the hard disk instead.
http://howtos.linux.com/howtos/Clone-HOWTO/index.shtml
FWIW, the introduction of the above web page includes the following
paragraph:
"2.2. Why boot from a network
Booting from hard disk would limit the possibilities of copying
images. It wouldn't be possible, for instance, to safely copy to and
from a partition mounted by the booted operating system."
Note that by "copying", the above paragraph is referring to copying
images of hard disk partitions.
My personal worries about restoring a backup onto a live
root filesystem include the following:
1. First of all, the kernel has its own cache of what it thinks is in
the root-filesystem, along with an ext3 journaling cache. However
the /sbin/restore program bypasses the kernel, and writes directly
to the underlying ext2 filesystem, including updating its
meta-data. Thus, after running /sbin/restore, the kernel's view of
the filesystem's contents and metadata, versus what is actually on
the disk, won't match. This may not matter if the kernel doesn't
attempt to write anything to the disk, between the start of running
/sbin/restore and the system being rebooted. However the root
filesystem has to be mounted read/write while the restore program
is running, and even if we immediately remounted the disk readonly,
to prevent the shutdown process from writing to it, the act of
switching it back to readonly might have the side-effect of syncing
cached data to the disk.
If data were written to the disk, then it might well be written to
the wrong place on the disk, and potentially trash either the
contents of a file or directory, or the filesystem metadata. The
system would thereafter either be completely unbootable, or worse,
contain an unknown corrupt file that could cause occasional
unexplained crashes or weird behavior.
Thus, although we could try restoring a backup to a live
filesystem, as suggested at the telecon, and not notice any
problems, that wouldn't guarantee that some important file
somewhere on the disk didn't get corrupted.
2. Less worrying, but potentially problematic, is the fact that a
restore could overwrite a file that is read during shutdown, and
thus cause the shutdown to hang. I don't believe that the
/sbin/restore program preserves block assignments when it replaces
a file (for example people sometimes dump and then immediately
restore a filesystem to defragment it, and this depends on the
block reordering that restore performs). So the blocks previously
assigned to a re-written file, might end up containing part of a
completely different file, and if the kernel's cache of the
filesystem hierarchy pointed to the original start-block of the
file, then strange things could happen.
This might be less problematic than the first issue above, since we
could then power-cycle to recover the system. However this would
then fail if the ext3 journal didn't match, as per issue 1.
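The first worry above can be pictured with a toy model. This is not how the kernel or any real tool is implemented: ToyDisk and ToyCache are hypothetical classes, and the model simply assumes a tool that writes to the disk behind the back of a write-back cache, which is the scenario described in item 1.

```python
class ToyDisk:
    """A disk modeled as a mapping from block number to contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)


class ToyCache:
    """A write-back cache that holds dirty blocks until flush() is called."""

    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}

    def write(self, block_no, data):
        # The write is only cached; the disk is not touched yet.
        self.dirty[block_no] = data

    def flush(self):
        # Dirty blocks overwrite whatever is on the disk by now.
        self.disk.blocks.update(self.dirty)
        self.dirty.clear()


disk = ToyDisk({0: "old metadata", 1: "old file A"})
cache = ToyCache(disk)

# The cache holds a pending update to block 0.
cache.write(0, "metadata as the kernel believes it")

# A tool writes straight to the disk, bypassing the cache.
disk.blocks[0] = "restored metadata"
disk.blocks[1] = "restored file B"

# A later sync (for example on remount or shutdown) flushes the stale block.
cache.flush()

if __name__ == "__main__":
    print(disk.blocks)
```

After the flush, block 0 holds the cached metadata while block 1 holds the restored data, so the disk ends up as a mixture of the two views, with nothing to flag the inconsistency.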
I don't know whether these worries are paranoid or not, but our
sysadmins here certainly don't think that it would be advisable to
attempt to restore a backup onto a live root-filesystem.
Martin