ioctl
Backing up and recovering a Debian installation on a machine via USB stick and system image on an XP workstation
This is just a brief summary from first principles of what should go onto a bootable USB key in order to enable a bare metal recovery of a Debian media PC. I wrote it because I couldn't find anything that actually fit the bill elsewhere, although various bits and pieces helped to put the puzzle together.
Again, this was written with James in mind: a Debian newbie (whose ambitions are limitless). His setup - several PCs on a LAN, most of which run Windows of some flavour or another. He has a single small "media PC" onto which we installed Debian. The media PC is USB- and network-bootable, but has no floppy drive or CD. Given the availability of other Windows boxes on the LAN, he wanted a simple way to back up his machine contents to one of those boxes (as a staging point from whence harder copy, such as DVD, could be produced). Additionally, he wanted a simple mechanism for a DR situation that would let him recover his media PC. The DR situation in question assumes that the other machines on the LAN are intact and have been recovered, but that he's looking at a blank or otherwise trashed media PC.
The solution falls into two pieces:
The first step is to produce a backup of the system to restore later. This should include all the information required to recreate the system, assuming a starting point of an unpartitioned disk.
The mount command can be used to locate the filesystems that we want to preserve. We only back up non-transient, real data - no pseudo filesystems.
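The filtering can be done mechanically. This is a minimal sketch that reads /proc/mounts (the same information mount prints, in a fixed layout). It uses a heuristic of our own: a "real" filesystem is one whose source is a /dev/ node. The function name is hypothetical.

```shell
#!/bin/sh
# real_filesystems [MOUNTS_FILE]
# Print "mountpoint fstype" for each mounted filesystem whose source device
# lives under /dev/, which drops pseudo filesystems such as proc, sysfs,
# devpts and tmpfs. MOUNTS_FILE defaults to /proc/mounts.
real_filesystems() {
    awk '$1 ~ /^\/dev\// { print $2, $3 }' "${1:-/proc/mounts}"
}
```

On James' machine this should print a single line for the ext3 root filesystem. Swap partitions never appear in /proc/mounts, so they are excluded automatically.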
The backup process is a simple one. Use smbmount to make the remote Windows filesystem available. Use the dump command to make a full backup of the filesystem. We compress this and split it into manageable chunks at the same time. (Not every Windows filesystem likes huge files; we split at around the 1GB mark.) Repeat for every real, on-disk filesystem. We don't need to back up /proc, /dev and the like.
Depending on the details of how you've shared out the folder you're going to be using, you may have to supply credentials to smbclient and smbmount.
Anyway, once you've mounted the remote folder, you need to dump your filesystem(s) onto it. Some Windows versions don't support large files, and neither do some smbfs implementations (although I think you should be fine with a modern kernel against Windows XP), so for safety's sake we use split to chop the backup file into manageable chunks.
The package I use for actually performing the backup is "dump", which provides the BSD-style dump/restore commands.
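The compress-and-split stage can be captured in a pair of small shell functions. This is a sketch under some assumptions: bzip2 is the compressor (chosen because bunzip2 is what the recovery image carries), and the function names are hypothetical. The default chunk size matches the 1024000kB pieces described below.

```shell
#!/bin/sh
# backup_stream PREFIX [CHUNK]
# Compress standard input and split it into pieces named PREFIXaa, PREFIXab, ...
# CHUNK defaults to 1024000k (split's k suffix means 1024 bytes).
# COMPRESS defaults to bzip2; any compressor that accepts -c and -dc works.
backup_stream() {
    prefix=$1
    chunk=${2:-1024000k}
    ${COMPRESS:-bzip2} -c | split -b "$chunk" - "$prefix"
}

# join_stream PREFIX
# Reassemble the pieces in order (the shell sorts the glob) and decompress
# them to standard output, ready to feed to restore.
join_stream() {
    cat "$1"* | ${COMPRESS:-bzip2} -dc
}
```

Typical use, with the share already mounted on a hypothetical /mnt/backup: `dump -0 -f - / | backup_stream /mnt/backup/root.dump.bz2.` writes a level 0 dump of the root filesystem to standard output and into the chunked files.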
Dumps of real filesystems take considerably longer. For example, in James' setup, with a single filesystem of about 2.7GB, it took around 80 minutes to write out a compressed dump that totalled about 1.3GB. That's around 33MB per minute (roughly 0.56MB/s) off the disk and through the compression filter, and 16MB per minute (roughly 0.27MB/s) over the network and onto the Windows disk.
This process finishes with a small number of files on the remote share, each of which (apart from the last one, which may be smaller) is 1024000kB in size.
There is usually a small amount of additional metadata required to perform a bare-metal recovery. In this case, we need to know the actual disk layout (the details of the various fdisk partitions) in case the partitions have to be recreated. Several tools can provide this information; we picked parted.
If the filesystems themselves have any special, non-default parameters set, then that information will also be required. Example parameters are things like the number of bytes per inode. You'd need to know this if you wanted to recreate the filesystems using the same parameters. In our case, we were faced with a single large ext3 root filesystem (with default parameters) and a swap partition - pretty straightforward.
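Both kinds of metadata can be saved next to the dump. In this sketch, the function name and output file names are hypothetical. The partition table is printed in sectors so nothing is lost to rounding, and tune2fs -l records each ext2/ext3 superblock. Bytes per inode can be worked out from the "Block count", "Block size" and "Inode count" lines it prints.

```shell
#!/bin/sh
# save_layout OUTDIR DISK [FILESYSTEM...]
# Write DISK's partition table to OUTDIR/partitions.txt and the superblock
# parameters of each FILESYSTEM to OUTDIR/<name>.tune2fs.
save_layout() {
    outdir=$1
    disk=$2
    shift 2
    mkdir -p "$outdir" || return 1
    # -s: script mode, never prompt; "unit s" reports positions in sectors.
    parted -s "$disk" unit s print > "$outdir/partitions.txt" || return 1
    for fs in "$@"; do
        tune2fs -l "$fs" > "$outdir/$(basename "$fs").tune2fs" || return 1
    done
}
```

For example, `save_layout /mnt/backup/meta /dev/hda /dev/hda1` (paths hypothetical) stores the layout alongside the dump chunks on the Windows share.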
The basic plan for the bare-metal recovery can be stated as follows: we prepare a bootable USB key that contains an absolutely minimal functioning operating system, together with sufficient tools to recover the system from the information we've got stashed on the (network-available) Windows share.
The tasks we'll face, given a working USB key, are as follows:
Having used SYSLINUX before to actually install Debian onto the Mini PC from a USB key, we knew that this process could be made to work (in principle). There is a version of syslinux available for Debian, so we should be able to prepare the boot stick from within the Linux environment.
The first step was to use syslinux to prepare the boot device. Inserting the keyfob into the USB port, we were able to mount it (it was already formatted and partitioned) and ready it.
There are three additional things that are needed on the boot device in order for it to work. These are: a configuration file for syslinux, a kernel image and an initial filesystem image.
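A minimal syslinux.cfg might look like the following. The label name is arbitrary (an assumption). The APPEND line carries the two kernel parameters this section relies on: initrd names the image to load, and devfs=mount asks the kernel to mount devfs over /dev.

```
DEFAULT recovery
LABEL recovery
  KERNEL vmlinuz
  APPEND initrd=initrd devfs=mount
```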
We will supply the referenced files, vmlinuz and initrd, shortly. To understand what these parameters do, you need to understand the Linux boot process.
Assuming the BIOS is capable of booting from USB, the boot sector of the key (which syslinux writes) will load and run the loader, ldlinux.sys. This, in turn, uses BIOS I/O calls to load the kernel image (vmlinuz) and, if specified, an initial filesystem image (initrd). This filesystem image resides in RAM, and is used to bootstrap the system in normal operation. For our purposes, however, we will produce a custom initrd that contains just the bare essentials required to perform our recovery.
If initrd is specified, then after the kernel is running, the first process it launches is /linuxrc in the RAM filesystem. This can be anything executable: a binary file or a shell script (assuming the shell interpreter and all its requirements are also present in the ramdisk image). Once the /linuxrc process exits, the kernel expects a "real" root filesystem to be in place and will launch /sbin/init as per usual.
Details of the boot process can be found in the kernel source. If you want to have a look...
There appear to be two formats in common use for the initrd image among the various Linux loaders and boot systems. In no particular order, these are a gzipped ext2 filesystem image and a special compressed boot filesystem called cramfs. The latter is read-only: it cannot be written to once mounted.
Assuming that you've got a directory hierarchy in initrd-root/ and you want to turn its contents into a gzipped ext2, here's how you go about it. Note, I'm skipping ahead slightly: once the process for creating an initrd image is understood, I'll come back to the question of what contents it'll need, exactly.
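The classic recipe loop-mounts an empty image, copies files in and then compresses it, which needs root. This sketch swaps in a different technique: the -d option of newer mke2fs (e2fsprogs 1.43 and later) populates the image directly from a directory, so no loop mount is needed. The function name and the default size are assumptions.

```shell
#!/bin/sh
# make_ext2_initrd SRC_DIR OUT [SIZE_KB]
# Build an ext2 image of SRC_DIR in OUT, then gzip it to OUT.gz.
# SIZE_KB (default 4096) must be large enough to hold SRC_DIR's contents.
make_ext2_initrd() {
    src=$1
    out=$2
    size_kb=${3:-4096}
    rm -f "$out" "$out.gz"
    # Create an empty image file of the requested size.
    dd if=/dev/zero of="$out" bs=1024 count="$size_kb" 2>/dev/null || return 1
    # -F: the target is a regular file; -m 0: reserve no blocks for root;
    # -d: copy SRC_DIR into the new filesystem (e2fsprogs 1.43+).
    mke2fs -F -q -t ext2 -m 0 -d "$src" "$out" || return 1
    gzip -9 "$out"
}
```

Running `make_ext2_initrd initrd-root initrd` leaves initrd.gz behind.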
If you follow the route above, you'll need to set initrd.gz as the value of the initrd parameter in syslinux.cfg.
Alternatively, you can create a cramfs image in a rather more straightforward fashion:
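A sketch of that route, assuming mkfs.cramfs (from the cramfs tools or util-linux) is installed. The wrapper name is hypothetical. cramfs compresses file contents itself, so there is no separate gzip step.

```shell
#!/bin/sh
# make_cramfs_initrd SRC_DIR OUT
# Build a compressed, read-only cramfs image of SRC_DIR in OUT.
make_cramfs_initrd() {
    rm -f "$2"
    mkfs.cramfs "$1" "$2"
}
```

With this format, `make_cramfs_initrd initrd-root initrd` produces the file that syslinux.cfg's initrd parameter names directly.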
Which filesystem type should you use? Well, cramfs looks simpler to manage, although it has a limitation that we'll run into shortly: it cannot be written to or updated once created. For the purposes of experimentation, that limitation might get in the way.
The decision as to filesystem type is actually forced upon you by a chicken-and-egg situation: modern Linux kernels tend to use loadable modules for much of their functionality, filesystems included. Some functionality is built directly into the kernel, some as a loadable module. But at the point where the kernel has just been bootstrapped by ldlinux.sys, no loadable kernel modules are available: they're in the ramdisk filesystem! The kernel must have a stand-alone capability of understanding the ramdisk image, which is to say, that the filesystem type in question (ext2 or cramfs) must have been built into the kernel.
In the case of James' Debian installation, the kernel has in-built cramfs support, but ext2 is available only via a module. Rather than build a custom kernel for the USB stick, our plan was to use a copy of the working kernel (that is, whatever Debian booted into normally) as the basis of our recovery image. So we went with cramfs.
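You can check which camp your own kernel falls into by reading its build configuration. Debian installs it as /boot/config-<version>; a value of y means built in and m means loadable module. The function name below is hypothetical.

```shell
#!/bin/sh
# builtin_status CONFIG_FILE OPTION
# Print "built-in", "module" or "absent" for a kernel configuration option.
builtin_status() {
    case $(grep "^$2=" "$1" 2>/dev/null) in
        "$2=y") echo built-in ;;
        "$2=m") echo module ;;
        *) echo absent ;;
    esac
}
```

For example, `builtin_status /boot/config-$(uname -r) CONFIG_CRAMFS` and the same call for CONFIG_EXT2_FS. For the old-style initrd used here, CONFIG_BLK_DEV_RAM and CONFIG_BLK_DEV_INITRD also need to be built in.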
The other kernel parameters that ldlinux.sys supplies to the bootstrapped kernel are explained in the kernel source Documentation directory. The important one is devfs=mount. Although devfs is now deprecated, this pseudo-filesystem (which is mounted automatically over /dev by this parameter) is very useful: device nodes are created automatically in it as the kernel detects devices. As such, it is a great tool for telling when you've loaded all the kernel modules you need (the driver will create the device node) - as well as not having to worry about what major and minor numbers are associated with a particular driver. The alternative to devfs is to manually create the device nodes we'll require in the initrd-root directory beforehand. This is a possibility, but a clumsy one and one that we'd rather avoid if possible.
Having understood how to create an initrd image, given a pre-prepared directory hierarchy, we now address the question of what that hierarchy should contain. The initial list I used follows.
The first question is what kernel modules we should make available to our boot environment. Here, we cheated: since the plan is to use a kernel that we know works (because it is the kernel that runs the Debian installation normally), we just used lsmod to get a list of modules that are loaded after normal day-to-day operation. To that list, we add smbfs for smbmount support.
As shown above, we begin to build a root filesystem hierarchy for our initrd in initrd-root. Many of the modules that a normally-running kernel requires will not be needed for recovery operation; however, this tactic is a quick way to grab pretty much everything that will be required.
A look at the output of lsmod is instructive. In particular, jot down likely drivers (as I've highlighted above) plus any modules that are depended upon by those drivers. For example, the IDE stack comprises ide_core, ide_disk and ide_generic. You may have machine-specific drivers for your chipset too; in the case of the MiniPC, we needed some VIA chipset support.
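Turning the lsmod list into files in initrd-root can be scripted. This is a sketch with hypothetical function names. It assumes 2.6-style .ko module files. One wrinkle is worth handling: lsmod reports names with underscores (ide_disk) even when the file on disk is hyphenated (ide-disk.ko), so both spellings are tried.

```shell
#!/bin/sh
# module_names: read lsmod output on stdin and print the module names,
# skipping the "Module Size Used by" header line.
module_names() {
    awk 'NR > 1 { print $1 }'
}

# copy_modules SRC DEST NAME...
# Copy each named module from the module tree SRC to the same relative path
# under DEST, warning about any name that has no matching .ko file.
copy_modules() {
    src=$1
    dest=$2
    shift 2
    for name in "$@"; do
        alt=$(printf '%s' "$name" | tr '_' '-')
        file=$(cd "$src" && find . \( -name "$name.ko" -o -name "$alt.ko" \) | head -n 1)
        if [ -z "$file" ]; then
            echo "warning: no module file for $name" >&2
            continue
        fi
        mkdir -p "$dest/$(dirname "$file")" || return 1
        cp "$src/$file" "$dest/$file" || return 1
    done
    return 0
}
```

On the running system, `copy_modules /lib/modules/$(uname -r) initrd-root/lib/modules/$(uname -r) $(lsmod | module_names) smbfs` copies the day-to-day set plus smbfs.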
We're building a recovery filesystem, so we'll need some empty directories in there to mount filesystems over.
Remember, this is for bare-metal recovery. When we recover the system, there is no guarantee that the hard drive in it will be partitioned or have the correct filesystems on it. The recovery shell will run out of a ramdisk; so we need the utilities present to partition the hard drive, make a filesystem on it, mount the filesystem, mount the remote share, and run the recovery. Additionally, we'll need a shell, and the basic file utilities that you probably take for granted.
We use busybox as a convenient way to supply many of the standard commands. For a shell, we'll use dash, the Debian Almquist shell (a derivative of ash). Throw into the mix the fsck tools, mkfs, smbmount, bunzip2 and restore to perform the recovery. What's missing? We use parted for hard disk partitioning. One non-obvious requirement is insmod, a low-level tool for loading modules into a running kernel. Without that, our effort locating kernel modules will have been for naught!
Many of these utilities use dynamically-loaded libraries: that is, there are a handful of hidden dependencies. The ldd command can find these library dependencies. Note: some libraries actually depend on additional libraries, so we use ldd on those too. The following small script will take care of this (this continues the previous session).
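Here is one way to write such a script. It is a sketch, and the function names are hypothetical. It collects every absolute path ldd reports for the given binaries, runs ldd again over those libraries, and copies the union into the initrd tree. Current glibc ldd already lists indirect dependencies, so the second pass is belt-and-braces.

```shell
#!/bin/sh
# lib_paths: print every absolute path in ldd output read from stdin. Handles
# both "libc.so.6 => /lib/libc.so.6 (0x...)" and "/lib/ld-linux.so.2 (0x...)"
# lines, and skips the "/path/to/binary:" headers ldd prints for several files.
lib_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\// && $i !~ /:$/) print $i }' | sort -u
}

# copy_libs DEST FILE...: copy the shared libraries FILE... need into DEST,
# preserving their absolute paths.
copy_libs() {
    dest=$1
    shift
    libs=$(ldd "$@" 2>/dev/null | lib_paths)
    # $libs is deliberately unquoted: one ldd argument per library path.
    more=$(ldd $libs 2>/dev/null | lib_paths)
    for lib in $(printf '%s\n%s\n' "$libs" "$more" | sort -u); do
        mkdir -p "$dest$(dirname "$lib")" || return 1
        # -L: copy the file a symlink points at, under the symlink's name.
        cp -L "$lib" "$dest$lib" || return 1
    done
}
```

For example, `copy_libs initrd-root /sbin/parted /usr/bin/smbmount /sbin/restore` (binary paths illustrative).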
Finally, busybox can modify its behaviour depending on how it is invoked (a switch on the value of argv[0]). Similarly, we'd like to make /bin/sh a synonym for dash. Additionally, fsck.ext2 and fsck.ext3 are the same program. So we add some symbolic links to our initrd. In the script below, the targets of these links are absolute, not relative. That's ok; in use, the contents of initrd-root will be mounted at the root directory.
We finish our symlinking by making /linuxrc a pointer to /bin/dash; this should drop us into a shell after the kernel finishes its bootstrap process.
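The links described above can be created in one go. This is a sketch: the function name is hypothetical, and pointing both fsck names at /sbin/e2fsck is an assumption about where the binary lives on your system.

```shell
#!/bin/sh
# make_links ROOT APPLET...
# Create the initrd's symbolic links under ROOT. Targets are absolute because
# ROOT becomes "/" at boot time, so they need not resolve on the build host.
make_links() {
    root=$1
    shift
    mkdir -p "$root/bin" "$root/sbin" || return 1
    # busybox chooses its behaviour from argv[0]: each applet is a link to it.
    for applet in "$@"; do
        ln -sf /bin/busybox "$root/bin/$applet"
    done
    ln -sf /bin/dash "$root/bin/sh"
    ln -sf /sbin/e2fsck "$root/sbin/fsck.ext2"
    ln -sf /sbin/e2fsck "$root/sbin/fsck.ext3"
    # The kernel runs /linuxrc from the initrd: make that our shell.
    ln -sf /bin/dash "$root/linuxrc"
}
```

Usage: `make_links initrd-root ls cp mv mkdir mount umount` and so on, listing the applets your busybox build provides.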
(Note: a set of scripts provided at the end of this document takes a lot of the drudgery out of the processes I describe here.)
It's time to finish up and test the boot key. We do this by remounting the keyfob filesystem and copying our kernel and initrd images back onto it.
Crunch time: time to try the recovery device! Reboot your system with the USB key inserted. You may have to fiddle with the BIOS settings to ensure that the USB key appears before your (currently functional and bootable) hard drive as a boot device. If all goes well, you'll see SYSLINUX load and start your kernel; you should see some starting kernel device probe messages, followed by a shell prompt: that's linuxrc running, which in this case is the /bin/dash interpreter.
There are some pitfalls to be aware of with our rudimentary environment. Some of these are more problematic than others. That's ok; this first session is simply to test the water.
There are a number of occasions where it is convenient to be able to write to the ramdisk image. This is not possible with a cramfs root filesystem. There is a way to replace the root filesystem with a writable memory-based filesystem, once the initial boot is complete: that is documented shortly.
Many very basic utilities, such as ifconfig and ps, require a working /proc filesystem in order to function at all. The first time I built the initrd image, I neglected to include an empty directory to mount /proc over; because cramfs is not writable, I was unable to run mkdir /proc in order to create the mount point. (Actually, it's possible to work around such a problem, but it was simpler to reboot into the operating system proper in order to update the initrd image on the USB stick.)
After my second attempt at an initrd image (this one with a /proc) the first missing utility I discovered was insmod. (You'll note I've skipped this learning step in the scripted session above.) Without this, it's impossible to load any kernel modules into the running system.
Other non-obvious missing dependencies include some additional support required for smbmount. (In fact, to speed things along, I've added the smbmnt executable to the list above - when I initially did this work, I found out that it was required because smbmount complained that it couldn't be found.) Not to worry, we can jot down a list of missing things and rebuild our initrd. Once the boot disk is proved in principle - that is, the hardware drivers are demonstrated to be working - it's possible to speed development up by using a virtual machine to debug some of the higher-level utilities. I'll outline that process later.
I use, and am used to, the GB layout for keyboards. By default, the Debian kernel uses some other keyboard layout (US?) - consequently, handy keys like the quotes, the pipe, etc, aren't necessarily where you think they might be. Mostly, I can get by with a little trial-and-error.
I must confess that I found this out the hard way. The initial environment is very basic; amongst other things, it has no support for job control. The most marked consequence of this is that the interrupt key sequence, control-C, will not work. I discovered this after I managed to get a network interface up: ping in its most basic invocation entered an infinite loop, necessitating a reboot after I'd run it. Oops!
So, we've booted into the recovery environment. We sit, staring at a root shell prompt. What're the next steps? We'll address some of the issues highlighted above, then move on to getting our recovery going.
Our first task is to sort out our basic environment: to get the prerequisites working before we tackle the network and disk subsystems. The initial step is to get a writable root filesystem working. (We won't need this immediately, but it makes life much simpler in the long run, and it is technically simpler to address this issue first.) The process is as follows. First, we create a new, writable, virtual-memory-backed filesystem (tmpfs). This has the added advantage that once swap space is enabled, its contents can be paged out to disk. Having made and mounted the filesystem, we carefully copy the contents of the existing root filesystem into it. Finally, we invoke pivot_root, which essentially swaps the relative positions of the two filesystems: the new filesystem becomes the root, and the old filesystem winds up mounted in a subdirectory under it.
The next step is to get a working /proc.
At that point we can unmount the old root (and the devfs instance it contains).
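The pivot, /proc and unmount steps can be sketched as follows. This is not the debianboot scripts' code. It assumes an empty mount-point directory (here /newroot, a hypothetical name) already exists in the initrd, since nothing can be created on cramfs at run time. A small run helper prints commands instead of executing them when DRY_RUN is set, which is handy for checking the sequence first.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# pivot_to_tmpfs NEWROOT: copy the cramfs root into a tmpfs on NEWROOT and
# make it the root; the old root ends up mounted on /oldroot.
pivot_to_tmpfs() {
    new=$1
    run mount -t tmpfs tmpfs "$new" || return 1
    for entry in /*; do
        case $entry in
            /dev|/proc|"$new") continue ;;  # skip devfs, /proc and the target
        esac
        run cp -a "$entry" "$new/" || return 1
    done
    run mkdir -p "$new/dev" "$new/proc" "$new/oldroot" || return 1
    # A console node lets the next shell reopen its terminal in the new root.
    run mknod "$new/dev/console" c 5 1 || return 1
    run cd "$new" || return 1
    run pivot_root . oldroot
}

# finish_pivot: run from a shell started out of the new root.
finish_pivot() {
    run mount -t proc proc /proc || return 1
    run umount /oldroot/dev || return 1   # the devfs mounted by devfs=mount
    run umount /oldroot
}
```

Between the two functions, `exec chroot . /bin/sh <dev/console >dev/console 2>&1` (the pattern given in the pivot_root(8) manual page) starts a shell whose binary and terminal come from the new root, so the old root is no longer busy. Shell functions do not survive that exec, so finish_pivot must be sourced again from a copy in the new root.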
The next milestone is to connect our newly-booted machine to the LAN. In our case, the machine normally lived on a private subnet, 192.168.1.0/24. It would usually have an IP address allocated out of a DHCP pool. For the purposes of recovery, we instead decided to manually allocate an IP address to the machine (one that didn't coincide with the DHCP pool). Before we can configure eth0, however, we actually have to get the network device recognised by the kernel. To do this, we need to load the appropriate modules to support the card. These will vary depending on hardware; in our case, we were able to pull the likely driver suspects out of the lsmod output from a normal session with the machine booted into its full operating system.
Now that the kernel knows about the new network device, we can configure it with an IP address and try to ping our recovery machine (at 192.168.1.1). We also configure the loopback interface at the same time. Note, we give ping a count to prevent it running an infinite loop - remember, there is no job control available in this shell.
Success! This represents a major milestone: if our recovery boot device is able to see the recovery host on the LAN, we're close to completion.
Note that we only needed to load kernel support for our particular network card. The stock Debian kernel we used has the loopback device built in, as well as the rest of the IPv4 framework pieces that we'll require.
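The network bring-up can be sketched as a single function. The driver path and the address 192.168.1.250 in the test are hypothetical: use your own card's driver, load any modules it depends on first, and pick an address outside the DHCP pool. The run helper is repeated so the block stands alone.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# net_up DRIVER_MODULE ADDRESS PEER
# Load the network driver, configure lo and eth0 on a /24 and test the link.
net_up() {
    run insmod "$1" || return 1
    run ifconfig lo 127.0.0.1 up || return 1
    run ifconfig eth0 "$2" netmask 255.255.255.0 up || return 1
    # Always give ping a count: without job control, control-C won't stop it.
    run ping -c 3 "$3"
}
```
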
With basic TCP/IP networking configured, the next step is to get smbmount working to access the share on the recovery PC. This proves to be trickier: SMB support comes with a whole raft of issues. We are satisfied with a less-than-complete solution providing it suffices to let us grab our dump images for the purposes of running a system recovery. If your native language is Chinese, you might need to work harder at the SMB codepage support.
We jump right in, and try to mount the remote filesystem. Here are the steps to follow:
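In outline, the mount looks like the sketch below. It assumes the smbfs module is already loaded and the root filesystem is writable. The share name "backup", the mount point /mnt/smb and the user "james" are hypothetical. smbmount prompts for a password when none is supplied.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# mount_share SERVER SHARE MOUNTPOINT USER
# Mount //SERVER/SHARE with smbmount and list what is there.
mount_share() {
    run mkdir -p "$3" || return 1
    run smbmount "//$1/$2" "$3" -o "username=$4" || return 1
    run ls -l "$3"
}
```

For example, `mount_share 192.168.1.1 backup /mnt/smb james` should end with a listing of the dump chunks.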
We ignore the code page translation errors and the lack of /etc/samba/smb.conf in the rest of this write-up. If your Windows host is more demanding, you may have to supply additional dependencies in your recovery environment. Such considerations are beyond the scope of this document.
In other words, this stage represents a "good enough" state of affairs. We're not looking to use this facility for day-to-day work - our motivation is simply to create a recovery environment, no more.
I've also arranged this write-up to avoid many of the errors I encountered whilst developing this process. For instance, smbmnt tries to write to /etc - an issue which I skirted by beginning the session with the replacement of the root filesystem with a writeable one.
Now that we can see the remote dump contents, the next step is to get the kernel to talk to the local hard drive. The modules we needed to get this working on the MiniPC are shown below; your details may vary.
Note that whether the target device nodes under /devfs will actually exist at this point depends greatly on the exact state of the hard drive in the PC. Nevertheless, we can create symbolic links at this stage: once the drive has been correctly partitioned, the partition devices will appear under /devfs and the symbolic links will resolve correctly.
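The links might be created like this. The sketch assumes the disk is the first IDE drive (primary master) and devfs is mounted on /devfs, which yields nodes such as /devfs/ide/host0/bus0/target0/lun0/disc and .../part1. The function name is hypothetical.

```shell
#!/bin/sh
# link_ide_disk DEVDIR [DEVFS_DISK_DIR]
# Create hda and hda1..hda4 in DEVDIR as symbolic links to the devfs nodes.
link_ide_disk() {
    devdir=$1
    disk=${2:-/devfs/ide/host0/bus0/target0/lun0}
    mkdir -p "$devdir" || return 1
    ln -sf "$disk/disc" "$devdir/hda" || return 1
    for n in 1 2 3 4; do
        # Dangling links are fine: partition nodes appear after partitioning.
        ln -sf "$disk/part$n" "$devdir/hda$n" || return 1
    done
}
```
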
We can now test parted. Let's have a look at the current state of the hard drive. Note: this is a non-destructive operation.
I've used a pretty precise method here to select and load the minimal set of kernel modules that appeared to be necessary to perform these operations. If you're having problems, there is a more scatter-shot approach that may well work for you: take a running system, use lsmod to list the loaded modules. Reverse that list (since it shows the most-recently-loaded module at the top, with its dependencies beneath it) and use that order to load modules. That is:
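The reversed-list approach can be sketched in two small functions with hypothetical names. The first reverses lsmod output captured on the running system. The second loads each listed module at the recovery prompt, looking for either spelling of the file name. INSMOD can be overridden, which the test uses.

```shell
#!/bin/sh
# load_order: read lsmod output on stdin and print the module names in
# reverse, so dependencies tend to come before the modules that use them.
load_order() {
    awk 'NR > 1 { name[n++] = $1 } END { for (i = n - 1; i >= 0; i--) print name[i] }'
}

# load_modules MODDIR: insmod each module named on stdin, looking for
# NAME.ko or its hyphenated spelling anywhere under MODDIR.
load_modules() {
    while read -r name; do
        alt=$(printf '%s' "$name" | tr '_' '-')
        file=$(find "$1" \( -name "$name.ko" -o -name "$alt.ko" \) | head -n 1)
        if [ -n "$file" ]; then
            ${INSMOD:-insmod} "$file"
        else
            echo "not found: $name" >&2
        fi
    done
}
```

On the working system, `lsmod | load_order > initrd-root/modules.list` saves the order. At the recovery prompt, `load_modules /lib/modules/$(uname -r) < /modules.list` replays it (the file name is hypothetical).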
You may have some success with this approach if all else fails.
Thus far, our effort has taken place in a relatively safe environment: all testing has been non-destructive. From this juncture, that is no longer the case - parted, mkfs and restore will be used to write to the hard drive. We want to practice our disaster recovery process before a disaster happens; but we don't want to actually precipitate a disaster during the testing process.
Our testing up to now has been to prove our concept: we can see the remote network share; we can see the hard drive. That testing has been done on real, live equipment because it is crucially dependent on having the right set of kernel drivers available. Having shown our plan to be feasible, we can now develop the rest of the process in a more controlled, less critical environment. For the purposes of testing, we use qemu, an emulator capable of reproducing a completely virtual PC.
The use of qemu has a double impact: firstly, the emulated hardware is not the same as that on the MiniPC. We will require a different set of kernel modules loaded to enable virtualised networking and the virtual hard drive. Secondly, qemu enables a much more rapid testing cycle: there are no more boot / write USB key / reboot cycles to deal with. Instead, we can prepare our boot image and launch qemu directly.
Unfortunately, qemu is not yet capable of emulating a USB boot device. Instead, we fall back on an emulated boot CD image. This can be prepared using the isolinux facilities provided by the syslinux package and the standard ISO image creation tools.
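One way to build such an image is sketched below. It assumes CDROOT already holds isolinux/isolinux.bin (from the syslinux package; its location varies by release, e.g. /usr/lib/syslinux/isolinux.bin on older Debian), isolinux/isolinux.cfg (same syntax as syslinux.cfg), the kernel and the initrd. That layout and the function name are assumptions. genisoimage accepts the same options as mkisofs.

```shell
#!/bin/sh
# make_boot_iso CDROOT OUT
# Build an El Torito bootable ISO image that boots isolinux without emulation.
make_boot_iso() {
    ${MKISOFS:-mkisofs} -o "$2" \
        -b isolinux/isolinux.bin -c isolinux/boot.cat \
        -no-emul-boot -boot-load-size 4 -boot-info-table \
        -J -R "$1"
}
```

The image can then be booted with something like `qemu -cdrom boot.iso -hda qemu/hda -boot d`, where -boot d selects the CD-ROM. Newer installations name the binary qemu-system-i386.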
In such a rapid development environment, the bottleneck becomes the speed of building our initrd images. Fortunately, the process is sufficiently mechanical that it can be captured with a set of scripts. Those scripts are available online at http://ioctl.org/unix/debian/debianboot.tar.gz and encapsulate the discussion in this document. The tarball contents are as follows:
There are three small (28 lines in total, excluding comments) configuration files that live in the etc directory.
In addition, there is a skeleton of the initrd image:
Links, scripts, and so on can be created in this directory. Its contents will be used to form the basis of the constructed initrd image.
I've taken the time to write scripts that capture a lot of the processes outlined in this document. In particular, the current overlay contents include a set of startup scriptlets that will automatically run through the pivot_root sequence, populate the /dev directory and set up a handful of useful shell functions that automate some of the more laborious tasks outlined above.
You may wish to look through the contents of overlay: again, it is small - 77 symbolic links (most of these come from busybox) and around 10 files comprising roughly 60 lines of script. These are best read in the order they operate:
Assuming your boot device works correctly under emulation, you can issue the following sequence of commands at the shell prompt in the qemu window: the following sequence assumes you've made the Samba configuration change given above.
We turn our attention to the hard disk. For the purposes of testing, we'll be practising these steps inside qemu, until we get them right. From now on, I'll also assume the scripts above are being used to prepare a working initrd. The initrd so produced includes the startup process outlined above, which also defines a shell command, qemu, that executes the following commands:
Boot up a fresh qemu instance, and run the qemu command. The first thing to do is to ascertain the current state of the hard disk: (note, /dev/hda is a symbolic link to the devfs device node that is created by overlay/scripts/12-devices)
In this case, the hard drive is completely blank, and needs a new partition table. Continuing the session:
Better, we now have a blank partition table. The next task is to recreate the partitions that we had originally, as saved by our backup process.
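The repartitioning can be scripted from the saved layout. This sketch uses modern parted unit syntax (1MiB, 90%, 100%), which older parted releases may not accept. The function name and the sizes in the usage line are hypothetical. parted's fs-type argument only sets the partition type; it does not create a filesystem.

```shell
#!/bin/sh
# recreate_partitions DISK "FSTYPE START END"...
# Write a fresh msdos label to DISK, add one primary partition per argument,
# then print the result in sectors for comparison with the saved layout.
recreate_partitions() {
    disk=$1
    shift
    parted -s "$disk" mklabel msdos || return 1
    for spec in "$@"; do
        # $spec is deliberately unquoted so it splits into FSTYPE START END.
        parted -s "$disk" mkpart primary $spec || return 1
    done
    parted -s "$disk" unit s print
}
```

For example, `recreate_partitions /dev/hda "ext2 1MiB 90%" "linux-swap 90% 100%"`.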
The first thing to note is that the final partition layout after the parted session has completed differs from our saved layout. This is due to two features: firstly, the original disk was partitioned using a different partitioning program, which lined up the end of the disk slices differently; secondly, the qemu emulated disk may have different geometry to the original device.
Is this disparity important? If it is critical to your application, then some other partitioning utility, such as fdisk, may be better able to recreate your partitions. As it is, this difference is not critical. The dump format we use to save the filesystem contents is restored using filesystem (as opposed to block) operations and is consequently impervious to perturbations in the precise disk layout or partition size. All we need to ensure is that the resulting filesystem has sufficient space to hold our restored files.
The second thing to observe is that the "Filesystem" contents of the new partition table are blank. Although we have allocated space for the filesystem, we have not yet constructed it (it's "unformatted space", in Windows-speak). That's the next step.
Before performing this step, be careful. Although qemu is pretty clever about allocating only as much real disk space as its virtual disk image needs, the following commands will produce a qemu/hda file that's around 600MB in size. If you want to practise the process without generating such a large file, you should edit the debianboot etc/construct.conf and allocate a smaller virtual disk (say, 5GB). Note that you'll also need to trim the size of the partitions you create using parted if you do this. As a final reminder, you'll probably want to delete the qemu virtual drive after you've finished practising; you can always recreate it, and you don't want it to bulk out the size of your backups.
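The filesystem creation step might look like this. The sketch assumes the root filesystem goes on one partition and swap on another (for example /dev/hda1 and /dev/hda2, which are hypothetical). If the original filesystem used non-default parameters, add them to the mke2fs line (for instance -i for bytes per inode).

```shell
#!/bin/sh
# make_filesystems ROOT_DEV SWAP_DEV
# Create ext3 (ext2 plus a journal, -j) on ROOT_DEV and swap on SWAP_DEV.
make_filesystems() {
    # -F: proceed even if ROOT_DEV is not a block device (e.g. a test image);
    # a single -F still refuses to touch a mounted filesystem.
    mke2fs -F -q -j "$1" || return 1
    mkswap "$2"
}
```

After `make_filesystems /dev/hda1 /dev/hda2`, the mount described next amounts to `mkdir -p /mnt/hd && mount -t ext3 /dev/hda1 /mnt/hd`.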
With our target filesystem created, the next step is to mount the filesystem so that we can perform the restore. We use the /mnt/hd directory to mount the filesystem. This means that what appears as the /mnt/hd directory in our recovery environment will contain the root of our restored filesystem.
Success! The filesystem (which is currently empty) is mounted, ready to receive our restored data.
Note that the supplied debianboot.tar.gz contains a startup script (/scripts/40-filesystems) which automatically creates /etc/mtab and loads the kernel modules required to support the EXT3 filesystem; the commands required are given explicitly above, but are unnecessary if you base your recovery image on the one supplied.
Things are a little more complicated if your original file tree consisted of multiple filesystems (that is, other filesystems were mounted under subdirectories of the root filesystem). This wasn't the case with James' small PC, so I won't cover it in detail; however the process is pretty straightforward. You need to recover the root filesystem first, or create by hand the mountpoints under your root filesystem, before restoring the other filesystem contents. It's possible to carefully juggle the order of these operations to migrate a system from one using multiple filesystems to a single filesystem, or vice-versa; however, that's beyond the scope of this document. If you understand the reasons why you might want to do it, you probably know enough to be able to perform the operation.
To review the current state of play: we have successfully mounted the remote Windows share that contains our compressed filesystem dump. We have repartitioned the hard drive and recreated our target filesystems. All that remains is to restore the dumped filesystem contents to the newly-created filesystem.
Normally, the bare-metal recovery process would use a full automated recovery (which is shown below). However, you might wish to try to recover a subset of the filesystem instead. The restore utility supports an interactive recovery session which lets you navigate the available files and select those you wish to restore.
Note that even this partial recovery will take some time. This is because the compressed filesystem dump is read as a stream (rather than being randomly accessible). The amount of time required should be roughly the same as the time taken for the full dump.
Restore can also be used in non-interactive mode: for most bare-metal recovery scenarios, an automatic full restore of all files is the desired operation.
The full recovery will take some time: around the same amount of time as a full dump took, under normal operation.
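The full restore can be sketched as a single pipeline. It assumes the chunks produced by split during the backup are visible on the mounted share under an absolute PREFIX, bzip2 compression, and that the new filesystem is mounted on /mnt/hd. The function name is hypothetical. restore -r rebuilds the filesystem in the current directory, so the function changes into the target first, inside a subshell so the caller's directory is unaffected.

```shell
#!/bin/sh
# full_restore PREFIX TARGET
# Stream PREFIXaa, PREFIXab, ... through the decompressor into "restore -r",
# run from inside TARGET. COMPRESS (default bzip2) and RESTORE (default
# restore) can be overridden. PREFIX must be absolute because of the cd.
full_restore() (
    cd "$2" || exit 1
    cat "$1"* | ${COMPRESS:-bzip2} -dc | ${RESTORE:-restore} -r -f - || exit 1
    # restore -r leaves restoresymtable for later incremental restores;
    # with only a level 0 dump to apply, it can be removed.
    rm -f restoresymtable
)
```

For example, `full_restore /mnt/smb/root.dump.bz2. /mnt/hd` (share path hypothetical).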