Checkpoint Restart for Linux
----------------------------

This is version 0.01 of checkpoint-restart for Linux. Development was done
under kernel version 0.99.14, but it should work with other recent revisions
provided that the task structure isn't too different.

I did this partly as an exercise to learn more about Linux internals.
I don't claim it is going to be useful to lots of people, but what the
heck, it's nearly Christmas.

This will allow you to save a copy of a running process and restart that
process from exactly the same state at some later time - and to repeatedly
restart it if desired. There are lots of things which this version will
not be able to cope with, and a proper checkpoint-restart implementation
would be able to checkpoint a whole set of processes in one go and
recreate them all. This version will work for simple processes, nothing
too sophisticated; see the file ticker.c for a REALLY simple program.

I have never crashed my system running this code. I had a number
of minor file system corruptions during development due to getting
the counts wrong on inodes, and sometimes you cannot unmount a file
system because a count has not been decremented correctly. I think
I have these bugs out of the system now, but there could well be
cases where it will leave things in an incorrect state.

Comments, suggestions, bug fixes etc to

	lord@cray.com

-----------------------------------------------------------------

Components:

1. chkpnt [-k] pid

The chkpnt process will attach to the process specified by pid, stop it
and write its saved state out to a file called prog.pid where prog is the
name of the command being run and pid is its pid (e.g. emacs.1249). This
file is a new type of executable. The -k option will kill the process
being checkpointed after its state has been saved; otherwise it is
left to carry on.
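
The naming rule above is simple enough to sketch in a few lines of C.
This is only an illustration of the prog.pid convention; the helper
name restart_file_name is made up here, and chkpnt itself may well
build the name differently.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not taken from chkpnt: write "prog.pid"
 * (e.g. "emacs.1249") into buf. Returns the length of the name,
 * or -1 if it does not fit in len bytes. */
int restart_file_name(char *buf, size_t len, const char *prog, pid_t pid)
{
    int n;

    assert(buf != NULL && prog != NULL);
    n = snprintf(buf, len, "%s.%ld", prog, (long)pid);
    if (n < 0 || (size_t)n >= len)
        return -1;      /* output error or name truncated */
    return n;
}
```

Calling restart_file_name(name, sizeof name, "emacs", 1249) fills name
with "emacs.1249", matching the example in the text.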

The chkpnt process waits until the process is returning from a system
call. This is to simplify the state which must be restored on restart,
and means that there may have to be some activity in the process before
it can be checkpointed. Hopefully this can be improved on in future versions.

NOTE:	chkpnt works out offsets into the kernel by reading a symbol
	table out of /usr/src/linux/tools/zSystem. If this is not
	the symbol table of the running kernel then it will read from
	the wrong locations and anything could happen (actually it
	usually just says it cannot find the process and exits).


2. Support for restart executable format.

This is a kernel mod which introduces support for the new executable
format created by chkpnt. Code changes are actually very small beyond
the new module fs/restart.c and header file restart.h.

When you start up a restart file it attempts to recreate the old process
image based on data stored in the file. This includes the memory map of
the old process, all its file descriptors in the same state as they were
when chkpnt was run, and access to a tty. If the program which was stopped
was using a tty then it will be given the one it is now executing on instead
of the original one.

When saving the process state, all references to files (including the
executable and shared libraries) are saved as device and inode numbers,
because path names are not available from the kernel and finding them would
be a major undertaking. This means that when restarting the process all
the same files must still exist and reside in the same inodes. So
changing shared library versions or recompiling the executable will
stop restart from working.

In the other implementation of this facility I have seen, there is a
unique sequence number stored in each inode; this number is changed
each time the inode is re-used. Linux does not have this, so it is
possible for an inode to be reused and a restart file to attach to a
new file thinking it is the file which existed when it was checkpointed.


There is a huge list of things which will not work yet, and many of these
cases are not caught by the code:

	o processes with children

	The process will be checkpointed and restarted, but the children
	will not be touched. If there is any interaction between them
	then this will cause problems.

	o interprocess communication

	Sockets, pipes, SYS V shared memory. In a lot of cases it doesn't
	make sense to save processes which are doing these types of things.
	What you really want to do is checkpoint both ends of the
	communication and start them again together. There is going to
	have to be a major snow storm before I attempt this one!

	o unlinked files

	Processes which create a file and then unlink it from the file
	system. In this case the state of this file would have to be
	recorded in the restart file.

	o NFS

	Well, I don't think this will work: if there is stuff lurking in
	buffers it will get lost.
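
The unlinked files case in the list above is easy to reproduce from
user space. The sketch below is an assumption-laden demo, not code from
this patch: the function names are invented, and it relies on an
ordinary local file system (NFS may behave differently). After the
unlink the data can be reached only through the open descriptor, so
there is nothing left on disk for a restart file to refer to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a scratch file from a mkstemp() template, write msg into it
 * and unlink it. tmpl is overwritten with the generated name, which no
 * longer exists on return. Returns the open descriptor, -1 on failure. */
int open_unlinked(char *tmpl, const char *msg)
{
    size_t len;
    ssize_t written;
    int fd, unlinked;

    assert(tmpl != NULL && msg != NULL);
    len = strlen(msg);
    fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    written = write(fd, msg, len);
    unlinked = unlink(tmpl);    /* always attempted, so no file is left behind */
    if (written != (ssize_t)len || unlinked != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* 1 if some file exists at path, 0 otherwise. */
int path_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/* Read up to len bytes from the start of the file behind fd. */
ssize_t read_from_start(int fd, char *buf, size_t len)
{
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, len);
}
```

With a template such as "/tmp/ckpt-demo-XXXXXX", path_exists() reports
the name gone while read_from_start() still returns what was written.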

If a problem is detected during restart then a message will be printed
to the console and a SIGKILL will be delivered to the process being created.

------------------------------------

Things left to do (lots and lots)

1.	More checks when creating a restart file to see that it is actually
	worth doing.

	o check for child processes.
	o check for file systems we cannot checkpoint - pipes and NFS.
	o check for sockets.
	o check for SYS V IPC usage.
	o check for open files which are unlinked from the rest of the
	  file system. These need copying into the restart file if we
	  want to cope with them.

2.	Attempt pid re-use. Often code will want to know its own pid
	for some reason. Changing the pid underneath might cause
	problems.

3.	Work out how to stop a process at a point other than return from
	system call. There will be cases where checkpointing hangs because
	the process is blocked in the kernel. Currently we wait until the
	process is returning from a system call, as it is easy to restore
	from there.

4.	Checkpoint a group of processes:

	o Freeze main process.
	o Read kernel info.
	o Find all related processes and freeze them.
	o Read kernel info again.
	o Expand file format to cope with multiple processes.
	o Expand restart to fork and exec multiple images out of the
	  restart file.

5.	Checkpoint unlinked files and pipes.

	o Checkpoint pipes between processes if both ends of the pipe
	  are within the set of processes being checkpointed.

6.	Checkpoint NFS files.

	This is possible - because I have seen it done.
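
Several of the checks in items 1 and 5 come down to looking at a
process's open descriptors. The following is a user-space sketch of
what such checks might look like using fstat(). It is not part of
chkpnt, all the names are hypothetical, it does not detect NFS or
SYS V IPC, and same_pipe() assumes the behaviour of current Linux
kernels, where both ends of a pipe report the same device and inode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical verdicts for one open descriptor. */
enum fd_kind { FD_OTHER, FD_PIPE, FD_SOCKET, FD_UNLINKED, FD_INVALID };

/* Classify fd using only fstat(). FD_OTHER means "nothing this check
 * objects to"; NFS and SYS V IPC are NOT detected here. */
enum fd_kind classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return FD_INVALID;
    if (S_ISFIFO(st.st_mode))
        return FD_PIPE;
    if (S_ISSOCK(st.st_mode))
        return FD_SOCKET;
    if (st.st_nlink == 0)
        return FD_UNLINKED;     /* no directory entry refers to it any more */
    return FD_OTHER;
}

/* 1 if both descriptors are ends of the same pipe, else 0.
 * Assumption: both ends of a pipe share one (device, inode) pair. */
int same_pipe(int fd_a, int fd_b)
{
    struct stat a, b;

    if (fstat(fd_a, &a) != 0 || fstat(fd_b, &b) != 0)
        return 0;
    return S_ISFIFO(a.st_mode) && S_ISFIFO(b.st_mode) &&
           a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A group checkpoint could use a test like same_pipe() to decide whether
both ends of a pipe fall inside the set of processes being saved.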

--------------------------------------------------------------------


