

README for Linux-1.0-inline.asm				Mar 25,1994

* ======================================================================= *
* NOTE to this release:
*
* This is NOT A NEW VERSION. Some people had problems patching the
* kernel; it seems that we used the wrong (?) versions of diff and
* patch (diff v2.4; patch v2.1) and that diff generated a wrong line
* reference. We remade the diffs with diff v2.6 and tested them with
* patch 2.0 rev 12u8. Given the previous experience, we cannot
* guarantee that they will apply cleanly for everyone. Please DO NOT
* USE these patches if you already succeeded with the previous release!
*
* ======================================================================= *

Here are patches against Linux 1.0.x for speeding up and optimizing
some code sequences used by the kernel, notably those in
/linux/include/linux/string.h.

The purpose of these patches is:

 1) speed up some code sequences which are inefficient on the i486
    and Pentium compared to the i386. In particular, the string move
    instructions (movsb, lodsb, stosb) are slower on the newer
    processors. The gain in speed is about 20-30% on the individual
    functions; on the whole kernel it is much more difficult to
    measure, but certainly smaller and not normally noticeable.
 2) help gcc achieve better register allocation and produce smaller
    code, by avoiding as far as possible any instruction which forces
    the use of specific registers (lods, stos, movs, loop); this
    allows replacing the "c","S","D"... asm constraints with the more
    general forms "q" and "r", letting gcc choose the best register
    placement strategy.
 3) fix the behaviour of the functions in string.h, which are now
    ANSI-compliant.
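
To illustrate point 2, here is a minimal sketch of the two constraint
styles. It is NOT taken from the patch: the function names
copy_fixed_regs and copy_any_regs are hypothetical, and the code is
written for a current gcc on i386/x86-64 (with a plain C fallback
elsewhere), not for the 1.0.x kernel tree. The first version uses
lodsb/stosb and therefore must pin its pointers with "S" and "D"; the
second uses ordinary mov instructions, so "q" and "r" are enough and
gcc is free to pick the registers.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* String-instruction style: lodsb/stosb implicitly use SI, DI and AL,
 * so both pointers are tied to fixed registers by "S" and "D". */
static inline char *copy_fixed_regs(char *dest, const char *src)
{
	char *d = dest;

	__asm__ __volatile__(
		"cld\n"
		"1:\tlodsb\n\t"
		"stosb\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b"
		: "+S" (src), "+D" (d)
		:
		: "ax", "cc", "memory");
	return dest;
}

/* General-register style: plain mov/inc work with any register, so
 * "q" (byte-addressable register) and "r" (any general register)
 * leave the allocation to gcc. */
static inline char *copy_any_regs(char *dest, const char *src)
{
	char *d = dest;
	char c;

	__asm__ __volatile__(
		"1:\tmovb (%1),%0\n\t"
		"inc %1\n\t"
		"movb %0,(%2)\n\t"
		"inc %2\n\t"
		"testb %0,%0\n\t"
		"jne 1b"
		: "=&q" (c), "+r" (src), "+r" (d)
		:
		: "cc", "memory");
	return dest;
}

#else

/* Portable fallback for other compilers/CPUs: same behaviour in C. */
static inline char *copy_fixed_regs(char *dest, const char *src)
{
	char *d = dest;

	while ((*d++ = *src++) != '\0')
		;
	return dest;
}

static inline char *copy_any_regs(char *dest, const char *src)
{
	return copy_fixed_regs(dest, src);
}

#endif
```

Both functions copy src to dest including the terminating NUL and
return dest, like strcpy. In the second one the caller's surrounding
code no longer has to give up SI, DI and AX around the copy.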

All the functions were handcrafted and individually tested before their
integration in the kernel. Currently we have three i486 machines running
with these modifications without any problems.
BUT, as we cannot install/test all the possible hardware/driver
combinations, we cannot guarantee they will work on your system. What we
did was to compile a version of the kernel with all the drivers/options
enabled, and check that all the kernel sources compile without trouble.
We are confident that our patches are bug-free, but please be
careful and make a backup, or test the new kernel from a floppy before
installing it.
We successfully tested:
	net:	3c503 and depca drivers
	sound:	SoundBlaster 16, pcsndrv0.6
	tape:	QIC-02 and QIC-80 (no SCSI)
	hd:	IDE interface
	cdrom:	sbpcd
	mouse:	Microsoft serial
	... and of course the 'normal' stuff (keyboard,console,serial,
	 floppy drivers)
	file systems:	msdos,ext2fs,nfs,proc

HOW TO INSTALL:

Unpack the tar file in a temporary directory; we assume /tmp here.

Get a clean 1.0.x kernel source in /usr/src/linux (some adjustments may
be necessary if you already have a patched kernel, but they should be
minor).

Change to /usr/src and type

	patch -p0 < /tmp/Linux-1.0-inline.asm.diff

Check for errors (*# or *.rej files); there should be none if you
patched a clean 1.0.x kernel.

Then rebuild the kernel as usual (we assume you are familiar with this).

If you get the message
  kernel.o: undefined symbol _memcpy referenced from data segment
just delete the reference to _memcpy in the /linux/kernel/ksyms.S file.

If you get a message like
 <yourdriver>.o: undefined symbol <symbol> referenced from text segment
try adding an
	#include <linux/string.h>
and deleting any
	#include <memory.h>
from the <yourdriver>.c file.
If the problem persists, feel free to contact us.

-----------------------------------------------------------------------------

References:

Mike Schmit, "Optimizing Pentium Code", Dr. Dobb's Journal, Jan. 94, p. 40
Alex Lane,   "Optimizing for Today's CPUs", Byte, Feb. 94, p. 81

Many thanks to Linus Torvalds for his useful suggestions.

-----------------------------------------------------------------------------

Alberto Vignani		VIGNANI%MSIE03%CRFV2@CSPCLU.CSP.IT
Davide Parodi		dav.parodi@crf.it (currently not working)
