1.) What do we have here?

This is the README file for 'strings'. 'strings' is a rewrite
and replacement for the 4BSD program of the same name.
'strings' looks for sequences of printable characters in a file and
outputs them.

Current version is 1.6.9.

(To get best results when reading these files, use an option in your
 favourite editor to expand a TAB to 4 SPACEs. E.g. in vi it is
 "set tabstop=4")

You should have the files:
	README		- this file
	COPYING		- Copyright Notice
	Makefile	- Makefile
	strings.h
	config.h
	tune.h
	strings.c	- the main source
	limits.c	- the UNIX (trademark of AT&T) specific stuff to identify
				  an initialized data segment
	output.c	- output routines
	test_input	- a file containing the 2 characters the original strings
				  stumbled over. Unpack it with atob.
	strings.1	- manual page.
	strings.txt - manual page without nroff sequences

2.) How to build strings.

Now that you have strings you will want to build it.
The program is shipped with UFLAGS undefined (see below for an explanation).
On UNIX (trademark of AT&T) systems, you should be able to build the
program by just typing "make". On non-UNIX systems you might have problems.

a.) Edit Makefile. If you want to play it safe, set UFLAGS=-DSAFETY_FIRST.
	strings should now compile without any problems.
	You won't get the UNIX specific stuff: the program does not
	try to identify the initialized data segment.

b.) If you don't want to play it safe, but would rather configure
	strings for your system, take a look at config.h first. It contains
	a list of known systems. If yours is one of them, edit Makefile and
	set UFLAGS to nothing. The defines for that system are then used
	when compiling.
	WARNING: some things may differ between different versions of the
	same system. On some machines there is no easy way to distinguish
	between such versions.
	If your system turns out not to be in the list of known systems,
	the minimal defaults are used, as in a.).

c.) You want to configure strings, and your system is not in the list
	of known systems.
	Edit Makefile and set UFLAGS=-DUSE_USER_DEFINES.
	Edit tune.h and set things up for your system. The variables are
	commented.

There are 3 header files. The inclusion works like this.
(you can skip this)

        (reading strings.h)
               |
               v
        (including config.h)
	           |
	           v
	<----- is SAFETY_FIRST defined?
	|          |
	|          | -
	|          v                      +
	|  is USE_USER_DEFINES defined? ---->use stuff from tune.h ---> continue
	|          |
	|          | -
	|          v
	|      is this machine 1 ?
	|          |
	|          | -
	|          v
	|      is this machine 2 ?
	|          |
	|          | -
	|          v
	|         ...
	|          |
	|          | -
	v          v
	use safe defaults
	|
	v
	continue

The program, or rather the header files, know about the following machines:

- VAX 11/780 (4.3 BSD) by "unix" and "vax" and not "ultrix"
- SIEMENS PC-MX2 (SINIX v2.1) by "nsc3200" and "sinix" and "ns16000"
- Sun 3/260 (SunOS 3.5) by "unix" and "sun" and "mc68020"
- VAX 6800 (Ultrix 2.1) by "unix" and "ultrix" and "bsd4_2"	
- uVAX (VMS 5.1) by "vms" and "vax"
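
The selection order in the diagram can be sketched as a preprocessor
chain. This is only an illustration of the order of the tests, not the
contents of config.h: CONFIG_SOURCE and config_source() are hypothetical
names made up for this example, and only two of the machines listed
above are shown.

```c
/*
 * Sketch of the selection order only. CONFIG_SOURCE and config_source()
 * are hypothetical names for this example; the real settings live in
 * config.h and tune.h.
 */
#if defined(SAFETY_FIRST)
#  define CONFIG_SOURCE "safe defaults"
#elif defined(USE_USER_DEFINES)
#  define CONFIG_SOURCE "tune.h"
#elif defined(unix) && defined(vax) && !defined(ultrix)
#  define CONFIG_SOURCE "VAX 11/780 (4.3 BSD)"
#elif defined(unix) && defined(sun) && defined(mc68020)
#  define CONFIG_SOURCE "Sun 3/260 (SunOS 3.5)"
#else
#  define CONFIG_SOURCE "safe defaults"   /* unknown machine */
#endif

/* Report which branch of the chain was taken at compile time. */
const char *config_source(void)
{
    return CONFIG_SOURCE;
}
```

Compiling this with -DSAFETY_FIRST or -DUSE_USER_DEFINES selects the
first or second branch regardless of the machine, which matches the
UFLAGS settings described in a.) and c.).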

3.) Why is this strings better than the standard one?

	a.) This version of strings is roughly 4 to 9 times faster than the
		original one in the tests below. With a larger minimal string
		length it might even be 10 times faster.
	b.) The original one had several bugs.
	c.) This one is public domain. You get the source.

	ad a.)
		Here are results of some tests:

		machine: PC-MX2,   OS: SINIX v2.1		file: /vmsinix	(289084)
		old :		u	43.6		s	1.1			=	44.7
		new :		u	 3.8		s	2.3			=	 6.1

		machine: VAX 11/780	   OS: 4.3BSD    	file: /vmunix	(329728)
		old :		u	18.0		s	2.7			=	20.7
		new :		u	 1.5		s	0.9			=	 2.4

		machine: SUN 3/260 	   OS: SunOS 3.5 	file: /vmunix	(558359)
		old :		u	 6.5		s	0.6			=	 7.1
		new :		u	 1.6		s	0.2			=	 1.8

		machine: VAX 6800  	   OS: Ultrix 2.1	file: /vmunix	(662528)
		old :		u	 5.2		s	0.4			=	 5.6
		new :		u	 0.6		s	0.0			=	 0.6

		User, sys and total times in seconds.

	ad b.)
		The original strings
		- thinks control-L (0x0c) is a printable character.
		- under some circumstances thinks 0x80 is printable. The
		  package contains a file, test_input. Unpack this file with
		  atob. The unpacked file contains several lines of characters,
		  including one line with a control-L and one with a 0x80. The
		  original strings gets both cases wrong.
		- does not get the start address of the initialized data right on
		  some systems.
		- has problems when dealing with the standard input.
		The first two bugs were found on 4.3BSD, SunOS and ULTRIX, the
		third only on MX2 SINIX v2.1.

4.) What about bugs?

	If you find bugs, tell me. If you have fixed them, or have made an
	extension that is genuinely useful, drop me a note.

5.) Notes

	This program is about 7 times faster than the original one.
	There are two reasons for this:
	- It does not use fgetc/fputc to get or put characters, but
	  reads characters in blocks. It does not copy them but rather moves
	  pointers around on the input buffer, so no procedure call is
	  needed to get at each character.
	  When a sequence is found, it is put into the output buffer in one
	  block, so there is no need, as there is in fputc, to check for
	  possible overflow on each character.
	- When the program searches for a sequence of printable characters,
	  it examines only every min_str_len-th character instead of each one.
	  min_str_len defaults to 4 and can be set with a command line option
	  like "strings -3".
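
	The second point can be sketched as follows. This is a minimal
	illustration of the idea, not the code in strings.c: the names
	scan_strings and is_printable are made up for the example, and
	"printable" is assumed to mean ASCII 0x20..0x7e. A run of at least
	min_len printable bytes must cover one of the probed positions, so
	a non-printable byte at a probe lets the scan jump ahead by min_len.

```c
#include <stddef.h>

/* Assumption for this sketch: printable means ASCII 0x20..0x7e. */
static int is_printable(unsigned char c)
{
    return c >= 0x20 && c <= 0x7e;
}

/*
 * Find runs of at least min_len printable bytes in buf[0..len).
 * Start offsets and lengths of the first max_out runs are stored in
 * starts[] and lens[]; the total number of runs found is returned.
 */
size_t scan_strings(const unsigned char *buf, size_t len, size_t min_len,
                    size_t *starts, size_t *lens, size_t max_out)
{
    size_t found = 0;
    size_t lower = 0;   /* no unreported run can start before this index */
    size_t i;

    if (min_len == 0)
        min_len = 1;
    i = min_len - 1;    /* first probe position */

    while (i < len) {
        size_t start, end;

        if (!is_printable(buf[i])) {
            /* No run of min_len bytes fits between lower and i. */
            lower = i + 1;
            i += min_len;
            continue;
        }
        /* Probe hit a printable byte: widen to the full run. */
        start = i;
        end = i + 1;
        while (start > lower && is_printable(buf[start - 1]))
            start--;
        while (end < len && is_printable(buf[end]))
            end++;
        if (end - start >= min_len) {
            if (found < max_out) {
                starts[found] = start;
                lens[found] = end - start;
            }
            found++;
        }
        /* buf[end] is non-printable (or the end of the buffer). */
        lower = end + 1;
        i = end + min_len;
    }
    return found;
}
```

	The sketch uses indices where the text above speaks of pointers;
	the skipping logic is the same either way. Reporting offsets
	instead of copying bytes mirrors the point that a found sequence
	can be handed to the output in one block.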

	It can be sped up some more, but then it would be difficult to port
	it to different systems.
	Example:
	Currently the program takes 6.0 seconds on MX2 for /vmsinix.
	A version tuned for MX2 needs only 5.5 seconds. It is also much
	smaller: 6976 bytes compared to 10596.
	Ways to improve the program:
	- On some machines another method to test whether a character is printable
	  will be faster. Currently the program uses an array (isp) and indexes
	  it with a character cast to a (signed) integer (isp_mid is the
	  base from which offsets are computed).
	  On MX2 and, if I believe my tests, on VAX, it is faster to use
	  unsigned characters as the index into this array.
	  If you want to play around with this, just change CHAR_TYPE to
	  'unsigned byte' and define the macro IS_PRINTABLE accordingly as
	  '(isp[c])'
	- It makes a difference (although a small one), what basic type you
	  choose for the isp array. On MX2 short is best, but char is nearly
	  as good.
	- You can make it smaller. The program does not need stdio. But
	  exit normally closes file descriptors, and therefore includes a
	  large part of the stdio stuff. Well, about 4 K on some systems.
	  If you know what your exit does, you can substitute a suitable
	  routine of your own. E.g. on MX2, exit calls _cleanup, which
	  only closes all open file descriptors. As I know that only the
	  standard descriptors are open at the end of the program, I can
	  write a _cleanup which only does a close on 0, 1, and 2.
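
	The two table lookups from the first point can be compared in a
	small sketch. It borrows the names isp and isp_mid from the text,
	but the table layout, the helper functions and the printable range
	(ASCII 0x20..0x7e) are assumptions made for the example and may
	differ from strings.c. The unsigned variant uses 'unsigned char'.

```c
#include <limits.h>

#define TABLE_SIZE (UCHAR_MAX + 1)

/* Signed variant: table indexed from its middle with -128..127. */
static char isp_signed[TABLE_SIZE];
static char *const isp_mid = isp_signed + (-SCHAR_MIN);

/* Unsigned variant: table indexed directly with 0..255. */
static char isp[TABLE_SIZE];

/* Mark ASCII 0x20..0x7e as printable in both tables (an assumption). */
void init_tables(void)
{
    int c;

    for (c = 0x20; c <= 0x7e; c++) {
        isp_mid[c] = 1;
        isp[c] = 1;
    }
}

/* Lookup with a signed character as an offset from isp_mid. */
int is_printable_signed(signed char c)
{
    return isp_mid[(int)c];
}

/* Lookup with an unsigned character as a plain index into isp. */
int is_printable_unsigned(unsigned char c)
{
    return isp[c];
}
```

	Both variants reject control-L (0x0c) and 0x80, the two characters
	mentioned in section 3. Which one is faster depends on the machine,
	as described above, so this is something to measure rather than
	assume.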

	The savings that you get are almost invisible and not easily
	portable; they require a certain amount of research on the part
	of the person doing the porting. I chose not to fit the program with
	options to adjust these things.

	There are still some DEBUG statements in the code. You get them
	if you set DEBUGFLAGS=-DDEBUG.

6.) Status

	This program is placed into the public domain.
	The Copyright Notice in COPYING applies.

Absorb, apply and enjoy,
		Michael Greim
