{ SORT: merge and sort multiple text files, replaces DOS SORT }

{ Copyright, 1988, 1989, by J. W. Rider }

{ Syntax :

  SORT [options] [<unsorted-file-spec> ... ]

  Where available options are:

        "/r" reverses the sense of the sort,

        "/"+# sorts the lines from the data in column #
  -- a second # defines the last column of the key field.
  -- subsequent #'s are ignored.

        "/b" ignores leading blanks (spaces, tabs) in determining
             the key.

        "/c" makes the sort case-insensitive 'a'='A',

        "/d" "dictionary" vs "ascii" sort, alphanumerics count only

        "/f" interpret column numbers as "awk" field numbers,
  -- does not automatically assume "/b"

        "/h" displays help message rather than sort input.

        "/k" outputs only the key not the whole line.

        "/n" sorts the lines numerically vice alphabetically.
  -- "/n" automatically assumes "/b"; in fact, "/n" will search
  -- an entire key field for any hint of a numeric.  "DOS1", "DOS2",
  -- "DOS3.3" will all be correctly sorted.

        "/t"C makes "C" a field delimiter vice blanks.  To include
             blanks, use "/t" without any character.

        "/u" eliminates multiple copies of identical lines
  -- "/u" might not work correctly if keys other than the whole
  -- original line are specified: "/+#","/c","/n","/b"

  If first filename is missing or is '-', reads from standard input.
  Writes sorted lines to standard output.

  The first two options, "/r" and "/"+#, and default use of standard
  input and output are provided for syntax compatibility with MSDOS
  SORT.  The other options and command-line file naming are extensions
  inspired by Unix implementations of SORT. }
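
{ Examples: a few illustrative command lines, based only on the option
  descriptions above.  The file names are hypothetical, and the way the
  switches combine is an assumption rather than a tested guarantee.

    SORT < NAMES.TXT > SORTED.TXT     (MSDOS-compatible filter usage)
    SORT /r /+10 NAMES.TXT            (reverse order, key starts in col 10)
    SORT /c A.TXT B.TXT > ALL.TXT     (merge two files, ignoring case)
    DIR | SORT /n /+14                (numeric sort, key starts in col 14)
    SORT /t: /f /+2 LIST.TXT          (":" delimits fields, key is field 2) }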

{ If the heap is not large enough to completely hold the sorted file,
  or if there is a problem with input/output file names, then
  SORT displays an error message to 'CON' and returns ERRORLEVEL 1
  to the parent process. }

{ Even if there is not enough room in the heap to sort the file in
  memory, sort tries to provide a partial sorting of the file.  The
  output can be further sorted by sort until the file is completely
  sorted. }
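
{ Example: a batch-file sketch of the two notes above.  The file names
  are hypothetical.  IF ERRORLEVEL 1 is the usual DOS batch test for a
  return code of 1 or greater; note that SORT also returns 1 for file
  name problems, so check the message on the console as well.

    SORT BIG.TXT > PASS1.TXT
    IF ERRORLEVEL 1 SORT PASS1.TXT > PASS2.TXT

  If the second pass also returns ERRORLEVEL 1, its output can be fed
  through SORT again in the same way. }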

{$A+,B-,D+,E-,F-,I-,L-,N-,O-,R-,S-,V-}
{$M 16384,0,655360}

program sort;

uses dos; { added to facilitate wildcards in filenames }

{ In fact, my personal SORT utility "uses" considerably more units than
  what is indicated here.  However, my units are not standard. I would
  not expect the average user to have ever heard of them. Nor would I expect
  the advanced user to even *want* to use them.  Instead, I have extracted
  the components that SORT references and 'included' them instead. }

const grain = 16 ; { heap granularity; usage here requires power of two }
      defaultcase = true; { some SORTs start out with different
          case sensitivity.  Change it here. TRUE means case sensitive. }

{ Granularity of heap is set to 'grain' bytes, see Turbo Pascal
  Reference Guide, pg 199.

  SYSTEM.FREEMIN is also set to 16000 in SORTINIT.  No investigation
  has been made as to whether or not these values are optimal.
  Failure to set FREEMIN large enough will cause a run-time error
  for files that are too large. }

{$I sort.typ }     { Defines the binary tree records. }
{$I sort.var }     { Defines all global variables.  Includes
                     procedure "sortinit".}


{ General functions: Some of these functions are written in such a way as
  to be generally useful. }

{$I anstr.fun }    { Strip all non-alphanumerics from string }
{$I posnum.fun }   { Searches a string for a numeric substring. }
{$I bval.fun }     { Extract a number from a string }
{$I errexit.inc }  { Type message; Set ErrorCode on exit. }
{$I findfld.fun }  { Find starting pos for "awk" field in string }
{$I heapmem.fun }  { Some "suggested" mods to GetMem and FreeMem. }
{$I isatty.fun }   { Determines whether input has been redirected. }
{$I iswild.fun }   { Does a string contain either "*" or "?" }
{$I lcase.fun }    { Changes all upper case chars in string to lower }


{ Special functions: These functions are unique to SORT and have
  questionable utility outside of this package. }

{$I btsort.inc }   { Binary tree manipulation, output included }
{$I sortargs.inc } { Handle option switches from command line. }
{$I sorthlp.fun }  { Displays the Sort Help Message. }
{$I stdinhdr.inc } { Prompt user for data if input not redirected }


{ Process1file: is what it is all about.  The text variable "fi" has
  already been assigned to the named file.  This procedure starts
  from the beginning of the file, reads each line and stores it in the
  binary tree structure until no more lines can be read.  The file "fi" is
  closed when we are done. }

procedure process1file;
begin
   reset(fi);
   while not eof(fi) do begin readln(fi,s); storeln(s); end;
   close(fi);
end;


{ MAIN: Most of this is abstracted from studies that I have made concerning
  a standard method of handling multiple arguments and filenames in
  standard DOS filters. The general approach is to decompose the large task
  of merging multiple files into a series of single-file tasks. This
  skeleton can be modified to handle arguments in another manner. }

begin { sort main }

{ SORT title line: If there are no arguments on the command line and
  standard input has not been redirected from a file, then we assume
  that the user may not be completely certain as to the proper method of
  using SORT.  The program provides a little message that indicates
  how further help may be obtained.  The message is not sent to
  standard output; the user will not be inconvenienced if he really
  does know what he is doing. }

if (paramcount=0) and isatty(0) then begin
   assign(fe,'CON'); rewrite(fe); writeln(fe,
' SORT: Copyright 1988,1989, by J. W. Rider, use "SORT /h" for help.');
   close(fe); end;


{ SORT INITialization: Initializing variables in this manner is
  time-consuming, but the cost is trivial for sorting files of even
  moderate size.  My goal for the final program is to have these
  variables as typed constants. }

sortinit;


{ Get command line ARGUMENTS: This version of SORT requires all option
  switches be positioned before any file names.  Once filenames have
  started being read, no options can be changed. }

arguments;


{ Needs Help?: If the user specifies that help is desired or if an error
  is made in the command line option switches, then just list the help
  page and quit the program without error. }

if helponly then begin helpmsg; close(output); close(fi); exit; end;


{ Key fields: If the user has not specified a subset of cols for the
  key, use the whole line. }

if keycol=0 then keycol:=1; if keycol2=0 then keycol2:=255;


{ No file names: If no input files are specified, use standard input
  as the source. }

if parmcount>paramcount then begin

   { If input has not been redirected, provide a little more instruction
     on how to get the sort to work correctly.  In any case, just handle
     standard input like it was any other file. }

   stdinhdr; assign(fi,''); process1file; end


{ otherwise merge in each file listed on the command line }

else for i:=parmcount to paramcount do

   { Use standard input if the command line filename is "-". }
   if paramstr(i)='-' then begin stdinhdr; assign(fi,''); process1file; end

   { Otherwise, open each file individually. }
   else begin

          { get complete file name and extension for entry }
          fstr:=fexpand(paramstr(i)); fsplit(fstr,d,n,x);

          { If a directory is referenced, merge all included files }
          if (n='') and (x='') then fstr:=fstr+'*.*';

          { Search for all reasonable files.  Be sure to include
            directories. }
          findfirst(fstr,directory+readonly+archive,sr);

          { My preference for SORT was to ignore any attempt
            by the user to sort nonexistent files.  (This could
            be modified to detect such attempts.  I just decided that
            there was little that my program could tell me about what
            files I wanted to sort.) }
          while doserror=0 do begin
             assign(fi,d+sr.name);
             if (sr.attr and directory)<>0 then

                { Search subdirectories only if they are specifically
                  named.  Do not perform recursive subdir searches. }
                if not iswild(fstr) then begin

                   { This time through, it is safe to ignore directories }
                   fstr:=fstr+'\*.*';
                   fsplit(fstr,d,n,x); findfirst(fstr,readonly+archive,sr);
                   while doserror=0 do begin assign(fi,d+sr.name);
                         process1file; findnext(sr); end; end

                { Ignore ambiguous directories }
                else findnext(sr)

             { Merge all non-directory files found. }
             else begin process1file; findnext(sr); end; end end;


{ after all files have been read, write the sorted tree out }

retrieveln; close(output); { IMPORTANT!: close output before exit }

{ If the program is unable to guarantee that the output has been correctly
  sorted, a message is generated to the console and a DOS error return is
  invoked. At worst, the output will be "partially" sorted. (Whatever
  *that* might mean.) }

if sorterror then errexit('Output may not be completely sorted.');

end. {program sort}

