CURRENT NOTES                                            MARCH 1988




            C O M P U T E R   L A N G U A G E S   F O R   T H E   S T

                           Which One Is Right For You?

                             by John H. Marable, III


        There are many programming languages available for the Atari ST
    series of computers.  If you are a first time programmer, or if you have
    learned BASIC and are ready for something else, or even if you are a
    seasoned veteran programmer, choosing a programming language is a
    difficult but important decision.  Each language has its advantages and
    disadvantages.  For any given programming problem, how difficult the
    solution is depends greatly on the language used to implement it.

        This is not an attempt to review all of the programming languages
    available for the Atari ST.  It is a description of the types of
    programming languages available, and some of the advantages and
    disadvantages of each.  First, some definitions are required.

        This discussion is primarily directed at high level languages, such
    as Basic or Pascal, as opposed to low level languages such as 68000
    assembler.  High level languages, in general, are easier to program and
    are more transportable.  (A program written on one machine may be
    compiled and executed on another with little or no modification.)
    Assembly languages are difficult to use and are not transportable.
    There are several assemblers for the Motorola 68000 processor.  Most use
    the standard Motorola assembly language mnemonics.  No assembler for the
    68000 family can assemble code written for any other processor.
    Assembly language can produce the smallest and fastest executable
    program possible.

    LANGUAGE IMPLEMENTATION

        There are three basic ways that a high level language may be
    implemented.  It may be interpreted, pseudo-compiled, or compiled into
    native code.  These are features of the implementation and not the
    language itself.  In fact, some languages are available in more than one
    form.  BASIC is available as both an interpreted (Atari Basic) and a
    compiled (LDW Basic) language.  GFA Basic is available in both
    interpreted and compiled versions.

        Interpreted languages are the most common and the most familiar.
    With an interpreted language, the interpreter is loaded first, and then
    the source code program (the program as typed into the computer by the
    programmer) is loaded and run.  As the program runs, the interpreter
    examines each line of the source code, decodes it into the appropriate
    machine language operations, and executes them.  If a line is executed
    more than once, it must be interpreted again each time.  In most cases,
    the computer spends more time interpreting statements than it does
    executing them.  Even a remark statement must be interpreted by the
    computer to determine that it is a remark requiring no further
    processing.  If an error occurs during execution of the program, in most
    cases interpretation stops, an error message or code is displayed, and
    the source code is listed with the line in which the error occurred
    marked.  Most interpreted languages have an integral text editor for
    entering, displaying and correcting the program source code.
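
        To make the cost of interpretation concrete, here is a minimal
    sketch of a toy interpreter.  It is written in modern Python rather
    than in any ST language, and the mini-language it runs (LET, PRINT,
    IF ... GOTO and REM) and the names run and value are hypothetical, not
    Atari Basic.  Like the interpreters described above, it re-examines the
    text of each line every time that line is executed, and it counts how
    many lines it has interpreted.

```python
# Toy interpreter for a hypothetical line-numbered mini-language (not Atari Basic).
# Each time a line runs, its source text is split and examined again from scratch.


def value(token, variables):
    """Return the value of an integer literal or of a variable name."""
    return int(token) if token.lstrip("-").isdigit() else variables[token]


def run(program):
    """Interpret {line_number: text}; return (printed values, lines interpreted)."""
    numbers = sorted(program)
    variables, output, interpreted = {}, [], 0
    pc = 0                                    # index of the next line to run
    while pc < len(numbers):
        words = program[numbers[pc]].split()  # re-scan the text on every visit
        interpreted += 1
        pc += 1
        if words[0] == "REM":                 # even a remark must be examined
            continue
        if words[0] == "LET":                 # LET V = A  or  LET V = A + B
            result = value(words[3], variables)
            if len(words) == 6:               # only "+" is supported here
                result += value(words[5], variables)
            variables[words[1]] = result
        elif words[0] == "PRINT":             # PRINT V
            output.append(value(words[1], variables))
        elif words[0] == "IF":                # IF V < N GOTO L
            if value(words[1], variables) < value(words[3], variables):
                pc = numbers.index(int(words[5]))
    return output, interpreted


COUNT_TO_THREE = {
    10: "REM COUNT FROM 1 TO 3",
    20: "LET I = 1",
    30: "PRINT I",
    40: "LET I = I + 1",
    50: "IF I < 4 GOTO 30",
}

if __name__ == "__main__":
    print(run(COUNT_TO_THREE))                # ([1, 2, 3], 11)
```

    For the five-line counting program, run returns ([1, 2, 3], 11): five
    lines of source cost eleven line interpretations, and the REM line is
    examined even though it does nothing.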

        Pseudo compilers generate a pseudo code (p-code) which calls machine
    language procedures contained in a runtime library.  Sometimes the
    runtime library is loaded along with the pseudo code; sometimes it is a
    separate file that the compiled program must load at run time.  A
    separate library keeps program files small, but the runtime library must
    be on every disk that contains compiled programs.  Pseudo code runs
    relatively fast compared to an interpreter, but even a very small
    program might be very large after compilation due to the overhead of the
    runtime library.  Some pseudo compilers will incrementally compile code
    from a source file to provide some of the advantages of an interpreter.
    A pseudo compiler is easier to write than a native code compiler, and
    once it is written it can be easily ported to a completely different
    machine, since only the p-code interpreter and runtime library must be
    rewritten.  This portability accounts for the popularity of the UCSD
    Pascal P System.  Versions of it are available for the Atari ST series.
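
        A pseudo compiler can be sketched the same way, again in Python and
    with hypothetical opcodes (PUSH, ADD, STORE, PRINT) and function names.
    The source text is read only once, by compile_to_pcode; after that,
    execute simply hands each p-code instruction to a routine in a small
    runtime library.  The whole library table comes along with every
    program, however few of its routines the program uses.

```python
# Sketch of a pseudo compiler and its runtime library (hypothetical opcodes).


def op_push(state, arg):
    """Push a constant, or the current value of a variable, onto the stack."""
    state["stack"].append(state["vars"][arg] if isinstance(arg, str) else arg)


def op_add(state, arg):
    b = state["stack"].pop()
    a = state["stack"].pop()
    state["stack"].append(a + b)


def op_store(state, arg):
    state["vars"][arg] = state["stack"].pop()


def op_print(state, arg):
    state["out"].append(state["vars"][arg])


# The runtime library: the whole table is loaded with every program.
RUNTIME_LIBRARY = {"PUSH": op_push, "ADD": op_add, "STORE": op_store, "PRINT": op_print}


def operand(token):
    return int(token) if token.isdigit() else token


def compile_to_pcode(source_lines):
    """Translate each source line once into (opcode, argument) pairs."""
    pcode = []
    for line in source_lines:
        words = line.split()
        if words[0] == "LET":                    # LET V = A  or  LET V = A + B
            pcode.append(("PUSH", operand(words[3])))
            if len(words) == 6:
                pcode.append(("PUSH", operand(words[5])))
                pcode.append(("ADD", None))
            pcode.append(("STORE", words[1]))
        elif words[0] == "PRINT":                # PRINT V
            pcode.append(("PRINT", words[1]))
        # REM lines produce no p-code at all
    return pcode


def execute(pcode):
    """Run p-code by dispatching each opcode to its runtime library routine."""
    state = {"stack": [], "vars": {}, "out": []}
    for opcode, arg in pcode:
        RUNTIME_LIBRARY[opcode](state, arg)
    return state["out"]


if __name__ == "__main__":
    code = compile_to_pcode(["REM ADD TWO NUMBERS", "LET A = 2", "LET B = A + 3", "PRINT B"])
    print(code)
    print(execute(code))                         # [5]
```

    The REM line produces no p-code at all, so unlike the interpreter it
    costs nothing at run time.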

        Native code compilers produce the fastest and smallest executable
    code possible for a high-level language.  From the source code, directly
    executable machine language code is generated.  Usually, this code is
    linked with machine language procedures from a library.  This library
    might be similar to the runtime library of the pseudo compilers;
    however, only the procedures that are actually needed are linked into
    the executable program.  This allows the use of larger libraries while
    still producing smaller programs.  Some compilers also place short
    procedures in line with the native code rather than calling them as
    subroutines, which makes the program run faster.
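
        The linker's selection of only the needed procedures amounts to
    following calls through the library.  In this Python sketch the library
    table, routine names and byte sizes are all invented for illustration;
    each routine lists the routines it calls in turn, and only those
    reachable from the program's own calls are kept.

```python
# Hypothetical library: routine name -> (size in bytes, routines it calls).
LIBRARY = {
    "print_int": (120, ["int_to_str", "write_str"]),
    "int_to_str": (80, []),
    "write_str": (60, []),
    "sqrt": (200, ["fdiv"]),
    "fdiv": (150, []),
    "open_file": (300, ["write_str"]),
}


def link(called):
    """Return the sorted names of every routine needed, following calls transitively."""
    needed, pending = set(), list(called)
    while pending:
        name = pending.pop()
        if name not in needed:
            needed.add(name)
            pending.extend(LIBRARY[name][1])   # routines this one calls
    return sorted(needed)


def linked_size(called):
    """Bytes of library code that end up in the executable."""
    return sum(LIBRARY[name][0] for name in link(called))


if __name__ == "__main__":
    print(link(["print_int"]))                         # ['int_to_str', 'print_int', 'write_str']
    print(linked_size(["print_int"]))                  # 260
    print(sum(size for size, _ in LIBRARY.values()))   # 910
```

    A program that calls only print_int pulls in three routines totalling
    260 bytes instead of the whole 910-byte library.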

    COMPILERS VS INTERPRETERS

        In general, interpreters produce programs that are easy to write and
    debug, but are slow in execution, and require that the interpreter be
    loaded to execute the program.  Compilers provide fast code, but writing
    and debugging the code is time consuming and tedious.

        With a compiler, source code is written using a text editor and then
    compiled (requiring as many as four passes through the source code).  If
    the compiler runs with no errors detected, the program must then be
    linked to create an executable file (another one or two passes).  If
    errors are detected in the compile or link phase, the editor is
    reloaded, the source code is corrected, and the cycle is repeated until
    no more fatal errors are detected.  Then the program can be run, but
    wait.  Now the run time errors must be debugged.

        With an interpreter, run time errors usually stop program execution
    and return to the program editor where the error is indicated.  The
    programmer can examine the status of variables, correct the error and
    then run the program again until all of the run time errors are
    corrected.

        With a compiler, a run time error is more difficult to locate.  Often
    run time errors in compiled programs will simply cause the system to
    display bombs and crash, leaving the programmer no indication of where
    the error occurred and little indication of what the error was.  Utility
    programs such as a symbolic debugger can assist the programmer, or he can
    program in debugging code to trace the execution of the program to locate
    the source of the crash, but the process is slow and difficult.

        The ideal programming environment might be a combination of several
    tools.  First, a syntax checking text editor is a real time saver.
    Syntax errors, such as unmatched quotation marks, are detected by the
    text editor, where they can be corrected as they occur.  An interpreter
    is then used to detect and correct errors that can't be detected by the
    text editor, such as a begin without an end or a gosub without a
    return.  The interpreter is also used to debug any run time errors.
    When the program is completely debugged, a compiler that is completely
    compatible with the interpreter is used to compile the source code.
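
        The division of labor between the editor and the interpreter can be
    sketched in Python.  The first check below works one line at a time, so
    an editor could run it as each line is typed; the second must see the
    whole program, because a BEGIN on one line may be closed many lines
    later.  Both function names are hypothetical, and the block check is
    deliberately simple (it does not skip words inside strings).

```python
def unmatched_quote_lines(lines):
    """Editor-style check: 1-based numbers of lines with an odd count of double quotes."""
    return [n for n, line in enumerate(lines, start=1) if line.count('"') % 2 == 1]


def unbalanced_blocks(lines):
    """Whole-program check for BEGIN without END and END without BEGIN.

    This sketch does not skip words inside quoted strings or comments.
    """
    depth, problems = 0, []
    for n, line in enumerate(lines, start=1):
        for word in line.upper().split():
            word = word.strip(";.")
            if word == "BEGIN":
                depth += 1
            elif word == "END":
                if depth == 0:
                    problems.append(f"line {n}: END without BEGIN")
                else:
                    depth -= 1
    if depth > 0:
        problems.append(f"{depth} BEGIN without END")
    return problems


SOURCE = [
    "begin",
    '  writeln("hello);',
    "  if x > 0 then begin",
    "    x := 0",
    "end.",
]

if __name__ == "__main__":
    print(unmatched_quote_lines(SOURCE))   # [2]
    print(unbalanced_blocks(SOURCE))       # ['1 BEGIN without END']
```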

    LANGUAGE CATEGORIES

        A high level language can be classified in one of several
    categories:

    * Unstructured languages (Basic, Fortran, Cobol, etc.) were the first
      high level languages.  They remain the most popular today.

    * Structured languages (Pascal, C, Modula-2, Ada, etc.) are the languages
      used by most professional programmers developing new application
      software such as word processors, spreadsheets and databases.

    * Threaded interpretive languages (Forth, etc.) are relatively difficult
      to use, but are particularly useful for real-time applications such as
      robotics.  Forth programmers are a small, but loyal group.

    * Symbolic languages (Lisp, Logo, etc.) are the languages of artificial
      intelligence applications such as expert systems, although declarative
      languages (Prolog, etc.) are gaining popularity in this area.

    UNSTRUCTURED LANGUAGES

        Unstructured languages were the first languages to become popular on
    mainframe computers in the 1960's.  Unstructured languages are usually
    characterized by numbered lines, although modern implementations are
    getting away from that.

        Fortran (FORmula TRANslation) has been popular with scientists and
    engineers.  Its continued popularity is due to the availability of
    "number crunching" routines written over the years and ported from
    machine to machine.  There are several Fortran compilers available for
    the ST including AC Fortran, Prospero Pro Fortran and Philon Fortran.

        Basic (Beginner's All-purpose Symbolic Instruction Code), though
    largely spurned by professional programmers, is still the most popular
    programming language.  This is due primarily to the availability of
    Basic interpreters for virtually every computer made.  Basic was created
    at Dartmouth College in the 1960s by Kemeny and Kurtz to be an easy to
    use first language for students.  It was one of the first high-level
    languages to be implemented on microcomputers, by Bill Gates of
    Microsoft fame.  Since then, Basic interpreters have been packaged with
    most microcomputers at purchase.  Some computers have Basic interpreters
    installed in ROM inside the machine (the Atari XE series for example).
    Basic is usually implemented as an interpreted language.  There is a
    wide variety of Basic interpreters available for the Atari ST including
    ST Basic (the one bundled with every ST), GFA Basic, Fast Basic, and
    Real Basic.  There are also several compiled versions of Basic
    available including GFA (Basic) Compiler, LDW Basic, Softworks Basic,
    Philon Fast/Basic M and True Basic (by Kemeny and Kurtz, the
    originators of Basic).

        The greatest advantage of Basic is its ease of use.  That is not
    surprising, since that was the original concept.  Critics of Basic say
    that because of its unstructured nature it leads to "spaghetti code",
    source code that rambles through the program.  Most Basic programmers
    make indiscriminate use of the "dreaded" GOTO statement.  This makes
    program flow hard to follow without a large number of comments or REM
    statements.  In traditional Basic, all variables are global; they are
    available everywhere in the program.  This leads to unwanted
    modification of variable values, called side effects.

        Modern implementations of Basic attempt to make it more structured.
    They contain program flow control statements that make the use of GOTO
    statements unnecessary.  They even allow the localization of variables
    and procedures with parameter passing.  Despite all of its so-called
    faults, most programmers will admit that it is easier to get a small
    program up and running in Basic than in any other language.

        Cobol (COmmon Business Oriented Language) is another popular
    unstructured language.  It remains important today because many
    businesses are still using custom applications written in Cobol.  Cobol
    programmers are still in demand to maintain and update Cobol programs
    written 20 years ago.  As yet, there is no implementation of Cobol for
    the ST.  There are, however, Cobol implementations for the IBM PC and
    even 8 bit CP/M machines.  These implementations might run on the ST with
    the help of pc Ditto or the CP/M emulator.

    STRUCTURED LANGUAGES

        Structured programming languages are characterized by (1) block
    structure (Begin ... End), (2) absence of line numbers, (3) strong data
    typing (mandatory declaration of variables), (4) limited scope of
    variables, (5) parameter passing by value or by address, and (6) the
    five basic control structures: IF-THEN-ELSE, FOR-NEXT, WHILE,
    REPEAT-UNTIL, and CASE-OF.  Most allow the GOTO statement but restrict
    its range to within the block.  These features are worth discussing
    individually and comparing with those of unstructured languages.
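
        Each of the five control structures can be shown in a few lines of
    Python, used here only as a modern illustration with hypothetical
    function names.  Python has IF-THEN-ELSE, a FOR loop, and WHILE, but no
    REPEAT-UNTIL, so that loop is written with the test at the bottom.  For
    CASE-OF, this sketch uses a table lookup; newer versions of Python
    (3.10 and later) also offer a match statement.

```python
def count_even_odd(values):
    evens = odds = 0
    for v in values:            # FOR-NEXT: one pass per element
        if v % 2 == 0:          # IF-THEN-ELSE: a two-way choice
            evens += 1
        else:
            odds += 1
    return evens, odds


def gcd(a, b):
    """WHILE: the test comes first, so the body may never run."""
    while b != 0:
        a, b = b, a % b
    return a


def digits(n):
    """REPEAT-UNTIL: the body always runs at least once, so 0 yields [0].

    Assumes n is a non-negative integer.
    """
    result = []
    while True:                 # Python has no REPEAT-UNTIL; test at the bottom
        result.insert(0, n % 10)
        n //= 10
        if n == 0:              # UNTIL n = 0
            break
    return result


def day_kind(day):
    """CASE-OF: choose a branch by value; a dict lookup stands in for CASE here."""
    return {1: "weekend", 7: "weekend"}.get(day, "weekday")


if __name__ == "__main__":
    print(count_even_odd([1, 2, 3, 4, 6]), gcd(48, 18), digits(305), day_kind(3))
```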

        Block structure consists of the use of subprograms, subroutines,
    functions and procedures.  A block structured program usually consists
    of a main program which does little more than call subprograms.
    Subprograms then call other subprograms or even call themselves
    (recursive programming).  Each subprogram consists of a group of
    statements and should be functionally distinct.  This makes the
    structure of a program easier to follow.  A program block has delimiters
    that define its start and finish, the "begin" and "end" of Pascal or the
    terse { and } of C.  A block may occur within a program or procedure,
    such as if...then begin...end else begin...end.
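
        A block structured program of this kind might look like the
    following Python sketch (all names are hypothetical).  The main program
    does little more than call subprograms, and total calls itself on a
    shorter list until the list is empty.  Python marks its blocks by
    indentation rather than by begin and end or braces.

```python
def read_scores():
    """Subprogram: supply the data (fixed values for illustration)."""
    return [72, 85, 90]


def total(scores):
    """Recursive subprogram: an empty list totals 0, otherwise first + total(rest)."""
    if not scores:
        return 0
    return scores[0] + total(scores[1:])


def report(scores):
    """Subprogram: format the result."""
    return f"{len(scores)} scores, total {total(scores)}"


def main():
    """Main program: little more than calls to the subprograms."""
    return report(read_scores())


if __name__ == "__main__":
    print(main())    # 3 scores, total 247
```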

        Program line numbers began when the primary means of input to a
    computer was punch cards.  Each card was numbered and held one line of
    code.  The numbers allowed the computer to determine the correct order of
    execution in case the cards were shuffled.  Punch cards were succeeded by
    teletype terminals.  Text editors on this hardware were line oriented.
    Line numbers were necessary to reference lines for editing or listing.
    Today's CRT terminals use full screen editors that reference lines with
    the cursor.  Now, the only reason that line numbers might be required is
    to serve as targets for goto or gosub statements.  Structured languages
    resolve this with labels.  A label is an identifier that marks a
    statement.  It might be declared as a label or it might be identified
    by a trailing colon.  Eliminating line numbers is convenient for the
    programmer.  It is no longer necessary to renumber a program to make
    room for a few added lines, and commonly used functions or procedures
    can be copied into a program from a library without renumbering.

        Data typing is useful in that it makes it easier for a compiler to
    reserve memory for data.  It also helps avoid several types of
    programming errors.  Every variable must be declared, and type
    declarations are required for complex data structures like arrays and
    records.  A structured language carefully checks the types of arguments
    to operators and functions.  A strictly typed language may not even
    allow a real to be multiplied by an integer variable.  Because such
    operations are sometimes necessary, transfer functions or casts are
    available to convert data types.  This requires the programmer to
    consider data typing more carefully and explicitly call the necessary
    transfer functions rather than trusting the implementation to make the
    decisions for him.

        The scope of a variable, in structured programming languages, is
    generally limited to the procedure in which it is declared.  These are
    known as local variables.  Variables which are declared outside of any
    procedure are global variables, available anywhere in the program.
    Controlling the scope of variables limits the occurrence of "side
    effects".  If you are in the habit of using "i" as a loop counter and you
    exit a loop to execute a subroutine which also uses "i" as a loop
    counter, you might return from the subroutine with an altered value of
    "i".  In a large program, this type of error can be very difficult to
    find.  In a structured language, the loop counter declared in the
    subroutine would be distinct from that declared in the main program,
    avoiding the side effects.
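
        The loop counter problem can be reproduced in Python.  In this
    sketch, global_style keeps every variable in one shared table, as
    traditional Basic does, while local_style gives each function its own
    i.  Both functions are hypothetical illustrations.

```python
def global_style():
    """Every variable lives in one shared table, as in traditional Basic."""
    g = {"i": 0}
    visited = []

    def subroutine():
        g["i"] = 0
        while g["i"] < 5:            # the subroutine's counter IS the caller's counter
            g["i"] += 1

    g["i"] = 1
    while g["i"] <= 3:
        visited.append(g["i"])
        subroutine()                 # returns with i = 5, so the outer loop ends early
        g["i"] += 1
    return visited


def local_style():
    """Each function has its own i, like local variables in a structured language."""
    visited = []

    def subroutine():
        i = 0                        # local to subroutine; the caller's i is untouched
        while i < 5:
            i += 1

    for i in range(1, 4):
        visited.append(i)
        subroutine()
    return visited


if __name__ == "__main__":
    print(global_style(), local_style())   # [1] [1, 2, 3]
```

    global_style() returns [1]: the subroutine leaves i at 5, so the outer
    loop quits after its first pass.  local_style() returns [1, 2, 3], as
    intended.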

        Reusable procedures in a structured programming language can be saved
    in a library file and brought into the source file using an include
    directive to the compiler.  Several features of a structured programming
    language make this possible, including the absence of line numbers and
    the use of local variables.  Another feature that helps make this
    possible is parameter passing.

        Arguments are passed to a subprogram when it is called.  There are
    two basic ways to pass the arguments: by value or by address.  When a
    parameter is passed by value, a copy of the variable is given to the
    subprogram, which can use it without affecting the actual variable.  In
    some cases, when an effect on the actual variable is desired, or the
    variable is an array or other large data structure too big to copy
    efficiently, the address of the actual variable is passed as a
    parameter.  This allows the subprogram to access and modify the actual
    variable.
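
        Python itself always passes a reference to an object (sometimes
    called call by sharing), so in this hypothetical sketch passing by value
    is imitated by handing the subprogram an explicit copy.  The copy
    protects the caller's list, but for a large array it costs time and
    memory, which is exactly why large structures are passed by address.

```python
def double_all(values):
    """Double every element of the list it is given, in place."""
    for k in range(len(values)):
        values[k] *= 2


def pass_copy():
    data = [1, 2, 3]
    double_all(list(data))   # like passing by value: the subprogram gets a copy
    return data              # unchanged


def pass_reference():
    data = [1, 2, 3]
    double_all(data)         # like passing by address: the subprogram gets the list itself
    return data              # modified by the subprogram


if __name__ == "__main__":
    print(pass_copy(), pass_reference())   # [1, 2, 3] [2, 4, 6]
```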

        The five basic control structures, or variations of them, are found
    in all structured programming languages.  Some languages have additional
    control structures, but the basic five are all that is required for
    structured programming.  Note the absence of the GOTO control structure.
    Most languages keep the GOTO because it is useful for exiting from
    nested loops and other limited applications.  Some languages have added
    control structures such as BREAK and EXIT so that the GOTO is the least
    desirable means of control in every case.  Still, the controversy
    continues.
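
        Exiting from nested loops without a GOTO can be sketched in Python,
    which has BREAK but no GOTO.  BREAK leaves only the innermost loop, so
    one version below needs a flag test in the outer loop; the other simply
    returns from the function, which leaves both loops at once.  The
    function names and the sample grid are hypothetical.

```python
GRID = [[1, 2], [3, 4], [5, 4]]


def find_first(grid, target):
    """Return (row, column) of the first cell equal to target, or None.

    Returning from the function leaves both loops at once.
    """
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == target:
                return (r, c)
    return None


def find_first_with_flag(grid, target):
    """Same search using BREAK, which exits only the innermost loop."""
    found = None
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == target:
                found = (r, c)
                break                 # leaves the inner loop only...
        if found is not None:
            break                     # ...so the outer loop needs its own test
    return found


if __name__ == "__main__":
    print(find_first(GRID, 4), find_first_with_flag(GRID, 4))   # (1, 1) (1, 1)
```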

        Many implementations of structured programming languages are
    available for the Atari ST.  Almost all are compilers or pseudo
    compilers.  One structured interpreter is Alice, a Pascal interpreter.
    Compiled versions of Pascal include: Personal Pascal, TDI UCSD Pascal,
    Philon Pascal, Pecan UCSD Pascal, Metacomco Pascal and Prospero Pro
    Pascal.

        There are more C compilers available for the ST than compilers for
    any other language.  The original high-level language for the ST was
    Alcyon C, sold by Atari as part of the developer's kit.  Other C
    compilers for the Atari include: Hippo C, Lattice C, Megamax C, GST C
    and Mark Williams C.  There is also a "shareware" C compiler available.

    MODERN STRUCTURED

        There are two structured languages which can be considered modern
    structured languages.  They were developed only in the last few years
    and are just now becoming available.  They are Ada and Modula 2.  In
    addition to the features of a traditional structured language, they
    include such features as modular compilation and multitasking.  Ada was
    developed by a committee appointed by the Department of Defense.
    Modula 2 was designed by Niklaus Wirth, the author of Pascal.  Both
    languages have the same basic objectives, and both have the same basic
    features.  Because Ada was designed by a committee, it is complex and
    has virtually every feature that anyone on the committee desired.
    Compilers for Ada are huge and not generally available for
    microcomputers.  (One is available for MS-DOS; however, minimum
    recommended system requirements include an AT-class machine, a hard
    disk and megabytes of extended memory.  Compilation still takes
    forever.)  Modula 2 is compact and efficient.  Implementations of
    Modula 2 are available for most computers including TDI Modula 2 for the
    Atari ST.

    THREADED INTERPRETIVE LANGUAGES

        Threaded interpretive languages (TILs) use a different approach to
    programming.  The only popular TIL is Forth.  In Forth, you don't write
    programs, you define words.  TILs consist of primitives, words that have
    been defined as part of the kernel of the language.  New words are
    defined in terms of the primitives and/or previously defined words.
    Definitions are built up until executing a single word is analogous to
    executing a program.  Data in Forth is manipulated on a stack using
    postfix, or reverse Polish, notation.  This notation is unfamiliar to
    most people, but it is more efficient for the computer to evaluate.  In
    fact, most other languages convert infix notation to postfix notation
    internally before execution, although this is hidden from the
    programmer.
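
        The flavor of Forth can be imitated in a short Python sketch.  It is
    not real Forth: the word set, the names run_forth and infix_to_postfix,
    and the rule that anything not in the dictionary is a number are all
    assumptions made for illustration.  A colon definition stores a new
    word in terms of existing ones, and all data passes through a stack.
    The second half uses the well-known shunting-yard method to convert
    ordinary infix notation into postfix, the kind of conversion described
    above.

```python
import operator

# A hypothetical Forth-like toy: the stack is a Python list, the top is the last item.


def binary(op):
    """Make a primitive word that pops two numbers and pushes op(second, top)."""
    def word(stack):
        top = stack.pop()
        second = stack.pop()
        stack.append(op(second, top))
    return word


# The "kernel": primitive words defined as part of the language.
PRIMITIVES = {
    "+": binary(operator.add),
    "-": binary(operator.sub),
    "*": binary(operator.mul),
    "dup": lambda stack: stack.append(stack[-1]),
}


def execute_word(token, words, stack):
    entry = words.get(token)
    if entry is None:
        stack.append(int(token))           # not a known word: treat it as a number
    elif callable(entry):
        entry(stack)                       # a primitive
    else:
        for inner in entry:                # a defined word runs the words in its body
            execute_word(inner, words, stack)


def run_forth(text):
    """Evaluate a line of words; ': name body ;' defines a new word."""
    words = dict(PRIMITIVES)
    stack = []
    tokens = text.split()
    i = 0
    while i < len(tokens):
        if tokens[i] == ":":
            end = tokens.index(";", i)
            words[tokens[i + 1]] = tokens[i + 2:end]
            i = end + 1
        else:
            execute_word(tokens[i], words, stack)
            i += 1
    return stack


PRECEDENCE = {"+": 1, "-": 1, "*": 2}


def infix_to_postfix(tokens):
    """Shunting-yard conversion for left-associative + - * and parentheses."""
    output, ops = [], []
    for t in tokens:
        if t in PRECEDENCE:
            while ops and ops[-1] in PRECEDENCE and PRECEDENCE[ops[-1]] >= PRECEDENCE[t]:
                output.append(ops.pop())
            ops.append(t)
        elif t == "(":
            ops.append(t)
        elif t == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()                      # discard the "("
        else:
            output.append(t)               # operands go straight to the output
    while ops:
        output.append(ops.pop())
    return output


if __name__ == "__main__":
    print(run_forth(": square dup * ; 3 square 4 square +"))  # [25]
    postfix = infix_to_postfix("( 2 + 3 ) * 4".split())
    print(postfix)                                             # ['2', '3', '+', '4', '*']
    print(run_forth(" ".join(postfix)))                        # [20]
```

    Here ": square dup * ; 3 square 4 square +" leaves 25 on the stack, and
    ( 2 + 3 ) * 4 becomes 2 3 + 4 *, which evaluates to 20.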

        Programming in Forth requires a different way of looking at a problem
    than other languages.  Forth keeps the programmer closer to the hardware
    than other high level languages.  Forth is fast and powerful.  These
    features reflect the intentions of the author of Forth, Charles Moore,
    when he wrote the original Forth to control the operation of telescopes
    in an observatory.  Forth programmers are dedicated to this way of
    thinking.  The Forth Interest Group has developed and placed in the
    public domain a version of Forth known as FIG-Forth.  Implementations of
    this language are available in the public domain for virtually every
    small computer, including the Atari ST.  Commercial implementations of
    Forth for the Atari include: 4XForth, Mach 1 Forth, H&D Forth and Abacus
    Forth/MT.

    SYMBOLIC LANGUAGES

        It can be seen that the purpose of computer languages is to hide the
    details of computers from the programmer.  Assembly languages use
    mnemonics to hide the ones and zeros of machine language.  High-level
    languages hide the details of machine language.  Modern structured
    languages use modular compilation to hide the details of procedures from
    the main program.  Symbolic and declarative languages hide even more of
    the details from the programmer.  The ultimate programming language is
    known as natural language programming.  The combination of natural
    language programming with voice recognition and speech synthesis hardware
    may some day make possible a computer like HAL of 2001 fame.

        Symbolic languages tend to hide the details of the data from the
    programmer.  While program control is similar to more conventional
    languages, data structures are different.  Data is represented by
    symbols which are actually pointers to the data.  Because many
    operations can be performed without regard to the data type, the
    programmer doesn't have to consider it.  Symbols are allocated and bound
    dynamically.  This means that arrays or lists don't have to be
    dimensioned and can be composed of many different data types.
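
        A tiny Lisp-style evaluator, sketched in Python, shows symbols being
    bound at run time and lists holding mixed data without being
    dimensioned.  It is a hypothetical toy, not XLisp or any real Lisp:
    symbols are plain strings, only a handful of built-ins are provided, and
    there are no user-defined functions.

```python
import operator

# Hypothetical toy Lisp: lists are Python lists, symbols are Python strings.


def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Turn tokens into nested lists; integers stay numbers, anything else is a symbol."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)                    # drop the ")"
        return items
    try:
        return int(token)
    except ValueError:
        return token


BUILTINS = {
    "+": operator.add,
    "*": operator.mul,
    "list": lambda *items: list(items),
    "car": lambda items: items[0],       # first element
    "cdr": lambda items: items[1:],      # everything after the first element
}


def evaluate(expr, env):
    if isinstance(expr, str):            # a symbol: look up its current binding
        return env[expr]
    if not isinstance(expr, list):       # a number evaluates to itself
        return expr
    if expr[0] == "quote":               # (quote x) returns x unevaluated
        return expr[1]
    if expr[0] == "define":              # (define sym value) binds sym at run time
        env[expr[1]] = evaluate(expr[2], env)
        return expr[1]
    function = evaluate(expr[0], env)
    return function(*[evaluate(arg, env) for arg in expr[1:]])


def lisp(text):
    """Evaluate every expression in text; return the value of the last one."""
    env = dict(BUILTINS)
    tokens = tokenize(text)
    result = None
    while tokens:
        result = evaluate(read(tokens), env)
    return result


if __name__ == "__main__":
    print(lisp("(+ 1 (* 2 3))"))                                      # 7
    print(lisp("(list 1 (quote apple) (list 2 3))"))                  # [1, 'apple', [2, 3]]
    print(lisp("(define x (list 1 (quote apple) 2)) (car (cdr x))"))  # apple
```

    For example, (+ 1 (* 2 3)) evaluates to 7, and after
    (define x (list 1 (quote apple) 2)), the expression (car (cdr x))
    yields the symbol apple.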

        Lisp (LISt Processing), the first symbolic language, is almost as old
    as Fortran.  Although Fortran is almost the same today as it was 20 years
    ago, Lisp has evolved significantly.  XLisp is a public domain version of
    Lisp written in C and available for the ST.  Metacomco Cambridge Lisp is
    also available.  Logo is a simplified language derived from Lisp.
    Digital Research Logo is provided with the Atari ST along with Basic.

    DECLARATIVE LANGUAGES

        Declarative languages have developed as a further effort to hide the
    details.  Prolog (PROgramming LOGic) is the most common declarative
    language.  A declarative language attempts to hide the details of the
    program structure from the programmer.  In essence, the programmer
    describes the relations between the objects or data in symbolic logic and
    then asks the program to solve a problem or answer a query.  This is the
    distinction between declarative languages, in which you declare the
    problem, and imperative languages, in which you describe each step in
    the solution of a problem.  Personal Prolog and a public domain version of
    Prolog are available for the Atari ST.  Included with the XLisp
    interpreter is a version of Prolog written in Lisp.
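
        The declarative style can be suggested with a Python sketch of
    Prolog-style query answering.  It is a simplified, hypothetical toy,
    not Prolog: facts are tuples, names beginning with a capital letter are
    variables (as in Prolog), and the search backtracks through the facts
    for each goal.  There is no rule database, so the grandparent relation
    is written out as its two parent goals.

```python
# Hypothetical toy in the spirit of Prolog: facts are tuples, and names that
# begin with a capital letter are variables, as in Prolog.
FACTS = [
    ("parent", "tom", "bob"),
    ("parent", "bob", "ann"),
    ("parent", "bob", "pat"),
]


def is_var(term):
    return term[:1].isupper()


def match(goal, fact, bindings):
    """Match one goal against one fact; return the extended bindings or None."""
    if len(goal) != len(fact):
        return None
    new = dict(bindings)
    for g, f in zip(goal, fact):
        if is_var(g):
            if g in new and new[g] != f:
                return None              # variable already bound to something else
            new[g] = f
        elif g != f:
            return None
    return new


def solve(goals, bindings=None):
    """Yield every set of bindings that satisfies all goals, backtracking over facts."""
    bindings = {} if bindings is None else bindings
    if not goals:
        yield bindings
        return
    first, rest = goals[0], goals[1:]
    for fact in FACTS:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, extended)


# "Who are tom's grandchildren?"  The grandparent rule is written out as its body.
GRANDCHILDREN_OF_TOM = [("parent", "tom", "Y"), ("parent", "Y", "Z")]

if __name__ == "__main__":
    print([b["Z"] for b in solve(GRANDCHILDREN_OF_TOM)])   # ['ann', 'pat']
```

    The query returns ann and pat.  The program states only facts and the
    question; the search for answers is left to the solver.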

        Symbolic and declarative languages are almost always interpreted.
    Because of this and because of the dynamic nature of the languages, they
    are slow and require large amounts of memory.  They are, however, the
    languages of choice for artificial intelligence applications and the
    development of expert systems.

    YOUR CHOICE OF LANGUAGE

        There is an enormous variety of programming languages available for
    the Atari ST.  Selection of a language can be difficult.  Factors that
    should be considered include: cost (many are public domain or shareware),
    application (some languages are particularly well suited to specific
    applications), size (large programs need a language or implementation
    that allows modular compilation), and ease of use.  If speed or code
    size is important, consider assembler.  If only speed of development is
    important, use an interpreter.  Most programmers have several languages
    and use the one best suited for the current problem.  Some programmers,
    like me, just collect languages and implementations and enjoy
    programming in each of them for its own elegance.  Happy programming,
    Atarists.