CURRENT NOTES MARCH 1988

COMPUTER LANGUAGES FOR THE ST
Which One Is Right For You?
by John H. Marable, III

There are many programming languages available for the Atari ST series of computers. Whether you are a first-time programmer, have learned BASIC and are ready for something else, or are a seasoned veteran, choosing a programming language is a difficult but important decision. Each language has its advantages and disadvantages. For any given programming problem, the difficulty of the solution depends greatly on the language used to implement it.

This is not an attempt to review all of the programming languages available for the Atari ST. It is a description of the types of programming languages available, and some of the advantages and disadvantages of each.

First, some definitions are required. This discussion is primarily directed at high-level languages, such as Basic or Pascal, as opposed to low-level languages such as 68000 assembler. High-level languages, in general, are easier to program and are more transportable. (A program written on one machine may be compiled and executed on another with little or no modification.) Assembly languages are difficult to use and are not transportable. There are several assemblers for the Motorola 68000 processor. Most use the standard Motorola assembly language mnemonics. Assembly code written for the 68000 family cannot be used on any other processor. Assembly language can produce the smallest and fastest executable program possible.

LANGUAGE IMPLEMENTATION

There are three basic ways that a high-level language may be implemented. It may be interpreted, pseudo-compiled, or compiled into native code. These are features of the implementation and not of the language itself. In fact, some languages are available in more than one form. BASIC is available as both an interpreted (Atari Basic) and a compiled (LDW Basic) language.
GFA Basic is available in both interpreted and compiled versions.

Interpreted languages are the most common and the most familiar. With an interpreted language, the language interpreter must be loaded first; then the source code program (the program as typed into the computer by the programmer) is loaded and run. As the program runs, the interpreter examines each line of the source code, works out which machine language routines it calls for, and executes them. If a line is executed more than once, it must be interpreted again each time. In most cases the computer spends more time interpreting statements than executing them. Even a remark statement must be interpreted to determine that it is a remark requiring no further processing. If an error occurs during execution, in most cases interpretation stops, an error message or code is displayed, and the source code is listed with the line where the error occurred marked. Most interpreted languages have an integral text editor for entering, displaying and correcting the program source code.

Pseudo compilers generate a pseudo code which executes machine language procedures contained in a runtime library. This runtime library is loaded with the pseudo code. Sometimes the runtime library is a separate file that the compiled program loads at run time. This keeps the compiled files small, but the runtime library must be on every disk that contains compiled programs. Pseudo code runs relatively fast compared to an interpreter, but even a very small program might be very large after compilation because of the overhead of the runtime library. Some pseudo compilers will incrementally compile code from a source file to provide some of the advantages of an interpreter. A pseudo compiler is easier to write than a native code compiler. Once it is written, it can be easily ported to another, completely different machine.
This portability has resulted in the popularity of the UCSD Pascal P-System. Versions of it are available for the Atari ST series.

Native code compilers produce the fastest and smallest executable code possible for a high-level language. The compiler generates directly executable machine language code from the source code. Usually, this code is linked with machine language procedures from a library. This library might be similar to the runtime library of the pseudo compilers; however, only the procedures that are needed are linked into the executable program. This allows the use of larger libraries and results in smaller programs. Some compilers also place short procedures in line with the native code rather than calling them as subroutines, which makes program execution faster still.

COMPILERS VS INTERPRETERS

In general, interpreters make programs easy to write and debug, but the programs are slow in execution and require that the interpreter be loaded to run them. Compilers produce fast code, but writing and debugging the code is time consuming and tedious. Source code is written using a text editor, then compiled with the compiler (requiring as many as four passes through the source code). If the compiler runs with no errors detected, then the program must be linked to create an executable file (another one or two passes). If errors are detected in the compile or link phase, the editor is reloaded, the source code is corrected, and the process is repeated until no more fatal errors are detected. Then the program can be run, but wait. Now the run-time errors must be debugged.

With an interpreter, run-time errors usually stop program execution and return to the program editor, where the error is indicated. The programmer can examine the values of variables, correct the error and run the program again until all of the run-time errors are corrected.

With a compiler, a run-time error is more difficult to locate.
Often run-time errors in compiled programs simply cause the system to display bombs and crash, leaving the programmer no indication of where the error occurred and little indication of what it was. Utility programs such as a symbolic debugger can assist the programmer, or the programmer can add debugging code that traces the execution of the program to locate the source of the crash, but the process is slow and difficult.

The ideal programming environment might be a combination of several things. First, a syntax-checking text editor is a real time saver. Syntax errors, such as unmatched quotation marks, are detected by the text editor, where they can be corrected as they occur. An interpreter is then used to detect and correct errors that the text editor can't catch, such as a begin without an end or a gosub without a return. The interpreter is also used to debug any run-time errors. When the program is completely debugged, a compiler which is completely compatible with the interpreter is used to compile the source code.

LANGUAGE CATEGORIES

A high-level language can be classified in one of several categories:

* Unstructured languages (Basic, Fortran, Cobol, etc.) were the first high-level languages. They remain the most popular today.
* Structured languages (Pascal, C, Modula-2, Ada, etc.) are the languages used by most professional programmers developing new application software such as word processors, spreadsheets and databases.
* Threaded interpretive languages (Forth, etc.) are relatively difficult to use, but are particularly useful for real-time applications such as robotics. Forth programmers are a small but loyal group.
* Symbolic languages (Lisp, Logo, etc.) are the languages of artificial intelligence applications such as expert systems, although declarative languages (Prolog, etc.) are gaining popularity in this area.
UNSTRUCTURED LANGUAGES

Unstructured languages were the first languages to become popular on mainframe computers, in the 1960s. Unstructured languages are usually characterized by numbered lines, although modern implementations are getting away from that.

Fortran (FORmula TRANslation) has been popular with scientists and engineers. Its continued popularity is due to the availability of "number crunching" routines written over the years and ported from machine to machine. There are several Fortran compilers available for the ST, including AC Fortran, Prospero Pro Fortran and Philon Fortran.

Basic (Beginner's All-purpose Symbolic Instruction Code), though largely spurned by professional programmers, is still the most popular programming language. This is due primarily to the availability of Basic interpreters for virtually every computer made. Basic was first written at Dartmouth College in the 1960s by Kemeny and Kurtz as an easy-to-use first language for students. It was also one of the first high-level languages implemented on microcomputers, in a version written by Bill Gates of Microsoft fame. Since then, Basic interpreters have been packaged with most microcomputers at purchase. Some computers have Basic interpreters installed in ROM inside the machine (the Atari XE series, for example).

Basic is usually implemented as an interpreted language. There is a wide variety of Basic interpreters available for the Atari ST, including ST Basic (the one bundled with every ST), GFA Basic, Fast Basic, and Real Basic. There are also several compiled versions of Basic available, including GFA (Basic) Compiler, LDW Basic, Softworks Basic, Philon Fast/Basic M and True Basic (by Kemeny and Kurtz, the originators of Basic).

The greatest advantage of Basic is its ease of use. That is not surprising, since that was the original concept. Critics of Basic say that its unstructured nature leads to "spaghetti code", source code that rambles through the program.
Most Basic programmers make indiscriminate use of the "dreaded" GOTO statement. This makes program flow hard to follow without a large number of comments or REM statements. All variables are global; they are available everywhere in the program. This leads to unwanted modification of variable values, called side effects. Modern implementations of Basic attempt to make it more structured. They contain program flow control statements that make GOTO statements unnecessary. They even allow the localization of variables and procedures with parameter passing. Despite all of its so-called faults, most programmers will admit that it is easier to get a small program up and running in Basic than in any other language.

Cobol (COmmon Business Oriented Language) is another popular unstructured language. It remains important today because many businesses still use custom applications written in Cobol. Cobol programmers are still in demand to maintain and update Cobol programs written 20 years ago. As yet, there is no implementation of Cobol for the ST. There are, however, Cobol implementations for the IBM PC and even for 8-bit CP/M machines. These implementations might run on the ST with the help of pc Ditto or the CP/M emulator.

STRUCTURED LANGUAGES

Structured programming languages are characterized by (1) block structure (Begin ... End), (2) absence of line numbers (always), (3) strong data typing (mandatory declaration of variables), (4) limited scope of variables, (5) parameter passing by value or by address, and (6) the five basic control structures: IF-THEN-ELSE, FOR-NEXT, WHILE, REPEAT-UNTIL, and CASE-OF. Most allow the GOTO statement but restrict its range to within the block. These features are worth discussing individually and comparing with those of unstructured languages.

Block structure consists of the use of subprograms, subroutines, functions and procedures.
A block-structured program usually consists of a main program which does little more than call subprograms. Subprograms then call other subprograms or even call themselves (recursive programming). Each subprogram consists of a group of statements and should be functionally distinct. This makes the structure of a program easier to follow. A program block has delimiters that define its start and finish: the "begin" and "end" of Pascal or the terse { and } of C. A block may occur within a program or procedure, such as if...then begin...end else begin...end.

Program line numbers began when the primary means of input to a computer was punch cards. Each card was numbered and held one line of code. The numbers allowed the computer to determine the correct order of execution in case the cards were shuffled. Punch cards were succeeded by teletype terminals. Text editors on this hardware were line oriented, and line numbers were necessary to reference lines for editing or listing. Today's CRT terminals use full-screen editors that reference lines with the cursor. Now, the only reason that line numbers might be required is as targets for goto or gosub statements. This is resolved in structured languages by the use of labels. A label is an identifier used to mark a statement. It might be declared as a label or it might be identified by a trailing colon.

Eliminating line numbers is convenient for the programmer. It is no longer necessary to renumber a program to make room for a few added lines. Commonly used functions or procedures can be copied into a program from a library without the need to renumber.

Data typing makes it easier for a compiler to reserve memory for data. It also helps avoid several types of programming errors. Every variable must be declared, and type declarations are required for complex data structures like arrays and records. A structured language carefully checks the types of arguments to operators and functions.
Some structured languages, Modula-2 and Ada for example, do not even allow a real to be multiplied by an integer variable. Because such operations are sometimes necessary, transfer functions or casts are available to convert data types. This requires the programmer to consider data typing more carefully and explicitly call the necessary transfer functions rather than trusting the implementation to make the decisions for him.

The scope of a variable, in structured programming languages, is generally limited to the procedure in which it is declared. These are known as local variables. Variables which are declared outside of any procedure are global variables, available anywhere in the program. Controlling the scope of variables limits the occurrence of "side effects". If you are in the habit of using "i" as a loop counter and, inside that loop, you call a subroutine which also uses "i" as a loop counter, you might return from the subroutine with an altered value of "i". In a large program, this type of error can be very difficult to find. In a structured language, the loop counter declared in the subroutine would be distinct from the one declared in the main program, avoiding the side effect.

Reusable procedures in a structured programming language can be saved in a library file and brought into the source file using an include directive to the compiler. Several features of a structured programming language make this possible, among them the absence of line numbers and local variables. Another feature that helps make this possible is parameter passing. Arguments are passed to the subprogram when it is called. There are two basic ways to pass the arguments: by value or by address. When a parameter is passed by value, the subprogram is given a copy of the variable, which it can use without affecting the actual variable.
In some cases, when an effect on the actual variable is desired, or the variable is an array or other data structure too big to copy, the address of the actual variable is passed as a parameter. This allows the subprogram to access and modify the actual variable.

The five basic control structures, or variations of them, are found in all structured programming languages. Some languages have additional control structures, but the basic five are all that is required for structured programming. Note the absence of the GOTO. Most languages have the GOTO because it is useful for exiting from nested loops and other limited applications. Some languages have added control structures such as BREAK and EXIT so that the GOTO is never needed, leaving it the least desirable means of control in all cases. Still, the controversy continues.

Many implementations of structured programming languages are available for the Atari ST. Almost all are compilers or pseudo compilers. One structured interpreter is Alice, a Pascal interpreter. Compiled versions of Pascal include Personal Pascal, TDI UCSD Pascal, Philon Pascal, Pecan UCSD Pascal, Metacomco Pascal and Prospero Pro Pascal.

More C compilers are available for the ST than compilers for any other language. The original high-level language for the ST was Alcyon C, sold by Atari as part of the developer's kit. Other C compilers for the Atari include Hippo C, Lattice C, Megamax C, GST C and Mark Williams C. There is also a "shareware" C compiler available.

MODERN STRUCTURED

There are two structured languages which can be considered modern structured languages. They were developed only in the last few years and are just now becoming available. They are Ada and Modula-2. In addition to the features of a traditional structured language, they include such features as modular compilation and multitasking. Ada was developed by a committee appointed by the Department of Defense. Modula-2 was written by Niklaus Wirth, the author of Pascal.
Both languages have the same basic objectives and the same basic features. Because Ada was designed by a committee, it is complex and has virtually every feature that anyone on the committee desired. Compilers for Ada are huge and not generally available for microcomputers. (One is available for MS-DOS; however, the minimum recommended system includes an AT-class machine, a hard disk and megabytes of extended memory. Compilation still takes forever.) Modula-2 is compact and efficient. Implementations of Modula-2 are available for most computers, including TDI Modula-2 for the Atari ST.

THREADED INTERPRETED LANGUAGES

Threaded interpreted languages (TILs) use a different approach to programming. The only popular TIL is Forth. In Forth, you don't write programs; you define words. TILs consist of primitives, words that have been defined as part of the kernel of the language. New words are defined in terms of the primitives and/or previously defined words. Definitions are built up until executing a single word is equivalent to executing a program.

Data in Forth is manipulated on a stack using postfix, or reverse Polish, notation. This notation is unfamiliar to most people, but it is efficient for a computer to evaluate. In fact, most compilers for other languages convert infix expressions to postfix form internally before execution; this is simply hidden from the programmer. Programming in Forth requires a different way of looking at a problem than other languages. Forth keeps the programmer closer to the hardware than other high-level languages. Forth is fast and powerful. These features reflect the intentions of the author of Forth, Charles Moore, when he wrote the original Forth to control the operation of telescopes in an observatory. Forth programmers are dedicated to this way of thinking. The national Forth Interest Group has developed and placed in the public domain a version of Forth known as FIG-Forth.
Implementations of FIG-Forth are available in the public domain for virtually every small computer, including the Atari ST. Commercial implementations of Forth for the Atari include 4XForth, Mach 1 Forth, H&D Forth and Abacus Forth/MT.

SYMBOLIC LANGUAGES

It can be seen that the purpose of computer languages is to hide the details of the computer from the programmer. Assembly language uses mnemonics to hide the ones and zeros of machine language. High-level languages hide the details of machine language. Modern structured languages use modular compilation to hide the details of procedures from the main program. Symbolic and declarative languages hide even more of the details from the programmer. The ultimate goal is natural language programming. The combination of natural language programming with voice recognition and speech synthesis hardware may some day make possible a computer like HAL of 2001 fame.

Symbolic languages tend to hide the details of the data from the programmer. While program control is similar to that of more conventional languages, data structures are different. Data is represented by symbols which are actually pointers to the data. Because many operations can be performed without regard to the data type, the programmer doesn't have to consider it. Symbols are allocated and bound dynamically. This means that arrays or lists don't have to be dimensioned and can be composed of many different data types.

Lisp (LISt Processing), the first symbolic language, is almost as old as Fortran. Although Fortran is almost the same today as it was 20 years ago, Lisp has evolved significantly. XLisp is a public domain version of Lisp written in C and available for the ST. Metacomco Cambridge Lisp is also available. Logo is a somewhat simplified dialect of Lisp. Digital Research Logo is provided with the Atari ST along with Basic.
DECLARATIVE LANGUAGES

Declarative languages have developed as a further effort to hide the details. Prolog (PROgramming in LOGic) is the most common declarative language. A declarative language attempts to hide the details of program structure from the programmer. In essence, the programmer describes the relations between the objects or data in symbolic logic and then asks the program to solve a problem or answer a query. This is the distinction between declarative languages, in which you declare the problem, and imperative languages, in which you describe each step in the solution of a problem. Personal Prolog and a public domain version of Prolog are available for the Atari ST. Included with the XLisp interpreter is a version of Prolog written in Lisp.

Symbolic and declarative languages are almost always interpreted. Because of this, and because of the dynamic nature of the languages, they are slow and require large amounts of memory. They are, however, the languages of choice for artificial intelligence applications and the development of expert systems.

YOUR CHOICE OF LANGUAGE

There is an enormous variety of programming languages available for the Atari ST, and selecting one can be difficult. Factors that should be considered include cost (many are public domain or shareware), application (some languages are particularly well suited to specific applications), size (large programs need a language or implementation that allows modular compilation), and ease of use. If speed or code size is critical, consider assembler. If execution speed is not important, an interpreter may be the easiest choice. Most programmers have several languages and use the one best suited to the current problem. Some programmers, like me, just collect languages and implementations and enjoy programming in each of them for its own elegance.

Happy programming, Atarists!