Copyright 1984 by ABComputing           May 15, 1984

╔══════════════════════════════════════════════════╗
║                Introduction to C                 ║
║                                                  ║
║                        by                        ║
║                                                  ║
║                    Ron Watson                    ║
╚══════════════════════════════════════════════════╝

┌──────────────────────────────────┐
│           Introduction           │
└──────────────────────────────────┘

Welcome to the C tutorial. This column will provide instruction in how to use this powerful language. It is aimed at experienced programmers and so will not cover programming principles. This column is essentially a diary of the author's experience while studying the language, using several sources. Frequent reference will be made to "The C Programming Language" by Kernighan and Ritchie (K&R) which is the C "Bible."

┌──────────────────────────────────┐
│        Good and Bad News         │
└──────────────────────────────────┘

C was developed as a tool for systems programmers and, unlike most other languages, was defined by experienced programmers for their own use. This is both good and bad news. The good news is that the language has very few limitations; the bad news is that in the attempt to create a very terse language (systems programmers do not like to type) they evolved some syntax that can become painfully obscure. Like nuclear energy, the results can be very useful or downright destructive, depending on who is using it for what purpose.

Usually, there will be several ways to express the same algorithm, including two or three alternatives that generate precisely the same object code. This freedom of expression can mean great fun for the original programmer and many sleepless nights for the poor slob who has to change it some years later. Because of the immense freedom and power of the language, the responsible programmer will exercise discipline to establish and maintain consistency of style and method or his programs will be too expensive to maintain.

┌──────────────────────────────────┐
│           Portability            │
└──────────────────────────────────┘

One reason for C's recent rise in popularity is its supposed ability to be converted for use on new machines, a feature known as "portability." This belief may be so widespread because those making the judgment are comparing C to assembly language. Another reason may be the generous choice of C compilers available for nearly every cpu/operating system combination on the market.

The C language was intended to be machine-independent and toward that end contains no definition for input/output commands; they are implemented in functions, the detailed operations of which can be changed without affecting the operation of the main program. That was the plan, anyway. In fact, there can be considerable difficulty transporting a C program between two different compilers implemented on the same machine under the same operating system. The methods used by various software companies to implement "standard" compilers are very different. So be warned.

If you want to write programs that are easily moved between machines, choose a compiler that is available on all the machines you want to use; make sure the compiler chosen adheres to the standards defined in K&R. Also, keep all input/output operations in independently defined and compiled functions, especially those that use the display and/or keyboard. With these caveats and a little luck you might be able to "port" your program to the next generation of desktop computers without rewriting 80% of the code.

The examples used in this column have been compiled and tested using the Lattice C compiler. This compiler meets the restrictions mentioned above but is not the only one that does. I will try to indicate any potential portability problems with each example.

┌──────────────────────────────────┐
│      The Simplest C Program      │
└──────────────────────────────────┘

We begin our study of C with an example of the simplest possible C program, one that does nothing at all. (The example was originally provided to the Department of Defense by a contractor at a cost of slightly less than 1.3 million dollars.)

    main()
    {
    }

Small as it is, it demonstrates several characteristics of the language that must be understood. The word "main" is the name of the function at which execution begins, and every complete C program must contain one and only one of these; the language is case-sensitive, so "main" must be in lower case. The "()" can contain the definition for a list of parameters passed by the operating system and must appear even if the list is null. The left and right braces "{ }" mark the beginning and end of a block of code. Spaces, tabs, carriage returns, and line feeds, collectively known as "white space," are ignored between the elements of the program, so the example could be written as:

    main(){}

┌──────────────────────────────────┐
│      Introduction to Printf      │
└──────────────────────────────────┘

A more complicated example - one that actually does something - is presented next. This example adds one line of code to the first example, a function call, to print "hello, world" on the standard output unit. (This is the first example given in the K&R tutorial.)

    main()
    {
        printf("hello, world\n");
    }

The additional line introduces several new features of the language. Notice the semicolon at the end of the line; this character is used to mark the end of each statement. Of particular importance is the method of calling functions. There is no verb such as "call" or "perform" or "gosub" used to invoke a subroutine or external procedure. Instead, merely write the name of the routine, followed by left and right parentheses enclosing any values to be passed to it. The routine being used, "printf," is a standard output routine provided with every C compiler.
In the form shown, it will display the string "hello, world", followed by a "new line" on the standard output device. This example demonstrates one of the design concepts of C, namely, the omission from the language of any input/output keywords; all such operations are accomplished with function calls such as printf.

┌──────────────────────────────────┐
│       The Escape Character       │
└──────────────────────────────────┘

The "\" is called the "escape character," and is used in various circumstances to facilitate the entry of special characters into a program. We will discuss the "\n" sequence later, as it is potentially troublesome when moving between machines. For now, it is sufficient to say that it will yield a carriage-return, line-feed sequence on the PC. If it were omitted from the example, the cursor would remain after the "d" in "hello, world" instead of going to the first column of the following line.

Observe the use of the double-quote character to delimit a string. Although this example does exactly what it appears to do, the code generated is somewhat different from what an intuitive examination might suggest. It is not important at this point in the discussion; just bear in mind that what you see is not always what you get.

┌──────────────────────────────────┐
│        Test Your Compiler        │
└──────────────────────────────────┘

If you are reading this with the intention of learning how to program in C, I suggest that you stop at this point and run the above example through your compiler. Trivial as it appears, it gives you the opportunity to be sure you know how to use your compiler and to check that it is installed correctly, that the output from your text editor is compatible with it, and that you really do understand the principles demonstrated. If your compiler is not capable of producing the expected results from the example, as shown, you should find another compiler.

If you are using DOS 2.0, try to re-direct the output from the example to the printer or a disk file. While not absolutely critical, re-direction can be a very useful debugging aid.

By way of experiment, you might create a few deliberate errors to see how your compiler reacts. Leave off the terminal semicolon or one of the quotation marks; throw in an extra brace, or spell "main" in capital letters.

To get a better understanding of the "printf" function, restate the program as:

    main()
    {
        printf("hello");
        printf(", world");
        printf("\n");
    }

This should give the same results as the previous example. Now that you understand the basic structure of a C program, we can proceed to discuss data representation and variables.

┌──────────────────────────────────┐
│       Data Representation        │
└──────────────────────────────────┘

The language requires that every variable used must first be declared, and most compilers require the declarations to appear in the source program before the first executable statement. The permissible types will vary somewhat between compilers, but the types "char" and "int" are always available. The adjectives long, short, and unsigned can usually be applied to int to modify its definition. The types float and double are also available, but not with every compiler.

The language definition allows the length of each data type to vary, depending on the machine being used. In 8086/8088 implementations: a "char" is 8 bits or 1 byte, "int" is 16 bits or 2 bytes, "float," when available, is 32 bits, and "double" is 64 bits. A long int is 32 bits and a long float should be the same as a double.

┌──────────────────────────────────┐
│        Integer Variables         │
└──────────────────────────────────┘

The following program demonstrates how an integer variable is declared and printed:

    main()
    {
        int izero, imin, imax;
        izero = 0;
        imin = -32768;
        imax = 32767;
        printf("Integers = %d %d %d\n",izero,imin,imax);
    }

There are several new concepts here.
Study the manner in which variables are declared in the first line. The keyword "int" is followed by the names of variables to be defined, separated by commas. This line is terminated with a semicolon. The statement on this line could span three or four lines, with white space placed anywhere desired.

The next three lines are used to assign values to the variables. The "=" is the assignment operator, which requests that the expression on its right side be evaluated and the result stored in the variable named on the left.

The "%d", in the string passed to printf, is a format code that tells printf how to display the variables izero, imin, and imax. This example should begin to give you a hint as to how powerful the printf function can be. The first parameter, the quoted string, contains data to be printed along with control information used to format variables. It works somewhat like a Fortran or PL/I format statement, though simpler. The "%" is used to indicate the presence of a format description. The format string is followed by a list of variable names to be edited into the output.

The example above shows the format string with three "%d"s and three variable names; in general, there should be as many "%" codes in the string as there are variable names following it. The "%" codes and the variable names are in one-to-one correspondence. The "d" format code will cause an integer to be edited into the output in the position occupied by the "%d". The first "%d" is associated with the variable "izero," the second "%d" with "imin," and the last "%d" with "imax." More examples of the printf statement will be given later.

┌──────────────────────────────────┐
│       Character Variables        │
└──────────────────────────────────┘

Let's expand on the last program to demonstrate the use of "char," or character variables:

    main()
    {
        int izero, imin, imax;
        char cha, chb, chc;
        izero = 0;
        imin = -32768;
        imax = 32767;
        cha = 'a';
        chb = 'b';
        chc = 'c';
        printf("Integers = %d %d %d\n",izero,imin,imax);
        printf("Characters = %c %c %c",cha,chb,chc);
    }

The most important thing to note in this example is that a char type variable can contain only one character; there is no way to define a character string variable except as an array of single characters. Notice that whereas the double quote (") has been used to define a string for the printf function, the single quote ('), or apostrophe, is used to assign values to the character variables. The distinction is important. The double quote denotes a character string terminated by a binary zero; the single quote can only denote a single character. You can request a character string constant by using double quotes, but a character string variable must, as noted, be built as an array of one-character elements.

This turns out not to be as silly as it sounds, but it takes some getting used to, particularly if you are accustomed to PL/I or Basic. Unlike those languages, C does not maintain information about string length but rather defines all strings as having a variable length terminated by a binary zero. This gives the programmer considerable control over more complicated string manipulations at the expense of complicating the simpler operations. As we will see, there is no direct string assignment operator; the operation must be performed by a function which is provided by any decent compiler as part of its library.

At this point, recompile this program to be certain you understand the principles involved. After it compiles error-free, purposely introduce errors, such as omitting a "%d" or "%c" from the printf format string.
For more exciting results, insert an extra format code in the string, so there are more "%d" codes than variable names following the format string.

┌──────────────────────────────────┐
│            Conclusion            │
└──────────────────────────────────┘

My next column will discuss program control and looping. This will provide the additional tools we need to study string processing in C.

┌──────────────────────────────────┐
│     File Name: ██ c1.txt ██      │
└──────────────────────────────────┘