Day 6
So far, you've learned to read input from the standard input
file, which stores data that is entered from the keyboard. You've also
learned how to write to the standard output file, which sends
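data to your screen. In today's lesson, you'll learn how to open, read from, write to, and close other files, and how to use the file-test operators to find out about a file before you work with it.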
Before you can read from or write to a file, you must first
open the file. This operation asks the operating system to give
your program access to the file. (Opening a file does not, by itself,
prevent other programs from changing it while you are working with
it.) To open a file, call the library function open.
The syntax for the open library function is
open (filevar, filename);
When you call open, you must supply two arguments: filevar, the file variable, and filename, the name of the file to open.
The first argument passed to open is the name that the
Perl interpreter uses to refer to the file. This name is also
known as the file variable (or the file handle).
A file-variable name can be any sequence of letters, digits,
and underscores, as long as the first character is a letter.
The following are legal file-variable names:
filename
MY_NAME
NAME2
A_REALLY_LONG_FILE_VARIABLE_NAME
The following are not legal file-variable names:
1NAME
A.FILE.NAME
_ANOTHERNAME
if
if is not a valid file-variable name because it has
another meaning: as you've seen, it indicates the start of an if
statement. Words such as if that have special meanings in
Perl are known as reserved words and cannot be used as
names.
Tip: It's a good idea to use all uppercase letters for your file-variable names. This makes it easier to distinguish file-variable names from other variable names and from reserved words.
The second item passed to open is the name of the file
you want to open. For example, if you are running Perl on a UNIX
file system, and your current working directory contains a file
named file1 that you would like to open, you can open it
as follows:
open(FILE1, "file1");
This statement tells Perl that you want to open the file file1
and associate it with the file variable FILE1.
If you want to open a file in a different directory, you can
specify the complete pathname, as follows:
open(FILE1, "/u/jqpublic/file1");
This opens the file /u/jqpublic/file1 and associates it
with the file variable FILE1.
Note: If you are running Perl on a file system other than UNIX, use the filename and directory syntax that is appropriate for your system. The Perl interpreter running on that system will be able to figure out where your file is located.
When you open a file, you must decide how you want to access
the file. There are three different file-access modes (or, simply, file
modes) available in Perl:
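read mode Enables the program to read the existing contents of the file
write mode Destroys the existing contents of the file and replaces them with the output supplied by the program
append mode Appends output supplied by the program to the existing contents of the file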
By default, open assumes that a file is to be opened in
read mode. To specify write mode, put a > character in
front of the filename that you pass to open, as follows:
open (OUTFILE, ">/u/jqpublic/outfile");
This opens the file /u/jqpublic/outfile for writing and
associates it with the file variable OUTFILE.
To specify append mode, put two > characters in
front of the filename, as follows:
open (APPENDFILE, ">>/u/jqpublic/appendfile");
This opens the file /u/jqpublic/appendfile in append
mode and associates it with the file variable APPENDFILE.
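The difference between the three modes is easiest to see by using them on the same file. The following is a minimal sketch, not one of this lesson's listings: the file name modes_demo.txt is a made-up scratch file, and the sketch uses the close and unlink functions (close is covered later in this lesson; unlink deletes a file) so that it can clean up after itself.

```perl
#!/usr/local/bin/perl

# modes_demo.txt is a hypothetical scratch file created by this sketch.
$file = "modes_demo.txt";

# Write mode: creates the file, or empties it if it already exists.
open(OUTFILE, ">$file") || die ("cannot open $file for writing\n");
print OUTFILE ("first line\n");
close(OUTFILE);

# Append mode: keeps the existing contents and adds to the end.
open(APPENDFILE, ">>$file") || die ("cannot open $file for appending\n");
print APPENDFILE ("second line\n");
close(APPENDFILE);

# Read mode (the default): read everything back into an array.
open(INFILE, $file) || die ("cannot open $file for reading\n");
@lines = <INFILE>;
close(INFILE);

print (@lines);
unlink($file);    # remove the scratch file
```

If the second open used > instead of >>, the file would end up containing only the second line.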
Note: Here are a few things to remember when opening files:
Before you can use a file opened by the open function,
you should first check whether the open function actually
is giving you access to the file. The open function
enables you to do this by returning a value indicating whether
the file-opening operation succeeded: a nonzero (true) value if the file was opened, and a false value if it was not.
As you can see, the values returned by open correspond
to the values for true and false in conditional expressions. This means
that you can use open in if and unless
statements. The following is an example:
if (open(MYFILE, "/u/jqpublic/myfile")) {
    # here's what to do if the file opened
}
The code inside the if statement is executed only if
the file has been successfully opened. This ensures that your
programs read or write only to files that you can access.
Note: If open returns false, you can find out what went wrong by using the file-test operators, which you'll learn about later today.
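This lesson uses the file-test operators to diagnose a failed open. As a supplementary sketch, Perl also sets the built-in variable $! to the operating system's reason for the most recent failure. The file name below is made up and is assumed not to exist; the exact wording of the reason depends on your system.

```perl
#!/usr/local/bin/perl

# no_such_file_demo.txt is a hypothetical name, assumed not to exist.
if (open(MYFILE, "no_such_file_demo.txt")) {
    $opened = 1;
    print ("The file opened.\n");
} else {
    $opened = 0;
    # $! holds the system's explanation for the failed open
    $reason = "$!";
    print STDERR ("cannot open file: $reason\n");
}
```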
Once you have opened a file and determined that the file is
available for use, you can read information from it.
To read from a file, enclose the file variable associated with
the file in angle brackets (< and >), as follows:
$line = <MYFILE>;
This statement reads a line of input from the file specified
by the file variable MYFILE and stores the line of input
in the scalar variable $line.
Listing 6.1 is a simple program that reads input from a file
and writes it to the standard output file.
Listing 6.1. A program that reads
lines from a file and prints them.
1: #!/usr/local/bin/perl
2:
3: if (open(MYFILE, "file1")) {
4:     $line = <MYFILE>;
5:     while ($line ne "") {
6:         print ($line);
7:         $line = <MYFILE>;
8:     }
9: }
$ program6_1
Here is a line of input.
Here is another line of input.
Here is the last line of input.
$
Line 3 opens the file file1 in read mode, which means
that the file is to be made available for reading. file1
is assumed to be in the current working directory. The file
variable MYFILE is associated with the file file1.
If the call to open returns a nonzero value, the
conditional expression
open(MYFILE, "file1")
is assumed to be true, and the code inside the if
statement is executed.
Lines 4-8 print the contents of file1. The sample
output shown here assumes that file1 contains the following
three lines:
Here is a line of input.
Here is another line of input.
Here is the last line of input.
Line 4 reads the first line of input from the file specified
by the file variable MYFILE, which is file1. This
line of input is stored in the scalar variable $line.
Line 5 tests whether the end of the file specified by MYFILE
has been reached. When there are no more lines left in MYFILE, <MYFILE>
returns the undefined value, which compares equal to the empty string,
so the test fails and the loop ends.
Line 6 prints the text stored in $line, which is the
line of input read from MYFILE.
Line 7 reads the next line of MYFILE, preparing for the
loop to start again.
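The read-and-test pattern in Listing 6.1 can also be folded into the loop condition. The following is a minimal sketch of that alternative, not a replacement for the listing: it builds its own input in a made-up scratch file named loop_demo.txt, and it uses the defined function to test explicitly for the undefined value that marks the end of the file.

```perl
#!/usr/local/bin/perl

# Create a small scratch file so the sketch is self-contained.
# loop_demo.txt is a hypothetical name.
open(OUTFILE, ">loop_demo.txt") || die ("cannot create loop_demo.txt\n");
print OUTFILE ("Here is a line of input.\n");
print OUTFILE ("Here is the last line of input.\n");
close(OUTFILE);

open(MYFILE, "loop_demo.txt") || die ("cannot open loop_demo.txt\n");
$count = 0;
# Read a line and test it in one step; <MYFILE> returns the
# undefined value once the end of the file has been reached.
while (defined($line = <MYFILE>)) {
    print ($line);
    $count++;
}
close(MYFILE);
unlink("loop_demo.txt");    # remove the scratch file
```

Because the read happens in the condition, the statement that reads a line appears only once.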
Now that you have seen how Perl programs read input from files
in read mode, take another look at a statement that reads a line
of input from the standard input file.
$line = <STDIN>;
Here's what is actually happening: The Perl program is
referencing the file variable STDIN, which represents the
standard input file. The < and > on either
side of STDIN tell the Perl interpreter to read a line of
input from the standard input file, just as the < and >
on either side of MYFILE in
$line = <MYFILE>;
tell the Perl interpreter to read a line of input from MYFILE.
STDIN is a file variable that behaves like any other
file variable representing a file in read mode. The only
difference is that STDIN does not need to be opened by the open
function because the Perl interpreter does that for you.
In Listing 6.1, you saw that the return value from open
can be tested to see whether the program actually has access to
the file. The code that operates on the opened file is contained
in an if statement.
If you are writing a large program, you might not want to put
all of the code that affects a file inside an if
statement, because the distance between the beginning of the if
statement and the closing brace (}) could get very large.
For example:
if (open(MYFILE, "file1")) {
    # this could be many pages of statements!
}
Besides, after a while, you'll probably get tired of typing
the spaces or tabs you use to indent the code inside the if
statement. Perl provides a way around this using the library
function die.
The syntax for the die library function is
die (message);
When the Perl interpreter executes the die function,
the program terminates immediately and prints the message passed
to die.
For example, the statement
die ("Stop this now!\n");
prints the following on your screen and terminates the
program:
Stop this now!
Listing 6.2 shows how you can use die to smoothly test
whether a file has been opened correctly.
Listing 6.2. A program that uses die
when testing for a successful file open operation.
1: #!/usr/local/bin/perl
2:
3: unless (open(MYFILE, "file1")) {
4:     die ("cannot open input file file1\n");
5: }
6:
7: # if the program gets this far, the file was
8: # opened successfully
9: $line = <MYFILE>;
10: while ($line ne "") {
11:     print ($line);
12:     $line = <MYFILE>;
13: }
$ program6_2
Here is a line of input.
Here is another line of input.
Here is the last line of input.
$
This program behaves the same way as the one in Listing 6.1,
except that it prints out an error message when it can't open the file.
Line 3 opens the file and tests whether the file opened
successfully. Because this is an unless statement, the
code inside the braces ({ and }) is executed unless
the file opened successfully.
Line 4 is the call to die that is executed if the file
does not open successfully. This statement prints the following
message on the screen and exits:
cannot open input file file1
Because line 4 terminates program execution when the file is
not open, the program can make it past line 5 only if the file
has been opened successfully.
The loop in lines 9-13 is identical to the loop you saw in
Listing 6.1. The only difference is that this loop is no longer
inside an if statement.
Note: Here is another way to write lines 3-5:
open (MYFILE, "file1") || die ("Could not open file");
Recall that the logical OR operator only evaluates the expression on its right if the expression on its left is false. This means that die is called only if open returns false (if the open operation fails).
If you like, you can have die print the name of the
Perl program and the line number of the statement containing the
call to die. To do this, leave off the trailing newline
character in the character string, as follows:
die ("Missing input file");
If the Perl program containing this statement is called myprog,
and this statement is line 14 of myprog, this call to die
prints the following and exits:
Missing input file at myprog line 14.
Compare this with
die ("Missing input file\n");
which simply prints the following before exiting:
Missing input file
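To compare the two forms of message without ending the program, you can trap each call to die. The following is a minimal sketch that assumes one feature not covered in this lesson: an eval block, which catches the die and leaves the message in the built-in variable $@. The program name and line number in the first message depend on how you save and run the sketch.

```perl
#!/usr/local/bin/perl

# eval traps a die so that the program keeps running; the message
# that die would have printed is left in the built-in variable $@.
eval { die ("Missing input file"); };
$without_newline = $@;

eval { die ("Missing input file\n"); };
$with_newline = $@;

# The first message has " at <program> line <number>." appended.
print ("without newline: $without_newline");
print ("with newline:    $with_newline");
```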
Specifying the program name and line number is useful in two
cases:
Perl enables you to read an entire file into a single array
variable. To do this, assign the file variable to the array
variable, as follows:
@array = <MYFILE>;
This reads the entire file represented by MYFILE into
the array variable @array. Each line of the file becomes
an element of the list that is stored in @array.
Listing 6.3 is a simple program that reads an entire file into
an array.
Listing 6.3. A program that reads an
entire input file into an array.
1: #!/usr/local/bin/perl
2:
3: unless (open(MYFILE, "file1")) {
4:     die ("cannot open input file file1\n");
5: }
6: @input = <MYFILE>;
7: print (@input);
$ program6_3
Here is a line of input.
Here is another line of input.
Here is the last line of input.
$
Lines 3-5 open the file, test whether the file has been opened
successfully, and terminate the program if the file cannot be opened.
Line 6 reads the entire contents of the file represented by MYFILE
into the array variable @input. @input now contains
a list consisting of the following three elements:
("Here is a line of input.\n",
"Here is another line of input.\n",
"Here is the last line of input.\n")
Note that a newline character is included as the last
character of each line.
Line 7 uses the print function to print the entire
file.
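Once the file is in an array, you can treat its lines like any other list. The following is a minimal sketch that counts the lines and picks out the last one; array_demo.txt is a made-up scratch file that the sketch creates and removes itself.

```perl
#!/usr/local/bin/perl

# array_demo.txt is a hypothetical scratch file for this sketch.
open(OUTFILE, ">array_demo.txt") || die ("cannot create array_demo.txt\n");
print OUTFILE ("Here is a line of input.\n");
print OUTFILE ("Here is another line of input.\n");
print OUTFILE ("Here is the last line of input.\n");
close(OUTFILE);

open(MYFILE, "array_demo.txt") || die ("cannot open array_demo.txt\n");
@input = <MYFILE>;      # every line, newline included, becomes one element
close(MYFILE);

$count = @input;        # an array assigned to a scalar yields its length
print ("The file contains $count lines.\n");
print ("Last line: $input[$count - 1]");
unlink("array_demo.txt");    # remove the scratch file
```

Keep in mind that this approach holds the whole file in memory at once, so the line-at-a-time loop is the safer choice for very large files.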
After you have opened a file in write or append mode, you can
write to the file you have opened by specifying the file variable with
the print function. For example, if you have opened a file
for writing using the statement
open(OUTFILE, ">outfile");
the following statement:
print OUTFILE ("Here is an output line.\n");
writes the following line to the file specified by OUTFILE,
which is the file called outfile:
Here is an output line.
Listing 6.4 is a simple program that reads from one file and
writes to another.
Listing 6.4. A program that opens two
files and copies one into another.
1: #!/usr/local/bin/perl
2:
3: unless (open(INFILE, "file1")) {
4:     die ("cannot open input file file1\n");
5: }
6: unless (open(OUTFILE, ">outfile")) {
7:     die ("cannot open output file outfile\n");
8: }
9: $line = <INFILE>;
10: while ($line ne "") {
11:     print OUTFILE ($line);
12:     $line = <INFILE>;
13: }
This program writes nothing to the screen because all output
is directed to the file called outfile.
Lines 3-5 open file1 for reading. If the file cannot be
opened, line 4 is executed, which prints the following message on
the screen and terminates the program:
cannot open input file file1
Lines 6-8 open outfile for writing; the > in >outfile
indicates that the file is to be opened in write mode. If outfile
cannot be opened, line 7 prints the message
cannot open output file outfile
on the screen and terminates the program.
The only other line in the program that you have not seen in
other listings in this lesson is line 11, which writes the
contents of the scalar variable $line on the file
specified by OUTFILE.
Once this program has completed, the contents of file1
are copied into outfile.
Here is a line of input.
Here is another line of input.
Here is the last line of input.
Caution: Make sure that files you open in write mode contain nothing valuable. When the open function opens a file in write mode, any existing contents are destroyed.
If you like, your program can reference the standard output
file by referring to the file variable associated with the output
file. This file variable is named STDOUT.
By default, the print statement sends output to the
standard output file, which means that it sends the output to the
file associated with STDOUT. As a consequence, the
following statements are equivalent:
print ("Here is a line of output.\n");
print STDOUT ("Here is a line of output.\n");
Note: You do not need to open STDOUT because Perl automatically opens it for you.
In Perl, you can open as many files as you like, provided you
define a different file variable for each one. (Actually, there
is an upper limit on the number of files you can open, but it's
fairly large and also system-dependent.) For an example of a
program that has multiple files open at one time, take a look at
Listing 6.5. This program merges two files by creating an output
file consisting of one line from the first file, one line from
the second file, another line from the first file, and so on. For example,
if an input file named merge1 contains the lines
a1
a2
a3
and another file, merge2, contains the lines
b1
b2
b3
then the resulting output file consists of
a1
b1
a2
b2
a3
b3
Listing 6.5. A program that merges
two files.
1: #!/usr/local/bin/perl
2:
3: open (INFILE1, "merge1") ||
4:     die ("Cannot open input file merge1\n");
5: open (INFILE2, "merge2") ||
6:     die ("Cannot open input file merge2\n");
7: $line1 = <INFILE1>;
8: $line2 = <INFILE2>;
9: while ($line1 ne "" || $line2 ne "") {
10:     if ($line1 ne "") {
11:         print ($line1);
12:         $line1 = <INFILE1>;
13:     }
14:     if ($line2 ne "") {
15:         print ($line2);
16:         $line2 = <INFILE2>;
17:     }
18: }
$ program6_5
a1
b1
a2
b2
a3
b3
$
Lines 3 and 4 show another way to write a statement that
either opens a file or calls die if the open fails. Recall
that the || operator first evaluates its left operand; if
the left operand evaluates to true (a nonzero value), the right
operand is not evaluated because the result of the expression is
true.
Because of this, the right operand, the call to die, is
evaluated only when the left operand is false--which happens only
when the call to open fails and the file merge1
cannot be opened.
Lines 5 and 6 repeat the preceding process for the file merge2.
Again, either the file is opened successfully or the program aborts
by calling die.
The program then loops repeatedly, reading a line of input
from each file each time. The loop terminates only when both
files have been exhausted. If one file is empty but the other is
not, the program just copies the line from the non-empty file to
the standard output file.
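Note that the output from this program is printed on the
screen. If you decide that you want to send this output to a
file, you can do one of two things: modify the program so that it
opens an output file and writes to it with print, as Listing 6.4
does, or redirect the standard output file to a file when you run
the program. For a discussion of the second method, see the
following section.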
When you run programs on UNIX, you can redirect input and
output using < and >, respectively, as follows:
myprog <input >output
Here, when you run the program called myprog, the input
for the program is taken from the file specified by input
instead of from the keyboard, and the output for the program is
sent to the file specified by output instead of to
the screen.
When you run a Perl program and redirect input using <,
the standard input file variable STDIN now represents the
file specified with <. For example, consider the
following simple program:
#!/usr/local/bin/perl
$line = <STDIN>;
print ($line);
Suppose this program is named myperlprog and is called
with the command
myperlprog <file1
In this case, the statement
$line = <STDIN>;
reads a line of input from file1 because the file
variable STDIN represents file1.
Similarly, specifying > on the command line
redirects the standard output file from the screen to the specified
file. For example, consider this command:
myperlprog <file1 >outfile
It redirects output from the standard output file to the file
called outfile. Now, the following statement writes a line
of data to outfile:
print ($line);
Besides the standard input file and the standard output file,
Perl also defines a third built-in file variable, STDERR,
which represents the standard error file. By default, text sent
to this file is written to the screen. This enables the program
to send messages to the screen even when the standard output file
has been redirected to write to a file. As with STDIN and STDOUT,
you do not need to open STDERR because it automatically is
opened for you.
Listing 6.6 provides a simple example of the use of STDERR.
The output shown in the Input-Output example assumes that the
standard input file and standard output file have been redirected
to files using < and >, as in
myprog <infile >outfile
Therefore, the only output you see is what is written to STDERR.
Listing 6.6. A program that writes to
the standard error file.
1: #!/usr/local/bin/perl
2:
3: open(MYFILE, "file1") ||
4:     die ("Unable to open input file file1\n");
5: print STDERR ("File file1 opened successfully.\n");
6: $line = <MYFILE>;
7: while ($line ne "") {
8:     chop ($line);
9:     print ("\U$line\E\n");
10:     $line = <MYFILE>;
11: }
$ program6_6
File file1 opened successfully.
$
This program converts the contents of a file into uppercase
and sends the converted contents to the standard output file.
Line 3 tries to open file1. If the file cannot be
opened, line 4 is executed. This calls die, which prints the
following message and terminates:
Unable to open input file file1
Note: The function die sends its messages to the standard error file, not the standard output file. This means that when a program terminates, the message printed by die still appears on your screen even when you have redirected the standard output file to a file (unless you have also redirected the standard error file).
If the file is opened successfully, line 5 writes a message to
the standard error file, which indicates that the file has been opened.
As you can see, the standard error file is not reserved solely
for errors. You can write anything you want to STDERR at
any time.
Lines 6-11 read one line of file1 at a time and write
it out in uppercase (using the escape characters \U and \E,
which you learned about on Day 3, "Understanding Scalar
Values").
When you are finished reading from or writing to a file, you
can tell the Perl interpreter that you are finished by calling
the library function close.
The syntax for the close library function is
close (filevar);
close requires one argument: the file variable
representing the file you want to close. Once you have closed the
file, you cannot read from it or write to it without invoking open
again.
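The following is a minimal sketch of what close does to a file variable. The file name close_demo.txt is a made-up scratch file; the sketch writes one line to it, reads the line back, and then shows that a read attempted after close returns the undefined value instead of data.

```perl
#!/usr/local/bin/perl

# close_demo.txt is a hypothetical scratch file for this sketch.
open(OUTFILE, ">close_demo.txt") || die ("cannot create close_demo.txt\n");
print OUTFILE ("Here is a line of output.\n");
close(OUTFILE);         # finish writing before the file is read

open(MYFILE, "close_demo.txt") || die ("cannot open close_demo.txt\n");
$first = <MYFILE>;      # succeeds: the file is open
close(MYFILE);
$after = <MYFILE>;      # the file is closed: nothing can be read

print ("Before close: $first");
if (defined($after)) {
    print ("After close: $after");
} else {
    print ("After close: no input available\n");
}
unlink("close_demo.txt");    # remove the scratch file
```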
Note that you do not have to call close when you are
finished with a file: Perl automatically closes the file when the
program terminates or when you open another file using a
previously defined file variable. For example, consider the
following statements:
open (MYFILE, ">file1");
print MYFILE ("Here is a line of output.\n");
open (MYFILE, ">file2");
print MYFILE ("Here is another line of output.\n");
Here, when file2 is opened for writing, file1
automatically is closed. The file variable MYFILE is now
associated with file2. This means that the second print
statement sends the following to file2:
Here is another line of output.
DO use the <> operator, which is an easy way to read input from several files in succession. See the section titled "Reading from a Sequence of Files," later in this lesson, for more information on the <> operator.
DON'T use the same file variable to represent multiple files unless it is absolutely necessary. It is too easy to lose track of which file variable belongs to which file, especially if your program is large or has many nested conditional statements.
Many of the example programs in today's lesson call open
and test the returned result to see whether the file has been
opened successfully. If open fails, it might be useful to
find out exactly why the file could not be opened. To do this,
use one of the file-test operators.
Listing 6.7 provides an example of the use of a file-test
operator. This program is a slight modification of Listing 6.6,
which is an uppercase conversion program.
Listing 6.7. A program that checks
whether an unopened file actually exists.
1: #!/usr/local/bin/perl
2:
3: unless (open(MYFILE, "file1")) {
4:     if (-e "file1") {
5:         die ("File file1 exists, but cannot be opened.\n");
6:     } else {
7:         die ("File file1 does not exist.\n");
8:     }
9: }
10: $line = <MYFILE>;
11: while ($line ne "") {
12:     chop ($line);
13:     print ("\U$line\E\n");
14:     $line = <MYFILE>;
15: }
$ program6_7
File file1 does not exist.
$
Line 3 attempts to open the file file1 for reading. If file1
cannot be opened, the program executes the if statement
starting in line 4.
Line 4 is an example of a file-test operator. This file-test
operator, -e, tests whether its operand, a file, actually
exists. If the file file1 exists, the expression -e
"file1" returns true, the message File file1 exists,
but cannot be opened. is displayed, and the program exits. If file1
does not exist, -e "file1" is false, and the
library function die prints the following message before
exiting:
File file1 does not exist.
All file-test operators have the same syntax as the -e
operator used in Listing 6.7.
The syntax for the file-test operators is
-x expr
Here, x is an alphabetic character and expr
is any expression. The value of expr is assumed to
be a string that contains the name of the file to be tested.
Because the operand for a file-test operator can be any
expression, you can use scalar variables and string operators in
the expression if you like. For example:
$var = "file1";
if (-e $var) {
    print STDERR ("File file1 exists.\n");
}
if (-e $var . "a") {
    print STDERR ("File file1a exists.\n");
}
In the first use of -e, the contents of $var, file1,
are assumed to be the name of a file, and this file is tested for
existence. In the second case, a is appended to the
contents of file1, producing the string file1a. The -e
operator then tests whether a file named file1a exists.
Note: The Perl interpreter does not get confused by the expression
-e $var . "a"
because the . operator has higher precedence than the -e operator. This means that the string concatenation is performed first.
The file-test operators have higher precedence than the comparison operators but lower precedence than the shift operators. To see a complete list of the Perl operators and their precedences, refer to Day 4, "More Operators."
The string can be a complete path name, if you like. The
following is an example:
if (-e "/u/jqpublic/file1") {
    print ("The file exists.\n");
}
This if statement tests for the existence of the file /u/jqpublic/file1.
Table 6.1 provides a complete list of the file-test operators
available in Perl. In this table, name is a placeholder
for the name of the operand being tested.
Table 6.1. The
file-test operators.
Operator  Description
-b        Is name a block device?
-c        Is name a character device?
-d        Is name a directory?
-e        Does name exist?
-f        Is name an ordinary file?
-g        Does name have its setgid bit set?
-k        Does name have its "sticky bit" set?
-l        Is name a symbolic link?
-o        Is name owned by the user?
-p        Is name a named pipe?
-r        Is name a readable file?
-s        Is name a non-empty file?
-t        Does name represent a terminal?
-u        Does name have its setuid bit set?
-w        Is name a writable file?
-x        Is name an executable file?
-z        Is name an empty file?
-A        How long since name was accessed?
-B        Is name a binary file?
-C        How long since name's inode was changed?
-M        How long since name was modified?
-O        Is name owned by the "real user" only?*
-R        Is name readable by the "real user" only?*
-S        Is name a socket?
-T        Is name a text file?
-W        Is name writable by the "real user" only?*
-X        Is name executable by the "real user" only?*
* In this case, the "real user" is the userid
specified at login, as opposed to the effective user ID, which is
the userid under which you currently are working.
(On some systems, a command such as /user/local/etc/suid
enables you to change your effective user ID.)
The following sections describe some of the more common
file-test operators and show you how they can be useful. (You'll also
learn about more of these operators on Day 12, "Working with
the File System.")
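Before looking at individual operators in detail, here is a minimal sketch that tries three entries from Table 6.1 on a file and on a directory. The name filetest_demo.txt is a made-up scratch file that the sketch creates and removes; "." is the current working directory. The sketch uses the conditional operator (? :) to turn each true or false result into a printable word.

```perl
#!/usr/local/bin/perl

# filetest_demo.txt is a hypothetical scratch file for this sketch.
$file = "filetest_demo.txt";
open(OUTFILE, ">$file") || die ("cannot create $file\n");
print OUTFILE ("some text\n");
close(OUTFILE);

$exists   = (-e $file) ? "yes" : "no";   # does it exist?
$is_plain = (-f $file) ? "yes" : "no";   # is it an ordinary file?
$is_dir   = (-d $file) ? "yes" : "no";   # is it a directory?
$dot_dir  = (-d ".")   ? "yes" : "no";   # is "." a directory?

print ("$file exists: $exists\n");
print ("$file is an ordinary file: $is_plain\n");
print ("$file is a directory: $is_dir\n");
print (". is a directory: $dot_dir\n");

unlink($file);                           # remove the scratch file
$gone = (-e $file) ? "no" : "yes";       # -e is false once the file is deleted
print ("$file is gone: $gone\n");
```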
When a Perl program opens a file for writing, it destroys
anything that already exists in the file. This might not be what
you want. Therefore, you might want to make sure that your
program opens a file only if the file does not already exist.
You can use the -e file-test operator to test whether
or not to open a file for writing. Listing 6.8 is an example of a
program that does this.
Listing 6.8. A program that tests
whether a file exists before opening it for writing.
1: #!/usr/local/bin/perl
2:
3: unless (open(INFILE, "infile")) {
4:     die ("Input file infile cannot be opened.\n");
5: }
6: if (-e "outfile") {
7:     die ("Output file outfile already exists.\n");
8: }
9: unless (open(OUTFILE, ">outfile")) {
10:     die ("Output file outfile cannot be opened.\n");
11: }
12: $line = <INFILE>;
13: while ($line ne "") {
14:     chop ($line);
15:     print OUTFILE ("\U$line\E\n");
16:     $line = <INFILE>;
17: }
$ program6_8
Output file outfile already exists.
$
This program is the uppercase conversion program again; most
of it should be familiar to you.
The only difference is lines 6-8, which use the -e
file-test operator to check whether the output file outfile
exists. If outfile exists, the program aborts, which
ensures that the existing contents of outfile are not
lost.
If outfile does not exist, the following expression
is false:
-e "outfile"
and the program knows that it is safe to open outfile
because it does not already exist.
If you don't need to know exactly why your program is failing,
you can combine all of the tests in Listing 6.8 into a single statement,
as follows:
open(INFILE, "infile") && !(-e "outfile") &&
open(OUTFILE, ">outfile") || die("Cannot open files\n");
Can you see how this works? Here's what is happening: The &&
operator, logical AND, is true only if both of its operands are
true. In this case, the two && operators indicate
that the subexpression up to, but not including, the || is
true only if all three of the following are true:
open(INFILE, "infile") !(-e "outfile") open(OUTFILE, ">outfile")
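All three are true only when the following conditions are met: the input file infile can be opened, the output file outfile does not already exist, and the output file outfile can be opened for writing.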
If any of these subexpressions is false, the entire expression
up to the || is false. This means that the subexpression
after the || (the call to die) is executed, and the
program aborts.
Note that each of the three subexpressions associated with the &&
operators is evaluated in turn. This means that the subexpression
!(-e "outfile")
is evaluated only if
open(INFILE, "infile")
is true, and that the subexpression
open(OUTFILE, ">outfile")
is evaluated only if
!(-e "outfile")
is true. This is exactly the same logic that Listing 6.8 uses.
If any of the subexpressions is false, the Perl interpreter
doesn't evaluate the rest of them because it knows that the final
result of
open(INFILE, "infile") && !(-e "outfile") &&
open(OUTFILE, ">outfile")
is going to be false. Instead, it goes on to evaluate the
subexpression to the right of the ||, which is the call to die.
This program logic is somewhat complicated, and you shouldn't
use it unless you feel really comfortable with it. The if statements
in Listing 6.8 do the same thing and are easier to understand; however,
it's useful to know how complicated statements such as the
following one work because many Perl programmers like to write
code that works in this way:
open(INFILE, "infile") && !(-e "outfile") &&
open(OUTFILE, ">outfile") || die("Cannot open files\n");
In the next few days, you'll see several more examples of code
that exploits how expressions work in Perl. "Perl hackers"--experienced
Perl programmers--often enjoy compressing multiple statements
into shorter ones, and they delight in complexity. Be warned.
Before you can open a file for reading, you must have
permission to read the file. The -r file-test operator
tests whether you have permission to read a file.
Listing 6.9 checks whether the person running the program has
permission to access a particular file.
Listing 6.9. A program that tests for
read permission on a file.
1: #!/usr/local/bin/perl
2:
3: unless (open(MYFILE, "file1")) {
4:     if (!(-e "file1")) {
5:         die ("File file1 does not exist.\n");
6:     } elsif (!(-r "file1")) {
7:         die ("You are not allowed to read file1.\n");
8:     } else {
9:         die ("File1 cannot be opened\n");
10:     }
11: }
$ program6_9
You are not allowed to read file1.
$
Line 3 of this program tries to open file1. If the call
to open fails, the program tries to find out why.
First, line 4 tests whether the file actually exists. If the
file exists, the Perl interpreter executes line 6, which tests
whether the file has the proper read permission. If it does not, die
is called; it then prints the following message and exits:
You are not allowed to read file1.
Note: You do not need to use the -e file-test operator before using the -r file-test operator. If the file does not exist, -r returns false because you can't read a file that isn't there.
The only reason to use both -e and -r is to enable your program to determine exactly what is wrong.
You can use file-test operators to test for other permissions
as well. To check whether you have write permission on a file, use
the -w file-test operator.
if (-w "file1") {
    print STDERR ("I can write to file1.\n");
} else {
    print STDERR ("I can't write to file1.\n");
}
The -x file-test operator checks whether you have
execute permission on the file (in other words, whether the
system thinks this is an executable program, and whether you have
permission to run it if it is), as illustrated here:
if (-x "file1") {
    print STDERR ("I can run file1.\n");
} else {
    print STDERR ("I can't run file1.\n");
}
Note: If you are the system administrator (for example, you are running as user ID root) and have permission to access any file, the -r and -w file-test operators always return true if the file exists. Also, the -x test operator always returns true if the file is an executable program.
The -z file-test operator tests whether a file is
empty. This provides a more refined test for whether or not to
open a file for writing: if the file exists but is empty, no
information is lost if you overwrite the existing file.
Listing 6.10 shows how to use -z.
Listing 6.10. A program that tests
whether the file is empty before opening it for writing.
1: #!/usr/local/bin/perl
2:
3: if (-e "outfile") {
4:     if (!(-w "outfile")) {
5:         die ("Missing write permission for outfile.\n");
6:     }
7:     if (!(-z "outfile")) {
8:         die ("File outfile is non-empty.\n");
9:     }
10: }
11: # at this point, the file is either empty or doesn't exist,
12: # and we have permission to write to it if it exists
$ program6_10
File outfile is non-empty.
$
Line 3 checks whether the file outfile exists using -e.
If it exists, it can only be opened if the program has permission
to write to the file; line 4 checks for this using -w.
Line 7 uses -z to test whether the file is empty.
If it is not, line 8 calls die to terminate program execution.
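Listing 6.10 stops once the tests have passed. As a rough sketch of how a program might continue from there, the following version goes on to open the file for writing. The file name scratch_w.txt and the text written to it are made up for this illustration; they do not come from the listing.

```perl
#!/usr/local/bin/perl

# scratch_w.txt is a hypothetical file name used only for this sketch
$name = "scratch_w.txt";
if (-e $name) {
    if (!(-w $name)) {
        die ("Missing write permission for $name.\n");
    }
    if (!(-z $name)) {
        die ("File $name is non-empty.\n");
    }
}
# the file is either empty or doesn't exist, so nothing can be lost
unless (open (OUTFILE, ">$name")) {
    die ("Can't open $name for writing.\n");
}
print OUTFILE ("Some new contents.\n");
close (OUTFILE);

# the same -z test now fails because the file has contents
if (-z $name) {
    $result = "still empty";
} else {
    $result = "no longer empty";
}
print ("$name is $result.\n");
unlink ($name);     # remove the scratch file again
```

Running the tests and the call to open in the same program keeps the check close to the operation it protects.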
The opposite of -z is the -s file-test
operator, which returns a nonzero value if the file is not empty.
$size = -s "outfile";
if ($size == 0) {
print ("The file is empty.\n");
} else {
print ("The file is $size bytes long.\n");
}
The -s file-test operator actually returns the
size of the file in bytes. It can still be used in conditional expressions,
though, because any nonzero value (indicating that the file is
not empty) is treated as true.
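As a minimal sketch of this point, the following fragment uses -s directly as a condition instead of comparing its result with zero. The file name scratch_s.txt is made up, and the fragment creates the file itself so that it can be run as is.

```perl
#!/usr/local/bin/perl

# scratch_s.txt is a hypothetical file name used only for this sketch
$name = "scratch_s.txt";
unless (open (OUT, ">$name")) {
    die ("Can't create $name\n");
}
print OUT ("hello\n");
close (OUT);

# -s returns the size in bytes, so a non-empty file tests as true
if (-s $name) {
    $state = "non-empty";
} else {
    $state = "empty or missing";
}
print ("$name is $state.\n");
unlink ($name);     # remove the scratch file again
```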
Listing 6.11 uses -s to report the size of a file
whose name is read from the standard input file.
Listing 6.11. A program that prints
the size of a file in bytes.
1: #!/usr/local/bin/perl
2:
3: print ("Enter the name of the file:\n");
4: $filename = <STDIN>;
5: chop ($filename);
6: if (!(-e $filename)) {
7: print ("File $filename does not exist.\n");
8: } else {
9: $size = -s $filename;
10: print ("File $filename contains $size bytes.\n");
11: }
$ program6_11
Enter the name of the file:
file1
File file1 contains 128 bytes.
$
Lines 3-5 obtain the name of the file and remove the trailing
newline character.
Line 6 tests whether the file exists. If the file doesn't
exist, the program indicates this.
Line 9 stores the size of the file in the scalar variable $size.
The size is measured in bytes (one byte is equivalent to one character
in a character string).
Line 10 prints out the number of bytes in the file.
You can use file-test operators on file variables as well as
character strings. In the following example the file-test
operator -z tests the file represented by the file
variable MYFILE:
if (-z MYFILE) {
print ("This file is empty!\n");
}
As before, this file-test operator returns true if the file is
empty and false if it is not.
***Begin Caution***
Caution: Remember that file variables can be used only after you open the file. If you need to test a particular condition before opening the file (such as whether the file is non-empty), test it using the name of the file.
***End Caution***
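The difference between the two forms can be sketched as follows. This is only an illustration: the file name scratch_c.txt is made up, and the fragment creates an empty file so that both tests have something to examine.

```perl
#!/usr/local/bin/perl

# scratch_c.txt is a hypothetical file name used only for this sketch
$name = "scratch_c.txt";
unless (open (OUT, ">$name")) {
    die ("Can't create $name\n");
}
close (OUT);        # the file now exists and is empty

# before opening: test using the name of the file
$empty_by_name = 0;
if (-z $name) {
    $empty_by_name = 1;
}

# after opening: the file variable can be tested as well
unless (open (MYFILE, $name)) {
    die ("Can't open $name\n");
}
$empty_by_variable = 0;
if (-z MYFILE) {
    $empty_by_variable = 1;
}
close (MYFILE);

print ("by name: $empty_by_name, by file variable: $empty_by_variable\n");
unlink ($name);     # remove the scratch file again
```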
Many UNIX utility programs are invoked using the following
command syntax:
programname file1 file2 file3 ...
A program that uses this command syntax operates on all of the
files specified on the command line in order, starting with file1.
When file1 has been processed, the program then
proceeds on to file2, and so on until all of the
files have been exhausted.
In Perl, it's easy to write programs that process an arbitrary
number of files because there is a special operator, the <> operator,
that does all of the file-handling work for you.
To understand how the <> operator works, recall
what happens when you put < and > around a file variable:
$list = <MYFILE>;
This statement reads a line of input from the file represented
by the file variable MYFILE and stores it in the scalar
variable $list. Similarly, the statement
$list = <>;
reads a line of input and stores it in the scalar variable $list;
however, the file it reads from is named on the command line. Suppose,
for example, a program containing a statement using the <>
operator, such as the statement
$list = <>;
is named myprog and is invoked using the command
$ myprog file1 file2 file3
In this case, the first occurrence of the <>
operator reads the first line of input from file1. Successive
occurrences of <> read more lines from file1.
When file1 is exhausted, <> reads the first
line from file2, and so on. When the last file, file3,
is exhausted, <> returns an empty string, which
indicates that all the input has been read.
***Begin Note***
Note: If a program containing a <> operator is called with no command-line arguments, the <> operator reads input from the standard input file. In this case, the <> operator is equivalent to <STDIN>.
If a file named in a command-line argument does not exist, the Perl interpreter writes the following message to the standard error file:
Can't open name: No such file or directory
Here, name is a placeholder for the name of the file that the Perl interpreter cannot find. In this case, the Perl interpreter ignores name and continues on with the next file in the command line.
***End Note***
To see how the <> operator works, look at Listing
6.12, which displays the contents of the files specified on the
command line. (If you are familiar with UNIX, you will recognize
this as the behavior of the UNIX utility cat.) The output
from Listing 6.12 assumes that files file1 and file2 are specified
on the command line and that each file contains one line.
Listing 6.12. A program that displays
the contents of one or more files.
1: #!/usr/local/bin/perl
2:
3: while ($inputline = <>) {
4: print ($inputline);
5: }
$ program6_12 file1 file2
This is a line from file1.
This is a line from file2.
$
Once again, you can see how powerful and useful Perl is. This
entire program consists of only five lines, including the header comment
and a blank line.
Line 3 both reads a line from a file and tests to see whether
the line is the empty string. Because the assignment operator = returns
the value assigned, the expression
$inputline = <>
has the value "" (the null string) if and
only if <> returns the null string, which happens
only when there are no more lines to read from any of the input
files. This is exactly the point at which the program wants to
stop looping. (Recall that a "blank line" in a file is
not the same as the null string because the blank line contains
the newline character.) Because the null string is equivalent to
false in a conditional expression, there is no need to use a
conditional operator such as ne.
When line 3 is executed for the first time, the first line in
the first input file, file1, is read and stored in the
scalar variable $inputline. Because file1
contains only one line, the second pass through the loop, and the
second execution of line 3, reads the first line of the second
input file, file2.
After this, there are no more lines in either file1
or file2, so line 3 assigns the null string to $inputline, which
terminates the loop.
***Begin Caution***
Caution: When it reaches the end of the last file on the command line, the <> operator returns the empty string. However, if you use the <> operator after it has returned the empty string, the Perl interpreter assumes that you want to start reading input from the standard input file. (Recall that <> reads from the standard input file if there are no files on the command line.)
This means that you have to be a little more careful when you use <> than when you are reading using <MYFILE> (where MYFILE is a file variable). If MYFILE has been exhausted, repeated attempts to read using <MYFILE> continue to return the null string because there isn't anything left to read.
***End Caution***
As you have seen, if you read from a file using <STDIN>
or <MYFILE> in an assignment to an array variable,
the Perl interpreter reads the entire contents of the file into
the array, as follows:
@array = <MYFILE>;
This works also with <>. For example, the
statement
@array = <>;
reads the contents of all of the files on the command line
into the array variable @array.
As always, be careful when you use this because you might end
up with a very large array.
As you've seen, the <> operator assumes that its
command-line arguments are files. For example, if you start up
the program shown in Listing 6.12 with the command
$ program6_12 myfile1 myfile2
the Perl interpreter assumes that the command-line arguments myfile1
and myfile2 are files and displays their contents.
Perl enables you to use the command-line arguments any way you
want by defining a special array variable called @ARGV. When
a Perl program starts up, this variable contains a list
consisting of the command-line arguments. For example, the command
$ program6_12 myfile1 myfile2
sets @ARGV to the list
("myfile1", "myfile2")
***Begin Note***
Note: The shell you are running (sh, csh, or whatever you are using) is responsible for turning a command line such as
program6_12 myfile1 myfile2
into arguments. Normally, any spaces or tab characters are assumed to be separators that indicate where one command-line argument stops and the next begins. For example, the following are identical:
program6_12 myfile1 myfile2
program6_12    myfile1    myfile2
In each case, the command-line arguments are myfile1 and myfile2.
See your shell documentation for details on how to put blank spaces or tab characters into your command-line arguments.
***End Note***
As with all other array variables, you can access individual
elements of @ARGV. For example, the statement
$var = $ARGV[0];
assigns the first element of @ARGV to the scalar
variable $var.
You even can assign to some or all of @ARGV if you
like. For example:
$ARGV[0] = 43;
If you assign to any or all of @ARGV, you overwrite
what was already there, which means that any command-line
arguments overwritten are lost.
To determine the number of command-line arguments, assign the
array variable to a scalar variable, as follows:
$numargs = @ARGV;
As with all array variables, using an array variable in a
place where the Perl interpreter expects a scalar variable means
that the length of the array is used. In this case, $numargs
is assigned the number of command-line arguments.
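As a small sketch of how this count might be used, the following fragment reports how many arguments it was given and, if there are any, prints the last one. The messages are made up for the illustration.

```perl
#!/usr/local/bin/perl

# assigning the array to a scalar yields the number of elements
$numargs = @ARGV;
if ($numargs == 0) {
    print ("No command-line arguments were supplied.\n");
} else {
    # subscripts start at 0, so the last element is at $numargs-1
    print ("Number of arguments: $numargs\n");
    print ("Last argument: $ARGV[$numargs-1]\n");
}
```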
***Begin Caution***
Caution: C programmers should take note that the first element of @ARGV, unlike argv[0] in C, does not contain the name of the program. In Perl, the first element of @ARGV is the first command-line argument.
To get the name of the program, use the system variable $0, which is discussed on Day 17, "System Variables."
***End Caution***
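The difference from C can be seen with a short sketch such as the following, which prints the program name from $0 and then each element of @ARGV along with its subscript. The output wording is made up for the illustration.

```perl
#!/usr/local/bin/perl

# $0 holds the program name; @ARGV holds only the arguments
print ("Program name: $0\n");
$i = 0;
while ($i < @ARGV) {
    print ("Argument $i: $ARGV[$i]\n");
    $i++;
}
```

If this program is run with the arguments myfile1 and myfile2, myfile1 is reported as argument 0.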
To see how you can use @ARGV in a program, examine
Listing 6.13. This program assumes that its first argument is a
word to look for. The remaining arguments are assumed to be files
in which to look for the word. The program prints out the searched-for
word, the number of occurrences in each file, and the total
number of occurrences.
This example assumes that the files file1 and file2
are defined and that each file contains the single line
This file contains a single line of input.
This example is then run with the command
$ programname single file1 file2
where programname is a placeholder for the name
of the program. (If you are running the program yourself, you can
name the program anything you like.)
Listing 6.13. A word-search and
counting program.
1: #!/usr/local/bin/perl
2:
3: print ("Word to search for: $ARGV[0]\n");
4: $filecount = 1;
5: $totalwordcount = 0;
6: while ($filecount <= @ARGV-1) {
7: unless (open (INFILE, $ARGV[$filecount])) {
8: die ("Can't open input file $ARGV[$filecount]\n");
9: }
10: $wordcount = 0;
11: while ($line = <INFILE>) {
12: chop ($line);
13: @words = split(/ /, $line);
14: $w = 1;
15: while ($w <= @words) {
16: if ($words[$w-1] eq $ARGV[0]) {
17: $wordcount += 1;
18: }
19: $w++;
20: }
21: }
22: print ("occurrences in file $ARGV[$filecount]: ");
23: print ("$wordcount\n");
24: $filecount++;
25: $totalwordcount += $wordcount;
26: }
27: print ("total number of occurrences: $totalwordcount\n");
$ program6_13 single file1 file2
Word to search for: single
occurrences in file file1: 1
occurrences in file file2: 1
total number of occurrences: 2
$
Line 3 prints the word to search for. The program assumes that
this word is the first argument in the command line and, therefore,
is the first element of the array @ARGV.
Lines 7-9 open a file named on the command line. The first
time line 7 is executed, the variable $filecount
has the value 1, and the file whose name is in $ARGV[1] is
opened. The next time through, $filecount is 2 and
the file named in $ARGV[2] is opened, and so on. If a file
cannot be opened, the program terminates.
Line 11 reads a line from a file. As before, the conditional
expression
$line = <INFILE>
reads a line from the file represented by the file variable INFILE
and assigns it to $line. If no lines remain in the file, $line
is assigned the null string, the conditional expression is false,
and the loop in lines 11-21 is terminated.
Line 13 splits the line into words, and lines 15-20 compare
each word with the search word. If the word matches, the word count
for this file is incremented. This word count is reset when a new
file is opened.
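In Perl, the <> operator actually contains a
hidden reference to the array @ARGV. Here's how it works: the first
time <> is used, the Perl interpreter opens the file whose name is
stored in $ARGV[0]. It then removes that name from @ARGV by calling
shift(@ARGV);
and reads the lines of the opened file one at a time. When that file
is exhausted, the next use of <> repeats the cycle with the new $ARGV[0].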
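The way <> walks through @ARGV can be approximated by hand: open the file named in $ARGV[0], shift @ARGV, read the file, and repeat. The following is only a rough sketch of that idea, not the interpreter's actual implementation. The scratch file names are made up, and the program assigns to @ARGV itself so that it runs without command-line arguments.

```perl
#!/usr/local/bin/perl

# create two hypothetical scratch files so the sketch is self-contained
@names = ("scratch_j1.txt", "scratch_j2.txt");
$n = 0;
while ($n < @names) {
    unless (open (OUT, ">$names[$n]")) {
        die ("Can't create $names[$n]\n");
    }
    print OUT ("This is a line from $names[$n].\n");
    close (OUT);
    $n++;
}

@ARGV = @names;
$linecount = 0;
while (@ARGV > 0) {
    # shift removes the first element of @ARGV and returns it
    $current = shift (@ARGV);
    if (open (INFILE, $current)) {
        while ($line = <INFILE>) {
            print ($line);
            $linecount++;
        }
        close (INFILE);
    } else {
        # like <>, complain and go on to the next file
        print STDERR ("Can't open $current\n");
    }
}
unlink (@names);    # remove the scratch files again
```

When the outer loop ends, @ARGV is empty, which is also the state <> leaves it in after the last file has been opened.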
If you like, you can modify your program to retrieve a value
from the command line and then fix @ARGV so that the <> operator
can work properly. If you modify Listing 6.13 to do this, the result
is Listing 6.14.
Listing 6.14. A word-search and
counting program that uses <>.
1: #!/usr/local/bin/perl
2:
3: $searchword = $ARGV[0];
4: print ("Word to search for: $searchword\n");
5: shift (@ARGV);
6: $totalwordcount = $wordcount = 0;
7: $filename = $ARGV[0];
8: while ($line = <>) {
9: chop ($line);
10: @words = split(/ /, $line);
11: $w = 1;
12: while ($w <= @words) {
13: if ($words[$w-1] eq $searchword) {
14: $wordcount += 1;
15: }
16: $w++;
17: }
18: if (eof) {
19: print ("occurrences in file $filename: ");
20: print ("$wordcount\n");
21: $totalwordcount += $wordcount;
22: $wordcount = 0;
23: $filename = $ARGV[0];
24: }
25: }
26: print ("total number of occurrences: $totalwordcount\n");
$ program6_14 single file1 file2
Word to search for: single
occurrences in file file1: 1
occurrences in file file2: 1
total number of occurrences: 2
$
Line 3 assigns the first command-line argument, the search
word, to the scalar variable $searchword. This is
necessary because the call to shift in line 5 destroys the
initial value of $ARGV[0].
Line 5 adjusts the array @ARGV so that the <>
operator can use it. To do this, it calls the library function shift.
This function "shifts" the elements of the list stored
in @ARGV. The element in $ARGV[1] is moved to $ARGV[0],
the element in $ARGV[2] is moved to $ARGV[1], and
so on. After shift is called, @ARGV contains the
files to be searched, which is exactly what the <> operator
is looking for.
Line 7 assigns the current value of $ARGV[0] to the
scalar variable $filename. Because the <> operator
in line 8 calls shift, the value of $ARGV[0] is
lost unless the program does this.
Line 8 uses the <> operator to open the file
named in $ARGV[0] and to read a line from the file. The
array variable @ARGV is shifted at this point.
Lines 9-17 behave as in Listing 6.13. The only difference is
that the search word is now in $searchword, not in $ARGV[0].
Line 18 introduces the library function eof. This
function indicates whether the program has reached the end of the
file being read by <>. If eof returns true,
the next use of <> opens a new file and shifts @ARGV
again.
Lines 19-23 prepare for the opening of a new file. The number
of occurrences of the search word is printed, the current word count
is added to the total word count, and the word count is reset to 0. Because
the new filename to be opened is in $ARGV[0], line 23
preserves this filename by assigning it to $filename.
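As an alternative to saving $ARGV[0] by hand, Perl also keeps the name of the file that <> is currently reading in the scalar variable $ARGV. The following sketch is not part of Listing 6.14; it counts lines per file and uses eof together with $ARGV to report each count. The scratch file names are made up, and @ARGV is assigned inside the program so that the sketch runs as is.

```perl
#!/usr/local/bin/perl

# two hypothetical scratch files, created so that the sketch runs as is
@names = ("scratch_k1.txt", "scratch_k2.txt");
$n = 0;
while ($n < @names) {
    unless (open (OUT, ">$names[$n]")) {
        die ("Can't create $names[$n]\n");
    }
    print OUT ("first line\nsecond line\n");
    close (OUT);
    $n++;
}

@ARGV = @names;
$report = "";
$linecount = 0;
while ($line = <>) {
    $linecount++;
    if (eof) {
        # the scalar $ARGV holds the name of the file just finished
        $report .= "$ARGV: $linecount lines\n";
        $linecount = 0;
    }
}
print ($report);
unlink (@names);    # remove the scratch files again
```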
***Begin Note***
Note: You can use the <> operator to open and read any file you like by setting the value of @ARGV yourself. For example
@ARGV = ("myfile1", "myfile2");
while ($line = <>) {
...
}
Here, when the statement containing the <> is executed for the first time, the file myfile1 is opened and its first line is read. Subsequent executions of <> each read another line of input from myfile1. When myfile1 is exhausted, myfile2 is opened and read one line at a time.
***End Note***
On machines running the UNIX operating system, two commands
can be linked using a pipe. In this case, the standard
output from the first command is linked, or piped, to the
standard input to the second command.
Perl enables you to establish a pipe that links a Perl output
file to the standard input file of another command. To do this, associate
the file with the command by calling open, as follows:
open (MYPIPE, "| cat >hello");
The | character tells the Perl interpreter to establish
a pipe. When MYPIPE is opened, output sent to MYPIPE
becomes input to the command
cat >hello
Because the cat command copies the standard input file to the
standard output file when called with no arguments, and >hello
redirects the standard output file to the file hello, the open
statement given here has the same effect as the statement
open (MYPIPE, ">hello");
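As a sketch of the pipe in use, the following program sends one line through cat and then reads the resulting file back to confirm what was written. It assumes a UNIX system on which cat is available; the file name hello_m.txt and the text of the line are made up for the illustration.

```perl
#!/usr/local/bin/perl

# hello_m.txt is a hypothetical output file name used only for this sketch
unless (open (MYPIPE, "| cat >hello_m.txt")) {
    die ("Can't start the cat command\n");
}
print MYPIPE ("This line travels through the pipe.\n");
close (MYPIPE);     # closing the pipe waits for the command to finish

# read the file back to see what cat wrote
unless (open (CHECK, "hello_m.txt")) {
    die ("Can't open hello_m.txt\n");
}
$firstline = <CHECK>;
close (CHECK);
print ($firstline);
unlink ("hello_m.txt");     # remove the scratch file again
```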
You can use a pipe to send mail from within a Perl program.
For example:
open (MESSAGE, "| mail dave");
print MESSAGE ("Hi, Dave! Your Perl program sent this!\n");
close (MESSAGE);
The call to open establishes a pipe to the command mail
dave. The file variable MESSAGE is now associated with
this pipe. The call to print adds the line
Hi, Dave! Your Perl program sent this!
to the message to be sent to user ID dave.
The call to close closes the pipe referenced by MESSAGE,
which tells the system that the message is complete and can be sent.
As you can see, the call to close is useful here because
you can control exactly when the message is to be sent. (If you do
not call close, MESSAGE is closed--and the message
is sent--when the program terminates.)
Perl accesses files by means of file variables. File variables
are associated with files by the open statement.
Files can be opened in any of three modes: read mode, write
mode, and append mode. A file opened in read mode cannot be written
to; a file opened in either of the other modes cannot be read.
Opening a file in write mode destroys the existing contents of
the file.
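The three modes can be seen side by side in a short sketch such as the following. The file name scratch_o.txt is made up for the illustration, and the program removes the file when it is done.

```perl
#!/usr/local/bin/perl

$name = "scratch_o.txt";    # hypothetical file name for this sketch

# write mode (>): creates the file, destroying any existing contents
unless (open (OUT, ">$name")) {
    die ("Can't write to $name\n");
}
print OUT ("line 1\n");
close (OUT);

# append mode (>>): adds to the end of the existing contents
unless (open (OUT, ">>$name")) {
    die ("Can't append to $name\n");
}
print OUT ("line 2\n");
close (OUT);

# read mode: used when no mode character precedes the name
unless (open (IN, $name)) {
    die ("Can't read $name\n");
}
@lines = <IN>;
close (IN);
print (@lines);
unlink ($name);     # remove the scratch file again
```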
To read from an opened file, reference it using <name>,
where name is a placeholder for the name of the
file variable associated with the file. To write to a file,
specify its file variable when calling print.
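Perl defines three built-in file variables: STDIN (the standard input file), STDOUT (the standard output file), and STDERR (the standard error file).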
You can redirect STDIN and STDOUT by specifying <
and >, respectively, on the command line. Messages sent
to STDERR appear on the screen even if STDOUT is
redirected to a file.
The close function closes the file associated with a
particular file variable. close never needs to be called
unless you want to control exactly when a file is to be made
inaccessible.
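The file-test operators provide a way of retrieving
information on a particular file. The file-test operators covered
in this lesson are -e (the file exists), -r, -w, and -x (you have read,
write, or execute permission), -z (the file is empty), and -s (the size
of the file in bytes).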
You can use -w and -z to ensure that you do not
overwrite a non-empty file.
The <> operator enables you to read data from
files specified on the command line. This operator uses the
built-in array variable @ARGV, whose elements consist of
the items specified on the command line.
Perl enables you to open pipes. A pipe links the output from
your Perl program to the input to another program.
The Workshop provides quiz questions to help you solidify your
understanding of the material covered and exercises to give you
experience in using what you've learned. Try to understand the quiz
and exercise answers before you go on to tomorrow's lesson.
$ myprog file1 file2 file3