Today's lesson describes the options you can specify to
control how your Perl program operates. These options provide many
features, including checking a program's syntax, printing warnings,
running the C preprocessor, and editing files in place.
Today's lesson begins with a description of how to supply
options to your Perl program.
There are two ways to supply options to a Perl program: on the
command line, or on the first line of the program itself (the header
comment). The following sections describe these methods of supplying
options.
One way to specify options for a Perl program is to enter them
on the command line when you enter the command that starts your
program.
The syntax for specifying options on the command line is
perl options program
Here, program is the name of the Perl program
you want to run, and options is the list of options you
want to supply to the program.
For example, the following command runs the Perl program named test1
and passes it the options s and w.
(You'll learn about these, and other options, later today.)
$ perl -s -w test1
Some options can be specified along with a value. For
example, the 0 option accepts an integer, written in octal,
that follows it:
$ perl -026 test1
Here, the integer 26 is associated with the option 0.
For the 0 option, the value must be attached directly to the
option. If you put a space between them, the Perl interpreter treats
the value as the name of the program to run. Some other options, such
as the I option described later today, also accept a space between
the option and its value. In either case, the value associated with an
option must always immediately follow the option.
Note: If an option does not require an associated value, you can put another option immediately after it without specifying an additional character or space. For example, the following commands are equivalent:
$ perl -s -w test1
$ perl -sw test1
You can put an option that requires a value as part of a group of options, provided that it is last in the group. For example, the following commands are equivalent:
$ perl -s -w -026 test1
$ perl -sw026 test1
Another way to specify a command option is to include it as
part of the header comment for the program. For example, suppose
that the first line of your Perl program is this:
#!/usr/local/bin/perl -w
In this case, the w option is automatically
specified when you start the program.
Caution: Perl 4 enables you to specify only one option (or group of options) on the header comment line. This means that the following line generates an "unrecognized switch" error message:
#!/usr/local/bin/perl -w -s
Perl 5 enables as many switches as you like on the header comment line. However, some operating systems chop the header line after 32 characters, so be careful if you are planning to use a large number of switches.
Note: Options specified on the command line are combined with the options specified in the header comment, because the Perl interpreter always examines the header comment line for options, even when you start the program with the perl command. For example, if your header comment is
#!/usr/local/bin/perl -w
and you start your program with the command
$ perl -s test1
the program runs with both the s and the w options specified.
The v option enables you to find out what version
of Perl is running on your machine. When the Perl interpreter
sees this option, it prints information on itself, and then exits
without running your program.
This means that if you supply a command such as the following,
the file test1 is not executed:
$ perl -v test1
Here is sample output from the v option:
This is perl, version 5.001
Unofficial patch level 1m
Copyright 1987-1994, Larry Wall
Perl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the Perl 5.0
source kit.
The only really useful things here, besides the copyright
notice, are the version number of the Perl you are running (in
this case, 5.001) and the patch level, which indicates how many
repairs, or patches, have been made to this version. Here, the patch level
is 1m.
No other options should be specified if you specify the v
option, because none of them would do anything in this case anyway.
The c option tells the Perl interpreter to check
whether your Perl program is correct without actually running it.
If it is correct, the Perl interpreter prints the following
message (in which filename is the name of your program)
and then exits without executing your program:
filename syntax OK
If the Perl interpreter detects errors, it displays them just
as it normally does. After printing the error messages, it prints
the following message, in which filename is the name of
your program:
filename had compilation errors
Again, there is no point in supplying other options if you
specify the c option because the Perl interpreter
isn't actually running the program; the only exception is the w
option, which prints warnings. This option is described in the
following section.
As you have seen on the preceding days, some mistakes are easy
to make when you are writing a Perl program, such as accidentally
typing the wrong variable name, or using == when you
really mean to use eq. Because certain mistakes crop up frequently,
the Perl interpreter provides an option that checks for them.
This option, the w option, prints a warning every
time the Perl interpreter sees something that might cause a
problem. For example, if the interpreter sees the statement
$y = $x;
and hasn't seen $x before (which means that $x
is undefined), it prints a warning message in the following form
if you are running Perl 4:
Possible typo: "x" at filename line linenum.
Here, filename is the name of your Perl program,
and linenum is the number of the line on which the
interpreter has detected a potential problem.
If you are running Perl 5, the message is similar, but also
includes the name of the current package:
Identifier "main::x" used only once: possible typo at filename line linenum.
For more information on packages, see Day 18,
"Object-Oriented Programming."
The following sections provide a partial list of the potential
problems detected by the w option. (If you are
running Perl 5, the -w option provides dozens of useful
warnings. Consult the Perl manual pages for a complete list.)
Note: The w option can be combined with the c option to provide a means of checking your syntax for errors and problems before you actually run the program.
As you have seen, a statement such as the following one leads
to a warning message if $x has not been previously
defined:
$y = $x;
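The "possible typo" message also appears in other circumstances.
Of course, the possible-typo message might flag lines that
don't actually contain typos. Following are two of the most common
situations in which a possible typo actually is correct code.
The first is a print format whose name appears only in its definition and in a character string assigned to the system variable $~. For example, suppose that your program defines a format named BLANK:
format BLANK =
.
With the w option specified, the Perl interpreter prints the following message:
Possible typo: "BLANK" at file1 line 26.
The message appears even if the program selects the format with the following statement, because here the name BLANK is inside a character string:
$~ = "BLANK";
The second is a dummy variable that receives a value you do not need. In the following statement, $d1 and $d2 exist only to skip the first two values returned by getgrnam, and they are never used again:
($d1, $d2, $groupid) = getgrnam ($groupname);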
One useful feature of the w option is that it
checks whether two subroutines of the same name have been defined
in the program. (Normally, if the Perl interpreter sees two
subroutines of the same name, it quietly replaces the first one
with the second one and carries on.)
If, for example, two subroutines named x are defined in
a program, the w option prints a message similar to
the following one:
Subroutine x redefined at file1 line 46.
The line number specified is the line that starts the second
subroutine.
When the w option has detected this problem, you
can decide which subroutine to rename or throw away.
Another really helpful feature of the w option is
that it checks whether you are trying to compare a string using
the == operator.
In a statement such as the following:
if ($x == "humbug") {
...
}
the conditional expression
$x == "humbug"
is equivalent to the expression
$x == 0
because a character string that does not begin with a number is converted
to 0 when used in a numeric context (a place where a number is expected).
This is correct in Perl, but it is not likely to be what you
want.
If the w option is specified and the Perl
interpreter sees a statement such as this one, it prints a message
similar to the following if you are running Perl 4:
Possible use of == on string value at file1 line 26.
In Perl 5, the following warning is printed:
Argument "humbug" isn't numeric for numeric eq at file1 line 26.
In either case, this warning enables you to detect these
incorrect == operators and replace them with eq
operators, which compare strings.
Caution: The w option doesn't detect the opposite problem, namely:
if ($x eq 46) {
...
}
In this case, the Perl interpreter converts 46 to the string 46 and performs a string comparison.
Because a number and its string equivalent usually mean the same thing, this normally doesn't cause a problem. Watch out, though, for octal numbers in string comparisons, as in the following example:
if ($x eq 046) {
...
}
Here, the octal value 046 is converted to the number 38 before being converted to a string. If you really want to compare $x to 046, this code will not produce the results you expect.
Another thing to watch out for is this: In Perl 4, the w option does not check for conditional expressions such as the following:
if ($x = 0) {
...
}
because there are many cases in Perl in which the = assignment operator belongs inside a conditional expression. You will have to manually check that you are not specifying = (assignment) when you really mean to use == (equality comparison).
Perl 5 flags this with the following message:
Found = in conditional, should be == at filename line linenum.
The e option enables you to execute a Perl
program from your shell command line. For example, the command
$ perl -e "print ('Hello');"
prints the following string on your screen:
Hello
You can also specify multiple e options. In this
case, the Perl statements are executed left to right. For
example, the command
$ perl -e "print ('Hello');" -e "print (' there');"
prints the following string on your screen:
Hello there
By itself, the e option is not all that useful.
It becomes useful, however, when you use it in conjunction with
some of the other options you'll see in today's lesson.
Caution: You can leave off the closing semicolon in a Perl statement passed via the e option, if you want to:
$ perl -e "print ('Hello')"
If you are supplying two or more e options, however, the Perl interpreter strings them together and treats them as though they are a single Perl program. This means that the following command generates an error, because there must be a semicolon after the statement specified with the first e option:
$ perl -e "print ('Hello')" -e "print (' there')"
As you can see from this chapter, you can control the behavior
of Perl by specifying various command-line options. You can control
the behavior of your own Perl programs by specifying command-line
options for them too. To do this, specify the s option
when you call the program.
Here's an example of a command that passes an option to a Perl
program:
$ perl -s testfile -q
This command starts the Perl program testfile and
passes it the q option.
Caution: To be able to pass options to your program, you must specify the Perl s option. The following command does not pass q as an option:
$ perl testfile -q
In this case, q is just an ordinary argument that is passed to your program and stored in the built-in array variable @ARGV.
The easiest way to remember to include s is to specify it as part of your header comment:
#!/usr/local/bin/perl -s
This ensures that your program always will check for options.
If an option is specified when you invoke your Perl program,
the scalar variable whose name is the same as the option is automatically
set to 1 before program execution begins. For example, if a Perl program
named testfile is called with the q option,
as in the following, the scalar variable $q is automatically
set to 1:
$ perl -s testfile -q
You then can use this variable in a conditional expression to
test whether the option has been set.
Note: If q is treated as an option, it does not appear in the system variable @ARGV. A command-line argument either sets an option or is added to @ARGV.
Options can be longer than a single character. For example,
the following command sets the value of the scalar variable $potato
to 1:
$ perl -s testfile -potato
You also can set an option to a value other than 1 by
specifying = and the desired value on the command line:
$ perl -s testfile -potato="hot"
This line sets the value of $potato to hot.
Listing 16.1 is a simple example of a program that uses
command-line options to control its behavior. This program prints information
about the user currently logged in.
Listing 16.1. An example of a program
that uses command-line options.
1: #!/usr/local/bin/perl -s
2:
3: # This program prints information as specified by
4: # the following options:
5: # -u: print numeric user ID
6: # -U: print user ID (name)
7: # -g: print group ID
8: # -G: print group name
9: # -d: print home directory
10: # -s: print login shell
11: # -all: print everything (overrides other options)
12:
13: $u = $U = $g = $G = $d = $s = 1 if ($all);
14: $whoami = `whoami`;
15: chop ($whoami);
16: ($name, $d1, $userid, $groupid, $d2, $d3, $d4,
17: $homedir, $shell) = getpwnam ($whoami);
18: print ("user id: $userid\n") if ($u);
19: print ("user name: $name\n") if ($U);
20: print ("group id: $groupid\n") if ($g);
21: if ($G) {
22: ($groupname) = getgrgid ($groupid);
23: print ("group name: $groupname\n");
24: }
25: print ("home directory: $homedir\n") if ($d);
26: print ("login shell: $shell\n") if ($s);
$ program16_1 -U -d
user name: dave
home directory: /ag1/dave
$
The header comment in line 1 specifies that the s
option is to be automatically specified when this Perl program is invoked. This
ensures that options can always be passed to this program.
The comments in lines 3-11 provide information on what options
the program supports. This information is useful when someone is
reading or modifying the program, because there is no other way
to tell which scalar variables are used to test options.
The option all indicates that the program is to
print everything; if this option is specified, the scalar variable $all
is set to 1. To cut down on the number of comparisons later, line
13 checks whether $all is 1; if it is, the other scalar
variables corresponding to command-line options are set to 1. This
technique ensures that the following commands are equivalent
(assuming that your program is named program16_1):
$ program16_1 -all
$ program16_1 -u -U -g -G -d -s
The scalar variables listed in line 13 can be assigned to,
even though they correspond to possible command-line options, because
they behave just like other Perl scalar variables.
Lines 14-17 provide the raw material for the various print
operations in this program. To start, when the Perl interpreter sees the
string `whoami` enclosed in back quotes, it calls the system command whoami,
which returns the name of the user running the program. This name is
then passed to getpwnam, which searches the password file /etc/passwd
and retrieves the entry for this particular user.
Line 18 checks whether the -u option has been
specified. To do this, it checks whether $u has a nonzero
value. If it does, the user ID is printed. (The user ID is also
printed if all has been specified, because line 13
sets $u to a nonzero value in this case.)
Similarly, line 19 prints the user name if U has
been specified, line 20 prints the group ID if g has been
specified, line 25 prints the home directory if d
has been specified, and line 26 prints the filename of the login
shell if s has been specified.
Lines 21-24 check whether to print the group name. If G
has been specified, $G is nonzero, and line 22 calls getgrgid
to retrieve the group name.
Tip: Because command-line options can change the initial values of scalar variables, it is a good idea to always assign a value to a scalar variable before you use it. Consider the following example:
#!/usr/local/bin/perl
while ($count < 10) {
print ("$count\n");
$count++;
}
This program normally prints the numbers from 0 to 9, because $count is assumed to have an initial value of 0. However, if this program is called with the count option, the initial value of $count becomes something other than 0, and the program behaves differently.
If you add the following statement before the while loop, the program always prints the numbers 0 to 9 regardless of what options are specified on the command line:
$count = 0;
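You can supply both options and command-line arguments to your
program (provided that you supply the s option to
Perl). The Perl interpreter treats each argument that starts with a -
as an option until it reaches the first argument that does not start
with a -. That argument, and every argument after it, is treated as an
ordinary argument.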
This means, for example, that the following command treats w
as an option to testfile, and foo and e
as ordinary arguments:
$ perl -s testfile -w foo -e
The special argument -- also indicates
"end of options." For example, the following command treats w
as an option and e as an ordinary argument. The --
is thrown away.
$ perl -s testfile -w -- -e
The C preprocessor is a program that takes code written in the
C programming language and searches for special preprocessor
statements. In Perl, the P option enables you to use
this preprocessor with your Perl program:
$ perl -P myprog
Here, the Perl program myprog is first run through the
C preprocessor. The resulting output is then passed to the Perl interpreter
for execution.
Note: Perl provides no way to just run the C preprocessor on a Perl program. To do this, you'll need a C compiler that provides an option which specifies "preprocessor only."
Refer to the documentation for your C compiler for details about how to do this.
The following sections describe some of the most commonly used
C preprocessor commands.
C preprocessor statements always employ the following syntax:
#command value
Each C preprocessor statement starts with a #
character. command is the preprocessor operation to
perform, and value is the (optional) value
associated with this operation.
The most common preprocessor statement is #define. This
statement tells the preprocessor to replace every occurrence of a particular
character string with a specified value.
The syntax for #define is
#define macro value
This statement replaces all occurrences of the character
string macro with the value specified by value.
This operation is known as macro substitution. macro
can contain letters, digits, or underscores.
The value specified in a #define statement can be any
character string or number. For example, the following statement replaces
all occurrences of USERNAME with the string "dave"
(including the quotation marks):
#define USERNAME "dave"
The following statement replaces EXPRESSION with the
string (14+6), including the parentheses:
#define EXPRESSION (14+6)
Caution: When you are using #define with a value that is an expression, it is usually a good idea to enclose the value in parentheses. For example, consider the following Perl statement:
$result = EXPRESSION * 5;
If your preprocessor command is
#define EXPRESSION 14+6
the resulting Perl statement becomes
$result = 14 + 6 * 5;
which assigns 44 to $result (because the multiplication is performed first). If you enclose the preprocessor expression in parentheses, as in
#define EXPRESSION (14+6)
the statement becomes
$result = (14 + 6) * 5;
which yields 100, the result you most likely want.
Also, you always should enclose any parameters (described in the following section) in parentheses, for the same reason.
You can specify one or more parameters with your #define
statement. This capability enables you to treat the preprocessor command
like a simple function that accepts arguments. For example, the following
preprocessor statement takes a specified value and uses it as an
exponent:
#define POWEROFTWO(val) (2 ** (val))
In the Perl statement
$result = POWEROFTWO(1.3 + 2.6) + 4;
the preprocessor substitutes the expression 1.3 + 2.6
for val and produces this:
$result = (2 ** (1.3 + 2.6)) + 4;
You can supply more than one parameter with a #define
statement. For example, consider the following statement:
#define EXPONENT(base, exp) ((base) ** (exp))
Now, the statement
$result = EXPONENT(4, 11);
yields the following result after preprocessing:
$result = ((4) ** (11));
The Perl interpreter ignores the extra parentheses.
Tip: By convention, macros defined using #define normally use all uppercase letters (plus occasional digits and underscores). This makes it easier to distinguish macros from other variable names or character strings.
Listing 16.2 is an example of a Perl program that uses a #define
statement to perform macro substitution. This listing is just Listing
15.4 (from the preceding chapter) with the preprocessor statement
added.
Listing 16.2. A program that uses a #define
statement.
1: #!/usr/local/bin/perl -P
2:
3: #define AF_INET 2
4: print ("Enter an Internet address:\n");
5: $machine = <STDIN>;
6: $machine =~ s/^\s+|\s+$//g;
7: @addrbytes = split (/\./, $machine);
8: $packaddr = pack ("C4", @addrbytes);
9: if (!(($name, $altnames, $addrtype, $len, @addrlist) =
10: gethostbyaddr ($packaddr, AF_INET))) {
11: die ("Address $machine not found.\n");
12: }
13: print ("Principal name: $name\n");
14: if ($altnames ne "") {
15: print ("Alternative names:\n");
16: @altlist = split (/\s+/, $altnames);
17: for ($i = 0; $i < @altlist; $i++) {
18: print ("\t$altlist[$i]\n");
19: }
20: }
$ program16_2
Enter an Internet address:
128.174.5.59
Principal name: ux1.cso.uiuc.edu
$
Line 3 defines the macro AF_INET and assigns it the
value 2. When the C preprocessor sees AF_INET in
line 10, it replaces it with 2, which is the value of AF_INET
on the current machine (as specified in the header file /usr/include/netdb.h or /usr/include/bsd/netdb.h).
If this program is moved to a machine that defines a different
value for AF_INET, all you need to do to get this program
to work is change line 3 to use the value on that machine.
You can use a previously defined macro as the value in another #define
statement. The following is an example:
#define FIRST 1
#define SECOND FIRST
$result = 43 + SECOND;
Here, the macro FIRST is defined to be equivalent to
the value 1, and SECOND is defined to be equivalent
to FIRST. This means that the statement following the
macro definitions is equivalent to the following statement:
$result = 43 + 1;
The #ifdef and #endif statements control whether
a given group of statements is to be included as part of your
program.
The syntax for the #ifdef and #endif statements
is
#ifdef macro
code
#endif
Here, macro is any character string that can
appear in a #define statement. code is one
or more lines of your Perl program.
When the C preprocessor sees an #ifdef statement, it
checks whether the macro has been defined using the #define statement.
If it has, the code specified by code is included
as part of the program. If it has not, the code specified by code
is skipped.
Note: The code enclosed by #ifdef and #endif does not have to be a complete Perl statement. For example, the following code is legal:
$result = 14 * 2
#ifdef PLUSONE
+ 1
#endif
;
Here, $result is assigned 29 if PLUSONE is defined, 28 if it's not.
Be careful, though: if you abuse #ifdef, the resulting program might become difficult to read.
The #ifndef and #else statements provide
additional control over when parts of your program are to be
executed.
The #ifndef statement enables you to define code that
is to be executed when a particular macro is not defined.
The syntax for #ifndef is the same as for #ifdef:
#ifndef macro
code
#endif
For example
#ifndef MYMACRO
$result = 26;
#endif
The assignment is performed only if MYMACRO has not
appeared in a #define statement.
The #else statement enables you to specify code to be
executed if a macro is defined, and an alternative to choose if
the macro is not defined. For example
#ifdef MYMACRO
$result = 47;
#else
print ("Hello, world!\n");
#endif
Here, if MYMACRO has been defined by a #define
statement, the following statement is executed:
$result = 47;
If MYMACRO has not been defined, the following
statement is executed:
print ("Hello, world!\n");
You can use #else with #ifndef, as in the
following:
#ifndef MYMACRO
print ("Hello, world!\n");
#else
$result = 47;
#endif
This code is equivalent to the #ifdef-#else-#endif
sequence shown earlier in this section.
The #if statement enables you to specify that certain
lines of your program are to be included only if the expression
included with the statement is nonzero.
The syntax for the #if statement is
#if expr
code
#endif
Here, expr is the expression to be evaluated,
and code is the code to be executed if expr
is nonzero.
For example, the following statement is executed only if the
expression 14 + 3 is nonzero (which it always is, of
course):
#if 14 + 3
$result = 26;
#endif
You can use a macro definition as part of an #if
statement. If the macro is defined, it has a nonzero value in an #if
expression; if it is not defined, it has the value zero. Consider
the following example:
#if MACRO1 || MACRO2
$result = 47;
#endif
When the preprocessor sees the #if statement, it
evaluates the expression MACRO1 || MACRO2. This expression
has a nonzero value if either MACRO1 or MACRO2 is
nonzero. Therefore, the following statement is executed if either MACRO1 or MACRO2
is defined:
$result = 47;
The #if statement provides a quick way to remove lines
of code from your program temporarily:
#if 0
$result = 46;
print ("This line is not printed right now.\n");
#endif
Here, the expression included with the #if statement is
always zero, which means that the statements between #if
and #endif are always skipped.
You can use #else with #if, as in the following
example:
#if MACRO1 || MACRO2
print ("MACRO1 or MACRO2 is defined.\n");
#else
print ("MACRO1 and MACRO2 are not defined.\n");
#endif
This code includes the first print statement if MACRO1
or MACRO2 has been defined using #define, and it
includes the second print statement if neither has been defined.
Caution: You cannot use the ** (exponentiation) operator in an #if statement, because ** is not supported in the C programming language.
You can put one #ifdef-#else-#endif
construct inside another. For example
#ifdef MACRO1
#ifdef MACRO2
print ("MACRO1 yes, MACRO2 yes\n");
#else
print ("MACRO1 yes, MACRO2 no\n");
#endif
#else
#ifdef MACRO2
print ("MACRO1 no, MACRO2 yes\n");
#else
print ("MACRO1 no, MACRO2 no\n");
#endif
#endif
You also can put an #if-#else-#endif
construct or an #ifndef-#else-#endif
construct inside an #ifdef-#else-#endif
construct, or vice versa. The only restriction is that the inner
construct must be completely contained in one part of the outer
construct.
Another preprocessor command that is quite useful is the #include
command. This command tells the C preprocessor to include the
contents of the specified file as part of the program.
The syntax for the #include command is
#include filename
filename is the name of the file to be included.
For example, the following command includes the contents of myincfile.h
as part of the program:
#include <myincfile.h>
When an #include statement is found in a Perl program,
the C preprocessor searches for the file in the current directory
and the /usr/local/lib/perl directory. (The I
option, described in the following section, enables you to search
in other directories.) To instruct the C preprocessor to search
only the current directory, enclose the filename in double
quotation marks rather than angle brackets.
#include "myincfile.h"
This command limits the search for myincfile.h to the
current directory.
You can specify an entire pathname in an #include
statement, as in the following example:
#include "/u/dave/myincfile.h"
This command retrieves the contents of /u/dave/myincfile.h
and adds them to the program.
Note: Perl also enables you to include other files as part of a program using the require statement. For more information on require, refer to Day 18, "Object-Oriented Programming."
You use the I option with the P
option. It enables you to specify where to look for include files to be
processed by the C preprocessor. For example
$ perl -P -I /u/dave/myincdir testfile
This command tells the Perl interpreter to search the
directory /u/dave/myincdir for include files (as well as
the default directories).
To specify multiple directories to search, repeat the I
option:
$ perl -P -I /u/dave/dir1 -I /u/dave/dir2 testfile
This command searches in both /u/dave/dir1 and /u/dave/dir2.
Note: The directories specified in the I option also are added to the system variable @INC. This technique ensures that the require function can search in the same directories as the C preprocessor.
For more information on @INC, refer to Day 17, "System Variables." For more information on require, refer to Day 18.
One of the most common tasks in Perl programs and in UNIX
commands is to read the contents of several input files one line at
a time and process each input line as it is read. In these
programs and commands, the names of the input files are supplied on
the command line. A simple example is the UNIX command cat:
$ cat file1 file2 file3 ...
This command reads one line of input at a time and writes it
to the standard output file.
In Perl, one way to read the contents of several input files,
one line at a time, is to enclose the <> operator in
a while loop:
while ($line = <>) {
# process $line in here
}
Another method is to specify the n option. This
option takes your program and executes it once for each line of
input in each of the files specified on the command line.
Listing 16.3 is a simple example of a program that uses the n
option. It puts asterisks around each input line and then prints
it.
Listing 16.3. A simple program that
uses the n option.
1: #!/usr/local/bin/perl -n
2:
3: # input line is stored in the system variable $_
4: $line = $_;
5: chop ($line);
6: printf ("* %-52s *\n", $line);
$ program16_3
* This test file has only one line in it.              *
$
The n option encloses the program shown here in
an invisible while loop. Each time the program is
executed, the next line of input from one of the input files is
read and is stored in the system variable $_. Line 4 takes
this line and copies it into another scalar variable, $line;
line 5 then removes the last character--the trailing newline
character--from this line.
Line 6 uses printf to write the input line to the
standard output file. Because printf is formatting the input,
the asterisks all appear in the same columns (column 1 and column
56) on your screen.
Note: The previous program is equivalent to the following Perl program (which does not use the n option):
#!/usr/local/bin/perl
while (<>) {
# input line is stored in the system variable $_
$line = $_;
chop ($line);
printf ("* %-52s *\n", $line);
}
The n and e options work well
together. For example, the following command is equivalent to the cat
command:
$ perl -n -e 'print $_;' file1 file2 file3
The print $_; argument supplied with the e
option is a one-line Perl program. Because the n option
executes the program once for each input line, and reads each
input line into the system variable $_, the statement
print $_;
prints each input line in turn, which is exactly what the cat
command does. (Note that the parentheses that normally enclose the
argument passed to print have been omitted in this case.)
The previous command can be made even simpler:
$ perl -n -e "print" file1 file2 file3
By default, if no argument is supplied, print assumes
that it is to print the contents of $_. And, if the program
consists of a single statement, there is no need to include the
closing semicolon.
The pattern matching and substitution operators also operate
on $_ by default. For example, the following statement
examines the contents of $_ and searches for a digit:
$found = /[0-9]/;
This default behavior makes it easy to include a search or a
substitution in a single-line command. For example
$ perl -n -e "print if /[0-9]/" file1 file2 file3
This command reads each line of the files file1, file2,
and file3. If an input line contains a digit, it is printed.
Note: Several other functions use $_ as the default scalar variable to operate on, which makes those functions ideal for use with the n and e options. A full list of these functions is provided in the description of the $_ system variable, which is contained in Day 17.
The p option is similar to the n
option: it reads each line of its input files in turn. However,
the p option also prints each line it reads.
This means, for example, that you can simulate the behavior of
the UNIX cat command with the following command:
$ perl -p -e ";" file1 file2 file3
Here, the ; is a Perl program consisting of one
statement that does nothing.
The p option is designed for use with the i
option, described in the following section.
Note: If both the p and the n options are specified, the n option is ignored.
As you have seen, the n and p
options read lines from the files specified on the command line. The i
option, when used with the p option, takes the input
lines being read and writes them back out to the files from which
they came. This process enables you to edit files using commands similar
to those used in the UNIX sed command.
For example, consider the following command:
$ perl -p -i -e "s/abc/def/g;" file1 file2 file3
This command contains a one-line Perl program that examines
the scalar variable $_ and changes all occurrences of abc
into def. (Recall that the substitution operator operates
on $_ if the =~ operator is not specified.) The p
option ensures that $_ is assigned each line of each input
file in turn and that the program is executed once for each input
line. Thus, this command changes all occurrences of abc in
the files file1, file2, and file3 to def.
Caution: Do not use the i option with the n option unless you know what you're doing. The following command also changes all occurrences of abc to def, but it doesn't write out the input lines after it changes them:
$ perl -n -i -e "s/abc/def/g;" file1 file2 file3
Because the i option specifies that the input files are to be edited, the result is that the contents of file1, file2, and file3 are completely destroyed.
The i option also works on programs that do not
use the p option but do contain the <> operator inside
a loop. For example, consider the following command, where prog1 is a program of this kind:
$ perl -i prog1 file1 file2 file3
In this case, the Perl interpreter copies the first file, file1,
to a temporary file and opens the temporary file for reading.
Then, it opens file1 for writing and sets the default
output file (the file used by calls to print, write,
and printf) to be file1.
After the program finishes reading the temporary file to which file1
was copied, it then copies file2 to a temporary file,
opens it for reading, opens file2 for writing, and sets
the default output file to be file2. This process
continues until the program runs out of input files.
Listing 16.4 is a simple example of a program that edits using
the i option and the <> operator. This
program evaluates any arithmetic expressions (containing
integers) it sees on a single line and replaces them with their
results.
Listing 16.4. A program that edits
files using the i option.
1: #!/usr/local/bin/perl -i
2:
3: while ($line = <>) {
4:     while ($line =~
5:         s#\d+\s*[-+*/]\s*\d+(\s*[-+*/]\s*\d+)*#<x>#) {
6:         eval ("\$result = $&;");
7:         $line =~ s/<x>/$result/;
8:     }
9:     print ($line);
10: }
This program produces no output because output is written to
the files specified on the command line.
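The <> operator at the beginning of the while
loop (line 3) reads a line at a time from the input file or
files. Each line is searched using the pattern shown in line 5.
This pattern matches any substring containing the following
elements (in the order given): one or more digits; zero or more
white-space characters; one of the arithmetic operators *, +, -, or /;
zero or more white-space characters; one or more digits; and,
optionally, any number of further operator-and-integer pairs of the same form.
The matched substring is replaced by a placeholder substring, <x>.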
Lines 6 and 7 are executed once for each pattern matched in
the input line. The matched pattern, an arithmetic expression, is automatically
stored in the system variable $&; line 6 substitutes
this expression into a character string and passes this character
string to the function eval. The call to eval
creates a subprogram that evaluates the expression and returns
the result in the scalar variable $result. Line 7 replaces
the placeholder, <x>, with the result returned in $result.
When all the arithmetic expressions have been evaluated and
substituted for, the inner while loop terminates, and line
9 calls print. Because the i option has been
set, the line is written back to the original input file from
which it came.
Note: Even though you do not know the name of the file variable that represents the file being edited, you can still set the default output file variable to some other file and change it back later.
To perform this task, recall that the select function returns the file variable associated with the current default file:
$editfile = select (MYFILE);   # change default file
# do your write operations here
select ($editfile);            # change default file back
After the second select call has been performed, the default output file is, once again, the file being edited.
By default, the i option overwrites the existing
input files. If you wish, you can save a copy of the original
input file or files before overwriting them. To do this, specify
a file extension immediately after the i option, with no space in between:
$ perl -p -i.old -e "s/abc/def/g;" file1 file2 file3
Here, the .old file extension specified with the i
option tells the Perl interpreter to copy file1 to file1.old
before overwriting it. Similarly, the interpreter copies file2
to file2.old, and file3 to file3.old.
The file extension specified with the i option
can be any character string. By convention, file extensions
usually begin with a period; this convention makes it easier for
you to spot them when you list the files in your directory.
Tip: If you are using the i option with a program you are not familiar with, it is a good idea to specify a file extension. Doing so ensures that your files are not damaged if the program does not work the way you expect.
The a option is used with the n or p
option. If the a option is set, each input line that
is read is automatically split into a list of "words"
(sequences of characters that are not white space); this list of words
is stored in a special system array variable named @F.
For example, if your input file contains the line
This is a test.
and if a program that is called with the a option
reads this line, the array @F contains the list
("This", "is", "a", "test.")
The a option is useful for extracting information
from files. Suppose that your input files contain records of the
form
company_name quantity_ordered total_cost
such as, for example,
JOHN H. SMITH 10 47.32
Listing 16.5 shows how you can use the a option
to easily produce a program that extracts the quantity and total
cost fields from these files.
Listing 16.5. An example of the a
option.
1: #!/usr/local/bin/perl
2:
3: # This program is called with the -a and -n options.
4: while ($F[0] =~ /[^\d.]/) {
5:     shift (@F);
6:     next LINE if (!defined($F[0]));  # LINE labels the loop that -n supplies in Perl 5
7: }
8: print ("$F[0] $F[1]\n");
$ perl -a -n program16_5
10 47.32
106 11.54
$
Because the program is called with the a option,
the array variable @F contains a list, each element of
which is a word from the current input line.
Because the company name in the input file might consist of
more than one word (such as JOHN H. SMITH), the while
loop in lines 4-7 is needed to get rid of everything that isn't a
quantity field or a total cost field. After these fields have
been eliminated, line 8 can print the useful fields.
Note that this program just skips over any nonstandard input
lines.
The -F option, defined only in Perl 5, is designed to
be used in conjunction with the -a option, and specifies
the pattern to use when you split input lines into words. For
example, suppose Listing 16.5 is called as follows:
$ perl -a -n -F:: program16_5
In this case, the words in the input file are assumed to be
separated by a pair of colons, which means that the program is expecting
to read lines such as the following:
JOHN H. SMITH::10::47.32
Note: The -F option ignores opening and closing slashes if they are present, because it interprets them as pattern delimiters. This means that the following program invocations are identical:
$ perl -a -n -F:: program16_5
$ perl -a -n -F/::/ program16_5
In all the programs you have seen so far, when the Perl
interpreter reads a line from an input file or from the keyboard,
it reads until it sees a newline character. You can tell Perl
that you want the "end-of-line" input character to be
something other than the newline character by specifying the 0 option.
(The 0 here is the digit zero, not the letter O.)
With the 0 option, you specify which character is
to be the end-of-line character for your input file by providing
its ASCII representation in base 8 (octal). For example, the
command
$ perl -0040 prog1 infile
calls the Perl program named prog1 and specifies that
it is to use the space character (ASCII 32, or 40 octal) as the end-of-line
character when it reads the input file infile (or any
other input file).
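This means, for example, that if this program reads an input
file containing the following:
Test input.
Here's another line.
it will read a total of four input lines: "Test " (ending with the space),
"input." followed by a newline and "Here's ", "another ", and finally
"line." followed by the closing newline.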
The 0 option provides a quick way to read an
input file one word at a time, assuming that each line ends with
at least one blank character. (If it doesn't, you can quickly
write a Perl program that uses the i and p
options to add a space to the end of each line in each file.)
Listing 16.6 is an example of a program that uses 0
to read an input file one word at a time.
Listing 16.6. A program that uses the 0
option.
1: #!/usr/local/bin/perl -0040
2:
3: while ($line = <>) {
4:     $line =~ s/\n//g;
5:     next if ($line eq "");
6:     print ("$line\n");
7: }
$ program16_6 file1
This
line
contains
five
words.
$
The header comment (line 1) specifies that the 0
option is to be used and that the space character is to become the end-of-line
character. (Recall that you do not need a space between an option
and the value associated with an option.) This means that line 3
reads from the input file until it sees a blank space.
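Not everything read by line 3 is a word, of course. There are
two types of lines that are not particularly useful that the program
must check for: input lines that contain newline characters, and input
lines that are empty once those newline characters have been removed.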
Line 4 checks whether any newline characters are contained in
the current input line. The substitution in this line is a global substitution,
because an input line can contain two or more newline characters.
(This occurs when an input file contains a blank line.)
After all the newline characters have been eliminated, line 5
checks whether the resulting input line is empty. If it is, the program
continues with the next input line. If the resulting input line
is not empty, the input line must be a useful word, and line 6
prints it.
Note: If you specify the value 00 (octal zero) with the 0 option, the Perl interpreter reads until it sees two newline characters. This enables you to read an entire paragraph at a time.
If you specify no value with the 0 option, the null character (ASCII 0) is assumed.
The l option enables you to specify an output
end-of-line character for use in print statements.
Like the 0 option, the l option
accepts a base-8 (octal) integer that indicates the ASCII representation
of the character you want to use.
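When the l option is specified, the Perl
interpreter does two things: if the n or p option is also specified,
it removes the input end-of-line character from the end of each input
line that is read; and it adds the specified output end-of-line
character to the end of everything written by a print statement.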
If you do not specify a value with the l option,
the Perl interpreter uses the character specified by the 0
option, if it is defined. If 0 has not been
specified, the end-of-line character is defined to be the newline
character.
Caution: If you are using both the l and the 0 option and you do not provide a value with the l option, the order of the options becomes significant, because the options are processed from left to right.
If the l option appears first, the output end-of-line character is set to the newline character. If the 0 option appears first, the output end-of-line character (set by l) becomes the same as the input end-of-line character (set by 0).
Listing 16.7 is a simple example of a program that uses l.
Listing 16.7. A program that uses the l
option.
1: #!/usr/local/bin/perl -l012
2:
3: print ("Hello!");
4: print ("This is a very simple test program!");
$ program16_7
Hello!
This is a very simple test program!
$
The l012 option in the header comment in line 1
sets the output end-of-line character to the newline character
(ASCII 10, or 12 octal). This means
that every print statement in the program will have a
newline character added to it. As a consequence, the output from lines
3 and 4 appears on separate lines.
Note: You can control the input and output end-of-line characters also by using the system variables $/ and $\. For a description of these system variables, refer to Day 17.
The x option enables you to process a Perl
program that appears in the middle of a file (such as a file
containing an electronic mail message, which usually contains
some mail routing information). When the x option is
specified, the Perl interpreter ignores every line in the program
until it sees a header comment (a comment beginning with the #!
characters).
Caution: If you are using Perl 5, the header comment must also contain the word "perl."
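After the Perl interpreter sees the header comment, it then
processes the program as usual until one of the following three conditions
occurs: it reaches the end of the file; it reads a line consisting of
the Ctrl+D or Ctrl+Z character; or it reads a line consisting of the following:
__END__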
If the Perl interpreter reads one of the end-of-program lines
(the second and third conditions listed previously), it ignores everything
appearing after that line in the file.
Listing 16.8 is a simple example of a program that works if
run with the x option.
Listing 16.8. A Perl program
contained in a file.
1: Here is a Perl program that appears in the middle
2: of a file.
3: The stuff up here is junk, and the Perl interpreter
4: will ignore it.
5: The next line is the start of the actual program.
6: #!/usr/local/bin/perl
7:
8: print ("Hello, world!\n");
9: __END__
10: This line is also ignored, because it is not part
11: of the program.
$ perl -x program16_8
Hello, world!
$
If this program is started with the x option, the
Perl interpreter skips over everything until it sees line 6.
(Needless to say, if you try to run this program without
specifying the x option, the Perl interpreter will
complain.) Line 8 then prints the message Hello, world.
Line 9 is the special end-of-program line. When the Perl
interpreter sees this line, it skips the rest of the program.
Note: Of course, you can't specify the x option in the header comment itself, because the Perl interpreter has to know in advance that the program contains lines that must be skipped.
The following sections describe some of the more exotic
options you can pass to the Perl interpreter. You are not likely
to need any of these options unless you are doing something
unusual (and you really know what you are doing).
The u option tells the Perl interpreter to
generate a core dump file after it compiles your program. This file
can then be examined and manipulated.
The U option tells the Perl interpreter to enable
you to perform "unsafe" operations in your program.
(Basically, you'll know that an operation is considered unsafe
when the Perl interpreter doesn't let you perform it without
specifying the U option!)
The S option tells the Perl interpreter that your
program might be contained in any of the directories specified by
your PATH environment variable. The Perl interpreter
checks each of these directories in turn, in the order in which
they are specified, to see whether your program is located there.
(This is the normal behavior of the shell for commands in the
UNIX environment.)
Note: You need to use S only if you are running your Perl program using the perl command, as in
$ perl myprog
If you are running the program using a command such as
$ myprog
your shell (normally) treats it like any other command and searches the directories specified in your PATH environment variable even if you don't specify the S option.
The D option sets the Perl interpreter's internal
debugging flags. This option is specified with an integer value
(for example, D 256).
For details on this option, refer to the online manual page
for Perl.
Note: The internal debugging flags specified by D have nothing to do with the Perl debugger, which is specified by the d option.
The debugging flags specified by D provide information on how Perl itself works, not on how your program works.
The -T option specifies that data obtained from the
outside world cannot be used in any command that modifies your
file system. This feature enables you to write secure programs
for system administration tasks.
This option is only available in Perl 5. If you are running
Perl 4, use a special version of Perl named taintperl. For
details on taintperl, see the online documentation
supplied with your Perl distribution.
One final option that is quite useful is d. This
option tells the Perl interpreter to run your program using the
Perl debugger. For a complete description of the Perl debugger
and how to use it, refer to Day 21, "The Perl
Debugger."
Note: If you specify the d option, you can still use other options.
Today you learned how to specify options when you run your
Perl programs. An option is a dash followed by a single letter, and
optionally followed by a value to be associated with the option. Options
lacking associated values can be grouped together.
You can specify options in two ways: on the command line and
in the header comment. Only one option or group of options can be
supplied in the header comment.
Available options include those that list the Perl version
number, check your syntax, display warnings, allow single-line programs
on the command line, invoke the C preprocessor, automatically read
from the input files, and edit files in place.
#include <perldef.h>
#if BSD #endif
-I /usr/local/include/bsdperl
The Workshop provides quiz questions to help you solidify your
understanding of the material covered, and exercises to give you
experience in using what you've learned. Try to understand the quiz
and exercise answers before you go on to tomorrow's lesson.
$ perl -i -n -e "s/abc/def/g";