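Today's lesson describes three groups of built-in Perl
functions: functions that manipulate processes and programs, functions that perform mathematical operations, and functions that manipulate character strings.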
Caution: Many of the functions described today use features of the UNIX operating system. If you are using Perl on a machine that is not running UNIX, some of these functions might not be defined or might behave differently.
Check the documentation supplied with your version of Perl for details on which functions are supported or emulated on your machine.
Perl provides a wide range of functions that manipulate both
the program currently being executed and other programs (also called
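processes) running on your machine. These functions are divided into
four groups: functions that start processes, functions that terminate programs or processes, functions that delay the execution of a program or process, and other functions that perform miscellaneous process- and program-related actions.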
The following sections describe these four groups of process-
and program-manipulation functions.
Several built-in functions provide different ways of creating
processes: eval, system, fork, pipe, exec,
and syscall. These functions are described in the
following subsections.
The eval function treats a character string as an
executable Perl program.
The syntax for the eval function is
eval (string);
Here, string is the character string that is to
become a Perl program.
For example, these two lines of code:
$print = "print (\"hello, world\\n\");";
eval ($print);
print the following message on your screen:
hello, world
The character string passed to eval can be a
character-string constant or any expression that has a value
which is a character string. In this example, the following
string is assigned to $print, which is then passed to eval:
print ("hello, world\n");
The eval function uses the special system variable $@
to indicate whether the Perl program contained in the character
string has executed properly. If no error has occurred, $@
contains the null string. If an error has been detected, $@
contains the text of the message.
The subprogram executed by eval affects the program
that called it; for example, any variables that are changed by
the subprogram remain changed in the main program. Listing 13.1
provides a simple example of this.
Listing 13.1. A program that
illustrates the behavior of eval.
1: #!/usr/local/bin/perl
2:
3: $myvar = 1;
4: eval ("print (\"hi!\\n\"); \$myvar = 2;");
5: print ("the value of \$myvar is $myvar\n");
$ program13_1
hi!
the value of $myvar is 2
$
The call to eval in line 4 first executes the statement
print ("hi!\n");
Then it executes the following assignment, which assigns 2
to $myvar:
$myvar = 2;
The value of $myvar remains 2 in the main program,
which means that line 5 prints the value 2. (The backslash
preceding the $ in $myvar ensures that the Perl
interpreter does not substitute the value of $myvar for
the name before passing it to eval.)
Note: If you like, you can leave off the final semicolon in the character string passed to eval, as follows:
eval ("print (\"hi!\\n\"); \$myvar = 2");
As before, this prints hi! and assigns 2 to $myvar.
The eval function has one very useful property: If the
subprogram executed by eval encounters a fatal error, the
main program does not halt. Instead, the subprogram terminates,
copies the error message into the system variable $@, and
returns to the main program.
This feature is very useful if you are moving a Perl program
from one machine to another and you are not sure whether the new
machine contains a built-in function you need. For example,
Listing 13.2 tests whether the tell function is
implemented.
Listing 13.2. A program that uses eval
to test whether a function is implemented.
1: #!/usr/local/bin/perl
2:
3: open (MYFILE, "file1") || die ("Can't open file1");
4: eval ("\$start = tell(MYFILE);");
5: if ($@ eq "") {
6: print ("The tell function is defined.\n");
7: } else {
8: print ("The tell function is not defined!\n");
9: }
$ program13_2
The tell function is defined.
$
The call to eval in line 4 creates a subprogram that
calls the function tell. If tell is defined, the
subprogram assigns the current position in the file (which, in this
case, is the start of the first line) to the scalar variable $start.
If tell is not defined, the subprogram places the error
message in $@.
Line 5 checks whether $@ is the null string. If $@
is empty, the subprogram in line 4 executed without generating an
error, which means that the tell function is implemented.
(Because assignments performed in the subprogram remain in effect
in the main program, the main program can call seek using
the value in $start, if desired.) If $@ is not
empty, the program assumes that tell is not defined, and
it prints a message proclaiming that fact. (This program is
assuming that the only reason the subprogram could fail is
because tell is not defined. This is a reasonable
assumption, because you know that the file referenced by MYFILE
has been successfully opened.)
Tip: Although eval is very useful, it is best to use it only for small programs. If you need to generate a larger program, it might be better to write the program to a file and call system to execute it. (The system function is described in the following section.)
Because statements executed by eval affect the program that calls it, the behavior of complicated programs might become difficult to track if eval is used to excess.
You have seen examples of the system function in
earlier lessons.
The syntax for the system function is
system (list);
This function is passed a list as follows: The first element
of the list contains the name of a program to execute, and the
other elements are arguments to be passed to the program.
When system is called, it starts a process that runs
the program and waits until the process terminates. When the
process terminates, the error code is shifted left eight bits,
and the resulting value becomes system's return value.
Listing 13.3 is a simple example of a program that calls system.
Listing 13.3. A program that calls system.
1: #!/usr/local/bin/perl
2:
3: @proglist = ("echo", "hello, world!");
4: system(@proglist);
$ program13_3
hello, world!
$
In this program, the call to system executes the UNIX
program echo, which displays its arguments. The argument passed
to echo is hello, world!.
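The return value described earlier can be decoded by shifting it right eight bits. The following is a minimal sketch, assuming a UNIX-like machine. It uses the system variable $^X (the path of the Perl interpreter that is running the program, which is not covered in this lesson) only as a convenient program with a known return code:

```perl
use strict;
use warnings;

# Run a tiny child program that exits with return code 3.
# $^X holds the path of the Perl interpreter running this program.
my $status = system($^X, "-e", "exit 3");

# The return code occupies the upper eight bits of system's return value.
my $exit_code = $status >> 8;

print "raw return value: $status\n";
print "return code: $exit_code\n";
```

The names $status and $exit_code are chosen here for illustration.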
Caution: When you start another program using system, output data might be mixed, out of sequence, or duplicated.
To get around this problem, set the system variable $|, defined for each file, to 1. The following is an example:
select (STDERR); $| = 1;
select (STDOUT); $| = 1;
When $| is set to 1, no buffer is defined for that file, and output is written out right away. This ensures that the output behaves properly when system is called.
See "Redirecting One File to Another" on Day 12, "Working with the File System," for more information on select and $|.
The fork function creates two copies of your program:
the parent process and the child process. These copies execute simultaneously.
The syntax for the fork function is
procid = fork();
fork returns zero to the child process and a nonzero
value to the parent process. This nonzero value is the process
ID of the child process. (A process ID is an integer that
enables the system to distinguish this process from the other
processes currently running on the machine.)
The return value from fork enables you to determine
which process is the child process and which is the parent. For
example:
$retval = fork();
if ($retval == 0) {
    # this is the child process
    exit; # this terminates the child process
} else {
    # this is the parent process
}
If fork is unable to execute, the return value is a
special undefined value for which you can test by using the defined
function. (For more information on defined, see Day 14,
"Scalar Conversion and List-Manipulation Functions.")
To terminate a child process created by fork, use the
built-in function exit, which is described later in today's
lesson.
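The three possible outcomes of fork (failure, child, and parent) can be handled in one place. The following is a sketch only, assuming a UNIX-like machine. It calls waitpid, which is described later in today's lesson, so that the parent does not finish before the child:

```perl
use strict;
use warnings;

my $retval = fork();
if (!defined($retval)) {
    # fork was unable to create a new process
    die "fork failed: $!\n";
} elsif ($retval == 0) {
    # this is the child process
    exit(0);
}

# this is the parent process; $retval holds the child's process ID
waitpid($retval, 0);
print "parent created child process $retval\n";
```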
Caution: Be careful when you use the fork function. The following are a few examples of what can go wrong:
The pipe function is designed to be used in conjunction
with the fork function. It provides a way for the child
and parent processes to communicate.
The syntax for the pipe function is
pipe (infile, outfile);
pipe requires two arguments, each of which is a file
variable that is not currently in use--in this case, infile
and outfile. After pipe has been called,
information sent via the outfile file variable can be
read using the infile file variable. In effect, the output from outfile
is piped to infile.
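To use pipe with fork, first call pipe, then call fork to split the program into parent and child processes, and finally have one process close infile and the other close outfile.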
The process in which outfile is still open can
now send data to the process in which infile is
still open. (The child can send data to the parent, or vice
versa, depending on which process closes input and which closes
output.)
Listing 13.4 shows how pipe works. It uses fork
to create a parent and child process. The parent process reads a
line of input, which it passes to the child process. The child
process then prints it.
Listing 13.4. A program that uses fork
and pipe.
1: #!/usr/local/bin/perl
2:
3: pipe (INPUT, OUTPUT);
4: $retval = fork();
5: if ($retval != 0) {
6: # this is the parent process
7: close (INPUT);
8: print ("Enter a line of input:\n");
9: $line = <STDIN>;
10: print OUTPUT ($line);
11: } else {
12: # this is the child process
13: close (OUTPUT);
14: $line = <INPUT>;
15: print ($line);
16: exit (0);
17: }
$ program13_4
Enter a line of input:
Here is a test line
Here is a test line
$
Line 3 defines the file variables INPUT and OUTPUT.
Data sent to OUTPUT can now be read from INPUT.
Line 4 splits the program into a parent process and a child
process. Line 5 then determines which process is which.
The parent process executes lines 7-10. Because the parent
process is sending data through OUTPUT, it has no need to access INPUT;
therefore, line 7 closes INPUT.
Lines 8 and 9 obtain a line of data from the standard input
file. Line 10 then sends this line of data to the child process
via the file variable OUTPUT.
The child process executes lines 13-16. Because the child
process is receiving data through INPUT, it does not need access to OUTPUT;
therefore, line 13 closes OUTPUT.
Line 14 reads data from INPUT. Because data from OUTPUT
is piped to INPUT, the program waits until the data is actually
sent before continuing with line 15.
Line 16 uses exit to terminate the child process. This
also automatically closes INPUT.
Note that the <INPUT> operator behaves like any
other operator that reads input (such as, for instance, <STDIN>).
If there is no more data to read, INPUT is assumed to be
at the "end of file," and <INPUT> returns
the null string.
Caution: Traffic through the file variables specified by pipe can flow only in one direction. You cannot have a process both send and receive on the same pipe.
If you need to establish two-way communication, you can open two pipes, one in each direction.
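The two-pipe arrangement might look like the following sketch, which assumes a UNIX-like machine. The file variable names and the uppercase conversion are chosen here for illustration. The parent sends one line to the child, and the child sends an uppercase copy back on the second pipe:

```perl
use strict;
use warnings;

pipe(FROM_PARENT, TO_CHILD)  || die "pipe failed: $!\n";
pipe(FROM_CHILD,  TO_PARENT) || die "pipe failed: $!\n";

my $retval = fork();
die "fork failed: $!\n" unless defined($retval);

if ($retval == 0) {
    # child: reads on the first pipe, writes on the second
    close(TO_CHILD);
    close(FROM_CHILD);
    my $line = <FROM_PARENT>;
    print TO_PARENT (uc($line));
    close(TO_PARENT);
    exit(0);
}

# parent: writes on the first pipe, reads on the second
close(FROM_PARENT);
close(TO_PARENT);
print TO_CHILD ("hello from the parent\n");
close(TO_CHILD);              # sends any buffered output to the child
my $reply = <FROM_CHILD>;
close(FROM_CHILD);
waitpid($retval, 0);
print "child replied: $reply";
```

Each process closes the two ends it does not use, exactly as in the one-pipe case.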
The exec function is similar to the system
function, except that it terminates the current program before
starting the new one.
The syntax for the exec function is
exec (list);
This function is passed a list as follows: The first element
of the list contains the name of a program to execute, and the
other elements are arguments to be passed to the program.
For example, the following statement terminates the Perl
program and starts the command mail dave:
exec ("mail dave");
Like system, exec accepts additional arguments
that are assumed to be passed to the command being invoked. For
example, the following statement executes the command vi file1:
exec ("vi", "file1");
You can specify the name that the system is to use as the
program name. To do this, put the program you actually want to run, enclosed in brace brackets, in front of the list; the first element of the list then becomes the program name:
exec {"mail"} ("maildave", "dave");
Here, the program mail is invoked with the argument dave, but the program
name is set to maildave. (The invoked program sees this name as its own
program name; for example, it is the value of argv[0] if the
program to be invoked was originally written in C.)
exec often is used in conjunction with fork:
when fork splits into two processes, the child process starts
another program using exec.
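The following is a minimal sketch of this combination, assuming a UNIX-like machine. The child starts a second copy of the Perl interpreter (found through the system variable $^X, which is not covered in this lesson) purely as a stand-in for "another program", and the parent uses waitpid, described later today, to wait for it:

```perl
use strict;
use warnings;

my $retval = fork();
die "fork failed: $!\n" unless defined($retval);

if ($retval == 0) {
    # child: replace this copy of the program with another program
    exec($^X, "-e", 'print "hello from the new program\n"')
        || die "exec failed: $!\n";
}

# parent: wait for the child, then report how it ended
waitpid($retval, 0);
my $exit_code = $? >> 8;
print "child finished with return code $exit_code\n";
```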
Caution: exec has the same output-buffering problems as system. See the description of system, earlier in today's lesson, for a description of these problems and how to deal with them.
The syscall function calls a system function.
The syntax for the syscall function is
syscall (list);
syscall expects a list as its argument. The first
element of the list is the name of the system call
to invoke, and the remaining elements are arguments to be passed
to the call.
If an argument in the list passed to syscall is a
numeric value, it is converted to a C integer (type int). Otherwise,
a pointer to the string value is passed. See the syscall
UNIX manual page or the Perl documentation for more details.
Note: The Perl header file syscall.ph must be included in order to use syscall:
require ("syscall.ph");
For more information on require, see Day 20, "Miscellaneous Features of Perl."
The following sections describe the functions that terminate
either the currently executing program or a process running elsewhere
on the system: die, warn, exit, and kill.
The die and warn functions provide a way for
programs to pass urgent messages back to the user who is running
them.
The die function terminates the program and prints an
error message on the standard error file.
The syntax for the die function is
die (message);
message is the error message to be displayed.
For example, the call
die ("Cannot open input file\n");
prints the following message and then exits:
Cannot open input file
die can accept a list as its argument, in which case
all elements of the list are printed.
@diemsg = ("I'm about ", "to die\n");
die (@diemsg);
This prints out the following message and then exits:
I'm about to die
If the last argument passed to die ends with a newline
character, the error message is printed as is. If the last
argument to die does not end with a newline character, the
program filename and line number are printed, along with the line
number of the input file (if applicable). For example, if line 6 of
the file myprog is
die ("Cannot open input file");
the message it prints is
Cannot open input file at myprog line 6.
The warn function, like die, prints a message on
the standard error file.
The syntax for the warn function is
warn (message);
As with die, message is the message to be
displayed.
warn, unlike die, does not terminate the program. For
example, the statement
warn ("Input file is empty");
sends the following message to the standard error file, and
then continues executing.
Input file is empty at myprog line 76.
If the string passed to warn is terminated by a newline
character, the warning message is printed as is. For example, the statement
warn("Danger! Danger!\n");
sends
Danger! Danger!
to the standard error file.
Note: If eval is used to invoke a program that calls die, the error message printed by die is not printed; instead, the error message is assigned to the system variable $@.
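The following short sketch illustrates this behavior. The variable name $message is chosen here for illustration:

```perl
use strict;
use warnings;

# The die inside the evaluated string does not end the main program.
eval('die ("Cannot open input file\n");');
my $message = $@;     # copy $@ before anything else can change it

print "the program is still running\n";
print "message trapped by eval: $message";
```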
The exit function terminates a program.
If you like, you can specify a return code to be passed to the
system by passing exit an argument using the following
syntax:
exit (retcode);
retcode is the return code you want to pass.
For example, the following statement terminates the program
with a return code of 2:
exit(2);
The kill function enables you to send a signal to a
group of processes.
The syntax for invoking the kill function is
kill (signal, proclist);
In this case, signal is the numeric signal to
send. (For example, a signal of 9 kills the listed processes.) proclist
is a list of process IDs (such as the child process ID returned
by fork).
signal also can be a signal name enclosed in
quotes, as in "INT".
For more details on the signals you can send, refer to the kill
UNIX manual page.
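As a hedged illustration, the following sketch (assuming a UNIX-like machine) creates a child process with fork, sends it the KILL signal by name, and then inspects the system variable $? after waitpid to see which signal ended the child. The waitpid function is described later in today's lesson:

```perl
use strict;
use warnings;

my $retval = fork();
die "fork failed: $!\n" unless defined($retval);

if ($retval == 0) {
    # child: do nothing for a long time
    sleep(60);
    exit(0);
}

# parent: send the KILL signal (signal 9) to the child by name
my $signaled = kill("KILL", $retval);
waitpid($retval, 0);
my $signal = $? & 127;   # the low seven bits hold the signal that ended the child

print "processes signaled: $signaled\n";
print "child was ended by signal $signal\n";
```

kill returns the number of processes that were successfully signaled, which is 1 here.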
The sleep, wait, and waitpid functions
delay the execution of a particular program or process.
The sleep function suspends the program for a specified
number of seconds.
The syntax for the sleep function is
sleep (time);
time is the number of seconds to suspend program
execution.
The function returns the number of seconds that the program
was actually stopped.
For example, the following statement puts the program to sleep
for five seconds:
sleep (5);
The wait function suspends execution and waits for a
child process to terminate (such as a process created by fork).
The wait function requires no arguments:
procid = wait();
When a child process terminates, wait returns the
process ID, procid, of the process that has terminated.
If no child processes exist, wait returns -1.
The waitpid function waits for a particular child
process.
The syntax for the waitpid function is
waitpid (procid, waitflag);
procid is the process ID of the process to wait
for, and waitflag is a special wait flag (as
defined by the waitpid or wait4 manual page). By
default, waitflag is 0 (a normal wait). waitpid
returns the process ID of the child process if it is found and has terminated, and
it returns -1 if the child process does not exist.
Listing 13.5 shows how waitpid can be used to control
process execution.
Listing 13.5. A program that uses waitpid.
1: #!/usr/local/bin/perl
2:
3: $procid = fork();
4: if ($procid == 0) {
5: # this is the child process
6: print ("this line is printed first\n");
7: exit(0);
8: } else {
9: # this is the parent process
10: waitpid ($procid, 0);
11: print ("this line is printed last\n");
12: }
$ program13_5
this line is printed first
this line is printed last
$
Line 3 splits the program into a parent process and a child
process. The parent process is returned the process ID of the
child process, which is stored in $procid.
Lines 6 and 7 are executed by the child process. Line 6 prints
the following line:
this line is printed first
Line 7 then calls exit, which terminates the child
process.
Lines 10 and 11 are executed by the parent process. Line 10
calls waitpid and passes it the ID of the child process; therefore,
the parent process waits until the child process terminates
before continuing. This means that line 11, which prints the
second line, is guaranteed to be executed after the first line is
printed.
As you can see, waitpid can be used to force the order of
execution of processes.
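The values that waitpid hands back can be examined directly. The following is a sketch, assuming a UNIX-like machine. The return code 7 is arbitrary, and the system variable $? (not covered in this lesson) holds the child's status after the wait:

```perl
use strict;
use warnings;

my $procid = fork();
die "fork failed: $!\n" unless defined($procid);

if ($procid == 0) {
    exit(7);    # child: terminate at once with return code 7
}

my $first     = waitpid($procid, 0);   # the child's process ID
my $exit_code = $? >> 8;               # the return code passed to exit
my $second    = waitpid($procid, 0);   # -1: that child no longer exists

print "first call matched the child\n" if ($first == $procid);
print "child return code: $exit_code\n";
print "second call returned $second\n";
```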
Note: For more information on the possible values that can be passed as waitflag, examine the file wait.ph, which is available from the same place you retrieved your copy of Perl. (It might already be on your system.) You can find out more also by investigating the waitpid and wait4 manual pages.
The caller, chroot, local, and times
functions perform various process and program-related actions.
The caller function returns the name and the line
number of the program that called the currently executing
subroutine.
The syntax for the caller function is
subinfo = caller();
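caller returns a three-element list, subinfo,
consisting of the name of the package from which the subroutine was called, the name of the file from which it was called, and the line number of the call.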
This routine is used by the Perl debugger, which you'll learn
about on Day 21, "Using the Perl Debugger." For more information
on packages, refer to Day 20, "Miscellaneous Features of
Perl."
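A small sketch of caller in use follows. The subroutine name where_called is hypothetical, and the three values are assumed to arrive in the order package, file name, and line number, as described in the Perl documentation:

```perl
use strict;
use warnings;

# where_called is a hypothetical helper that reports who called it
sub where_called {
    my ($package, $filename, $line) = caller();
    return ($package, $filename, $line);
}

my @subinfo = where_called();
print "called from package $subinfo[0], file $subinfo[1], line $subinfo[2]\n";
```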
The chroot function duplicates the functionality of the chroot
system call.
The syntax for the chroot function is
chroot (dir);
dir is the new root directory.
In the following example, the specified directory becomes the
root directory for the program:
chroot ("/u/jqpublic");
For more information, refer to the chroot manual page.
The local function was introduced on Day 9, "Using
Subroutines." It declares that a copy of a named variable is
to be defined for a subroutine. (Refer to that day for examples
that use local inside a subroutine.)
local can be used also to define a copy of a variable
for use inside a statement block (a collection of
statements enclosed in brace brackets), as follows:
if ($var == 14) {
    local ($localvar);
    # stuff goes here
}
This defines a local copy of the variable $localvar for
use inside the statement block. Any other copies of $localvar
that exist are not affected by the changes to this local copy.
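The effect of a local copy inside a statement block can be seen in the following sketch. The helper show_value and the string values are hypothetical, and the variables are declared with our only so that the example runs under use strict:

```perl
use strict;
use warnings;

our $var = 14;
our $localvar = "outer value";

# show_value is a hypothetical helper that reads the global variable
sub show_value { return $localvar; }

my $inside;
if ($var == 14) {
    local ($localvar);              # new copy, valid until the closing brace
    $localvar = "inner value";
    $inside = show_value();         # subroutines called here see the local copy
}
my $outside = show_value();         # the original value is back

print "inside the block:  $inside\n";
print "outside the block: $outside\n";
```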
DON'T use local inside a loop, as in this example:
while ($var <= 14) {
    local ($myvar);
    # stuff goes here
}
Here, a new copy of $myvar is defined each time the loop iterates. This is probably not what you want.
The times function returns the amount of job time
consumed by this program and any child processes of this program.
The syntax for the times function is
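timelist = times;
As you can see, times accepts no arguments. It returns timelist,
a list consisting of four floating-point numbers: the user time and the system time consumed by this program, followed by the user time and the system time consumed by its child processes.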
Perl provides functions that perform the standard
trigonometric operations, plus some other useful mathematical
operations. The following sections describe these functions: sin, cos, atan2, sqrt, exp, log, rand,
and srand.
The sin and cos functions are passed a scalar
value and return the sine and cosine, respectively, of the value.
The syntax of the sin and cos functions is
retval = sin (value);
retval = cos (value);
value is a placeholder here. It can be the value
stored in a scalar variable or the result of an expression; it is
assumed to be in radians. See the following section, "The atan2
Function," to find out how to convert from degrees to
radians.
The atan2 function calculates and returns the
arctangent of one value divided by another, in the range
-π to π.
The syntax of the atan2 function is
retval = atan2 (value1, value2);
If value1 and value2 are equal and positive, retval
is the value of π divided by 4.
Listing 13.6 shows how you can use this to convert from
degrees to radians.
Listing 13.6. A program that contains
a subroutine that converts from degrees to radians.
1: #!/usr/local/bin/perl
2:
3: $rad90 = &degrees_to_radians(90);
4: $sin90 = sin($rad90);
5: $cos90 = cos($rad90);
6: print ("90 degrees:\nsine is $sin90\ncosine is $cos90\n");
7:
8: sub degrees_to_radians {
9: local ($degrees) = @_;
10: local ($radians);
11:
12: $radians = atan2(1,1) * $degrees / 45;
13: }
$ program13_6
90 degrees:
sine is 1
cosine is 6.1230317691118962911e-17
$
The subroutine degrees_to_radians converts from degrees
to radians by multiplying by π divided by 180. Because atan2(1,1)
returns π divided by 4, all the subroutine needs to do after
that is divide by 45 to obtain the number of radians.
In the main body of the program, line 3 converts 90 degrees to
the equivalent value in radians (π divided by 2). Line 4 then passes
this value to sin, and line 5 passes it to cos.
Note: The trigonometric operations provided here are sufficient to enable you to perform the other important trigonometric operations. For example, to obtain the tangent of a value, obtain the sine and cosine of the value by calling sin and cos, and then divide the sine by the cosine.
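The tangent calculation just described might be sketched as follows. The subroutine name tangent is hypothetical, and the value of π is obtained from atan2 in the same way as in Listing 13.6:

```perl
use strict;
use warnings;

my $pi = 4 * atan2(1, 1);

# tangent is a hypothetical helper; the result is not meaningful
# for angles whose cosine is zero
sub tangent {
    my ($angle) = @_;
    return sin($angle) / cos($angle);
}

my $tan45 = tangent($pi / 4);
print "tangent of 45 degrees is approximately $tan45\n";
```

Because of round-off error, the result is very close to 1 but is not guaranteed to be exactly 1.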
The sqrt function returns the square root of the value
it is passed.
The syntax for the sqrt function is
retval = sqrt (value);
value can be any nonnegative number.
The exp function returns the number e ** value,
where e is the standard mathematical constant (the base
for the natural logarithm) and value is the
argument passed to exp.
The syntax for the exp function is
retval = exp (value);
To retrieve e itself, pass exp the value 1.
The log function takes a value and returns the natural
(base e) logarithm of the value.
The syntax for the log function is
retval = log (value);
The log function undoes exp; the expression
$var = log (exp ($var));
always leaves $var with the value it started with (if
you factor in round-off error).
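The following sketch shows both points: retrieving e with exp, and measuring the round-off error left by log(exp()). The starting value 2.5 is arbitrary, and abs (described in the following section) is not defined in Perl 4:

```perl
use strict;
use warnings;

my $e          = exp(1);             # the constant e
my $start      = 2.5;
my $round_trip = log(exp($start));   # back to 2.5, within round-off error
my $difference = abs($round_trip - $start);

print "e is approximately $e\n";
print "round-off difference: $difference\n";
```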
The abs function returns the absolute value of a
number. This is defined as follows: if a value is less than zero, abs
negates it and returns the result.
$result = abs(-3.5); # returns 3.5
Otherwise, the result is identical to the value:
$result = abs(3.5); # returns 3.5
$result = abs(0); # returns 0
The syntax for the abs function is
retval = abs (value);
value can be any number.
Note: abs is not defined in Perl 4.
The rand and srand functions enable Perl
programs to generate random numbers.
The rand function is passed an integer value and
generates a random floating-point number between 0 and the value.
The syntax for the rand function is
retval = rand (num);
num is the integer value passed to rand,
and retval is a random floating-point number
between 0 and num (it can be 0, but it is always less than num).
For example, the following statement generates a number
between 0 and 10 and returns it in $retval:
$retval = rand (10);
srand initializes the random-number generator used by rand.
This ensures that the random numbers generated are, in fact, random.
(If you do not use srand, you'll get the same set of random
numbers each time.)
The syntax for the srand function is
srand (value);
srand accepts an integer value as an argument; if no
argument is supplied, srand calls the time function
and uses its return value as the random-number seed.
For an example that uses rand and srand, see the
section titled "Returning a Value from a Subroutine" on
Day 9.
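As a brief sketch, the following program seeds the generator with a fixed value twice to show that the same seed produces the same numbers. The seed 42 and the helper roll_die are chosen here for illustration. In a real program, you normally would pick a seed that varies from run to run:

```perl
use strict;
use warnings;

# roll_die is a hypothetical helper: a whole number from 1 to 6
sub roll_die { return int(rand(6)) + 1; }

srand(42);                                   # fixed seed
my @first  = map { roll_die() } (1 .. 5);
srand(42);                                   # same seed again
my @second = map { roll_die() } (1 .. 5);

print "first run:  @first\n";
print "second run: @second\n";
```

Both lines show the same five numbers, because rand restarts the same sequence whenever srand is given the same seed.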
Tip: The following values and functions return numbers that can make useful random-number seeds:
For best results, combine two or more of these using the | (bitwise "or") operator.
This section describes the built-in Perl functions that
manipulate character strings. These functions enable you to do
the following:
The index function provides a way of indicating the
location of a substring in a string.
The syntax for the index function is
position = index (string, substring);
string is the character string to search in, and substring
is the character string being searched for. position
returns the number of characters skipped before substring
is located; if substring is not found, position is
set to -1.
Listing 13.7 is a program that uses index to locate a
substring in a string.
Listing 13.7. A program that uses the index
function.
1: #!/usr/local/bin/perl
2:
3: $input = <STDIN>;
4: $position = index($input, "the");
5: if ($position >= 0) {
6: print ("pattern found at position $position\n");
7: } else {
8: print ("pattern not found\n");
9: }
$ program13_7
Here is the input line I have typed.
pattern found at position 8
$
This program searches for the first occurrence of the word the.
If it is found, the program prints the location of the pattern;
if it is not found, the program prints pattern not found.
You can use the index function to find more than one
copy of a substring in a string. To do this, pass a third
argument to index, which tells it how many characters to
skip before starting to search. For example:
$position = index($line, "foo", 5);
This call to index skips five characters before
starting to search for foo in the string stored in $line. As
before, if index finds the substring, it returns the total
number of characters skipped (including the number specified by
the third argument to index). If index does not
find the substring in the portion of the string that it searches,
it returns -1.
This feature of index enables you to find all
occurrences of a substring in a string. Listing 13.8 is a modified
version of Listing 13.7 that searches for all occurrences of the
in an input line.
Listing 13.8. A program that uses index
to search a line repeatedly.
1: #!/usr/local/bin/perl
2:
3: $input = <STDIN>;
4: $position = $found = 0;
5: while (1) {
6: $position = index($input, "the", $position);
7: last if ($position == -1);
8: if ($found == 0) {
9: $found = 1;
10: print ("pattern found characters skipped:");
11: }
12: print (" $position");
13: $position++;
14: }
15: if ($found == 0) {
16: print ("pattern not found\n");
17: } else {
18: print ("\n");
19: }
$ program13_8
Here is the test line containing the words.
pattern found characters skipped: 8 33
$
Line 6 of this program calls index. Because the initial
value of $position is 0, the first call to index
starts searching from the beginning of the string. Eight
characters are skipped before the first occurrence of the
is found; this means that $position is assigned 8.
Line 7 tests whether a match has been found by comparing $position
with -1, which is the value index returns when
it does not find the string for which it is looking. Because a
match has been found, the loop continues to execute.
When the loop iterates again, line 6 calls index again.
This time, index skips nine characters before beginning
the search again, which ensures that the previously found
occurrence of the is skipped. A total of 33 characters are
skipped before the is found again. Once again, the loop
continues, because the conditional expression in line 7 is false.
On the final iteration of the loop, line 6 calls index
and skips 34 characters before starting the search. This time, the
is not found, index returns -1, and the
conditional expression in line 7 is true. At this point, the loop
terminates.
Note: To extract a substring found by index, use the substr function, which is described later in today's lesson.
The rindex function is similar to the index
function. The only difference is that rindex starts searching from
the right end of the string, not the left.
The syntax for the rindex function is
position = rindex (string, substring);
This syntax is identical to the syntax for index. string
is the character string to search in, and substring
is the character string being searched for. position
returns the number of characters skipped before substring
is located; if substring is not found, position
is set to -1.
The following is an example:
$string = "Here is the test line containing the words.";
$position = rindex($string, "the");
In this example, rindex finds the second occurrence of the.
As with index, rindex returns the number of
characters between the left end of the string and the location of
the found substring. In this case, 33 characters are skipped, and $position
is assigned 33.
You can specify a third argument to rindex, indicating
the maximum number of characters that can be skipped. For
example, if you want rindex to find the first occurrence
of the in the preceding example, you can call it as
follows:
$string = "Here is the test line containing the words.";
$position = rindex($string, "the", 32);
Here, the second occurrence of the cannot be matched,
because it is to the right of the specified limit of 32 skipped characters. rindex,
therefore, finds the first occurrence of the. Because
there are eight characters between the beginning of the string
and the occurrence, $position is assigned 8.
Like index, rindex returns -1 if it
cannot find the string it is looking for.
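The following short sketch gathers the calls discussed in this section and the preceding one into a single program, using the same test string. The variable names are chosen here for illustration:

```perl
use strict;
use warnings;

my $string = "Here is the test line containing the words.";

my $first   = index($string, "the");        # first occurrence from the left
my $next    = index($string, "the", 9);     # skip nine characters first
my $last    = rindex($string, "the");       # first occurrence from the right
my $limited = rindex($string, "the", 32);   # skip at most 32 characters
my $missing = index($string, "xyz");        # substring does not occur

print "$first $next $last $limited $missing\n";
```

This prints 8 33 33 8 -1.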
The length function returns the number of characters
contained in a character string.
The syntax for the length function is
num = length (string);
string is the character string for which you
want to determine the length, and num is the
returned length.
Here is an example using length:
$string = "Here is a string";
$strlen = length($string);
In this example, length determines that the string
in $string is 16 characters long, and it assigns 16 to $strlen.
Listing 13.9 is a program that calculates the average word
length used in an input file. (This is sometimes used to
determine the "complexity" of the text.) Numbers are
skipped.
Listing 13.9. A program that
demonstrates the use of length.
1: #!/usr/local/bin/perl
2:
3: $wordcount = $charcount = 0;
4: while ($line = <STDIN>) {
5: @words = split(/\s+/, $line);
6: foreach $word (@words) {
7: next if ($word =~ /^\d+\.?\d*$/);
8: $word =~ s/[,.;:]$//;
9: $wordcount += 1;
10: $charcount += length($word);
11: }
12: }
13: print ("Average word length: ", $charcount / $wordcount, "\n");
$ program13_9
Here is the test input.
Here is the last line.
^D
Average word length: 3.5
$
This program reads a line of input at a time from the standard
input file, breaking the input line into words. Line 7 tests whether
the word is a number, and skips it if it is. Line 8 strips any trailing
punctuation character from the word, which ensures that the
punctuation is not counted as part of the word length.
Line 10 calls length to retrieve the number of
characters in the word. This number is added to $charcount,
which contains the total number of characters in all of the words
that have been read so far. To determine the average word length
of the file, line 13 takes this value and divides it by the number
of words in the file, which is stored in $wordcount.
The tr operator provides another way of determining the
length of a character string, in conjunction with the built-in
system variable $_.
The syntax for the tr operator is
tr/sourcelist/replacelist/
sourcelist is the list of characters to replace,
and replacelist is the list of characters to
replace with. (For details, see the following listing and the
explanation provided with it.)
Listing 13.10 shows how tr works.
Listing 13.10. A program that uses tr
to retrieve the length of a string.
1: #!/usr/local/bin/perl
2:
3: $string = "here is a string";
4: $_ = $string;
5: $length = tr/a-zA-Z /a-zA-Z /;
6: print ("the string is $length characters long\n");
$ program13_10
the string is 16 characters long
$
Line 3 of this program assigns the string here is a
string to the scalar variable $string.
Line 4 copies this string into a built-in scalar variable, $_.
Line 5 exploits two features of the tr operator that
have not yet been discussed: when the =~ operator is not used, tr
operates on the contents of $_, and tr returns the number of
characters it translates.
In line 5, both the search pattern (the set of characters to
look for) and the replacement pattern (the characters to replace them
with) are the same. This pattern, /a-zA-Z /, tells tr
to search for all lowercase letters, uppercase letters, and blank spaces,
and then replace them with themselves. This pattern matches every
character in the string, which means that every character is
being translated.
Because every character is being translated, the number of
characters translated is equivalent to the length of the string.
This string length is assigned to the scalar variable $length.
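One caution is worth illustrating: tr counts only the characters named in its search list, so the result equals the string length only when every character in the string is listed. The sketch below uses the =~ operator to apply tr directly to a variable and leaves the replacement list empty, which makes tr count without changing anything. The second test string is invented for the example.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $string    = "here is a string";
my $by_tr     = ($string =~ tr/a-zA-Z //);   # empty replacement list: count only
my $by_length = length($string);

my $mixed = "route 66, arizona";             # illustrative data with digits and a comma
my $letters_and_blanks = ($mixed =~ tr/a-zA-Z //);
my $mixed_length       = length($mixed);

print ("tr counted $by_tr, length returned $by_length\n");
print ("tr counted $letters_and_blanks, length returned $mixed_length\n");
```

For the second string the two numbers differ, because the digits and the comma are not in the search list. When you simply want the length, length is the safer choice.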
tr can be used also to count the number of occurrences
of a specific character, as shown in Listing 13.11.
Listing 13.11. A program that uses tr
to count the occurrences of specific characters.
1: #!/usr/local/bin/perl
2:
3: $punctuation = $blanks = $total = 0;
4: while ($input = <STDIN>) {
5: chop ($input);
6: $total += length($input);
7: $_ = $input;
8: $punctuation += tr/,:;.-/,:;.-/;
9: $blanks += tr/ / /;
10: }
11: print ("In this file, there are:\n");
12: print ("\t$punctuation punctuation characters,\n");
13: print ("\t$blanks blank characters,\n");
14: print ("\t", $total - $punctuation - $blanks);
15: print (" other characters.\n");
$ program13_11
Here is a line of input.
This line, another line, contains punctuation.
^D
In this file, there are:
4 punctuation characters,
10 blank characters,
56 other characters.
$
This program uses the scalar variable $total and the
built-in function length to count the total number of
characters in the input file (excluding the trailing newline
characters, which are removed by the call to chop in line
5).
Lines 8 and 9 use tr to count the number of occurrences
of particular characters. Line 8 replaces all punctuation
characters with themselves; the number of replacements performed,
and hence the number of punctuation characters found, is added to the
total stored in $punctuation. Similarly, line 9 replaces
all blanks with themselves and adds the number of blanks found to the
total stored in $blanks. In both cases, tr operates
on the contents of the scalar variable $_, because the =~ operator
has not been used to specify another value to translate.
Line 14 uses $total, $punctuation, and $blanks
to calculate the total number of characters that are not blank
and not punctuation.
Note: Many other functions and operators accept $_ as the default variable on which to work. For example, lines 4-7 of this program also can be written as follows:
while (<STDIN>) {
chop();
$total += length();
For more information on $_, refer to Day 17, "System Variables."
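As a rough illustration of how much $_ can absorb, here is a sketch of the counting loop from Listing 13.11 that never names its input variable. It assumes the two sample input lines are held in an array instead of being read from STDIN, and it uses chomp (a Perl 5 function that removes only a trailing newline) in place of chop.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# The sample input from Listing 13.11, stored in an array for this sketch
my @lines = ("Here is a line of input.\n",
             "This line, another line, contains punctuation.\n");
my ($total, $punctuation, $blanks) = (0, 0, 0);

foreach (@lines) {                 # each element is available as $_
    chomp;                         # removes the trailing newline from $_
    $total       += length;        # length of $_
    $punctuation += tr/,:;.-//;    # counts punctuation in $_
    $blanks      += tr/ //;        # counts blanks in $_
}
my $other = $total - $punctuation - $blanks;
print ("$punctuation punctuation, $blanks blank, $other other\n");
```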
The pos function, defined only in Perl 5, returns the
location at which the last pattern match on a string ended. It is ideal for
use when repeated pattern matches are specified using the g
(global) pattern-matching option.
The syntax for the pos function is
offset = pos(string);
string is the string whose pattern is being
matched. offset is the number of characters already matched
or skipped.
Listing 13.12 illustrates the use of pos.
Listing 13.12. A program that uses pos
to display pattern match positions.
1: #!/usr/local/bin/perl
2:
3: $string = "Mississippi";
4: while ($string =~ /i/g) {
5: $position = pos($string);
6: print("matched at position $position\n");
7: }
$ program13_12
matched at position 2
matched at position 5
matched at position 8
matched at position 11
This program loops every time an i in Mississippi
is matched. The number displayed by line 6 is the number of characters
to skip to reach the point at which pattern matching resumes. For example,
the first i is the second character in the string, so the second
pattern search starts at position 2.
Note: You can also use pos to change the position at which pattern matching is to resume. To do this, put the call to pos on the left side of an assignment:
pos($string) = 5;
This tells the Perl interpreter to start the next pattern search with the sixth character in the string. (To restart searching from the beginning, use 0.)
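A short sketch of this assignment in action follows. It reuses the Mississippi string and collects the positions in an array (the array name @positions is chosen for the example):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $string = "Mississippi";
my @positions;

pos($string) = 5;                  # the next search starts at the sixth character
while ($string =~ /i/g) {
    push (@positions, pos($string));
}
print ("matches ended at: @positions\n");
```

Because the search starts after the first five characters, the first two occurrences of i are never examined, and only positions 8 and 11 are reported.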
The substr function lets you assign a part of a
character string to a scalar variable (or to a component of an
array variable).
The syntax for calls to the substr function is
substr (expr, skipchars, length)
expr is the character string from which a
substring is to be copied; this character string can be the value
stored in a variable or the value resulting from the evaluation
of an expression. skipchars is the number of
characters to skip before starting copying. length
is the number of characters to copy; length can be
omitted, in which case the rest of the string is copied.
Listing 13.13 provides a simple example of substr.
Listing 13.13. A program that
demonstrates the use of substr.
1: #!/usr/local/bin/perl
2:
3: $string = "This is a sample character string";
4: $sub1 = substr ($string, 10, 6);
5: $sub2 = substr ($string, 17);
6: print ("\$sub1 is \"$sub1\"\n\$sub2 is \"$sub2\"\n");
$ program13_13
$sub1 is "sample"
$sub2 is "character string"
$
Line 4 calls substr, which copies a portion of the
string stored in $string. This call specifies that ten
characters are to be skipped before copying starts, and that a
total of six characters are to be copied. This means that the
substring sample is copied and stored in $sub1.
Line 5 is another call to substr. Here, 17 characters
are skipped. Because the length field is omitted, substr
copies the remaining characters in the string. This means that
the substring character string is copied and stored in $sub2.
Note that lines 4 and 5 do not change the contents of $string.
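Extraction by position is most useful when the data has a fixed layout. The following sketch assumes a made-up record in which a date occupies the first ten characters; the record and the variable names are illustrative only.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $record = "1996-03-15 Perl";     # hypothetical fixed-width record
my $year   = substr($record, 0, 4); # skip nothing, copy four characters
my $month  = substr($record, 5, 2); # skip five characters, copy two
my $day    = substr($record, 8, 2); # skip eight characters, copy two
my $rest   = substr($record, 11);   # length omitted: copy the remainder

print ("year $year, month $month, day $day, text \"$rest\"\n");
```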
In Listing 13.13, which you've just seen, calls to substr
appear to the right of the assignment operator =. This
means that the return value from substr--the extracted
substring--is assigned to the variable appearing to the left of
the =.
Calls to substr can appear also on the left of the
assignment operator =. In this case, the portion of the
string specified by substr is replaced by the value
appearing to the right of the assignment operator.
The syntax for these calls to substr is basically the
same as before:
substr (expr, skipchars, length) = newval;
Here, expr must be something that can be
assigned to--for example, a scalar variable or an element of an
array variable. skipchars represents the number of
characters to skip before beginning the overwriting operation,
which cannot be greater than the length of the string. length
is the number of characters to be replaced by the overwriting
operation. If length is not specified, the remainder
of the string is replaced.
newval is the string that replaces the substring
specified by skipchars and length. If newval
is larger than length, the character string
automatically grows to hold it, and the rest of the string is pushed
aside (but not overwritten). If newval is smaller
than length, the character string automatically
shrinks. Basically, everything appears where it is supposed to
without you having to worry about it.
Note: By the way, things that can be assigned to are sometimes known as lvalues, because they appear to the left of assignment statements (the l in lvalue stands for "left"). Things that appear to the right of assignment statements are, similarly, called rvalues.
This book does not use the terms lvalue and rvalue, but you might find that knowing them will prove useful when you read other books on programming languages.
Listing 13.14 is an example of a program that uses substr
to replace portions of a string.
Listing 13.14. A program that
replaces parts of a string using substr.
1: #!/usr/local/bin/perl
2:
3: $string = "Here is a sample character string";
4: substr($string, 0, 4) = "This";
5: substr($string, 8, 1) = "the";
6: substr($string, 19) = "string";
7: substr($string, -1, 1) = "g.";
8: substr($string, 0, 0) = "Behold! ";
9: print ("$string\n");
$ program13_14
Behold! This is the sample string.
$
This program illustrates the many ways you can use substr
to replace portions of a string.
The call to substr in line 4 specifies that no
characters are to be skipped before overwriting, and that four
characters in the original string are to be overwritten. This
means that the substring Here is replaced by This,
and that the following is the new value of the string stored in $string:
This is a sample character string
Similarly, the call to substr in line 5 specifies that
eight characters are to be skipped and one character is to be
replaced. This means that the word a is replaced by the.
Now, $string contains the following:
This is the sample character string
Note that the character string is now larger than the
original, because the new substring, the, is larger than
the substring it replaced.
Line 6 is an example of a call to substr that shrinks
the string. Here, 19 characters are skipped, and the rest of the
string is replaced by the substring string (because no length
field has been specified). Now, the following is the value stored
in $string:
This is the sample string
In line 7, the call to substr is passed -1
in the skipchars field and is passed 1 in
the length field. This tells substr to
replace the last character of the string with the substring g.
(g followed by a period). $string now contains
This is the sample string.
Note: If substr is passed a skipchars value of -n, where n is a positive integer, substr skips to n characters from the right end of the string. For example, the following call replaces the last two characters in $string with the string hello:
substr($string, -2, 2) = "hello";
Finally, line 8 specifies that no characters are to be skipped
and no characters are to be replaced. This means that the substring "Behold! "
(including a trailing space) is added to the front of
the existing string and that $string now contains the following:
Behold! This is the sample string.
Line 9 prints this final value of $string.
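A negative skipchars value, as used in line 7, counts from the right end of the string, and it works both when assigning and when extracting. The following sketch starts from the intermediate value of the string and uses an illustrative variable named $tail:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $string = "This is the sample string";
substr($string, -2, 2) = "hello";    # replaces the last two characters, "ng"
my $tail = substr($string, -5);      # copies the last five characters

print ("$string\n$tail\n");
```

The string grows by three characters here, because a five-character value replaces a two-character substring.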
Tip: If you are a C programmer and are used to manipulating strings using pointers, note that substr with a length field of 1 can be used to simulate pointer-like behavior in Perl.
For example, you can simulate the C statement
ch = *str++;
as follows in Perl:
$char = substr($str, $offset++, 1);
You'll need to define a counter variable (such as $offset) to keep track of where you are in the string. However, this is no more of a chore than remembering to initialize your C pointer variable.
You can simulate the following C statement:
*str++ = ch;
by assigning values using substr in the same way:
substr($str, $offset++, 1) = $char;
You shouldn't use substr in this way unless you really have to. Perl supplies more powerful and useful tools, such as pattern matching and substitution, to get the job done more efficiently.
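For completeness, here is a sketch of both pointer-style idioms in a loop. The string and the array @chars are invented for the example, and the loop condition plays the role of the test for the terminating null character in C.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $str    = "Perl";                 # illustrative string
my $offset = 0;
my @chars;

# Read one character at a time, advancing the "pointer" each time
while ($offset < length($str)) {
    my $char = substr($str, $offset++, 1);
    push (@chars, $char);
}

# Write through the "pointer": overwrite every character with an asterisk
$offset = 0;
while ($offset < length($str)) {
    substr($str, $offset++, 1) = "*";
}

print ("read: @chars\nstring is now: $str\n");
```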
The study function is a special function that tells the
Perl interpreter that the specified scalar variable is about to
be searched many times.
The syntax for the study function is
study (scalar);
scalar is the scalar variable to be
"studied." The Perl interpreter takes the value stored
in the specified scalar variable and represents it in an internal
format that allows faster access.
For example
study ($myvar);
Here, the value stored in the scalar variable $myvar is
about to be repeatedly searched.
You can call study for only one scalar variable at a
time. Previous calls to study are superseded if study
is called again.
Tip: To check whether study actually makes your program more efficient, use the function times, which returns the user and system CPU times consumed so far by your program; call it before and after a program fragment and compare the results. (times is discussed earlier today.)
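A rough sketch of such a measurement follows. The test string, the loop, and the variable names are assumptions made for the example; the timings depend entirely on your machine. Note also that the documentation for recent Perl releases describes study as doing nothing, so on those versions you should not expect a difference.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $text = "the quick brown fox " x 1000;    # illustrative data to search
my ($user_before, $system_before) = times;   # CPU seconds used so far

study ($text);
my $count = 0;
foreach my $letter ("a" .. "z") {
    # count every occurrence of this letter in the text
    $count++ while ($text =~ /$letter/g);
}

my ($user_after, $system_after) = times;
printf ("%d matches, user time %.2f, system time %.2f\n", $count,
        $user_after - $user_before, $system_after - $system_before);
```

To judge the effect of study, run the fragment once with the call to study and once without it, and compare the reported times.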
Perl 5 provides functions that perform case conversion on
strings. These are lc, uc, lcfirst, and ucfirst.
The syntax for the lc and uc functions is
retval = lc(string);
retval = uc(string);
string is the string to be converted. retval
is a copy of the string, converted to either lowercase or uppercase:
$lower = lc("aBcDe"); # $lower is assigned "abcde"
$upper = uc("aBcDe"); # $upper is assigned "ABCDE"
The syntax for the lcfirst and ucfirst functions
is
retval = lcfirst(string);
retval = ucfirst(string);
string is the string whose first character is to
be converted. retval is a copy of the string, with
the first character converted to either lowercase or uppercase:
$lower = lcfirst("HELLO"); # $lower is assigned "hELLO"
$upper = ucfirst("hello"); # $upper is assigned "Hello"
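These functions combine naturally. The sketch below, which assumes an input string with inconsistent capitalization, lowercases each word and then capitalizes its first letter:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @words = split (/ /, "mISSISSIPPI rIVER");   # illustrative input
foreach my $word (@words) {
    $word = ucfirst(lc($word));    # lowercase everything, then raise the first letter
}
my $title = join (" ", @words);
print ("$title\n");
```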
The quotemeta function, defined only in Perl 5, places
a backslash character in front of any non-word character in a
string. The following statements are equivalent:
$string = quotemeta($string);
$string =~ s/(\W)/\\$1/g;
The syntax for quotemeta is
newstring = quotemeta(oldstring);
oldstring is the string to be converted. newstring
is the string with backslashes added.
quotemeta is useful when a string is to be used in a
subsequent pattern-matching operation. It ensures that no
character in the string is treated as a special pattern-matching character.
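The difference shows up as soon as the string contains a special character such as +. In this sketch the strings and variable names are chosen for illustration:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $literal = "1+1";
my $quoted  = quotemeta($literal);    # a backslash is placed in front of the +

# Unquoted, + means "one or more of the preceding character",
# so the pattern looks for two or more consecutive 1s.
my $plain_match  = ("1+1=2" =~ /$literal/) ? 1 : 0;
my $quoted_match = ("1+1=2" =~ /$quoted/)  ? 1 : 0;

print ("quoted pattern is $quoted\n");
print ("plain: $plain_match, quoted: $quoted_match\n");
```

Only the quoted pattern matches, because only it searches for the three characters 1+1 exactly as written.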
The join function has been used many times in this
book. It takes the elements of a list and converts them into a
single character string.
The syntax for the join function is
join (joinstr, list);
joinstr is the character string that is to be
used to glue the elements of list together.
For example
@list = ("Here", "is", "a", "list");
$newstr = join ("::", @list);
After join is called, the value stored in $newstr
becomes the following string:
Here::is::a::list
The join string, :: in this case, appears between each
pair of joined elements. The most common join string is a single
blank space; however, you can use any value as the join string,
including the value resulting from an expression.
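Both cases can be sketched in a few lines. The variable names and the extra list element are assumptions made for the example:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @list   = ("Here", "is", "a", "list");
my $spaced = join (" ", @list);             # the most common join string
my $dashed = join ("-" x 2, @list, "too");  # the join string is an expression

print ("$spaced\n$dashed\n");
```

Note that the list passed to join can itself be built from several parts, as with @list and "too" in the second call.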
The sprintf function behaves like the printf
function defined on Day 11, "Formatting Your Output," except
that the formatted string is returned by the function instead of
being written to a file. This enables you to assign the string to
another variable.
The syntax for the sprintf function is
sprintf (string, fields);
string is the character string to print, and fields
is a list of values to substitute into the string.
Listing 13.15 is an example that uses sprintf to build
a string.
Listing 13.15. A program that uses sprintf.
1: #!/usr/local/bin/perl
2:
3: $num = 26;
4: $outstr = sprintf("%d = %x hexadecimal or %o octal\n",
5: $num, $num, $num);
6: print ($outstr);
$ program13_15
26 = 1a hexadecimal or 32 octal
$
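Lines 4 and 5 take three copies of the value stored in $num
and include them as part of a string. The field specifiers %d, %x, and %o
indicate how the values are to be formatted.
%d Indicates an integer displayed in decimal (base-10) format
%x Indicates an integer displayed in hexadecimal (base-16) format
%o Indicates an integer displayed in octal (base-8) format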
The created string is returned by sprintf. Once it has
been created, it behaves just like any other Perl character
string; in particular, it can be assigned to a scalar variable,
as in this example. Here, the string containing the three copies
of $num is assigned to the scalar variable $outstr.
Line 6 then prints this string.
Note: For more information on field specifiers or on how printf works, refer to Day 11, which lists the field specifiers defined and provides a description of the syntax of printf.
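Because the result is an ordinary string, sprintf is a convenient way to pad or round values before storing them. The following sketch assumes the width and precision forms of the field specifiers; the values and variable names are illustrative.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $padded  = sprintf ("%05d", 26);              # pad with zeros to five digits
my $rounded = sprintf ("%.2f", 3.14159);         # keep two decimal places
my $columns = sprintf ("%-6s|%6s|", "ab", "cd"); # left- and right-justified fields

print ("$padded\n$rounded\n$columns\n");
```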
Today, you learned about three types of built-in Perl
functions: functions that handle process and program control,
functions that perform mathematical operations, and functions
that manipulate strings.
With the process- and program-control functions, you can start
new processes, stop the current program or other processes, or
temporarily halt the current program. You also can create a pipe that
sends data from one of your created processes to another.
With the functions that perform mathematical operations, you
can obtain the sine, cosine, and arctangent of a value. You also can
calculate the natural logarithm and square root of a value, or use the
value as an exponent of base e.
You also can generate random numbers and define the seed to
use when generating the numbers.
Functions that search character strings include index,
which searches for a substring starting from the left of a
string, and rindex, which searches for a substring
starting from the right of a string. You can retrieve the length
of a character string using length. By using the translate
operator tr in conjunction with the system variable $_,
you can count the number of occurrences of a particular character
or set of characters in a string. The pos function enables
you to determine or set the current pattern-matching location in
a string.
The function substr enables you to extract a substring
from a string and use it in an expression or assignment
statement. substr also can be used to replace a portion of
a string or to add text to the front or back end of the string.
The lc and uc functions convert strings to
lowercase or uppercase. To convert the first letter of a string
to lowercase or uppercase, use lcfirst or ucfirst.
quotemeta places a backslash in front of every non-word
character in a string.
You can create new character strings using join and sprintf. join
creates a string by joining elements of a list, and sprintf
builds a string using field specifiers that specify the string
format.
The Workshop provides quiz questions to help you solidify your
understanding of the material covered and exercises to give you
experience in using what you've learned. Try to understand the quiz
and exercise answers before you go on to tomorrow's lesson.
#!/usr/local/bin/perl
print ("Here is a line of output. ");
system ("w");
print ("Here is the rest of the line.\n");