Today's lesson shows you how to use subroutines to divide your
program into smaller, more manageable modules. Today, you learn
about the following:
In Perl, a subroutine is a separate body of code
designed to perform a particular task. A Perl program executes
this body of code by calling or invoking the subroutine; the act
of invoking a subroutine is called a subroutine invocation.
Subroutines serve two useful purposes:
Listing 9.1 shows how a subroutine works. This program calls a
subroutine that reads a line from the standard input file and breaks
it into numbers. The program then adds the numbers together.
Listing 9.1. A program that uses a
subroutine.
1: #!/usr/local/bin/perl
2:
3: $total = 0;
4: &getnumbers;
5: foreach $number (@numbers) {
6: $total += $number;
7: }
8: print ("the total is $total\n");
9:
10: sub getnumbers {
11: $line = <STDIN>;
12: $line =~ s/^\s+|\s*\n$//g;
13: @numbers = split(/\s+/, $line);
14: }
$ program9_1
11 8 16 4
the total is 39
$
Lines 10-14 are an example of a subroutine. The keyword sub
tells the Perl interpreter that this is a subroutine definition.
The getnumbers immediately following sub is the name of
the subroutine; the Perl program uses this name when invoking the subroutine.
The program starts execution in the normal way, beginning with
line 3. Line 4 invokes the subroutine getnumbers; the & character
tells the Perl interpreter that the following name is the name of
a subroutine. (This ensures that the Perl interpreter does not
confuse subroutine names with the names of scalar or array
variables.)
The Perl interpreter executes line 4 by jumping to the first
executable statement inside the subroutine, which is line 11. The interpreter
then executes lines 11-13.
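Lines 11-13 create the array @numbers as follows: line 11 reads a line of input from the standard input file; line 12 removes any leading white space and the trailing newline (along with any white space just before it); line 13 splits the line into numbers and assigns the resulting list to @numbers.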
Once line 13 is finished, the Perl interpreter jumps back to
the main program and executes the line immediately following the subroutine
call, which is line 5.
Lines 5-7 add the numbers together by using the foreach
statement to loop through the list stored in @numbers.
(Note that this program does not check whether a particular
element of @numbers actually consists of digits. Because
character strings that are not digits are converted to 0 in expressions,
this isn't a significant problem.)
The syntax for a subroutine definition is
sub subname {
statement_block
}
subname is a placeholder for the name of the
subroutine. Like all Perl names, subname consists of an
alphabetic character followed by zero or more letters, digits, or
underscores.
statement_block is the body of the subroutine
and consists of one or more Perl statements. Any statement that
can appear in the main part of a Perl program can appear in a
subroutine.
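As a minimal sketch of this syntax, the following defines one subroutine and invokes it twice. The name print_banner and the text it prints are made up for this illustration and do not come from the listings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# print_banner is a hypothetical subroutine name used only in this sketch.
sub print_banner {
    print ("----------\n");
    print ("  report\n");
    print ("----------\n");
}

&print_banner;    # first invocation
&print_banner;    # second invocation runs the same statement block again
```

The statement block is written once and runs each time the subroutine is invoked.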
Note: The Perl interpreter never confuses a subroutine name with a scalar variable name or any other name, because it can always tell from context which name you are referring to. This means that you can have a subroutine and a scalar variable with the same name. For example:
$word = 0;
&word;
Here, when the Perl interpreter sees the & character in the second statement, it realizes that the second statement is calling the subroutine named word.
Caution: When you are defining names for your subroutines, it's best not to use a name belonging to a built-in Perl function that you plan to use.
For example, you could, if you like, define a subroutine named split. The Perl interpreter can always distinguish an invocation of the subroutine split from an invocation of the library function split, because the name of the subroutine is preceded by a & when it is invoked, as follows:
@words = &split(1, 2); # subroutine
@words = split(/\s+/, $line); # library function
However, it's easy to leave off the & by mistake (especially if you are used to programming in C, where subroutine calls do not start with a &). To avoid such problems, use subroutine names that don't correspond to the names of library functions.
Perl subroutines can appear anywhere in a program, even in the
middle of a conditional statement. For example, Listing 9.2 is a
perfectly legal Perl program.
Listing 9.2. A program containing a
subroutine in the middle of the main program.
1: #!/usr/local/bin/perl
2:
3: while (1) {
4: &readaline;
5: last if ($line eq "");
6: sub readaline {
7: $line = <STDIN>;
8: }
9: print ($line);
10: }
11: print ("done\n");
$ program9_2
Here is a line of input.
Here is a line of input.
^D
done
$
This program just reads lines of input from the standard input
file and writes them straight back out to the standard output
file.
Line 4 calls the subroutine readaline. When you examine
this subroutine, which is contained in lines 6-8, you can see
that it reads a line of input and assigns it to the scalar
variable $line.
When readaline is finished, program execution continues
with line 5. When line 5 is executed, the program skips over the subroutine
definition and continues with line 9. The code inside the subroutine
is never directly executed, even if it appears in the middle of a
program; lines 6-8 can be executed only by a subroutine
invocation, such as is found in line 4.
Tip: Although subroutines can appear anywhere in a program, it usually is best to put all your subroutines at either the beginning of the program or the end. Following this practice makes your programs easier to read.
As you have seen, the Perl interpreter uses the &
character to indicate that a subroutine is being specified in a
statement. In Perl 5, you do not need to supply an &
character when calling a subroutine if you have already defined
the subroutine.
sub readaline {
$line = <STDIN>;
}
...
readaline;
Because the Perl interpreter already knows that readaline
is a subroutine, you don't need to specify the & when
calling it.
If you prefer to list all your subroutines at the end of your
program, you can still omit the & character provided
you supply a forward reference for your subroutine, as shown in
the following:
sub readaline; # forward reference
...
readaline;
...
sub readaline {
$line = <STDIN>;
}
The forward reference tells the Perl interpreter that readaline
is the name of a subroutine. This means that you no longer need to
supply the & when you call readaline.
Caution: Occasionally, calling a subroutine without specifying the & character might not behave the way you expect. If your program is behaving strangely, or you are not sure whether to use the & character, supply the & character with your call.
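The following sketch puts the forward reference and both calling styles into one runnable program. The subroutine name show_line and the counter $shown are hypothetical, and the my declaration and use strict line are assumptions of this sketch rather than something the text has covered at this point.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub show_line;                 # forward reference: the body appears further down

my $shown = 0;                 # hypothetical counter, used only in this sketch

show_line;                     # works without & because of the forward reference
&show_line;                    # the & form works with or without one

sub show_line {
    print ("show_line was called\n");
    $shown += 1;               # record that the subroutine ran
}
```

If the forward reference is removed, the call written without & is no longer recognized as a subroutine call, while the call written with & still is.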
Take another look at the getnumbers subroutine from
Listing 9.1.
sub getnumbers {
$line = <STDIN>;
$line =~ s/^\s+|\s*\n$//g;
@numbers = split(/\s+/, $line);
}
Although this subroutine is useful, it suffers from one
serious limitation: it overwrites any existing list stored in the
array variable @numbers (as well as any value stored in $line).
This overwriting can lead to problems. For
example, consider the following:
@numbers = ("the", "a", "an");
&getnumbers;
print ("The value of \@numbers is: @numbers\n");
When the subroutine getnumbers is invoked, the value of @numbers
is overwritten. If you just examine this portion of the program,
it is not obvious that this is what is happening.
To get around this problem, you can use a useful property of
subroutines in Perl: The value of the last expression evaluated
by the subroutine is automatically considered to be the
subroutine's return value.
For example, in the subroutine getnumbers from Listing
9.1, the last expression evaluated is
@numbers = split(/\s+/, $line);
The value of this expression is the list of numbers obtained
by splitting the line of input. This means that this list of
numbers is the return value for the subroutine.
To see how to use a subroutine return value, look at Listing
9.3, which modifies the program in Listing 9.1 to use the return value
from the subroutine getnumbers.
Listing 9.3. A program that uses a
subroutine return value.
1: #!/usr/local/bin/perl
2:
3: $total = 0;
4: @numbers = &getnumbers;
5: foreach $number (@numbers) {
6: $total += $number;
7: }
8: print ("the total is $total\n");
9:
10: sub getnumbers {
11: $line = <STDIN>;
12: $line =~ s/^\s+|\s*\n$//g;
13: split(/\s+/, $line); # this is the return value
14: }
$ program9_3
11 8 16 4
the total is 39
$
Line 4, once again, calls the subroutine getnumbers. As
before, the array variable @numbers is assigned the list
of numbers read from the standard input file; however, in this program,
the assignment is in the main body of the program, not in the subroutine.
This makes the program easier to read.
The only other difference between this program and Listing 9.1
is that the call to split in line 13 no longer assigns
anything to @numbers. In fact, it doesn't assign the list
returned by split to any variable at all, because it does
not need to. Line 13 is the last expression evaluated in getnumbers, so
it automatically becomes the return value from getnumbers.
Therefore, when line 4 calls getnumbers, the list returned
by split is assigned to the array variable @numbers.
Note: If the idea of evaluating an expression without assigning it confuses you, there's nothing wrong with creating a variable inside the subroutine just for the purpose of containing the return value. For example:
sub getnumbers {
$line = <STDIN>;
$line =~ s/^\s+|\s*\n$//g;
@retval = split(/\s+/, $line); # the return value
}
Here, it is obvious that the return value is the contents of @retval.
The only drawback to doing this is that assigning the list returned by split to @retval is slightly less efficient. In larger programs, such efficiency costs are worth it, because subroutines become much more comprehensible.
Using a special return variable also eliminates an entire class of errors, which you will see in "Return Values and Conditional Expressions," later today.
You can use a return value of a subroutine any place an
expression is expected. For example:
foreach $number (&getnumbers) {
print ("$number\n");
}
This foreach statement iterates on the list of numbers
returned by getnumbers. Each element of the list is
assigned to $number in turn, which means that this loop
prints all the numbers in the list, each on its own line.
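The sketch below shows a few more places where a return value can stand in for an expression. To keep it runnable without input, it uses fixed_numbers, a hypothetical stand-in for getnumbers that returns a fixed list; the variable names and the my declarations are likewise assumptions of this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fixed_numbers is a hypothetical stand-in for getnumbers that needs no input.
sub fixed_numbers {
    (11, 8, 16, 4);                # last expression evaluated: the return value
}

# Return value used as the list of a foreach statement.
my @doubled;
foreach my $number (&fixed_numbers) {
    push (@doubled, $number * 2);
}

# Return value assigned to a list of scalar variables.
my ($first, $second) = &fixed_numbers;

# Return value subscripted directly as a list.
my $last = (&fixed_numbers)[3];

print ("doubled: @doubled\n");
print ("first two: $first $second, last: $last\n");
```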
Listing 9.4 shows another example that uses the return value
of a subroutine in an expression. This time, the return value is used
as an array subscript.
Listing 9.4. A program that uses a
return value as an array subscript.
1: #!/usr/local/bin/perl
2:
3: srand();
4: print ("Random number tester.\n");
5: for ($count = 1; $count <= 100; $count++) {
6: $randnum[&intrand] += 1;
7: }
8: print ("Totals for the digits 0 through 9:\n");
9: print ("@randnum\n");
10:
11: sub intrand {
12: $num = int(rand(10));
13: }
$ program9_4
Random number tester.
Totals for the digits 0 through 9:
10 9 11 10 8 8 12 11 9 12
$
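This program uses the following three built-in functions:
srand Initializes the random-number generator
rand Generates a random (non-integral) number greater than or equal to 0 and less than the value passed to it
int Gets rid of the non-integer portion of a number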
The subroutine intrand first calls rand to get a
random number greater than or equal to 0 and less than 10. The return value
from rand is passed to int to remove the fractional
portion of the number; this means, for example, that 4.77135
becomes 4. This number becomes the return value returned by
intrand.
Line 6 calls intrand. The return value from intrand,
an integer between 0 and 9, serves as the subscript into the
array variable randnum. If the return value from intrand
is 7, $randnum[7] has its value increased by one.
As a consequence, at any given time, the nth
value of @randnum contains the number of occurrences of n
as a random number.
Line 9 prints out the number of occurrences of each of the 10
numbers. Each number should occur approximately the same number
of times (though not necessarily exactly the same number of
times).
Because the return value of a subroutine is always the last
expression evaluated, the return value might not always be what you
expect.
Consider the simple program in Listing 9.5. This program, like
the one in Listing 9.3, reads an input line, breaks it into numbers,
and adds the numbers. This program, however, attempts to do all
the work inside the subroutine get_total.
Listing 9.5. A program illustrating a
potential problem with return values from subroutines.
1: #!/usr/local/bin/perl
2:
3: $total = &get_total;
4: print("The total is $total\n");
5:
6: sub get_total {
7: $value = 0;
8: $inputline = <STDIN>;
9: $inputline =~ s/^\s+|\s*\n$//g;
10: @subwords = split(/\s+/, $inputline);
11: $index = 0;
12: while ($subwords[$index] ne "") {
13: $value += $subwords[$index++];
14: }
15: }
$ program9_5
11 8 16 4
The total is
$
Clearly, this program is supposed to assign the contents of
the scalar variable $value to the scalar variable $total. However, when
line 4 tries to print the total, you see that the value of $total
is actually the empty string. What has happened?
The problem is in the subroutine get_total. In get_total,
as in all other subroutines, the return value is the value of the
last expression evaluated. However, in get_total, the last
expression evaluated is not the last expression in the subroutine.
The last expression to be evaluated in get_total is the
conditional expression in line 12, which is
$subwords[$index] ne ""
The loop in lines 12-14 iterates until this expression is false.
In Perl, a false comparison yields the empty string (which counts as 0
in arithmetic), not the number 0. When the expression becomes false, the
loop terminates and so does the subroutine. The last expression evaluated
in the subroutine is therefore false, and the value assigned to $total is
the empty string. (Current Perl documentation describes the return value
as unspecified when the last statement of a subroutine is a loop, which is
another reason not to rely on it.) As a result, line 4 prints the
following, which isn't what the program is supposed to do:
The total is
Listing 9.6 shows how you can get around this problem.
Listing 9.6. A program that corrects
the problem that occurs in Listing 9.5.
1: #!/usr/local/bin/perl
2:
3: $total = &get_total;
4: print("The total is $total.\n");
5: sub get_total {
6: $value = 0;
7: $inputline = <STDIN>;
8: $inputline =~ s/^\s+|\s*\n$//g;
9: @subwords = split(/\s+/, $inputline);
10: $index = 0;
11: while ($subwords[$index] ne "") {
12: $value += $subwords[$index++];
13: }
14: $retval = $value;
15: }
$ program9_6
11 8 16 4
The total is 39.
$
This program is nearly identical to Listing 9.5; the significant
difference is that line 14 has been added. This line assigns the total stored
in $value to the scalar variable $retval.
Line 14 ensures that the value of the last expression
evaluated in the subroutine get_total is, in fact, the
total which is supposed to become the return value. This means
that line 3 now assigns the correct total to $total, which
in turn means that line 4 now prints the correct result.
Note that you don't really need to assign to $retval.
The subroutine get_total can just as easily be the following:
sub get_total {
$value = 0;
$inputline = <STDIN>;
$inputline =~ s/^\s+|\s*\n$//g;
@subwords = split(/\s+/, $inputline);
$index = 0;
while ($subwords[$index] ne "") {
$value += $subwords[$index++];
}
$value;
}
Here, the final expression evaluated by the subroutine is
simply $value. The value of this expression is the current
value stored in $value, which is the sum of the numbers in
the line.
Tip: Subroutines, such as get_total in Listing 9.6, which assign their return value at the very end are known as single-exit modules.
Single-exit modules avoid problems like those you saw in Listing 9.5, and they usually are much easier to read. For these reasons, it is a good idea to assign to the return value at the very end of the subroutine, unless there are overwhelming reasons not to do so.
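As a sketch of the single-exit style, the following variant of get_total ends with the expression whose value it means to return. It is not one of the chapter's listings: it takes its words from a fixed array instead of standard input, loops with foreach instead of an index, and uses my declarations, all of which are assumptions made so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @subwords = (11, 8, 16, 4);     # stands in for the split input line

# Hypothetical variant of get_total with its return value at the very end.
sub get_total {
    my $value = 0;
    foreach my $word (@subwords) {
        $value += $word;
    }
    $value;                        # last expression evaluated: the return value
}

my $total = &get_total;
print ("The total is $total.\n");
```

Because the loop is followed by the expression $value, the loop's own condition is no longer the last expression evaluated.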
Another way to ensure that the return value from a subroutine
is the value you want is to use the return statement.
The syntax for the return statement is
return (retval);
retval is the value you want your subroutine to
return. It can be either a scalar value (including the result of
an expression) or a list.
Listing 9.7 provides an example of the use of the return
statement.
Listing 9.7. A program that uses the return
statement.
1: #!/usr/local/bin/perl
2:
3: $total = &get_total;
4: if ($total eq "error") {
5: print ("No input supplied.\n");
6: } else {
7: print("The total is $total.\n");
8: }
9:
10: sub get_total {
11: $value = 0;
12: $inputline = <STDIN>;
13: $inputline =~ s/^\s+|\s*\n$//g;
14: if ($inputline eq "") {
15: return ("error");
16: }
17: @subwords = split(/\s+/, $inputline);
18: $index = 0;
19: while ($subwords[$index] ne "") {
20: $value += $subwords[$index++];
21: }
22: $retval = $value;
23: }
$ program9_7
^D
No input supplied.
$
This program is similar to the one in Listing 9.6. The only
difference is that this program checks whether an input line
exists.
If the input line does not exist, the conditional expression
in line 14 becomes true, and line 15 is executed. Line 15 exits
the subroutine with the return value error; this means
that error is assigned to $total in line 3.
This program shows why allowing scalar variables to store
either numbers or character strings is useful. When the
subroutine get_total detects the error, it can assign a
value that is not an integer to $total, which makes it
easier to determine that something has gone wrong. Other
programming languages, which only enable you to assign either a
number or a character string to a particular variable, do not
offer this flexibility.
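The same pattern, an early return for a special case followed by the normal return value, can be sketched without reading input. The subroutine first_negative, the array @values, and the string "none" are hypothetical names and values chosen for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = (4, 7, -2, 9, -5);    # sample data for this sketch

# Returns the first negative number in @values, or the string "none".
sub first_negative {
    foreach my $value (@values) {
        if ($value < 0) {
            return ($value);       # leaves the subroutine immediately
        }
    }
    return ("none");               # reached only if no negative number exists
}

my $found = &first_negative;       # -2

@values = (1, 2, 3);
my $missing = &first_negative;     # "none"

print ("found: $found, missing: $missing\n");
```

As in Listing 9.7, the caller can tell the two outcomes apart because one is a number and the other is a string.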
The subroutine get_total in Listing 9.7 defines several
variables that are used only inside the subroutine: the array
variable @subwords, and the four scalar variables $inputline, $value, $index, and $retval.
If you know for certain that these variables are going to be
used only inside the subroutine, you can tell Perl to define
these variables as local variables.
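In Perl 5, there are two statements used to define local
variables: the my statement, which defines variables that exist only
inside the subroutine, and the local statement, which defines variables
that do not exist in the main program but do exist in the subroutine
and in any subroutines it calls.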
In Perl 4, the my statement is not defined, so you must
use local to define a variable that is not known to the
main program.
Listing 9.8 shows how you can use my to define a
variable that exists only inside a subroutine.
Note: If you are using Perl 4, replace my with local in all the remaining examples in this chapter. For example, in Listing 9.8, replace my with local in lines 13 and 14, which produces
local ($total, $inputline, @subwords);
local ($index, $retval);
In Perl 5, my and local use the same syntax. The difference between them is that a variable created using my is known only inside the subroutine that defines it, whereas a variable created using local is also known to any subroutines called from that subroutine.
Listing 9.8. A program that uses
local variables.
1: #!/usr/local/bin/perl
2:
3: $total = 0;
4: while (1) {
5: $linetotal = &get_total;
6: last if ($linetotal eq "done");
7: print ("Total for this line: $linetotal\n");
8: $total += $linetotal;
9: }
10: print ("Total for all lines: $total\n");
11:
12: sub get_total {
13: my ($total, $inputline, @subwords);
14: my ($index, $retval);
15: $total = 0;
16: $inputline = <STDIN>;
17: if ($inputline eq "") {
18: return ("done");
19: }
20: $inputline =~ s/^\s+|\s*\n$//g;
21: @subwords = split(/\s+/, $inputline);
22: $index = 0;
23: while ($subwords[$index] ne "") {
24: $total += $subwords[$index++];
25: }
26: $retval = $total;
27: }
$ program9_8
11 8 16 4
Total for this line: 39
7 20 6 1
Total for this line: 34
^D
Total for all lines: 73
$
This program uses two copies of the scalar variable $total.
One copy of $total is defined in the main program and keeps
a running total of all of the numbers in all of the lines.
The scalar variable $total is also defined in the
subroutine get_total; in this subroutine, $total
refers to the total for a particular line, and line 13 defines it
as a local variable. Because this copy of $total is only
defined inside the subroutine, the copy of $total defined
in the main program is not affected by line 15 (which assigns 0
to $total).
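The difference between my and local shows up when a subroutine that declares a variable calls another subroutine. The following is a small sketch of that behavior; the names $level, show_level, with_local, and with_my are hypothetical, and the our declaration is used here only so that the global variable is accepted under use strict.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $level = "main";               # a global variable

sub show_level {
    return ($level);               # reads the global $level
}

sub with_local {
    local $level = "local";        # temporary value, seen by called subroutines
    return (&show_level);
}

sub with_my {
    my $level = "my";              # private variable, not seen by show_level
    return (&show_level);
}

my $seen_local = &with_local;      # show_level saw "local"
my $seen_my    = &with_my;         # show_level saw "main"
print ("local: $seen_local, my: $seen_my, afterward: $level\n");
```

After with_local returns, the global $level has its original value again, so neither subroutine leaves a lasting change in the main program.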
Caution: Because a local variable is not known outside the subroutine, the local variable is destroyed when the subroutine is completed. If the subroutine is called again, a new copy of the local variable is defined.
This means that the following code does not work:
sub subroutine_count {
my($number_of_calls);
$number_of_calls += 1;
}
This subroutine does not return the number of times subroutine_count has been called. Because a new copy of $number_of_calls is defined every time the subroutine is called, $number_of_calls is always assigned the value 1.
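One way to get a working call counter, sketched here as an alternative and not taken from the chapter, is to declare the variable with my in a block that encloses the subroutine. The variable is then created once, stays invisible to the rest of the program, and keeps its value between calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my $number_of_calls = 0;       # created once, when this block runs

    sub subroutine_count {
        $number_of_calls += 1;     # last expression evaluated: the new count
    }
}

&subroutine_count;
&subroutine_count;
my $calls = &subroutine_count;     # third call
print ("subroutine_count has been called $calls times\n");
```

The enclosing block must appear before the first call so that the counter is set to 0 before it is incremented.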
Local variables can appear anywhere in a program, provided
they are defined before they are used. It is good programming practice
to put all your local definitions at the beginning of your subroutine.
If you like, you can assign a value to a local variable when
you declare it. For example:
sub my_sub {
my($scalar) = 43;
my(@array) = ("here's", "a", "list");
# code goes here
}
Here, the local scalar variable $scalar is given an
initial value of 43, and the local array variable @array
is initialized to contain the list ("here's",
"a", "list").
You can make your subroutines more flexible by allowing them
to accept values passed from the main program; these values passed
from the main program are known as arguments.
Listing 9.9 provides a very simple example of a subroutine
that accepts three arguments.
Listing 9.9. A program that uses a
subroutine to print three numbers and their total.
1: #!/usr/local/bin/perl
2:
3: print ("Enter three numbers, one at a time:\n");
4: $number1 = <STDIN>;
5: chop ($number1);
6: $number2 = <STDIN>;
7: chop ($number2);
8: $number3 = <STDIN>;
9: chop ($number3);
10: &printnum ($number1, $number2, $number3);
11:
12: sub printnum {
13: my($number1, $number2, $number3) = @_;
14: my($total);
15: print ("The numbers you entered: ");
16: print ("$number1 $number2 $number3\n");
17: $total = $number1 + $number2 + $number3;
18: print ("The total: $total\n");
19: }
$ program9_9
Enter three numbers, one at a time:
5
11
4
The numbers you entered: 5 11 4
The total: 20
$
Line 10 calls the subroutine printnum. Three arguments
are passed to printnum: the value stored in $number1,
the value stored in $number2, and the value stored in $number3.
Note that arguments are passed to subroutines in the same way they
are passed to built-in library functions.
Line 13 defines local copies of the scalar variables $number1, $number2,
and $number3. It then assigns the contents of the system
variable @_ to these scalar variables. @_ is
created whenever a subroutine is called with arguments; it
contains a list consisting of the arguments in the order in which they
are passed. In this case, printnum is called with
arguments 5, 11, and 4, which means that @_ contains
the list (5, 11, 4).
The assignment in line 13 assigns the list to the local scalar
variables that have just been defined. This assignment works just like
any other assignment of a list to a set of scalar variables. The
first element of the list, 5, is assigned to the first
variable, $number1; the second element of the list, 11, is assigned
to $number2; and the final element, 4, is assigned
to $number3.
Note: After the array variable @_ has been created, it can be used anywhere any other array variable can be used. This means that you do not need to assign its contents to local variables.
The following subroutine is equivalent to the subroutine in lines 12-19 of Listing 9.9:
sub printnum {
my($total);
print ("The numbers you entered: ");
print ("$_[0] $_[1] $_[2]\n");
$total = $_[0] + $_[1] + $_[2];
print ("The total: $total\n");
}
Here, $_[0] refers to the first element of the array variable @_, $_[1] refers to the second element, and $_[2] refers to the third element.
This subroutine is a little more efficient, but it is harder to read.
Tip: It usually is better to define local variables and assign @_ to them because then your subroutines will be easier to understand.
Listing 9.10 is another example of a program that passes
arguments to a subroutine. This program uses the same subroutine
to count the number of words and the number of characters in a
file.
Listing 9.10. Another example of a
subroutine with arguments passed to it.
1: #!/usr/local/bin/perl
2:
3: $wordcount = $charcount = 0;
4: $charpattern = "";
5: $wordpattern = "\\s+";
6: while ($line = <STDIN>) {
7: $charcount += &count($line, $charpattern);
8: $line =~ s/^\s+|\s+$//g;
9: $wordcount += &count($line, $wordpattern);
10: }
11: print ("Totals: $wordcount words, $charcount characters\n");
12:
13: sub count {
14: my ($line, $pattern) = @_;
15: my ($count);
16: if ($pattern eq "") {
17: @items = split (//, $line);
18: } else {
19: @items = split (/$pattern/, $line);
20: }
21: $count = @items;
22: }
$ program9_10
This is a line of input.
Here is another line.
^D
Totals: 10 words, 47 characters
$
This program reads lines from the standard input file until
the file is exhausted. Each line has its characters counted and
its words counted.
Line 7 determines the number of characters in a line by
calling the subroutine count. This subroutine is passed
the line of input and the string stored in $charpattern,
which is the empty string. Inside the subroutine count,
the local variable $pattern receives the pattern passed to
it by the call in line 7. This means that the value stored in $pattern
is also the empty string.
Lines 16-20 split the input line. The pattern specified in the
call to split has the value stored in $pattern
substituted into it. Because $pattern currently contains
the empty string, the pattern used to split the line is //,
which splits the input line into individual characters. As a
result, each element of the resulting list stored in @items
is a character in the input line.
The total number of elements in the list--in other words, the
total number of characters in the input line--is assigned to $count by
line 21. Because this is the last expression evaluated in the subroutine,
the resulting total number of characters is returned by the
subroutine. Line 7 adds this total to the scalar variable $charcount.
Line 8 then removes the leading and trailing white space; this
white space is included in the total number of characters--because
spaces, tabs, and the trailing newline character count as characters--but
is not included when the line is broken into words.
Line 9 calls the subroutine count again, this time with
the pattern stored in $wordpattern, which is \s+.
(Recall that you need to use two backslashes in a string to
represent a single backslash, because the \ character is
the escape character in strings.) This value, representing one or
more whitespace characters, is assigned to $pattern inside
the subroutine, and the pattern passed to split therefore becomes /\s+/.
When split is called with this pattern, @items
is assigned a list of words. The total number of words in the
list is assigned to $count and is returned; line 9 adds
this returned value to the total number of words.
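To try count on its own, it can be called with a fixed string in place of a line read from standard input. The sketch below is a variant of the subroutine, not a copy: it declares @items with my so that the array is local as well, and it uses an explicit return statement. The sample line is the first input line from the run shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub count {
    my ($line, $pattern) = @_;
    my (@items);                   # local here, unlike in Listing 9.10
    if ($pattern eq "") {
        @items = split (//, $line);          # one element per character
    } else {
        @items = split (/$pattern/, $line);  # one element per word
    }
    return (scalar (@items));      # number of elements in @items
}

my $line  = "This is a line of input.\n";
my $chars = &count ($line, "");    # the newline counts as a character
$line =~ s/^\s+|\s+$//g;           # trim before counting words
my $words = &count ($line, "\\s+");
print ("$words words, $chars characters\n");
```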
If you like, you can pass a list to a subroutine. For example,
the following subroutine adds the elements of a list together and prints
the result:
sub addlist {
my (@list) = @_;
$total = 0;
foreach $item (@list) {
$total += $item;
}
print ("The total is $total\n");
}
To invoke this subroutine, pass it an array variable, a list,
or any combination of lists and scalar values.
&addlist (@mylist);
&addlist ("14", "6", "11");
&addlist ($value1, @sublist, $value2);
In each case, the values and lists supplied in the call to addlist
are merged into a single list and then passed to the subroutine.
Because values are merged into a single list when a list is
passed to a subroutine, you can only define one list as an
argument for a subroutine. The subroutine
sub twolists {
my (@list1, @list2) = @_;
}
isn't useful because @list1 absorbs all of the contents of @_,
which means that @list2 is always assigned the empty list.
This means that if you want to have both scalar variables and
a list as arguments to a subroutine, the list must appear last,
as follows:
sub twoargs {
my ($scalar, @list) = @_;
}
If you call this subroutine using
&twoargs(47, @mylist);
the value 47 is assigned to $scalar, and @mylist
is assigned to @list.
If you like, you can call twoargs with a single list,
as follows:
&twoargs(@mylist);
Here, the first element of @mylist is assigned to $scalar,
and the rest of @mylist is assigned to @list.
Note: If you find this confusing, it might help to realize that passing arguments to a subroutine follows the same rules as assignment does. For example, you can have
($scalar, @list1) = @list2;
because $scalar is assigned the first element of @list2 and @list1 is assigned the rest. However, you can't have this:
(@list1, $scalar) = @list2;
because all of @list2 would be assigned to @list1, and $scalar would be left undefined (which Perl treats as the null string).
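These rules can be checked with a short program. In this sketch, twoargs and twolists are given return statements so that the results can be inspected; the return values and the sample list are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub twoargs {
    my ($scalar, @list) = @_;
    return ("scalar=$scalar list=@list");
}

sub twolists {
    my (@list1, @list2) = @_;      # @list1 takes everything
    return (scalar (@list1) . " and " . scalar (@list2));
}

my @mylist = (1, 2, 3);

my $with_scalar = &twoargs (47, @mylist);   # 47 goes to $scalar
my $list_only   = &twoargs (@mylist);       # first element goes to $scalar
my $sizes       = &twolists (@mylist, @mylist);

print ("$with_scalar\n");
print ("$list_only\n");
print ("twolists received lists of size $sizes\n");
```

The last line reports sizes of 6 and 0: both copies of @mylist arrive as one merged list, and all of it lands in @list1.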
In Perl, you can call subroutines from other subroutines. To
call a subroutine from another subroutine, use the same subroutine-invocation
syntax you've been using all along. Subroutines that are called
by other subroutines are known as nested subroutines
(because one call is "nested" inside the other).
Listing 9.11 is an example of a program that contains a nested
subroutine. It is a fairly simple modification of Listing 9.10
and counts the number of words and characters in three lines of standard
input. It also demonstrates how to return multiple values from a
subroutine.
Listing 9.11. An example of a nested
subroutine.
1: #!/usr/local/bin/perl
2:
3: ($wordcount, $charcount) = &getcounts(3);
4: print ("Totals for three lines: ");
5: print ("$wordcount words, $charcount characters\n");
6:
7: sub getcounts {
8: my ($numlines) = @_;
9: my ($charpattern, $wordpattern);
10: my ($charcount, $wordcount);
11: my ($line, $linecount);
12: my (@retval);
13: $charpattern = "";
14: $wordpattern = "\\s+";
15: $linecount = $charcount = $wordcount = 0;
16: while (1) {
17: $line = <STDIN>;
18: last if ($line eq "");
19: $linecount++;
20: $charcount += &count($line, $charpattern);
21: $line =~ s/^\s+|\s+$//g;
22: $wordcount += &count($line, $wordpattern);
23: last if ($linecount == $numlines);
24: };
25: @retval = ($wordcount, $charcount);
26: }
27:
28: sub count {
29: my ($line, $pattern) = @_;
30: my ($count);
31: if ($pattern eq "") {
32: @items = split (//, $line);
33: } else {
34: @items = split (/$pattern/, $line);
35: }
36: $count = @items;
37: }
$ program9_11
This is a line of input.
Here is another line.
Here is the last line.
Totals for three lines: 15 words, 70 characters
$
The main body of this program now consists of only five lines
of code, including the special header comment and a blank line. This
is because most of the actual work is being done inside the
subroutines. (This is common in large programs. Most of these
programs call a few main subroutines, which in turn call other
subroutines. This approach makes programs easier to read, because
each subroutine is compact and concise.)
Line 3 calls the subroutine getcounts, which retrieves
the word and character counts for the three lines from the
standard input file. Because a list containing two elements is
returned by getcounts, a standard "list to scalar
variable" assignment can be used to assign the returned list
directly to $wordcount and $charcount.
The subroutine getcounts is similar to the main body of
the program in Listing 9.10. The only difference is that the while
loop has been modified to loop only the number of times specified
by the argument passed to getcounts, which is stored in
the local variable $numlines.
The subroutine getcounts actually does the word and
character counting by calling a nested subroutine, count.
This subroutine is identical to the subroutine of the same name
in Listing 9.10.
Note: The @_ variable is a local variable that is defined inside the subroutine. When a subroutine calls a nested subroutine, a new copy of @_ is created for the nested subroutine.
For example, in Listing 9.11, when getcounts calls count, a new copy of @_ is created for count, and the @_ variable in getcounts is not changed.
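The following sketch makes this visible by recording @_ in an outer subroutine before and after it calls a nested one. The names outer and inner and the argument values are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub inner {
    my $count = @_;                # number of arguments inner received
    return ($count);
}

sub outer {
    my $before = "@_";             # outer's arguments before the nested call
    my $inner_count = &inner ("x", "y", "z");
    my $after = "@_";              # outer's arguments after the nested call
    return ($before, $inner_count, $after);
}

my ($before, $inner_count, $after) = &outer (1, 2);
print ("before: $before, inner saw $inner_count arguments, after: $after\n");
```

One related detail, noted here as a known property of Perl and not something this lesson covers: a call written with & and no argument list at all, such as &inner; does not receive a new @_ and instead sees the caller's current @_. Supplying an argument list, even an empty one in parentheses, avoids this.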
In Perl, not only can subroutines call other subroutines, but
subroutines actually can call themselves. A subroutine that calls itself
is known as a recursive subroutine.
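You can use a subroutine as a recursive subroutine if the
following two conditions are true: all the variables that the subroutine
changes are local, and the subroutine contains code that determines when
it should stop calling itself.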
When all the variables that a subroutine uses are local, the
subroutine creates a new copy of the variables each time it calls itself.
This ensures that there is no confusion or overlap.
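As a small illustration of this point, the following sketch adds up a list of numbers recursively. The subroutine sum_list is a hypothetical example, not one of the chapter's listings; each invocation has its own private $first and @rest.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical recursive subroutine: returns the sum of its arguments.
sub sum_list {
    my ($first, @rest) = @_;          # private to this invocation
    return 0 unless defined $first;   # empty list: stop recursing
    return $first + sum_list(@rest);  # add the first number to the sum of the rest
}

print(sum_list(11, 8, 16, 4), "\n");
```

Because $first and @rest are declared with my, the copy created by an inner call never overwrites the values that an outer call is still waiting to use.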
Listing 9.12 is an example of a program that contains a
recursive subroutine. This program accepts a list of numbers and operators
that is to be evaluated from right to left, as if the list is a stack
whose top is the left end of the list. For example, if the input
is
- 955 * 26 + 11 8
this program adds 11 and 8, multiplies the
result by 26, and subtracts that result from 955.
This is equivalent to the following Perl expression:
955 - 26 * (11 + 8)
Listing 9.12. A program that uses a
recursive subroutine to perform arithmetic.
1: #!/usr/local/bin/perl
2:
3: $inputline = <STDIN>;
4: $inputline =~ s/^\s+|\s+$//g;
5: @list = split (/\s+/, $inputline);
6: $result = &rightcalc (0);
7: print ("The result is $result.\n");
8:
9: sub rightcalc {
10: my ($index) = @_;
11: my ($result, $operand1, $operand2);
12:
13: if ($index+3 == @list) {
14: $operand2 = $list[$index+2];
15: } else {
16: $operand2 = &rightcalc ($index+2);
17: }
18: $operand1 = $list[$index+1];
19: if ($list[$index] eq "+") {
20: $result = $operand1 + $operand2;
21: } elsif ($list[$index] eq "*") {
22: $result = $operand1 * $operand2;
23: } elsif ($list[$index] eq "-") {
24: $result = $operand1 - $operand2;
25: } else {
26: $result = $operand1 / $operand2;
27: }
28: }
$ program9_12
- 98 * 4 + 12 11
The result is 6.
$
This program starts off by reading a line of input from the
standard input file and breaking it into its components, which
are stored as a list in the array variable @list.
When given the input
- 98 * 4 + 12 11
lines 3-5 produce the following list, which is assigned to @list:
("-", "98", "*", "4", "+", "12", "11")
Line 6 calls the subroutine rightcalc for the first
time. rightcalc requires one argument, an index value that
tells the subroutine what part of the list to work on. Because
the first argument here is 0, rightcalc starts with the
first element in the list.
Line 10 assigns the argument passed to rightcalc to the
local variable $index. When rightcalc is called for
the first time, $index is 0.
Lines 13-17 are the heart of this subroutine, because they
control whether to call rightcalc recursively. The basic
logic is that a list such as
("-", "98", "*", "4", "+", "12", "11")
can be broken into three parts: the first operator, -;
the first operand, 98; and a sublist (the rest of the
list). Note that the sublist
("*", "4", "+", "12", "11")
is itself a complete set of operators and operands; because
this program is required to perform its arithmetic starting from
the right, this sublist must be calculated first.
Line 13 checks whether there is a sublist that needs to be
evaluated first. To do this, it checks whether there are more
than three elements in the list. If there are only three elements
in the list, the list consists of only one operator and two
operands, and the arithmetic can be performed right away. If
there are more than three elements in the list, a sublist exists.
To evaluate the sublist when it exists, line 16 calls rightcalc
recursively. The index value passed to this second copy of rightcalc
is 2; this ensures that the first element of the list examined by
the second copy of rightcalc is the element with subscript
2, which is *.
At this point, the following is the chain of subroutine
invocations, their arguments, and the part of the list on which
they are working:
Level 3   rightcalc(2)   list ("*", "4", "+", "12", "11")
When this copy of rightcalc reaches line 13, it checks
whether the sublist being worked on has just three elements.
Because this sublist has five elements, line 16 calls yet another
copy of rightcalc, this time setting the value of $index
to 4. The following is the chain of subroutine invocations after this third
call:
Level 4   rightcalc(4)   list ("+", "12", "11")
When the third copy of this subroutine reaches line 13, it
checks whether this portion of the list contains only three
elements. Because it does, the conditional expression in line 13
is true. At this point, line 14 is executed for the first time
(by any copy of rightcalc); it takes the value stored in $index
(in this case, 4), adds 2 to it, and uses the result as the subscript
into @list. This assigns 11, the seventh element of @list,
to $operand2.
Lines 18-27 perform an arithmetic operation. Line 18 adds 1 to
the value in $index to retrieve the location of the first operand;
this operand is assigned to $operand1. In this copy of rightcalc, the subscript
is 5 (4+1), and the sixth element of @list, 12, is
assigned to $operand1.
Line 19 uses $index as the subscript into the list to
access the arithmetic operator for this operation. In this case,
the fifth element of @list (subscript 4) is +, and
the expression in line 19 is true. Line 20 then adds $operand1
to $operand2, yielding $result, which is 23. This
value is returned by this copy of rightcalc.
When the third copy of rightcalc returns, execution
continues with the second copy of rightcalc because the
second copy called the third copy. Line 16 of the second copy
assigns the return value of the third copy, 23, to $operand2.
The following is the state of the program after line 16 has finished
executing:
Level 3   rightcalc(2)   list ("*", "4", "+", "12", "11"), $operand2 is 23
The Perl interpreter now executes lines 18-27. Because $index
is 2 in this copy of rightcalc, line 18 assigns the fourth element of @list, 4,
to $operand1. Line 21 is true in this case because the operator
is *; this means that line 22 multiplies $operand1 (4)
by $operand2 (23), yielding 92, which is assigned to $result.
At this point, the second copy of rightcalc is
finished, and program execution returns to line 16. This assigns
the return value from the second copy, 92, to $operand2.
The following is the state of the program after the second
copy of rightcalc is finished:
Level 2   rightcalc(0)   list ("-", "98", "*", "4", "+", "12", "11"), $operand2 is 92
Now you're almost finished; the program is executing only one
copy of rightcalc. Because $index is 0 in this copy
of rightcalc, line 18 assigns 98 to $operand1.
Line 23 is true in this case because the operator here is -;
line 24 then takes 98 and subtracts 92 from it, yielding a final
result of 6.
This final result of 6 is passed to the main program and is
assigned to $result. (Note that there is no conflict
between $result in the main program and the various copies
of $result in rightcalc because $result is
defined as a local variable in rightcalc.) Line 7,
finally, prints this result.
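The decomposition into an operator, a first operand, and a sublist can also be written so that the list itself is passed as the argument. The following is a sketch under that assumption; rightcalc_list is a hypothetical name, and unlike Listing 9.12 it copies the remaining list on every call instead of passing an index.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical variant of rightcalc that receives the list as its arguments.
sub rightcalc_list {
    my ($operator, $operand1, @rest) = @_;

    # One remaining element is a plain operand;
    # more than one is a sublist that must be evaluated first.
    my $operand2 = (@rest == 1) ? $rest[0] : rightcalc_list(@rest);

    return $operand1 + $operand2 if $operator eq "+";
    return $operand1 * $operand2 if $operator eq "*";
    return $operand1 - $operand2 if $operator eq "-";
    return $operand1 / $operand2;   # any other operator is treated as division
}

my @list = split(/\s+/, "- 98 * 4 + 12 11");
print("The result is ", rightcalc_list(@list), ".\n");
```

Passing an index, as Listing 9.12 does, avoids the repeated copying and is the better choice for long lists; the version here trades that efficiency for a shorter subroutine.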
Note: Recursive subroutines are useful when handling complicated data structures such as trees. You will see examples of such complicated data structures on Day 10, "Associative Arrays."
As you have seen, Perl enables you to pass an array as an
argument to a subroutine.
&my_sub(@array);
When the subroutine my_sub is called, the list stored
in the array variable @array is copied to the variable @_
defined in the subroutine.
sub my_sub {
my (@subarray) = @_;
$arraylength = @subarray;
}
If the array being passed is large, it might take some time
(and considerable space) to create a copy of the array. If your application
is operating under time or space limitations, or you just want to make
it more efficient, you can specify that the array is to be passed
by name.
The following is an example of a similar subroutine that
refers to an array by name:
sub my_sub {
local (*subarray) = @_;
$arraylength = @subarray;
}
The *subarray definition tells the Perl interpreter to
operate on the actual list passed to my_sub instead of
making a copy.
To call this subroutine, specify * instead of @
with the array variable name, as in the following:
@myarray = (1, 2, 3, 4, 5);
&my_sub(*myarray);
Specifying *myarray instead of @myarray
indicates that the actual contents of @myarray are to be used
(and modified if desired) in my_sub. In fact, while the
subroutine is being executed, the name @subarray becomes
identical to the name @myarray. This process of creating
another name to refer to the same variable is known as aliasing. @subarray
is now an alias of @myarray.
When my_sub terminates, @subarray stops being an
alias of @myarray. When my_sub is called again with
a different argument, as in
&my_sub(*anotherarray);
the variable @subarray in my_sub becomes an
alias for @anotherarray, which means that you can use the
array variable @subarray to access the storage in @anotherarray.
Aliasing arrays in this manner has one distinct advantage and
one distinct drawback. The advantage is that your program becomes
more efficient. You don't need to copy the entire list from your
main program to the subroutine. The disadvantage is that your
program becomes more difficult to follow. You have to remember,
for example, that changing the contents of @subarray in
the subroutine my_sub also changes the contents of @myarray
and @anotherarray. It is easy to lose track of which name
refers to which variable.
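The following sketch shows the effect on the caller's array. The subroutine double_all is a hypothetical example; it changes the aliased array in place, so the array in the main program is different after the call.

```perl
#!/usr/local/bin/perl
# "use strict" is deliberately omitted: this style of aliasing
# works on global (package) variables such as @myarray.

@myarray = (1, 2, 3, 4, 5);

# Hypothetical subroutine: doubles every element of the aliased array.
sub double_all {
    local (*subarray) = @_;       # @subarray is now another name for the caller's array
    foreach $element (@subarray) {
        $element *= 2;            # modifies the caller's array directly
    }
    return scalar(@subarray);     # number of elements processed
}

$length = &double_all(*myarray);
print("@myarray\n");
```

No list is copied here, which is the efficiency gain described above, but nothing at the call site warns the reader that @myarray will be modified.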
There is also another problem with aliasing: aliasing affects
all variables with the same name, not just array variables.
For example, consider Listing 9.13, which defines a scalar
variable named $foo and an array named @foo, and
then aliases @foo. As you'll see, the program aliases $foo
as well.
Listing 9.13. A program that
demonstrates aliasing.
1: #!/usr/local/bin/perl
2:
3: $foo = 26;
4: @foo = ("here's", "a", "list");
5: &testsub (*foo);
6: print ("The value of \$foo is now $foo\n");
7:
8: sub testsub {
9: local (*printarray) = @_;
10: foreach $element (@printarray) {
11: print ("$element\n");
12: }
13: $printarray = 61;
14: }
$ program9_13
here's
a
list
The value of $foo is now 61
$
Line 5 calls the subroutine testsub. The argument, *foo,
indicates that the array @foo is to be passed to testsub and
aliased.
The local variable definition in line 9 indicates that the
array variable @printarray is to become an alias of the
array variable @foo. This means that the name printarray
is defined to be equivalent to the name foo.
This means that the scalar variable $printarray
also becomes an alias of the scalar variable $foo. As a consequence,
line 13, which seems to assign 61 to $printarray, actually
assigns 61 to $foo. This modified value is printed by line
6 of the main program.
Note: Aliasing enables you to pass more than one list to a subroutine.
@array1 = (1, 2, 3);
@array2 = (4, 5, 6);
&two_array_sub (*array1, *array2);
sub two_array_sub {
local (*subarray1, *subarray2) = @_;
}
In this case, the names array1 and array2 are passed to two_array_sub. subarray1 becomes an alias for array1, and subarray2 becomes an alias for array2.
Perl enables you to use the do statement to invoke a
subroutine. For example, the following statements are identical:
&my_sub(1, 2, 3);
do my_sub(1, 2, 3);
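There is no real reason to use the do statement in this
context. The form has long been deprecated, and recent versions
of Perl (5.20 and later) reject it as a syntax error.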
By default, the built-in function sort sorts in
alphabetical order. The following is an example:
@list = ("words", "to", "sort");
@list2 = sort (@list);
Here, @list2 is assigned ("sort",
"to", "words").
If you like, you can write a subroutine that defines how
sorting is to be accomplished. To understand how to do this,
first you need to know a little about how sorting works.
When sort is given a list to sort, it determines the
sort order of the elements of the list by repeatedly comparing
pairs of elements. To compare a pair of elements, sort
calls a special internal subroutine and passes it a pair of
arguments. Although the subroutine is not accessible from a Perl
program, it basically behaves as follows:
sub sort_criteria {
if ($a gt $b) {
$retval = 1;
} elsif ($a eq $b) {
$retval = 0;
} else {
$retval = -1;
}
$retval;
}
This subroutine compares two values, which are stored in $a
and $b. It returns 1 if the first value is greater,
0 if the values are equal, and -1 if the second value is greater.
(This, by the way, is how the cmp operator works; in fact,
the preceding subroutine could compare the two values using a single cmp
operator.)
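As a sketch of that remark, the whole comparison can be reduced to one expression. The name by_string is hypothetical; cmp returns -1, 0, or 1 depending on whether its left operand sorts before, equal to, or after its right operand.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical sort subroutine: the value of $a cmp $b is the return value.
sub by_string { $a cmp $b }

my @sorted = sort by_string ("words", "to", "sort");
print("@sorted\n");
```

The variables $a and $b are exempt from the declarations that use strict normally requires, so the subroutine needs no my or local statement.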
To define your own sorting rules, you must write a subroutine
whose behavior is identical to the preceding subroutine. This subroutine
must use two global variables named $a and $b to
represent the two items in the list currently being compared, and the
subroutine must return one of the following values:
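-1 If $a is to appear before $b in the resulting sorted list
0 If $a and $b are to be treated as equal
1 If $a is to appear after $b in the resulting sorted list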
Note: Even though $a and $b are global variables that are used by the sorting subroutine, you still can define global variables of your own named $a and $b without risking their being overwritten.
The built-in function sort saves any existing values of $a and $b before sorting, and then it restores them when sorting is completed.
Once you have written the subroutine, you must specify the
subroutine name when calling the function sort. For
example, if you define a function named foo that provides
a set of sorting rules, the following statement sorts a list
using the rules defined in foo:
@list2 = sort foo (@list1);
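For instance, a sorting rule that puts numbers in descending order might look like the following sketch. The subroutine name descending and the sample list are assumptions made for illustration.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical sort subroutine: numeric comparison with the operands
# swapped, so that larger numbers appear first.
sub descending { $b <=> $a }

my @list1 = (11, 8, 16, 4);
my @list2 = sort descending (@list1);
print("@list2\n");
```

Swapping $a and $b reverses the sign of the result, which is all that is needed to reverse the order of the sorted list.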
Listing 9.14 shows how you can define your own sort criteria.
This program sorts a list in the normal order, except that it
puts strings starting with a digit last. (By default, strings
starting with a digit appear before strings starting with a
letter, and before some, but not all, special characters.)
Strings that begin with a digit are assumed to be numbers and are
sorted in numerical order.
Listing 9.14. A program that defines
sort criteria.
1: #!/usr/local/bin/perl
2:
3: @list1 = ("test", "14", "26", "test2");
4: @list2 = sort num_last (@list1);
5: print ("@list2\n");
6:
7: sub num_last {
8: my ($num_a, $num_b, $retval);
9:
10: $num_a = $a =~ /^[0-9]/;
11: $num_b = $b =~ /^[0-9]/;
12: if ($num_a && $num_b) {
13: $retval = $a <=> $b;
14: } elsif ($num_a) {
15: $retval = 1;
16: } elsif ($num_b) {
17: $retval = -1;
18: } else {
19: $retval = $a cmp $b;
20: }
21: $retval;
22: }
$ program9_14
test test2 14 26
$
Line 4 sorts the list according to the sort criteria
defined in the subroutine num_last. This subroutine is defined
in lines 7-22.
This subroutine first determines whether the items are strings
that begin with a digit. Line 10 sets the local variable $num_a
to a nonzero value if the value stored in $a starts with a
digit; similarly, line 11 sets $num_b to a nonzero value
if the value of $b starts with a digit.
Lines 12 and 13 handle the case in which both $num_a
and $num_b are true. In this case, the two strings are
assumed to be numbers, and the numeric comparison operator <=>
compares their values. The result of the <=>
operation is 1 if the first number is larger, 0 if they are
equal, and -1 if the second number is larger.
If $num_a is true but $num_b is false, line 15
sets the return value for this subroutine to 1, indicating that
the string that starts with a digit, $a, is to be
treated as greater and therefore sorted after $b. Similarly, line 17
sets the return value to -1 if $b starts with a digit and $a does not.
If neither string starts with a digit, line 19 uses the normal
sort criterion, alphabetical order, to determine which value is larger.
Here, the cmp operator is useful. It returns 1 if
the first string is alphabetically greater, 0 if the strings are
equal, and -1 if the second string is alphabetically greater.
Perl 5 defines three special subroutines that are executed at
specific times.
Note: These subroutines are not supported in Perl 4.
Perl 5 enables you to create code that is executed when your
program is started. To do this, create a special subroutine named BEGIN.
For example
BEGIN {
print("Hi! Welcome to Perl!\n");
}
When your program begins execution, the following line appears
on your screen:
Hi! Welcome to Perl!
The BEGIN subroutine behaves just like any other Perl
subroutine. For example, you can define local variables for it or
call other subroutines from it.
Note: If you like, you can define multiple BEGIN subroutines. These subroutines are called in the order in which they appear in the program.
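The ordering can be observed with a short sketch. The array @events is a hypothetical log used only for this illustration; note that both BEGIN subroutines run before the ordinary statement that precedes them in the file.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

our @events;   # hypothetical log of the order in which code runs

push(@events, "main");   # an ordinary statement, executed after every BEGIN

BEGIN { push(@events, "first BEGIN"); }
BEGIN { push(@events, "second BEGIN"); }

print(join(", ", @events), "\n");
```

The declaration of @events has to appear above the BEGIN subroutines so that the name is known when they are compiled.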
Perl 5 enables you to create code to be executed when your
program terminates execution. To do this, define an END subroutine,
as in the following example:
END {
print("Thank you for using Perl!\n");
}
The code contained in the END subroutine is always
executed by your program, even if the program is terminated using die. For
example, the code
die("Prepare to die!\n");
END {
print("Ha! You can't kill me!\n");
}
displays the following on your screen:
Prepare to die!
Ha! You can't kill me!
Note: You can define multiple END subroutines in your program. In this case, the subroutines are executed in reverse order of appearance, with the last one executed first.
Perl 5 enables you to define a special subroutine named AUTOLOAD
that is called whenever the Perl interpreter is told to call a
subroutine that does not exist. Listing 9.15 illustrates the use
of AUTOLOAD.
Listing 9.15. A program that uses AUTOLOAD.
1: #!/usr/local/bin/perl
2:
3: &nothere("hi", 46);
4:
5: AUTOLOAD {
6: print("subroutine $AUTOLOAD not found\n");
7: print("arguments passed: @_\n");
8: }
$ program9_15
subroutine main::nothere not found
arguments passed: hi 46
$
This program tries to call the non-existent subroutine nothere.
When the Perl interpreter discovers that nothere does not
exist, it calls the AUTOLOAD subroutine.
Line 6 uses a special scalar variable, $AUTOLOAD, which
contains the name of the subroutine you tried to call. (The main:: text
that appears before the subroutine name, nothere, is the
name of the package in which the subroutine is found. By default, all
your code is placed in one package, called main, so you
normally won't need to worry about packages. For more information
on creating other packages, see Day 18, "Object-Oriented
Programming.")
When AUTOLOAD is called, the arguments that were to be
passed to the non-existent subroutine are passed to AUTOLOAD
instead. This means that the @_ array variable contains the list ("hi",
46), because these are the arguments that were to be passed
to nothere.
Tip: AUTOLOAD is useful if you plan to organize your Perl program into modules, because you can use it to ensure that crucial subroutines from other files actually exist when you need them. For more information on organizing Perl programs into modules, see Day 18.
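As a sketch of one way to put AUTOLOAD to work, the following version returns a message instead of printing one, so that the caller can decide what to do with it. The message text and the stripping of the package name are choices made for this illustration, not requirements of Perl.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

our $AUTOLOAD;   # set by Perl to the full name of the missing subroutine

# Hypothetical fallback: describe the failed call instead of stopping the program.
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;              # remove the package prefix, such as main::
    return if $name eq "DESTROY";   # a common precaution; not triggered in this sketch
    return "no subroutine named $name (arguments: @_)";
}

print(nothere("hi", 46), "\n");
```

Under use strict, $AUTOLOAD has to be declared with our before it can be used by its short name.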
Today, you learned about subroutines, which are separate
chunks of code intended to perform specific tasks. A subroutine can
appear anywhere in your program.
To invoke a subroutine, specify its name preceded by the &
character. In Perl 5, the & character is not required
if the subroutine exists, or if a forward reference is defined.
A subroutine can return a value (either a scalar value or a
list). This return value is the value of the last expression
evaluated inside the subroutine. If this last expression is at
the end of the subroutine, the subroutine is a single-exit
module.
You can define local variables for use inside subroutines.
These local variables exist only while the subroutine is being executed.
When a subroutine finishes, its local variables are destroyed; if
it is invoked again, new copies of the local variables are
defined.
You can pass values to subroutines; these values are called
arguments. You can pass as many arguments as you like, but only one
of these arguments can be a list. If a list is passed to a subroutine,
it must be the last argument passed.
The arguments passed to a subroutine are converted into a list
and assigned to a special system variable, @_. One copy of @_
exists for each list of arguments passed to a subroutine (that
is, @_ is a local variable).
Subroutines can call other subroutines (nested subroutines)
and even can call themselves (recursive subroutines).
You can pass an array variable to a subroutine by name by
defining an alias for the variable name. This alias affects all variables
of that name.
You can use the do statement to invoke a subroutine,
although there is no real reason to do so.
You can define a subroutine that specifies the order in which
the elements of a list are to be sorted. To use the sort criteria defined
by a subroutine, include its name with the call to sort.
The BEGIN subroutine is always executed before your
program begins execution. The END subroutine is always
executed when your program terminates, even if it was killed off
using die. The AUTOLOAD subroutine is executed if
your program tries to call a subroutine that does not exist.
sub breakline {
local ($line) = @_;
@words = split(/\s+/, $line);
}
sub printcount {
for ($count = 0; $count <= 9; $count++) {
print ("$occurs[$count]\n");
}
}
sub arraybyname {
local (*localname) = @_;
}
arraybyname (*name);
The Workshop provides quiz questions to help you solidify your
understanding of the material covered and exercises to give you
experience in using what you've learned. Try to understand the quiz
and exercise answers before you go on to tomorrow's lesson.
#!/usr/local/bin/perl
$total = 0;
@list = (1, 2, 3);
@list2 = &my_sub;
sub my_sub {
local ($total);
$total = 1;
@list = (4, 5, 6);
}
sub sub1 {
$count = $sum = 0;
while ($count <= 10) {
$sum += $count;
$count++;
}
}
#!/usr/local/bin/perl
@list = (1, 2, 3);
&testsub(*list);
sub testsub {
local (*sublist) = @_;
$sublist[1] = 5;
}
#!/usr/local/bin/perl
for ($count = 1; $count <= 10; $count++) {
&print_ten ($count);
}
sub print_ten {
local ($multiplier) = @_;
for ($count = 1; $count <= 10; $count++) {
$printval = $multiplier * 10 + $count;
print ("$printval\n");
}
}
#!/usr/local/bin/perl
$line = <STDIN>;
@words = split(/\s+/, $line);
$searchword = <STDIN>;
&search_for_word (@words, $searchword);
sub search_for_word {
local (@searchlist, $searchword) = @_;
foreach $word (@searchlist) {
return (1) if ($word eq $searchword);
}
$retval = 0;
}