Newsgroups: comp.lang.perl,news.answers
Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!spool.mu.edu!uwm.edu!cs.utexas.edu!uunet!boulder!wraeththu.cs.colorado.edu!tchrist
From: Tom Christiansen <tchrist@cs.Colorado.EDU>
Subject: Perl Frequently Asked Questions, part 3 of 4
Message-ID: <CE9Br9.275@Colorado.EDU>
Followup-To: comp.lang.perl
Originator: tchrist@wraeththu.cs.colorado.edu
Sender: news@Colorado.EDU (USENET News System)
Organization: University of Colorado at Boulder
Date: Sat, 2 Oct 1993 06:37:56 GMT
Approved: news-answers-request@MIT.Edu
Expires: Wed, 1 Dec 1993 12:00:00 GMT
Lines: 1265
Xref: senator-bedfellow.mit.edu comp.lang.perl:20584 news.answers:13122

Archive-name: perl-faq/part3
Version: $Id: perl-tech1,v 1.1 93/10/02 00:27:00 tchrist Exp Locker: tchrist $

This posting contains answers to the following technical questions
regarding Perl:

2.1) What are all these $@*%<> signs and how do I know when to use them?
2.2) Why don't backticks work as they do in shells?  
2.3) How come Perl operators have different precedence than C operators?
2.4) How come my converted awk/sed/sh script runs more slowly in Perl?
2.5) How can I call my system's unique C functions from Perl?
2.6) Where do I get the include files to do ioctl() or syscall()?
2.7) Why doesn't "local($foo) = <FILE>;" work right?
2.8) How can I detect keyboard input without reading it?
2.9) How can I read a single character from the keyboard under UNIX and DOS?
2.10) How can I make an array of arrays or other recursive data types?
2.11) How do I make an array of structures containing various data types?
2.12) How can I quote a variable to use in a regexp?
2.13) Why do setuid Perl scripts complain about kernel problems?
2.14) How do I open a pipe both to and from a command?
2.15) How can I change the first N letters of a string?
2.16) How can I manipulate fixed-record-length files?
2.17) How can I make a file handle local to a subroutine?
2.18) How can I extract just the unique elements of an array?
2.19) How can I call alarm() or usleep() from Perl?
2.20) How can I test whether an array contains a certain element?
2.21) How can I do an atexit() or setjmp()/longjmp() in Perl?
2.22) Why doesn't Perl interpret my octal data octally?
2.23) How do I sort an associative array by value instead of by key?
2.24) How can I capture STDERR from an external command?
2.25) Why doesn't open return an error when a pipe open fails?
2.26) How can I compare two date strings?
2.27) What's the fastest way to code up a given task in perl?
2.28) How can I know how many entries are in an associative array?


2.1) What are all these $@*%<> signs and how do I know when to use them?

    Those are type specifiers: $ for scalar values, @ for indexed arrays,
    and % for hashed arrays.  The * means all types of that symbol name
    and is sometimes used like a pointer; the <> are used for inputting
    a record from a filehandle.  See the question on arrays of arrays
    for more about Perl pointers.

    Always make sure to use a $ for single values and @ for multiple ones.
    Thus element 2 of the @foo array is accessed as $foo[2], not @foo[2];
    the latter is a slice, a list of length one (not a scalar), and using
    it is a fairly common novice mistake.  Sometimes you can get by with
    @foo[2], but it's not really doing what you think it's doing for the
    reason you think it's doing it, which means one of these days, you'll
    shoot yourself in the foot; ponder for a moment what these will really do:
	@foo[0] = `cmd args`;
	@foo[2] = <FILE>;
    Just always say $foo[2] and you'll be happier.
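    To see the difference concretely, here is a small sketch.  It assumes
    Perl 5.8 or later, which can open a string as an in-memory filehandle,
    and the variable names are made up for the example.  The slice
    assignment gives <FILE> a list context, so it reads every remaining
    line and keeps only the first; the element assignment reads one line.

```perl
# Sketch: @foo[2] = <FILE> versus $foo[2] = <FILE>.
# An in-memory filehandle (Perl 5.8+) stands in for a real file.
$text = "first\nsecond\nthird\n";

open(FILE, '<', \$text) || die "can't open in-memory file: $!";
@slice[2] = <FILE>;                # list context: reads ALL lines,
                                   # stores the first, discards the rest
$left_after_slice = () = <FILE>;   # count of lines still unread
close(FILE);

open(FILE, '<', \$text) || die "can't open in-memory file: $!";
$elem[2] = <FILE>;                 # scalar context: reads one line
$left_after_elem = () = <FILE>;    # count of lines still unread
close(FILE);

print "after the slice: $left_after_slice unread; ",
      "after the element: $left_after_elem unread\n";
```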

    This may seem confusing, but try to think of it this way:  you use the
    character of the type which you *want back*.  You could use @foo[1..3] for
    a slice of three elements of @foo, or even @foo{A,B,C} for a slice
    of %foo.  This is the same as using ($foo[1], $foo[2], $foo[3]) and
    ($foo{A}, $foo{B}, $foo{C}) respectively.  In fact, you can even use
    lists to subscript arrays and pull out more lists, like @foo[@bar] or
    @foo{@bar}, where @bar is in both cases presumably a list of subscripts.
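    A quick sketch of those slices; the data is invented, and note that
    @foo and %foo are separate variables that merely share a name.

```perl
@foo = ('zero', 'one', 'two', 'three', 'four');
%foo = ('A', 'apple', 'B', 'banana', 'C', 'cherry');
@bar = (1, 3);

@three  = @foo[1..3];           # ($foo[1], $foo[2], $foo[3])
@fruit  = @foo{'A', 'B', 'C'};  # ($foo{'A'}, $foo{'B'}, $foo{'C'})
@picked = @foo[@bar];           # a list used as the subscripts
print join(' ', @three, @fruit, @picked), "\n";
```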

    There are a few places where you don't strictly need these type
    specifiers, but except for filehandles, you should always use them.
    Note that <FILE> is NOT the type specifier for files; it's the
    equivalent of awk's getline function, that is, it reads a line from
    the handle FILE.  When doing open, close, and operations other than
    reading a line from a file, do NOT use the angle brackets.

    Beware of saying:
	$foo = BAR;
    which will be interpreted as 
	$foo = 'BAR';
    and not as 
	$foo = <BAR>;
    If you always quote your strings, you'll avoid this trap.

    Normally, files are manipulated something like this (with appropriate
    error checking added if it were production code):

	open (FILE, ">/tmp/foo.$$");
	print FILE "string\n";
	close FILE;
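    One way the same three steps might look with the error checking
    added, as a sketch: /tmp/foo.$$ is kept from the example above, the
    read-back and unlink only keep the sketch tidy, and the parentheses
    around print stop || from binding to its argument list (see question
    2.3).

```perl
# Sketch: open/print/close with the checks production code would want.
$path = "/tmp/foo.$$";

open(FILE, ">$path")        || die "can't create $path: $!";
(print FILE "string\n")     || die "can't write $path: $!";
close(FILE)                 || die "can't close $path: $!";

# Read it back to confirm the write, then clean up.
open(FILE, $path)           || die "can't reopen $path: $!";
$line = <FILE>;
close(FILE);
unlink($path)               || warn "can't remove $path: $!";
```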

    If instead of a filehandle, you use a normal scalar variable with file
    manipulation functions, this is considered an indirect reference to a
    filehandle.  For example,

	$foo = "TEST01";
	open($foo, "file");

    After the open, these two while loops are equivalent:

	while (<$foo>) {}
	while (<TEST01>) {}

    as are these two statements:
	
	close $foo;
	close TEST01;

    but the loops are NOT equivalent to this:

	while (<$TEST01>) {} # error
		^
		^ note spurious dollar sign

    This is another common novice mistake; often it's assumed that

	open($foo, "output.$$");

    will fill in the value of $foo, which was previously undefined.  
    This just isn't so -- you must set $foo to be the name of a valid
    filehandle before you attempt to open it.
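    A runnable sketch of the indirect filehandle, assuming Perl 5.8 or
    later so an in-memory string can stand in for the file.  It only works
    without "use strict", since the string in $foo is used as a symbolic
    name.  (Perl 5.6 and later can also fill in an undefined scalar with a
    new anonymous filehandle, as in open(my $fh, ...), but that is a
    different mechanism from the named handles described here.)

```perl
# Sketch: a scalar holding a filehandle NAME used in place of the handle.
$text = "alpha\nbeta\n";
$foo  = "TEST01";                    # set the name BEFORE calling open
open($foo, '<', \$text) || die "can't open: $!";   # really opens TEST01
$first  = <$foo>;                    # reads through the name held in $foo
$second = <TEST01>;                  # same handle, so this gets the next line
close $foo;                          # closes TEST01
```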


2.2) Why don't backticks work as they do in shells?  

    Several reasons.  One is that backticks do not interpolate within
    double quotes in Perl as they do in shells.  
    
    Let's look at two common mistakes:

         $foo = "$bar is `wc $file`";  # WRONG

    This should have been:

	 $foo = "$bar is " . `wc $file`;

    But you'll have an extra newline you might not expect.  This
    does not work as expected:

      $back = `pwd`; chdir($somewhere); chdir($back); # WRONG

    It fails because backticks do not automatically strip trailing or
    embedded newlines, so $back ends in a newline and names no real
    directory.  The chop() function removes the last character from a
    string, so this should have been:

	  chop($back = `pwd`); chdir($somewhere); chdir($back);
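    A sketch showing the trailing newline at work, assuming a Unix-like
    system with pwd on the PATH.  (Perl 5 also has chomp(), which removes
    only a trailing newline and is safer than chop when the last
    character might not be one.)

```perl
# Sketch: why the trailing newline from `pwd` breaks chdir($back).
$raw = `pwd`;                    # e.g. "/home/you\n"
chop($back = `pwd`);             # the same path without the newline

$raw_worked  = chdir($raw)  ? 1 : 0;   # almost certainly 0: no such directory
$back_worked = chdir($back) ? 1 : 0;   # 1: the chopped path exists
print "with newline: $raw_worked, chopped: $back_worked\n";
```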

    You should also be aware that while in the shell, single quotes
    embedded in backticks protect variables from interpolation, in Perl
    you'll need to escape the dollar signs:

	Shell: foo=`cmd 'safe $dollar'`
	Perl:  $foo=`cmd 'safe \$dollar'`;
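    A sketch of the difference, assuming a Unix-like /bin/sh with echo;
    $dollar is a made-up variable.  Perl interpolates the backtick string
    before the shell ever sees it, so the single quotes only help once the
    dollar sign has been escaped.

```perl
$dollar = "INTERPOLATED";
$unescaped = `echo 'safe $dollar'`;   # Perl substitutes $dollar first
$escaped   = `echo 'safe \$dollar'`;  # the shell sees a literal $dollar
print $unescaped, $escaped;
```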
	

2.3) How come Perl operators have different precedence than C operators?

    Actually, they don't; all C operators have the same precedence in Perl as
    they do in C.  The problem is with a class of functions called list
    operators, e.g. print, chdir, exec, system, and so on.  These are somewhat
    bizarre in that they have different precedence depending on whether you
    look on the left or right of them.  Basically, they gobble up all things
    on their right.  For example,

	unlink $foo, "bar", @names, "others";

    will unlink all those file names.  A common mistake is to write:

	unlink "a_file" || die "snafu";

    The problem is that this gets interpreted as

	unlink("a_file" || die "snafu");

    To avoid this problem, you can always make them look like function calls
    or use an extra level of parentheses:

	(unlink "a_file") || die "snafu";
	unlink("a_file")  || die "snafu";
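    A sketch that makes the misparse visible.  The file name is made up
    and assumed not to exist, and eval is used only to trap the die so the
    sketch can report it.

```perl
# Sketch: how "unlink FILE || die" really parses.
$name = "/tmp/no-such-file.$$";

# Parsed as unlink($name || die "snafu"): $name is a true string,
# so die is never reached and the failed unlink goes unnoticed.
$removed = unlink $name || die "snafu";

# With parentheses, || tests unlink's result (0 files removed), so die runs.
eval { unlink($name) || die "snafu\n"; };
$caught = $@;
```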

    Sometimes you actually do care about the return value:

	unless ($io_ok = print("some", "list")) { } 

    Yes, print() returns I/O success.  That means

	$io_ok = print(2+4) * 5;

    prints 6 and sets $io_ok to 5 times the success value of that print
    (5 if it worked), and

	print(2+4) * 5;

    computes the same 5*io_success value and throws it away.

    See the Perl man page's section on Precedence for more gory details,
    and be sure to use the -w flag to catch things like this.


2.4) How come my converted awk/sed/sh script runs more slowly in Perl?

    The natural way to program in those languages may not make for the fastest
    Perl code.  Notably, the awk-to-perl translator produces sub-optimal code;
    see the a2p man page for tweaks you can make.

    Two of Perl's strongest points are its associative arrays and its regular
    expressions.  They can dramatically speed up your code when applied
    properly.  Recasting your code to use them can help a lot.

    How complex are your regexps?  Deeply nested sub-expressions with {n,m} or
    * operators can take a very long time to compute.  Don't use ()'s unless
    you really need them.  Anchor your string to the front if you can.

    Something like this:
	next unless /^.*%.*$/; 
    runs more slowly than the equivalent:
	next unless /%/;

    Note that this:
	next if /Mon/;
	next if /Tue/;
	next if /Wed/;
	next if /Thu/;
	next if /Fri/;
    runs faster than this:
	next if /Mon/ || /Tue/ || /Wed/ || /Thu/ || /Fri/;
    which in turn runs faster than this:
	next if /Mon|Tue|Wed|Thu|Fri/;
    which runs *much* faster than:
	next if /(Mon|Tue|Wed|Thu|Fri)/;

    There's no need to use /^.*foo.*$/ when /foo/ will do.

    Remember that a printf costs more than a simple print.

    Don't split() every line if you don't have to.

    Another thing to look at is your loops.  Are you iterating through 
    indexed arrays rather than just putting everything into a hashed 
    array?  For example,

	@list = ('abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stv');

	for $i ($[ .. $#list) {
	    if ($pattern eq $list[$i]) { $found++; } 
	} 

    First of all, it would be faster to use Perl's foreach mechanism
    instead of using subscripts:

	foreach $elt (@list) {
	    if ($pattern eq $elt) { $found++; } 
	} 

    Better yet, this could be sped up dramatically by placing the whole
    thing in an associative array like this:

	%list = ('abc', 1, 'def', 1, 'ghi', 1, 'jkl', 1, 
		 'mno', 1, 'pqr', 1, 'stv', 1 );
	$found += $list{$pattern};
    
    (but put the %list assignment outside of your input loop.)

    You should also watch for variables interpolated into regular
    expressions, which are expensive.  If the variable to be interpolated
    doesn't change over the life of the process, use the /o modifier to
    tell Perl to compile the regexp only once, like this:

	for $i (1..100) {
	    if (/$foo/o) {
		&some_func($i);
	    } 
	} 

    Finally, if you have a bunch of patterns in a list that you'd like to 
    compare against, instead of doing this:

	@pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');
	foreach $pat (@pats) {
	    if ( $name =~ /^$pat$/ ) {
		&some_func();
		last;
	    }
	}

    you can build your code as a string and then eval it, which will be
    much faster.  For example:

	@pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');
	$code = <<EOS;
		while (<>) { 
		    study;
EOS
	foreach $pat (@pats) {
	    $code .= <<EOS;
		if ( /^$pat\$/ ) {
		    &some_func();
		    next;
		}
EOS
	}
	$code .= "}\n";
	print $code if $debugging;
	eval $code;
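    The same build-then-eval idea can be tried without reading input by
    generating a subroutine instead of a whole while loop.  This is a
    sketch: &matches_any and @names are made up, and @pats must be trusted
    text, because it is spliced into code that eval will run.  Checking $@
    right after the eval catches a pattern that broke the generated code.

```perl
# Sketch: generate one subroutine from the pattern list, then call it.
@pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');

$code = "sub matches_any {\n    local(\$_) = \@_;\n";
foreach $pat (@pats) {
    $code .= "    return 1 if /^$pat\$/;\n";   # one anchored test per pattern
}
$code .= "    return 0;\n}\n";

eval $code;
die "generated code failed: $@\n$code" if $@;

@names = ('_getpwent', 'bogus', '_reader', 'atexit', 'main');
@hits  = grep(&matches_any($_), @names);
```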



2.5) How can I call my system's unique C functions from Perl?

    If these are system calls and you have the syscall() function, then
    you're probably in luck -- see the next question.  For arbitrary
    library functions, it's not quite so straight-forward.  While you
    can't have a C main and link in Perl routines, if you're
    determined, you can extend Perl by linking in your own C routines.
    See the usub/ subdirectory in the Perl distribution kit for an example
    of doing this to build a Perl that understands curses functions.  It's
    neither particularly easy nor overly-documented, but it is feasible.


2.6) Where do I get the include files to do ioctl() or syscall()?

    These are generated from your system's C include files using the h2ph
    script (once called makelib) from the Perl source directory.  This will
    make files containing subroutine definitions, like &SYS_getitimer, which
    you can use as arguments to your function.

    You might also look at the h2pl subdirectory in the Perl source for how to
    convert these to forms like $SYS_getitimer; there are both advantages and
    disadvantages to this.  Read the notes in that directory for details.  
   
    In both cases, you may well have to fiddle with it to make these work; it
    depends on how funny-looking your system's C include files happen to be.

    If you're trying to get at C structures, then you should take a look
    at using c2ph, which uses debugger "stab" entries generated by your
    BSD or GNU C compiler to produce machine-independent perl definitions
    for the data structures.  This allows you to avoid hardcoding
    structure layouts, types, padding, or sizes, greatly enhancing
    portability.  c2ph comes with the perl distribution.  On an SCO
    system, GCC only has COFF debugging support by default, so you'll have
    to build GCC 2.1 with DBX_DEBUGGING_INFO defined, and use -gstabs to
    get c2ph to work there.

    See the file /pub/perl/info/ch2ph on convex.com via anon ftp 
    for more traps and tips on this process.


2.7) Why doesn't "local($foo) = <FILE>;" work right?

    Well, it does.  The thing to remember is that local() provides an array
    context, and that the <FILE> syntax in an array context will read all the
    lines in a file.  To work around this, use:

	local($foo);
	$foo = <FILE>;

    You can use the scalar() operator to cast the expression into a scalar
    context:

	local($foo) = scalar(<FILE>);
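    A sketch that shows the whole file disappearing, assuming Perl 5.8 or
    later for the in-memory filehandle; the two subroutine names are made
    up.

```perl
# Sketch: local($foo) = <FILE> reads every line but keeps only the first.
$text = "one\ntwo\nthree\n";

sub first_line_list {
    local($foo) = <FILE>;            # list context: the whole file is read
    return $foo;
}
sub first_line_scalar {
    local($foo) = scalar(<FILE>);    # scalar context: one line is read
    return $foo;
}

open(FILE, '<', \$text) || die "can't open: $!";
$list1 = &first_line_list();         # gets the first line
$list2 = &first_line_list();         # undef: the rest was already swallowed
close(FILE);

open(FILE, '<', \$text) || die "can't open: $!";
$scal1 = &first_line_scalar();       # gets the first line
$scal2 = &first_line_scalar();       # gets the second line
close(FILE);
```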


2.8) How can I detect keyboard input without reading it? 

    You should check out the Frequently Asked Questions list in
    comp.unix.* for things like this: the answer is essentially the same.
    It's very system dependent.  Here's one solution that works on BSD
    systems:

	sub key_ready {
	    local($rin, $nfd);
	    vec($rin, fileno(STDIN), 1) = 1;
	    return $nfd = select($rin,undef,undef,0);
	}
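    The same select() test can be tried without a keyboard by pointing it
    at a pipe.  This is a sketch that generalizes key_ready to take a
    filehandle; &input_ready is a made-up name.  Keep in mind that select()
    only sees data the system has not yet handed to Perl, so it can miss
    input already sitting in Perl's own buffer after a <STDIN> read.

```perl
sub input_ready {
    local(*FH) = @_;                 # accepts a glob such as *STDIN
    local($rin) = '';
    vec($rin, fileno(FH), 1) = 1;
    return select($rin, undef, undef, 0);   # timeout 0: poll, don't block
}

pipe(READER, WRITER) || die "can't make pipe: $!";
$before = &input_ready(*READER);     # nothing written yet
syswrite(WRITER, "x", 1);            # unbuffered write of one byte
$after  = &input_ready(*READER);     # the byte is now waiting
```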


2.9) How can I read a single character from the keyboard under UNIX and DOS?

    A closely related question to the no-echo question is how to input a
    single character from the keyboard.  Again, this is a system dependent
    operation.  The following code may or may not help you.  It should
    work on both SysV and BSD flavors of UNIX:

	$BSD = -f '/vmunix';
	if ($BSD) {
	    system "stty cbreak </dev/tty >/dev/tty 2>&1";
	}
	else {
	    system "stty", '-icanon';
	    system "stty", 'eol', "\001"; 
	}

	$key = getc(STDIN);

	if ($BSD) {
	    system "stty -cbreak </dev/tty >/dev/tty 2>&1";
	}
	else {
	    system "stty", 'icanon';
	    system "stty", 'eol', '^@'; # ascii null
	}
	print "\n";

    You could also handle the stty operations yourself for speed if you're
    going to be doing a lot of them.  This code works to toggle cbreak
    and echo modes on a BSD system:

    sub set_cbreak { # &set_cbreak(1) or &set_cbreak(0)
	local($on) = $_[0];
	local($sgttyb,@ary);
	require 'sys/ioctl.ph';
	$sgttyb_t   = 'C4 S' unless $sgttyb_t;  # c2ph: &sgttyb'typedef()

	ioctl(STDIN,&TIOCGETP,$sgttyb) || die "Can't ioctl TIOCGETP: $!";

	@ary = unpack($sgttyb_t,$sgttyb);
	if ($on) {
	    $ary[4] |= &CBREAK;
	    $ary[4] &= ~&ECHO;
	} else {
	    $ary[4] &= ~&CBREAK;
	    $ary[4] |= &ECHO;
	}
	$sgttyb = pack($sgttyb_t,@ary);

	ioctl(STDIN,&TIOCSETP,$sgttyb) || die "Can't ioctl TIOCSETP: $!";
    }

    Note that this is one of the few times you actually want to use the
    getc() function; it's in general way too expensive to call for normal
    I/O.  Normally, you just use the <FILE> syntax, or perhaps the read()
    or sysread() functions.

    For perspectives on more portable solutions, use anon ftp to retrieve
    the file /pub/perl/info/keypress from convex.com.

    For DOS systems, Dan Carson <dbc@tc.fluke.COM> reports:

    To put the PC in "raw" mode, use ioctl with some magic numbers gleaned
    from msdos.c (Perl source file) and Ralf Brown's interrupt list (comes
    across the net every so often):

	$old_ioctl = ioctl(STDIN,0,0);     # Gets device info
	$old_ioctl &= 0xff;
	ioctl(STDIN,1,$old_ioctl | 32);    # Writes it back, setting bit 5

    Then to read a single character:

	sysread(STDIN,$c,1);               # Read a single character

    And to put the PC back to "cooked" mode:

	ioctl(STDIN,1,$old_ioctl);         # Sets it back to cooked mode.


    So now you have $c.  If ord($c) == 0, you have a two byte code, which
    means you hit a special key.  Read another byte (sysread(STDIN,$c,1)),
    and that value tells you what combination it was according to this
    table:

	# PC 2-byte keycodes = ^@ + the following:

	# HEX     KEYS
	# ---     ----
	# 0F      SHF TAB
	# 10-19   ALT QWERTYUIOP
	# 1E-26   ALT ASDFGHJKL
	# 2C-32   ALT ZXCVBNM
	# 3B-44   F1-F10
	# 47-49   HOME,UP,PgUp
	# 4B      LEFT
	# 4D      RIGHT
	# 4F-53   END,DOWN,PgDn,Ins,Del
	# 54-5D   SHF F1-F10
	# 5E-67   CTR F1-F10
	# 68-71   ALT F1-F10
	# 73-77   CTR LEFT,RIGHT,END,PgDn,HOME
	# 78-83   ALT 1234567890-=
	# 84      CTR PgUp

    This is all trial and error I did a long time ago; I hope I'm reading
    the file that worked.
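    Decoding that second byte is a natural job for an associative array.
    Here is a sketch that builds one from the table above; @pc_key_runs,
    %pc_key, and &pc_key_name are made-up names, and the codes come
    straight from the table, so they inherit its trial-and-error caveat.

```perl
# Sketch: name the second byte of a PC two-byte key code, per the table.
# Each row: the first code of a run, then names for consecutive codes.
@pc_key_runs = (
    [0x0F, 'SHF TAB'],
    [0x10, map("ALT $_", split(//, 'QWERTYUIOP'))],
    [0x1E, map("ALT $_", split(//, 'ASDFGHJKL'))],
    [0x2C, map("ALT $_", split(//, 'ZXCVBNM'))],
    [0x3B, map("F$_", 1..10)],
    [0x47, 'HOME', 'UP', 'PgUp'],
    [0x4B, 'LEFT'],
    [0x4D, 'RIGHT'],
    [0x4F, 'END', 'DOWN', 'PgDn', 'Ins', 'Del'],
    [0x54, map("SHF F$_", 1..10)],
    [0x5E, map("CTR F$_", 1..10)],
    [0x68, map("ALT F$_", 1..10)],
    [0x73, map("CTR $_", 'LEFT', 'RIGHT', 'END', 'PgDn', 'HOME')],
    [0x78, map("ALT $_", split(//, '1234567890-='))],
    [0x84, 'CTR PgUp'],
);

foreach $run (@pc_key_runs) {
    ($code, @names) = @$run;
    foreach $key (@names) { $pc_key{$code++} = $key; }
}

sub pc_key_name {
    local($byte) = @_;              # the byte read after the leading ^@
    local($name) = $pc_key{ord($byte)};
    return defined($name) ? $name : sprintf("unknown 0x%02X", ord($byte));
}
```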


2.10) How can I make an array of arrays or other recursive data types?

    Remember that Perl isn't about nested data structures (actually,
    perl0 ..  perl4 weren't, but maybe perl5 will be, at least
    somewhat).  It's about flat ones, so if you're trying to do this, you
    may be going about it the wrong way or using the wrong tools.  You
    might try parallel arrays with common subscripts.

    But if you're bound and determined, you can use the multi-dimensional
    array emulation of $a{'x','y','z'}, or you can make an array of names
    of arrays and eval it.

    For example, if @name contains a list of names of arrays, you can 
    get at the j-th element of the i-th array like so:

	$ary = $name[$i];                 # e.g. 'foo'
	$val = eval "\$$ary\[\$j]";       # builds '$foo[$j]' and evals it

    or in one line

	$val = eval "\$$name[$i]\[\$j]";  # same, taking the name from @name

    You could also use the type-globbing syntax to make an array of *name
    values, which will be more efficient than eval.  Here @name holds
    a list of pointers, which we'll have to dereference through a temporary
    variable.

    For example:

	{ local(*ary) = $name[$i]; $val = $ary[$j]; }

    In fact, you can use this method to make arbitrarily nested data
    structures.  You really have to want to do this kind of thing
    badly to go this far, however, as it is notationally cumbersome.

    Let's assume you just simply *have* to have an array of arrays of
    arrays.  What you do is make an array of pointers to arrays of
    pointers, where pointers are *name values described above.  You
    initialize the outermost array normally, and then you build up your
    pointers from there.  For example:

	@w = ( 'ww' .. 'xx' );
	@x = ( 'xx' .. 'yy' );
	@y = ( 'yy' .. 'zz' );
	@z = ( 'zz' .. 'zzz' );

	@ww = reverse @w;
	@xx = reverse @x;
	@yy = reverse @y;
	@zz = reverse @z;

    Now make a couple of arrays of pointers to these:

	@A = ( *w, *x, *y, *z );
	@B = ( *ww, *xx, *yy, *zz );

    And finally make an array of pointers to these arrays:

	@AAA = ( *A, *B );

    To access an element, such as AAA[i][j][k], you must do this:

	local(*foo) = $AAA[$i];
	local(*bar) = $foo[$j];
	$answer = $bar[$k];
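    The pieces above, assembled into one runnable sketch.  The lookup is
    wrapped in a made-up subroutine, &element3, so each call gets its own
    local() aliases that vanish when it returns.

```perl
@w = ( 'ww' .. 'xx' );   @ww = reverse @w;
@x = ( 'xx' .. 'yy' );   @xx = reverse @x;
@y = ( 'yy' .. 'zz' );   @yy = reverse @y;
@z = ( 'zz' .. 'zzz' );  @zz = reverse @z;

@A   = ( *w, *x, *y, *z );
@B   = ( *ww, *xx, *yy, *zz );
@AAA = ( *A, *B );

sub element3 {                 # returns AAA[i][j][k]
    local($i, $j, $k) = @_;
    local(*foo) = $AAA[$i];    # *foo now aliases *A or *B
    local(*bar) = $foo[$j];    # *bar now aliases one of the leaf arrays
    return $bar[$k];
}
```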

    Similar manipulations on associative arrays are also feasible.

    You could take a look at the recurse.pl package posted by Felix Lee
    <flee@cs.psu.edu>, which lets you simulate vectors and tables (lists and
    associative arrays) by using type glob references and some pretty serious
    wizardry.

    In C, you're used to creating recursive datatypes for operations
    like recursive descent parsing or tree traversal.  In Perl, these
    algorithms are best implemented using associative arrays.  Take an
    array called %parent, and build up pointers such that $parent{$person}
    is the name of that person's parent.  Make sure you remember that
    $parent{'adam'} is 'adam'. :-) With a little care, this approach can
    be used to implement general graph traversal algorithms as well.
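    A sketch of the %parent idea: the names are illustrative, the root is
    its own parent as described, and &ancestors is a made-up helper that
    walks up the tree until it reaches that root.

```perl
%parent = (
    'adam',  'adam',
    'seth',  'adam',
    'enos',  'seth',
    'kenan', 'enos',
);

sub ancestors {
    local($person) = @_;
    local(@line) = ();
    # Stop at the root (its own parent) or at someone with no entry.
    while (defined($parent{$person}) && $parent{$person} ne $person) {
        $person = $parent{$person};
        push(@line, $person);
    }
    return @line;
}

@up = &ancestors('kenan');
```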

    In Perl5, it's quite easy to declare these things.  For example

	@A = (
	    [ 'ww' .. 'xx'  ], 
	    [ 'xx' .. 'yy'  ], 
	    [ 'yy' .. 'zz'  ], 
	    [ 'zz' .. 'zzz' ], 
	);

    And now reference $A[2]->[0] to pull out "yy".  These may also nest
    and mix with tables:

	%T = (
	    key0, { k0, v0, k1, v1 },	
	    key1, { k2, v2, k3, v3 },	
	    key2, { k2, v2, k3, [ 0, 'a' .. 'z' ] },	
	);
    
    This lets you reference $T{key2}->{k3}->[3] to pull out 'c'.



2.11) How do I make an array of structures containing various data types?

    Well, soon you may not have to, but for now, let's look at ways to 
    synthesize these.

    One scheme I've invented uses what I call pseudoanonymous packages.
    This was motivated because I wanted an associative array of structures
    in which each structure contained not merely scalar data, but also lists
    and tables.   

    The table (read: associative array) is called %Active_Folders, whose
    key is the name of the folder, and whose values are, well, *logically*
    they're each a structure whose components look like this:

	$Current_Folder
	$Current_Seq  
	$Current_Line
	$Top_Line   
	$Incomplete_Read    
	$Folder_ID  
	$Last_Typed 
	@Scan_Lines
	%Scan_IDs 
	%Deleted 

    The way it works is that I only have one folder active at once.
    Those symbols as listed above are accessible from anywhere in the
    program.  The trick is that when I want to switch folders, I change
    what they point to!  You see, there's a package for each folder name
    that contains the real data.  So, it's not like I get to dereference

	$Active_Folders{$foldername}->$Current_Line

    or 

	$Active_Folders{$foldername}->$Scan_IDs{$msgnum}

    although I'd like to.  I have to switch folders to $foldername first,
    and then access the individual fields directly.  The package name is
    generated, so you can't guess it, which is why it's a pseudoanonymous one.

    Hm, I've this scary feeling that in Perl5, the last line will really read:

	${$Active_Folders{$foldername}->Scan_IDs}->{$msgnum}

    or something, which is truly impossible for my brain to parse.  But I'm not
    really clear on it.  I get muddled up part way through whenever Larry explains
    how multiple levels of dereferencing will work, and I'm not even sure I'll be
    able to get away with the above without setting up lots of pointers first.

    Anyway, here's the code that allows associative arrays of structures of 
    random data types.  I haven't done more than one level yet, although 
    surely you could embed the value of $Active_Folders{$folder} as a $Prev_Folder
    field in each, then do the appropriate thing.

	sub gensym { 'gensym_' . ++$gensym'symbol } 

	sub activate_folder {
	    local($folder) = @_;

	    &assert('$folder',$folder);

	    $Last_Seq = $Current_Seq;

	    if (! defined $Active_Folders{$folder}) {
		$Active_Folders{$folder} = &gensym;
		push(@Active_Folders, $folder);
	    }

	    local($package) = $Active_Folders{$folder};

	    local($code)=<<"EOF";
		{
			package $package;
			*'Current_Folder    = *Current_Folder;
			*'Current_Seq       = *Current_Seq;
			*'Current_Line      = *Current_Line;
			*'Top_Line          = *Top_Line;
			*'Scan_Lines        = *Scan_Lines;
			*'Scan_IDs          = *Scan_IDs;
			*'Incomplete_Read   = *Incomplete_Read;
			*'Folder_ID         = *Folder_ID;
			*'Last_Typed        = *Last_Typed;
			*'Deleted           = *Deleted;
		    }
EOF
	    eval $code;
	    $Current_Seq = $folder;

	    &panic("bad eval: $@\n$code\n") if $@;
	} 
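    For comparison, here is a sketch of the same kind of structure using
    Perl 5 references, which is a different technique from the packages
    above, not a rewrite of them.  The field names come from the list
    earlier in this answer; &new_folder and the sample data are made up.
    As it turned out, the nested access is written
    $Active_Folders{$folder}{Scan_IDs}{$msgnum}.

```perl
# Sketch: one anonymous hash per folder, mixing scalars, a list, and tables.
sub new_folder {
    my ($folder) = @_;
    $Active_Folders{$folder} = {
        Current_Line => 0,
        Top_Line     => 0,
        Scan_Lines   => [],      # list of scan lines
        Scan_IDs     => {},      # message number => index into Scan_Lines
        Deleted      => {},      # message number => 1 if deleted
    };
}

new_folder('inbox');
$f = $Active_Folders{'inbox'};
push(@{ $f->{Scan_Lines} }, "17  Re: Perl FAQ part 3");
$f->{Scan_IDs}{17} = $#{ $f->{Scan_Lines} };   # index of the line just added
$f->{Deleted}{17}  = 1;

# Nested access straight from the top-level table:
$line_index = $Active_Folders{'inbox'}{Scan_IDs}{17};
```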


2.12) How can I quote a variable to use in a regexp?

    From the manual:

	$pattern =~ s/(\W)/\\$1/g;

    Now you can freely use /$pattern/ without fear of any unexpected
    meta-characters in it throwing off the search.  If you don't know
    whether a pattern is valid or not, enclose it in an eval to avoid
    a fatal run-time error.
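    A sketch of both points, with made-up sample strings.  Perl 5 also
    has a built-in, quotemeta(), that does the same escaping as the
    substitution above.

```perl
# Sketch: escaping a string before using it as a pattern, and using
# eval to survive a pattern that will not compile.
$pattern = 'a.b*c';
($quoted = $pattern) =~ s/(\W)/\\$1/g;        # now 'a\.b\*c'
$same    = quotemeta($pattern);               # Perl 5 built-in, same result

$raw_hit     = ('aXbbbc'  =~ /$pattern/) ? 1 : 0;  # . and * act as metacharacters
$quoted_hit  = ('aXbbbc'  =~ /$quoted/)  ? 1 : 0;  # only the literal text matches
$literal_hit = ('xa.b*cx' =~ /$quoted/)  ? 1 : 0;

$bad   = 'foo(';                               # unbalanced paren
$valid = eval { 'foo(' =~ /$bad/; 1 } ? 1 : 0; # the compile error is trapped
$why   = $@;                                   # the regexp compiler's complaint
```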


2.13) Why do setuid Perl scripts complain about kernel problems?

    This message:

    YOU HAVEN'T DISABLED SET-ID SCRIPTS IN THE KERNEL YET!
    FIX YOUR KERNEL, PUT A C WRAPPER AROUND THIS SCRIPT, OR USE -u AND UNDUMP!

    is triggered because setuid scripts are inherently insecure due to a
    kernel bug.  If your system has fixed this bug, you can compile Perl
    so that it knows this.  Otherwise, create a setuid C program that just
    execs Perl with the full name of the script.  Larry's wrapsuid 
    script can help.


2.14) How do I open a pipe both to and from a command?

    In general, this is a dangerous move because you can find yourself in a
    deadlock situation.  It's better to put one end of the pipe to a file.
    For example:

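	# first write some_cmd's input into a_file, then 
	open(CMD, "some_cmd its_args < a_file |") || die "can't fork: $!";
	while (<CMD>) {
	    # process each line of some_cmd's output
	}
	close(CMD) || warn "cmd exited $?";

	# or else the other way; run the cmd
	open(CMD, "| some_cmd its_args > a_file") || die "can't fork: $!";
	while ($condition) {
	    print CMD "some output\n";
	    # other code deleted
	} 
	close(CMD) || warn "cmd exited $?";

	# now read the file
	open(FILE, "a_file") || die "can't open a_file: $!";
	while (<FILE>) {
	    # process each line of some_cmd's output
	}
	close(FILE);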

    If you have ptys, you could arrange to run the command on a pty and
    avoid the deadlock problem.  See the chat2.pl package in the
    distributed library for ways to do this.

    At the risk of deadlock, it is theoretically possible to use a
    fork, two pipe calls, and an exec to manually set up the two-way
    pipe.  (BSD systems may use socketpair() in place of the two pipes,
    but this is not as portable.)  The open2 library function distributed
    with the current perl release will do this for you.

    It assumes it's going to talk to something like adb, both writing to
    it and reading from it.  This is presumably safe because you "know"
    that commands like adb will read a line at a time and output a line at
    a time.  Programs like sort that read their entire input stream first,
    however, are quite apt to cause deadlock.
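    Here is a sketch using open2 as packaged in Perl 5's standard
    IPC::Open2 module, assuming a Unix-like system with sort on the PATH.
    It talks to sort and stays out of trouble only because it writes all
    the input and closes that handle before reading anything back: sort
    produces no output until it has seen end of input, so neither side is
    left waiting on the other.

```perl
use IPC::Open2;

# open2(reader, writer, command): we read sort's stdout, write its stdin.
$pid = open2(\*FROM_SORT, \*TO_SORT, 'sort');

print TO_SORT "pear\napple\nfig\n";
close(TO_SORT);                 # sort sees end of input and starts writing

@sorted = <FROM_SORT>;          # now safe to read everything back
close(FROM_SORT);
waitpid($pid, 0);               # reap the child process
```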


2.15) How can I change the first N letters of a string?

    Remember that the substr() function produces an lvalue, that is, it may be
    assigned to.  Therefore, to change the first character to an S, you could
    do this:

	substr($var,0,1) = 'S';

    This assumes that $[ is 0;  for a library routine where you can't know $[,
    you should use this instead:

	substr($var,$[,1) = 'S';

    While it would be slower, you could in this case use a substitute:

	$var =~ s/^./S/;
    
    But this won't work if the string is empty or its first character is a
    newline, which "." will never match.  So you could use this instead:

	$var =~ s/^[^\0]?/S/;

    To do things like translation of the first part of a string, use substr,
    as in:

	substr($var, $[, 10) =~ tr/a-z/A-Z/;

    If you don't know the length of what to translate, something like
    this works:

	/^(\S+)/ && substr($_,$[,length($1)) =~ tr/a-z/A-Z/;
    
    For some things it's convenient to use the /e switch of the 
    substitute operator:

	s/^(\S+)/($tmp = $1) =~ tr#a-z#A-Z#, $tmp/e

    although in this case, it runs more slowly than does the previous example.
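    A sketch running those variants on sample strings.  It uses 0 where
    the text uses $[, since recent Perls no longer let you change $[.

```perl
$var = "perl rules";
substr($var, 0, 1) = 'S';                   # substr as an lvalue

$_ = "hello world";
/^(\S+)/ && substr($_, 0, length($1)) =~ tr/a-z/A-Z/;   # uppercase first word

($dot   = "\nx") =~ s/^./S/;                # no match: . skips the newline
($any   = "\nx") =~ s/^[^\0]?/S/;           # replaces the newline
($empty = '')    =~ s/^[^\0]?/S/;           # works on an empty string too
```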


2.16) How can I manipulate fixed-record-length files?

    The most efficient way is using pack and unpack.  This is faster than
    using substr.  Here is a sample chunk of code to break up and put back
    together again some fixed-format input lines, in this case, from ps.

	# sample input line:
	#   15158 p5  T      0:00 perl /mnt/tchrist/scripts/now-what
	$ps_t = 'A6 A4 A7 A5 A*';
	open(PS, "ps|");
	$_ = <PS>; print;
	while (<PS>) {
	    ($pid, $tt, $stat, $time, $command) = unpack($ps_t, $_);
	    for $var ('pid', 'tt', 'stat', 'time', 'command' ) {
		print "$var: <", eval "\$$var", ">\n";
	    }
	    print 'line=', pack($ps_t, $pid, $tt, $stat, $time, $command),  "\n";
	}
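    The same template can be tried on one fixed sample line so that no ps
    is needed.  This sketch builds the line from pieces of exactly the
    template's widths (6, 4, 7, and 5 characters, then the rest).

```perl
$ps_t = 'A6 A4 A7 A5 A*';
$line = '15158 ' . 'p5  ' . 'T      ' . '0:00 '
      . 'perl /mnt/tchrist/scripts/now-what';

# unpack's A format strips the trailing blanks from each field.
($pid, $tt, $stat, $time, $command) = unpack($ps_t, $line);

# pack pads each A field back out with blanks, recreating the line.
$rebuilt = pack($ps_t, $pid, $tt, $stat, $time, $command);
```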


2.17) How can I make a file handle local to a subroutine?

    You must use the type-globbing *VAR notation.  Here is some code to
    cat an include file, calling itself recursively on nested local
    include files (i.e. those with #include "file", not #include <file>):

	sub cat_include {
	    local($name) = @_;
	    local(*FILE);
	    local($_);

	    warn "<INCLUDING $name>\n";
	    if (!open (FILE, $name)) {
		warn "can't open $name: $!\n";
		return;
	    }
	    while (<FILE>) {
		if (/^#\s*include "([^"]*)"/) {
		    &cat_include($1);
		} else {
		    print;
		}
	    }
	    close FILE;
	}
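    In Perl 5.6 and later there is another route: a lexical filehandle
    from open(my $fh, ...) is private to each call, so the recursion needs
    no local(*FILE).  This sketch returns the lines instead of printing
    them; &cat_include_lines and the two throwaway files are made up.

```perl
use File::Temp qw(tempdir);

sub cat_include_lines {
    my ($name) = @_;
    my (@out, $fh);
    unless (open($fh, '<', $name)) {
        warn "can't open $name: $!\n";
        return;
    }
    while (my $line = <$fh>) {
        if ($line =~ /^#\s*include "([^"]*)"/) {
            push(@out, cat_include_lines($1));   # recursion gets its own $fh
        } else {
            push(@out, $line);
        }
    }
    close($fh);
    return @out;
}

# Try it: outer.c includes inner.h by full path.
$dir = tempdir(CLEANUP => 1);
open(OUT, '>', "$dir/inner.h") || die "can't write: $!";
print OUT "int inner;\n";
close(OUT);
open(OUT, '>', "$dir/outer.c") || die "can't write: $!";
print OUT "#include \"$dir/inner.h\"\nint outer;\n";
close(OUT);

@lines = cat_include_lines("$dir/outer.c");
```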


2.18) How can I extract just the unique elements of an array?

    There are several possible ways, depending on whether the
    array is ordered and you wish to preserve the ordering.

    a) If @in is sorted, and you want @out to be sorted:

	$prev = 'nonesuch';
	@out = grep($_ ne $prev && (($prev) = $_), @in);

       This is nice in that it doesn't use much extra memory, 
       simulating uniq's behavior of removing only adjacent
       duplicates.

    b) If you don't know whether @in is sorted:

	undef %saw;
	@out = grep(!$saw{$_}++, @in);

    c) Like (b), but @in contains only small integers:

	undef @saw;
	@out = grep(!$saw[$_]++, @in);

    d) A way to do (b) without any loops or greps:

	undef %saw;
	@saw{@in} = ();
	@out = sort keys %saw;  # remove sort if undesired

    e) Like (d), but @in contains only small positive integers:

	undef @ary;
	@ary[@in] = @in;
	@out = grep(defined, @ary);   # skip the holes; already in numeric order
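    A sketch running methods (a), (b), and (e) on one sample list, so the
    orderings can be compared; the data is invented.

```perl
@in = (3, 1, 3, 2, 1, 5);

# (b): any input order; keeps the first occurrence of each value
undef %saw;
@out_b = grep(!$saw{$_}++, @in);

# (a): needs sorted input, like uniq
@sorted = sort { $a <=> $b } @in;
$prev = 'nonesuch';
@out_a = grep($_ ne $prev && (($prev) = $_), @sorted);

# (e): small non-negative integers; undefined holes are skipped
undef @ary;
@ary[@in] = @in;
@out_e = grep(defined, @ary);
```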


2.19) How can I call alarm() or usleep() from Perl?

    It's available as a built-in as of version 3.038.  If you want finer
    granularity than 1 second (as usleep() provides) and have itimers and
    syscall() on your system, you can use the following.  You could also
    use select().

    It takes a floating-point number representing how long to delay until
    you get the SIGALRM, and returns a floating-point number representing
    how much time was left in the old timer, if any.  Note that the C
    function uses integers, but this one doesn't mind fractional numbers.

    # alarm; send me a SIGALRM in this many seconds (fractions ok)
    # tom christiansen <tchrist@convex.com>
    sub alarm {
	require 'syscall.ph';
	require 'sys/time.ph';

	local($ticks) = @_;
	local($in_timer,$out_timer);
	local($isecs, $iusecs, $secs, $usecs);

	local($itimer_t) = 'L4'; # should be &itimer'typedef()

	$secs = int($ticks);
	$usecs = ($ticks - $secs) * 1e6;

	$out_timer = pack($itimer_t,0,0,0,0);  
	$in_timer  = pack($itimer_t,0,0,$secs,$usecs);

	syscall(&SYS_setitimer, &ITIMER_REAL, $in_timer, $out_timer)
	    && die "alarm: setitimer syscall failed: $!";

	($isecs, $iusecs, $secs, $usecs) = unpack($itimer_t,$out_timer);
	return $secs + ($usecs/1e6);
    }
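    For the select() route mentioned at the start of this answer, here is
    a sketch; &nap is a made-up name.  Modern Perls also ship the
    Time::HiRes module, whose usleep() and ualarm() do these jobs directly;
    the sketch uses it only to time the pause.

```perl
use Time::HiRes ();     # used here only to measure the delay

sub nap {
    local($seconds) = @_;
    select(undef, undef, undef, $seconds);   # no filehandles, just the timeout
}

$start   = Time::HiRes::time();
&nap(0.25);
$elapsed = Time::HiRes::time() - $start;    # roughly a quarter second
```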


2.20) How can I test whether an array contains a certain element?

    There are several ways to approach this.  If you are going to make
    this query many times and the values are arbitrary strings, the
    fastest way is probably to invert the original array and keep an
    associative array lying about whose keys are the first array's values.

	@blues = ('turquoise', 'teal', 'lapis lazuli');
	undef %is_blue;
	for (@blues) { $is_blue{$_} = 1; }

    Now you can check whether $is_blue{$some_color}.  It might have been
    a good idea to keep the blues all in an assoc array in the first place.

    If the values are all small integers, you could use a simple
    indexed array.  This kind of an array will take up less space:

	@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
	undef @is_tiny_prime;
	for (@primes) { $is_tiny_prime[$_] = 1; }

    Now you check whether $is_tiny_prime[$some_number].

    If the values in question are integers rather than strings, you can
    save quite a lot of space by using bit strings instead:

	@articles = ( 1..10, 150..2000, 2017 );
	undef $read;
	grep (vec($read,$_,1) = 1, @articles);
    
    Now check whether vec($read,$n,1) is true for some $n.
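    A sketch of the lookups.  All 1862 article numbers fit in a 253-byte
    string, one bit per possible number up to 2017.

```perl
@articles = ( 1..10, 150..2000, 2017 );
undef $read;
grep (vec($read,$_,1) = 1, @articles);

$seen_150  = vec($read, 150, 1);
$seen_11   = vec($read, 11, 1);
$seen_5000 = vec($read, 5000, 1);   # bits past the end read as 0
$bytes     = length($read);
```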


2.21) How can I do an atexit() or setjmp()/longjmp() in Perl?

    Perl's exception-handling mechanism is its eval operator.  You 
    can use eval as setjmp and die as longjmp.  Here's an example
    of Larry's for timed-out input, which in C is often implemented
    using setjmp and longjmp:

	  $SIG{'ALRM'} = 'TIMEOUT';
	  sub TIMEOUT { die "restart input\n" }

	  do { eval { &realcode } } while $@ =~ /^restart input/;

	  sub realcode {
	      alarm 15;
	      $ans = <STDIN>;
	      alarm 0;
	  }
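
    The same setjmp/longjmp pattern can be seen without alarms or terminal
    input.  In this sketch the routines &fragile and &deeper are
    hypothetical: &deeper plays the role of longjmp by dying from deep
    inside the call chain, and the eval in the loop plays the role of
    setjmp, restarting the work until it finishes cleanly.

```perl
$attempts = 0;

# Hypothetical work routine: bails out from a nested call the first two times.
sub fragile {
    $attempts++;
    &deeper if $attempts < 3;
    "succeeded on attempt $attempts";
}

# Like longjmp(): unwinds straight back to the nearest enclosing eval.
sub deeper { die "restart input\n" }

# Like setjmp(): eval catches the die, and the loop retries on our marker.
do { $answer = eval { &fragile } } while $@ =~ /^restart input/;

print "$answer\n";
```

    Any die whose message doesn't start with "restart input" ends the loop
    with the error still in $@, so real failures are not retried forever.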

    Here's an example of Tom's for doing atexit() handling:

	sub atexit { push(@_exit_subs, @_) }

	sub _cleanup { unlink $tmp }

	&atexit('_cleanup');

	eval <<'End_Of_Eval';  $here = __LINE__;
	# as much code here as you want
	End_Of_Eval

	$oops = $@;  # save error message

	# now run the registered exit routines
	for (@_exit_subs) { &$_(); }

	# re-raise any error, reporting the line number in the real file
	$oops && ($oops =~ s/\(eval( \d+)?\) line (\d+)/$0 .
	    " line " . ($2+$here)/e, die $oops);

    You can register your own routines via the &atexit function now.  You
    might also want to use Larry's &realcode method rather than
    embedding all your code in the here document.  Make sure to leave
    via die rather than exit, since exit would skip the cleanup loop,
    or write your own &exit routine and call that instead.  In general,
    it's better for nested routines to exit via die rather than exit
    for just this reason.

    In Perl5, it will be easy to set this up because of the automatic
    processing of per-package END functions.

    Eval is also quite useful for testing for system-dependent features,
    like symlinks, or for using a user-input regexp that might otherwise
    blow up on you.
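
    Both uses can be sketched briefly.  The symlink probe follows the idiom
    given in the Perl 5 documentation for symlink, which raises an exception
    on systems without symbolic links; the &valid_regexp helper and the
    sample patterns are hypothetical.

```perl
# On systems without symbolic links symlink() dies; eval traps that.
# Elsewhere symlink("", "") merely fails, so the eval returns 1.
$symlink_exists = eval { symlink("", ""); 1 } ? 1 : 0;

# Hypothetical helper: does a user-supplied pattern compile as a regexp?
sub valid_regexp {
    local($pat) = @_;
    return eval { "" =~ /$pat/; 1 } ? 1 : 0;
}

print "symlinks supported\n" if $symlink_exists;
for $p ('^ab+c$', 'a(b', '[z-a]') {
    print "$p: ", &valid_regexp($p) ? "ok" : "invalid: $@", "\n";
}
```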


2.22) Why doesn't Perl interpret my octal data octally?

    Perl only understands octal and hex numbers as such when they occur
    as constants in your program.  If they are read in from somewhere
    and assigned, then no automatic conversion takes place.  You must
    explicitly use oct() or hex() if you want this kind of thing to happen.
    Actually, oct() knows to interpret both hex and octal numbers, while
    hex only converts hexadecimal ones.  For example:

	{
	    print "What mode would you like? ";
	    defined($mode = <STDIN>) || die "unexpected end of input\n";
	    chop($mode);		# remove the trailing newline
	    $mode = oct($mode);
	    unless ($mode) {
		print "You can't really want mode 0!\n";
		redo;
	    } 
	    chmod $mode, $file;
	} 

    Without the octal conversion, a requested mode of 755 would turn 
    into 01363, yielding bizarre file permissions of --wxrw--wt.

    If you want something that handles decimal, octal and hex input, 
    you could follow the suggestion in the man page and use:

	$val = oct($val) if $val =~ /^0/;
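
    Here is a small sketch that wraps that suggestion in a hypothetical
    &to_number helper and also shows where the odd 01363 mode comes from.

```perl
# Hypothetical helper: "0x..." is hex, any other leading 0 is octal,
# everything else is taken as decimal.
sub to_number {
    local($val) = @_;
    $val = oct($val) if $val =~ /^0/;    # oct() handles both 0755 and 0x1ed
    return $val + 0;
}

for ('755', '0755', '0x1ed', '0') {
    printf "%-6s -> %d\n", $_, &to_number($_);
}

# Decimal 755 used directly as a mode is octal 1363.
printf "755 as a mode is octal %o\n", 755;
```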

2.23) How do I sort an associative array by value instead of by key?

    You have to declare a sort subroutine to do this.  Let's assume
    you want an ASCII sort on the values of the associative array %ary.
    You could do so this way:

	foreach $key (sort by_value keys %ary) {
	    print $key, '=', $ary{$key}, "\n";
	} 
	sub by_value { $ary{$a} cmp $ary{$b}; }

    If you wanted a descending numeric sort, you could do this:

	sub by_value { $ary{$b} <=> $ary{$a}; }

    You can also inline your sort function, like this, at least if 
    you have a relatively recent patchlevel of perl4:

	foreach $key ( sort { $ary{$b} <=> $ary{$a} } keys %ary ) {
	    print $key, '=', $ary{$key}, "\n";
	} 

    If you wanted a function that didn't have the array name hard-wired
    into it, you could do this:

	foreach $key (&sort_by_value(*ary)) {
	    print $key, '=', $ary{$key}, "\n";
	} 
	sub sort_by_value {
	    local(*x) = @_;
	    sub _by_value { $x{$a} cmp $x{$b}; } 
	    sort _by_value keys %x;
	} 

    If you want neither an alphabetic nor a numeric sort, then you'll 
    have to code in your own logic instead of relying on the built-in
    signed comparison operators "cmp" and "<=>".
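
    One way to supply your own ordering is to map each value to a rank
    through a lookup table and compare the ranks.  In this sketch a
    hypothetical %due table, whose values are month abbreviations, is sorted
    into calendar order, which cmp on the names alone would not give.

```perl
@months = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec');

# Map each month name to its position in the calendar.
for ($i = 0; $i <= $#months; $i++) { $month_num{$months[$i]} = $i; }

%due = ('rent', 'Mar', 'taxes', 'Apr', 'gift', 'Jan');

# Compare the calendar positions of the values, not the month names.
sub by_due_month { $month_num{$due{$a}} <=> $month_num{$due{$b}}; }

foreach $key (sort by_due_month keys %due) {
    print $key, '=', $due{$key}, "\n";
}
```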

    Note that if you're sorting on just a part of the value, such as a
    piece you might extract via split, unpack, pattern-matching, or
    substr, then rather than performing that operation inside your sort
    routine on each call to it, it is significantly more efficient to
    build a parallel array of just those portions you're sorting on, sort
    the indices of this parallel array, and then to subscript your original
    array using the newly sorted indices.  This method works on both
    regular and associative arrays, since both @ary[@idx] and @ary{@idx}
    make sense.  See page 245 in the Camel Book on "Sorting an Array by a
    Computable Field" for a simple example of this.
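
    A sketch of the parallel-array technique follows.  The sample records
    and the choice of the third whitespace-separated field as the sort key
    are made up for illustration.

```perl
@lines = ("b x 30", "a y 10", "c z 20");

# Extract the sort key once per element, not once per comparison.
@keys = ();
for (@lines) { push(@keys, (split(' '))[2]); }

# Sort the indices by key, then use them to slice the original array.
@idx = sort { $keys[$a] <=> $keys[$b]; } 0 .. $#keys;
@sorted = @lines[@idx];

print join("\n", @sorted), "\n";
```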


2.24) How can I capture STDERR from an external command?

    There are three basic ways of running external commands:

	system $cmd;
	$output = `$cmd`;
	open (PIPE, "$cmd |");

    In the first case, both STDOUT and STDERR will go to the same place as
    the script's versions of these, unless redirected.  You can always put
    them where you want them and then read them back when the system
    returns.  In the second and third cases, you are reading the STDOUT
    *only* of your command.  If you would like to have merged STDOUT and
    STDERR, you can use shell file-descriptor redirection to dup STDERR to
    STDOUT:

	$output = `$cmd 2>&1`;
	open (PIPE, "$cmd 2>&1 |");

    Another possibility is to run STDERR into a file and read the file 
    later, as in 

	$output = `$cmd 2>some_file`;
	open (PIPE, "$cmd 2>some_file |");
    
    Here's a way to read from both of them and know which descriptor
    you got each line from.  The trick is to pipe only STDERR through
    sed, which then marks each of its lines, and then sends that
    back into a merged STDOUT/STDERR stream, from which your Perl program
    then reads a line at a time:

        # cmd's stdout goes straight to us; its stderr goes through sed
        open (CMD,
          "(cmd args 3>&1 1>&2 2>&3 3>&- | sed 's/^/STDERR:/') 2>&1 |");

        while (<CMD>) {
          if (s/^STDERR://)  {
              print "line from stderr: ", $_;
          } else {
              print "line from stdout: ", $_;
          }
        }

    Be apprised that you *must* use Bourne shell redirection syntax
    here, not csh!  In fact, you can't even do these things with csh.
    For details on how lucky you are that perl's system() and backtick
    and pipe opens all use Bourne shell, fetch the file from convex.com
    called /pub/csh.whynot -- and you'll be glad that perl's shell
    interface is the Bourne shell.
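
    Here is a self-contained version of the sed trick, assuming a
    Bourne-compatible /bin/sh and sed on the path.  The sh -c command is
    only a stand-in for a real program that writes one line to each stream.

```perl
$cmd = "sh -c 'echo to-stdout; echo to-stderr 1>&2'";

# Inside the subshell, the command's stdout is pointed at the subshell's
# stderr (which 2>&1 has merged into our pipe) and its stderr into sed.
open(CMD, "($cmd 3>&1 1>&2 2>&3 3>&- | sed 's/^/STDERR:/') 2>&1 |")
    || die "can't fork: $!";

@out = (); @err = ();
while (<CMD>) {
    if (s/^STDERR://) { push(@err, $_); }
    else              { push(@out, $_); }
}
close(CMD);

print "stdout: ", @out;
print "stderr: ", @err;
```

    The relative order of stdout and stderr lines is not guaranteed, since
    the stderr lines take a detour through sed.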

    There's an &open3 routine out there which will be merged with 
    &open2 in perl5 production.


2.25) Why doesn't open return an error when a pipe open fails?

    These statements:

	open(TOPIPE, "|bogus_command") || die ...
	open(FROMPIPE, "bogus_command|") || die ...

    will not fail just for lack of the bogus_command.  They'll only
    fail if the fork to run them fails, which is seldom the problem.

    If you're writing to the TOPIPE, you'll get a SIGPIPE if the child
    exits prematurely or doesn't run.  If you are reading from the
    FROMPIPE, you need to check the close() to see what happened.
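
    As a sketch of checking close() on a read pipe, assume a POSIX /bin/sh,
    which reports an unknown command with exit status 127.  The name
    no_such_command_here is made up.  The 2>/dev/null redirection gives the
    command shell metacharacters, so the open succeeds because the shell
    itself starts fine, and only close() reveals the failure.

```perl
# The open succeeds: the fork works and the shell starts.
$kid = open(FROMPIPE, "no_such_command_here 2>/dev/null |");
defined($kid) || die "can't fork: $!";

@output = <FROMPIPE>;

# close() waits for the child and leaves its wait status in $?.
if (close(FROMPIPE)) {
    $status = 0;
} else {
    $status = $? >> 8;    # exit status of the shell
    print "command failed with exit status $status\n";
}
```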

    If you want an answer sooner than pipe buffering might otherwise
    afford you, you can do something like this:

	$kid = open (PIPE, "bogus_command |");   # XXX: check defined($kid)
	(kill 0, $kid) || die "bogus_command failed";

    This works fine if bogus_command doesn't have shell metas in it, but
    if it does, the shell may well not have exited before the kill 0.  You
    could always introduce a delay:

	$kid = open (PIPE, "bogus_command </dev/null |");
	sleep 1;
	(kill 0, $kid) || die "bogus_command failed";

    but this is sometimes undesirable, and in any event does not guarantee
    correct behavior.  But it seems slightly better than nothing.

    Similar tricks can be played with writable pipes if you don't wish to
    catch the SIGPIPE.


2.26) How can I compare two date strings?

    If the dates are in an easily parsed, predetermined format, then you
    can break them up into their component parts and call &timelocal from
    the distributed perl library.  If the date strings are in arbitrary
    formats, however, it's probably easier to use the getdate program
    from the Cnews distribution, since it accepts a wide variety of dates.
    Note that in either case the return values you will really be
    comparing will be the total time in seconds, as returned by time().
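
    A minimal sketch of the &timelocal route follows.  It assumes dates arrive
    in a fixed "YYYY-MM-DD HH:MM:SS" format; both the format and the
    &date_to_secs helper are hypothetical.  It uses the Perl 5 Time::Local
    module, which provides timelocal; under perl4 the equivalent is
    require 'timelocal.pl'.

```perl
use Time::Local;

# Hypothetical helper: "YYYY-MM-DD HH:MM:SS" -> seconds since the epoch,
# the same scale that time() uses.
sub date_to_secs {
    local($date) = @_;
    local($y, $mon, $d, $h, $m, $s);
    ($y, $mon, $d, $h, $m, $s) =
        $date =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/;
    defined($s) || die "unparsable date: $date\n";
    # timelocal wants the month as 0..11
    return timelocal($s, $m, $h, $d, $mon - 1, $y);
}

$first  = &date_to_secs("1993-10-01 12:00:00");
$second = &date_to_secs("1993-10-02 06:37:56");
print "first date is earlier\n" if $first < $second;
```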
   
    Here's a getdate function for perl that's not very efficient; you 
    can do better by sending it many dates at once or modifying
    getdate to behave better on a pipe.  Beware the hardcoded pathname.

	sub getdate {
	    local($_) = shift;

	    s/-(\d{4})$/+$1/ || s/\+(\d{4})$/-$1/; 
		# getdate has broken timezone sign reversal!

	    $_ = `/usr/local/lib/news/newsbin/getdate '$_'`;
	    chop;
	    $_;
	} 

    Richard Ohnemus <rick@IMD.Sterling.COM> actually has a getdate.y
    for use with the Perl yacc.  You can get this from ftp.sterling.com
    [192.124.9.1] in /local/perl-byacc1.8.1.tar.Z, or send the author
    mail for details.

    You might also consider using these: 

    date.pl        - print dates how you want with the sysv +FORMAT method
    date.shar      - routines to manipulate and calculate dates
    ftp-chat2.shar - updated version of ftpget. includes library and demo programs
    getdate.shar   - returns number of seconds since epoch for any given date
    ptime.shar     - print dates how you want with the sysv +FORMAT method

    You probably want 'getdate.shar'... these and other files can be ftp'd from
    the /pub/perl/scripts directory on coombs.anu.edu.au. See the README file in
    the /pub/perl directory for time and the European mirror site details.


2.27) What's the fastest way to code up a given task in perl?

    Because Perl so lends itself to a variety of different approaches
    for any given task, a common question is which is the fastest way
    to code a given task.  Since some approaches can be dramatically
    more efficient than others, it's sometimes worth knowing which is
    best.  Unfortunately, the implementation that first comes to mind,
    perhaps as a direct translation from C or the shell, often yields
    suboptimal performance.  Not all approaches have the same results
    across different hardware and software platforms.  Furthermore,
    legibility must sometimes be sacrificed for speed.

    While an experienced perl programmer can sometimes eye-ball the code
    and make an educated guess regarding which way would be fastest,
    surprises can still occur.  So, in the spirit of perl programming
    being an empirical science, the best way to find out which of several
    different methods runs the fastest is simply to code them all up and
    time them. For example:

	$COUNT = 10_000; $| = 1;

	print "method 1: ";

	    ($u, $s) = times;
	    for ($i = 0; $i < $COUNT; $i++) {
		# code for method 1
	    }
	    ($nu, $ns) = times;
	    printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);

	print "method 2: ";

	    ($u, $s) = times;
	    for ($i = 0; $i < $COUNT; $i++) {
		# code for method 2
	    }
	    ($nu, $ns) = times;
	    printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);
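
    The harness above can be folded into a reusable routine.  This sketch
    uses a hypothetical &time_it subroutine that compiles the whole loop
    once with a string eval, so eval overhead is not charged to every
    iteration, and returns the user and system CPU seconds consumed.  The
    two methods compared (substr versus a regexp to fetch the first
    character) are only examples.

```perl
# Hypothetical helper: run a code string $count times, return (user, sys) seconds.
sub time_it {
    local($code, $count) = @_;
    local($u, $s, $nu, $ns);
    ($u, $s) = times;
    eval "for (\$_i = 0; \$_i < $count; \$_i++) { $code }";
    die $@ if $@;
    ($nu, $ns) = times;
    return ($nu - $u, $ns - $s);
}

$| = 1;
$str = "a" x 100;
printf "substr: %8.4fu %8.4fs\n", &time_it('$x = substr($str, 0, 1);', 10_000);
printf "regexp: %8.4fu %8.4fs\n", &time_it('($x) = $str =~ /^(.)/;',   10_000);
```

    Timings this short are noisy, so raise the count until the numbers
    settle before drawing conclusions.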

    For more specific tips, see the section on Efficiency in the
    ``Other Oddments'' chapter at the end of the Camel Book.


2.28) How can I know how many entries are in an associative array?

    While the number of elements in a @foobar array is simply @foobar when
    used in a scalar context, you can't figure out how many elements are in an
    associative array in an analogous fashion.  That's because %foobar in
    a scalar context returns the ratio (as a string) of number of buckets
    filled versus the number allocated.  For example, scalar(%ENV) might
    return "20/32".  While perl could in theory keep a count, this would
    break down on associative arrays that have been bound to dbm files.

    However, while you can't get a count this way, one thing you *can* use
    it for is to determine whether there are any elements whatsoever in
    the array, since "if (%table)" is guaranteed to be false if nothing
    has ever been stored in it.  

    So you either have to keep your own count around and increment
    it every time you store a new key in the array, or else do it
    on the fly when you really care, perhaps like this (assigning each's
    result to a list, so that a key such as "0" can't end the loop early):

	$count = 0;
	$count++ while ($key, $value) = each %ENV;

    The preceding method will be faster than extracting the
    keys into a temporary array to count them.

    As of a very recent patch, you can say

	$count = keys %ENV;
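
    The sketch below compares the two counting methods on a small,
    hypothetical %table whose keys include "0" and the empty string, the
    cases that would stop a loop testing each's bare scalar result.

```perl
%table = ('0', 'zero', '', 'empty', 'one', 1, 'two', 2);

# Method 1: walk the table with each().  The list assignment yields the
# number of elements returned (2, or 0 at the end), so false-looking keys
# cannot end the loop early.
$count = 0;
keys %table;                         # reset the each() iterator first
$count++ while ($key, $value) = each %table;

# Method 2: keys in a scalar context returns the count directly.
$count2 = keys %table;

print "$count $count2\n";
```
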
-- 
    Tom Christiansen      tchrist@cs.colorado.edu       
		    Consultant
	Boulder Colorado  303-444-3212
