Perl

08. File and Directory Manipulation


Filehandles

We've seen how to use <STDIN> and <> to read from stdin and other places.

Things in the less-than/greater-than bracketing are known as Filehandles, Perl's way of referring to a file that is open for reading or writing.


open/close

By default you have STDIN, STDOUT and STDERR. You can open arbitrary other files and name them similarly.

open(IN, "/etc/passwd");        # open file /etc/passwd for reading
                                #  assign to filehandle "IN"

open(IN, "</etc/passwd");       # same, explicit <

open(OUT, ">blah");             # open blah for writing

open(OUT, ">>blah");            # open blah for writing with append

close(IN); close(OUT);          # shut them down
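
A minimal sketch of the three modes in one run, using a scratch file so nothing real gets clobbered. The File::Temp scratch directory and the name modes.txt are assumptions made for this illustration, not part of the notes above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (an assumption for this sketch); removed at exit.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/modes.txt";

open(OUT, ">$path")  or die "cannot open $path for writing, $!";
print OUT "first\n";                  # ">" starts the file from empty
close(OUT)           or die "cannot close $path, $!";

open(OUT, ">>$path") or die "cannot open $path for append, $!";
print OUT "second\n";                 # ">>" adds after what is already there
close(OUT)           or die "cannot close $path, $!";

open(IN, "<$path")   or die "cannot open $path for reading, $!";
my @lines = <IN>;                     # list context reads every line
close(IN);

print @lines;                         # both lines, in the order written
```

Opening the same path again with ">" instead of ">>" would have thrown away "first".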

Using Filehandles

Once they are opened, use them any way you like.

open(IN, "/etc/passwd");

while ($line = <IN>) {            # like <> or <STDIN>
   print $line;
}

open (BOO, ">blah");
print BOO "hi there\n";           # note no comma after BOO
close (BOO);

Error Handling

Good idea to check that the file opened correctly. Users have a tendency to give you bogus names.

if (open(IN, "blah")) {                    unless(open(IN, "blah")) {
   # use IN                                   print "could not open blah!\n";
   close(IN);                              }
}                                          else {
else {                                        # use IN
   print "could not open blah!\n";            close(IN);
}                                          }

Both of the above are common for C programmers, but Perl has a shorter way.

open(IN, "blah") or die "could not open blah $!";

Pronounce "open file or die".

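Can use the || version, but precedence might trip you up: || binds more tightly than the comma, so it only tests open's result when open's arguments are in parentheses.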
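
A small sketch of the precedence trap. The path in $missing is a made-up name assumed not to exist, and eval is used here only to trap die so the script can report what happened.

```perl
use strict;
use warnings;

my $missing = "/no/such/dir/no-such-file";   # assumed not to exist

# Without parentheses, || binds tighter than the comma, so this parses as
#   open(IN, ($missing || die ...));
# $missing is a true string, so die never runs and the failure goes unnoticed.
my $loose = eval { open IN, $missing || die "never reached\n"; 1 };

# "or" has lower precedence than the whole open call, so it tests open's result.
my $tight = eval { open IN, $missing or die "cannot open $missing, $!\n"; 1 };

print "|| without parens: ", ($loose ? "no error reported" : "died"), "\n";
print "or:                ", ($tight ? "no error reported" : "died"), "\n";
```

Writing open(IN, $missing) || die "..." with the parentheses behaves like the "or" form.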

die prints your message on stderr and exits the program. If the message does not end in a newline, die appends the script name and line number where the program died.

$! is a special variable containing "errno". In string context it is a message about the error; in numeric context it is the error number.

For less severe errors you can use warn which does about the same thing but doesn't cause the program to exit.
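
A short sketch of $!, warn and die together. The path in $missing is a made-up name assumed not to exist. eval is not covered in these notes; it is used here only to trap die so both messages can be shown in one run.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

my $missing = "/no/such/dir/no-such-file";   # assumed not to exist

my ($text, $num);
if (open(IN, $missing)) {
    close(IN);
}
else {
    $text = "$!";       # string context: the error message
    $num  = $! + 0;     # numeric context: the errno value
    warn "could not open $missing: $text (errno $num)\n";   # script keeps running
}

# eval traps die so the script survives long enough to show the message.
eval { die "no newline" };
my $without = $@;       # die appended " at <script> line <N>.\n"

eval { die "with newline\n" };
my $with = $@;          # message left exactly as written

print "without: $without";
print "with:    $with";
```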


select

You can use select to set the current output file for print.

open(OUT, ">blah")     or die "cannot open blah, $!";

select(OUT);

print "hello there\n";     # goes to blah

select(STDOUT);

print "hey there\n";       # goes to STDOUT

close(OUT)             or die "cannot close blah, $!";
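
select also returns the handle that was selected before, so you can put things back without assuming it was STDOUT. A minimal sketch, writing to a scratch file (the File::Temp directory is an assumption for this illustration):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $path = "$dir/blah";

open(OUT, ">$path")    or die "cannot open $path, $!";

my $old = select(OUT);                # remember what was selected before
print "hello there\n";                # goes to the file
select($old);                         # restore it, whatever it was

print "hey there\n";                  # goes to the original output again
close(OUT)             or die "cannot close $path, $!";

# Read the file back to confirm what landed in it.
open(IN, $path)        or die "cannot open $path, $!";
my @saved = <IN>;
close(IN);
```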

File Tests

Always a good idea to make sure we have permission to do what we want, and to have a way to check files for info about them.

File tests similar to shell's "test" are built right into Perl.

Use &&, || and ! (or the words and, or and not) to combine them into boolean tests.

#!/usr/bin/perl -w

$file = "fitest";

if (-e $file) {
   open(IN, $file)             or die "cannot open $file, $!";
   open(OUT, ">$file.out")     or die "cannot open $file.out, $!";
   @data = <IN>;
   print OUT @data;
   close(IN)                   or die "cannot close $file, $!";
   close(OUT)                  or die "cannot close $file.out, $!";
}
else {
   print "$file does not exist!\n";
}

Use special "_" (underscore) to represent "last one tested" to minimize calls to stat().

if (-d $thing && -r $thing && -w $thing) {       # 3 stat()'s
}

if (-d $thing && -r _ && -w _) {                 # 1 stat()
}

You also have "stat" and "lstat" functions which get more detailed info about files.
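
A minimal sketch of stat, assuming a scratch file created just for the example. stat returns a 13-element list; the size in bytes is element 7 and the modification time is element 9.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $path = "$dir/fitest";

open(OUT, ">$path")   or die "cannot open $path, $!";
print OUT "12345";                    # five bytes, no newline
close(OUT)            or die "cannot close $path, $!";

my @info = stat($path) or die "cannot stat $path, $!";
my ($size, $mtime) = @info[7, 9];     # size in bytes, last-modified time

print "size: $size bytes\n";
print "modified: ", scalar(localtime($mtime)), "\n";

# "_" reuses the information gathered by the stat call above.
my $plain_readable = (-f _ && -r _) ? 1 : 0;
print "plain file and readable\n" if $plain_readable;
```

lstat takes the same argument and returns the same list, but describes a symbolic link itself rather than the file it points to.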


Filename globbing

You can do filename expansion or globbing with Perl using either angle brackets or the glob function.

@files = </etc/*>;              # files now has all files listed in /etc

@files = glob("/etc/*");        # same

while ($file = </etc/host*>) {  # get each one in turn
   print "file is now $file\n";
}

foreach $file (glob("/etc/host* /etc/passwd*")) {   # two expansions occur
   print "file is now $file\n";
}

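Uses csh-style globbing. Perls before 5.6 called up csh to do the work; since 5.6 the File::Glob module that ships with Perl does it without starting a shell.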
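
A sketch you can run without touching /etc. The scratch directory and the file names in it are assumptions made for the example, and the directory path is assumed to contain no spaces, since glob splits its argument on whitespace.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);      # scratch directory, removed at exit

# Create a few empty files to match against.
for my $name (qw(hosts hosts.allow passwd notes.txt)) {
    open(OUT, ">$dir/$name") or die "cannot create $dir/$name, $!";
    close(OUT)               or die "cannot close $dir/$name, $!";
}

# Two patterns in one string, separated by a space: both are expanded.
my @found = glob("$dir/host* $dir/passwd*");

s{^.*/}{} for @found;                 # strip the directory part for display
print "$_\n" for @found;              # notes.txt matches neither pattern
```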


Directory Handles

Can have directory handle like filehandle, even with same name.

Can have scalar x, FH x and dir handle x along with array x and hash x. But you don't want to.

opendir(ETC, "/etc")               or die "cannot open directory /etc, $!";

while ($file = readdir(ETC)) {
   print "file found in /etc: $file\n";
}

closedir(ETC)                      or die "cannot close directory /etc, $!";

Will include dot-files, including the . and .. entries.
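
A minimal sketch of weeding those out with grep. The scratch directory and the two file names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);      # scratch directory, removed at exit

for my $name (qw(.hidden visible)) {
    open(OUT, ">$dir/$name") or die "cannot create $dir/$name, $!";
    close(OUT)               or die "cannot close $dir/$name, $!";
}

opendir(DIR, $dir)  or die "cannot open directory $dir, $!";
my @all = readdir(DIR);               # list context: every entry at once
closedir(DIR)       or die "cannot close directory $dir, $!";

# Drop only the "." and ".." entries.
my @real    = sort grep { $_ ne "." && $_ ne ".." } @all;

# Drop everything whose name starts with a dot.
my @visible = grep { !/^\./ } @real;

print "real:    @real\n";
print "visible: @visible\n";
```

readdir returns bare names, not full paths, so prepend the directory before using them in file tests or open.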


File & Directory Functions

Use chdir to change your script's working directory.

chdir("/etc");        # move to /etc

chdir("/etc")    or die "cannot chdir to /etc, $!";

chdir;                # go HOME

Use unlink to remove (a) file(s).

unlink("blah");

$file = "blah";
unlink($file) or warn "could not unlink $file, $!";

@files = qw(file1 file2 file3);
unlink(@files);

unlink(<*.c>);          # globbing

Use chmod to change perms on (a) file(s).

chmod(0755, "script.pl");     # make script executable
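
Both chmod and unlink return the number of files they succeeded on, which is handy when you pass a list. A minimal sketch, assuming three scratch files plus one deliberately missing name:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my @files = map { "$dir/$_" } qw(file1 file2 file3);

for my $file (@files) {
    open(OUT, ">$file") or die "cannot create $file, $!";
    close(OUT)          or die "cannot close $file, $!";
}

# chmod returns how many files were changed.
my $changed = chmod(0644, @files);
warn "only changed $changed of ", scalar(@files), " files\n"
    if $changed != @files;

# unlink returns how many files were removed; the missing one is not counted.
my $removed = unlink(@files, "$dir/missing");

print "changed $changed, removed $removed\n";
```

A count lower than the length of the list tells you something failed but not which file, so loop over the names one at a time if you need to report each failure.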

Others:


The Truth about <>

The <> may be called "angle operator", "line input operator" or "diamond operator".

Generally means "read a line from standard input", or if surrounding a filehandle, "read a line from this filehandle". Reads a line AND the newline at the end of it, and returns both. If used alone as the condition of a while loop, the line read is assigned to $_. Returns undef (a false value) at end of file.

while (<>) {      # read line, assign to $_
   print;               # print $_
}

same as (when no files are named on the command line):

while (defined($_ = <STDIN>)) {
   print;
}

Additional magic: if files are named on the command line (that is, in @ARGV), <> reads each of them in turn. If none are named, it reads STDIN.
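
A sketch of that magic without needing a real command line: the script fills in @ARGV itself. The two scratch files and their contents are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);      # scratch directory, removed at exit

my %content = ("a.txt" => "one\ntwo\n", "b.txt" => "three\n");
for my $name (sort keys %content) {
    open(OUT, ">$dir/$name") or die "cannot create $dir/$name, $!";
    print OUT $content{$name};
    close(OUT)               or die "cannot close $dir/$name, $!";
}

# Pretend both files were named on the command line.
@ARGV = ("$dir/a.txt", "$dir/b.txt");

my @seen;
while (<>) {                          # reads a.txt, then b.txt
    chomp;                            # strip the trailing newline from $_
    push @seen, $_;
}

print join(",", @seen), "\n";
```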

More precisely could say "read record" rather than "read line" because we can reassign special $/ variable to change the delimiter of a "line".

$/ = "<hr>";       # reset to slurp all HTML up to a hr
                   #   tag and return at once (with hr included)
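
A minimal sketch of reading records split on a hr tag. The scratch file and its contents are assumptions made for the example. local is not covered in these notes; it is used here so the change to $/ is undone when the block ends.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $path = "$dir/page.html";

open(OUT, ">$path")  or die "cannot open $path, $!";
print OUT "<p>one</p><hr><p>two</p><hr><p>three</p>\n";
close(OUT)           or die "cannot close $path, $!";

open(IN, $path)      or die "cannot open $path, $!";
my @records;
{
    local $/ = "<hr>";                # only in effect inside this block
    while (my $rec = <IN>) {
        chomp($rec);                  # chomp removes a trailing $/, here "<hr>"
        push @records, $rec;
    }
}
close(IN);

print scalar(@records), " records\n";
```

The last record has no trailing hr tag, so chomp leaves it alone and its newline is still attached.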