Perl Tutorial: Adapted from a tutorial by Nik Silver, University of Leeds, UK

This tutorial assumes you are running Perl on a UNIX machine or from a Windows command prompt. If you are using Eclipse, see the document Running Perl with Eclipse.


Perl Tutorial: A basic program

Here is the basic Perl program that we'll use to get started.

 
#!/usr/local/bin/perl
#
# Program to do the obvious
#
print 'Hello world.';          # Print a message

Each of the parts will be discussed in turn.

 

If you are using ActiveState Perl for Windows, you do not need the first line (leaving it in does no harm). Perl files are named with a .pl suffix and are run by typing perl programName at the command line (just open a Command Prompt window).

Be careful how you create the files. Some text editors silently add an extra suffix (for example, saving ex1a.pl as ex1a.pl.txt), so the file you think you have and the file you actually have do not have the same name, and nothing works. Students have used Notepad successfully. Type dir at the command prompt to check how your files are named, and use ren to rename any that are wrong.


The first line

On a Unix system, a Perl program normally starts with this as its very first line:

 
#!/usr/local/bin/perl

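This line tells the machine what to do with the file when it is executed (i.e. it tells it to run the file through Perl). The path must point to wherever Perl is installed on your system; /usr/bin/perl is another common location.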


Comments and statements

Comments can be inserted into a program with the # symbol: anything from the # to the end of the line is ignored (the special first line is the exception). Perl has no multi-line comment syntax, so each comment line needs its own #.

Everything else is a Perl statement which must end with a semicolon, like the last line in the previous example.


Simple printing

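The print function outputs some information. In the above case, it prints the literal string Hello world. Note that print does not add a newline of its own, so the next shell prompt may appear on the same line as the output. Like every statement, the line ends with a semicolon.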

 


Running the program


Type in the example program using a text editor, and save it. On a Unix system, once you have saved the program, make the file executable with the command

 
chmod u+x progname

at the UNIX prompt, where progname is the filename of the program. This step is not required for our Windows version of Perl. To run the program, type any of the following at the prompt:

 
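perl progname    # Works everywhere; sufficient for our Windows version of Perl
./progname       # Unix only, after the chmod above
progname         # Unix only, and only if the current directory is on your PATH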

If something goes wrong then you may get error messages, or you may get nothing. You can always run the program with warnings using the command

 
perl -w progname

at the prompt. This displays warnings and other (hopefully) helpful messages, both when Perl compiles the program and while it runs. To run the program under the debugger, use the command

 
perl -d progname

At the debugger prompt, type h to list the available commands, and q to quit.


Scalar variables


The most basic kind of variable in Perl is the scalar variable. A scalar variable can hold either a string or a number, and Perl converts between the two automatically as needed. For example, the statement

 
$priority = 9;                 # notice that variable type is part of the variable name

sets the scalar variable $priority to 9, but you can also assign a string to exactly the same variable:

 
$priority = 'high';

Perl also accepts numbers as strings, like this:

 
$priority = '9';
$default = '0009';

and can still cope with arithmetic and other operations quite happily.
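A short sketch of this automatic conversion. It turns on use strict and use warnings and declares variables with my, as most modern Perl code does, whereas the tutorial's own fragments leave these out. The string '0009' acts as the number 9 in arithmetic and in numeric comparison (==), but keeps its leading zeros when compared as a string (eq); both operators are covered under Booleans below:

```perl
use strict;
use warnings;

my $priority = '9';
my $default  = '0009';

my $sum         = $priority + $default;  # numeric context: 9 + 9
my $same_number = ($default == 9);       # true: '0009' is numerically 9
my $same_string = ($default eq '9');     # false: the strings differ
my $joined      = $priority . $default;  # string context: '9' . '0009'

print "sum=$sum joined=$joined\n";       # prints sum=18 joined=90009
print "numerically equal\n" if $same_number;
print "not string-equal\n" unless $same_string;
```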

In general, variable names consist of letters, digits and underscores, but they cannot start with a digit, and the variable $_ is special, as we'll see later. Also, Perl is case sensitive, so $a and $A are different variables.


Operations and Assignment

Perl has all the usual C arithmetic operators, plus ** for exponentiation:

 
$a = 1 + 2;    # Add 1 and 2 and store in $a
$a = 3 - 4;    # Subtract 4 from 3 and store in $a
$a = 5 * 6;    # Multiply 5 and 6
$a = 7 / 8;    # Divide 7 by 8 to give 0.875
$a = 9 ** 10;  # Nine to the power of 10
$a = 5 % 2;    # Remainder of 5 divided by 2
++$a;          # Increment $a and then return it
$a++;          # Return $a and then increment it
--$a;          # Decrement $a and then return it
$a--;          # Return $a and then decrement it

and for strings Perl has the following operators (among others):

 
$a = $b . $c;  # Concatenate $b and $c
$a = $b x $c;  # $b repeated $c times
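The difference between the prefix and postfix increment only shows up when the value of the expression is used, so this sketch stores each result before printing it. It also shows that / produces a fractional result where needed (unlike integer division in C), and what . and x do. The variable names are illustrative only:

```perl
use strict;
use warnings;

my $quotient = 7 / 8;      # 0.875: / is not integer division
my $power    = 2 ** 10;    # 1024
my $rem      = 5 % 2;      # 1

my $n    = 5;
my $pre  = ++$n;           # $n becomes 6, then 6 is the value used
my $post = $n++;           # 6 is the value used, then $n becomes 7

my $first    = 'ab';
my $count    = 3;
my $joined   = $first . $count;   # concatenation: "ab3"
my $repeated = $first x $count;   # repetition: "ababab"

print "$quotient $power $rem\n";      # prints 0.875 1024 1
print "pre=$pre post=$post n=$n\n";   # prints pre=6 post=6 n=7
print "$joined $repeated\n";          # prints ab3 ababab
```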

For assignment, Perl includes

 
$a = $b;       # Assign $b to $a
$a += $b;      # Add $b to $a
$a -= $b;      # Subtract $b from $a
$a .= $b;      # Append $b onto $a

Note that when Perl assigns a value with $a = $b, it copies the value of $b into $a. Changing $b afterwards does not alter $a.
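A minimal sketch of the assignment operators and of this copy behaviour. It uses the illustrative names $a_val and $b_val rather than $a and $b, because $a and $b have a special meaning inside Perl's sort function:

```perl
use strict;
use warnings;

my $b_val = 'ab';
my $a_val = $b_val;        # $a_val receives its own copy of 'ab'
$b_val .= 'cd';            # appending to $b_val leaves $a_val alone

my $total = 10;
$total += 5;               # 15
$total -= 3;               # 12

print "$a_val $b_val $total\n";   # prints ab abcd 12
```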


Interpolation

The following code prints apples and pears using concatenation:

 
$a = 'apples';
$b = 'pears';
print $a.' and '.$b;

It would be nicer to include only one string in the final print statement, but the line

 
print '$a and $b';

prints literally $a and $b, which isn't very helpful. Instead we can use double quotes in place of single quotes:

 
print "$a and $b";

Double quotes force interpolation of any codes, including variable names. This is much nicer than our original statement. Other codes that are interpolated include special characters such as newline and tab: inside double quotes, \n is a newline and \t is a tab.
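The contrast between the two kinds of quotes can be seen in a small sketch. In single quotes, $first and \n stay as literal characters; in double quotes they are replaced by the variable's value and a real newline:

```perl
use strict;
use warnings;

my $first  = 'apples';
my $second = 'pears';

my $single = '$first and $second\n';   # no interpolation at all
my $double = "$first and $second\n";   # variables and \n are interpolated
my $table  = "fruit\t$first\n";        # \t becomes a tab character

print $single, "\n";   # prints $first and $second\n literally
print $double;         # prints apples and pears
print $table;          # prints fruit, a tab, then apples
```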


Exercise 1A

This exercise is to rewrite the Hello world program so that (a) the string is assigned to a variable and (b) this variable is then printed with a newline character. Use the double quotes and don't use the concatenation operator. Make sure you can get this to work before proceeding. Call this file ex1a.pl.

 

Exercise 1B

Write a program to assign a two digit number to a variable, compute the sum of the digits, and output a line such as

The number 14 has digits which sum to 5.  Call this file ex1b.pl.

 


Array variables


A slightly more interesting kind of variable is the array variable, which is a list of scalars (i.e. numbers and strings). Array variables have the same format as scalar variables except that they are prefixed by an @ symbol. The statements

 
@food  = ("apples", "pears", "eels");
@music = ("whistle", "flute");

assigns a three element list to the array variable @food and a two element list to the array variable @music.

The array is accessed by using indices starting from 0, and square brackets are used to specify the index. The expression

 
$food[2]

returns eels. Notice that the @ has changed to a $ because eels is a scalar.


Array assignments

As everywhere in Perl, the same expression in a different context can produce a different result. In the first assignment below, @music is expanded into its elements, so it is equivalent to the second assignment.

 
@moremusic = ("organ", @music, "harp");
@moremusic = ("organ", "whistle", "flute", "harp");

This suggests one way of adding elements to an array. The contents of an array can be printed with a statement such as print "@moremusic\n"; but a neater way of adding elements is to use the statement

 
push(@food, "eggs");

which pushes eggs onto the end of the array @food. To push two or more items onto the array use one of the following forms:

 
push(@food, "eggs", "lard");
push(@food, ("eggs", "lard"));
push(@food, @morefood);

The push function returns the number of elements in the array after the push.

To remove the last item from a list and return it, use the pop function. From our original list the pop function returns eels and @food now has two elements:

 
$grub = pop(@food);    # Now $grub = "eels"
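A small sketch of push and pop together, capturing their return values. Because this version pushes two items first, pop removes the last one pushed ("lard") rather than "eels":

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");

my $count = push(@food, "eggs", "lard");  # push returns the new number of elements
my $grub  = pop(@food);                   # removes and returns the last element

print "count=$count grub=$grub\n";        # prints count=5 grub=lard
print "@food\n";                          # prints apples pears eels eggs
```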

It is also possible to assign an array to a scalar variable. As usual, context is important. The line

 
$f = @food;

assigns the number of elements in @food, but

 
$f = "@food";

turns the list into a string with a space between each element. The space can be replaced by any other string by changing the value of the special variable $", the list separator. This is one of Perl's many special variables, most of which have odd names. Whenever an array is interpolated into a double-quoted string, the current value of $" is placed between its elements. For example, after $"='#'; the statement print "@food"; separates the items with # characters. It has no effect on print @food; without the quotes.
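The following sketch contrasts scalar context with interpolation and changes $" temporarily. It uses local, so the new separator applies only inside the enclosing block and the old value is restored afterwards; that is a common precaution with Perl's special variables rather than something the tutorial's own examples do:

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");

my $size = @food;            # scalar context: the number of elements, 3
my $text = "@food";          # interpolation: "apples pears eels"
my $hashed;
{
    local $" = '#';          # change the list separator inside this block only
    $hashed = "@food";       # "apples#pears#eels"
}
my $after = "@food";         # spaces again once the block has ended

print "$size\n$text\n$hashed\n$after\n";
```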

Arrays can also be used to make multiple assignments to scalar variables:

 
($a, $b) = ($c, $d);           # Same as $a=$c; $b=$d;
($a, $b) = @food;              # $a and $b are the first two
                               # items of @food.
($a, @somefood) = @food;       # $a is the first item of @food
                               # @somefood is a list of the others.
(@somefood, $a) = @food;       # @somefood is @food and $a is undefined.

The last assignment behaves this way because arrays are greedy: @somefood swallows as much of @food as it can, leaving nothing for $a. That form is therefore best avoided.

Finally, you may want to find the index of the last element of a list. To do this for the @food array use the expression

 
$#food   # which is just one less than the size of the array
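A quick sketch of list assignment and $# on the same three-element array; $food[$#food] is a way to fetch the last element:

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");

my ($first, $second) = @food;        # the first two items
my ($head, @rest)    = @food;        # the first item, then all the others
my $last_index = $#food;             # 2, one less than the size
my $last_item  = $food[$#food];      # "eels"

print "$first $second | $head | @rest | $last_index $last_item\n";
# prints apples pears | apples | pears eels | 2 eels
```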

Displaying arrays

Since context is important, it shouldn't be too surprising that the following all produce different results:

 
print @food;   # By itself
print "@food"; # Embedded in double quotes
print @food.""; # In a scalar context

Exercise 2A

Try out each of the above three print statements to see what they do.  Try three different ways of adding an element to an array. Call the file you create ex2a.pl.

 

Exercise 2B

Create an array of names of your favorite cartoon characters (up to ten of them).  Print them out in a line such as

I have three favorite cartoon characters.  They are snoopy dagwood porky pig.

Note that the count (“three”) must not be hard-coded; derive it from the list. Call the file you create ex2b.pl.

 


File handling


Here is a basic Perl program that does the same as the UNIX cat command on a particular file. Before running it, create a file named myFile.

 
#!/usr/local/bin/perl
#
# Program to open the file, read it in,
# print it, and close it again.
 
$file = 'myFile';              # Name the file
open(INFO, $file) or die "Cannot open $file: $!";  # Open the file; $! says why if it fails
@lines = <INFO>;               # Read it into an array
close(INFO);                   # Close the file
print @lines;                  # Print the array

The open function opens a file for input (i.e. for reading). The first parameter is the filehandle, which allows Perl to refer to the file later. The second parameter is an expression giving the filename. A filename given in quotes is taken literally, without shell expansion, so '~/notes/todolist' will not be found. To force expansion, use angle brackets, which perform filename globbing: that is, use <~/notes/todolist> instead. Use the forward slash as the directory separator; something like <../grade/class> also works.

The close function tells Perl to finish with that file.

There are a few useful points to add to this discussion of file handling. First, open can also specify a file for output or for appending, as well as for input. To do this, prefix the filename with > for output (this empties the file first if it already exists) or >> for appending:

 
open(INFO, $file);     # Open for input
open(INFO, ">$file");  # Open for output
open(INFO, ">>$file"); # Open for appending
open(INFO, "<$file");  # Also open for input

Second, if you want to print something to a file you've already opened for output then you can use the print statement with an extra parameter. To print a string to the file with the INFO filehandle use

 
print INFO "This line goes to the file.\n";
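The three modes can be tried together in a sketch that writes a line, appends a second one, and reads both back. The file name comes from tempfile in File::Temp (a module shipped with Perl), used here only so the sketch does not overwrite any of your own files; each open is checked with or die, and $! holds the system's reason for a failure:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file name to experiment with; it is deleted when the program exits.
my (undef, $file) = tempfile(UNLINK => 1);

open(INFO, ">$file")  or die "Cannot write $file: $!";      # > creates or empties the file
print INFO "first line\n";
close(INFO);

open(INFO, ">>$file") or die "Cannot append to $file: $!";  # >> adds to the end
print INFO "second line\n";
close(INFO);

open(INFO, "<$file")  or die "Cannot read $file: $!";       # < reads
my @lines = <INFO>;
close(INFO);

print @lines;    # prints both lines
```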

By default, print adds nothing after its arguments. If you set the special variable $\ (the output record separator), every print statement appends its value. For example, to get a newline at the end of each print statement, set $\="\n";

 

Third, you can use the following to open the standard input (usually the keyboard) and standard output (usually the screen) respectively:

 
open(INFO, '-');       # Open standard input
open(INFO, '>-');      # Open standard output
 
With Eclipse, standard input is typed into the console window and ended with Ctrl-Z. (At a Unix terminal, end standard input with Ctrl-D.) Standard output is written to the console window.

In the program above, the file is read through the INFO filehandle, and Perl reads from a filehandle using angle brackets. So the statement

 
@lines = <INFO>;

reads the file denoted by the filehandle into the array @lines. The <INFO> expression reads the whole file in one go because the read takes place in list context (the result is assigned to an array). If @lines were replaced by the scalar $lines, only the next line would be read. In either case, each line is stored complete with its newline character at the end.
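The effect of context on <...> can be seen in a sketch that reads one line in scalar context and the rest in list context. To stay self-contained it reads from a string rather than a real file, using the three-argument form of open with a reference to a string (an in-memory file, supported since Perl 5.8); with a real file you would use a filename as in the program above:

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";

open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $first = <$fh>;        # scalar context: just the next line, "one\n"
my @rest  = <$fh>;        # list context: all the remaining lines
close($fh);

print "first: $first";                          # prints first: one
print "rest has ", scalar(@rest), " lines\n";   # prints rest has 2 lines
```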


Exercise 3A

Modify the program at the beginning of this section, which reads the file myFile, so that the entire file is printed with a # symbol at the beginning of each line. You should only have to add one line and modify another. Call the file ex3a.pl. Hint: use the $" variable to specify the separator placed between list elements.

 

Exercise 3B

Write the program ex3b.pl, which reads all the lines of a file called “second.dat” and appends them to the file “first.dat”.

 

 


Control structures


More interesting possibilities arise when we introduce control structures and looping. Perl supports lots of different kinds of control structures which tend to be like those in C. Here we discuss a few of them.


foreach

To go through each element of an array or other list-like structure (such as the lines of a file), Perl uses the foreach structure. This has the form

 
foreach $morsel (@food)               # Visit each item in turn
                               # and call it $morsel
{
        print "$morsel\n";     # Print the item
        print "Yum yum\n";     # That was nice
}

The actions to be performed each time are enclosed in a block of curly braces. The first time through the block $morsel is assigned the value of the first item in the array @food. Next time it is assigned the value of the second item, and so on until the end. If @food is empty to start with, then the block of statements is never executed.
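A sketch of both cases: a loop over three items that builds up a report, and a loop over an empty array whose block never runs. Writing foreach my $morsel makes $morsel private to the loop, which use strict requires:

```perl
use strict;
use warnings;

my @food  = ("apples", "pears", "eels");
my @empty = ();

my $report = '';
foreach my $morsel (@food)       # $morsel takes each value in turn
{
    $report .= "$morsel: yum yum\n";
}

my $runs = 0;
foreach my $item (@empty)
{
    $runs++;                     # never reached: @empty has no items
}

print $report;
print "empty loop ran $runs times\n";   # prints empty loop ran 0 times
```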


Booleans

The next few structures rely on a test being true or false. In Perl, any non-zero number and any non-empty string counts as true. The number zero, the string "0" (zero by itself), the empty string, and an undefined value count as false. Here are some tests on numbers and strings.

 
$a == $b               # Is $a numerically equal to $b?
                       # Beware: Don't use the = operator.
$a != $b               # Is $a numerically unequal to $b?
$a eq $b               # Is $a string-equal to $b?
$a ne $b               # Is $a string-unequal to $b?

You can also use logical and, or and not:

 
($a && $b)             # Are both $a and $b true?
($a || $b)             # Is either $a or $b true?
!($a)                  # Is $a false?
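A sketch that checks these rules directly. Of the candidate values below only 0, the empty string and "0" are false; strings such as "00" and "0.0" are true. The second part shows why == and eq are different tests:

```perl
use strict;
use warnings;

my @candidates = (0, 1, -1, '', '0', '00', '0.0', 'abc');

my @false;
foreach my $value (@candidates)
{
    push(@false, $value) unless $value;   # keep only the values Perl treats as false
}
print scalar(@false), " false values\n";  # prints "3 false values" (0, '' and '0')

my $x = '1.0';
my $num_equal = ($x == 1);     # true: 1.0 and 1 are the same number
my $str_equal = ($x eq '1');   # false: '1.0' and '1' are different strings
print "numerically equal\n" if $num_equal;
print "not string-equal\n" unless $str_equal;
```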

for

Perl has a for structure that mimics that of C. It has the form

 
for (initialize; test; inc)
{
         first_action;
         second_action;
         etc
}

First, the initialize statement is executed. Then, while test is true, the block of actions is executed, and inc is executed after each pass through the block. Here is an example for loop that prints the numbers 0 to 9.

 
for ($i = 0; $i < 10; ++$i)    # Start with $i = 0
                               # Do it while $i < 10
                               # Increment $i before repeating
{
        print "$i\n";
}

while and until

Here is a program that reads some input from the keyboard and won't continue until it is given the correct password:

 
print "Password? ";            # Ask for input
$a = <STDIN>;                  # Get input
chop $a;                       # Remove the newline at end
while ($a ne "fred")           # While input is wrong...
{
    print "sorry. Again? ";    # Ask again
    $a = <STDIN>;              # Get input again
    chop $a;                   # Chop off newline again
}

The curly-braced block of code is executed while the input does not equal the password. The while structure should be fairly clear, but this is a good opportunity to notice several things. First, we can read from the standard input (the keyboard) through the predefined STDIN filehandle without opening it first. Second, when the password is entered, $a is given that value including the newline character at the end. The chop function removes the last character of a string, which in this case is the newline.

To test the opposite condition, replace while with until. This executes the block repeatedly until the expression is true, rather than while it is true.

Another useful technique is to put the while or until check at the end of the statement block rather than at the beginning, so that the block always runs at least once. This requires the do operator to mark the beginning of the block, with the test at the end. If we drop the "sorry. Again?" message from the password program above, it could be written like this:

 
do
{
        print "Password? ";    # Ask for input
        $a = <STDIN>;          # Get input
        chop $a;               # Chop off newline
}
while ($a ne "fred");          # Redo while wrong input (note the semicolon)
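Both password loops use chop, which removes the last character whatever it is. Perl also has chomp, which removes a trailing newline only if one is present, so it is the safer choice when a line might not end in a newline (for example, the last line of a file). A small comparison:

```perl
use strict;
use warnings;

my $with_newline    = "fred\n";
my $without_newline = "fred";

my ($c1, $c2) = ($with_newline, $without_newline);
chop $c1;          # "fred": removed the newline
chop $c2;          # "fre":  removed the d!

my ($m1, $m2) = ($with_newline, $without_newline);
chomp $m1;         # "fred": removed the newline
chomp $m2;         # "fred": nothing to remove, left unchanged

print "$c1 $c2 $m1 $m2\n";   # prints fred fre fred fred
```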

Exercise 4A

Modify the program from the previous exercise so that each line of the input file (myFile) is read in one by one and is output with a line number at the beginning.  Call the program ex4a.pl. 

You may find it useful to use the structure

 
while ($line = <INFO>)
{
        ...
}
 

Exercise 4B

Write the program ex4b.pl, which repeatedly asks the user how many tickets they would like to buy. When all the tickets are gone, print an informative message and stop. Take the total number of tickets from the command line. Use the x operator to print a crude bar chart of the number of tickets each person bought. For example, if people bought 10, 5, 4, 4, 3, and 1 tickets, you would output

 

tttttttttt
ttttt
tttt
tttt
ttt
t

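Note that in Eclipse the standard output and standard input don’t interleave properly, so you won’t see the prompts until after you have entered the values. This is usually a side effect of output buffering; putting $| = 1; near the top of the program (which flushes standard output after every print) may help. From the command line, standard input and output behave as described above.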


Conditionals


Of course, Perl also allows if/then/else statements. These are of the following form:

 
if ($a)   # Notice the parentheses are required
{         # Notice the braces are required 
        print "The string is not empty\n";
}
else
{
        print "The string is empty\n";
}

Remember that an empty string is considered false. The test will also report "empty" if $a is the string "0".

It is also possible to include more alternatives in a conditional statement:

 
if (!$a)                       # The ! is the not operator
{
        print "The string is empty\n";
}
elsif (length($a) == 1)        # If above fails, try this. Note: "else if" is not allowed
{
        print "The string has one character\n";
}
elsif (length($a) == 2)               # If that fails, try this
{
        print "The string has two characters\n";
}
else                           # Now, everything has failed
{
        print "The string has lots of characters\n";
}

In this, it is important to notice that the elsif statement really does have an "e" missing.

 

The Perl motto is "There's more than one way to do it" (often abbreviated TMTOWTDI). This noble freedom of expression, however, results in the first of the Perl Paradoxes: Perl programs are easy to write but not always easy to read. For example, the following lines all have the same effect when $x holds a number!

    if ($x == 0) {$y = 10;}  else {$y = 20;}

    $y = $x==0 ? 10 : 20;

    $y = 20;  $y = 10 if $x==0;

    unless ($x == 0) {$y=20} else {$y=10}

    if ($x)  {$y=20} else {$y=10}

    $y = (10,20)[$x != 0];
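The claim is easy to check. This sketch runs all six forms for a few numeric values of $x and confirms they assign the same $y. The restriction to numbers matters: for a non-numeric string such as 'abc', $x == 0 is true (with a warning) while if ($x) also treats it as true, so the forms would disagree:

```perl
use strict;
use warnings;

my @mismatches;
foreach my $x (0, 1, 5, -3)
{
    my $expected = ($x == 0) ? 10 : 20;
    my @results;
    my $y;

    if ($x == 0) {$y = 10;}  else {$y = 20;}
    push(@results, $y);
    $y = $x==0 ? 10 : 20;
    push(@results, $y);
    $y = 20;  $y = 10 if $x==0;
    push(@results, $y);
    unless ($x == 0) {$y=20} else {$y=10}
    push(@results, $y);
    if ($x)  {$y=20} else {$y=10}
    push(@results, $y);
    $y = (10,20)[$x != 0];           # list slice: index 0 or 1
    push(@results, $y);

    print "x=$x: @results\n";
    foreach my $r (@results)
    {
        push(@mismatches, $x) if $r != $expected;   # record any form that disagrees
    }
}
my $verdict = @mismatches ? "forms disagree" : "all six forms agree";
print "$verdict\n";
```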

 


Exercise 5A

From the previous exercise you should have a program which prints out an input file (myFile) with line numbers.  Call the program you create ex5a.pl.  Alter the program so that line numbers aren't printed or incremented if a line is blank, but every line is still printed, including the blank ones. Remember that when a line of the file is read in, it will still include its newline character at the end.

 

Exercise 5B

Write ex5b.pl, which writes a letter to an applicant based on their GPA and gender. Assume you have variables $surname, $gpa, and $isMale (a Boolean). You will write a letter such as

Dear Mr. (Ms.) Parker

We are delighted (regret) to inform you that we have received your application.  

We will (not) be in touch with you in the future about a job with our company.

Sincerely,

The management

Hire anyone whose GPA is above 3.0, or whose surname is “Bush”.

 

 


String matching


One of the most useful features of Perl (if not the most useful feature) is its powerful string manipulation facilities. At the heart of this is the regular expression (RE) which is shared by many other UNIX utilities.


Regular expressions

A regular expression is enclosed in slashes, and matching is done with the =~ operator. The following expression is true if the string the appears in the variable $sentence.

 
$sentence =~ /the/

The RE is case sensitive, so if

 
$sentence = "The quick brown fox";

then the above match will be false. The operator !~ is used for spotting a non-match. In the above example

 
$sentence !~ /the/

is true because the string the does not appear in $sentence.
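A sketch of these matches on the same sentence. The last line previews the i option (described later under Substitution), which makes the match ignore case:

```perl
use strict;
use warnings;

my $sentence = "The quick brown fox";

my $has_the     = ($sentence =~ /the/);    # false: the match is case sensitive
my $lacks_the   = ($sentence !~ /the/);    # true
my $has_The     = ($sentence =~ /The/);    # true
my $has_the_any = ($sentence =~ /the/i);   # true: i ignores case

my $message = $has_the ? "found the" : "no the";
print "$message\n";                        # prints no the
```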


The $_ special variable

We could use a conditional as

 
if ($sentence =~ /under/)
{
        print "We're talking about rugby\n";
}

which would print out a message if we had either of the following

 
$sentence = "Up and under";
$sentence = "Best winkles in Sunderland";

But it's often much easier if we assign the sentence to the special variable $_ which is of course a scalar. If we do this then we can avoid using the match and non-match operators and the above can be written simply as

 
if (/under/)   # the match is applied to $_ implicitly
{
        print "We're talking about rugby\n";
}

The $_ variable is the default for many Perl operations and tends to be used very heavily.
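One common place where $_ is set automatically is a foreach loop written without a loop variable: each item is placed in $_ in turn, so a bare /under/ tests it. A minimal sketch:

```perl
use strict;
use warnings;

my @sentences = ("Up and under", "Best winkles in Sunderland", "Over the top");

my $rugby = 0;
foreach (@sentences)          # no loop variable, so each item goes into $_
{
    if (/under/)              # matches against $_
    {
        print "We're talking about rugby: $_\n";
        $rugby++;
    }
}
print "$rugby of ", scalar(@sentences), " sentences matched\n";   # prints 2 of 3 sentences matched
```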


More on REs

In an RE, there are plenty of special characters, and it is these that both give regular expressions their power and make them appear very complicated. It's best to build up your use of REs slowly; their creation can be something of an art form.

Here are some special RE characters and their meaning

 
.       # Any single character except a newline
^       # The beginning of the line or string
$       # The end of the line or string
*       # Zero or more of the last character
+       # One or more of the last character
?       # Zero or one of the last character

and here are some example matches. Remember that regular expressions should be enclosed in /.../ slashes to be used.

 
t.e     # t followed by anything followed by e
        # This will match the
        #                 tre
        #                 tle
        # but not te
        #         tale
^f      # f at the beginning of a line
^ftp    # ftp at the beginning of a line
e$      # e at the end of a line
tle$    # tle at the end of a line
und*    # un followed by zero or more d characters
        # This will match un
        #                 und
        #                 undd
        #                 unddd (etc)
.*      # Any string without a newline. This is because
        # the . matches anything except a newline and
        # the * means zero or more of these.
^$      # A line with nothing in it.

There are even more options. Square brackets are used to match any one of the characters inside them. Inside square brackets a hyphen indicates "between" and a ^ at the beginning means "not":

 
[qjk]          # Either q or j or k
[^qjk]         # Neither q nor j nor k
[a-z]          # Anything from a to z inclusive
[^a-z]         # No lower case letters
[a-zA-Z]       # Any letter
[a-z]+         # Any non-zero sequence of lower case letters

A vertical bar | represents an "or" and parentheses (...) can be used to group things together:

 
jelly|cream    # Either jelly or cream
(eg|le)gs      # Either eggs or legs
(da)+          # Either da or dada or dadada or...

Here are some more special characters:

 
\n             # A newline
\t             # A tab
\w             # Any alphanumeric (word) character.
               # The same as [a-zA-Z0-9_]
\W             # Any non-word character.
               # The same as [^a-zA-Z0-9_]
\d             # Any digit. The same as [0-9]
\D             # Any non-digit. The same as [^0-9]
\s             # Any whitespace character: space, tab, newline, etc
\S             # Any non-whitespace character
\b             # A word boundary, use outside [] only
\B             # No word boundary

Clearly, characters like $, |, [, ), \, / and so on have special meanings in regular expressions. To match one of them literally, precede it with a backslash. So:

\|             # Vertical bar
\[             # An open square bracket
\)             # A closing parenthesis
\*             # An asterisk
\^             # A caret symbol
\/             # A slash
\\             # A backslash

and so on.


Some example REs

As was mentioned earlier, it's probably best to build up your use of regular expressions slowly. Here are a few examples. Remember that to use them for matching they should be put in /.../ slashes.

 
[01]           # Either "0" or "1"
\/0            # A division by zero: "/0"
\/ 0           # A division by zero with a space: "/ 0"
\/\s0          # A division by zero with a whitespace:
               # "/ 0" where the space may be a tab etc.
\/ *0          # A division by zero with possibly some
               # spaces: "/0" or "/ 0" or "/  0" etc.
\/\s*0         # A division by zero with possibly some whitespace.
\/\s*0\.0*     # As the previous one, but with decimal
               # point and maybe some 0s after it. Accepts
               # "/0." and "/0.0" and "/0.00" etc and
               # "/ 0." and "/  0.0" and "/   0.00" etc.
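Three of these patterns can be tried on a handful of sample strings to see which ones each accepts. The sample strings are made up for illustration. Note that none of the patterns checks what follows the zero, so "x/0.5" is accepted by all three:

```perl
use strict;
use warnings;

my @tests = ("x/0", "x/ 0", "x/\t0", "x/  0.00", "x/5", "x/ 10", "x/0.5");

my (@exact, @spaced, @decimal);
foreach my $expr (@tests)
{
    push(@exact,   $expr) if $expr =~ /\/0/;         # "/0" with nothing in between
    push(@spaced,  $expr) if $expr =~ /\/\s*0/;      # "/", optional whitespace, "0"
    push(@decimal, $expr) if $expr =~ /\/\s*0\.0*/;  # as above, plus "." and optional 0s
}
print scalar(@exact), " ", scalar(@spaced), " ", scalar(@decimal), "\n";   # prints 2 5 2
```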

More Lists

There are a number of useful operators for constructing lists:

The first, a shorthand notation for a range of numbers, is written as ($x..$y) and makes a list of the numbers from $x to $y in steps of 1. If $x is greater than $y, the list is empty. For example:

@x = (1..6); # same as (1, 2, 3, 4, 5, 6)
@z = (2..5,8,11..13); # same as (2,3,4,5,8,11,12,13)

The second is the qw() ("quote words") operator, which lets you write a list of strings without the quotes and commas, separated only by whitespace. For example, qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie").

A third operation we need is the split function. It takes a regular expression and a string, and splits the string into a list, breaking it into pieces at places where the regular expression matches. 

$string = "Jan Piet\nMarie \tDirk";
@list = split /\s+/, $string; # yields ( "Jan","Piet","Marie","Dirk" )

$string = " Jan Piet\nMarie \tDirk\n"; # watch out: the leading space produces an empty first field!
@list = split /\s+/, $string; # yields ( "", "Jan","Piet","Marie","Dirk" ); trailing empty fields are dropped

$string = "Jan:Piet;Marie---Dirk"; # use any regular expression... 
@list = split /[:;]|---/, $string; # yields ( "Jan","Piet","Marie","Dirk" )

$string = "Jan Piet"; # use an empty regular expression to split into single characters
@letters= split //, $string; # yields ( "J","a","n"," ","P","i","e","t")

This will turn out to be very useful for processing lines of text and their text fields using all kinds of field separators. split can do a lot of common parsing tasks for us. 

"join" does the opposite of "split":
@personal = ("Caine", "Michael", "Actor", "14, Leafy Drive");
$bigstring = join(":",@personal);
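A sketch that joins the fields and splits them again, and compares split /\s+/ with split ' '. The round trip only works because no field contains a colon. Giving split a single space string ' ' is a documented special case that also skips leading whitespace, which avoids the empty first field seen above:

```perl
use strict;
use warnings;

my @personal  = ("Caine", "Michael", "Actor", "14, Leafy Drive");
my $bigstring = join(":", @personal);   # "Caine:Michael:Actor:14, Leafy Drive"
my @fields    = split(/:/, $bigstring); # back to the four original fields

my $padded     = " Jan Piet\nMarie \tDirk\n";
my @with_empty = split(/\s+/, $padded); # leading "" kept, trailing empty field dropped
my @words      = split(' ', $padded);   # special case: leading whitespace skipped

print "$bigstring\n";
print scalar(@fields), " fields\n";                        # prints 4 fields
print scalar(@with_empty), " vs ", scalar(@words), "\n";   # prints 5 vs 4
```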

 

As practice, try reading the following example (split.pl):

@DAYS = ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday');
@MONTHS = ('January', 'February', 'March', 'April', 'May', 'June',
           'July', 'August', 'September', 'October', 'November', 'December');

print "Please input date of birth as mm-dd-yyyy: ";
$birth = <>;
chomp $birth;                  # remove the trailing newline
@list = split /[-. ]/, $birth; # mm, dd, yyyy (- . or space accepted as separators)
print "@list\n";

$\ = "\n";                     # from now on, every print ends with a newline
print "You were born on $list[1] of $MONTHS[$list[0]-1] in $list[2]";
print "Was that a $DAYS[5]?";


Exercise 6A

Previously your program counted non-empty lines. Alter it so that instead of counting all non-empty lines it counts only lines with any of the following:

·the letter x

·the string the

·the string right which may or may not have a capital r

·word me with or without a capital. Use \b to detect word boundaries.

In each case the program should print out every line, but it should only number those specified. Try to use the $_ variable to avoid using the =~ match operator explicitly. Call the program you create ex6a.pl.

 

 

 

 


Substitution and translation


As well as matching regular expressions, Perl can make substitutions based on those matches. This is done with the s operator, which is designed to mimic the way substitution is done in the vi text editor. Once again the =~ operator is used, and once again, if it is omitted, the substitution takes place on the $_ variable.

To replace the first occurrence of london by London in the string $sentence, we use the expression

 
$sentence =~ s/london/London/

and to do the same thing with the $_ variable just

 
s/london/London/

Notice that the pattern (london) and the replacement (London) are surrounded by a total of three slashes. The result of this expression is the number of substitutions made, so in this case it is either 1 (true) or, if nothing was replaced, false.


Options

This example only replaces the first occurrence of the string, and it may be that there will be more than one such string we want to replace. To make a global substitution the last slash is followed by a g as follows:

 
s/london/London/g

which of course works on the $_ variable. Again the expression returns the number of substitutions made, which is greater than 0 (true) if anything was replaced and false otherwise.

If we want to also replace occurrences of lOndon, lonDON, LoNDoN and so on then we could use

 
s/[Ll][Oo][Nn][Dd][Oo][Nn]/London/g

but an easier way is to use the i option (for "ignore case"). The expression

 
s/london/London/gi

will make a global substitution ignoring case. The i option is also used in the basic /.../ regular expression match.
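A sketch of the return values of s///. The first substitution only matches the exact-case "london"; the second, with gi, then matches all three remaining spellings; the third finds nothing and returns a false value:

```perl
use strict;
use warnings;

$_ = "london, lonDON and LONDON";

my $plain = s/london/London/g;    # exact case only: 1 substitution
my $any   = s/london/London/gi;   # ignoring case: all 3 occurrences match
my $none  = s/paris/Paris/g;      # nothing to replace: a false value

print "$_\n";                     # prints London, London and London
print "$plain then $any\n";       # prints 1 then 3
print "no change\n" unless $none;
```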


Remembering patterns

It's often useful to remember patterns that have been matched so that they can be used again. Anything matched inside parentheses is remembered in the variables $1, $2, $3 and so on. Within the same regular expression, the remembered text can be referred to with the special RE codes \1, \2, \3 and so on; in the replacement part of a substitution, use $1, $2, $3 instead. For example

 
$_ = "Word Whopper of Fibbing";
s/([A-Z])/:$1:/g;
print "$_\n";

will replace each upper case letter by that letter surrounded by colons. It will print :W:ord :W:hopper of :F:ibbing. The variables $1,...,$9 are read-only variables; you cannot alter them yourself.

As another example, the test

 
if (/(\b.+\b) \1/)
{
        print "Found $1 repeated\n";
}

will identify any words repeated. Each \b represents a word boundary and the .+ matches any non-empty string, so \b.+\b matches anything between two word boundaries. This is then remembered by the parentheses and stored as \1 for regular expressions and as $1 for the rest of the program.

The following swaps the first and last characters of a line in the $_ variable:

 
s/^(.)(.*)(.)$/$3$2$1/;

The ^ and $ match the beginning and end of the line. The first pair of parentheses remembers the first character, the second remembers everything up to the last character, and the third remembers the last character. The replacement then rebuilds the line with the first and last characters swapped.
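The three examples above can be run together. This sketch uses \1 inside a pattern (to find a repeated word) and $1, $2, $3 in replacements and after a match; the sample strings are made up:

```perl
use strict;
use warnings;

$_ = "Word Whopper of Fibbing";
s/([A-Z])/:$1:/g;                          # wrap each capital in colons
my $marked = $_;                           # ":W:ord :W:hopper of :F:ibbing"

$_ = "this is is a test";
my $repeated = /(\b.+\b) \1/ ? $1 : '';    # \1 inside the pattern, $1 afterwards

$_ = "perl";
s/^(.)(.*)(.)$/$3$2$1/;                    # swap the first and last characters
my $swapped = $_;                          # "lerp"

print "$marked\n$repeated\n$swapped\n";
```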

After a match, you can use the special read-only variables $` and $& and $' to find what was matched before, during and after the search. So after

 
$_ = "Lord Whopper of Fibbing";
/pp/;

all of the following are true. (Remember that eq is the string-equality test.)

 
$` eq "Lord Who";
$& eq "pp";
$' eq "er of Fibbing";

Finally, on the subject of remembering patterns, it's worth knowing that variables are interpolated inside the slashes of a match or a substitution. So

 
$search = "the";
s/$search/xxx/g;

will replace every occurrence of the with xxx. If you want to replace every occurrence of there then you cannot do s/$searchre/xxx/ because this will be interpolated as the variable $searchre. Instead you should put the variable name in curly braces so that the code becomes

 
$search = "the";
s/${search}re/xxx/g;

Translation

The tr operator performs character-by-character translation. The following expression replaces each a with e, each b with d, and each c with f in the variable $sentence, and returns the number of characters it translated.

 
$sentence =~ tr/abc/edf/

Because tr works on individual characters rather than regular expressions, most of the special RE codes do not apply. For example, the statement here counts the number of asterisks in the $sentence variable and stores that in the $count variable.

 
$count = ($sentence =~ tr/*/*/);

However, the dash is still used to mean "between". This statement converts $_ to upper case.

 
tr/a-z/A-Z/;
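A sketch of the three tr examples on a short made-up string. Translating * to itself leaves the string unchanged but still returns the count, and tr with no =~ works on $_:

```perl
use strict;
use warnings;

my $sentence = "a*b*c";
my $stars   = ($sentence =~ tr/*/*/);      # counts the asterisks; the string is unchanged
my $changed = ($sentence =~ tr/abc/edf/);  # a->e, b->d, c->f; returns how many changed

$_ = "shout";
tr/a-z/A-Z/;                               # with no =~, tr works on $_

print "$sentence $stars $changed $_\n";    # prints e*d*f 2 3 SHOUT
```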
 

Exercise 6B

Write a program ex6b.pl which reads a file (myFile) and outputs only those lines which

1. are strings that do not contain white space, or
2. are strings containing only c’s and d’s and white space, or
3. are strings with exactly one word, regardless of white space, or
4. are strings that end with the same character they start with, or
5. are strings of a’s and b’s in which the number of a’s is even, or
6. are any strings ending with a z

 

In each case, output the rule number(s) that allowed the line to be printed. A line may be printed more than once if it qualifies under different rules. Be sure to chop the newline character from the end of each line of the file. So, for example, your output might contain (among other things):

 

2 cc dd cd cd cddd
1 3 4 5 aababa
6 don’t go to sleep, pleazzzzz

Hint: you may need to use ^ and $ to make sure each line matches NOTHING but what is asked for. 

Exercise 6C

 

 Write a program that recognizes palindromes (words or phrases that read the same in both directions). Deal with one-letter words, and be permissive with whitespace and punctuation, so that the program behaves as specified by the example below.

Type a word or phrase: computer
Not a palindrome.
Type a word or phrase: level
This is a palindro