tobold.org

correct • elegant • free

△ comp.lang.perl △

◅ Redirecting STDERR to STDOUT

Is perl tail recursive? ▻

split into word and the rest of string

In article <DIEqt7.5Ft@lexmark.com>,  <rbowen@lexmark.com> wrote:
>[ Could someone explain this? ]
>
>       if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

You don't say which bit you don't understand, so I'll assume it's
everything :-).

The construct `$foo =~ /.../' means "do this regular expression
match on the scalar variable foo".

The regular expression itself, `/^(\S+)\s+(\S+)\s*(.*)/', matches

    ^    at the beginning of the string,
    \S+  a sequence of one or more non-whitespace characters ($1), followed by
    \s+  a sequence of one or more whitespace characters, followed by
    \S+  a sequence of one or more non-whitespace characters ($2), followed by
    \s*  a sequence of zero or more whitespace characters, followed by
    .*   anything up to the next newline ($3).
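
To see the pieces at work, here is a small sketch that runs the same
pattern over a sample string.  The value of $foo and the names $first,
$second and $rest are made up for illustration, and the /x modifier is
there only so that the pattern can carry comments.

```perl
use strict;
use warnings;

# Sample input, invented for this illustration.
my $foo = "alpha beta  gamma delta";

my $matched = $foo =~ /
    ^       # at the beginning of the string
    (\S+)   # one or more non-whitespace characters (capture 1)
    \s+     # one or more whitespace characters
    (\S+)   # one or more non-whitespace characters (capture 2)
    \s*     # zero or more whitespace characters
    (.*)    # anything up to the next newline (capture 3)
/x;

# Copy the captures right away; a later successful match overwrites them.
my ($first, $second, $rest) = $matched ? ($1, $2, $3) : ();

print "1: <$first>\n2: <$second>\n3: <$rest>\n" if $matched;
```

Note that the two spaces between "beta" and "gamma" are swallowed by
the `\s*', so the third capture starts at "gamma".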

Like most things in Perl, a regular expression match has a return
value: in a scalar context, it's whether or not the expression matched;
in an array context, it's the array consisting of all the parenthesized
subexpressions in the regexp, if it matched.

So, if the regular expression matches, it will return an array
consisting of the "interesting" bits of the expression.  (It will
also assign them to the variables $1, $2, and $3, as indicated
above.)  This array is then assigned to the list `($F1, $F2, $Etc)',
which simply assigns to the three named scalars.
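
The two contexts are easy to compare side by side.  The following is
only a sketch: the sample string and the names $ok and @bits are mine,
not part of the original question.

```perl
use strict;
use warnings;

my $foo = "one two three four";   # sample data, invented here

# Scalar context: the match just reports success or failure.
my $ok = $foo =~ /^(\S+)\s+(\S+)\s*(.*)/;

# Array (list) context: the match returns the parenthesized subexpressions.
my @bits = $foo =~ /^(\S+)\s+(\S+)\s*(.*)/;

print "matched\n" if $ok;
print "number of bits: ", scalar(@bits), "\n";
print "bits: ", join(" | ", @bits), "\n";
```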

If the regexp doesn't match, it returns an empty list, which in
turn means that each of $F1, $F2, and $Etc will be assigned the
undefined value.

A list assignment in a scalar context returns the number of elements
on its right-hand side; in this case either three or none.  Since the
assignment is in a scalar context (the condition clause of an `if'
statement), a successful match yields the number "3", which counts
as true.  A failed match yields "0", which counts as false.
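
If you want to check what the condition actually sees, you can force
scalar context yourself and print the result.  A list assignment in
scalar context gives the number of elements on its right-hand side.
In this sketch the sample strings and the names $hit and $miss are
invented for the purpose.

```perl
use strict;
use warnings;

my ($F1, $F2, $Etc);

# scalar() supplies the same context that an `if' condition does.
my $hit  = scalar(($F1, $F2, $Etc) = ("a b c d" =~ /^(\S+)\s+(\S+)\s*(.*)/));
my $miss = scalar(($F1, $F2, $Etc) = ("lonely"  =~ /^(\S+)\s+(\S+)\s*(.*)/));

print "hit:  $hit\n";     # three elements came back from the match
print "miss: $miss\n";    # the failed match supplied no elements
```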

So the overall effect is:

    if $foo has the desired form, assign the interesting bits to
    the named variables and execute the body of the `if' statement;

    otherwise, nuke the named variables and execute the corresponding
    `else' clause, if any.
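
Here is the whole thing as a runnable sketch.  The subroutine name
parse_line, the "old" starting values and the sample strings are all
hypothetical; they are there only to show that a failed match really
does wipe out whatever the variables held before.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper, named only for this example.
sub parse_line {
    my ($foo) = @_;
    my ($F1, $F2, $Etc) = ("old", "old", "old");   # stale values

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/)) {
        return "matched: <$F1> <$F2> <$Etc>";
    }
    else {
        # Count how many of the three survived the failed match.
        my $left = grep { defined $_ } ($F1, $F2, $Etc);
        return "no match, $left variables still defined";
    }
}

print parse_line("alpha beta gamma delta"), "\n";
print parse_line("oneword"), "\n";
```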

As others have pointed out, `split' is usually the first tool that
comes to hand in cases like this.

    if (($F1, $F2, $Etc) = split /\s+/, $foo, 3)

This is not exactly equivalent to the regular expression version.
If $foo begins with whitespace, the regexp won't match (because of
the `^' anchoring the match to the beginning of the string), whereas
split will treat the leading whitespace as a separator and put an
empty string into $F1.  Also, split will happily "split" $foo even
if it only contains one word, whereas the regexp won't match in that
case.  You could check that $F2 is defined, or check that the
assignment expression is `> 1', if this matters.
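
Both differences are easy to demonstrate.  This is a sketch with
invented sample strings; the names $n_lead, $lead_f1, $n_one and
$verdict are mine.

```perl
use strict;
use warnings;

my ($F1, $F2, $Etc);

# Leading whitespace: split /\s+/ yields an empty first field.
my $n_lead  = scalar(($F1, $F2, $Etc) = split(/\s+/, "  a b c d", 3));
my $lead_f1 = $F1;
print "fields: $n_lead, F1 is '$lead_f1', Etc is '$Etc'\n";

# A single word: split still hands back one field.
my $n_one = scalar(($F1, $F2, $Etc) = split(/\s+/, "solo", 3));
print "fields: $n_one\n";

# The guard suggested above: insist on at least two fields.
my $verdict;
if ((($F1, $F2, $Etc) = split(/\s+/, "solo", 3)) > 1) {
    $verdict = "two or more words";
}
else {
    $verdict = "fewer than two words";
}
print "$verdict\n";
```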

Tim.
--
Tim Goodwin   | "After all, what did Brunel, Watt, Boulton and
Unipalm PIPEX | Telford do that was complex?  I could have built
Cambridge, UK | the Great Western Railway on my own." -- Ian Batten

Original headers:

From: tim@pipex.net (Tim Goodwin)
Newsgroups: comp.lang.perl.misc
Subject: Re: split into word and the rest of string
Date: 28 Nov 1995 18:34:40 GMT
Organization: Unipalm PIPEX
Message-ID: <49fko0$in7@wave.news.pipex.net>
References: <DIEqt7.5Ft@lexmark.com>
