tobold.org

correct • elegant • free

△ comp.unix.shell △

◅ Shell script help

Better cron output handling? ▻

grep

In article <38E9EBBF.454EB4DC@nc.prestige.net>,
Michael Austin  <maustin@nc.prestige.net> wrote:
>I have been using a real OS for the last 15 years where the SEARCH

Best to lower that sarcasm level a notch or two, when you're asking for
help...

>grep 'THIS STRING' filename | grep word | grep another > outfile
>
>Will  this all be done with a single pass through the file?

No, but only the first grep reads the whole file: the second sees only
the lines the first lets through, and the third only what survives the
second.  You can minimize the work done by ordering them so that the
first grep searches for the string that occurs least often in the file.
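
A rough way to pick that order is to count the matching lines for each
string before building the pipeline.  The sketch below does this on a
made-up four-line file (the file contents and variable names are
invented for illustration).  Counting costs a pass per string, so on
large files it probably only pays off when run on a small sample.

```shell
# Illustrative sample data only; not taken from the original files.
tmp=$(mktemp)
printf '%s\n' \
    'THIS STRING then word then another' \
    'another, word, and THIS STRING' \
    'THIS STRING and word only' \
    'nothing relevant here' > "$tmp"

# Count the lines containing each string.
n_this=$(grep -c 'THIS STRING' "$tmp")
n_word=$(grep -c 'word' "$tmp")
n_another=$(grep -c 'another' "$tmp")
echo "THIS STRING: $n_this  word: $n_word  another: $n_another"

# 'another' is the rarest string in this sample, so it goes first.
# The set of lines printed is the same whatever the order.
n_out=$(grep 'another' "$tmp" | grep 'word' | grep -c 'THIS STRING')
echo "lines containing all three: $n_out"

rm -f "$tmp"
```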

grep's forte is searching for regular expressions (the name comes from
the ed command g/re/p: Global Regular Expression Print).  If your
word matches are always in the same order on the line, it's
trivial to write a regular expression to match them all: all you
need to know is that the regular expression `.*' matches any
string, including the empty one.

So to use `grep' to print all lines which contain all the strings `THIS
STRING', `word', and `another', but only in that order:

    grep 'THIS STRING.*word.*another' filename
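
To see the ordering restriction in action, here is a small sketch on an
invented sample file (the lines and variable names are made up).  The
single regular expression accepts only the line with the strings in the
stated order, while the three-stage pipeline also accepts the line that
has them reversed.

```shell
# Illustrative sample data only.
tmp=$(mktemp)
printf '%s\n' \
    'THIS STRING then word then another' \
    'another, word, and THIS STRING' \
    'THIS STRING and word only' > "$tmp"

# One grep, one regular expression: strings must appear in this order.
ordered=$(grep 'THIS STRING.*word.*another' "$tmp")
echo "$ordered"

# The pipeline from the question matches the strings in any order.
n_pipeline=$(grep 'THIS STRING' "$tmp" | grep 'word' | grep -c 'another')
echo "pipeline matches: $n_pipeline"

rm -f "$tmp"
```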

To relax the ordering requirement is harder; I wouldn't bother until
you've measured the performance of the pipeline you show and found that
it doesn't meet your requirements.
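
If it ever does come to that, one possible single-pass approach that
ignores ordering swaps grep for awk.  This is only a sketch under that
assumption, run on an invented sample file; whether it beats the
pipeline on real data is something to measure, not assume.

```shell
# Illustrative sample data only.
tmp=$(mktemp)
printf '%s\n' \
    'THIS STRING then word then another' \
    'another, word, and THIS STRING' \
    'THIS STRING and word only' > "$tmp"

# awk tests all three patterns against each line as it reads it,
# so the file is read once and the order on the line doesn't matter.
any_order=$(awk '/THIS STRING/ && /word/ && /another/' "$tmp")
echo "$any_order"

# Count the matching lines (grep -c '' counts every line it is given).
n_any=$(printf '%s\n' "$any_order" | grep -c '')

rm -f "$tmp"
```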

>Will  this all be done with a single pass through the file?  If so, I
>can use it, if not... remembering that the files I will be using this on
>are 150-250Mb and > 1.5Million lines of 132 columns (actually they are
>variable length, but the max is 132).

On my 200MHz PC, GNU grep takes about 6 CPU seconds to search a 250M
file.  Not too bad, for a fake OS...
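
A minimal way to take a similar measurement yourself is sketched below.
Note the assumptions: it reports elapsed wall-clock seconds via
`date +%s' (widely available, but not guaranteed everywhere), not CPU
seconds as quoted above, and it runs against generated filler whose
size is arbitrary.  Point it at a real file for a meaningful number.

```shell
# Build a throwaway test file: filler lines plus one line that matches.
tmp=$(mktemp)
awk 'BEGIN { for (i = 0; i < 200000; i++)
                 print "filler line with nothing of interest" }' > "$tmp"
printf '%s\n' 'THIS STRING then word then another' >> "$tmp"

# Time the pipeline from the question (whole seconds, wall clock).
start=$(date +%s)
matches=$(grep 'THIS STRING' "$tmp" | grep 'word' | grep -c 'another')
end=$(date +%s)
elapsed=$((end - start))

echo "matching lines: $matches, elapsed: ${elapsed}s"
rm -f "$tmp"
```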

Tim.
--
Tim Goodwin   | "If you don't know what closures are, you probably don't
Leicester, UK | want to know what closures are." -- Larry Wall

Original headers:

From: tjg@star.le.ac.uk (Tim Goodwin)
Newsgroups: comp.unix.shell
Subject: Re: grep
Date: 4 Apr 2000 15:49:33 +0100
Message-ID: <8ccvej$oks$1@ltpcg.star.le.ac.uk>
References: <38E9EBBF.454EB4DC@nc.prestige.net>
