tobold.org

correct • elegant • free

filesize -> variable

In article <949594968.16527.0.nnrp-10.c2deb51d@news.demon.co.uk>,
Adam Price <adam+usenet@pappnase.demon.co.uk> wrote:
>Get your attributions right, the first line you quote is Ken's not
>mine.

That's why it's got an extra '>'.  Sorry if I confused anyone.

>This from the man page from stat on Tru64
>
>  int stat(const char *path,struct stat *buffer );
>
>suggests to me that wc would need a file name to use if it were
>to call stat.

Sure, but just a few lines down you'll see fstat().

    int fstat(int filedes, struct stat *buffer);
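Here is a minimal C sketch of the point about fstat(). It takes an already-open descriptor, such as 0 for standard input, so no file name is needed. The function name naive_fd_size is hypothetical, and this is only the simple "report st_size" approach, not what GNU wc actually does.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, naive approach: report the size of whatever
   file is open on descriptor fd, without needing its name.
   Returns -1 if fstat() fails or fd is not a regular file. */
off_t naive_fd_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode))
        return -1;      /* st_size is not a byte count for pipes, ttys, ... */
    return st.st_size;
}
```

Called as naive_fd_size(0), it would report the size of a file redirected onto standard input. If standard input is a pipe or a terminal, it returns -1.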

>The comment was intended to ask if wc in the form Ken used was
>really more efficient than the form without the redirect, and
>if so how.

Yes, it really is.

It's actually a little more subtle than just calling fstat() and reporting
st.st_size.  GNU wc uses fstat() to establish whether the file is a
regular file; if so, it uses lseek() to find both the current position in
the file and the end of the file, and reports the difference.
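The strategy just described can be sketched in C roughly as follows. This is an illustration under stated assumptions, not GNU wc's actual source, and the name regular_file_bytes_left is hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch only (not GNU wc's code): if fd refers to a regular file, the
   bytes still to be read are the end-of-file offset minus the current
   offset.  Returns -1 if fd is not a regular file or a call fails; a
   caller would then have to fall back to read()ing the data. */
off_t regular_file_bytes_left(int fd)
{
    struct stat st;
    off_t cur, end;

    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    cur = lseek(fd, 0, SEEK_CUR);   /* where the next read() would start */
    if (cur == (off_t)-1)
        return -1;
    end = lseek(fd, 0, SEEK_END);   /* offset just past the last byte */
    if (end == (off_t)-1)
        return -1;
    return end > cur ? end - cur : 0;
}
```

As in the trace, the sketch leaves the file offset at end of file, which is where it would be had the data actually been read.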

Unfortunately, Tru64 doesn't have a system call tracing command (that I
know of; if I'm wrong, *please* tell me!).  But on a system that does,
it's easy to see this.  Note that S_IFREG in the st_mode field is
fstat()'s way of saying that standard input is a regular file.

    ; strace wc -c < /etc/group
    ...
    fstat(0, {st_mode=S_IFREG|0555, st_size=730, ...}) = 0
    _llseek(0, 0, [0], SEEK_CUR)            = 0
    _llseek(0, 0, [730], SEEK_END)          = 0

This extra wrinkle is needed for the case where part of the file has
already been consumed by another process.  That can only happen with
standard input: files named on the command line are opened by wc itself,
so they always start at offset zero.

    ; wc -c < /etc/group
        730
    ; { perl -e 'sysread(STDIN, $x, 123)'; wc -c } < /etc/group
        607
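This works because both processes share a single open file description, and with it the file offset. The following C sketch imitates the perl one-liner using a forked child that reads a given number of bytes. bytes_left_after_child_reads is a hypothetical helper, not part of wc. For a 730-byte file opened at offset 0, a skip of 123 leaves 607 bytes, even though fstat() still reports st_size as 730.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process reads up to `skip` bytes from fd,
   as the perl one-liner does, then exits.  Parent and child share one
   open file description, so the child's reads move the offset the
   parent sees.  Returns the bytes left between that offset and end of
   file, or -1 on error. */
off_t bytes_left_after_child_reads(int fd, size_t skip)
{
    char buf[512];
    pid_t pid;
    int status;
    off_t cur, end;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child: consume up to `skip` bytes */
        while (skip > 0) {
            ssize_t n = read(fd, buf, skip < sizeof buf ? skip : sizeof buf);
            if (n <= 0)
                break;
            skip -= (size_t)n;
        }
        _exit(0);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    cur = lseek(fd, 0, SEEK_CUR);   /* already advanced by the child */
    end = lseek(fd, 0, SEEK_END);
    if (cur == (off_t)-1 || end == (off_t)-1)
        return -1;
    return end - cur;
}
```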

>PS The red herring about UUOC wouldn't make my question more or less
>correct.

It was partly an (obviously poor) attempt at humour, but it was also meant
to emphasise this difference between a pipe and a redirect.  Again, a
system call trace makes it quite clear: this time fstat() says that
standard input is a pipe (S_IFIFO), so wc actually has to read() it.

    ; cat /etc/group | strace wc -c
    ...
    fstat(0, {st_mode=S_IFIFO|0600, st_size=730, ...}) = 0
    read(0, "system:*:0:root,cgp,phi,swp,rgg,"..., 16384) = 730
    read(0, "", 16384)                      = 0
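A pipe offers no size to ask for, so the only option is to read the data and count it. Here is a minimal sketch of such a loop. count_by_reading is a hypothetical name, and the 16384-byte buffer simply mirrors the size seen in the trace.

```c
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>

/* Sketch of the fallback for pipes (S_IFIFO) and other non-regular
   files: read everything and count it.  Returns the byte count, or -1
   on a read error. */
long long count_by_reading(int fd)
{
    char buf[16384];
    long long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return total;           /* end of file: writer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        total += n;
    }
}
```

The retry on EINTR is a defensive choice made in this sketch; the trace shows no such case.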

Tim.
--
Tim Goodwin   | "If you don't know what closures are, you probably don't
Leicester, UK | want to know what closures are." -- Larry Wall

Original headers:

From: tjg@star.le.ac.uk (Tim Goodwin)
Newsgroups: comp.unix.shell
Subject: Re: filesize -> variable
Date: 4 Feb 2000 12:45:28 -0000
Message-ID: <87ehlu$af0$1@ltpcg.star.le.ac.uk>
References: <3897B1E6.AE5CCAFD@email.dk>
  <949582467.19012.2.nnrp-11.c2deb51d@news.demon.co.uk>
  <87bv21$5fr$1@ltpcg.star.le.ac.uk>
  <949594968.16527.0.nnrp-10.c2deb51d@news.demon.co.uk>
