tobold.org

correct • elegant • free

△ comp.unix.shell △

◅ copying stderr to file?

Using date as a file name ▻

remove whitespace and comments

>Alan Delucia <ad2165@cnsvax.albany.edu> wrote:
>> I am unsure how to write a script that will look through a source code
>> file and remove all of its comments.  Any where in a file that starts
>> with /* or // have to be replaced with a space.

There are two problem statements here, which are contradictory.
Removing comments is more subtle than you might think; the intended
clarification in the second sentence doesn't really get us very far.

In article <87rfli02kbf@enews4.newsguy.com>,
Dan Kappus  <danka@photobooks.photobooks.com> wrote:
>Using sed
>
>s/(\/\*.*$)|(\/\/.*$)//g

Did you try this?  Its most obvious problem is that sed supports only
Basic Regular Expressions, so you can't use alternation (`|'). (Or, in
some versions of sed, you can but you must say `\|'.)
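
To see the difference, here is a small sketch. The function names and
the sample input are made up for illustration, and the second function
assumes a sed that accepts `-E' to select Extended Regular Expressions
(GNU and BSD sed both do).

```shell
# Hypothetical helpers; the names are illustrative only.

# BRE: "|" is an ordinary character here, so the expression only matches
# lines that contain a literal "|" in the right place.
strip_bre() {
    sed 's,/\*.*|//.*,,'
}

# ERE: "|" means alternation. Assumes this sed accepts -E.
strip_ere() {
    sed -E 's,/\*.*|//.*,,'
}

# The BRE version prints both lines unchanged; the ERE version cuts
# each line at the start of its comment.
printf '%s\n' 'a = 1; // one' 'b = 2; /* two */' | strip_bre
printf '%s\n' 'a = 1; // one' 'b = 2; /* two */' | strip_ere
```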

Other nits are that the parentheses are redundant (and would be, even if
sed did support alternation), as are the end-of-line anchoring dollars
(since `.*' will always match to the end of the line).  An optional
improvement is to use something other than `/' to delimit the regular
expressions.  Rewriting to fix all these, we get the following.

    sed 's,/\*.*,,;s,//.*,,' hello.c

The second half works well for `//' comments, provided `//' never
appears inside a string literal.

Unfortunately, dealing with `/*' comments is trickier than this.
Consider the following C code.

    /* this is
    a multiline comment */

    /* const */ int x = 7 -/* oops! */-3;

The last line illustrates a couple of points.  There can be multiple
comments on a single line, intermingled with the code---a simple
"greedy" regular expression like `.*' will not handle this correctly.
In ANSI C, a comment has the semantics of white space: eliding the
second comment on the last line completely would leave `7 --3', where
`--' is read as a single decrement token, and the result is a syntax
error.
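
A quick way to watch the simple approach go wrong is to feed it that
sample. This is only a sketch: `sample' and `naive_strip' are
hypothetical names, and the sed expression is the plain pair of
substitutions, `s,/\*.*,,' followed by `s,//.*,,'.

```shell
# Hypothetical helper that prints the sample C code from the text.
sample() {
    printf '%s\n' \
        '/* this is' \
        'a multiline comment */' \
        '' \
        '/* const */ int x = 7 -/* oops! */-3;'
}

# The simple approach: delete from the first "/*" or "//" to end of line.
naive_strip() {
    sed 's,/\*.*,,;s,//.*,,'
}

# The second line of the comment survives, because it contains no "/*",
# and the declaration of x disappears, because ".*" runs to end of line.
sample | naive_strip
```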

Here's a Perl version that handles this input correctly.  I'll leave the
sed version for Ken :-).

perl -pe '$c = 0 if $c and s,.*?\*/,,; s,/\*.*?\*/, ,g; $c=1 if s,/\*.*,,'

Tim.
--
Tim Goodwin   | "If you don't know what closures are, you probably don't
Leicester, UK | want to know what closures are." -- Larry Wall

Original headers:

From: tjg@star.le.ac.uk (Tim Goodwin)
Newsgroups: comp.unix.shell
Subject: Re: remove whitespace and comments
Date: 9 Feb 2000 12:43:36 -0000
Message-ID: <87rnee$q43$1@ltpcg.star.le.ac.uk>
References: <38A0F2B9.35C3C455@cnsvax.albany.edu>
  <87rfli02kbf@enews4.newsguy.com>
