tobold.org

correct • elegant • free

ms-word files in unix

In article <45pt81$gbq@ob1.uws.EDU.AU>, Rachel Polanskis
<r.polanskis@nepean.uws.edu.au> wrote:
>Hasn't anyone considered getting the Word API or file format and creating
>a simple viewer?

If you want a *simple* viewer, perhaps this will suffice?

    #! /usr/local/bin/perl5

    undef $/;
    $_=<>;

    @z=/
            [\s\040-\176]{3,}               (?# At least three ASCII chars )
            (?:
                    ..?                     (?# One or two noise chars )
                    [\s\040-\176]{3,}       (?# ... as long as 3 more follow )
            )*
    /sgx;                                   # s: treat as one line (. matches \n too)

    # Word uses \r for paragraph break and \f for section break.
    # Other, non-text strings also appear in the file, so keep only
    # the runs that contain one of those breaks.
    $_ = join '', grep (/[\r\f]/, @z);

    # It appears to use ^S (\cS) for another sort of break.
    s/\cS/\n/g;
    s/[\r\f]+/\n\n/g;
    print;

Needless to say, this doesn't handle embedded Excel spreadsheets, or...
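To make it easier to see which byte runs the heuristic keeps, here is a small illustration that is not part of the original post. This minimal sketch wraps the same regex and substitutions in a hypothetical subroutine `word_text` and runs it on an invented byte string that stands in for part of a Word file. The sample bytes are made up for the demonstration, not taken from a real document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# word_text: hypothetical helper that applies the same heuristic as the
# script above to an in-memory string instead of reading from <>.
sub word_text {
    my ($data) = @_;

    # Runs of at least three ASCII/whitespace chars, optionally joined by
    # one or two noise bytes when at least three more ASCII chars follow.
    my @runs = $data =~ /
        [\s\040-\176]{3,}
        (?:
            ..?
            [\s\040-\176]{3,}
        )*
    /sgx;

    # Keep only runs containing a paragraph (\r) or section (\f) break.
    my $text = join '', grep { /[\r\f]/ } @runs;

    $text =~ s/\cS/\n/g;          # ^S breaks become single newlines
    $text =~ s/[\r\f]+/\n\n/g;    # paragraph/section breaks become blank lines
    return $text;
}

# Invented sample bytes standing in for part of a Word file.
my $sample = "\0\0\x01Normal.dot\0\0\x02"
           . "Title\x13Hello world\rSecond para\r"
           . "\0\x05\0\0Footer\f\0";

print word_text($sample);
```

On this sample it prints `Title` on one line, followed by `Hello world`, `Second para` and `Footer` separated by blank lines. The single `\x13` byte between `Title` and `Hello` is absorbed by the "one or two noise chars" clause and then turned into a newline. The `Normal.dot` run contains no `\r` or `\f`, so the grep step discards it.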

Tim.
--
Tim Goodwin   | "I'm not sure that displaying knowledge of MS-DOS
Unipalm PIPEX | arcana is any mark of virtue." -- Steve Summit

Original headers:

From: tim@pipex.net (Tim Goodwin)
Newsgroups: comp.unix.dos-under-unix,comp.mail.mime
Subject: Re: ms-word files in unix
Date: 16 Oct 1995 16:38:21 GMT
Organization: Unipalm PIPEX
Message-ID: <45u1pt$hrl@wave.news>
References: <446lnu$pj1@ferrari.mst6.lanl.gov> <DFH3y5.GzA@boi.hp.com>
  <45pt81$gbq@ob1.uws.EDU.AU>
