Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: reading a file into an array. mapfile? (f)?



Thanks! Works great.

Using mapfile and (f) on zsh's configure script takes 0.189s (which includes loading zsh and mapfile) while doing this via a read loop takes over a minute.

Given this, I find this wording in zshmodules a little misleading:

       Thus it should not automatically be assumed that use of mapfile
       represents a gain in efficiency over use of other mechanisms.

Ok. I won't assume it; I will just make use of its speedup over a read loop.

Before posting I tried googling for this and didn't turn up anything. Since this is so simple and, I think, common (perhaps more common than the case where one wants a file as a single long string), could it be mentioned in the mapfile doc?

I sort of agree with this comment in zshmodules:
       It is unfortunate that the mechanism for loading modules does not yet
       allow the user to specify the name of the shell parameter to be given
       the special behaviour.

Here's how it is done in Ruby, which is extremely simple: if an associative array named SCRIPT_LINES__ is defined, the lines of each file are saved into it when Ruby reads the file. So, translating to zsh-speak:

  typeset -A SCRIPT_LINES__
turns on saving file lines and
  unset SCRIPT_LINES__
turns it off. (It's off by default.)

At any rate, I guess I no longer have an excuse for not implementing file listing in zshdb, so I guess that's next up.

Any thoughts on how to get checksum information? I can shell out to "sum" or "md5sum", but given that I already have the file data as a string, a solution that uses only zsh would be preferable.


On Thu, Sep 18, 2008 at 12:44 AM, Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
On Sep 17, 10:53pm, Rocky Bernstein wrote:
} Subject: reading a file into an array. mapfile? (f)?
}
} I'd like to read a file (a zsh script file) into an array fast.

Ending up with what, one line per array entry?  I'm guessing so since
you mention the (f) expansion flag.

} [...] I also know about mapfile which reads the file and turns it
} into a single long zsh string. Question: if the underlying file
} changes, what does mapfile do? Update its data? Keep the original?
} Show something which is indeterminate?

When you reference a hash key in the mapfile hash, zsh calls mmap()
to access the file contents, but immediately allocates enough memory
to contain the data and copies into it.  The file is then unmapped.
This is done because parameter values are stored with zsh's internal
"metafication" already applied, and it's obviously not possible to
metafy the file in place.

If the file is modified during the brief period when zsh has it mapped
and is copying it, you could get indeterminate results.  It probably
depends on the system's mmap() implementation.  After the file has been
copied, zsh no longer pays attention to it.

If you assign a value to a field in the mapfile hash, zsh attempts
to mmap() the corresponding disk file for writing, and whatever
you assigned replaces the file contents by way of msync().  You can
(I think) assign to slices of the file, but nothing magical is done,
so the entire file is rewritten unless the msync() implementation is
clever.

} There is also the zsh parameter expansion operator (f) "a shorthand
} for 'pws:\n:'". But I don't see how to use that with either mapfile or
} input redirection to save this into an array variable short of putting
} this in a loop

It's much simpler than you seem to believe:

lines=( ${(f)mapfile[/path/to/file]} )

Splitting up /etc/termcap this way (17890 lines on my system) takes
a little less than 0.08 seconds on my 3GHz Pentium 4.  Fully parsing
termcap into "shell words" with (z) takes about 0.13 seconds.  For
/usr/share/dict/words (479829 lines), (f) takes about 0.8 seconds but
(z) takes almost 13 seconds.
