Zsh Mailing List Archive
Re: Using file lines as "input files"
- X-seq: zsh-users 27879
- From: Mikael Magnusson <mikachu@xxxxxxxxx>
- To: dominik.vogt@xxxxxx, Zsh Users <zsh-users@xxxxxxx>
- Subject: Re: Using file lines as "input files"
- Date: Sat, 9 Jul 2022 04:21:37 +0200
- Archived-at: <https://zsh.org/users/27879>
- In-reply-to: <Ysi7JTBpUXK/F8v/@gmx.de>
- List-id: <zsh-users.zsh.org>
- References: <Ysiacl2I2eYF+uY4@gmx.de> <CAH+w=7betuB2tG+RrkYiqsQ66_3xKwfK1kSkVyY-SBRcVjqhvQ@mail.gmail.com> <Ysi7JTBpUXK/F8v/@gmx.de>
On 7/9/22, Dominik Vogt <dominik.vogt@xxxxxx> wrote:
> On Fri, Jul 08, 2022 at 03:04:31PM -0700, Bart Schaefer wrote:
>> On Fri, Jul 8, 2022 at 1:58 PM Dominik Vogt <dominik.vogt@xxxxxx> wrote:
>> >
>> > Disclaimer: I _know_ this can be done in seconds with perl /
>> > python, but I like to not rely on scripting languages when the
>> > shell can do the job.
>>
>> This is sort of like saying "I like to not rely on hiking boots when
>> shoes can do the job."
>
> Actually, for me, scripting languages are the "shoes" because they
> don't interact very well with the command pipeline, unless you
> spend an absurd amount of work to make them do so. Calling
> commands for everything can be slower, but most of the time it's
> just a symptom of bad scripting. GNU coreutils are faster than
> anything I'll ever be willing to code (or any perl or python
> script or C or C++ library for that matter). The trick is keeping
> the process spawning overhead low.
>
>> > $ chksum Fline1 Fline2 Fline3 ... Fline265000
>> >
>> > (Of course without actually splitting the input file
>>
>> If "not actually splitting" means what it seems to mean, and you
>> literally want to run cksum, the answer is no.
>
> Right.
>
> This does the job pretty well, relying entirely on existing Unix
> tools:
>
> ulimit -s 100000
> split -l 1 "$INPUTF" ff
> cksum ff*
> rm ff*
>
> That cuts runtime down to seven seconds instead of four minutes,
> at the cost of a few hundred MB on the RAM disk. Splitting the
> source file and removing the fragments takes about three to four
> seconds.
>
> Thanks for the comments which put me on the right track.
>
> --
>
> (I prefer to have a huge stack size anyway to be able to do things
> like "grep foobar **/*(.)".)
I realized I misinterpreted the question originally. The following
doesn't work 100% of the time, but it was a fun idea:
% mkfifo apipe
% foo[265000]='' # number of lines in the file
% cksum apipe$^foo # pass "apipe" to cksum 265000 times
(in another terminal, or using job control, etc.)
% while IFS= read -r; do print -r -- "$REPLY" > apipe; done < infile
When I tried the above on some test data, I got about 10 broken pipes.
Several lines also sometimes get passed through the pipe without an
intervening EOF; I'll admit I don't know the finer points of pipe/fifo
behavior when you open and close them rapidly.
That said, this also seems to take around 4-5 seconds to run.
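For comparison, the split-based approach quoted above can be sketched as
a small shell function. This is only a sketch under some assumptions:
the name cksum_lines is hypothetical, the fragments go into a mktemp
directory rather than an explicitly chosen RAM disk, and xargs is
swapped in for the raised stack limit so that the list of fragments
never has to fit on a single command line.

```shell
# Hypothetical helper: print one cksum line per line of the input file.
# The body runs in a subshell, so the cd and the variables stay local.
cksum_lines() (
    # Scratch directory for the one-line fragments (assumption: the
    # default temp location is fast enough; point TMPDIR at a RAM disk
    # if it is not).
    tmp=$(mktemp -d) || exit 1

    # One fragment per input line. -a 6 allows 26^6 suffixes, far more
    # than the 265000 lines mentioned in the thread.
    split -l 1 -a 6 "$1" "$tmp/ff" &&
        cd "$tmp" &&
        # printf is a builtin, so the expanded glob is not limited by
        # the kernel's argument size; xargs batches the names for cksum.
        printf '%s\n' ff* | xargs cksum
    rc=$?

    # Leave the scratch directory before removing it.
    cd / && rm -rf "$tmp"
    exit "$rc"
)
```

Called as "cksum_lines infile", it prints the usual checksum, size and
fragment name for each line, in input order. An empty input file
produces no fragments and is not handled here.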
--
Mikael Magnusson