Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: Using file lines as "input files"

X-seq: zsh-users 27879
From: Mikael Magnusson <mikachu@xxxxxxxxx>
To: dominik.vogt@xxxxxx, Zsh Users <zsh-users@xxxxxxx>
Subject: Re: Using file lines as "input files"
Date: Sat, 9 Jul 2022 04:21:37 +0200
Archived-at: <https://zsh.org/users/27879>
In-reply-to: <Ysi7JTBpUXK/F8v/@gmx.de>
List-id: <zsh-users.zsh.org>
References: <Ysiacl2I2eYF+uY4@gmx.de> <CAH+w=7betuB2tG+RrkYiqsQ66_3xKwfK1kSkVyY-SBRcVjqhvQ@mail.gmail.com> <Ysi7JTBpUXK/F8v/@gmx.de>

On 7/9/22, Dominik Vogt <dominik.vogt@xxxxxx> wrote:
> On Fri, Jul 08, 2022 at 03:04:31PM -0700, Bart Schaefer wrote:
>> On Fri, Jul 8, 2022 at 1:58 PM Dominik Vogt <dominik.vogt@xxxxxx> wrote:
>> >
>> > Disclaimer: I _know_ this can be done in seconds with perl /
>> > python, but I like to not rely on scripting languages when the
>> > shell can do the job.
>>
>> This is sort of like saying "I like to not rely on hiking boots when
>> shoes can do the job."
>
> Actually, for me, scripting languages are the "shoes" because they
> don't interact very well with the command pipeline, unless you
> spend an absurd amount of work to make them do so.  Calling
> commands for everything can be slower, but most of the time it's
> just a symptom of bad scripting.  GNU coreutils are faster than
> anything I'll ever be willing to code (or any perl or python
> script or C or C++ library for that matter).  The trick is keeping
> the process spawning overhead low.
>
>> >   $ cksum Fline1 Fline2 Fline3 ... Fline265000
>> >
>> > (Of course without actually splitting the input file
>>
>> If "not actually splitting" means what it seems to mean, and you
>> literally want to run cksum, the answer is no.
>
> Right.
>
> This does the job pretty well, relying entirely on existing Unix
> tools:
>
>   ulimit -s 100000
>   split -l 1 "$INPUTF" ff
>   cksum ff*
>   rm ff*
>
> That cuts runtime down to seven seconds instead of four minutes,
> at the cost of a few hundred MB on the RAM disk.  Splitting the
> source file and removing the fragments takes about three to four
> seconds.
>
> Thanks for the comments which put me on the right track.
>
> --
>
> (I prefer to have a huge stack size anyway to be able to do things
> like "grep foobar **/*(.)".)
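The split-based approach quoted above can be wrapped in a small POSIX shell function. This is a minimal sketch, not the poster's exact script. It assumes GNU coreutils, whose `split` lengthens its default two-letter suffixes as needed so the fragment names still sort in input order, and a writable temporary directory. The function name `cksum_lines` is hypothetical. The function body is a subshell, so the `cd` and the cleanup trap stay local. For very large inputs, raising the stack limit with `ulimit -s` as in the quoted message is presumably still needed, because on Linux the space available for the `cksum ff*` argument list scales with the stack limit.

```shell
#!/bin/sh
# Hypothetical helper: print one "CRC SIZE NAME" line per line of FILE
# by splitting FILE into one-line fragments and checksumming them all
# with a single cksum process (assumes GNU coreutils split).
cksum_lines() (
  input=$1
  # Nothing to do for an empty file (the ff* glob would not match).
  [ -s "$input" ] || exit 0
  tmpdir=$(mktemp -d) || exit 1
  # The body is a subshell, so this trap and the cd below stay local;
  # the fragments are removed however the subshell exits.
  trap 'rm -rf -- "$tmpdir"' EXIT
  # One fragment per input line, trailing newline included: ffaa, ffab, ...
  split -l 1 -- "$input" "$tmpdir/ff" || exit 1
  cd -- "$tmpdir" || exit 1
  # The glob expands in sorted order, which matches input line order.
  cksum ff*
)
```

Usage: `cksum_lines "$INPUTF"`. Each output line carries the same CRC and byte count that `cksum` prints when that single line, newline included, is fed to it on standard input.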

I realized I had misinterpreted the question originally. The following
doesn't work 100% reliably, but it was a fun idea:
% mkfifo apipe
% foo[265000]='' # one (empty) element per line in the file
% cksum apipe$^foo # $^foo: pass "apipe" to cksum 265000 times
(then, in another terminal or as a background job:)
% while IFS= read -r line; do print -r -- "$line" > apipe; done < infile

When I tried the above on some test data, I got about 10 broken pipes.
Also, several lines sometimes get passed through the pipe without an
intervening EOF; I'll admit I don't know the finer points of pipe/FIFO
behavior when you open and close them rapidly.

That said, this also seems to take around 4-5 seconds to run.

-- 
Mikael Magnusson

