Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Emulating 'locate'



Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> writes:

> On Oct 4,  9:48am, Lloyd Zusman wrote:
> }
> } >> I'm not sure how it compares to this:
> } >>   locate() { find / -name "*${^*}*" -print }
> } 
> } zsh interprets the ${^*} part and intersperses it between the other two
> } asterisks when the shell function is being invoked, and 'find'
> } interprets the result.  I think I should have left out the ^, however,
> } or probably only used ${1}.
>
> Actually the caret is meaningless for $* when in double quotes.  What
> you were thinking of was probably "*${^@}*".
>
>     set "*${^@}*"
>     eval find / '\(' -name "'${(j:' -o -name ':)@}'" '\)' -print

Well, to be honest, I wasn't thinking at all. :)

I just copied part of the argument specification from the extended-glob
version of the shell function to my quick, off-the-cuff version using
'find'.

Your example indeed does what I intended ... or would have intended, if
I had been thinking. :)
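
In case it is useful to anyone following along, the same
'-name ... -o -name ...' list can also be built without eval.  This is
only a minimal sketch, written in plain POSIX sh syntax so that it should
behave the same in zsh, bash and sh.  'find_any' and its ROOT argument are
names I made up for illustration; they are not from the earlier messages.

```shell
# find_any ROOT PATTERN...   (hypothetical helper, not from the thread)
# Prints every path under ROOT whose basename contains any PATTERN.
# The body runs in a subshell "( ... )" so its variables stay private.
find_any() (
  root=$1
  shift
  [ "$#" -gt 0 ] || exit 1          # need at least one pattern
  first=yes
  for pat                           # loops over the original arguments
  do
    shift                           # drop the raw pattern from the front
    if [ "$first" = yes ]; then
      set -- "$@" -name "*${pat}*"  # append the first find test
      first=no
    else
      set -- "$@" -o -name "*${pat}*"   # later tests are joined with -o
    fi
  done
  # "$@" now holds only find tests; the quoting keeps every pattern intact.
  find "$root" \( "$@" \) -print
)
```

With this, 'find_any / foo bar' runs the equivalent of
find / \( -name '*foo*' -o -name '*bar*' \) -print.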


> } [ ... ]
> } 
> }   find / -name specific-file -print   # 15 min 19 sec elapsed
> }   xlocate specific-file               # 28 min 40 sec elapsed
>
> I think that's expected rather than unfortunate.  For one thing, that
> find command will only print paths that end in names matching the
> pattern, whereas xlocate descends and prints entire trees below any
> directory matching the pattern.  (I'm not even sure how to express the
> latter in find without resorting to -exec of another find.)  However,
> zsh also does a lot of stat() calls during the glob to avoid following
> symlinks and perform MARK_DIRS and so on, and it buffers up all the
> results and does a duplicate-eliminating sort pass as well (in case of
> a path like x/y/z/foo/a/b/c/foo/p/d/q when globbing for **/foo/*).

'find' has to do the same stat() calls as well, as it also has to
identify directories, avoid traversing symlinks, etc.  I didn't take
into consideration the fact that the zsh version keeps traversing even
when it finds a match.  However, in my timing example, I was searching
for a single file that happens to reside in the leaf of a directory
tree.  For the purpose of this test, I made sure that there was only one
instance of a file with this basename.  Therefore, both 'find' and zsh
traversed my entire file system, and both considered the same number of
items.
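
As for expressing the "print the whole tree below any matching directory"
behaviour in 'find' itself, one possibility is to combine -name with
-path, whose pattern is matched against the whole path and whose '*' also
matches slashes.  This is a sketch only, and I have not timed it.
'find_tree' is a hypothetical name, and the sketch assumes that ROOT itself
does not contain the pattern, since otherwise every path below ROOT would
match.

```shell
# find_tree ROOT PATTERN   (hypothetical helper, not from the thread)
# Prints every path whose basename contains PATTERN, plus everything
# below any directory whose name contains PATTERN.
find_tree() {
  # -name tests only the last path component;
  # -path tests the full path, so "*PATTERN*/*" matches anything that
  # has PATTERN somewhere in an earlier component.
  find "$1" \( -name "*$2*" -o -path "*$2*/*" \) -print
}
```

Unlike the zsh glob with its default options, this also reports dot
files, and since 'find' visits each path only once there are no
duplicates to sort out afterwards.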

Actually, this brings up a question: I presume that if I want the zsh
version to only look for matches on the basename of paths being tested,
much like 'find', all that's needed would be to leave off the trailing
{,/**/*} ... correct?

I re-ran my earlier xlocate timing with a version that didn't have this
trailing {,/**/*}, and I did another xlocate timing with a version that
didn't surround the trailing ${^*} with asterisks.  And for
completeness, I did another 'find' run, this time with asterisks
surrounding the file name.  Here are the results, along with a rehash of
the earlier findings, which are numbered 1 and 2, below (all of the
'print -l' versions are within the 'xlocate' alias, with 'specific-file'
passed as an argument):

1. find / -name specific-file -print              15 min 19 sec elapsed
2. print -l /**/*${^*}*{,/**/*}                   28 min 40 sec elapsed
3. print -l /**/*${^*}*                           13 min 58 sec elapsed
4. print -l /**/${^*}                             14 min 09 sec elapsed
5. find / -name '*specific-file*' -print          14 min 10 sec elapsed

I did numbers 1 and 2 at roughly the same time, and numbers 3, 4, and 5
at around the same time, a couple of hours later.  My system was less
loaded during the 3/4/5 tests than during the earlier ones, which
explains the somewhat lower values for all three of those elapsed times.

This is a rather non-scientific test, as you can't generalize from a
sample size of 1.  Nonetheless, the trends that are shown seem
reasonable: the zsh version is considerably slower with the trailing
{,/**/*}, and similar matching within 'find' and zsh turns out to take
similar amounts of time.

Based on this, it seems that zsh and 'find' are both maximally optimized
with regard to recursive searching ... or at least they're both optimized
equally well. :)   Therefore, I would no longer advise against using zsh
for these kinds of tasks.  And given that zsh's globbing is much more
sophisticated than find's, I would now lean towards using zsh in these
cases ... as long as you are careful about choosing matching constructs
that suit (and do not exceed) the task at hand.


> } [ ... ]
> }
> }   xlocate() {
> }     setopt nullglob extendedglob
> }     eval print -l ${argv[1]%/}'/**/'${^argv[2,-1]}'{,/**/*}'
> }   }
>
> You should use localoptions there, and you can avoid the eval:
>
>     xlocate() {
> 	setopt localoptions nullglob extendedglob
> 	print -l ${~argv[1]%/}/**/${~^argv[2,-1]}{,/**/*}
>     }
>
> And if you add this alias (which must come after the function def'n):
>
>     alias xlocate='noglob xlocate'

Well, using this alias causes the argv indices to be off by one in the
shell function: $0 becomes 'noglob', argv[1] becomes 'xlocate', etc.
The way I handle that case in my previously posted version (with the
help text, error checking, etc.) is to put the following near the top of
the shell function, and to use ${prog} everywhere I was previously using
$0.  In the shorter version of xlocate, above, a similar thing would
also have to be done with the argv indices.

  if [[ $0 = noglob ]]
  then
    prog=${argv[1]}
    (( OPTIND = $OPTIND + 1 ))
  else
    prog=$0
  fi


> [ ... ]

-- 
 Lloyd Zusman
 ljz@xxxxxxxxxx


