Zsh Mailing List Archive
Re: find duplicate files
On Sat, Apr 06, 2019 at 07:40:59AM +0200, Emanuel Berg wrote:
> Is this any good? Can it be done lineary?
>
> TIA
>
> #! /bin/zsh
>
> find-duplicates () {
>     local -a files
>     [[ $# = 0 ]] && files=("${(@f)$(ls)}") || files=($@)
>
>     local dups=0
>
>     # files
>     local a
>     local b
>
>     for a in $files; do
>         for b in $files; do
>             if [[ $a != $b ]]; then
>                 diff $a $b > /dev/null
>                 if [[ $? = 0 ]]; then
>                     echo $a and $b are the same
>                     dups=1
>                 fi
>             fi
>         done
>     done
>     [[ $dups = 0 ]] && echo "no duplicates"
> }
> alias dups=find-duplicates
Your function diffs every pair of files, and keeps comparing even after
it has found duplicates, so its runtime will be O(N^2), i.e.,
proportional to the square of the sum of the file sizes (N). Here's one
that calculates MD5 checksums and compares those instead, and so is
O(N) + O(M log M), i.e., proportional to the sum of the file sizes plus
the cost of sorting the M checksums.
#!/bin/zsh
find-duplicates () {
    (( # > 0 )) || set -- *(.N)
    local dups=0
    # uniq -c -w32 groups lines whose first 32 characters (the MD5
    # hash) match; grep -cv '^ *1 ' then counts the groups of size > 1.
    md5sum "$@" | sort | uniq -c -w32 | grep -cv '^ *1 ' | read dups
    (( dups == 0 )) && echo "no duplicates"
}
A better solution would use an associative array (local -A NAME), would
*not* sort, and would stop as soon as it found a duplicate, but I'll
leave that as an exercise for the reader. :-)
Paul.
--
Paul Hoffman <nkuitse@xxxxxxxxxxx>