Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: PATCH: Apply spell correction to autocd



On Feb 27,  8:44pm, Bart Schaefer wrote:
} Subject: PATCH: Apply spell correction to autocd
}
} I don't know whether this is going to require tweaking for wide-char file
} names, but it's at least as good as the current bin_cd() implementation.

I was in a bit of a hurry when I worked out that patch, and it occurred to
me later that this implementation prefers names by cdpath order rather than
by comparison distance, so I went back to look again and found several
interesting things.

The first is this snippet of spckword():

	if ((u = spname(guess)) != guess)
	    best = u;

The condition tested here is always true, because spname() never returns
anything other than NULL or a pointer to an internal static buffer.  This
might as well be:

	best = spname(guess);

However, I'm not sure that's the intended semantic, which might be:

	if ((u = spname(guess)) && strcmp(u, guess))
	    best = u;
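
To make the difference between the two tests concrete, here is a minimal
sketch.  fake_spname() is a hypothetical stand-in, not zsh's spname(); it
only mimics the property described above, returning NULL or a pointer to
its own static buffer.  The two helper names are likewise invented for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for spname(): it returns either NULL or a
 * pointer to its own static buffer, never the caller's pointer. */
static char *fake_spname(const char *guess)
{
    static char buf[256];

    if (!guess || !*guess || strlen(guess) >= sizeof(buf))
        return NULL;
    strcpy(buf, guess);         /* pretend no correction was needed */
    return buf;
}

/* The test as it stands in the snippet: a pointer comparison. */
static int differs_by_pointer(char *guess)
{
    char *u = fake_spname(guess);

    return u != guess;          /* also true when u is NULL */
}

/* The possibly intended test: non-NULL and textually different. */
static int differs_by_content(char *guess)
{
    char *u = fake_spname(guess);

    return u && strcmp(u, guess) != 0;
}
```

For any non-NULL guess, differs_by_pointer() reports a difference whether
or not the text changed, while differs_by_content() reports one only when
a non-NULL result differs from the input.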

The next thing I noticed is that there's no way to recover the comparison
distance computed by spname().  That probably doesn't matter, as it's
always less than 3 if spname() returned anything useful.  This is a bit
different from the scheme applied to scanning the hash tables, which
uses a threshold distance of 1/4 of the length of the input.  In other
words, zsh can correct more mistakes in hashed strings than in file
paths, unless the component directory names are very short.

The reason I was interested in the distance computed by spname() was that
it seemed reasonable to loop over the entire cdpath to find the best of
all possible matches, and also to use that distance as the starting value
of d in the next section of spckword():

 	    d = 100;
 	    scanhashtable(reswdtab, 1, 0, 0, spscan, 0);

That is, I'd prefer not to choose something from the hash tables if
there's a cdpath directory that's a better fit.  Presently (even before
my patch) zsh always prefers the hash table unless there's an exact
match from spname(), even if the hashed value is a less precise match.
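
A rough sketch of the "best of all possible matches" idea follows.  It is
not zsh code: edit_distance() is a plain Levenshtein distance standing in
for whatever metric spname() really uses, and best_candidate(), MAXLEN and
the parameter names are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAXLEN 63               /* arbitrary cap for this sketch */

/* Plain Levenshtein edit distance, used here only as a stand-in metric.
 * Strings longer than MAXLEN are treated as very far apart. */
static int edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), i, j;
    int row[MAXLEN + 1];

    if (la > MAXLEN || lb > MAXLEN)
        return 1000;
    for (j = 0; j <= lb; j++)
        row[j] = (int)j;
    for (i = 1; i <= la; i++) {
        int diag = row[0];      /* cell (i-1, j-1) */

        row[0] = (int)i;
        for (j = 1; j <= lb; j++) {
            int up = row[j];    /* cell (i-1, j) */
            int cost = diag + (a[i - 1] != b[j - 1]);

            if (up + 1 < cost)
                cost = up + 1;
            if (row[j - 1] + 1 < cost)
                cost = row[j - 1] + 1;
            row[j] = cost;
            diag = up;
        }
    }
    return row[lb];
}

/* Scan every candidate and keep the closest one instead of stopping at
 * the first acceptable match.  Returns the index of the best candidate,
 * or -1 if none is closer than `limit`; *best_d receives the winning
 * distance (or `limit` if nothing qualified).  Ties go to the earlier
 * entry, so list order still acts as a tie-breaker. */
static int best_candidate(const char *guess, const char *const *names,
                          size_t n, int limit, int *best_d)
{
    int best = -1;
    size_t i;

    *best_d = limit;
    for (i = 0; i < n; i++) {
        int d = edit_distance(guess, names[i]);

        if (d < *best_d) {
            *best_d = d;
            best = (int)i;
        }
    }
    return best;
}
```

The value left in *best_d is what could replace the fixed starting value
of d before the hash tables are scanned, so that a hashed name would win
only if it were strictly closer than the best directory.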

Finally, spname() is a bit inconsistent, because it returns NULL if it
finds a match with a distance >= 3 in any leading path component, but
returns a copy of the input string even when it finds no match at all
in the final path component.  I suppose that's intended to allow one to
create new files in existing directories, correcting only the existing
part of the path, but it makes spname() ugly to use (and CORRECT_ALL
less useful from the user's perspective) in any case where the final
component is required to exist, such as for "cd".
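
One way a caller such as "cd" might work around this is to filter the
result afterwards.  The sketch below assumes a POSIX system;
require_directory() is a hypothetical helper, not an existing zsh
function, and `corrected` stands for whatever a spname()-like function
returned.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical filter: accept a corrected path only if it names an
 * existing directory.  `corrected` may be NULL, in which case NULL is
 * passed through unchanged. */
static const char *require_directory(const char *corrected)
{
    struct stat st;

    if (!corrected)
        return NULL;
    if (stat(corrected, &st) != 0 || !S_ISDIR(st.st_mode))
        return NULL;
    return corrected;
}
```

This costs an extra stat() per correction and leaves the underlying
inconsistency in place, so it is a workaround rather than a fix.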

So I'm not going to commit that patch -- which would be better off not
having to call spckword() recursively in any case -- pending resolution
of some of these issues.  Anybody have any comments?

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net   


