Zsh Mailing List Archive
Re: 'case' pattern matching bug with bracket expressions
- X-seq: zsh-workers 35128
- From: Peter Stephenson <p.stephenson@xxxxxxxxxxx>
- To: Martijn Dekker <martijn@xxxxxxxx>, zsh-workers@xxxxxxx
- Subject: Re: 'case' pattern matching bug with bracket expressions
- Date: Thu, 14 May 2015 15:42:38 +0100
- In-reply-to: <55549FB2.80705@inlv.org>
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- Organization: Samsung Cambridge Solution Centre
- References: <55549FB2.80705@inlv.org>
On Thu, 14 May 2015 14:14:26 +0100
Martijn Dekker <martijn@xxxxxxxx> wrote:
> While writing a cross-platform shell library I've come across a bug in
> the way zsh (in POSIX mode) matches patterns in 'case' statements that
> are at variance with other POSIX shells.
>
> Normally, zsh considers an empty bracket expression [] a bad pattern
> while other shells ([d]ash, bash, ksh) consider it a negative:
>
> case abc in ( [] ) echo yes ;; ( * ) echo no ;; esac
>
> Expected output: no
> Got output: zsh: bad pattern: []
This is the shell language being typically duplicitous and unhelpful.
"]" after a "[" indicates that the "]" is part of the set. This is
normal; in bash as well as zsh:
[[ ']' = []] ]] && echo yes
outputs 'yes'.
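
As a minimal sketch, the same check can be written as a 'case' statement, which should also run in a plain POSIX sh. The function name is_rbracket is made up for illustration and is not part of any shell.

```shell
# Hypothetical helper: reports whether $1 is exactly the character ']'.
# In '[]]' the first ']' is a literal member of the set and the
# second ']' closes the bracket expression.
is_rbracket() {
  case $1 in
    ( []] ) echo yes ;;
    ( * )   echo no ;;
  esac
}

is_rbracket ']'
is_rbracket 'a'
```

The first call should print 'yes' and the second 'no'.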
However, as you've found out, other shells handle the case where there
isn't another ']' later. Generally there's no harm in this, and in most
cases we could do this (the case below is harder).
Nonetheless, there's a real ambiguity here, so given this and the
following I'd definitely suggest not relying on it if you can avoid
doing so --- use something else to signify an empty string.
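
One possible way to do that, assuming the aim is simply to match an empty string, is a quoted empty pattern. This is only a sketch, and the function name is_empty is hypothetical.

```shell
# Hypothetical helper: classifies $1 as empty or non-empty.
# '' is a quoted empty pattern, so no bracket expression is involved
# and there is nothing for the shells to disagree about.
is_empty() {
  case $1 in
    ( '' ) echo empty ;;
    ( * )  echo nonempty ;;
  esac
}

is_empty ''
is_empty abc
```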
> The same thing does NOT produce an error, but a false positive (!), if
> an extra non-matching pattern with | is added:
>
> case abc in ( [] | *[!a-z]*) echo yes ;; ( * ) echo no ;; esac
This is the pattern:
'[' introducing bracketed expression
'] | *[!a-z' characters inside
']' end of bracketed expression
'*' wildcard.
so it's a bracket expression whose set includes the character 'a',
followed by a wildcard that matches anything, and hence it matches 'abc'.
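
A rough way to see this reading in isolation is to hold the pattern in a variable, so that the '|' cannot be taken as a 'case' separator. This sketch assumes a POSIX-style sh such as bash or dash, where an unquoted expansion in a 'case' pattern is treated as a pattern. Native-mode zsh treats it as a literal string unless GLOB_SUBST is set. The names pat and try_pat are made up for illustration.

```shell
# Assumption: run under a POSIX-style sh (e.g. bash or dash).
# The whole string is one pattern: '[' opens the set, the first ']'
# is a literal member, and the ']' after 'a-z' closes it.
pat='[] | *[!a-z]*'

# Hypothetical helper: matches $1 against the single pattern in $pat.
try_pat() {
  case $1 in
    ( $pat ) echo yes ;;
    ( * )    echo no ;;
  esac
}

try_pat abc    # 'a' falls in the a-z range inside the set
try_pat '|x'   # '|' is a member of the set as well
try_pat 123    # '1' is not a member of the set
```

Under that reading, the first two calls should print 'yes' and the last 'no'.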
I'm not really sure we *can* resolve this unambiguously the way you
want. Is there something that forbids us from interpreting the pattern
that way? The handling of ']' at the start is mandated, if I've
followed all the logic correctly --- POSIX 2007 Shell and Utilities
2.13.1 says:
[
If an open bracket introduces a bracket expression as in XBD RE
Bracket Expression, except that the <exclamation-mark> character (
'!' ) shall replace the <circumflex> character ( '^' ) in its role
in a non-matching list in the regular expression notation, it shall
introduce a pattern bracket expression. A bracket expression
starting with an unquoted <circumflex> character produces
unspecified results. Otherwise, '[' shall match the character
itself.
The language is a little turgid, but I think it's saying "unless
you have ^ or [ just go with the RE rules in [section 9.3.5]".
9.3.5 (in regular expressions) says, amongst a lot of other things:
The <right-square-bracket> ( ']' ) shall lose its special meaning and
represent itself in a bracket expression if it occurs first in the
list (after an initial <circumflex> ( '^' ), if any)
That's a "shall".
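
Combined with the '!' substitution from 2.13.1, the quoted rule suggests that '[!]]' is a negated set whose only member is ']'. Here is a small sketch of that, again using a hypothetical function name:

```shell
# Hypothetical helper: reports whether $1 is a single character
# other than ']'. In '[!]]' the '!' negates the set, the first ']'
# is a literal member, and the second ']' closes the expression.
not_rbracket() {
  case $1 in
    ( [!]] ) echo yes ;;
    ( * )    echo no ;;
  esac
}

not_rbracket x
not_rbracket ']'
```

The first call should print 'yes' and the second 'no'.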
I haven't read through the "case" doc so there may be some killer reason
why that " | " has to be a case separator and not part of a
square-bracketed expression. But that would seem to imply some form of
hierarchical parsing in which those characters couldn't occur within a
pattern.
By the way, we don't handle all forms in 9.3.5, e.g. equivalence sets,
so saying "it works like REs" isn't a perfect answer for zsh, either.
pws