Zsh Mailing List Archive
Re: clarification on (#U) in pattern matching.
- X-seq: zsh-workers 49745
- From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
- To: zsh workers <zsh-workers@xxxxxxx>
- Subject: Re: clarification on (#U) in pattern matching.
- Date: Mon, 7 Feb 2022 12:15:44 +0000 (GMT)
- Archived-at: <https://zsh.org/workers/49745>
- Importance: Medium
- In-reply-to: <1071890479.577225.1644233454174@mail2.virginmedia.com>
- List-id: <zsh-workers.zsh.org>
- References: <20220206084255.tn3dgitvpr7qdjig@chazelas.org> <1071890479.577225.1644233454174@mail2.virginmedia.com>
Sorry, this just went to Stephane.
pws
> On 07 February 2022 at 11:30 Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
>
>
> > On 06 February 2022 at 08:42 Stephane Chazelas <stephane@xxxxxxxxxxxx> wrote:
> > $ set -o extendedglob
> > $ a='Stéphane€'
> > $ print -rn -- ${a//(#U)?} | hd
> > 00000000 a9 82 ac |...|
> > 00000003
> >
> > It seems that with (#U) (and here in a locale using UTF-8 as
> > charmap), ? with (#U) matches only on the first byte of
> > multibyte characters. Is that how it's meant to be?
>
> I think what you're hitting is probably, as you suspected, a
> difference between the pattern matching code and the substitution
> code. The underlying pattern matching really is byte by byte,
> but this doesn't force any substitution such as // to behave
> in the same way. As far as I know, the MULTIBYTE option is
> the only higher level consistency measure we have.
>
> I think there might be a parameter matching flag that you can
> also set that would help. I'd have to look in more detail.
>
> pws
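
The three bytes in the hd output above can be checked against the UTF-8 encoding of the test string. The following is a minimal sketch, not taken from the thread: the helper name hexbytes is hypothetical, and the string is built from octal escapes so that the result does not depend on the encoding of the script file. It uses only POSIX printf and od, so it should behave the same under zsh, bash, or sh.

```shell
# hexbytes (hypothetical helper): print the bytes of its argument
# as space-separated hex on one line.
hexbytes() {
  # The unquoted $(...) is split on whitespace, which collapses
  # od's column padding into single spaces.
  echo $(printf '%s' "$1" | od -An -v -tx1)
}

# 'Stéphane€' in UTF-8: é is \303\251 (c3 a9), € is \342\202\254 (e2 82 ac).
a=$(printf 'St\303\251phane\342\202\254')

hexbytes "$a"
# 53 74 c3 a9 70 68 61 6e 65 e2 82 ac
```

Dropping every single-byte character, plus the lead byte of é (c3) and of € (e2), leaves a9 82 ac, which matches the reported output and is consistent with ? consuming only the first byte of each multibyte character under (#U).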