Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: [PATCH] [[:blank:]] only matches on SPC and TAB
- X-seq: zsh-workers 42775
- From: Stephane Chazelas <stephane.chazelas@xxxxxxxxx>
- To: Peter Stephenson <p.stephenson@xxxxxxxxxxx>
- Subject: Re: [PATCH] [[:blank:]] only matches on SPC and TAB
- Date: Mon, 14 May 2018 16:51:31 +0100
- Cc: Zsh hackers list <zsh-workers@xxxxxxx>
- In-reply-to: <20180514145056.3eedaea9@camnpupstephen.cam.scsc.local>
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- List-unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
- Mail-followup-to: Peter Stephenson <p.stephenson@xxxxxxxxxxx>, Zsh hackers list <zsh-workers@xxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- References: <20180513212553.GA29028@chaz.gmail.com> <CAKc7PVDyrTMsmBSEDcMC=CNVCjOnEDVtywRYA0=UnNCBpF=7JQ@mail.gmail.com> <20180514063611.GA7263@chaz.gmail.com> <CGME20180514064505epcas3p1b2f178c595fc9bb962e4094e296ba699@epcas3p1.samsung.com> <20180514064431.GB7263@chaz.gmail.com> <20180514094733.308bff1a@camnpupstephen.cam.scsc.local> <20180514123425.GA19631@chaz.gmail.com> <20180514145056.3eedaea9@camnpupstephen.cam.scsc.local>
2018-05-14 14:50:56 +0100, Peter Stephenson:
[...]
> It wouldn't be ridiculous to change the documentation for this case and
> require "unsetopt multibyte" for strict byte-by-byte comparisons, which
> is already how it works in the vast majority of other cases.
[...]
But note that here it's not about multibyte vs singlebyte but
whether [:blank:] honours the locale like the other POSIX
character classes (alpha, punct...) do.
There are locales on some systems (like NetBSD already
mentioned) that use a single-byte charset where more than SPC
and TAB are classified as "blank" (like 0xA0 (nbsp) in locales
using iso8859-x charsets or 0x9A in KOI8-R on NetBSD).
IMO, without the "multibyte" option, we should still call
isblank(), which on most systems and in most locales matches
only SPC and TAB but is not guaranteed to (and in practice does
not on NetBSD, for instance).
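
To see what isblank() accepts in a given single-byte locale, something
like the following rough sketch could be used. The helper name
blank_bytes() is made up for illustration and is not part of zsh or of
the patch; it only relies on the standard setlocale() and isblank()
calls, and treats an unavailable locale as "no blanks found".

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from zsh): store in out[] every byte value
 * for which isblank() is true under the named locale, in ascending
 * order, and return how many were stored.  Returns 0 if the locale
 * cannot be selected.  Note that it changes the process-wide LC_CTYPE.
 */
size_t blank_bytes(const char *locale_name, unsigned char *out, size_t cap)
{
    size_t n = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return 0;               /* locale not installed on this system */

    for (int c = 0; c <= 255; c++) {
        /* isblank() takes an int holding an unsigned char value */
        if (isblank(c) && n < cap)
            out[n++] = (unsigned char)c;
    }
    return n;
}
```

In the "C" locale this should yield only TAB (0x09) and SPC (0x20);
whether a name such as an iso8859-x or KOI8-R locale adds further bytes
depends on the system and on which locales are installed.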
I just noticed that on NetBSD, in locales using UTF-8 or
GB18030, isblank() returns true for \v (vertical TAB), but not
in any other locale! So does iswblank(). So out goes my claim that
"blank" should be for horizontal spaces. On OpenBSD (where only
UTF-8 charsets are supported in locales other than C/POSIX),
iswblank() matches on \v and \f.
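
The wide-character side can be probed the same way. Again this is only
a sketch: wide_is_blank() is a hypothetical name, the locale to test is
whatever the caller passes in, and the result for \v or \f will vary by
system as described above.

```c
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not from zsh): report whether iswblank()
 * accepts wc under the named locale.
 * Returns 1 if blank, 0 if not, -1 if the locale cannot be selected.
 * Note that it changes the process-wide LC_CTYPE.
 */
int wide_is_blank(const char *locale_name, wchar_t wc)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;              /* locale not installed on this system */

    return iswblank((wint_t)wc) != 0;
}
```

Calling it with L'\v' and L'\f' under a UTF-8 locale name and under "C"
would show whether a given system classifies those characters as blank
only in some locales.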
What a mess!
--
Stephane