Zsh Mailing List Archive
[PATCH] [[:blank:]] only matches on SPC and TAB
- X-seq: zsh-workers 42763
- From: Stephane Chazelas <stephane.chazelas@xxxxxxxxx>
- To: Zsh hackers list <zsh-workers@xxxxxxx>
- Subject: [PATCH] [[:blank:]] only matches on SPC and TAB
- Date: Sun, 13 May 2018 22:25:53 +0100
I noticed that [[:blank:]] does not match non-ASCII blank
characters. In a typical UTF-8 locale on a GNU system, [[:blank:]]
normally includes:
U+0009 CHARACTER TABULATION
U+0020 SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
On FreeBSD:
U+0009 CHARACTER TABULATION
U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE
(Strangely enough, U+00A0 is not classified as blank in
single-byte charsets like ISO8859-1 there.)
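The lists above can be reproduced on a given system by asking the C library directly. The following is only a sketch: count_blanks() is a hypothetical helper name, not part of zsh, and it assumes that wchar_t values are Unicode code points in the selected locale (which holds for UTF-8 locales on glibc and FreeBSD, but is not guaranteed by the C standard).

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: select the LC_CTYPE locale named by locale_name
 * ("" means "take it from the environment"), then count the code points
 * in [0, limit) that iswblank() classifies as blank.  If print is
 * non-zero, each match is written to stdout as U+XXXX.
 * Returns -1 if the locale cannot be selected.
 */
static int count_blanks(const char *locale_name, unsigned long limit, int print)
{
    int n = 0;
    unsigned long c;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (c = 0; c < limit; c++) {
        if (iswblank((wint_t)c)) {
            if (print)
                printf("U+%04lX\n", c);
            n++;
        }
    }
    return n;
}
```

Calling count_blanks("", 0x10000, 1) from main() prints the blank characters of the Basic Multilingual Plane for the locale set in the environment. The output depends on the system and locale, which is the point of the comparison above.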
The code indeed tests for SPC and TAB explicitly, in both the
multibyte and single-byte cases. (The non-breaking space is one
non-ASCII character that appears in a few single-byte charsets
and is classified as blank on some systems, though not GNU ones.)
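To make the difference concrete, here is a minimal sketch of the two kinds of test side by side. blank_explicit() and blank_locale() are illustrative names only and do not appear in Src/pattern.c. The first mirrors the hard-coded comparison described above, the second defers to the C library's classification for the current LC_CTYPE locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Hard-coded test: only SPC and TAB are ever blank. */
static int blank_explicit(wchar_t ch)
{
    return ch == L' ' || ch == L'\t';
}

/*
 * Locale-aware test: whatever the current LC_CTYPE locale puts in the
 * "blank" class.  SPC and TAB are always included; anything beyond
 * that depends on the system and the locale in effect.
 */
static int blank_locale(wchar_t ch)
{
    return iswblank((wint_t)ch) != 0;
}
```

Both functions agree on SPC and TAB in every locale. They can only differ for characters such as U+2003 EM SPACE, and only after setlocale() has selected a locale whose blank class contains them.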
In case that was not intentional, this patch should fix it:
diff --git a/Src/pattern.c b/Src/pattern.c
index fc7c737..d3eac44 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3605,7 +3605,7 @@ mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
                     return 1;
                 break;
             case PP_BLANK:
-                if (ch == L' ' || ch == L'\t')
+                if (iswblank(ch))
                     return 1;
                 break;
             case PP_CNTRL:
@@ -3840,7 +3840,7 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
                     return 1;
                 break;
             case PP_BLANK:
-                if (ch == ' ' || ch == '\t')
+                if (isblank(ch))
                     return 1;
                 break;
             case PP_CNTRL:
--
Stephane