Zsh Mailing List Archive

PATCH: character sets for internal zsh tests



After the last mail I sent, I was thinking about quoting of separators
between array elements in vared, and I drifted into thinking how useful
it would be to have tests for whether a character is a separator, and
so on.  You can do things like [$IFS], but (1) this is fraught with
difficulty because in general IFS can contain pretty much anything,
including a "-" or a "!", and (2) you need to apply additional rules in
some cases, such as "IFS whitespace", or word characters, which always
include alphanumerics.  (See my hacks for [$WORDCHARS] in the Zle
function match-words-by-style, for example.)
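
To make point (1) concrete, here is a minimal sketch of the hazard.  It
is written for a POSIX-style shell (sh or bash), where an unquoted
expansion inside a case pattern is read as pattern text; zsh's own
handling of an expanded value inside brackets may differ with options
such as GLOB_SUBST.  The helper names naive_in_set and safe_in_set are
made up for this illustration and are not part of zsh or of this patch.

```shell
# Hypothetical helper: drop the set straight into a bracket
# expression, the way [$IFS] does.
# $1 = character to test, $2 = set of characters
naive_in_set() {
    case $1 in
        [$2]) echo yes ;;
        *)    echo no ;;
    esac
}

# Hypothetical helper: literal substring match, so no character in
# the set is special.  Assumes $1 is exactly one character.
safe_in_set() {
    case $2 in
        *"$1"*) echo yes ;;
        *)      echo no ;;
    esac
}

naive_in_set x '!ab'   # yes: [!ab] is read as "anything but a or b"
naive_in_set b 'a-c'   # yes: [a-c] is read as a range
safe_in_set  x '!ab'   # no
safe_in_set  - 'a-c'   # yes
```

Neither helper knows about IFS whitespace or the alphanumerics that
word characters always include, which is point (2).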

This patch adds [[:sep:]], [[:wsep:]], [[:ident:]], [[:word:]].  These
are trivial because the tests are already available internally, so we
can get quite a lot from little effort.  The names are simply borrowed
from the internal macros; let me know if you think there are better
names.  I think the last two are OK, but maybe [[:ifs:]] and [[:ifsw:]]
or [[:ifsspace:]] would be better for the first two.  Once the names
are settled I will add tests.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.53
diff -u -r1.53 expn.yo
--- Doc/Zsh/expn.yo	24 Apr 2005 18:38:04 -0000	1.53
+++ Doc/Zsh/expn.yo	28 Apr 2005 11:30:17 -0000
@@ -1224,19 +1224,81 @@
 first character in the list.
 cindex(character classes)
 There are also several named classes of characters, in the form
-`tt([:)var(name)tt(:])' with the following meanings:  `tt([:alnum:])'
-alphanumeric, `tt([:alpha:])' alphabetic,
-`tt([:ascii:])' 7-bit,
-`tt([:blank:])' space or tab,
-`tt([:cntrl:])' control character, `tt([:digit:])' decimal
-digit, `tt([:graph:])' printable character except whitespace,
-`tt([:lower:])' lowercase letter, `tt([:print:])' printable character,
-`tt([:punct:])' printable character neither alphanumeric nor whitespace,
-`tt([:space:])' whitespace character, `tt([:upper:])' uppercase letter, 
-`tt([:xdigit:])' hexadecimal digit.  These use the macros provided by
+`tt([:)var(name)tt(:])' with the following meanings.
+The first set use the macros provided by
 the operating system to test for the given character combinations,
-including any modifications due to local language settings:  see
-manref(ctype)(3).  Note that the square brackets are additional
+including any modifications due to local language settings, see
+manref(ctype)(3):
+
+startitem()
+item(tt([:alnum:]))(
+The character is alphanumeric
+)
+item(tt([:alpha:]))
+(
+The character is alphabetic
+)
+item(tt([:ascii:]))(
+The character is 7-bit, i.e. is a single-byte character without
+the top bit set.
+)
+item(tt([:blank:]))(
+The character is either space or tab
+)
+item(tt([:cntrl:]))(
+The character is a control character
+)
+item(tt([:digit:]))(
+The character is a decimal digit
+)
+item(tt([:graph:]))(
+The character is a printable character other than whitespace
+)
+item(tt([:lower:]))(
+The character is a lowercase letter
+)
+item(tt([:print:]))(
+The character is printable
+)
+item(tt([:punct:]))(
+The character is printable but neither alphanumeric nor whitespace
+)
+item(tt([:space:]))(
+The character is whitespace
+)
+item(tt([:upper:]))(
+The character is an uppercase letter
+)
+item(tt([:xdigit:]))(
+The character is a hexadecimal digit
+)
+enditem()
+
+Another set of tests are handled internally by the shell and
+are not sensitive to the locale:
+
+startitem()
+item(tt([:ident:]))(
+The character is allowed to form part of a shell identifier, such
+as a parameter name
+)
+item(tt([:sep:]))(
+The character is a separator, i.e. is contained in the tt(IFS) parameter
+)
+item(tt([:word:]))(
+The character is treated as part of a word; this test is sensitive
+to the value of the tt(WORDCHARS) parameter
+)
+item(tt([:wsep:]))(
+The character is an IFS white space character; see the documentation
+for tt(IFS) in
+ifzman(the zmanref(zshparams) manual page)\
+ifnzman(noderef(Parameters Used By The Shell))\
+.
+)
+enditem()
+
+Note that the square brackets are additional
 to those enclosing the whole set of characters, so to test for a
 single alphanumeric character you need `tt([[:alnum:]])'.  Named
 character sets can be used alongside other types,
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.26
diff -u -r1.26 pattern.c
--- Src/pattern.c	26 Apr 2005 09:51:29 -0000	1.26
+++ Src/pattern.c	28 Apr 2005 11:30:19 -0000
@@ -193,8 +193,12 @@
 #define PP_SPACE  11
 #define PP_UPPER  12
 #define PP_XDIGIT 13
-#define PP_UNKWN  14
-#define PP_RANGE  15
+#define PP_IDENT  14
+#define PP_SEP    15
+#define PP_WORD   16
+#define PP_WSEP   17
+#define PP_UNKWN  18
+#define PP_RANGE  19
 
 #define	P_OP(p)		((p)->l & 0xff)
 #define	P_NEXT(p)	((p)->l >> 8)
@@ -1118,6 +1122,14 @@
 			    ch = PP_UPPER;
 			else if (!strncmp(patparse, "xdigit", len))
 			    ch = PP_XDIGIT;
+			else if (!strncmp(patparse, "ident", len))
+			    ch = PP_IDENT;
+			else if (!strncmp(patparse, "sep", len))
+			    ch = PP_SEP;
+			else if (!strncmp(patparse, "word", len))
+			    ch = PP_WORD;
+			else if (!strncmp(patparse, "wsep", len))
+			    ch = PP_WSEP;
 			else
 			    ch = PP_UNKWN;
 			patparse = nptr + 2;
@@ -2724,6 +2736,22 @@
 		if (isxdigit(ch))
 		    return 1;
 		break;
+	    case PP_IDENT:
+		if (iident(ch))
+		    return 1;
+		break;
+	    case PP_SEP:
+		if (isep(ch))
+		    return 1;
+		break;
+	    case PP_WORD:
+		if (iword(ch))
+		    return 1;
+		break;
+	    case PP_WSEP:
+		if (iwsep(ch))
+		    return 1;
+		break;
 	    case PP_RANGE:
 		range++;
 		r1 = STOUC(UNMETA(range));

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070




