Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author
Re: Questions about character types

X-seq: zsh-workers 22544
From: Peter Stephenson <pws@xxxxxxx>
To: zsh-workers@xxxxxxxxxx (Zsh hackers list)
Subject: Re: Questions about character types
Date: Mon, 10 Jul 2006 13:51:23 +0100
In-reply-to: <060705093122.ZM1165@xxxxxxxxxxxxxxxxxxxxxx>
Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
References: <200607051214.k65CEHQ2011360@xxxxxxxxxxxxxx> <060705093122.ZM1165@xxxxxxxxxxxxxxxxxxxxxx>
Bart Schaefer wrote:
> } Another question is what to do with user names.  Currently these are
> } just the ASCII identifier characters plus "-".  Is it useful to extend
> } these to include alphanumeric characters from the local character set?
> 
> My impression is that it would be, but some non-English-speakers should
> weigh in.

I've allowed it; it's particularly useful if you're using the extended
parameter naming rules and the reference is to a named directory.

> } Finally, I failed to interpret this code from math.c:
> } 
> } 	    if (*ptr == '+' && (unary || !ialnum(*ptr))) {
> } 		ptr++;
> 
> I suspect that's a bug.  It probably originally said
> 
> 	    if (*ptr++ == '+' && (unary || !ialnum(*ptr))) {
> 
> but someone realized it was wrong to increment the pointer if it was NOT
> equal to plus, and made an incomplete fix.

I've just assumed this is "&& 1" which is how it's evaluated as far back
as the CVS archive goes without any obvious problems, and hence removed
it.

Here is the patch.  The MULTIBYTE and POSIX_IDENTIFIERS options should
be respected whenever necessary when testing character types.

There's one remaining big job:  I have not yet fixed up IFS to handle
multibyte characters (also isep() and ISEP macros).  That looks a little
messy in places.

The other remaining cases I'm aware of where we still don't test for
multibyte characters should be harmless:
- Some idigit()s.  I don't see any good reason for allowing active
  multibyte digit characters in numerical expressions (for example,
  extra width digits), so anywhere a real digit (rather than just
  a printable character that happens to look like a digit) is required
  it must be 0 to 9 from the portable character set.
- Likewise some iblank()s when inputting text.  Whitespace has to
  be portable whitespace.
- One ialpha() when checking options to builtins, since
  all option letters come from the portable character set.

Index: README
===================================================================
RCS file: /cvsroot/zsh/zsh/README,v
retrieving revision 1.33
diff -u -r1.33 README
--- README	26 Jun 2006 09:57:17 -0000	1.33
+++ README	10 Jul 2006 12:49:17 -0000
@@ -50,11 +50,23 @@
 subsequently by the user.  It is valid for the variable to be unset.
 
 Zsh has previously been lax about whether it allows octets with the
-top bit set to be part of a shell identifier.  With --enable-multibyte set,
-this is now completely disabled.  This is a temporary fix until the main
-shell handles multibyte characters properly and the appropriate library
-tests can be used.  This change may be reviewed if no such permanent fix
-is forthcoming.
+top bit set to be part of a shell identifier.  Older versions of the shell
+assumed all such octets were allowed in identifiers, however the POSIX
+standard does not allow such characters in identifiers.  The older
+behaviour is still obtained with --disable-multibyte in effect.
+With --enable-multibyte set there are three possible cases:
+  MULTIBYTE option unset:  only ASCII characters are allowed; the
+    shell does not attempt to identify non-ASCII characters at all.
+  MULTIBYTE option set, POSIX_IDENTIFIERS option unset: in addition
+    to the POSIX characters, any alphanumeric characters in the
+    local character set are allowed.  Note that scripts and functions that
+    take advantage of this are non-portable; however, this is in the spirit
+    of previous versions of the shell.  Note also that the options must
+    be set before the shell parses the script or function; setting
+    them during execution is not sufficient.
+  MULITBYTE option set, POSIX_IDENTIFIERS set:  only ASCII characters
+    are allowed in identifiers even though the shell will recognise
+    alphanumeric multibyte characters.
 
 The completion style pine-directory must now be set to use completion
 for PINE mailbox folders; previously it had the default ~/mail.  This
Index: Doc/Zsh/options.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/options.yo,v
retrieving revision 1.46
diff -u -r1.46 options.yo
--- Doc/Zsh/options.yo	9 Apr 2006 21:47:22 -0000	1.46
+++ Doc/Zsh/options.yo	10 Jul 2006 12:49:18 -0000
@@ -1204,6 +1204,27 @@
 tt(trap) and
 tt(unset).
 )
+pindex(POSIX_IDENTIFIERS)
+cindex(identifiers, non-portable characters in)
+cindex(parameter names, non-portable characters in)
+item(tt(POSIX_IDENTIFIERS) <K> <S>)(
+When this option is set, only the ASCII characters tt(a) to tt(z), tt(A) to
+tt(Z), tt(0) to tt(9) and tt(_) may be used in identifiers (names
+of shell parameters and modules).
+
+When the option is unset and multibyte character support is enabled (i.e. it
+is compiled in and the option tt(MULTIBYTE) is set), then additionally any
+alphanumeric characters in the local character set may be used in
+identifiers.  Note that scripts and functions written with this feature are
+not portable, and also that both options must be set before the script
+or function is parsed; setting them during execution is not sufficient
+as the syntax var(variable)tt(=)var(value) has already been parsed as
+a command rather than an assignment.
+
+If multibyte character support is not compiled into the shell this option is
+ignored; all octets with the top bit set may be used in identifiers.
+This is non-standard but is the traditional zsh behaviour.
+)
 pindex(SH_FILE_EXPANSION)
 cindex(sh, expansion style)
 cindex(expansion style, sh)
Index: Src/builtin.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/builtin.c,v
retrieving revision 1.158
diff -u -r1.158 builtin.c
--- Src/builtin.c	30 May 2006 22:35:03 -0000	1.158
+++ Src/builtin.c	10 Jul 2006 12:49:20 -0000
@@ -2629,9 +2629,7 @@
 	    char *modname = NULL;
 	    char *ptr;
 
-	    for (ptr = funcname; *ptr; ptr++)
-		if (!iident(*ptr))
-		    break;
+	    ptr = itype_end(funcname, IIDENT, 0);
 	    if (idigit(*funcname) || funcname == ptr || *ptr) {
 		zwarnnam(name, "-M %s: bad math function name", funcname);
 		return 1;
Index: Src/glob.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/glob.c,v
retrieving revision 1.51
diff -u -r1.51 glob.c
--- Src/glob.c	30 May 2006 22:35:03 -0000	1.51
+++ Src/glob.c	10 Jul 2006 12:49:21 -0000
@@ -1443,9 +1443,7 @@
 
 		    if (s[-1] == '+') {
 			plus = 0;
-			tt = s;
-			while (iident(*tt))
-			    tt++;
+			tt = itype_end(s, IIDENT, 0);
 			if (tt == s)
 			{
 			    zerr("missing identifier after `+'");
Index: Src/lex.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/lex.c,v
retrieving revision 1.33
diff -u -r1.33 lex.c
--- Src/lex.c	30 May 2006 22:35:03 -0000	1.33
+++ Src/lex.c	10 Jul 2006 12:49:22 -0000
@@ -1135,10 +1135,13 @@
 		if (idigit(*t))
 		    while (++t < bptr && idigit(*t));
 		else {
-		    while (iident(*t) && ++t < bptr);
+		    int sav = *bptr;
+		    *bptr = '\0';
+		    t = itype_end(t, IIDENT, 0);
 		    if (t < bptr) {
-			*bptr = '\0';
 			skipparens(Inbrack, Outbrack, &t);
+		    } else {
+			*bptr = sav;
 		    }
 		}
 		if (*t == '+')
Index: Src/math.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/math.c,v
retrieving revision 1.25
diff -u -r1.25 math.c
--- Src/math.c	30 Jun 2006 09:41:35 -0000	1.25
+++ Src/math.c	10 Jul 2006 12:49:22 -0000
@@ -265,11 +265,12 @@
 {
     int cct = 0;
     yyval.type = MN_INTEGER;
+    char *ie;
 
     for (;; cct = 0)
 	switch (*ptr++) {
 	case '+':
-	    if (*ptr == '+' && (unary || !ialnum(*ptr))) {
+	    if (*ptr == '+') {
 		ptr++;
 		return (unary) ? PREPLUS : POSTPLUS;
 	    }
@@ -279,7 +280,7 @@
 	    }
 	    return (unary) ? UPLUS : PLUS;
 	case '-':
-	    if (*ptr == '-' && (unary || !ialnum(*ptr))) {
+	    if (*ptr == '-') {
 		ptr++;
 		return (unary) ? PREMINUS : POSTMINUS;
 	    }
@@ -469,12 +470,12 @@
 		}
 		cct = 1;
 	    }
-	    if (iident(*ptr)) {
+	    if ((ie = itype_end(ptr, IIDENT, 0)) != ptr) {
 		int func = 0;
 		char *p;
 
 		p = ptr;
-		while (iident(*++ptr));
+		ptr = ie;
 		if (*ptr == '[' || (!cct && *ptr == '(')) {
 		    char op = *ptr, cp = ((*ptr == '[') ? ']' : ')');
 		    int l;
Index: Src/module.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/module.c,v
retrieving revision 1.22
diff -u -r1.22 module.c
--- Src/module.c	30 May 2006 22:35:03 -0000	1.22
+++ Src/module.c	10 Jul 2006 12:49:23 -0000
@@ -734,12 +734,8 @@
 modname_ok(char const *p)
 {
     do {
-	if(*p != '_' && !ialnum(*p))
-	    return 0;
-	do {
-	    p++;
-	} while(*p == '_' || ialnum(*p));
-	if(!*p)
+	p = itype_end(p, IIDENT, 0);
+	if (!*p)
 	    return 1;
     } while(*p++ == '/');
     return 0;
Index: Src/options.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/options.c,v
retrieving revision 1.28
diff -u -r1.28 options.c
--- Src/options.c	30 May 2006 22:35:03 -0000	1.28
+++ Src/options.c	10 Jul 2006 12:49:23 -0000
@@ -176,6 +176,7 @@
 {{NULL, "overstrike",	      0},			 OVERSTRIKE},
 {{NULL, "pathdirs",	      OPT_EMULATE},		 PATHDIRS},
 {{NULL, "posixbuiltins",      OPT_EMULATE|OPT_BOURNE},	 POSIXBUILTINS},
+{{NULL, "posixidentifiers",   OPT_EMULATE|OPT_BOURNE},	 POSIXIDENTIFIERS},
 {{NULL, "printeightbit",      0},                        PRINTEIGHTBIT},
 {{NULL, "printexitvalue",     0},			 PRINTEXITVALUE},
 {{NULL, "privileged",	      OPT_SPECIAL},		 PRIVILEGED},
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.116
diff -u -r1.116 params.c
--- Src/params.c	27 Jun 2006 16:28:46 -0000	1.116
+++ Src/params.c	10 Jul 2006 12:49:24 -0000
@@ -899,9 +899,7 @@
 		break;
     } else {
 	/* Find the first character in `s' not in the iident type table */
-	for (ss = s; *ss; ss++)
-	    if (!iident(*ss))
-		break;
+	ss = itype_end(s, IIDENT, 0);
     }
 
     /* If the next character is not [, then it is *
@@ -1653,7 +1651,7 @@
 mod_export Value
 fetchvalue(Value v, char **pptr, int bracks, int flags)
 {
-    char *s, *t;
+    char *s, *t, *ie;
     char sav, c;
     int ppar = 0;
 
@@ -1665,9 +1663,8 @@
 	else
 	    ppar = *s++ - '0';
     }
-    else if (iident(c))
-	while (iident(*s))
-	    s++;
+    else if ((ie = itype_end(s, IIDENT, 0)) != s)
+	s = ie;
     else if (c == Quest)
 	*s++ = '?';
     else if (c == Pound)
@@ -1732,7 +1729,7 @@
 		return v;
 	    }
 	} else if (!(flags & SCANPM_ASSIGNING) && v->isarr &&
-		   iident(*t) && isset(KSHARRAYS))
+		   itype_end(t, IIDENT, 1) != t && isset(KSHARRAYS))
 	    v->end = 1, v->isarr = 0;
     }
     if (!bracks && *s)
Index: Src/parse.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/parse.c,v
retrieving revision 1.56
diff -u -r1.56 parse.c
--- Src/parse.c	9 Jul 2006 14:47:22 -0000	1.56
+++ Src/parse.c	10 Jul 2006 12:49:26 -0000
@@ -1603,10 +1603,7 @@
 
 		if (*ptr == Outbrace && ptr > tokstr + 1)
 		{
-		    while (--ptr > tokstr)
-			if (!iident(*ptr))
-			    break;
-		    if (ptr == tokstr)
+		    if (itype_end(tokstr, IIDENT, 0) >= ptr - 1)
 		    {
 			char *toksave = tokstr;
 			char *idstring = dupstrpfx(tokstr+1, eptr-tokstr-1);
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.53
diff -u -r1.53 subst.c
--- Src/subst.c	28 Jun 2006 14:34:27 -0000	1.53
+++ Src/subst.c	10 Jul 2006 12:49:28 -0000
@@ -475,15 +475,14 @@
 		return 0;
 	    *namptr = dyncat(ds, ptr);
 	    return 1;
-	} else if (iuser(str[1])) {   /* ~foo */
-	    char *ptr, *hom, save;
+	} else if ((ptr = itype_end(str+1, IUSER, 0)) != str+1) {   /* ~foo */
+	    char *hom, save;
 
-	    for (ptr = ++str; *ptr && iuser(*ptr); ptr++);
 	    save = *ptr;
 	    if (!isend(save))
 		return 0;
 	    *ptr = 0;
-	    if (!(hom = getnameddir(str))) {
+	    if (!(hom = getnameddir(++str))) {
 		if (isset(NOMATCH))
 		    zerr("no such user or named directory: %s", str);
 		*ptr = save;
@@ -1146,9 +1145,10 @@
      * Shouldn't this be a table or something?  We test for all
      * these later on, too.
      */
-    if (!ialnum(c = *s) && c != '#' && c != Pound && c != '-' &&
-	c != '!' && c != '$' && c != String && c != Qstring &&
-	c != '?' && c != Quest && c != '_' &&
+    c = *s;
+    if (itype_end(s, IIDENT, 1) == s && *s != '#' && c != Pound &&
+	c != '-' && c != '!' && c != '$' && c != String && c != Qstring &&
+	c != '?' && c != Quest &&
 	c != '*' && c != Star && c != '@' && c != '{' &&
 	c != Inbrace && c != '=' && c != Equals && c != Hat &&
 	c != '^' && c != '~' && c != Tilde && c != '+') {
@@ -1446,8 +1446,8 @@
 	    } else
 		spbreak = 2;
 	} else if ((c == '#' || c == Pound) &&
-		   (iident(cc = s[1])
-		    || cc == '*' || cc == Star || cc == '@'
+		   (itype_end(s+1, IIDENT, 0) != s + 1
+		    || (cc = s[1]) == '*' || cc == Star || cc == '@'
 		    || cc == '-' || (cc == ':' && s[2] == '-')
 		    || (isstring(cc) && (s[2] == Inbrace || s[2] == Inpar)))) {
 	    getlen = 1 + whichlen, s++;
@@ -1471,7 +1471,7 @@
 	     * Try to handle this when parameter is named
 	     * by (P) (second part of test).
 	     */
-	    if (iident(s[1]) || (aspar && isstring(s[1]) &&
+	    if (itype_end(s+1, IIDENT, 0) != s+1 || (aspar && isstring(s[1]) &&
 				 (s[2] == Inbrace || s[2] == Inpar)))
 		chkset = 1, s++;
 	    else if (!inbrace) {
Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.126
diff -u -r1.126 utils.c
--- Src/utils.c	30 Jun 2006 09:41:35 -0000	1.126
+++ Src/utils.c	10 Jul 2006 12:49:29 -0000
@@ -1921,7 +1921,7 @@
 	return;
     if (**s == String && !*t) {
 	guess = *s + 1;
-	if (*t || !ialpha(*guess))
+	if (itype_end(guess, IIDENT, 1) == guess)
 	    return;
 	ic = String;
 	d = 100;
@@ -2750,11 +2750,8 @@
  * iident() macro extended to support wide characters.
  *
  * The macro is intended to test if a character is allowed in an
- * internal zsh identifier.  Until the main shell handles multibyte
- * characters it's not a good idea to allow characters other than
- * ASCII characters; it would cause zle to allow characters that
- * the main shell would reject.  Eventually we should be able
- * to allow all alphanumerics.
+ * internal zsh identifier.  We allow all alphanumerics outside
+ * the ASCII range unless POSIXIDENTIFIERS is set.
  *
  * Otherwise similar to wcsiword.
  */
@@ -2774,14 +2771,90 @@
     } else if (len == 1 && iascii(*outstr)) {
 	return iident(*outstr);
     } else {
-	/* TODO: not currently allowed, see above */
-	return 0;
+	return !isset(POSIXIDENTIFIERS) && iswalnum(c);
     }
 }
 /**/
 #endif
 
 
+/*
+ * Find the end of a set of characters in the set specified by itype;
+ * one of IALNUM, IIDENT, IWORD or IUSER.  For non-ASCII characters, we assume
+ * alphanumerics are part of the set, with the exception that
+ * identifiers are not treated that way if POSIXIDENTIFIERS is set.
+ *
+ * See notes above for identifiers.
+ * Returns the same pointer as passed if not on an identifier character.
+ * If "once" is set, just test the first character, i.e. (outptr !=
+ * inptr) tests whether the first character is valid in an identifier.
+ *
+ * Currently this is only called with itype IIDENT or IUSER.
+ */
+
+/**/
+mod_export char *
+itype_end(const char *ptr, int itype, int once)
+{
+#ifdef MULTIBYTE_SUPPORT
+    if (isset(MULTIBYTE) &&
+	(itype != IIDENT || !isset(POSIXIDENTIFIERS))) {
+	mb_metacharinit();
+	while (*ptr) {
+	    wint_t wc;
+	    int len = mb_metacharlenconv(ptr, &wc);
+
+	    if (!len)
+		break;
+
+	    if (wc == WEOF) {
+		/* invalid, treat as single character */
+		int chr = STOUC(*ptr == Meta ? ptr[1] ^ 32 : *ptr);
+		/* in this case non-ASCII characters can't match */
+		if (chr > 127 || !zistype(chr,itype))
+		    break;
+	    } else if (len == 1 && iascii(*ptr)) {
+		/* ASCII: can't be metafied, use standard test */
+		if (!zistype(*ptr,itype))
+		    break;
+	    } else {
+		/*
+		 * Valid non-ASCII character.  Allow all alphanumerics;
+		 * if testing for words, allow all wordchars.
+		 */
+		if (!(iswalnum(wc) ||
+		      (itype == IWORD && wcschr(wordchars_wide, wc))))
+		    break;
+	    }
+	    ptr += len;
+
+	    if (once)
+		break;
+	}
+    } else
+#endif
+	for (;;) {
+	    int chr = STOUC(*ptr == Meta ? ptr[1] ^ 32 : *ptr);
+	    if (!zistype(chr,itype))
+		break;
+	    ptr += (*ptr == Meta) ? 2 : 1;
+
+	    if (once)
+		break;
+	}
+
+    /*
+     * Nasty.  The first argument is const char * because we
+     * don't modify it here.  However, we really want to pass
+     * back the same type as was passed down, to allow idioms like
+     *   p = itype_end(p, IIDENT, 0);
+     * So returning a const char * isn't really the right thing to do.
+     * Without having two different functions the following seems
+     * to be the best we can do.
+     */
+    return (char *)ptr;
+}
+
 /**/
 mod_export char **
 arrdup(char **s)
@@ -3710,9 +3783,10 @@
 
 /**/
 int
-mb_metacharlenconv(char *s, wint_t *wcp)
+mb_metacharlenconv(const char *s, wint_t *wcp)
 {
-    char inchar, *ptr;
+    char inchar;
+    const char *ptr;
     size_t ret;
     wchar_t wc;
 
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.92
diff -u -r1.92 zsh.h
--- Src/zsh.h	9 Jul 2006 14:47:22 -0000	1.92
+++ Src/zsh.h	10 Jul 2006 12:49:30 -0000
@@ -1610,6 +1610,7 @@
     OVERSTRIKE,
     PATHDIRS,
     POSIXBUILTINS,
+    POSIXIDENTIFIERS,
     PRINTEIGHTBIT,
     PRINTEXITVALUE,
     PRIVILEGED,
Index: Src/ztype.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/ztype.h,v
retrieving revision 1.3
diff -u -r1.3 ztype.h
--- Src/ztype.h	1 Nov 2005 02:50:22 -0000	1.3
+++ Src/ztype.h	10 Jul 2006 12:49:30 -0000
@@ -42,22 +42,22 @@
 #define IMETA    (1 << 12)
 #define IWSEP    (1 << 13)
 #define INULL    (1 << 14)
-#define _icom(X,Y) (typtab[STOUC(X)] & Y)
-#define idigit(X) _icom(X,IDIGIT)
-#define ialnum(X) _icom(X,IALNUM)
-#define iblank(X) _icom(X,IBLANK)	/* blank, not including \n */
-#define inblank(X) _icom(X,INBLANK)	/* blank or \n */
-#define itok(X) _icom(X,ITOK)
-#define isep(X) _icom(X,ISEP)
-#define ialpha(X) _icom(X,IALPHA)
-#define iident(X) _icom(X,IIDENT)
-#define iuser(X) _icom(X,IUSER)	/* username char */
-#define icntrl(X) _icom(X,ICNTRL)
-#define iword(X) _icom(X,IWORD)
-#define ispecial(X) _icom(X,ISPECIAL)
-#define imeta(X) _icom(X,IMETA)
-#define iwsep(X) _icom(X,IWSEP)
-#define inull(X) _icom(X,INULL)
+#define zistype(X,Y) (typtab[STOUC(X)] & Y)
+#define idigit(X) zistype(X,IDIGIT)
+#define ialnum(X) zistype(X,IALNUM)
+#define iblank(X) zistype(X,IBLANK)	/* blank, not including \n */
+#define inblank(X) zistype(X,INBLANK)	/* blank or \n */
+#define itok(X) zistype(X,ITOK)
+#define isep(X) zistype(X,ISEP)
+#define ialpha(X) zistype(X,IALPHA)
+#define iident(X) zistype(X,IIDENT)
+#define iuser(X) zistype(X,IUSER)	/* username char */
+#define icntrl(X) zistype(X,ICNTRL)
+#define iword(X) zistype(X,IWORD)
+#define ispecial(X) zistype(X,ISPECIAL)
+#define imeta(X) zistype(X,IMETA)
+#define iwsep(X) zistype(X,IWSEP)
+#define inull(X) zistype(X,INULL)
 
 #define iascii(X) isascii(STOUC(X))
 #define ilower(X) islower(STOUC(X))
Index: Src/Zle/compcore.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/compcore.c,v
retrieving revision 1.83
diff -u -r1.83 compcore.c
--- Src/Zle/compcore.c	7 Mar 2006 12:52:28 -0000	1.83
+++ Src/Zle/compcore.c	10 Jul 2006 12:49:31 -0000
@@ -1081,7 +1081,7 @@
     }
     if ((*p == String || *p == Qstring) && p[1] != Inpar && p[1] != Inbrack) {
 	/* This is really a parameter expression (not $(...) or $[...]). */
-	char *b = p + 1, *e = b;
+	char *b = p + 1, *e = b, *ie;
 	int n = 0, br = 1, nest = 0;
 
 	if (*b == Inbrace) {
@@ -1124,10 +1124,16 @@
 	else if (idigit(*e))
 	    while (idigit(*e))
 		e++;
-	else if (iident(*e))
-	    while (iident(*e) ||
-		   (comppatmatch && *comppatmatch && (*e == Star || *e == Quest)))
-		e++;
+	else if ((ie = itype_end(e, IIDENT, 0)) != e) {
+	    do {
+		e = ie;
+		if (comppatmatch && *comppatmatch &&
+		    (*e == Star || *e == Quest))
+		    ie = e + 1;
+		else
+		    ie = itype_end(e, IIDENT, 0);
+	    } while (ie != e);
+	}
 
 	/* Now make sure that the cursor is inside the name. */
 	if (offs <= e - s && offs >= b - s && n <= 0) {
Index: Src/Zle/zle_tricky.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_tricky.c,v
retrieving revision 1.67
diff -u -r1.67 zle_tricky.c
--- Src/Zle/zle_tricky.c	30 May 2006 22:35:04 -0000	1.67
+++ Src/Zle/zle_tricky.c	10 Jul 2006 12:49:32 -0000
@@ -551,9 +551,8 @@
 	else if (idigit(*e))
 	    while (idigit(*e))
 		e++;
-	else if (iident(*e))
-	    while (iident(*e))
-		e++;
+	else
+	    e = itype_end(e, IIDENT, 0);
 
 	/* Now make sure that the cursor is inside the name. */
 	if (offs <= e - s && offs >= b - s && n <= 0) {
@@ -740,8 +739,7 @@
 			    else if (idigit(*q))
 				do q++; while (idigit(*q));
 			    else
-				while (iident(*q))
-				    q++;
+				q = itype_end(q, IIDENT, 0);
 			    sav = *q;
 			    *q = '\0';
 			    if (zlemetacs - wb == q - s &&
@@ -1293,7 +1291,7 @@
 	if (varq)
 	    tt = clwords[clwpos];
 
-	for (s = tt; iident(*s); s++);
+	s = itype_end(tt, IIDENT, 0);
 	sav = *s;
 	*s = '\0';
 	zsfree(varname);
@@ -1360,17 +1358,29 @@
      * as being in math.                                              */
     if (inwhat != IN_MATH) {
 	int i = 0;
-	char *nnb = (iident(*s) ? s : s + 1), *nb = NULL, *ne = NULL;
-	
-	for (tt = s; ++tt < s + zlemetacs - wb;)
+	char *nnb, *nb = NULL, *ne = NULL;
+
+	MB_METACHARINIT();
+	if (itype_end(s, IIDENT, 1) == s)
+	    nnb = s + MB_METACHARLEN(s);
+	else
+	    nnb = s;
+	for (tt = s; tt < s + zlemetacs - wb;) {
 	    if (*tt == Inbrack) {
 		i++;
 		nb = nnb;
 		ne = tt;
-	    } else if (i && *tt == Outbrack)
+		tt++;
+	    } else if (i && *tt == Outbrack) {
 		i--;
-	    else if (!iident(*tt))
-		nnb = tt + 1;
+		tt++;
+	    } else {
+		int nclen = MB_METACHARLEN(tt);
+		if (itype_end(tt, IIDENT, 1) == tt)
+		    nnb = tt + nclen;
+		tt += nclen;
+	    }
+	}
 	if (i) {
 	    inwhat = IN_MATH;
 	    insubscr = 1;
@@ -1415,33 +1425,59 @@
 	    /* In mathematical expression, we complete parameter names  *
 	     * (even if they don't have a `$' in front of them).  So we *
 	     * have to find that name.                                  */
-	    for (we = zlemetacs; iident(zlemetaline[we]); we++);
-	    for (wb = zlemetacs; --wb >= 0 && iident(zlemetaline[wb]););
-	    wb++;
+	    char *cspos = zlemetaline + zlemetacs, *wptr, *cptr;
+	    we = itype_end(cspos, IIDENT, 0) - cspos;
+
+	    /*
+	     * With multibyte characters we need to go forwards,
+	     * so start at the beginning of the line and continue
+	     * until cspos.
+	     */
+	    wptr = cptr = zlemetaline;
+	    for (;;) {
+		cptr = itype_end(wptr, IIDENT, 0);
+		if (cptr == wptr) {
+		    /* not an ident character */
+		    wptr = (cptr += MB_METACHARLEN(cptr));
+		}
+		if (cptr >= cspos) {
+		    wb = wptr - zlemetaline;
+		    break;
+		}
+	    }
 	}
 	zsfree(s);
 	s = zalloc(we - wb + 1);
 	strncpy(s, zlemetaline + wb, we - wb);
 	s[we - wb] = '\0';
-	if (wb > 2 && zlemetaline[wb - 1] == '[' &&
-	    iident(zlemetaline[wb - 2])) {
-	    int i = wb - 3;
-	    char sav = zlemetaline[wb - 1];
 
-	    while (i >= 0 && iident(zlemetaline[i]))
-		i--;
+	if (wb > 2 && zlemetaline[wb - 1] == '[') {
+	    char *sqbr = zlemetaline + wb - 1, *cptr, *wptr;
 
-	    zlemetaline[wb - 1] = '\0';
-	    zsfree(varname);
-	    varname = ztrdup(zlemetaline + i + 1);
-	    zlemetaline[wb - 1] = sav;
-	    if ((keypm = (Param) paramtab->getnode(paramtab, varname)) &&
-		(keypm->node.flags & PM_HASHED)) {
-		if (insubscr != 3)
-		    insubscr = 2;
-	    } else
-		insubscr = 1;
+	    /* Need to search forward for word characters */
+	    cptr = wptr = zlemetaline;
+	    for (;;) {
+		cptr = itype_end(wptr, IIDENT, 0);
+		if (cptr == wptr) {
+		    /* not an ident character */
+		    wptr = (cptr += MB_METACHARLEN(cptr));
+		}
+		if (cptr >= sqbr)
+		    break;
+	    }
+
+	    if (wptr < sqbr) {
+		zsfree(varname);
+		varname = ztrduppfx(wptr, sqbr - wptr);
+		if ((keypm = (Param) paramtab->getnode(paramtab, varname)) &&
+		    (keypm->node.flags & PM_HASHED)) {
+		    if (insubscr != 3)
+			insubscr = 2;
+		} else
+		    insubscr = 1;
+	    }
 	}
+
 	parse_subst_string(s);
     }
     /* This variable will hold the current word in quoted form. */
@@ -1562,12 +1598,12 @@
 			*tp == '@')
 			p++, i++;
 		    else {
+			char *ie;
 			if (idigit(*tp))
 			    while (idigit(*tp))
 				tp++;
-			else if (iident(*tp))
-			    while (iident(*tp))
-				tp++;
+			else if ((ie = itype_end(tp, IIDENT, 0)) != tp)
+			    tp = ie;
 			else {
 			    tt = NULL;
 			    break;
Index: Test/D07multibyte.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D07multibyte.ztst,v
retrieving revision 1.4
diff -u -r1.4 D07multibyte.ztst
--- Test/D07multibyte.ztst	30 Jun 2006 09:41:35 -0000	1.4
+++ Test/D07multibyte.ztst	10 Jul 2006 12:49:32 -0000
@@ -165,3 +165,12 @@
 >165
 >163
 >945 945
+
+  unsetopt posix_identifiers
+  expr='hÃ¤hÃ¤=3 || exit 1; print $hÃ¤hÃ¤'
+  eval $expr
+  setopt posix_identifiers
+  (eval $expr)
+1:POSIX_IDENTIFIERS option
+>3
+?(eval):1: command not found: hÃ¤hÃ¤=3

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
References:
- Questions about character types
  - From: Peter Stephenson
- Re: Questions about character types
  - From: Bart Schaefer
Messages sorted by: Reverse Date, Date, Thread, Author