Zsh Mailing List Archive
Re: Questions about character types
- X-seq: zsh-workers 22544
- From: Peter Stephenson <pws@xxxxxxx>
- To: zsh-workers@xxxxxxxxxx (Zsh hackers list)
- Subject: Re: Questions about character types
- Date: Mon, 10 Jul 2006 13:51:23 +0100
- In-reply-to: <060705093122.ZM1165@xxxxxxxxxxxxxxxxxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
- References: <200607051214.k65CEHQ2011360@xxxxxxxxxxxxxx> <060705093122.ZM1165@xxxxxxxxxxxxxxxxxxxxxx>
Bart Schaefer wrote:
> } Another question is what to do with user names. Currently these are
> } just the ASCII identifier characters plus "-". Is it useful to extend
> } these to include alphanumeric characters from the local character set?
>
> My impression is that it would be, but some non-English-speakers should
> weigh in.
I've allowed it; it's particularly useful if you're using the extended
parameter naming rules and the reference is to a named directory.
> } Finally, I failed to interpret this code from math.c:
> }
> } if (*ptr == '+' && (unary || !ialnum(*ptr))) {
> } ptr++;
>
> I suspect that's a bug. It probably originally said
>
> if (*ptr++ == '+' && (unary || !ialnum(*ptr))) {
>
> but someone realized it was wrong to increment the pointer if it was NOT
> equal to plus, and made an incomplete fix.
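I've just assumed this is "&& 1": once *ptr == '+' has matched,
ialnum(*ptr) is necessarily false, so the second clause is always true.
That is how it has been evaluated as far back as the CVS archive goes,
without any obvious problems, so I've removed it.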
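The equivalence of the old and new tests can be checked exhaustively. The following is a minimal standalone sketch, not code from the shell: it uses the standard isalnum() as a stand-in for zsh's ialnum() macro, and the function names are made up for illustration.

```c
#include <ctype.h>
#include <limits.h>

/* The condition as it stood in math.c, with isalnum() standing in
 * for zsh's ialnum() (an assumption made for this illustration). */
int old_test(unsigned char c, int unary)
{
    return c == '+' && (unary || !isalnum(c));
}

/* The simplified condition used in the patch. */
int new_test(unsigned char c, int unary)
{
    (void)unary;            /* no longer affects the result */
    return c == '+';
}

/* Returns 1 if the two conditions agree for every byte value and
 * for both settings of unary, 0 otherwise. */
int conditions_agree(void)
{
    int c, unary;

    for (c = 0; c <= UCHAR_MAX; c++)
        for (unary = 0; unary <= 1; unary++)
            if (old_test((unsigned char)c, unary) !=
                new_test((unsigned char)c, unary))
                return 0;
    return 1;
}
```

The same argument applies to the '-' case, since '-' is not alphanumeric either.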
Here is the patch. The MULTIBYTE and POSIX_IDENTIFIERS options should
be respected whenever necessary when testing character types.
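As a rough illustration of how the two options interact when scanning an identifier, here is a simplified sketch in standard C. It is not the itype_end() from the patch: it ignores zsh's metafied strings and type table, assumes an ASCII-compatible encoding such as UTF-8, does not model the --disable-multibyte behaviour, and takes the option states as plain flags. The names ident_end, ascii_ident, multibyte and posix_identifiers are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* ASCII identifier characters: letters, digits and underscore. */
int ascii_ident(unsigned char c)
{
    return c < 128 && (isalnum(c) || c == '_');
}

/*
 * Return a pointer to the first byte of s that is not part of an
 * identifier.  multibyte and posix_identifiers are plain flags
 * standing in for the shell options (hypothetical parameters).
 */
const char *ident_end(const char *s, int multibyte, int posix_identifiers)
{
    size_t left = strlen(s);    /* bytes remaining, excluding the NUL */
    mbstate_t st;

    memset(&st, 0, sizeof st);
    while (left > 0) {
        unsigned char c = (unsigned char)*s;
        size_t len = 1;

        if (c < 128) {
            /* ASCII: the same test applies in every mode. */
            if (!ascii_ident(c))
                break;
        } else {
            wchar_t wc;

            /* Non-ASCII characters are only considered when MULTIBYTE
             * is set and POSIX_IDENTIFIERS is unset. */
            if (!multibyte || posix_identifiers)
                break;
            len = mbrtowc(&wc, s, left, &st);
            if (len == (size_t)-1 || len == (size_t)-2 || len == 0)
                break;          /* invalid or incomplete sequence */
            if (!iswalnum((wint_t)wc))
                break;          /* valid character, but not alphanumeric */
        }
        s += len;
        left -= len;
    }
    return s;
}
```

A caller would first select the locale with setlocale(LC_CTYPE, ""); whether a given non-ASCII character then counts as alphanumeric depends on that locale. As in the patch, a return value equal to the argument means the string does not start with an identifier character.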
There's one remaining big job: I have not yet fixed up IFS to handle
multibyte characters (also isep() and ISEP macros). That looks a little
messy in places.
The other remaining cases I'm aware of where we still don't test for
multibyte characters should be harmless:
- Some idigit()s. I don't see any good reason for allowing active
multibyte digit characters in numerical expressions (for example,
extra width digits), so anywhere a real digit (rather than just
a printable character that happens to look like a digit) is required
it must be 0 to 9 from the portable character set.
- Likewise some iblank()s when inputting text. Whitespace has to
be portable whitespace.
- One ialpha() when checking options to builtins, since
all option letters come from the portable character set.
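To make the first point concrete, a "real digit" test in this sense accepts only 0 to 9 and rejects characters that merely look like digits. A minimal sketch in standard C follows; the function name is made up for illustration and is not part of the shell.

```c
#include <wchar.h>

/*
 * Accept only the digits 0 to 9 from the portable character set.
 * Characters that merely look like digits, such as U+FF13
 * FULLWIDTH DIGIT THREE, are rejected.
 */
int is_portable_digit(wint_t wc)
{
    return wc >= (wint_t)L'0' && wc <= (wint_t)L'9';
}
```

With this test, is_portable_digit(L'7') is true, while is_portable_digit((wint_t)0xFF13) is false even in a locale that classifies that character as alphanumeric.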
Index: README
===================================================================
RCS file: /cvsroot/zsh/zsh/README,v
retrieving revision 1.33
diff -u -r1.33 README
--- README 26 Jun 2006 09:57:17 -0000 1.33
+++ README 10 Jul 2006 12:49:17 -0000
@@ -50,11 +50,23 @@
subsequently by the user. It is valid for the variable to be unset.
Zsh has previously been lax about whether it allows octets with the
-top bit set to be part of a shell identifier. With --enable-multibyte set,
-this is now completely disabled. This is a temporary fix until the main
-shell handles multibyte characters properly and the appropriate library
-tests can be used. This change may be reviewed if no such permanent fix
-is forthcoming.
+top bit set to be part of a shell identifier. Older versions of the shell
+assumed all such octets were allowed in identifiers, however the POSIX
+standard does not allow such characters in identifiers. The older
+behaviour is still obtained with --disable-multibyte in effect.
+With --enable-multibyte set there are three possible cases:
+ MULTIBYTE option unset: only ASCII characters are allowed; the
+ shell does not attempt to identify non-ASCII characters at all.
+ MULTIBYTE option set, POSIX_IDENTIFIERS option unset: in addition
+ to the POSIX characters, any alphanumeric characters in the
+ local character set are allowed. Note that scripts and functions that
+ take advantage of this are non-portable; however, this is in the spirit
+ of previous versions of the shell. Note also that the options must
+ be set before the shell parses the script or function; setting
+ them during execution is not sufficient.
+ MULTIBYTE option set, POSIX_IDENTIFIERS set: only ASCII characters
+ are allowed in identifiers even though the shell will recognise
+ alphanumeric multibyte characters.
The completion style pine-directory must now be set to use completion
for PINE mailbox folders; previously it had the default ~/mail. This
Index: Doc/Zsh/options.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/options.yo,v
retrieving revision 1.46
diff -u -r1.46 options.yo
--- Doc/Zsh/options.yo 9 Apr 2006 21:47:22 -0000 1.46
+++ Doc/Zsh/options.yo 10 Jul 2006 12:49:18 -0000
@@ -1204,6 +1204,27 @@
tt(trap) and
tt(unset).
)
+pindex(POSIX_IDENTIFIERS)
+cindex(identifiers, non-portable characters in)
+cindex(parameter names, non-portable characters in)
+item(tt(POSIX_IDENTIFIERS) <K> <S>)(
+When this option is set, only the ASCII characters tt(a) to tt(z), tt(A) to
+tt(Z), tt(0) to tt(9) and tt(_) may be used in identifiers (names
+of shell parameters and modules).
+
+When the option is unset and multibyte character support is enabled (i.e. it
+is compiled in and the option tt(MULTIBYTE) is set), then additionally any
+alphanumeric characters in the local character set may be used in
+identifiers. Note that scripts and functions written with this feature are
+not portable, and also that both options must be set before the script
+or function is parsed; setting them during execution is not sufficient
+as the syntax var(variable)tt(=)var(value) has already been parsed as
+a command rather than an assignment.
+
+If multibyte character support is not compiled into the shell this option is
+ignored; all octets with the top bit set may be used in identifiers.
+This is non-standard but is the traditional zsh behaviour.
+)
pindex(SH_FILE_EXPANSION)
cindex(sh, expansion style)
cindex(expansion style, sh)
Index: Src/builtin.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/builtin.c,v
retrieving revision 1.158
diff -u -r1.158 builtin.c
--- Src/builtin.c 30 May 2006 22:35:03 -0000 1.158
+++ Src/builtin.c 10 Jul 2006 12:49:20 -0000
@@ -2629,9 +2629,7 @@
char *modname = NULL;
char *ptr;
- for (ptr = funcname; *ptr; ptr++)
- if (!iident(*ptr))
- break;
+ ptr = itype_end(funcname, IIDENT, 0);
if (idigit(*funcname) || funcname == ptr || *ptr) {
zwarnnam(name, "-M %s: bad math function name", funcname);
return 1;
Index: Src/glob.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/glob.c,v
retrieving revision 1.51
diff -u -r1.51 glob.c
--- Src/glob.c 30 May 2006 22:35:03 -0000 1.51
+++ Src/glob.c 10 Jul 2006 12:49:21 -0000
@@ -1443,9 +1443,7 @@
if (s[-1] == '+') {
plus = 0;
- tt = s;
- while (iident(*tt))
- tt++;
+ tt = itype_end(s, IIDENT, 0);
if (tt == s)
{
zerr("missing identifier after `+'");
Index: Src/lex.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/lex.c,v
retrieving revision 1.33
diff -u -r1.33 lex.c
--- Src/lex.c 30 May 2006 22:35:03 -0000 1.33
+++ Src/lex.c 10 Jul 2006 12:49:22 -0000
@@ -1135,10 +1135,13 @@
if (idigit(*t))
while (++t < bptr && idigit(*t));
else {
- while (iident(*t) && ++t < bptr);
+ int sav = *bptr;
+ *bptr = '\0';
+ t = itype_end(t, IIDENT, 0);
if (t < bptr) {
- *bptr = '\0';
skipparens(Inbrack, Outbrack, &t);
+ } else {
+ *bptr = sav;
}
}
if (*t == '+')
Index: Src/math.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/math.c,v
retrieving revision 1.25
diff -u -r1.25 math.c
--- Src/math.c 30 Jun 2006 09:41:35 -0000 1.25
+++ Src/math.c 10 Jul 2006 12:49:22 -0000
@@ -265,11 +265,12 @@
{
int cct = 0;
yyval.type = MN_INTEGER;
+ char *ie;
for (;; cct = 0)
switch (*ptr++) {
case '+':
- if (*ptr == '+' && (unary || !ialnum(*ptr))) {
+ if (*ptr == '+') {
ptr++;
return (unary) ? PREPLUS : POSTPLUS;
}
@@ -279,7 +280,7 @@
}
return (unary) ? UPLUS : PLUS;
case '-':
- if (*ptr == '-' && (unary || !ialnum(*ptr))) {
+ if (*ptr == '-') {
ptr++;
return (unary) ? PREMINUS : POSTMINUS;
}
@@ -469,12 +470,12 @@
}
cct = 1;
}
- if (iident(*ptr)) {
+ if ((ie = itype_end(ptr, IIDENT, 0)) != ptr) {
int func = 0;
char *p;
p = ptr;
- while (iident(*++ptr));
+ ptr = ie;
if (*ptr == '[' || (!cct && *ptr == '(')) {
char op = *ptr, cp = ((*ptr == '[') ? ']' : ')');
int l;
Index: Src/module.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/module.c,v
retrieving revision 1.22
diff -u -r1.22 module.c
--- Src/module.c 30 May 2006 22:35:03 -0000 1.22
+++ Src/module.c 10 Jul 2006 12:49:23 -0000
@@ -734,12 +734,8 @@
modname_ok(char const *p)
{
do {
- if(*p != '_' && !ialnum(*p))
- return 0;
- do {
- p++;
- } while(*p == '_' || ialnum(*p));
- if(!*p)
+ p = itype_end(p, IIDENT, 0);
+ if (!*p)
return 1;
} while(*p++ == '/');
return 0;
Index: Src/options.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/options.c,v
retrieving revision 1.28
diff -u -r1.28 options.c
--- Src/options.c 30 May 2006 22:35:03 -0000 1.28
+++ Src/options.c 10 Jul 2006 12:49:23 -0000
@@ -176,6 +176,7 @@
{{NULL, "overstrike", 0}, OVERSTRIKE},
{{NULL, "pathdirs", OPT_EMULATE}, PATHDIRS},
{{NULL, "posixbuiltins", OPT_EMULATE|OPT_BOURNE}, POSIXBUILTINS},
+{{NULL, "posixidentifiers", OPT_EMULATE|OPT_BOURNE}, POSIXIDENTIFIERS},
{{NULL, "printeightbit", 0}, PRINTEIGHTBIT},
{{NULL, "printexitvalue", 0}, PRINTEXITVALUE},
{{NULL, "privileged", OPT_SPECIAL}, PRIVILEGED},
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.116
diff -u -r1.116 params.c
--- Src/params.c 27 Jun 2006 16:28:46 -0000 1.116
+++ Src/params.c 10 Jul 2006 12:49:24 -0000
@@ -899,9 +899,7 @@
break;
} else {
/* Find the first character in `s' not in the iident type table */
- for (ss = s; *ss; ss++)
- if (!iident(*ss))
- break;
+ ss = itype_end(s, IIDENT, 0);
}
/* If the next character is not [, then it is *
@@ -1653,7 +1651,7 @@
mod_export Value
fetchvalue(Value v, char **pptr, int bracks, int flags)
{
- char *s, *t;
+ char *s, *t, *ie;
char sav, c;
int ppar = 0;
@@ -1665,9 +1663,8 @@
else
ppar = *s++ - '0';
}
- else if (iident(c))
- while (iident(*s))
- s++;
+ else if ((ie = itype_end(s, IIDENT, 0)) != s)
+ s = ie;
else if (c == Quest)
*s++ = '?';
else if (c == Pound)
@@ -1732,7 +1729,7 @@
return v;
}
} else if (!(flags & SCANPM_ASSIGNING) && v->isarr &&
- iident(*t) && isset(KSHARRAYS))
+ itype_end(t, IIDENT, 1) != t && isset(KSHARRAYS))
v->end = 1, v->isarr = 0;
}
if (!bracks && *s)
Index: Src/parse.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/parse.c,v
retrieving revision 1.56
diff -u -r1.56 parse.c
--- Src/parse.c 9 Jul 2006 14:47:22 -0000 1.56
+++ Src/parse.c 10 Jul 2006 12:49:26 -0000
@@ -1603,10 +1603,7 @@
if (*ptr == Outbrace && ptr > tokstr + 1)
{
- while (--ptr > tokstr)
- if (!iident(*ptr))
- break;
- if (ptr == tokstr)
+ if (itype_end(tokstr, IIDENT, 0) >= ptr - 1)
{
char *toksave = tokstr;
char *idstring = dupstrpfx(tokstr+1, eptr-tokstr-1);
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.53
diff -u -r1.53 subst.c
--- Src/subst.c 28 Jun 2006 14:34:27 -0000 1.53
+++ Src/subst.c 10 Jul 2006 12:49:28 -0000
@@ -475,15 +475,14 @@
return 0;
*namptr = dyncat(ds, ptr);
return 1;
- } else if (iuser(str[1])) { /* ~foo */
- char *ptr, *hom, save;
+ } else if ((ptr = itype_end(str+1, IUSER, 0)) != str+1) { /* ~foo */
+ char *hom, save;
- for (ptr = ++str; *ptr && iuser(*ptr); ptr++);
save = *ptr;
if (!isend(save))
return 0;
*ptr = 0;
- if (!(hom = getnameddir(str))) {
+ if (!(hom = getnameddir(++str))) {
if (isset(NOMATCH))
zerr("no such user or named directory: %s", str);
*ptr = save;
@@ -1146,9 +1145,10 @@
* Shouldn't this be a table or something? We test for all
* these later on, too.
*/
- if (!ialnum(c = *s) && c != '#' && c != Pound && c != '-' &&
- c != '!' && c != '$' && c != String && c != Qstring &&
- c != '?' && c != Quest && c != '_' &&
+ c = *s;
+ if (itype_end(s, IIDENT, 1) == s && *s != '#' && c != Pound &&
+ c != '-' && c != '!' && c != '$' && c != String && c != Qstring &&
+ c != '?' && c != Quest &&
c != '*' && c != Star && c != '@' && c != '{' &&
c != Inbrace && c != '=' && c != Equals && c != Hat &&
c != '^' && c != '~' && c != Tilde && c != '+') {
@@ -1446,8 +1446,8 @@
} else
spbreak = 2;
} else if ((c == '#' || c == Pound) &&
- (iident(cc = s[1])
- || cc == '*' || cc == Star || cc == '@'
+ (itype_end(s+1, IIDENT, 0) != s + 1
+ || (cc = s[1]) == '*' || cc == Star || cc == '@'
|| cc == '-' || (cc == ':' && s[2] == '-')
|| (isstring(cc) && (s[2] == Inbrace || s[2] == Inpar)))) {
getlen = 1 + whichlen, s++;
@@ -1471,7 +1471,7 @@
* Try to handle this when parameter is named
* by (P) (second part of test).
*/
- if (iident(s[1]) || (aspar && isstring(s[1]) &&
+ if (itype_end(s+1, IIDENT, 0) != s+1 || (aspar && isstring(s[1]) &&
(s[2] == Inbrace || s[2] == Inpar)))
chkset = 1, s++;
else if (!inbrace) {
Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.126
diff -u -r1.126 utils.c
--- Src/utils.c 30 Jun 2006 09:41:35 -0000 1.126
+++ Src/utils.c 10 Jul 2006 12:49:29 -0000
@@ -1921,7 +1921,7 @@
return;
if (**s == String && !*t) {
guess = *s + 1;
- if (*t || !ialpha(*guess))
+ if (itype_end(guess, IIDENT, 1) == guess)
return;
ic = String;
d = 100;
@@ -2750,11 +2750,8 @@
* iident() macro extended to support wide characters.
*
* The macro is intended to test if a character is allowed in an
- * internal zsh identifier. Until the main shell handles multibyte
- * characters it's not a good idea to allow characters other than
- * ASCII characters; it would cause zle to allow characters that
- * the main shell would reject. Eventually we should be able
- * to allow all alphanumerics.
+ * internal zsh identifier. We allow all alphanumerics outside
+ * the ASCII range unless POSIXIDENTIFIERS is set.
*
* Otherwise similar to wcsiword.
*/
@@ -2774,14 +2771,90 @@
} else if (len == 1 && iascii(*outstr)) {
return iident(*outstr);
} else {
- /* TODO: not currently allowed, see above */
- return 0;
+ return !isset(POSIXIDENTIFIERS) && iswalnum(c);
}
}
/**/
#endif
+/*
+ * Find the end of a set of characters in the set specified by itype;
+ * one of IALNUM, IIDENT, IWORD or IUSER. For non-ASCII characters, we assume
+ * alphanumerics are part of the set, with the exception that
+ * identifiers are not treated that way if POSIXIDENTIFIERS is set.
+ *
+ * See notes above for identifiers.
+ * Returns the same pointer as passed if not on an identifier character.
+ * If "once" is set, just test the first character, i.e. (outptr !=
+ * inptr) tests whether the first character is valid in an identifier.
+ *
+ * Currently this is only called with itype IIDENT or IUSER.
+ */
+
+/**/
+mod_export char *
+itype_end(const char *ptr, int itype, int once)
+{
+#ifdef MULTIBYTE_SUPPORT
+ if (isset(MULTIBYTE) &&
+ (itype != IIDENT || !isset(POSIXIDENTIFIERS))) {
+ mb_metacharinit();
+ while (*ptr) {
+ wint_t wc;
+ int len = mb_metacharlenconv(ptr, &wc);
+
+ if (!len)
+ break;
+
+ if (wc == WEOF) {
+ /* invalid, treat as single character */
+ int chr = STOUC(*ptr == Meta ? ptr[1] ^ 32 : *ptr);
+ /* in this case non-ASCII characters can't match */
+ if (chr > 127 || !zistype(chr,itype))
+ break;
+ } else if (len == 1 && iascii(*ptr)) {
+ /* ASCII: can't be metafied, use standard test */
+ if (!zistype(*ptr,itype))
+ break;
+ } else {
+ /*
+ * Valid non-ASCII character. Allow all alphanumerics;
+ * if testing for words, allow all wordchars.
+ */
+ if (!(iswalnum(wc) ||
+ (itype == IWORD && wcschr(wordchars_wide, wc))))
+ break;
+ }
+ ptr += len;
+
+ if (once)
+ break;
+ }
+ } else
+#endif
+ for (;;) {
+ int chr = STOUC(*ptr == Meta ? ptr[1] ^ 32 : *ptr);
+ if (!zistype(chr,itype))
+ break;
+ ptr += (*ptr == Meta) ? 2 : 1;
+
+ if (once)
+ break;
+ }
+
+ /*
+ * Nasty. The first argument is const char * because we
+ * don't modify it here. However, we really want to pass
+ * back the same type as was passed down, to allow idioms like
+ * p = itype_end(p, IIDENT, 0);
+ * So returning a const char * isn't really the right thing to do.
+ * Without having two different functions the following seems
+ * to be the best we can do.
+ */
+ return (char *)ptr;
+}
+
/**/
mod_export char **
arrdup(char **s)
@@ -3710,9 +3783,10 @@
/**/
int
-mb_metacharlenconv(char *s, wint_t *wcp)
+mb_metacharlenconv(const char *s, wint_t *wcp)
{
- char inchar, *ptr;
+ char inchar;
+ const char *ptr;
size_t ret;
wchar_t wc;
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.92
diff -u -r1.92 zsh.h
--- Src/zsh.h 9 Jul 2006 14:47:22 -0000 1.92
+++ Src/zsh.h 10 Jul 2006 12:49:30 -0000
@@ -1610,6 +1610,7 @@
OVERSTRIKE,
PATHDIRS,
POSIXBUILTINS,
+ POSIXIDENTIFIERS,
PRINTEIGHTBIT,
PRINTEXITVALUE,
PRIVILEGED,
Index: Src/ztype.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/ztype.h,v
retrieving revision 1.3
diff -u -r1.3 ztype.h
--- Src/ztype.h 1 Nov 2005 02:50:22 -0000 1.3
+++ Src/ztype.h 10 Jul 2006 12:49:30 -0000
@@ -42,22 +42,22 @@
#define IMETA (1 << 12)
#define IWSEP (1 << 13)
#define INULL (1 << 14)
-#define _icom(X,Y) (typtab[STOUC(X)] & Y)
-#define idigit(X) _icom(X,IDIGIT)
-#define ialnum(X) _icom(X,IALNUM)
-#define iblank(X) _icom(X,IBLANK) /* blank, not including \n */
-#define inblank(X) _icom(X,INBLANK) /* blank or \n */
-#define itok(X) _icom(X,ITOK)
-#define isep(X) _icom(X,ISEP)
-#define ialpha(X) _icom(X,IALPHA)
-#define iident(X) _icom(X,IIDENT)
-#define iuser(X) _icom(X,IUSER) /* username char */
-#define icntrl(X) _icom(X,ICNTRL)
-#define iword(X) _icom(X,IWORD)
-#define ispecial(X) _icom(X,ISPECIAL)
-#define imeta(X) _icom(X,IMETA)
-#define iwsep(X) _icom(X,IWSEP)
-#define inull(X) _icom(X,INULL)
+#define zistype(X,Y) (typtab[STOUC(X)] & Y)
+#define idigit(X) zistype(X,IDIGIT)
+#define ialnum(X) zistype(X,IALNUM)
+#define iblank(X) zistype(X,IBLANK) /* blank, not including \n */
+#define inblank(X) zistype(X,INBLANK) /* blank or \n */
+#define itok(X) zistype(X,ITOK)
+#define isep(X) zistype(X,ISEP)
+#define ialpha(X) zistype(X,IALPHA)
+#define iident(X) zistype(X,IIDENT)
+#define iuser(X) zistype(X,IUSER) /* username char */
+#define icntrl(X) zistype(X,ICNTRL)
+#define iword(X) zistype(X,IWORD)
+#define ispecial(X) zistype(X,ISPECIAL)
+#define imeta(X) zistype(X,IMETA)
+#define iwsep(X) zistype(X,IWSEP)
+#define inull(X) zistype(X,INULL)
#define iascii(X) isascii(STOUC(X))
#define ilower(X) islower(STOUC(X))
Index: Src/Zle/compcore.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/compcore.c,v
retrieving revision 1.83
diff -u -r1.83 compcore.c
--- Src/Zle/compcore.c 7 Mar 2006 12:52:28 -0000 1.83
+++ Src/Zle/compcore.c 10 Jul 2006 12:49:31 -0000
@@ -1081,7 +1081,7 @@
}
if ((*p == String || *p == Qstring) && p[1] != Inpar && p[1] != Inbrack) {
/* This is really a parameter expression (not $(...) or $[...]). */
- char *b = p + 1, *e = b;
+ char *b = p + 1, *e = b, *ie;
int n = 0, br = 1, nest = 0;
if (*b == Inbrace) {
@@ -1124,10 +1124,16 @@
else if (idigit(*e))
while (idigit(*e))
e++;
- else if (iident(*e))
- while (iident(*e) ||
- (comppatmatch && *comppatmatch && (*e == Star || *e == Quest)))
- e++;
+ else if ((ie = itype_end(e, IIDENT, 0)) != e) {
+ do {
+ e = ie;
+ if (comppatmatch && *comppatmatch &&
+ (*e == Star || *e == Quest))
+ ie = e + 1;
+ else
+ ie = itype_end(e, IIDENT, 0);
+ } while (ie != e);
+ }
/* Now make sure that the cursor is inside the name. */
if (offs <= e - s && offs >= b - s && n <= 0) {
Index: Src/Zle/zle_tricky.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_tricky.c,v
retrieving revision 1.67
diff -u -r1.67 zle_tricky.c
--- Src/Zle/zle_tricky.c 30 May 2006 22:35:04 -0000 1.67
+++ Src/Zle/zle_tricky.c 10 Jul 2006 12:49:32 -0000
@@ -551,9 +551,8 @@
else if (idigit(*e))
while (idigit(*e))
e++;
- else if (iident(*e))
- while (iident(*e))
- e++;
+ else
+ e = itype_end(e, IIDENT, 0);
/* Now make sure that the cursor is inside the name. */
if (offs <= e - s && offs >= b - s && n <= 0) {
@@ -740,8 +739,7 @@
else if (idigit(*q))
do q++; while (idigit(*q));
else
- while (iident(*q))
- q++;
+ q = itype_end(q, IIDENT, 0);
sav = *q;
*q = '\0';
if (zlemetacs - wb == q - s &&
@@ -1293,7 +1291,7 @@
if (varq)
tt = clwords[clwpos];
- for (s = tt; iident(*s); s++);
+ s = itype_end(tt, IIDENT, 0);
sav = *s;
*s = '\0';
zsfree(varname);
@@ -1360,17 +1358,29 @@
* as being in math. */
if (inwhat != IN_MATH) {
int i = 0;
- char *nnb = (iident(*s) ? s : s + 1), *nb = NULL, *ne = NULL;
-
- for (tt = s; ++tt < s + zlemetacs - wb;)
+ char *nnb, *nb = NULL, *ne = NULL;
+
+ MB_METACHARINIT();
+ if (itype_end(s, IIDENT, 1) == s)
+ nnb = s + MB_METACHARLEN(s);
+ else
+ nnb = s;
+ for (tt = s; tt < s + zlemetacs - wb;) {
if (*tt == Inbrack) {
i++;
nb = nnb;
ne = tt;
- } else if (i && *tt == Outbrack)
+ tt++;
+ } else if (i && *tt == Outbrack) {
i--;
- else if (!iident(*tt))
- nnb = tt + 1;
+ tt++;
+ } else {
+ int nclen = MB_METACHARLEN(tt);
+ if (itype_end(tt, IIDENT, 1) == tt)
+ nnb = tt + nclen;
+ tt += nclen;
+ }
+ }
if (i) {
inwhat = IN_MATH;
insubscr = 1;
@@ -1415,33 +1425,59 @@
/* In mathematical expression, we complete parameter names *
* (even if they don't have a `$' in front of them). So we *
* have to find that name. */
- for (we = zlemetacs; iident(zlemetaline[we]); we++);
- for (wb = zlemetacs; --wb >= 0 && iident(zlemetaline[wb]););
- wb++;
+ char *cspos = zlemetaline + zlemetacs, *wptr, *cptr;
+ we = itype_end(cspos, IIDENT, 0) - cspos;
+
+ /*
+ * With multibyte characters we need to go forwards,
+ * so start at the beginning of the line and continue
+ * until cspos.
+ */
+ wptr = cptr = zlemetaline;
+ for (;;) {
+ cptr = itype_end(wptr, IIDENT, 0);
+ if (cptr == wptr) {
+ /* not an ident character */
+ wptr = (cptr += MB_METACHARLEN(cptr));
+ }
+ if (cptr >= cspos) {
+ wb = wptr - zlemetaline;
+ break;
+ }
+ }
}
zsfree(s);
s = zalloc(we - wb + 1);
strncpy(s, zlemetaline + wb, we - wb);
s[we - wb] = '\0';
- if (wb > 2 && zlemetaline[wb - 1] == '[' &&
- iident(zlemetaline[wb - 2])) {
- int i = wb - 3;
- char sav = zlemetaline[wb - 1];
- while (i >= 0 && iident(zlemetaline[i]))
- i--;
+ if (wb > 2 && zlemetaline[wb - 1] == '[') {
+ char *sqbr = zlemetaline + wb - 1, *cptr, *wptr;
- zlemetaline[wb - 1] = '\0';
- zsfree(varname);
- varname = ztrdup(zlemetaline + i + 1);
- zlemetaline[wb - 1] = sav;
- if ((keypm = (Param) paramtab->getnode(paramtab, varname)) &&
- (keypm->node.flags & PM_HASHED)) {
- if (insubscr != 3)
- insubscr = 2;
- } else
- insubscr = 1;
+ /* Need to search forward for word characters */
+ cptr = wptr = zlemetaline;
+ for (;;) {
+ cptr = itype_end(wptr, IIDENT, 0);
+ if (cptr == wptr) {
+ /* not an ident character */
+ wptr = (cptr += MB_METACHARLEN(cptr));
+ }
+ if (cptr >= sqbr)
+ break;
+ }
+
+ if (wptr < sqbr) {
+ zsfree(varname);
+ varname = ztrduppfx(wptr, sqbr - wptr);
+ if ((keypm = (Param) paramtab->getnode(paramtab, varname)) &&
+ (keypm->node.flags & PM_HASHED)) {
+ if (insubscr != 3)
+ insubscr = 2;
+ } else
+ insubscr = 1;
+ }
}
+
parse_subst_string(s);
}
/* This variable will hold the current word in quoted form. */
@@ -1562,12 +1598,12 @@
*tp == '@')
p++, i++;
else {
+ char *ie;
if (idigit(*tp))
while (idigit(*tp))
tp++;
- else if (iident(*tp))
- while (iident(*tp))
- tp++;
+ else if ((ie = itype_end(tp, IIDENT, 0)) != tp)
+ tp = ie;
else {
tt = NULL;
break;
Index: Test/D07multibyte.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D07multibyte.ztst,v
retrieving revision 1.4
diff -u -r1.4 D07multibyte.ztst
--- Test/D07multibyte.ztst 30 Jun 2006 09:41:35 -0000 1.4
+++ Test/D07multibyte.ztst 10 Jul 2006 12:49:32 -0000
@@ -165,3 +165,12 @@
>165
>163
>945 945
+
+ unsetopt posix_identifiers
+ expr='hähä=3 || exit 1; print $hähä'
+ eval $expr
+ setopt posix_identifiers
+ (eval $expr)
+1:POSIX_IDENTIFIERS option
+>3
+?(eval):1: command not found: hähä=3
--
Peter Stephenson <pws@xxxxxxx> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070