Zsh Mailing List Archive
Re: ${a[(i)pattern]} if a=()
- X-seq: zsh-workers 24731
- From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
- To: Zsh hackers list <zsh-workers@xxxxxxxxxx>
- Subject: Re: ${a[(i)pattern]} if a=()
- Date: Tue, 25 Mar 2008 17:24:09 +0000
- In-reply-to: <080318084728.ZM12523@xxxxxxxxxxxxxxxxxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
- References: <200803181213.m2ICDULc004081@xxxxxxxxxxxxxxxxxxx> <080318084728.ZM12523@xxxxxxxxxxxxxxxxxxxxxx>
On Tue, 18 Mar 2008 08:47:28 -0700
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> Looking at documentation for this, I was reminded about this recent bit:
>
> Note that in subscripts with both `r' and `R' pattern characters
> are active even if they were substituted for a parameter
> (regardless of the setting of GLOB_SUBST which controls this
> feature in normal pattern matching). It is therefore necessary to
> quote pattern characters for an exact string match.
>
> Maybe we could press the (e) flag into service here? I haven't looked
> at how hard that would be to do, but it's semantically similar to the
> existing use.
Yes, that seems perfectly reasonable, and it was easy to do (except I've
just got back from holiday so it's appeared a week late). It might look
a little bizarre that in one case we untokenize() and in the other case
we tokenize(): you might think we'd need just one or the other. The
difference occurs if the substitution is inside double quotes: if so, we
need to tokenize to do pattern matching, while if not we need to
untokenize to make sure we don't.
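The user-visible effect of the new flag (not the tokenize()/untokenize() mechanics themselves) can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: first_index() is a hypothetical helper, not a zsh function; POSIX fnmatch() stands in for zsh's own pattern engine; and the "length plus one" result on failure mirrors what the (i) subscript flag reports when nothing matches.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not zsh code: return the 1-based index of the
 * first element of arr that matches key, or n + 1 if nothing matches.
 * With literal == 0 the key is treated as a glob pattern (fnmatch() is
 * a stand-in for zsh's pattern matcher); with literal != 0 the key is
 * compared byte for byte, which is what the (e) flag asks for.
 */
size_t first_index(const char *const *arr, size_t n,
                   const char *key, int literal)
{
    assert(arr != NULL && key != NULL);
    for (size_t i = 0; i < n; i++) {
        int hit = literal ? strcmp(arr[i], key) == 0
                          : fnmatch(key, arr[i], 0) == 0;
        if (hit)
            return i + 1;
    }
    return n + 1;
}
```

With the array from the new D04parameter.ztst case, ('%' '$' 'j' '*' '$foo'), the key "*" gives 1 in pattern mode and 4 in literal mode, in line with the expected ">1 1" and ">4 4" lines of that test.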
It's still necessary to use a parameter as the key to guarantee all
characters are interpreted literally. The issue is that we don't do
full argument parsing on the subscript; it's handled a bit like a
special case of double quoting (but with a different terminator), so
single and double quotes don't have their quoting effect there. I don't
think we want to change this in a hurry.
I noticed meanwhile that the optimization for pattern-character-free
strings was being confused by multibyte mode; the only difference is
speed, so it's unlikely anybody would have noticed.
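As a rough sketch of why the old test missed the fast path: if GF_MULTIBYTE is set in patglobflags whenever multibyte mode is on (which is what the pattern.c hunk suggests), then a bare !patglobflags is never true in that mode. The bit values and the GF_IGNCASE stand-in below are illustrative assumptions, not copied from the zsh headers, and the function names are hypothetical.

```c
#include <assert.h>

/* Illustrative bit values only; the real definitions live in zsh's headers. */
#define GF_IGNCASE   0x0200
#define GF_MULTIBYTE 0x1000

/* Check as it stood before the patch: any set flag blocks the fast path. */
int pure_candidate_old(int patglobflags)
{
    return !patglobflags;
}

/* Patched check: GF_MULTIBYTE on its own no longer blocks the fast path. */
int pure_candidate_new(int patglobflags)
{
    return !(patglobflags & ~GF_MULTIBYTE);
}

/* Small self-check contrasting the two versions. */
void check_examples(void)
{
    /* No flags at all: both versions allow the optimization. */
    assert(pure_candidate_old(0) && pure_candidate_new(0));
    /* Multibyte only: the old check refuses, the new one allows. */
    assert(!pure_candidate_old(GF_MULTIBYTE));
    assert(pure_candidate_new(GF_MULTIBYTE));
    /* Any other flag still disables the optimization in both versions. */
    assert(!pure_candidate_old(GF_IGNCASE | GF_MULTIBYTE));
    assert(!pure_candidate_new(GF_IGNCASE | GF_MULTIBYTE));
}
```

Masking out the one flag that does not affect a pure string comparison keeps the test cheap while leaving every other flag able to veto the shortcut.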
Index: Doc/Zsh/params.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/params.yo,v
retrieving revision 1.41
diff -u -r1.41 params.yo
--- Doc/Zsh/params.yo 25 Oct 2007 09:33:01 -0000 1.41
+++ Doc/Zsh/params.yo 25 Mar 2008 17:08:38 -0000
@@ -227,16 +227,14 @@
If tt(KSH_ARRAYS) is in effect, the tt(-le) should be replaced by tt(-lt).
Note that in subscripts with both `tt(r)' and `tt(R)' pattern characters
-are active even if they were substituted for a parameter (regardless
-of the setting of tt(GLOB_SUBST) which controls this feature in normal
-pattern matching). It is therefore necessary to quote pattern characters
-for an exact string match. Given a string in tt($key), and assuming
-the tt(EXTENDED_GLOB) option is set, the following is sufficient to
-match an element of an array tt($array) containing exactly the value of
-tt($key):
+are active even if they were substituted for a parameter (regardless of the
+setting of tt(GLOB_SUBST) which controls this feature in normal pattern
+matching). The flag `tt(e)' can be added to inhibit pattern matching. As
+this flag does not inhibit other forms of substitution, care is still
+required; using a parameter to hold the key has the desired effect:
-example(key2=${key//(#m)[\][+LPAR()+RPAR()\\*?#<>~^]/\\$MATCH}
-print ${array[(R)$key2]})
+example(key2='original key'
+print ${array[(Re)$key2]})
)
item(tt(R))(
Like `tt(r)', but gives the last match. For associative arrays, gives
@@ -283,11 +281,15 @@
The delimiter character tt(:) is arbitrary; see above.
)
item(tt(e))(
-This flag has no effect and for ordinary arrays is retained for backward
-compatibility only. For associative arrays, this flag can be used to
-force tt(*) or tt(@) to be interpreted as a single key rather than as a
-reference to all values. This flag may be used on the left side of an
-assignment.
+This flag causes any pattern matching that would be performed on the
+subscript to use plain string matching instead. Hence
+`tt(${array[(re)*]})' matches only the array element whose value is tt(*).
+Note that other forms of substitution such as parameter substitution are
+not inhibited.
+
+This flag can also be used to force tt(*) or tt(@) to be interpreted as
+a single key rather than as a reference to all values. It may be used
+for either purpose on the left side of an assignment.
)
enditem()
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.141
diff -u -r1.141 params.c
--- Src/params.c 10 Jan 2008 10:25:31 -0000 1.141
+++ Src/params.c 25 Mar 2008 17:08:38 -0000
@@ -1007,7 +1007,7 @@
int hasbeg = 0, word = 0, rev = 0, ind = 0, down = 0, l, i, ishash;
int keymatch = 0, needtok = 0, arglen, len;
char *s = *str, *sep = NULL, *t, sav, *d, **ta, **p, *tt, c;
- zlong num = 1, beg = 0, r = 0;
+ zlong num = 1, beg = 0, r = 0, quote_arg = 0;
Patprog pprog = NULL;
ishash = (v->pm && PM_TYPE(v->pm->node.flags) == PM_HASHED);
@@ -1058,8 +1058,7 @@
sep = "\n";
break;
case 'e':
- /* Compatibility flag with no effect except to prevent *
- * special interpretation by getindex() of `*' or `@'. */
+ quote_arg = 1;
break;
case 'n':
t = get_strarg(++s, &arglen);
@@ -1286,7 +1285,10 @@
}
}
if (!keymatch) {
- tokenize(s);
+ if (quote_arg)
+ untokenize(s);
+ else
+ tokenize(s);
remnulargs(s);
pprog = patcompile(s, 0, NULL);
} else
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.41
diff -u -r1.41 pattern.c
--- Src/pattern.c 23 Oct 2007 16:09:10 -0000 1.41
+++ Src/pattern.c 25 Mar 2008 17:08:38 -0000
@@ -511,7 +511,7 @@
if (!(patflags & PAT_ANY)) {
/* Look for a really pure string, with no tokens at all. */
- if (!patglobflags
+ if (!(patglobflags & ~GF_MULTIBYTE)
#ifdef __CYGWIN__
/*
* If the OS treats files case-insensitively and we
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.32
diff -u -r1.32 D04parameter.ztst
--- Test/D04parameter.ztst 11 Mar 2008 10:00:39 -0000 1.32
+++ Test/D04parameter.ztst 25 Mar 2008 17:08:43 -0000
@@ -282,6 +282,7 @@
print ${(P)bar}
0:${(P)...}
>I'm nearly out of my mind with tedium
+#' deconfuse emacs
foo=(I could be watching that programme I recorded)
print ${(o)foo}
@@ -375,6 +376,7 @@
print ${(QX)foo}
1:${(QX)...}
?(eval):2: unmatched "
+# " deconfuse emacs
array=(characters in an array)
print ${(c)#array}
@@ -411,6 +413,7 @@
print ${(pl.10..\x22..X.)foo}
0:${(pl...)...}
>Xresulting """"Xwords roariously """Xpadded
+#" deconfuse emacs
print ${(l.5..X.r.5..Y.)foo}
print ${(l.6..X.r.4..Y.)foo}
@@ -870,6 +873,7 @@
0:Parameters associated with backreferences
>match 12 16 match
>1 1 1
+#' deconfuse emacs
string='and look for a MATCH in here'
if [[ ${(S)string%%(#m)M*H} = "and look for a in here" ]]; then
@@ -1010,3 +1014,36 @@
>fields
>in
>it
+
+ array=('%' '$' 'j' '*' '$foo')
+ print ${array[(i)*]} "${array[(i)*]}"
+ print ${array[(ie)*]} "${array[(ie)*]}"
+ key='$foo'
+ print ${array[(ie)$key]} "${array[(ie)$key]}"
+ key='*'
+ print ${array[(ie)$key]} "${array[(ie)$key]}"
+0:Matching array indices with and without quoting
+>1 1
+>4 4
+>5 5
+>4 4
+
+# Ordering of associative arrays is arbitrary, so we need to use
+# patterns that only match one element.
+ typeset -A assoc_r
+ assoc_r=(star '*' of '*this*' and '!that!' or '(the|other)')
+ print ${(kv)assoc_r[(re)*]}
+ print ${(kv)assoc_r[(re)*this*]}
+ print ${(kv)assoc_r[(re)!that!]}
+ print ${(kv)assoc_r[(re)(the|other)]}
+ print ${(kv)assoc_r[(r)*at*]}
+ print ${(kv)assoc_r[(r)*(ywis|bliss|kiss|miss|this)*]}
+ print ${(kv)assoc_r[(r)(this|that|\(the\|other\))]}
+0:Reverse subscripting associative arrays with literal matching
+>star *
+>of *this*
+>and !that!
+>or (the|other)
+>and !that!
+>of *this*
+>or (the|other)
--
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/