Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

[PATCH (not final)] (take three?) unset "array[$anything]"



On Wed, Jun 2, 2021 at 8:59 AM Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
>
> I've just had a hand-slaps-forehead
> moment ... take 3 to follow in another thread.

What I realized is that for any unset of an array element, the closing
bracket must always be the last character of the argument.  There's no
reason to parse the subscript or skip over matching brackets; if a '['
is found, just make sure the last character is ']' and the subscript
must be everything in between.

% typeset -A ax
% for k in '' '\' '`' '(' '[' ')' ']'; do
for> ax[$k]=$k
for> done
% typeset -p ax
typeset -A ax=( ['']='' ['(']='(' [')']=')' ['[']='[' ['\']='\'
[']']=']' ['`']='`' )
% for k in ${(k)ax}; do
unset "ax[$k]"
done
% typeset -p ax
typeset -A ax=( )

Given this realization, it's easy to make { unset "hash[$key]" } work
"like it always should have".  The trouble comes in with (not)
breaking past workarounds.  Because the current (and theoretically
experimental, though we forgot about that for 5 years) code uses
parse_subscript(), we get a partly-tokenized (as if double-quoted,
actually) string in the cases where backslashes are used to force the
closing bracket to be found.  If those backslashes aren't needed any
more, there's no clean way to ignore them upon untokenize, to get back
to something that actually matches the intended hash key.

The attached not-ready-for-push patch has 4 variations that can be
chosen by #define, currently set up as follows:

#define unset_workers_37914 0
#define unset_hashelem_empty_only 0
#define unset_hashelem_literal 1
#define unset_hashelem_stripquote 0

The first one is just the current code.  The second one allows { unset
"hash[]" } (and gives "invalid subscript" for array[] instead of
"invalid parameter name").  The third one, which I have defined by
default, uses the subscript literally, so if you can do { hash[$k]=v }
you can also do { unset "hash[$k]" } (at least for all cases I
tested).  The fourth one requires { unset "hash[${(q)k}]" } instead,
but I think it otherwise works for all cases.  Both of those also work
for "hash[]".

Therefore I think the best option is to choose one of the latter two,
possibly depending on which one induces the least damage to any
workarounds for the current behavior that are known in the wild,
though aesthetically I'd rather use the literal version.
diff --git a/Src/builtin.c b/Src/builtin.c
index a16fddcb7..ec0376367 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -1933,10 +1933,14 @@ getasg(char ***argvp, LinkList assigns)
     asg.flags = 0;
 
     /* search for `=' */
-    for (; *s && *s != '='; s++);
+    for (; *s && *s != '[' && *s != '=' /* && *s != '+' */; s++);
+    if (s > asg.name && *s == '[') {
+	char *se = parse_subscript(s + 1, 1, ']');
+	if (se && *se == ']') s = se + 1;
+    }
 
     /* found `=', so return with a value */
-    if (*s) {
+    if (*s && *s == '=') {
 	*s = '\0';
 	asg.value.scalar = s + 1;
     } else {
@@ -3724,6 +3728,11 @@ bin_unset(char *name, char **argv, Options ops, int func)
     while ((s = *argv++)) {
 	char *ss = strchr(s, '['), *subscript = 0;
 	if (ss) {
+#define unset_workers_37914 0
+#define unset_hashelem_empty_only 0
+#define unset_hashelem_literal 1
+#define unset_hashelem_stripquote 0
+#if unset_workers_37914
 	    char *sse;
 	    *ss = 0;
 	    if ((sse = parse_subscript(ss+1, 1, ']'))) {
@@ -3733,6 +3742,47 @@ bin_unset(char *name, char **argv, Options ops, int func)
 		remnulargs(subscript);
 		untokenize(subscript);
 	    }
+#elif unset_hashelem_empty_only
+	    char *sse;
+	    *ss = 0;
+	    if (ss[1] == ']' && !ss[2] ? (sse = ss+1) :
+		(sse = parse_subscript(ss+1, 1, ']'))) {
+		*sse = 0;
+		subscript = dupstring(ss+1);
+		*sse = ']';
+		remnulargs(subscript);
+		untokenize(subscript);
+	    }
+#else
+	    char *sse = ss + strlen(ss)-1;
+	    *ss = 0;
+	    if (*sse == ']') {
+# if unset_hashelem_literal
+		*sse = 0;
+		subscript = dupstring(ss+1);
+		*sse = ']';
+# elif unset_hashelem_stripquote
+		int ne = noerrs;
+		noerrs = 2;
+		*ss = 0;
+		*sse = 0;
+		subscript = dupstring(ss+1);
+		*sse = ']';
+		/*
+		 * parse_subst_string() removes one level of quoting.
+		 * If it returns nonzero, substring is unchanged, else
+		 * it has been re-tokenized in place, so clean it up.
+		 */
+		if (!parse_subst_string(subscript)) {
+		    remnulargs(subscript);
+		    untokenize(subscript);
+		}
+		noerrs = ne;
+# else
+		XXX parse error XXX
+# endif
+	    }
+#endif
 	}
 	if ((ss && !subscript) || !isident(s)) {
 	    if (ss)


Messages sorted by: Reverse Date, Date, Thread, Author