Zsh Mailing List Archive
Re: field splitting with empty fields
- X-seq: zsh-users 12149
- From: Peter Stephenson <pws@xxxxxxx>
- To: zsh-users@xxxxxxxxxx
- Subject: Re: field splitting with empty fields
- Date: Tue, 30 Oct 2007 13:50:45 +0000
- In-reply-to: <20071030123025.GD5398@xxxxxxxxxxxxxxx>
- Mailing-list: contact zsh-users-help@xxxxxxxxxx; run by ezmlm
- Organization: CSR
- References: <20071029235835.GA29356@xxxxxxxxxxx> <071029171355.ZM28438@xxxxxxxxxxxxxxxxxxxxxx> <20071030042048.GA32506@xxxxxxxxxxx> <20071030104459.562a77b1@news01> <20071030115827.GC5398@xxxxxxxxxxxxxxx> <200710301204.l9UC4snN010037@xxxxxxxxxxxxxx> <20071030123025.GD5398@xxxxxxxxxxxxxxx>
On Tue, 30 Oct 2007 12:30:25 +0000
Stephane Chazelas <Stephane_Chazelas@xxxxxxxx> wrote:
> Best would probably be to do a quick search for (f) and (s:...:)
> in /usr/share/zsh to see if any of them rely on that.
>
> I can see for instance:
>
> 4.3.4/functions/Completion/Unix/_java_class:for i in "${(s.:.)classpath}"; do
It does look suspiciously like we can't rely on people expecting the
behaviour I'd like them to expect, although in that particular case it
wouldn't matter.
Here's a patch so that an explicit (@) forces it to do the right thing,
together with some documentation and a test. Luckily it's quite a simple
change.
It's not impossible that this breaks something, somewhere, but I don't have
a great deal of sympathy in that case since the code would be entirely
against the spirit of "$@"-style substitution.
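To make the distinction concrete, here is a minimal C sketch of the two splitting behaviours the patch chooses between. `split_fields` is a hypothetical helper written for illustration. It is not zsh's `sepsplit()`: it handles only a single separator character and ignores `IFS` rules and the trailing-empty-field adjustment visible in the subst.c hunk. With `keep_empty` false it mimics the elision that `"${(s.:.)line}"` performs. With `keep_empty` true it mimics `"${(@s.:.)line}"` after this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not zsh's sepsplit()): split `s` on the single
 * character `sep` (which must not be '\0') and store at most `max`
 * fields in `out`. Each field is a freshly malloc'd string that the
 * caller must free. If keep_empty is false, empty fields such as the
 * one between "::" are dropped. If it is true, they are kept as "".
 * Returns the number of fields stored. */
size_t split_fields(const char *s, char sep, bool keep_empty,
                    char **out, size_t max)
{
    size_t n = 0;
    const char *start = s;

    for (;;) {
        const char *end = strchr(start, sep);   /* next separator or NULL */
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if ((len > 0 || keep_empty) && n < max) {
            char *field = malloc(len + 1);
            if (field == NULL)
                break;                          /* out of memory: stop early */
            memcpy(field, start, len);
            field[len] = '\0';
            out[n++] = field;
        }
        if (end == NULL)
            break;                              /* last field handled */
        start = end + 1;                        /* skip the separator */
    }
    return n;
}
```

For example, splitting `"line:with::missing::fields:in:it"` on `':'` stores six fields with `keep_empty` false and eight fields, two of them empty, with `keep_empty` true. This matches the second and third new tests in D04parameter.ztst.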
Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.82
diff -u -r1.82 expn.yo
--- Doc/Zsh/expn.yo 11 Oct 2007 09:06:20 -0000 1.82
+++ Doc/Zsh/expn.yo 30 Oct 2007 13:07:12 -0000
@@ -960,6 +960,17 @@
characters means that all of them must match in sequence; this differs from
the treatment of two or more characters in the tt(IFS) parameter.
See also the tt(=) flag and the tt(SH_WORD_SPLIT) option.
+
+For historical reasons, the usual behaviour that empty array elements
+are retained inside double quotes is disabled for arrays generated
+by splitting; hence the following:
+
+example(line="one::three"
+print -l "${(s.:.)line}")
+
+produces two lines of output for tt(one) and tt(three) and elides the
+empty field. To override this behaviour, supply the "(@)" flag as well,
+i.e. tt("${(@s.:.)line}").
)
enditem()
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.79
diff -u -r1.79 subst.c
--- Src/subst.c 27 Jun 2007 13:56:11 -0000 1.79
+++ Src/subst.c 30 Oct 2007 13:07:14 -0000
@@ -1260,13 +1260,16 @@
* parameter (the value v) to storing them in val and aval.
* However, sometimes you find v reappearing temporarily.
*
- * The values -1 and 2 are special to isarr. It looks like 2 is
- * some kind of an internal flag to do with whether the array's been
- * copied, in which case I don't know why we don't use the copied
- * flag, but they do both occur close together so they presumably
- * have different effects. The value -1 is used to force us to
- * keep an empty array. It's tested in the YUK chunk (I mean the
- * one explicitly marked as such).
+ * The values -1 and 2 are special to isarr. The value -1 is used
+ * to force us to keep an empty array. It's tested in the YUK chunk
+ * (I mean the one explicitly marked as such). The value 2
+ * indicates an array has come from splitting a scalar. We use
+ * that to override the usual rule that in double quotes we don't
+ * remove empty elements (so "${(s.:.):-foo::bar}" produces two
+ * words). This seems to me to be quite the wrong thing to do,
+ * but it looks like code may be relying on it. So we require (@)
+ * as well before we keep the empty fields (look for assignments
+ * like "isarr = nojoin ? 1 : 2").
*/
int isarr = 0;
/*
@@ -2453,7 +2456,7 @@
char *arr[2], **t, **a, **p;
if (spsep || spbreak) {
aval = sepsplit(val, spsep, 0, 1);
- isarr = 2;
+ isarr = nojoin ? 1 : 2;
l = arrlen(aval);
if (l && !*(aval[l-1]))
l--;
@@ -2772,7 +2775,7 @@
else if (!aval[1])
val = aval[0];
else
- isarr = 2;
+ isarr = nojoin ? 1 : 2;
}
if (isarr)
l->list.flags |= LF_ARRAY;
@@ -2974,7 +2977,7 @@
val = getdata(firstnode(list));
else {
aval = hlinklist2array(list, 0);
- isarr = 2;
+ isarr = nojoin ? 1 : 2;
l->list.flags |= LF_ARRAY;
}
copied = 1;
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.28
diff -u -r1.28 D04parameter.ztst
--- Test/D04parameter.ztst 23 Aug 2007 22:04:25 -0000 1.28
+++ Test/D04parameter.ztst 30 Oct 2007 13:07:14 -0000
@@ -942,3 +942,35 @@
>some
>sunny
>day
+
+ foo="line:with::missing::fields:in:it"
+ print -l ${(s.:.)foo}
+0:Removal of empty fields in unquoted splitting
+>line
+>with
+>missing
+>fields
+>in
+>it
+
+ foo="line:with::missing::fields:in:it"
+ print -l "${(s.:.)foo}"
+0:Hacky removal of empty fields in quoted splitting with no "@"
+>line
+>with
+>missing
+>fields
+>in
+>it
+
+ foo="line:with::missing::fields:in:it"
+ print -l "${(@s.:.)foo}"
+0:Retention of empty fields in quoted splitting with "@"
+>line
+>with
+>
+>missing
+>
+>fields
+>in
+>it
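The three new tests define a small decision table. The C sketch below models it. The enum names and both functions are hypothetical. The sketch assumes, as the description above says, that `nojoin` is true when an explicit (@) flag is given. It reproduces only the patched `isarr = nojoin ? 1 : 2` assignment and the rule stated in the new subst.c comment, not the surrounding substitution code.

```c
#include <stdbool.h>

/* Hypothetical names for the two isarr values relevant here. subst.c
 * uses the literals 1 and 2. The special value -1 (keep an empty
 * array) is a separate case and is not modelled. */
enum { ISARR_ARRAY = 1, ISARR_FROM_SPLIT = 2 };

/* Mirrors the patched assignment "isarr = nojoin ? 1 : 2;".
 * Assumption: nojoin is true when (@) was supplied explicitly. */
int isarr_after_split(bool nojoin)
{
    return nojoin ? ISARR_ARRAY : ISARR_FROM_SPLIT;
}

/* Empty fields survive only inside double quotes, and only if the
 * array is not marked as coming from splitting. Unquoted, empty
 * words are removed regardless. */
bool keeps_empty_fields(bool quoted, int isarr)
{
    return quoted && isarr != ISARR_FROM_SPLIT;
}
```

Of the four combinations, only quoted with `nojoin` keeps the empty fields. That is the `"${(@s.:.)foo}"` case in the third test. The unquoted first test and the quoted-without-@ second test both drop them.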
--
Peter Stephenson <pws@xxxxxxx> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070