Zsh Mailing List Archive

Re: PATCH: bash-style substrings & subarrays



On Sun, 21 Nov 2010 17:02:38 +0000
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
> Should ${foo:1} always start 1 character/element beyond the
> first one, regardless which subscripting rules are in use?  I'm now
> inclining in that direction.

Nobody commented, so here is the change, with somewhat more careful
documentation.
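
As a rough illustration of the behaviour in question (a sketch only, not part of the patch): the variable names str, arr and rest below are made up for the example, with values resembling those in Test/D04parameter.ztst. It uses echo rather than print, and the [@] form for the array, so the same lines should also run under bash, which is the shell whose syntax is being matched.

```shell
# Hypothetical sample data, similar to the values in Test/D04parameter.ztst.
str=123456789
arr=(1 2 3 4 5 6 7 8 9)

# Offset 3 skips three characters, so the result starts at the fourth one.
scalar_rest=${str:3}

# Offset 3 with length 1 picks out just the fourth character.
scalar_one=${str:3:1}

# On an array the offset counts elements rather than characters.
rest=("${arr[@]:3}")

echo "$scalar_rest"
echo "$scalar_one"
echo "${rest[*]}"
```

With the change applied, offset 0 is the first character or element whether or not KSH_ARRAYS is set, so ${str:1} always starts one character beyond the first.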

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.123
diff -p -u -r1.123 expn.yo
--- Doc/Zsh/expn.yo	18 Nov 2010 13:57:19 -0000	1.123
+++ Doc/Zsh/expn.yo	23 Nov 2010 11:09:33 -0000
@@ -588,23 +588,29 @@ remove the non-matched elements).
 xitem(tt(${)var(name)tt(:)var(offset)tt(}))
 item(tt(${)var(name)tt(:)var(offset)tt(:)var(length)tt(}))(
 This syntax gives effects similar to parameter subscripting
-in the form tt($)var(name)tt({)var(offset)tt(,)var(end)tt(}) but in
-a form compatible with other shells.
+in the form tt($)var(name)tt({)var(start)tt(,)var(end)tt(}), but is
+compatible with other shells; note that both var(offset) and var(length)
+are interpreted differently from the components of a subscript.
+
+If var(offset) is non-negative, then if the variable var(name) is a
+scalar substitute the contents starting var(offset) characters from the
+first character of the string, and if var(name) is an array substitute
+elements starting var(offset) elements from the first element.  If
+var(length) is given, substitute that many characters or elements,
+otherwise the entire rest of the scalar or array.
+
+A positive var(offset) is always treated as the offset of a character or
+element in var(name) from the first character or element of the array
+(this is different from native zsh subscript notation).  Hence 0
+refers to the first character or element regardless of the setting of
+the option tt(KSH_ARRAYS).
 
-If the variable var(name) is a scalar, substitute the contents
-starting from offset var(offset); if var(name) is an array,
-substitute elements from element var(offset).  If var(length) is
-given, substitute that many characters or elements, otherwise the
-entire rest of the scalar or array.
-
-var(offset) is treated similarly to a parameter subscript:
-the offset of the first character or element in var(name)
-is 0 if the option tt(KSH_ARRAYS) is set, else 1; a negative
-subscript counts backwards so that -1 corresponds to the last
-character or element.
+A negative offset counts backwards from the end of the scalar or array,
+so that -1 corresponds to the last character or element, and so on.
 
 var(length) is always treated directly as a length and hence may not be
-negative.
+negative.  The option tt(MULTIBYTE) is obeyed, i.e. the offset and length
+count multibyte characters where appropriate.
 
 var(offset) and var(length) undergo the same set of shell substitutions
 as for scalar assignment; in addition, they are then subject to arithmetic
@@ -615,19 +621,29 @@ print ${foo: 1 + 2}
 print ${foo:$(( 1 + 2))}
 print ${foo:$(echo 1 + 2)})
 
-all have the same effect.
+all have the same effect, extracting the string starting at the fourth
+character of tt($foo) if the substitution would otherwise return a scalar,
+or the array starting at the fourth element if tt($foo) would return an
+array.  Note that with the option tt(KSH_ARRAYS) tt($foo) always returns
+a scalar (regardless of the use of the offset syntax) and a form
+such as tt($foo[*]:3) is required to extract elements of an array named
+tt(foo).
 
-Note that if var(offset) is negative, the tt(-) may not appear immediately
+If var(offset) is negative, the tt(-) may not appear immediately
 after the tt(:) as this indicates the
-tt(${)var(name)tt(:-)var(word)tt(}) form of substitution; a space
+tt(${)var(name)tt(:-)var(word)tt(}) form of substitution.  Instead, a space
 may be inserted before the tt(-).  Furthermore, neither var(offset) nor
 var(length) may begin with an alphabetic character or tt(&) as these are
-used to indicate history-style modifiers.
+used to indicate history-style modifiers.  To substitute a value from a
+variable, the recommended approach is to precede it with a tt($) as this
+signifies the intention (parameter substitution can easily be rendered
+unreadable); however, as arithmetic substitution is performed, the
+expression tt(${var: offs}) does work, retrieving the offset from
+tt($offs).
 
 For further compatibility with other shells there is a special case
-when the tt(KSH_ARRAYS) option is active, as in emulation of
-Bourne-style shells.  In this case array subscript 0 usually refers to the
-first element of the array.  However, if the substitution refers to the
+for array offset 0.  This usually accesses the
+first element of the array.  However, if the substitution refers to the
 positional parameter array, e.g. tt($@) or tt($*), then offset 0
 instead refers to tt($0), offset 1 refers to tt($1), and so on.  In
 other words, the positional parameter array is effectively extended by
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.111
diff -p -u -r1.111 subst.c
--- Src/subst.c	20 Nov 2010 23:46:26 -0000	1.111
+++ Src/subst.c	23 Nov 2010 11:09:34 -0000
@@ -1640,7 +1640,7 @@ paramsubst(LinkList l, LinkNode n, char 
     int subexp;
     /*
      * If we're referring to the positional parameters, then
-     * e.g ${*:1:1} refers to $1 even if KSH_ARRAYS is in effect.
+     * e.g ${*:1:1} refers to $1.
      * This is for compatibility.
      */
     int horrible_offset_hack = 0;
@@ -2768,16 +2768,15 @@ paramsubst(LinkList l, LinkNode n, char 
 			return NULL;
 		    }
 		}
-		if (!isset(KSHARRAYS) || horrible_offset_hack) {
+		if (horrible_offset_hack) {
 		    /*
 		     * As part of the 'orrible hoffset 'ack,
 		     * (what hare you? Han 'orrible hoffset 'ack,
 		     * sergeant major), if we are given a ksh/bash/POSIX
-		     * style array which includes offset 0, we use
-		     * $0.
+		     * style positional parameter array which includes
+		     * offset 0, we use $0.
 		     */
-		    if (isset(KSHARRAYS) && horrible_offset_hack &&
-			offset == 0 && isarr) {
+		    if (offset == 0 && isarr) {
 			offset_hack_argzero = 1;
 		    } else if (offset > 0) {
 			offset--;
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.46
diff -p -u -r1.46 D04parameter.ztst
--- Test/D04parameter.ztst	18 Nov 2010 13:57:19 -0000	1.46
+++ Test/D04parameter.ztst	23 Nov 2010 11:09:34 -0000
@@ -1268,15 +1268,15 @@
    print ${foo:$(echo 3 + 3):`echo 4 - 3`}
    print ${foo: -1}
    print ${foo: -10}
-0:Bash-style subscripts, scalar
->3456789
+0:Bash-style offsets, scalar
 >456789
 >56789
 >6789
->3
+>789
 >4
 >5
 >6
+>7
 >9
 >123456789
 
@@ -1291,15 +1291,15 @@
    print ${foo:$(echo 3 + 3):`echo 4 - 3`}
    print ${foo: -1}
    print ${foo: -10}
-0:Bash-style subscripts, array
->3 4 5 6 7 8 9
+0:Bash-style offsets, array
 >4 5 6 7 8 9
 >5 6 7 8 9
 >6 7 8 9
->3
+>7 8 9
 >4
 >5
 >6
+>7
 >9
 >1 2 3 4 5 6 7 8 9
 
@@ -1321,7 +1321,7 @@
      echo ${str: -1:1}
    }
    testfn
-0:Bash-style subscripts, Bourne-style indexing
+0:Bash-style offsets, Bourne-style indexing
 >1
 >2
 >3
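
The documentation hunk's points about negative offsets and arithmetic evaluation can be sketched as follows. This is only an illustration: str and offs are names invented for the example, and the lines are kept to syntax that bash accepts as well.

```shell
# Hypothetical sample values for the illustration.
str=123456789
offs=3

# The space keeps "-" away from ":", so this is an offset from the end.
last=${str: -1}

# Without the space this is the ${name:-word} default-value form;
# str is set and non-empty, so its whole value is substituted.
default_form=${str:-1}

# A negative offset can be combined with a length.
near_end=${str: -3:2}

# Offset and length are evaluated arithmetically, so these agree.
a=${str: 1 + 2}
b=${str:$(( 1 + 2 ))}
c=${str:$offs}

echo "$last"
echo "$default_form"
echo "$near_end"
echo "$a $b $c"
```

Writing $offs explicitly, as in the last assignment, is the form the documentation recommends for readability.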

-- 
Peter Stephenson <pws@xxxxxxx>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK




