Zsh Mailing List Archive

Re: more splitting



As Bart pointed out, the two hex versions are not interchangeable. For clarity, let me call hex-doors my version, which shows exactly what's at the front and back doors, and hex-var the version below, which shows exactly what's inside the variable passed by name.

function hex-var()
{
  echo "--- ${(r:25::-:: :)1}"
  if [[ -v $1 ]]; then
    set -- "${(@P)1}"
    print -rC1 -- ${(q+)@}
    echo "\n-----------------------------\n"
    for element ("$@") print -rn -- $element | od -vAx -tx1 -tc
  else
    echo "No such variable: ${(q+)1}"
  fi
}

> And, all else equal it's probably more orthodox to pass the value of the
> variable than the name.

It all depends on what you want to do. For instance, if you want to see exactly what's in a variable, you had better pass its name to a function like hex-var rather than (try to) pass its value to a function like hex-doors, because with Zsh chances are high that the latter will fail.

Here is a first example of failure:

% var=""; hex-var var; hex-doors $var
--- var ---------------------
''

-----------------------------


hex-var correctly shows that var contains the empty string, but hex-doors sees nothing at the front door. The "problem" is that Zsh entirely drops an unquoted $var if it expands to an empty string. Thus, in this case, hex-doors is called with no arguments instead of with a single empty string.

You can fix the "problem" by quoting $var; hex-doors then also sees the empty string:

% var=""; hex-var var; hex-doors "$var"

--- var ---------------------
''

-----------------------------

--- Front door --------------
''

-----------------------------
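
The difference between the unquoted and the quoted call can also be checked without any hex dump, simply by counting the arguments a function receives. The helper nargs below is hypothetical (it is not part of hex-var or hex-doors); it is only a minimal sketch that prints $#:

```shell
# nargs is a hypothetical helper: it prints how many arguments it received.
nargs() { echo $#; }

var=""
nargs $var      # prints 0: the unquoted empty expansion is dropped entirely
nargs "$var"    # prints 1: the quoted empty string is passed as one argument
```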


Here is another example where $var fails to pass the right value(s) to hex-doors:

% var=(aa "" bb cc); hex-var var; hex-doors $var
--- var ---------------------
aa
''
bb
cc

-----------------------------

0000000    61  61                                                        
           a   a                                                        
0000002
0000000    62  62                                                        
           b   b                                                        
0000002
0000000    63  63                                                        
           c   c                                                        
0000002
--- Front door --------------
aa
bb
cc

-----------------------------

0000000    61  61                                                        
           a   a                                                        
0000002
0000000    62  62                                                        
           b   b                                                        
0000002
0000000    63  63                                                        
           c   c                                                        
0000002


Again, hex-doors doesn't see the empty string. The "problem" is again that when Zsh expands an unquoted $var and replaces it by its values, it drops any empty string. Thus, it calls hex-doors with 3 strings, "aa", "bb", and "cc", instead of calling it with the 4 values stored inside var. If we apply the same trick as above and quote $var, we still don't get the right result:

% var=(aa "" bb cc); hex-var var; hex-doors "$var"
--- var ---------------------
aa
''
bb
cc

-----------------------------

0000000    61  61                                                        
           a   a                                                        
0000002
0000000    62  62                                                        
           b   b                                                        
0000002
0000000    63  63                                                        
           c   c                                                        
0000002
--- Front door --------------
'aa  bb cc'

-----------------------------

0000000    61  61  20  20  62  62  20  63  63                            
           a   a           b   b       c   c                            
0000009


Here you can see that the empty string was taken into account, because there are two spaces between aa and bb but only one between bb and cc. Unfortunately, all the values were jammed together into a single string. That's because, by default, double-quoting an array expansion joins its values with spaces between them (more precisely, with the first character of IFS, which is a space by default). To prevent this joining, we can use the (@) expansion flag, which tells Zsh to keep array values separate:

% var=(aa "" bb cc); hex-var var; hex-doors "${(@)var}"
--- var ---------------------
aa
''
bb
cc

-----------------------------

0000000    61  61                                                        
           a   a                                                        
0000002
0000000    62  62                                                        
           b   b                                                        
0000002
0000000    63  63                                                        
           c   c                                                        
0000002
--- Front door --------------
aa
''
bb
cc

-----------------------------

0000000    61  61                                                        
           a   a                                                        
0000002
0000000    62  62                                                        
           b   b                                                        
0000002
0000000    63  63                                                        
           c   c                                                        
0000002


Now, hex-doors finally sees the same as hex-var. And the good thing is that the same expression also works for the string variable:

% var=""; hex-var var; hex-doors "${(@)var}"
--- var ---------------------
''

-----------------------------

--- Front door --------------
''

-----------------------------


What this shows is that you can indeed use hex-doors to see what's inside a variable, but it requires being very careful about how you pass that variable's value to hex-doors. If your aim is to see what's inside a variable, it's much more reliable to pass its name to hex-var.

Now, to see whether you better understand what's going on, try to guess the result of the following three calls to hex-doors:

% var=(aa "" bb cc)
% hex-doors foo${var}bar
% hex-doors "foo${var}bar"
% hex-doors "foo${(@)var}bar"

> If it's at the front door the splitting is retained.

It all depends on how you refer to the variable. With $var, array values remain separate values but empty strings are lost; with "$var", array values are joined together; only with "${(@)var}" (or "${var[@]}") are all array values, including empty strings, kept as separate values.
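
The forms can be compared side by side by counting the arguments each one produces. This is only a sketch: nargs is a hypothetical helper that prints $#, and the counts assume default Zsh options.

```shell
# zsh, default options assumed
# nargs is a hypothetical helper: it prints how many arguments it received.
nargs() { echo $#; }

var=(aa "" bb cc)
nargs $var            # prints 3: separate values, empty string dropped
nargs "$var"          # prints 1: values joined into a single string
nargs "${(@)var}"     # prints 4: separate values, empty string kept
nargs "${var[@]}"     # prints 4: same as the previous line
```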

> It really is impossible to know from the pipe version where the input
> array was split, the information no longer exists.

Indeed, in a similar way, I can't tell whether you typed your reply with your hands, or with your feet, whether you used some voice-to-text tool, or whether it was your cat that randomly jumped on your keyboard. The only thing I see is the resulting mail.


> Still, there is an ambiguity:
>
> % var=("a b" c$'\n''d e f'' ''g h' ij)
> % var2=( ${(f)var} )
>
> It seems that the splitting char is always removed, thus one can't look
> at $var2 and determine that it was created via split on newlines,

Obviously not. What you have to understand is that var2 doesn't store the literal expression ${(f)var}. It stores the result of its expansion.

> one
> might just as well conclude that it was split on spaces, with a space
> between 'c' and 'd' instead of a newline.

Indeed, there are plenty of ways to generate the same result. Here are 6 different ways of initializing var2 to the same values, namely "aa", "bb", and "cc":

var2=( aa bb cc )                         # Explicit values
var1=( aa bb cc ); var2=( $var1 )         # Expansion of array variable
var1=$'aa\nbb\ncc'; var2=( ${(f)var1} )   # Expansion of string variable split by newlines
var1="aa bb cc"; var2=( ${(s: :)var1} )   # Expansion of string variable split by spaces
var2=( $(echo "aa bb cc") )               # String printed and split by spaces (and by all other characters present in IFS)
var2=( aa $(echo "bb cc") )               # Mix of explicit values and split of printed string
 
> This seems to me an
> imperfection, tho obviously one would have to go back to the 70s to have
> done anything about it.  In the interests of perfect information, I
> wonder if there's any way that even the 'od' output could inform us that
> the output of $var2 is an array split on newlines?  I suspect that 'od'
> has no way of knowing.

How should that work? How could var2 remember in which of the 6 ways listed above it was initialized? Where should that information be stored? And what for?

I suspect that some of your confusion comes from C. In C, when you call a function with a variable, as in foo(var), the function foo receives exactly one argument, which contains the value of var. In Zsh, the equivalent foo $var works very differently. The expression $var is expanded first, before foo is called, and foo is then called with the values resulting from that expansion. As we have seen above, depending on the nature and content of the variable var, the expansion of $var may lead to zero, one, or more values, which translate into as many arguments for foo. When foo is finally called, it only sees the result of the expansion of $var. It can't know whether its arguments were provided as explicit values, are the result of the expansion of a variable, come from a command substitution, or were produced in some other fashion.

Philippe


