Zsh Mailing List Archive

Re: more splitting





On 2026-04-15 15:10, Mark J. Reed wrote:
On Wed, Apr 15, 2026 at 5:18 PM Ray Andrews <rayandrews@xxxxxxxxxxx> wrote:
I mean that arrays might have been designed to carry a separator character between elements.  ASCII FF maybe.  So when piped, reconstructing an array would be easier. 

Arrays having a built-in separator character is an intriguing notion, but I don't know of any programming language that has such a construct.
It's only a point of mild interest Mark, nothing worth spending any time on.  I remember that a C array can be passed and accessed as a single unit, it never loses its ability to be indexed.  I'm guessing that when the first shells were written there were no arrays, just strings, and then the array idea was implemented by using multiple strings.  Dunno, it is what it is.  Elements might be nul terminated and the whole array implemented as a count of a certain number of nul terminated strings.  Anyway a shell is a very different kind of thing.  Point is that the functionality is there, not how it's implemented.  Bart's idea of passing the name of the array is entirely satisfactory.
 It's just easier to specify the separator when you request the operation that needs it (usually called join). When you use $array somewhere that wants a string, you're effectively joining the elements together with spaces. When you do print -l $array you're joining the elements together with newlines and printing the result. But you can turn an array into a string with any delimiter you like between elements using the j expansion flag - short for join.

Pipes in PowerShell may carry structured data,
Yeah, that's what I was waving a stick at.
 but pipes in Zsh are undecorated bog-standard UNIX pipes as presented by the OS, and those just carry bytes. They have no idea of arrays or variables in general. And that goes not just for shell pipes but network sockets, which is why so much modern software tends to spend (arguably waste) a lot of time serializing and deserializing data (turning it into streams of bytes and back, often in the form of JSON). Your goal when writing software should be to avoid serialization as much as possible - deserialize your input if needed, serialize your output if needed, but in between you should try to keep everything in its natural unserialized state.
Sure.  Before this I'd used pipes a few times without any issues, but the input was indeed serial data so no trouble.
 
% print -n $var
a b c
d e f g h ij#   # What's the hash for?

I'm guessing you have PROMPT_EOL_MARK set to "#"? 
Nope, it's unset.  Never heard of it.
When the output of a command doesn't end with a newline, Zsh adds one anyway before printing the next prompt, 
Whatever, just curious.  I doubt it's anything I have to worry about.


An unquoted newline is treated the same as a space or a tab - or a vertical tab, probably. Space is space.
Ok, good to know. 

% var2=( ${(f)var} ); hex var2

'a b c'
'd e f g h ij'

... love it.  Split on newline.  All spaces preserved. One thing tho: when $var is created, would not the splitting spaces vanish? But we see the space between 'b' and 'c' remains in var2.  Likewise the space between 'h' and 'i' -- how are they retained? Or are they put back for display? 
 
You used ${(f)var}. (f) means "split the contents of this variable on newlines". Since the variable is an array in this case, the only way to get something splittable is to first turn it into a string.  
I thought so.  All previous splitting abandoned first.  
And any time you're treating an array as if it were a single string - even as an intermediate step in an operation like the (f) expansion - the default translation puts a space between elements.
As I conjectured -- it's put back.  

... so typeset output very close to original variable creation keystrokes.  Better! Running output of typeset -p to create a new array, it's identical to the original.  :-)  Typeset edits my input string while retaining the product exactly.  Love it.  IOW, typeset returns what I should have typed in the first place.  That explains the movement of the dollar too, which has always baffled me. 

Quotation marks in the shell are not word delimiters, so you can change between quotation mark types in the middle of a word.  "Li"'ke'$' 'this. That's what you were doing - switching to ANSI quotes long enough to include the newline. But it's usually clearer to pick a type and stick with it.
Yeah, it replaced my double quotes with single quotes because the former were superfluous.  I have no problem with that.
 I like to use the "least powerful" form I can get away with - single quotes if I don't need any expansion, ANSI quotes (that's the $'...') if I need control characters (which I can enter as backslash escapes) or a literal apostrophe in the string, double quotes if I need any variable expansion.
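
A short zsh sketch of the three quoting forms and of switching between them inside one word. The variable names and strings are invented for illustration.

```shell
# zsh: hypothetical values
name=world

single='no $expansion here'            # single quotes: everything is literal
ansi=$'col1\tcol2 and it\'s quoted'    # ANSI quotes: backslash escapes are processed
double="hello, $name"                  # double quotes: parameter expansion happens

# Quotation marks are not word delimiters, so this is a single word
mixed="Li"'ke'$' 'this

print -r -- $single   # no $expansion here
print -r -- $double   # hello, world
print -r -- $mixed    # Like this
```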

So a mad scientist could take the output from typeset -p, pipe that string and use it to create a new var downstream of the pipe!  

Yes, but you should avoid trying to parse shell code in shell. Use eval if you have to, but it's better to avoid doing anything like that in the first place.
It's not something I'd contemplate, just a remark.  Man!  I've been around for maybe 10 years and all this time I've been basically witless as to splitting -- ad hoc solutions, mostly got here, with no deep understanding.  But I dare say I've basically got it -- at least conceptually.

Tx. all for the instruction and the patience.


--
Mark J. Reed <markjreed@xxxxxxxxx>


