Yep, it looks like there are a large number of programs where zsh and ksh will differ even in the absence of named references. From what I understand, this includes most, if not all, non-trivial programs where variables of the same name are defined simultaneously in multiple scopes. Since "all the way up" only makes a difference for such programs, I don't think ksh compatibility should be a criterion for deciding whether to adopt it. On the other hand, "all the way up" offers multiple benefits. One of them is a reasonably simple specification that accurately describes how named references behave in all contexts. Another is that it prevents local changes (like defining a new nested local variable) from modifying how the program behaves in unrelated places. For these reasons, I strongly believe that we should adopt it.
I'm still working on my second doc and will probably need a few more days to finish it. Because I was replying to various threads and investigating how ksh works, I made less progress than I had hoped, but that's not necessarily a bad thing. It opened up a number of new perspectives and, for example, made me realize that a much nicer specification than the one in my first doc was possible.
I'm thinking of including in my second doc a discussion of which ksh programs we should consider when aiming for compatibility, and of whether we are actually compatible with them. This, and a number of other things, could help decide questions like whether to adopt "all the way up", even if none of the proposed changes end up being adopted. So I would suggest waiting a little before making more changes. I don't yet know the outcome of the ksh investigation myself, as I have only just started on it.
In any case, I must say that the changes so far are already a big step forward, and the discussions helped me get a much better grasp of the problems at hand. Not so long ago I still had the impression that there was an almost infinite number of cases and combinations that needed to be understood and tested. I have now managed to consolidate all of that into something much simpler, which is reflected in the specification I have given in this thread.
Speaking of that specification, my initial version distinguished more cases until I noticed that it could be simplified, but I now realize that I simplified it a little too much.
At any point in time, if ref=var, then
- if var was locally defined in the current scope before ref was initialized, then ref refers to it,
- otherwise, ref refers to the first (strictly) enclosing var, if any, and otherwise to the local var, if any.
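The problem with the version above is that once ref has been defined and initialized to refer to some variable var, a new variable var may be defined in a nested scope. The reference keeps referring to the same variable, but when ref is then used from a scope nested inside that one, the specification above says that it refers to the newly defined variable. Below is the corrected specification; the changes are that the rules are now applied at the innermost scope that was already present when ref was initialized, and that the final fallback is the outermost var rather than the local one.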
At any point in time, if ref=var, then consider the innermost scope that was already present when ref was initialized to var,
- if var was locally defined in that scope before ref was initialized, then ref refers to it,
- otherwise, ref refers to the first (strictly) enclosing var at that scope, if any, and otherwise to the outermost var, if any.
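To make the corrected rules concrete, here is a toy Python model of them. It is only a sketch under my own assumptions, not zsh's implementation: scopes are a stack of dicts with the global scope at index 0, a logical clock orders variable definitions and reference initializations, and the names `Shell`, `Var`, `Ref`, `nameref` and `resolve` are hypothetical. The scenario at the end is the one described above: ref is initialized in f, g then defines a new var, and ref is used from h.

```python
# Toy model (hypothetical names), not zsh's implementation.
from dataclasses import dataclass
from itertools import count
from typing import Optional

_clock = count()  # logical clock ordering definitions and initializations


@dataclass
class Var:
    value: str
    defined_at: int  # clock tick at which the variable was defined


@dataclass
class Ref:
    target: str        # name of the referenced variable ("var" in ref=var)
    init_scopes: list  # the scope dicts that were live when ref was initialized
    init_time: int     # clock tick of the initialization


class Shell:
    """Toy scope stack: scopes[0] is the global scope, scopes[-1] the current one."""

    def __init__(self) -> None:
        self.scopes: list = [{}]

    def call(self) -> None:  # enter a function: push a fresh scope
        self.scopes.append({})

    def ret(self) -> None:  # leave a function: pop its scope
        self.scopes.pop()

    def local(self, name: str, value: str) -> None:
        self.scopes[-1][name] = Var(value, next(_clock))

    def nameref(self, target: str) -> Ref:
        return Ref(target, list(self.scopes), next(_clock))

    def resolve(self, ref: Ref) -> Optional[Var]:
        # S: innermost scope that was already present when ref was initialized
        # (the global scope is never popped, so s ends up >= 0).
        s = -1
        for now, then in zip(self.scopes, ref.init_scopes):
            if now is not then:
                break
            s += 1
        name = ref.target
        # 1. var was locally defined in S before ref was initialized
        var = self.scopes[s].get(name)
        if var is not None and var.defined_at < ref.init_time:
            return var
        # 2. first (strictly) enclosing var at S, searching all the way up
        for scope in reversed(self.scopes[:s]):
            if name in scope:
                return scope[name]
        # 3. otherwise the outermost var, i.e. the shallowest one at S or deeper
        for scope in self.scopes[s:]:
            if name in scope:
                return scope[name]
        return None


# Scenario from the text: ref is initialized in f, then g defines a new var.
sh = Shell()
sh.local("var", "global")
sh.call()                     # enter f, which has no local var
ref = sh.nameref("var")       # like: typeset -n ref=var
sh.call()                     # enter g
sh.local("var", "g")          # new nested var, defined after initialization
sh.call()                     # enter h
print(sh.resolve(ref).value)  # prints "global"
```

With the first version of the specification, the lookup from h would instead have started at the current scope and found g's var.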
For -u named references, where -u is a property of the reference, I had the following specification:
At any point in time, if ref=var and ref was defined with -u, then
- ref refers to the first (strictly) enclosing var, if any, and otherwise refers to the local var, if any.
Here is the corrected one:
At any point in time, if ref=var and ref was defined with -u, then consider the innermost scope that was already present when ref was initialized to var,
- ref refers to the first (strictly) enclosing var at that scope, if any, and otherwise refers to the outermost var, if any.
Btw, the algorithm that achieves this is the same as for regular named references, except that the "all the way up" search (and the initial look) starts one scope up rather than at the current scope.
Since the raison d'être of the -u option is to support passing variables by name, I'm becoming convinced that -u is a property of the reference. And not only that: I'm now also convinced that the lookup should be performed relative to the scope where the reference is defined, not relative to the scope where it is initialized.
When someone writes "typeset -n -u ref=$1" at the start of their function, it clearly states that the aim is to access a variable defined in the scope of the calling function. This should still be true if the initialization is delegated to a nested function, for example because one or more non-trivial references need to be set up. Imagine, for example, a library implementing some data structure as a set of functions that each expect an array and a number of indexes passed by name, and that set up a number of named references to do their work. Instead of repeating the non-trivial reference initialization in each function, it could make sense to move that code into a shared helper function. That only works if the lookups are still performed relative to where the references were defined.
Lastly, again because -u is intended to give access to a variable in the caller's scope, I think that if no variable is found there, the reference should default to an invalid reference that expands to the empty string, or maybe even better to an error, and not, as it currently does, to whatever variable, if any, is found in the current scope. Note, btw, that this is what ksh does, at least when static scoping is in use, which is also what -u corresponds to.
Put together, this leads to the following specification:
At any point in time, if ref=var and ref was defined with -u, then
- ref refers to the first (strictly) enclosing var at the scope where ref was defined, if any.
This is super simple and, I think, exactly what we want for the -u use cases. Implementation-wise, the only difference from regular references is that if the lookup at the lookup scope S returns nothing, the reference resolves to nothing (rather than to the variable in the outermost scope that is deeper than S).
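Here is a sketch of this final rule, with hypothetical names (`UpperRef`, `resolve_upper`) and the simplifying assumption that the reference remembers the scope in which it was defined. It replays the shared-helper scenario: the reference is defined in func, initialized by a helper, and still resolves to the caller's variable; a name that only func defines resolves to nothing.

```python
# Toy model (hypothetical names), not zsh's implementation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpperRef:
    target: str       # name the reference is (eventually) initialized to
    defined_in: dict  # scope in which the -u reference was defined


def resolve_upper(scopes: list, ref: UpperRef) -> Optional[str]:
    # The defining scope is live whenever ref is; find its depth by identity.
    d = next(i for i, scope in enumerate(scopes) if scope is ref.defined_in)
    # First (strictly) enclosing var at the defining scope, all the way up.
    for scope in reversed(scopes[:d]):
        if ref.target in scope:
            return scope[ref.target]
    return None  # no fallback to the defining scope or deeper ones


caller = {"data": "caller"}
func = {"data": "func", "tmp": "func"}  # func has same-named locals
scopes = [caller, func]
ref = UpperRef("", defined_in=func)     # -u reference defined in func, not yet initialized
scopes.append({"data": "helper"})       # func calls a shared helper...
ref.target = "data"                     # ...which initializes ref=$1 with $1 = data
print(resolve_upper(scopes, ref))       # prints "caller"
scopes.pop()                            # helper returns
print(resolve_upper(scopes, ref))       # prints "caller"
print(resolve_upper(scopes, UpperRef("tmp", defined_in=func)))  # prints "None"
```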
Philippe