An alternative to variable hopping

Bart's example of factoring "typeset -n ref" out of nested scopes made me realize that there is an alternative to variable hopping that is both more sensible (if you consider named references as a kind of pointer) and more useful.

What do I mean by variable hopping?

Consider the following example:

function f1() {
  typeset -n ref
  typeset var=var-in-$0
  function f2() {
    function f3() {
      typeset var=var-in-$0
      function f4() {
        typeset var=var-in-$0
        ref=var
        echo "$0: ref=$ref"
      }
      f4
      echo "$0: ref=$ref"
    }
    f3
    echo "$0: ref=$ref"
    typeset var=var-in-$0
    echo "$0: ref=$ref"
  }
  f2
  typeset var=var-in-$0
  echo "$0: ref=$ref"
}
f1

Output with the current implementation:

f4: ref=var-in-f4
f3: ref=var-in-f3
f2: ref=var-in-f1
f2: ref=var-in-f2
f1: ref=var-in-f1

Output with all-the-way-up:

f4: ref=var-in-f4
f3: ref=var-in-f3
f2: ref=var-in-f1
f2: ref=var-in-f1
f1: ref=var-in-f1

The only difference between the two implementations is how the reference behaves once the first referred variable ceases to exist when "f4" exits. With all-the-way-up, the reference is stickier. Technically, with the current implementation, after "f4" exits, "ref" becomes a floating/loose reference that is successively tied to the scopes of "f3", "f2", and "f1". With all-the-way-up, "ref" remains a sticky reference that successively sticks to the "var" in "f3" and then to the "var" in "f1".

Ignoring these differences, both implementations exhibit variable hopping, namely successive rebinds to different variables when nested scopes are exited and/or when new variables with a matching name are defined.

The implementation of named references simply remembers the name of the variable with which the reference is initialized ("var" in the example above) and then, whenever the reference is accessed, searches for a variable with that name starting from a specific scope that may be different from the scope where the access takes place. If you understand this, then you also understand why the variable hopping happens; it's a direct consequence of the implementation.
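
To make the mechanism concrete, here is a minimal sketch that emulates name-based lookup in plain shell code that runs without named references. It is not the actual zsh implementation: "ref_name" and "deref" are made-up names for this illustration, and the sketch always searches from the scope where the access happens, whereas the real implementation may start from a different scope. It only shows that storing a name and resolving it on each access is enough to produce hopping.

```shell
# Emulation only: the "reference" is just a stored variable name that is
# resolved again on every access (ref_name and deref are hypothetical names).
ref_name=var

deref() {
  # Expand whatever variable currently carries the stored name;
  # expands to the empty string if no such variable is visible.
  eval "printf '%s\n' \"\${$ref_name-}\""
}

inner() {
  local var=var-in-inner
  echo "inner: $(deref)"
}

outer() {
  local var=var-in-outer
  inner
  echo "outer: $(deref)"
}

outer
# inner: var-in-inner
# outer: var-in-outer
```

The same stored name yields two different variables depending on which "var" is visible at the time of the access, which is the hopping described above.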

Understanding what triggers the variable hopping and how it works doesn't give it any meaning. That point has always bothered me. The variable hopping doesn't make much sense to me; it doesn't look useful. It looks much more like a side effect of the current implementation than like a desired language feature.

If you see named references as a kind of pointer, then when the referred variable ceases to exist because its scope is exited, the reference should simply stop referring to anything. This could be achieved by returning the reference to an uninitialized state. Afaict, implementing this behavior should be as simple as implementing the current one. The example above would then produce the following output:

f4: ref=var-in-f4
f3: ref=
f2: ref=
f2: ref=
f1: ref=

Bart mentioned that one may sometimes want to factor "typeset -n ref" out of nested scopes. The following example is a case where that works with the current implementation.

() {
  local -n ref;
  () { local A=a B=b; for ref in A B; do echo "ref=$ref"; done }
  () { local X=x Y=y; for ref in X Y; do echo "ref=$ref"; done }
}

Output:

ref=a
ref=b
ref=x
ref=y

I pointed out that this works only thanks to the special behavior of the "for" construct. The following example doesn't work with the current implementation:

() {
  local -n ref;
  () { local A=a; ref=A; echo "ref=$ref" }
  () { local X=x; ref=X; echo "ref=$ref" }
}

Output:

ref=a
ref=X

With the current implementation, "ref" still refers to the name "A" when the second anonymous function runs, so "ref=X" defines a global variable "A" and initializes it with "X". One might instead have expected it to (re)initialize "ref" to refer to "X", as in the example with "for".

This example made me realize that an alternative approach to references to nested variables that cease to exist is to return the reference to an uninitialized state. With that approach, the example above would work. Much more importantly, that approach is, in my opinion, much more sensible, especially if you consider named references as a kind of pointer and ignore, like most users, how named references are implemented under the hood.

I mentioned at the beginning that the alternative approach is also more useful than the current one. The example above is one where the alternative is more likely to do what the user wants than the current implementation, but it looks like a corner case that's not necessarily very relevant. The claimed greater usefulness actually comes from how references to not yet defined variables would be handled.

Consider the following example:

function f1() {
  typeset -n ref=var
  function f2() {
    typeset var=var-in-$0
    echo "$0: ref=$ref"
  }
  f2
  function g2() {
    function g3() {
      typeset var=var-in-$0
      echo "$0: ref=$ref"
    }
    g3
    typeset var=var-in-$0
    echo "$0: ref=$ref"
  }
  g2
  typeset var=var-in-$0
  echo "$0: ref=$ref"
}
f1

Output with the current implementation and with all-the-way-up:

f2: ref=var-in-f2
g3: ref=var-in-g3
g2: ref=var-in-g2
f1: ref=var-in-f1

Here again there is considerable variable hopping. This jumping around makes very little sense if you consider named references as a kind of pointer. One could maybe claim that "typeset -n ref=var" was factored out of the various nested scopes, but I find this very dubious.

To me the current behavior of references to not yet defined variables makes little sense and looks more like a side effect of the underlying implementation than a desired language feature.

More importantly, the current behavior is detrimental in cases that may not be that uncommon if usage of named references picks up. Consider the following example where the named reference "ref_to_var" is passed by name to some library function "libf".

function libf() {
  typeset -nu ref=$1
  typeset var=var-in-libf
  echo ref=$ref
}
function main() {
  typeset -n ref_to_var=var
  libf ref_to_var
  typeset var=var-in-main
  libf ref_to_var
}
main

Output:

ref=var-in-libf
ref=var-in-main

Expected output:

ref=
ref=var-in-main

Because "libf" happens to define a variable named "var", the first call to libf produces an output that is arguably not the expected one.
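
This kind of accidental capture is the classic hazard of passing variables by name, and it can be reproduced without named references at all. The sketch below is only an analogy under that assumption, not a model of the zsh implementation: it passes a plain variable name and dereferences it with "eval", reusing the names "libf" and "main" purely for illustration.

```shell
# Analogy only: pass-by-name with eval instead of a named reference.
libf() {
  # A local that happens to share its name with the caller's variable.
  local var=var-in-libf
  # Dereference the name received in $1; the lookup runs inside libf,
  # so it finds libf's own "var" rather than the caller's.
  eval "echo \"value=\${$1-}\""
}

main() {
  local var=var-in-main
  libf var
}

main
# value=var-in-libf
```

The caller meant its own "var", but the library function's local of the same name wins because the name is resolved inside "libf".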

An alternative implementation of references to not yet defined variables is to limit their search to variables visible in the scope where the reference was defined (i.e., they are equivalent to a "typeset -nu0"). That implementation produces the expected output for the example above.

Put together, this gives the following specification for named references:

A named reference "ref" initialized with "var" refers to the variable that "var" referred to when "ref=var" was called. If that variable ceases to exist, "ref" returns to an uninitialized state (it can be reused by reinitializing it with a new variable name). If no variable "var" existed when "ref=var" was called, "ref" is set to refer to the first variable "var" that gets defined in the same scope as where "ref" was defined (or in the global scope).

Arguably the behavior for not yet defined variables is still not exactly what you would expect from a pointer, but at least these references now exhibit the same stability as pointers and never hop from one variable to another.

In my opinion, this specification is easier to understand than the current one. Named references behave much more like pointers and never jump around. Afaict, the implementation is as simple as the current one. In addition to that, named references passed by name work better (no risk of accidental variable capture).

Philippe