Zsh Mailing List Archive
RE: Zsh rewritten in Ruby for OpenBSD 8.0+
On Mon, 1 Apr 2024 05:16:47 +0200, Trent Acklez <trentacklez@xxxxxxxxx> wrote:
> Hi,
>
> Just some concept art:
> https://www.tiktok.com/@openbsd_fan_club/video/7375736163211939104
>
> Wouldn't it be nice to have ksh with zsh's beautiful syntax? Let me know
> what you think!
Oh dear, looks like ntpd(8) threw me two months forward!
--zeurkous.
P.S.: amidst the confusion me'll sneak in today's quickly-written
report. (Discussion off-list.)
>
> --Trent
>
Three flaws in UNIX environment handling
by De Zeurkous
0. Abstract
UNIX programs, when executed, carry a special kind of argument
array called the "environment", that is also passed to child
programs, typically unmodified. While this largely transparent
mechanism is functional, in this form it has several major flaws,
three of which are outlined in this paper. Remedies are also proposed,
as well as an alternative recommendation for a future system.
1. Introduction
UNIX programs are normally invoked with an array of a special kind of
argument; this array is called the /environment/. This environment,
unlike the proper arguments, takes the form of an array of "key=value"
pairs that is also expected to be propagated, possibly modified, to
child programs.
The environment is normally carried silently from program to program;
several interfaces, including a set of library routines -- the
getenv(3) and setenv(3) family -- and dedicated syntax in the shell,
can be used to query and manipulate it.
Standard environment variables include--
HOME the path to the user's home directory
PATH a colon-separated list of paths to directories with executable
programs
TERM the name of the type of the user's current terminal
As an example, the latter variable may appear in the environment as--
TERM=vt100
(This would indicate that the relevant terminal is a DEC VT100, or, as
in most contemporary cases, one that is mostly compatible.)
Note that while the standard variable names are invariably in
uppercase, this is merely a convention and not a requirement;
especially in shell usage, lowercase and mixed-case variable names
regularly occur.
2. Environment structure
Internally, the environment is kept by each program as an array of
addresses of zero-terminated strings containing "key=value" pairs;
a null address terminates the array. In C syntax, it is declared as
follows--
extern char **environ;
This variable "environ", which specifies the address of the array, is
typically initialized before control is passed to the program proper.
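As an illustration of this layout, the following minimal sketch walks
such an array up to its null terminator and looks up a variable by
name. The helper names "env_count" and "env_lookup" are hypothetical
and are not part of any standard library--

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

extern char **environ;

/* Count the entries of a null-terminated environment array. */
size_t env_count(char **env)
{
    size_t n = 0;

    assert(env != NULL);
    while (env[n] != NULL)
        n++;
    return n;
}

/* Return the value stored under "name", or NULL if it is absent. */
const char *env_lookup(char **env, const char *name)
{
    size_t len = strlen(name);
    size_t i;

    assert(env != NULL);
    for (i = 0; env[i] != NULL; i++) {
        /* The key must match in full and be followed by '='. */
        if (strncmp(env[i], name, len) == 0 && env[i][len] == '=')
            return env[i] + len + 1;
    }
    return NULL;
}
```

Passing "environ" as the first argument applies either routine to the
environment of the running program.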
3. Environment manipulation
The possible manipulations of the environment include--
0) removing a variable from the environment;
1) adding a variable to the environment;
2) updating a variable already in the environment.
The setenv(3) family of standard library routines is provided with the
aim of automating these operations; they also perform some extra sanity
checking on their inputs.
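The three operations can be sketched with the POSIX routines
setenv(3), unsetenv(3) and getenv(3). The variable name "DEMO_VAR" and
the helper names are made up for this illustration, and the final
routine assumes the POSIX-specified behaviour of rejecting a name
that contains an equals sign--

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* True if "name" is set and its value equals "expected". */
static int env_is(const char *name, const char *expected)
{
    const char *v;

    assert(name != NULL && expected != NULL);
    v = getenv(name);
    return v != NULL && strcmp(v, expected) == 0;
}

/* Exercise add, update and remove; return 0 if all behave as expected. */
int demo_env_ops(void)
{
    unsetenv("DEMO_VAR");                   /* start from a known state */

    /* 1) add; overwrite == 0 would leave an existing value alone */
    if (setenv("DEMO_VAR", "first", 0) != 0 || !env_is("DEMO_VAR", "first"))
        return -1;

    /* 2) update; overwrite != 0 replaces the existing value */
    if (setenv("DEMO_VAR", "second", 1) != 0 || !env_is("DEMO_VAR", "second"))
        return -1;

    /* 0) remove */
    if (unsetenv("DEMO_VAR") != 0)
        return -1;
    return getenv("DEMO_VAR") == NULL ? 0 : -1;
}

/* Sanity checking: a name containing '=' is expected to be refused. */
int rejects_bad_name(void)
{
    return setenv("BAD=NAME", "x", 1) == -1;
}
```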
4. The three flaws
4.0. Flaw zero
There is no standard way to keep track of the dynamicity of the
environment array.
When removing a variable from, or adding one to, the environment, it is
necessary to re-size the array. However, since it is not possible to
determine if the value of the variable "environ" has been updated
since the start of the program proper, the program must presume that
the environment array has been statically allocated, and thus that
a dynamic allocation has to be performed anew.
The result is a memory leak in which environment arrays may be
repeatedly allocated anew, discarding but not freeing the previous
one (because there is no way to ensure that it has been dynamically
allocated in the zeroth place).
The setenv(3) family of routines attempts to work around this by
comparing the environment address to a shadow copy in a private
variable; the area of effectiveness of this is however restricted
to that family of routines, and any change, of the environment
address, outside the control of those routines will thus still
lead to an instance of the same memory leak.
This flaw can be remedied by maintaining a simple boolean variable,
initialized to "false", that can be set to the "true" condition when
the address in "environ" points to a dynamic allocation of memory.
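A minimal sketch of this remedy follows. It operates on a private
structure rather than on the real "environ", and the names
"struct env_array", "env_len" and "env_append" are assumptions made
for the purpose of illustration--

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct env_array {
    char **vars;    /* null-terminated array of "key=value" strings */
    bool dynamic;   /* true once "vars" points to malloc'd memory */
};

/* Number of entries before the null terminator. */
size_t env_len(char **vars)
{
    size_t n = 0;

    while (vars[n] != NULL)
        n++;
    return n;
}

/* Append "entry"; return 0 on success, -1 on allocation failure. */
int env_append(struct env_array *e, char *entry)
{
    size_t n;
    char **grown;

    assert(e != NULL && e->vars != NULL && entry != NULL);
    n = env_len(e->vars);

    if (e->dynamic) {
        /* The array is known to be ours: re-size it, leaking nothing. */
        grown = realloc(e->vars, (n + 2) * sizeof *grown);
        if (grown == NULL)
            return -1;
    } else {
        /* Static array: copy it once and leave the original alone. */
        grown = malloc((n + 2) * sizeof *grown);
        if (grown == NULL)
            return -1;
        memcpy(grown, e->vars, n * sizeof *grown);
    }
    grown[n] = entry;
    grown[n + 1] = NULL;
    e->vars = grown;
    e->dynamic = true;
    return 0;
}
```

Only the zeroth append pays for a full copy; every later one re-sizes
the array that the flag marks as dynamically allocated.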
4.1. Flaw one
There is no standard way to keep track of the dynamicity of
environment variables.
When adding a variable to, or updating a variable already in, the
environment, the memory for the new variable may be
(and in practice often is) dynamically allocated. Thus, as dynamic
environment variables are removed and/or updated, the previous (old)
version may be plausibly freed upon completion. However, the latter
cannot happen in a standard manner since there is no standard way to
indicate that the allocation for the variable is dynamic; the removal
or update thus results in a leak of that allocated memory.
setenv(3), which composes a variable from separate name and value
strings, displays this exact problem by promptly forgetting about
the dynamic allocation of the variable once it has been put in the
environment.
This flaw can be remedied by maintaining a simple boolean variable per
environment variable, that indicates whether or not the memory for
that variable constitutes a dynamic allocation.
Another potential remedy is to make a complete copy, of every variable
string, to dynamically-allocated memory, upon initialization of the
dynamic environment array (see above).
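The per-variable flag can be sketched as follows. The structure
"struct env_var" and the routines "env_var_set" and "env_var_clear"
are hypothetical names; the sketch merely shows how the flag permits
the old string to be freed on update or removal--

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct env_var {
    char *string;   /* "key=value" */
    bool dynamic;   /* true if "string" was obtained from malloc */
};

/* Replace the string of "v" by a freshly composed "name=value". */
int env_var_set(struct env_var *v, const char *name, const char *value)
{
    size_t nlen, vlen;
    char *s;

    assert(v != NULL && name != NULL && value != NULL);
    nlen = strlen(name);
    vlen = strlen(value);
    s = malloc(nlen + 1 + vlen + 1);
    if (s == NULL)
        return -1;
    memcpy(s, name, nlen);
    s[nlen] = '=';
    memcpy(s + nlen + 1, value, vlen + 1);  /* includes the terminator */

    if (v->dynamic)
        free(v->string);    /* safe: the flag says it is ours */
    v->string = s;
    v->dynamic = true;
    return 0;
}

/* Remove the variable, freeing its string only if it is dynamic. */
void env_var_clear(struct env_var *v)
{
    assert(v != NULL);
    if (v->dynamic)
        free(v->string);
    v->string = NULL;
    v->dynamic = false;
}
```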
4.2. Flaw two
Programs may peek at each other's environments, but this mechanism
lacks synchronization.
It is possible for a program to take an interest in, amongst other
things, the current environment of a running program on the system,
and have this interest satisfied by the provision of that data by
the kernel.
The kernel has to get this data from somewhere, and that somewhere
is the recorded location of the "environ" variable in the target
program's memory. However, there is no guarantee that, during this
operation, the target program is not in the middle of an update of
its environment. Depending on the algorithm used by the target
program, the possible nefarious results include--
0) duplicate variables being retrieved;
1) not all variables being retrieved;
2) outdated variables being retrieved; and, worst of all:
3) the terminator not being found and the contents of unrelated
memory thus retrieved.
The author has detected no attempt to work around this problem.
This flaw can be remedied by not only taking the greatest care in
manipulating the environment array, but also maintaining a simple
boolean variable per environment array member, to indicate whether
or not other programs should ignore the relevant entry when peeking at
it.
An alternative remedy is to always make a new, manipulated environment
array, and to ensure that the resulting singular update of the
variable "environ" is atomic.
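The alternative remedy might look like the sketch below, which uses
C11 atomics on a private pointer standing in for "environ". The names
"current_env" and "env_publish_with" are assumptions. Whether an
atomic store in the target program suffices for a reader in the
kernel depends on the system, so this shows only the copy-then-swap
pattern--

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for "environ"; must be set before env_publish_with() is used. */
_Atomic(char **) current_env;

/* Build a new array with "entry" appended, then publish it in one store. */
int env_publish_with(char *entry)
{
    char **old = atomic_load_explicit(&current_env, memory_order_acquire);
    char **fresh;
    size_t n = 0;

    assert(old != NULL && entry != NULL);
    while (old[n] != NULL)
        n++;

    fresh = malloc((n + 2) * sizeof *fresh);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, old, n * sizeof *fresh);
    fresh[n] = entry;
    fresh[n + 1] = NULL;

    /* The new array is complete before it becomes visible. */
    atomic_store_explicit(&current_env, fresh, memory_order_release);

    /* The old array is left untouched: a reader may still be walking it. */
    return 0;
}
```

An observer thus sees either the complete old array or the complete
new one. Note that leaving the old array in place brings back the
question of when it may be freed, as discussed under flaw zero.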
5. Recommendation for the future
In a future system, it may be best to implement the environment as
a linked list, with a proposed entry structure as follows (in C
syntax)--
bool valid;
char* string;
Or, alternatively, separating the "key=value" strings--
bool valid;
char* name;
char* value;
The latter has the advantage that the use of the equals sign ("=")
separator would be pure syntax, as it is in C, and names would thus
no longer be prevented from including the equals sign as a character.
Another advantage of the {name, value} separation is that the string
"name" can be preserved on updates, thus eliminating the string
composition as hidden by the routine setenv(3).
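A sketch of the {name, value} variant as a singly linked list
follows. The "next" member and the routines "env_find" and
"env_update" are additions made for illustration, and the sketch
assumes that every "value" string is dynamically allocated. The plain
"valid" flag only indicates intent; by itself it is not a
synchronization primitive--

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct env_entry {
    bool valid;                 /* false while the entry is in flux */
    char *name;                 /* may contain '=' in this scheme */
    char *value;                /* assumed to come from malloc */
    struct env_entry *next;     /* hypothetical link member */
};

/* Find the valid entry called "name", or return NULL. */
struct env_entry *env_find(struct env_entry *head, const char *name)
{
    struct env_entry *e;

    assert(name != NULL);
    for (e = head; e != NULL; e = e->next) {
        if (e->valid && strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}

/* Replace the value only; the name string is preserved as-is. */
int env_update(struct env_entry *e, const char *value)
{
    size_t len;
    char *copy, *old;

    assert(e != NULL && value != NULL);
    len = strlen(value) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, len);

    e->valid = false;           /* ask observers to skip this entry */
    old = e->value;
    e->value = copy;
    e->valid = true;
    free(old);
    return 0;
}
```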
6. Conclusion
The current environment argument mechanism, while functional, has
several major flaws resulting in potential memory leaks and
inconsistent state. Remedies are possible through the implementation
of additional mechanisms, yet for a future system better-designed
alternatives deserve recommendation.
[Last updated at Sun Jun 2 02:54:28 UTC 2024 by De Zeurkous.]
--
Friggin' Machines!