Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author
RE: Zsh rewritten in Ruby for OpenBSD 8.0+

X-seq: zsh-users 29977
From: zeurkous@xxxxxxxxxxxxxxx
To: Trent Acklez <trentacklez@xxxxxxxxx>, zsh-users@xxxxxxx
Subject: RE: Zsh rewritten in Ruby for OpenBSD 8.0+
Date: Sun, 02 Jun 2024 03:27:05 +0000 (UTC)
Archived-at: <https://zsh.org/users/29977>
In-reply-to: <CAOwzMra3u86inWC0Gt_6bx1Jh0Jq+Fh1Xume+opyVMcQoeACOQ@mail.gmail.com>
List-id: <zsh-users.zsh.org>
References: <CAOwzMra3u86inWC0Gt_6bx1Jh0Jq+Fh1Xume+opyVMcQoeACOQ@mail.gmail.com>
On Mon, 1 Apr 2024 05:16:47 +0200, Trent Acklez <trentacklez@xxxxxxxxx> wrote:
> Hi,
>
> Just some concept art:
> https://www.tiktok.com/@openbsd_fan_club/video/7375736163211939104
>
> Wouldn't it be nice to have ksh with zsh's beautiful syntax? Let me know
> what you think!

Oh dear, looks like ntpd(8) threw me two months forward!
       --zeurkous.

P.S.: amidst the confusion me'll sneak in today's quickly-written
      report. (Discussion off-list.)

>
> --Trent
>

Three flaws in UNIX environment handling

 by De Zeurkous

0. Abstract

 UNIX programs, when executed, carry a special kind of argument
 array called the "environment", that is also passed to child
 programs, typically unmodified. While this largely transparent
 mechanism is functional, in this form it has several major flaws,
 three of which are outlined in this paper. Remedies are also proposed,
 as well as an alternative recommendation for a future system.

1. Introduction

 UNIX programs are normally invoked with an array of a special kind of
 argument; this array is called the /environment/. This environment,
 unlike the proper arguments, takes the form of an array of "key=value"
 pairs that is also expected to be propagated, possibly modified, to
 child programs.

 The environment is normally carried silently from program to program;
 several interfaces, including a set of library routines -- the
 getenv(3) and setenv(3) family -- and dedicated syntax in the shell,
 can be used to query and manipulate it.

 Standard environment variables include--

  HOME   the path to the user's home directory
  PATH   a colon-separated list of paths to directories with executable
          programs
  TERM   the name of the type of the user's current terminal 

 As an example, the latter variable may appear in the environment as--

  TERM=vt100

 (This would indicate that the relevant terminal is a DEC VT100, or, as
  in most contemporary cases, one that is mostly compatible.)

 Note that while the standard variable names are invariably in
 uppercase, this is merely a convention and not a requirement;
 especially in shell usage, lowercase and mixed-case variable names
 regularly occur.

2. Environment structure

 Internally, the environment is kept by each program as an array of
 addresses of zero-terminated strings containing "key=value" pairs;
 a null address terminates the array. In C syntax, it is declared as
 follows--

  extern char **environ;

 This variable "environ", which specifies the address of the array, is
 typically initialized before control is passed to the program proper.
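
 As an illustration, the array can be walked directly up to its
 terminating null address. The following is a minimal sketch; the
 helper names env_count() and env_lookup() are hypothetical and belong
 to no standard library (getenv(3) already provides the lookup).

```c
#include <stddef.h>
#include <string.h>

extern char **environ;

/* Count the entries by walking the array up to the null terminator. */
size_t env_count(void)
{
    size_t n = 0;

    if (environ != NULL)
        while (environ[n] != NULL)
            n++;
    return n;
}

/* Hypothetical helper: return the address of the value part of the
 * "name=value" entry for `name`, or NULL if there is no such entry. */
const char *env_lookup(const char *name)
{
    size_t len = strlen(name);

    if (environ == NULL)
        return NULL;
    for (char **p = environ; *p != NULL; p++) {
        /* The name must match in full and be followed by '='. */
        if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return *p + len + 1;
    }
    return NULL;
}
```

 Note that the returned address points into the environment itself, so
 it is only good until the relevant entry is next changed.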

3. Environment manipulation

 The possible manipulations of the environment include--

  0) removing a variable from the environment;
  1) adding a variable to the environment;
  2) updating a variable already in the environment.

 The setenv(3) family of standard library routines is provided with the
 aim of automating these operations; they also perform some extra sanity
 checking on their inputs.
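
 The three manipulations can be sketched with the standard setenv(3)
 and unsetenv(3) routines as below. The function name env_demo() and
 the variable name DEMO_VAR are made up for illustration; the last step
 shows one of the sanity checks, namely that a name containing an
 equals sign is refused.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Exercise the three manipulations on a hypothetical variable.
 * Returns 0 if every step behaved as expected, -1 otherwise. */
int env_demo(void)
{
    /* 1) add: with overwrite == 0 an existing value is left alone */
    if (setenv("DEMO_VAR", "first", 0) != 0)
        return -1;

    /* 2) update: with overwrite != 0 the value is replaced */
    if (setenv("DEMO_VAR", "second", 1) != 0)
        return -1;

    /* 0) remove */
    if (unsetenv("DEMO_VAR") != 0)
        return -1;

    /* sanity check: a name containing '=' must be rejected */
    if (setenv("BAD=NAME", "x", 1) == 0)
        return -1;

    return 0;
}
```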

4. The three flaws

 4.0. Flaw zero

  There is no standard way to keep track of the dynamicity of the
  environment array.

  When removing a variable from, or adding one to, the environment, it
  is necessary to resize the array. However, since it is not possible
  to determine whether the value of the variable "environ" has been
  updated since the start of the program proper, the program must
  presume that the environment array has been statically allocated, and
  thus that a dynamic allocation has to be performed anew.

  The result is a memory leak in which environment arrays may be
  repeatedly allocated anew, discarding but not freeing the previous
  one (because there is no way to ensure that it has been dynamically
  allocated in the zeroth place).

  The setenv(3) family of routines attempts to work around this by
  comparing the environment address to a shadow copy in a private
  variable; the effectiveness of this is, however, restricted to that
  family of routines, and any change of the environment address outside
  the control of those routines will thus still lead to an instance of
  the same memory leak.

  This flaw can be remedied by maintaining a simple boolean variable,
  initialized to "false", that can be set to the "true" condition when
  the address in "environ" points to a dynamic allocation of memory.
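
  A minimal sketch of this remedy follows. The flag environ_is_dynamic
  and the routine env_append() are hypothetical names; the sketch
  assumes that every piece of code that assigns to "environ" also
  maintains the flag, which is exactly what no current standard
  provides.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical flag: true once "environ" points to memory obtained
 * from malloc(3)/realloc(3) by this code. */
static bool environ_is_dynamic = false;

/* Append one "name=value" string to the environment array.
 * Returns 0 on success, -1 if memory could not be obtained. */
int env_append(char *entry)
{
    size_t n = 0;
    char **grown;

    while (environ != NULL && environ[n] != NULL)
        n++;

    if (environ_is_dynamic) {
        /* The array is known to be ours: grow it in place, no leak. */
        grown = realloc(environ, (n + 2) * sizeof *grown);
        if (grown == NULL)
            return -1;
    } else {
        /* The array may be static: copy it once and never free it. */
        grown = malloc((n + 2) * sizeof *grown);
        if (grown == NULL)
            return -1;
        if (n > 0)
            memcpy(grown, environ, n * sizeof *grown);
    }

    grown[n] = entry;
    grown[n + 1] = NULL;
    environ = grown;
    environ_is_dynamic = true;
    return 0;
}
```

  Only the first call has to copy; every later call can safely resize
  the existing allocation, because the flag records where it came from.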

 4.1. Flaw one

  There is no standard way to keep track of the dynamicity of
  environment variables.

  When adding a variable to the environment, or updating one already in
  it, the memory for the new variable may be (and in practice often is)
  dynamically allocated. Thus, as dynamic environment variables are
  removed or updated, the previous (old) version could reasonably be
  freed upon completion. However, this cannot happen in a standard
  manner, since there is no standard way to indicate that the
  allocation for the variable is dynamic; the removal or update thus
  results in a leak of that allocated memory.

  setenv(3), which composes a variable from separate name and value
  strings, displays this exact problem by promptly forgetting about
  the dynamic allocation of the variable once it has been put in the
  environment.

  This flaw can be remedied by maintaining a simple boolean variable per
  environment variable, that indicates whether or not the memory for
  that variable constitutes a dynamic allocation. 

  Another potential remedy is to copy every variable string to
  dynamically-allocated memory upon initialization of the dynamic
  environment array (see above).
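
  The per-variable flag can be sketched as follows. The structure
  env_slot and the routines slot_update() and slot_clear() are
  hypothetical; for simplicity the sketch works on a single slot rather
  than on the real environment array.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of a variable string with its flag. */
struct env_slot {
    char *string;   /* "name=value" */
    bool  dynamic;  /* true if string was obtained from malloc(3) */
};

/* Compose a new "name=value" string and put it in the slot; the old
 * string is freed only if it is flagged as a dynamic allocation.
 * Returns 0 on success, -1 if memory could not be obtained. */
int slot_update(struct env_slot *slot, const char *name, const char *value)
{
    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *fresh = malloc(len);

    if (fresh == NULL)
        return -1;
    snprintf(fresh, len, "%s=%s", name, value);

    if (slot->dynamic)
        free(slot->string);
    slot->string = fresh;
    slot->dynamic = true;
    return 0;
}

/* Remove the variable from the slot, again freeing only what is
 * known to be dynamic. */
void slot_clear(struct env_slot *slot)
{
    if (slot->dynamic)
        free(slot->string);
    slot->string = NULL;
    slot->dynamic = false;
}
```

  A string inherited at program start stays untouched (its flag is
  false), while every string composed later is freed exactly once.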
 
 4.2. Flaw two

  Programs may peek at each other's environments, but this mechanism
  lacks synchronization.

  It is possible for a program to take an interest in, amongst other
  things, the current environment of a running program on the system,
  and have this interest satisfied by the provision of that data by
  the kernel.

  The kernel has to get this data from somewhere, and that somewhere
  is the recorded location of the "environ" variable in the target
  program's memory. However, there is no guarantee that, during this
  operation, the target program is not in the middle of an update of
  its environment. Depending on the algorithm used by the target
  program, the possible nefarious results include--
 
   0) duplicate variables being retrieved;
   1) not all variables being retrieved;
   2) outdated variables being retrieved; and, worst of all:
   3) the terminator not being found and the contents of unrelated
       memory thus retrieved.
 
  The author has detected no attempt to work around this problem.

  This flaw can be remedied by not only taking the greatest care in
  manipulating the environment array, but also maintaining a simple
  boolean variable per environment array member, to indicate whether
  or not other programs should ignore the relevant entry when peeking at
  it.

  An alternative remedy is to always make a new, manipulated environment
  array, and to ensure that the resulting singular update of the
  variable "environ" is atomic.
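
  The alternative remedy might look as below, using C11 atomics. Since
  "environ" itself is an ordinary variable, the sketch uses a
  hypothetical stand-in, env_root; the routine env_publish_with() is
  likewise made up, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "environ" that can be updated atomically. */
static _Atomic(char **) env_root;

/* Build a complete new array (old entries plus `entry`) off to the
 * side, then publish it with one atomic store. An observer thus sees
 * either the old array or the new one, never a half-updated array.
 * The previous array is handed back through `previous`, so that the
 * caller can retire it once no observer can still be reading it.
 * Returns 0 on success, -1 if memory could not be obtained. */
int env_publish_with(char *entry, char ***previous)
{
    char **old = atomic_load(&env_root);
    size_t n = 0;
    char **fresh;

    while (old != NULL && old[n] != NULL)
        n++;

    fresh = malloc((n + 2) * sizeof *fresh);
    if (fresh == NULL)
        return -1;
    if (n > 0)
        memcpy(fresh, old, n * sizeof *fresh);
    fresh[n] = entry;
    fresh[n + 1] = NULL;      /* terminator is in place before publishing */

    atomic_store(&env_root, fresh);

    if (previous != NULL)
        *previous = old;
    return 0;
}
```

  The old array is never modified, so an observer that started reading
  it before the store still finds a properly terminated array.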

5. Recommendation for the future

 In a future system, it may be best to implement the environment as
 a linked list, with a proposed entry structure as follows (in C
 syntax)--

  bool valid;
  char* string;

 Or, alternatively, separating the "key=value" strings--

  bool valid;
  char* name;
  char* value;

 The latter has the advantage that the use of the equals sign ("=")
 separator would be pure syntax, as it is in C, and names would thus
 no longer be prevented from including the equals sign as a character.

 Another advantage of the {name, value} separation is that the string
 "name" can be preserved on updates, thus eliminating the string
 composition as hidden by the routine setenv(3).
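
 A minimal sketch of the {name, value} variant follows. The link field
 "next" and the routines env_list_set() and env_list_get() are
 assumptions added for illustration; only the fields "valid", "name"
 and "value" come from the proposal above. The handling of "valid"
 merely marks the intent: a real implementation would need atomic
 accesses for other programs to rely on it.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct env_entry {
    bool valid;
    char *name;
    char *value;
    struct env_entry *next;   /* assumption: link to the next entry */
};

/* Return a malloc(3)ed copy of `s`, or NULL. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Set `name` to `value`. On update only the value string is replaced;
 * the name string is preserved and no "name=value" is composed. */
int env_list_set(struct env_entry **head, const char *name, const char *value)
{
    char *v = copy_string(value);

    if (v == NULL)
        return -1;
    for (struct env_entry *e = *head; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            e->valid = false;     /* observers should skip the entry */
            free(e->value);
            e->value = v;
            e->valid = true;
            return 0;
        }
    }

    struct env_entry *e = malloc(sizeof *e);
    if (e == NULL || (e->name = copy_string(name)) == NULL) {
        free(e);
        free(v);
        return -1;
    }
    e->value = v;
    e->valid = true;
    e->next = *head;              /* new entries go to the front */
    *head = e;
    return 0;
}

/* Return the value for `name`, or NULL if there is no valid entry. */
const char *env_list_get(const struct env_entry *head, const char *name)
{
    for (; head != NULL; head = head->next)
        if (head->valid && strcmp(head->name, name) == 0)
            return head->value;
    return NULL;
}
```

 Because names are compared as whole strings, a name such as "a=b" is
 handled like any other.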

6. Conclusion

 The current environment argument mechanism, while functional, has
 several major flaws resulting in potential memory leaks and
 inconsistent state. Remedies are possible through the implementation 
 of additional mechanisms, yet for a future system better-designed
 alternatives deserve recommendation.

[Last updated at Sun Jun  2 02:54:28 UTC 2024 by De Zeurkous.]

-- 
Friggin' Machines!