Zsh Mailing List Archive

Re: indented heredocs

X-seq: zsh-workers 40245
From: "Nikolay Aleksandrovich Pavlov (ZyX)" <kp-pav@xxxxxxxxx>
To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>, Dave Yost <dave@xxxxxxxx>
Subject: Re: indented heredocs
Date: Fri, 30 Dec 2016 01:31:37 +0300
Cc: zsh workers <zsh-workers@xxxxxxx>
In-reply-to: <CAH+w=7bGEb13SUxX-whdHzWRktoiMdvgSosJvcEoZ+t0z3FOhA@mail.gmail.com>
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
References: <CFA2339B-26F2-4104-AAD2-64852509286B@yost.com> <CAH+w=7bGEb13SUxX-whdHzWRktoiMdvgSosJvcEoZ+t0z3FOhA@mail.gmail.com>

22.12.2016, 01:11, "Bart Schaefer" <schaefer@xxxxxxxxxxxxxxxx>:
> On Wed, Dec 21, 2016 at 11:29 AM, Dave Yost <Dave@xxxxxxxx> wrote:
>>  Surely people have thought of this (Alternative 1):
>>
>>  0 Wed 10:53:53 ~
>>  205 Z% cat <<xx
>>    foo
>>    bar
>>    xx
>>  foo
>>  bar
>>  0 Wed 10:53:53 ~
>>  206 Z%
>>
>>  but shells don’t do that.
>
> [...]
>
>>  I suggested this (Alternative 2), which [Bourne] liked:
>>
>>  0 Wed 10:53:53 ~
>>  206 Z% cat \
>>    <<xx
>>    foo
>>    bar
>>    xx
>>  foo
>>  bar
>>  0 Wed 10:54:10 ~
>>  207 Z%
>
> I'm not thrilled with this idea because it gives special semantics to
> backslash-newline (as well as to leading spaces before "<<") which do
> not currently exist. In existing syntax, backslash-newline can simply
> be discarded without changing the meaning of the command line, I think
> even before tokenization.
>
> I would propose instead something similar (read on below) to this:
>
> % cat <<-' xx'
>   foo
>   bar
>   xx
> foo
> bar
> %
>
> This explicitly quotes the leading space that is to be stripped, so
> there is no parsing ambiguity, and it piggybacks on the existing <<-
> syntax, merely changing the expected leading space from "all tabs" to
> "the leading whitespace on the end marker".
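
As a rough illustration of the rule proposed above (strip exactly the whitespace that precedes the quoted end marker), here is a sketch in plain POSIX `sh`, which zsh also runs. `strip_to_marker` is a hypothetical helper, not an existing zsh feature. It reads a would-be heredoc body on stdin, stops at the marker line, and treats only spaces as indent (a tab in the marker is not stripped).

```sh
# strip_to_marker: hypothetical helper, NOT an existing zsh feature.
# Emulates the proposed <<-'  xx' rule on stdin: the leading spaces of the
# quoted end marker are stripped from every body line, and reading stops at
# the first line equal to the marker.
strip_to_marker() (
  marker=$1                      # end marker including its indent, e.g. '  xx'
  prefix=${marker%%[! ]*}        # just the marker's leading spaces
  while IFS= read -r line || [ -n "$line" ]; do
    [ "$line" = "$marker" ] && break   # terminator line ends the body
    body=${line#"$prefix"}             # remove that exact prefix, if present
    printf '%s\n' "$body"
  done
)

printf '  foo\n    bar\n  xx\n' | strip_to_marker '  xx'
```

This prints `foo` and `  bar`: extra indent beyond the marker's is kept. Because the prefix comes from the marker, re-indenting the body also means editing the quoted marker.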

Quoting the indent in the end marker makes changing the indent rather tricky, since the marker has to be edited along with the block. YAML does better here: the amount of stripped indent is either determined from the first non-blank line (e.g.

```
 cat <<| EOF

   xx
    x
 EOF
```

will produce

```

xx
 x
```

because `xx` is the first non-blank line and has 3 leading spaces while `x` has four, so the result is "\nxx\n\x20x") or is specified explicitly, relative to the indent of the line where the block scalar starts (e.g.

```
 cat <<|1 EOF

   xx
    x
 EOF
```

will produce

```

 xx
  x
```

because `cat` is indented by a single space and `xx` by 3, and the `1` before `EOF` requests that meaningful content start at 1 (the `cat` indent) + 1 = 2 spaces, so the result is "\n\x20xx\n\x20\x20x": one more space of indent than in the previous example).
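
Both YAML-style modes can be emulated on stdin with a small sketch. `strip_block` is hypothetical (neither zsh nor any YAML tool provides it); it treats only spaces as indent and, unlike real YAML, does not reject lines that are less indented than the chosen width. With no arguments it uses the indent of the first non-blank line; called as `strip_block BASE IND` it strips BASE + IND spaces, mirroring `cat <<|1 EOF` with `cat` indented by BASE.

```sh
# strip_block: hypothetical helper emulating the proposed YAML-style <<| on stdin.
#   strip_block          -> strip the indent of the first non-blank line
#   strip_block BASE IND -> strip BASE + IND spaces (explicit indicator)
# Only spaces count as indent; shorter lines lose only the spaces they have.
strip_block() (
  if [ $# -ge 2 ]; then
    n=$(( $1 + $2 ))         # explicit: header indent + indicator
  else
    n=-1                     # auto: width not detected yet
  fi
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$n" -lt 0 ]; then
      case $line in
        *[![:space:]]*)      # first non-blank line fixes the width
          lead=${line%%[! ]*}
          n=${#lead} ;;
        *) printf '\n'; continue ;;   # blank lines before it become empty
      esac
    fi
    i=0
    while [ "$i" -lt "$n" ]; do       # drop at most n leading spaces
      case $line in
        " "*) line=${line#" "}; i=$((i + 1)) ;;
        *) break ;;
      esac
    done
    printf '%s\n' "$line"
  done
)

printf '\n   xx\n    x\n' | strip_block       # auto-detected indent (3)
printf '\n   xx\n    x\n' | strip_block 1 1   # explicit: 1 + 1 = 2
```

The first call prints an empty line, `xx` and ` x`; the second prints an empty line, ` xx` and `  x`, matching the two results above.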

>
>>  I don’t think that would help anything. If the parser doesn’t know how to do
>>  the new syntax with the existing << operator, you’ll get an error, and if the
>>  parser doesn’t know the new operator, you’ll get an error. Same difference.
>
> It is a consideration that we might prefer that older shells choke on
> the new syntax. I think having them choke by failing to find the end
> marker is rather worse than having them choke by failing to recognize
> the operator -- something that wrongly appears to be the end marker
> might appear later in the script if we go your "Alternative 2" route.
>
> Taken literally, my example above would be accepted by an older shell
> and processed without stripping the leading spaces. If that's
> unacceptable, we need a different (and currently invalid) replacement
> for "<<-" (the only thing that comes to mind is "<<|" which seems a
> bad choice).
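
The point that an older shell would accept the quoted-marker form can be checked with any POSIX `sh`: `<<-` strips only leading tabs, and `'  xx'` is just a quoted delimiter that happens to contain spaces. The wrapper name `old_shell_output` is made up for the example, and the marker is written with two spaces so that it matches the indented terminator line exactly.

```sh
# Feed the proposed syntax to today's sh: the heredoc parses fine, the
# delimiter is the literal string "  xx", and no spaces are stripped.
old_shell_output() {
  printf "cat <<-'  xx'\n  foo\n  bar\n  xx\n" | sh
}

old_shell_output
```

This prints `  foo` and `  bar` with their spaces intact, i.e. the script runs silently with the wrong output instead of failing.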

YAML uses `|` and `>` to start block scalars; that's why I used `|` above (`<<>` seems odd and may be confused with `<>`). I'm not sure why this would be a bad choice: `|` already has different meanings in different contexts, though only three so far (pipe, logical or, and array subtraction in `${:|}`). The `-` used in `<<-` has many more meanings: negation/subtraction, stripping leading tabs (in `<<-` itself), prepending `-` to `argv[0]` (i.e. running as a login shell in most cases), stdin, the end-of-options separator (`echo - -E` outputs just `-E`, though I'm not sure whether that is intentional; `--` in many commands definitely is), close (in `>& -`), range, default (in `${:-}`), dereference (in `*(-/)`), and the flags leader (in almost any command and also in `$-`).

---

The `sed`-based alternative is not good for the same reason I would reject any explicitly added spaces. If we bother with this at all, it should satisfy the following requirements:

- Keep extra indent (otherwise `<<-` would be mostly fine, though something that also removes spaces would be better).
- Allow easy reindenting with a simple editor command (like `<{motion}` and `>{motion}` in Vim) without any additional actions (otherwise `sed` would be mostly fine).
- Allow indenting the end marker as the user likes (or, at least, at the initial indent: one space in the examples). Basically I would treat `cat <<| EOF` like `{` or `do`, and `EOF` like `}` or `done`: semantically they are the literal block header and the literal block terminator, so `EOF` should have the same indent as `cat` and be *less* indented than the enclosed text, which it is not a part of.
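
For comparison with the first requirement, this shows what the existing `<<-` does. The script is generated with `printf` so that the tabs are real tab characters; `tabs_demo` is just an illustrative name.

```sh
# Today's <<-: every leading tab is removed from body lines and from the
# terminator line; leading spaces are never touched.
tabs_demo() {
  printf 'cat <<-EOF\n\tfoo\n\t\tbar\n\t  baz\n\tEOF\n' | sh
}

tabs_demo
```

The output is `foo`, `bar` and `  baz`: the extra tab in front of `bar` is lost, while the spaces in front of `baz` survive.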

Follow-Ups:
- Re: indented heredocs
  - From: Bart Schaefer

References:
- indented heredocs
  - From: Dave Yost
- Re: indented heredocs
  - From: Bart Schaefer
