Zsh Mailing List Archive
Re: indented heredocs
- X-seq: zsh-workers 40245
- From: "Nikolay Aleksandrovich Pavlov (ZyX)" <kp-pav@xxxxxxxxx>
- To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>, Dave Yost <dave@xxxxxxxx>
- Subject: Re: indented heredocs
- Date: Fri, 30 Dec 2016 01:31:37 +0300
- Cc: zsh workers <zsh-workers@xxxxxxx>
- In-reply-to: <CAH+w=7bGEb13SUxX-whdHzWRktoiMdvgSosJvcEoZ+t0z3FOhA@mail.gmail.com>
- References: <CFA2339B-26F2-4104-AAD2-64852509286B@yost.com> <CAH+w=7bGEb13SUxX-whdHzWRktoiMdvgSosJvcEoZ+t0z3FOhA@mail.gmail.com>
22.12.2016, 01:11, "Bart Schaefer" <schaefer@xxxxxxxxxxxxxxxx>:
> On Wed, Dec 21, 2016 at 11:29 AM, Dave Yost <Dave@xxxxxxxx> wrote:
>> Surely people have thought of this (Alternative 1):
>>
>> 0 Wed 10:53:53 ~
>> 205 Z% cat <<xx
>> foo
>> bar
>> xx
>> foo
>> bar
>> 0 Wed 10:53:53 ~
>> 206 Z%
>>
>> but shells don’t do that.
>
> [...]
>
>> I suggested this (Alternative 2), which [Bourne] liked:
>>
>> 0 Wed 10:53:53 ~
>> 206 Z% cat \
>> <<xx
>> foo
>> bar
>> xx
>> foo
>> bar
>> 0 Wed 10:54:10 ~
>> 207 Z%
>
> I'm not thrilled with this idea because it gives special semantics to
> backslash-newline (as well as to leading spaces before "<<") which do
> not currently exist. In existing syntax, backslash-newline can simply
> be discarded without changing the meaning of the command line, I think
> even before tokenization.
>
> I would propose instead something similar (read on below) to this:
>
> % cat <<-' xx'
> foo
> bar
> xx
> foo
> bar
> %
>
> This explicitly quotes the leading space that is to be stripped, so
> there is no parsing ambiguity, and it piggybacks on the existing <<-
> syntax, merely changing the expected leading space from "all tabs" to
> "the leading whitespace on the end marker".
This makes changing the indent rather tricky. YAML does better here: the amount of stripped indent is either determined from the first non-blank line (e.g.
```
 cat <<| EOF
   xx
    x
 EOF
```
will produce
```
xx
 x
```
because `xx` is the first non-blank line and has 3 leading spaces here, while `x` has four, meaning that the result is "\nxx\n\x20x") or is specified explicitly, relative to the indent of the line where the block scalar starts (e.g.
```
 cat <<|1 EOF
   xx
    x
 EOF
```
will produce
```
 xx
  x
```
because `cat` has a single space as indent, `xx` has 3, and it was requested that meaningful content starts at 1 (`cat` indent) + 1 (the `1` before EOF) = 2 spaces, meaning that the result is "\n\x20xx\n\x20\x20x": one more space of indent than in the previous example).
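To make the two rules concrete, here is a rough sketch that emulates them with an ordinary heredoc piped through a filter. `<<|` does not exist in any shell, and the `dedent` function name is made up for illustration. Since a filter cannot see the indent of the `cat` line, the explicit form takes the absolute number of spaces to strip (2 in the second example above) rather than the relative `1`.

```shell
# Hypothetical helper, not part of zsh: emulate the proposed stripping on stdin.
#   dedent      strip the indent found on the first non-blank line
#   dedent N    strip up to N leading spaces from every line
dedent() {
  awk -v n="${1:--1}" '
    {
      # Auto mode: take the indent width from the first non-blank line.
      if (n < 0 && $0 ~ /[^ ]/) {
        match($0, /^ */)
        n = RLENGTH
      }
      line = $0
      # Remove up to n leading spaces; never eat non-space characters.
      for (i = 0; i < n && substr(line, 1, 1) == " "; i++)
        line = substr(line, 2)
      print line
    }
  '
}

# First rule: indent taken from `xx` (3 spaces), so `x` keeps one space.
dedent <<'EOF'
   xx
    x
EOF

# Second rule: strip exactly 2 spaces, so both lines keep one more space.
dedent 2 <<'EOF'
   xx
    x
EOF
```

Note that with a plain `<<` the end marker still has to sit in column 0, which is exactly the part a filter cannot fix and a new operator could.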
>
>> I don’t think that would help anything. If the parser doesn’t know how to do
>> the new syntax with the existing << operator, you’ll get an error, and if the
>> parser doesn’t know the new operator, you’ll get an error. Same difference.
>
> It is a consideration that we might prefer that older shells choke on
> the new syntax. I think having them choke by failing to find the end
> marker is rather worse than having them choke by failing to recognize
> the operator -- something that wrongly appears to be the end marker
> might appear later in the script if we go your "Alternative 2" route.
>
> Taken literally, my example above would be accepted by an older shell
> and processed without stripping the leading spaces. If that's
> unacceptable, we need a different (and currently invalid) replacement
> for "<<-" (the only thing that comes to mind is "<<|" which seems a
> bad choice).
YAML uses `|` and `>` to start block scalars, which is why I used `|` above (`<<>` seems odd and may be confused with `<>`). I am not sure why this would be a bad choice: `|` already has different meanings in different contexts, though only three so far (pipe, or, and array subtraction (`${:|}`)). The `-` used in `<<-` has many more meanings: negation/subtraction, stripping leading tabs, prepending `-` to `argv[0]` (i.e. running as a login shell in most cases), stdin, separator before the remaining arguments (`echo - -E` outputs just `-E`, though I am not sure whether that is intentional; `--` in many commands definitely is), close (in `>& -`), range, default (in `${:-}`), dereference (in `*(-/)`), and flags leader (in almost any command and also in `$-`).
---
The `sed`-based alternative is not good, for the same reason I would reject any explicitly added spaces. If we bother with this at all, it should satisfy the following requirements:
- Keep extra indent (otherwise `<<-` would be mostly fine, though something that also removes spaces would be better).
- Allow easy reindenting with a simple editor command that reindents (like `<{motion}` and `>{motion}` in Vim) without any additional actions (otherwise `sed` would be mostly fine).
- Allow indenting the end marker as the user likes (or, at least, at the initial indent: one space in the examples). Basically I would treat `cat <<| EOF` as something like `{` or `do`, and `EOF` as `}` or `done`: semantically they are a literal block header and a literal block terminator, so `EOF` should have the same indent as `cat` and be *less* indented than the text between them, which it is not a part of.
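A small sketch of why a hard-coded indent fails the reindenting requirement, assuming the `sed`-based alternative means piping the heredoc through a fixed-prefix substitution. The `strip_fixed` name is made up for illustration.

```shell
# Hypothetical helper: strip a fixed, hard-coded number of leading spaces.
strip_fixed() {
  # $1 is the number of spaces to remove from the start of every line.
  sed "s/^ \{$1\}//"
}

# Body indented by 4, count says 4: the base indent is removed and the
# extra indent on the second line is kept.
printf '    foo\n      bar\n' | strip_fixed 4

# The same body after an editor reindent by 2 (now 6 spaces), count still 4:
# two stray spaces remain on every line until the count is edited by hand.
printf '      foo\n        bar\n' | strip_fixed 4
```

The same applies to any explicit spaces in the end marker: the editor moves the text, but not the number that describes it.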