Zsh Mailing List Archive
Re: extraction of a string from one another
- X-seq: zsh-users 30139
- From: Stephane Chazelas <stephane@xxxxxxxxxxxx>
- To: aegy <aegy@xxxxxxx>
- Cc: Zsh Users <zsh-users@xxxxxxx>
- Subject: Re: extraction of a string from one another
- Date: Tue, 10 Dec 2024 08:48:04 +0000
- Archived-at: <https://zsh.org/users/30139>
- In-reply-to: <473ab10d-04d2-4da2-958e-1ef00afa9640@free.fr>
- List-id: <zsh-users.zsh.org>
- Mail-followup-to: aegy <aegy@xxxxxxx>, Zsh Users <zsh-users@xxxxxxx>
- References: <473ab10d-04d2-4da2-958e-1ef00afa9640@free.fr>
2024-12-08 05:16:49 +0100, aegy:
> Hi,
>
> h=http://www.try.org/examples/easy
>
> how can I obtain http://www.try.org from $h ?
[...]
Some more options:
$ print -r - ${h:h2}
http://www.try.org
(keeps only the first 2 "h"ead components)
$ set -o extendedglob
$ print -r - ${(M)h##*://[^/]#}
http://www.try.org
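That's the longest (2 #s) leading (#) part of $h that "M"atches
*://[^/]#, that is, any number of characters, followed by ://,
followed by any number of characters other than /. Without (M),
that part would be removed instead of kept, and extendedglob is
what makes # mean "zero or more of the preceding".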
Since zsh supports Perl-compatible regexps (via the zsh/pcre module
and the rematchpcre option), you can also use the regexp from the
Regexp::Common::URI::http Perl module:
$ perl -MRegexp::Common=URI -E 'say $RE{URI}{HTTP}{-keep}'
((http)://((?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::((?:[0-9]*)))?(/(((?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?]((?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)
So:
set -o rematchpcre
uri_regex='((http)://((?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::((?:[0-9]*)))?(/(((?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?]((?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'\''()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)'
if [[ $h =~ "^(?:$uri_regex)\z" ]]; then
  print -r - "$match[2]://$match[3]${match[4]:+:$match[4]}"
else
  print -ru2 Not a valid URI
fi
(strangely enough, the Regexp::Common regexp doesn't support
http://user@host and doesn't extract the fragment; sounds like
there's room for improvement)
--
Stephane