Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: extraction of a string from one another

X-seq: zsh-users 30139
From: Stephane Chazelas <stephane@xxxxxxxxxxxx>
To: aegy <aegy@xxxxxxx>
Cc: Zsh Users <zsh-users@xxxxxxx>
Subject: Re: extraction of a string from one another
Date: Tue, 10 Dec 2024 08:48:04 +0000
Archived-at: <https://zsh.org/users/30139>
In-reply-to: <473ab10d-04d2-4da2-958e-1ef00afa9640@free.fr>
List-id: <zsh-users.zsh.org>
Mail-followup-to: aegy <aegy@xxxxxxx>, Zsh Users <zsh-users@xxxxxxx>
References: <473ab10d-04d2-4da2-958e-1ef00afa9640@free.fr>

2024-12-08 05:16:49 +0100, aegy:
> Hi,
> 
> h=http://www.try.org/examples/easy
> 
> how can I obtain  http://www.try.org from $h ?
[...]

Some more options:

$ print -r - ${h:h2}
http://www.try.org

(the first 2 "h"ead components)

$ set -o extendedglob
$ print -r - ${(M)h##*://[^/]#}
http://www.try.org

The longest (2 #s) leading (#) part that "M"atches *://[^/]#
(any number of characters, followed by :// followed by any
number of characters other than /).

Since zsh supports Perl-compatible regexps (PCRE, enabled with the
rematchpcre option), you can also use the regexp from the
Regexp::Common::URI::http perl module:

$ perl -MRegexp::Common=URI -E 'say $RE{URI}{HTTP}{-keep}'
((http)://((?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::((?:[0-9]*)))?(/(((?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?]((?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)

So:

set -o rematchpcre
uri_regex='((http)://((?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::((?:[0-9]*)))?(/(((?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?]((?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'\''()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)'
if [[ $h =~ "^(?:$uri_regex)\z" ]]; then
  print -r - "$match[2]://$match[3]${match[4]:+:$match[4]}"
else
  print -ru2 Not a valid URI
fi

(Strangely enough, that regexp doesn't support http://user@host and
doesn't extract the fragment, so there seems to be room for
improvement.)

-- 
Stephane
