Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: extraction of a string from one another



2024-12-08 05:16:49 +0100, aegy:
> Hi,
> 
> h=http://www.try.org/examples/easy
> 
> how can I obtain  http://www.try.org from $h ?
[...]

Some more options:

$ print -r - ${h:h2}
http://www.try.org

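(the first 2 "h"ead components: when followed by a number, the :h
modifier keeps that many leading components instead of removing the
last one)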
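For comparison, outside zsh (in a POSIX sh script, say), a rough
equivalent is to keep the first /-separated fields with cut(1). Note
that cut counts the empty field between the two slashes of :// as a
field of its own, so it needs 3 fields where :h2 needed 2. This is only
a sketch of the same idea, not how ${h:h2} works internally:

```shell
h=http://www.try.org/examples/easy

# Fields split on "/": 1="http:", 2="" (between the two slashes),
# 3="www.try.org", 4="examples", 5="easy". Keep fields 1 to 3.
origin=$(printf '%s\n' "$h" | cut -d/ -f1-3)
printf '%s\n' "$origin"   # http://www.try.org
```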

$ set -o extendedglob
$ print -r - ${(M)h##*://[^/]#}
http://www.try.org

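The longest (2 #s) leading (#) part of $h that "M"atches *://[^/]#,
that is: any number of characters, followed by ://, followed by any
number of characters other than / (with extendedglob, x# means zero
or more occurrences of x). Without the (M) flag, ${h##...} would
remove that part instead of keeping it.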
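The same leading/trailing, shortest/longest idea exists in plain POSIX
parameter expansion (# and % strip the shortest leading/trailing match,
## and %% the longest), which also works in sh and bash. There is no
(M) flag there, so this sketch strips the unwanted parts instead of
keeping the matched one; the variable names are just for illustration:

```shell
h=http://www.try.org/examples/easy

scheme=${h%%://*}       # strip longest trailing "://..."  -> http
rest=${h#*://}          # strip shortest leading "...://"  -> www.try.org/examples/easy
authority=${rest%%/*}   # strip longest trailing "/..."    -> www.try.org
origin="${scheme}://${authority}"
printf '%s\n' "$origin" # http://www.try.org
```

Braces are used around the variable names so that zsh cannot take the
":" after them for a modifier.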

Since zsh supports Perl-compatible regexps (through the zsh/pcre
module, which [[ =~ ]] uses when the rematchpcre option is set), you
can also use the regexp from the Regexp::Common::URI::http Perl module:

$ perl -MRegexp::Common=URI -E 'say $RE{URI}{HTTP}{-keep}'
((http)://((?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::((?:[0-9]*)))?(/(((?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?]((?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)

So:

set -o rematchpcre
uri_regex='((http)://((?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::((?:[0-9]*)))?(/(((?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'\''():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?]((?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'\''()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)'
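# (?:...) doesn't capture, so $match[1] is the whole URI, $match[2]
# the scheme, $match[3] the host and $match[4] the port.
# \z anchors at the very end of the string.
if [[ $h =~ "^(?:$uri_regex)\z" ]]; then
  # append ":port" only when a non-empty port was captured
  print -r - "$match[2]://$match[3]${match[4]:+:$match[4]}"
else
  print -ru2 Not a valid URI
fi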

(Strangely enough, it doesn't support http://user@host and doesn't
extract the fragment; sounds like there's room for improvement.)
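By contrast, a much looser pattern that only takes a scheme, then ://,
then everything up to the first /, ? or # (where the authority part of
a URI ends) does accept http://user@host, at the cost of validating
nothing about the host or port. A sketch using sed -E rather than PCRE;
origin_of is a hypothetical helper name:

```shell
# Hypothetical helper: print scheme://authority, or nothing when the
# input doesn't start with a scheme followed by "://".
# Userinfo (user@) and port are kept as-is, not validated.
origin_of() {
  printf '%s\n' "$1" |
    sed -n -E 's|^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]*).*|\1|p'
}

a=$(origin_of http://www.try.org/examples/easy)
b=$(origin_of 'http://user@host:8080/path?q=1#frag')
c=$(origin_of not-a-uri)
printf '%s\n' "$a" "$b"
```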

-- 
Stephane



