Zsh Mailing List Archive
Re: Parsing CVS files
- X-seq: zsh-users 23854
- From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
- To: Zsh Users <zsh-users@xxxxxxx>
- Subject: Re: Parsing CVS files
- Date: Sat, 02 Feb 2019 20:14:25 +0000
- In-reply-to: <20190202185911.vsmvvsp5df3yaa3z@gmx.de>
- References: <20190202185911.vsmvvsp5df3yaa3z@gmx.de>
> I'm looking for an easy way to split the lines of a .csv file into
> the fields of an array variable. There's a script that does that
> somewhere on the net. But that script parses lines character by
> character and just manages to parse about 100 (long) lines per
> second.
>
> Fields in a .csv file are separated by commas, *but* commas
> between a pair of quotes do not split. Or phrased differently:
> Commas that have an even number of double quotes left of them do
> split, but commas with an uneven number left of them don't split.
>
> Any ideas for a quick implementation?
Sebastian has done similar things so may have better ideas.
If you're happy to use shell syntax --- in other words, the other forms
of quoting are active, not just double quotes, so backslashes and single
quotes might do inconvenient things --- and you're not too bothered
about unquoted spaces, which will cause extra splitting, you can
use this trick:
  % line='This,"is, quite possibly, a",line,"of,stuff","with,commas"'
  % print -rl ${(Q)${${(z)${line//,/, }}%%,}//, /,}
  This
  is, quite possibly, a
  line
  of,stuff
  with,commas
Each comma gets a space added, then the line is split on syntactically
active spaces; any comma at the end of a field is removed; the remaining
commas are restored.
The (Q) flag on the outermost step strips the quotes; leave it out if
you want to keep them.
If you need to be careful about unquoted spaces, you need to be
cleverer: e.g. backslash-quote them and then remove the backslashes
later. Up to subtle effects associated with backslashes,
  print -rl ${${${${(z)${${line// /\\ }//,/, }}%%,}//, /,}//\\ / }
will retain existing spaces.
Also, if you want to keep empty fields, you'll need the final result
to use "${(@)this}". It's probably easiest to assign to an array first,
as otherwise the quotes will affect the substitution.
If you're worried about subtle effects with backslashes, I don't think
you're ever going to be satisfied with a quick and dirty hack like this,
so you'll have to decide how sophisticated you need to be.
pws