The CSV output format in psql
Exporting query results in CSV has been possible for a long time (since version 8.0), either with COPY (SELECT ...) TO STDOUT CSV as an SQL command, or with the \copy meta-command in psql, which invokes COPY under the hood and handles the flow of data on the client side.
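For instance, both of the following produce the same CSV on psql’s standard output (the catalog query here is just a placeholder):
copy (select relname, relkind from pg_class limit 3) to stdout csv;
\copy (select relname, relkind from pg_class limit 3) to stdout csv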
But there are still a few cases not covered by this functionality, which is why, in PostgreSQL 12, CSV has been added to the output formats. It means that we can now issue \pset format csv, so that all commands producing tabular results can output them in the CSV format. It can also be selected on the command line with the --csv option.
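For instance, with the command-line option (the query is just an illustration):
$ psql --csv -c "select 1 as a, 'hello, world' as b"
a,b
1,"hello, world"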
In practice, we need to use an output format instead of \copy:
- when data must be fetched through a query that is not supported by COPY, for instance a cursor, since fetch from c is valid but copy (fetch from c) to stdout csv is not (see the example after this list).
- when the result is produced by a meta-command instead of a query: \crosstabview, \l, \d, etc.
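As a quick illustration of the cursor case (the cursor and the catalog query are just placeholders), the rows returned by FETCH are formatted like any other result set:
\pset format csv
begin;
declare c cursor for select relname, relkind from pg_class;
fetch 2 from c;
commit;
-- meta-commands follow the same format, for instance the list of databases:
\l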
CSV as a replacement for the unaligned format (-A)
Aside from \copy, simple exports in tabular format are often done with the “unaligned” format, with its field separator given by the fieldsep parameter (-F option).
But this format has two weaknesses:
- when the separator appears inside the data, there’s no way to distinguish it from an actual field separator (there is no escape mechanism).
- when line feeds are present inside fields (in multi-line text), there’s no way to distinguish a line feed within a field from a record separator.
Example:
$ query="select 'ab,cd' as col1, 'ef'||chr(10)||'gh' as col2"
$ psql -AtF',' -c "$query"
ab,cd,ef
gh
In the above example, when trying to interpret the output, we can’t know where the fields start and end, nor how many records there were in the source data. Although it comes from a single record of two columns, this output could just as well represent a single column with ab,cd,ef as the value of the first record, and gh for the second one.
The CSV format solves this problem:
$ psql --csv -t -c "$query"
"ab,cd","ef
gh"
To avoid ambiguities in the output, CSV requires double quotes around any field that contains the separator, a line feed, or a double quote (in the latter case, any double quote inside the field is doubled).
CSV-compliant readers can read that content back and restore it unambiguously, including for multi-line fields.
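As a quick check of that round trip, COPY itself can serve as the CSV reader (the table and file names below are arbitrary):
$ psql --csv -t -c "$query" > /tmp/multiline.csv
$ psql -c "create table reimport(col1 text, col2 text)"
$ psql -c "\copy reimport from /tmp/multiline.csv csv"
$ psql -c "select col1, col2 = 'ef'||chr(10)||'gh' as col2_intact from reimport"
The last query should report col2_intact as true, showing that the embedded line feed survived the export and reimport.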
CSV as an intermediate format
CSV can also be used as a proxy to another format that psql doesn’t generate directly. To that effect, a csv-to-another-format filtering script can be set up to receive the output of psql. For the part that reads CSV, the script will often be simple to write, because most popular languages have full, ready-to-use parsers for it, thanks to the ubiquity of CSV.
Here’s for instance a Perl csv-to-markdown program that transforms UTF-8 CSV into the markdown format as used by GitHub, and meant to produce HTML. Writing this in languages like Ruby or Python would probably be just as simple, as would generating other formats than markdown or customizing this output.
There are different ways to invoke this filter within psql:
\pset format csv
-- method 1 with \g (one output per query)
select 'ab,cd' as col1, 'ef'||chr(10)||'gh' as col2
\g |csv-to-markdown >/tmp/table1.md
-- method 2 with \o (applies to all following queries and metacommands)
\o |csv-to-markdown >/tmp/table2.md
select 'ab,cd' as col1, 'ef'||chr(10)||'gh' as col2;
-- restore the display in aligned format and to the standard output
\a \o
For the output of a single query, the filter can also be invoked in a single command line:
$ psql --csv -c "$query" | csv-to-markdown >/tmp/table3.md
In all the above cases, the result produced is:
col1|col2
---|---
ab,cd|ef<br>gh
where the two fields are correctly interpreted.