csv | Sandro on Code

CSV files can be used on the Unix command line like database tables. grep can act as a where-clause, and awk can be used as column selector. However the csv utility from vbin makes it easier.

Here it is.

   $csv -p books.csv
   id, isbn,          name,                                    publicated, type
   1,  9781557427960, The Picture of Dorian Gray,              1891,       Novel
   2,  9780140283297, On the Road,                             1957,       Novel
   3,  9781851243969, Frankenstein; or, The Modern Prometheus, 1818,       Novel
   4,  9780345347954, Childhood's End,                         1966,       Novel
   5,  9780451457998, A Clockwork Orange,                      1962,       Novel
   6,  9780440184621, Tai-Pan,                                 1982,       Novel
   7,  9780486266848, Another Turn of the Screw,               1898,       Novel
   8,  9780486280615, Adventures of Huckleberry Finn,          1884,       Novel
   9,  9780143104889, A Princess of Mars,                      1917,       Novel
   13, 9781614270621, The Prophet,                             1923,       Poetry
   21, 9780374528379, Brothers Karamazov,                      1880,       Novel

The -p makes the output pretty and easy to read. Similar to MySQL’s desc output. Here is another example.

   $ csv -s books.csv
   1. id
   2. isbn
   3. name
   4. publicated
   5. type

The -s shows header info. Useful for choosing or rearranging columns by number:

   $ csv -c4,3 books.csv |grep ^18 |sort -n
   1818,Frankenstein; or, The Modern Prometheus
   1880,Brothers Karamazov
   1884,Adventures of Huckleberry Finn
   1891,The Picture of Dorian Gray
   1898,Another Turn of the Screw

This example chooses “published” and “name” columns (switching their order – something Unix cut can not do), and selects only those in the 1800s.

Here’s a humdinger:

   ./providers.py list | grep -ve '^id' -e '^$' |csv - -c2| sort | while read p; do echo -n "$p,"; ./providers.py customers $c | wc -l ; done | csv - -p

This takes the list output of some providers.py script, remove lines not beginning with id, and blank lines, graps the second column, sorts them, and then sends them back into the providers.py script to get a count of customers for that provider. The output might look something like this:

   ACE,      96
   NYSE,     1300
   OPC,      1400
   PGEG,     560
   VERT,     131
   VERT-SCO, 1430

The dash (-) allows csv to process <STDIN> rather than a given filename, like Unix’s gzip does.

Sandro on Code

Category Archives: csv

csv as a database table

Programming Craftmanship