csv as a database table

CSV files can be used on the Unix command line like database tables.  grep can act as a where-clause, and awk can be used as column selector.  However the csv utility from vbin makes it easier.

Here it is.

   $csv -p books.csv
   id, isbn,          name,                                    publicated, type
   1,  9781557427960, The Picture of Dorian Gray,              1891,       Novel
   2,  9780140283297, On the Road,                             1957,       Novel
   3,  9781851243969, Frankenstein; or, The Modern Prometheus, 1818,       Novel
   4,  9780345347954, Childhood's End,                         1966,       Novel
   5,  9780451457998, A Clockwork Orange,                      1962,       Novel
   6,  9780440184621, Tai-Pan,                                 1982,       Novel
   7,  9780486266848, Another Turn of the Screw,               1898,       Novel
   8,  9780486280615, Adventures of Huckleberry Finn,          1884,       Novel
   9,  9780143104889, A Princess of Mars,                      1917,       Novel
   13, 9781614270621, The Prophet,                             1923,       Poetry
   21, 9780374528379, Brothers Karamazov,                      1880,       Novel

The -p makes the output pretty and easy to read. Similar to MySQL’s desc output.  Here is another example.

   $ csv -s books.csv
   1. id
   2. isbn
   3. name
   4. publicated
   5. type

The -s shows header info. Useful for choosing or rearranging columns by number:

   $ csv -c4,3 books.csv |grep ^18 |sort -n
   1818,Frankenstein; or, The Modern Prometheus
   1880,Brothers Karamazov
   1884,Adventures of Huckleberry Finn
   1891,The Picture of Dorian Gray
   1898,Another Turn of the Screw

This example chooses “published” and “name” columns (switching their order – something Unix cut can not do), and selects only those in the 1800s.

Here’s a humdinger:

   ./providers.py list | grep -ve '^id' -e '^$' |csv - -c2| sort | while read p; do echo -n "$p,"; ./providers.py customers $c | wc -l ; done | csv - -p

This takes the list output of some providers.py script, remove lines not beginning with id, and blank lines, graps the second column, sorts them, and then sends them back into the providers.py script to get a count of customers for that provider. The output might look something like this:

   ACE,      96
   NYSE,     1300
   OPC,      1400
   PGEG,     560
   VERT,     131
   VERT-SCO, 1430

The dash (-) allows csv to process <STDIN> rather than a given filename, like Unix’s gzip does.

2 thoughts on “csv as a database table”

  1. Hi David, great post.

    Yep, some powerful unix commands there when wielded right.

    This reminded me of a feature in Oracle that could be used to view a CSV file. It was called external tables. MySQL also has a CSV storage engine.

    1. Thanks for commenting Sean. I didn’t know about mysql csv engine. That’s pretty awesome. I know a few places that will be handy for lightweight apps. I can’t wait to check i tout. The intent of this csv command line utility is to create easy access to data manipulation without the huge Verbosity of SQL.

Leave a Reply

Your email address will not be published. Required fields are marked *