Take some lines

Do you ever want to take some lines from the middle of a file, say lines 300-400? You can do that in a kludgey way with Unix head and tail:

   cat ids.csv | head -400 | tail -101

The take command from vbin tools does just that.

   cat ids.csv |take 300 400
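vbin's take is its own tool and its source isn't shown here, so what follows is only a minimal Python sketch of the idea. It assumes take's two arguments are 1-based, inclusive line numbers, which the 20001 30000 example below suggests. The names take and main are illustrative, not vbin's.

```python
import itertools
import sys


def take(lines, first, last):
    """Return lines first..last of `lines`, counting from 1, both ends inclusive.

    Assumption: take's arguments are 1-based and inclusive.
    """
    # islice counts from 0 and excludes its stop index, hence first - 1 and last.
    return itertools.islice(lines, first - 1, last)


def main(first, last, source=sys.stdin, sink=sys.stdout):
    # islice stops pulling lines once `last` is passed.
    sink.writelines(take(source, int(first), int(last)))


# Only act as a command when given exactly two arguments: take.py FIRST LAST
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved as, say, take.py (a hypothetical name), it would run as cat ids.csv | python3 take.py 300 400. Because islice stops consuming input once it passes the last line, the rest of a large file is not worked through.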

Here we flatten the results into a single comma-separated list for an SQL IN clause:

   cat ids.csv | take 20001 30000 | paste -sd, - | sed 's/,/, /g' > get_customers.sql

By adding a little SQL before and after the list we get something like this:

select
   distinct u.customer_id, u.state
from
   imports i
   join usage u on i.id = u.import_id
where
   i.id in (
40722, 41483, 50364, 52623, 53049, 54795, 73451, 
 ... (thousands of ids here) ... 
986764, 986764, 986764, 986764, 986764, 986764
);
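If the query gets generated often, a small script can build it in one step and avoid the trailing comma that naive joining leaves behind. This is a minimal Python sketch, assuming ids.csv holds one numeric id per line; the query text is copied from above, and in_list and build_query are made-up names for illustration.

```python
# Query text from the example above; {ids} marks where the list goes.
QUERY = """select
   distinct u.customer_id, u.state
from
   imports i
   join usage u on i.id = u.import_id
where
   i.id in (
{ids}
);
"""


def in_list(ids):
    """Join ids as '1, 2, 3': no trailing comma, and int() rejects anything non-numeric."""
    return ", ".join(str(int(i)) for i in ids)


def build_query(lines):
    """Build the full query from lines of text, one id per line, skipping blank lines."""
    ids = [line.strip() for line in lines if line.strip()]
    return QUERY.format(ids=in_list(ids))
```

Used as with open("ids.csv") as f: print(build_query(f)). Running every id through int() keeps stray text out of the SQL; when the query is executed from code rather than saved to a file, the database driver's placeholder parameters are the safer route.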

 

So go ahead, take some lines.

Also see clip, for panning right to left.

MVC is Overrated

The Model-View-Controller (MVC) pattern is not the only game in town. In fact, it is awkward and cumbersome in some cases.

MVC is a Design Pattern, and Design Patterns show us how to solve common problems with common solutions. However, we should not apply them mechanically every time; rather, we should learn from them and use them as reference points when solving problems.

The most worthwhile component of MVC is the Model (or Data Layer). It makes good sense to isolate and encapsulate it. It makes for robust, reusable code, with all your SQL-related calls in one place.

Code above the Data Layer can express itself in more business-friendly terms, for example

   book = Book('Moby Dick')

rather than

   select * from books where title like '%moby dick%' and status_id = 1;
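To make the idea concrete, here is a minimal Python sketch of such a Data Layer object using the standard sqlite3 module. The books table, its columns, and status_id = 1 meaning an active book are assumptions lifted from the example query, and the Book class itself is hypothetical. Callers only ever write Book('Moby Dick'); the SQL stays in one place.

```python
import sqlite3


class Book:
    """Hypothetical data-layer object: callers write Book('Moby Dick'), never SQL.

    Assumed schema: books(id, title, status_id), where status_id = 1 means active.
    """

    db = None  # set once at startup, e.g. Book.db = sqlite3.connect("library.db")

    # All of the SQL for books lives here, in one place.
    _BY_TITLE = (
        "select id, title from books "
        "where lower(title) like ? and status_id = 1 "
        "order by id limit 1"
    )

    def __init__(self, title):
        row = Book.db.execute(self._BY_TITLE, (f"%{title.lower()}%",)).fetchone()
        if row is None:
            raise LookupError(f"no active book matching {title!r}")
        self.id, self.title = row

    def __repr__(self):
        return f"Book({self.title!r})"


if __name__ == "__main__":
    Book.db = sqlite3.connect(":memory:")
    Book.db.execute("create table books (id integer primary key, title text, status_id integer)")
    Book.db.executemany(
        "insert into books (title, status_id) values (?, ?)",
        [("Moby Dick", 1), ("Moby Dick (draft)", 2)],
    )
    print(Book("Moby Dick"))  # Book('Moby Dick')
```

The design choice is simply that the title search, the status rule, and the table layout can all change without touching any code that says Book('Moby Dick').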

In MVC the View and the Controller components are also isolated and encapsulated. I disagree with that: it can make simple tasks harder. View and Controller functionality should be able to commingle. This is where a lot of the interesting and sophisticated coding happens.

For web applications, it's HTML, CSS, JavaScript, and AJAX that we use to communicate with the user. We should come up with our own patterns that best suit our needs.

What is in a Name?

What should we call it?   Call it whatever you like.

I hear that a lot. As coders and builders of systems, we should take the time needed to come up with the best names for the systems, databases, tables, columns, file names, modules, classes, methods, functions, and variables we create.

To name something is to know it.

Good names mean less confusion and better understanding. They make talking about the system more natural and improve the dialog between business users and the tech team. In fact, we should think like the business user when naming our components.

Good names also make it easier for you and others to maintain your code.

So take your time, step back, think hard about what something truly is, and name it accurately.

Stitch in Time

“A stitch in time saves nine” is a fundamental principle we should follow when writing code and building systems. It might be said that a stitch in time saves you from being stuck forever with unruly software.

People just want to get the job done. But any small mistake or shortsightedness up front, in critical parts of the system, will trip you up right away in the next wave of feature enhancements and bug fixes.

An ounce of good design is worth a million dollars a few years down the road, for the people who will be depending on it later.

And good design — takes a little more time.

csv as a database table

CSV files can be used on the Unix command line like database tables: grep can act as a where clause, and awk can act as a column selector. The csv utility from vbin makes this easier.

Here it is.

   $ csv -p books.csv
   id, isbn,          name,                                    publicated, type
   1,  9781557427960, The Picture of Dorian Gray,              1891,       Novel
   2,  9780140283297, On the Road,                             1957,       Novel
   3,  9781851243969, Frankenstein; or, The Modern Prometheus, 1818,       Novel
   4,  9780345347954, Childhood's End,                         1966,       Novel
   5,  9780451457998, A Clockwork Orange,                      1962,       Novel
   6,  9780440184621, Tai-Pan,                                 1982,       Novel
   7,  9780486266848, Another Turn of the Screw,               1898,       Novel
   8,  9780486280615, Adventures of Huckleberry Finn,          1884,       Novel
   9,  9780143104889, A Princess of Mars,                      1917,       Novel
   13, 9781614270621, The Prophet,                             1923,       Poetry
   21, 9780374528379, Brothers Karamazov,                      1880,       Novel

The -p option makes the output pretty and easy to read, similar to MySQL’s desc output. Here is another example:

   $ csv -s books.csv
   1. id
   2. isbn
   3. name
   4. publicated
   5. type

The -s option shows the header info, which is useful for choosing or rearranging columns by number:

   $ csv -c4,3 books.csv |grep ^18 |sort -n
   1818,Frankenstein; or, The Modern Prometheus
   1880,Brothers Karamazov
   1884,Adventures of Huckleberry Finn
   1891,The Picture of Dorian Gray
   1898,Another Turn of the Screw

This example chooses the “publicated” and “name” columns (switching their order, something Unix cut cannot do) and keeps only the books published in the 1800s.
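The same select-and-filter can be sketched in Python with the standard csv module, which also shows why a real CSV parser helps: a quoted title containing commas stays in one field. This is a sketch only. SAMPLE is a few rows from the listing above with the Frankenstein title quoted (an assumption about how books.csv stores it), and select and books_from_1800s are illustrative names, not part of vbin.

```python
import csv
import io

# A few rows from the books.csv listing above; quoting the Frankenstein title
# is an assumption about how the real file stores it.
SAMPLE = """id,isbn,name,publicated,type
1,9781557427960,The Picture of Dorian Gray,1891,Novel
2,9780140283297,On the Road,1957,Novel
3,9781851243969,"Frankenstein; or, The Modern Prometheus",1818,Novel
21,9780374528379,Brothers Karamazov,1880,Novel
"""


def select(rows, columns, where=None):
    """Yield the given 1-based columns in the order listed (like csv -c4,3).

    `where` is an optional predicate on the picked values, playing the part
    of the grep in the pipeline above.
    """
    for row in rows:
        picked = [row[c - 1].strip() for c in columns]
        if where is None or where(picked):
            yield picked


def books_from_1800s(text):
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = select(reader, [4, 3], where=lambda r: r[0].startswith("18"))
    return sorted(rows, key=lambda r: int(r[0]))  # like sort -n on the year


if __name__ == "__main__":
    for year, name in books_from_1800s(SAMPLE):
        print(f"{year},{name}")
```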

Here’s a humdinger:

   ./providers.py list | grep -ve '^id' -e '^$' | csv - -c2 | sort | while read -r p; do echo -n "$p,"; ./providers.py customers "$p" | wc -l; done | csv - -p

This takes the list output of some providers.py script, removes the header line (the one beginning with id) and blank lines, grabs the second column, and sorts it. Each provider is then fed back into providers.py to get a count of its customers, and csv -p pretty-prints the result. The output might look something like this:

   ACE,      96
   NYSE,     1300
   OPC,      1400
   PGEG,     560
   VERT,     131
   VERT-SCO, 1430

The dash (-) tells csv to read standard input rather than a named file, the same convention gzip and many other Unix tools follow.
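Here is a rough Python sketch of two behaviours described above: treating - as standard input, and -p style pretty printing. It is an imitation under assumptions, not vbin's code. It assumes every row has the same number of columns and only approximates the spacing of the real output; open_input, pretty, and main are illustrative names.

```python
import csv
import sys


def open_input(name):
    """Treat '-' as standard input (as gzip does); otherwise open the named file."""
    return sys.stdin if name == "-" else open(name, newline="")


def pretty(rows):
    """Pad each cell but the last, with its comma, to the column's widest value.

    Assumes every non-empty row has the same number of columns.
    """
    rows = [[cell.strip() for cell in row] for row in rows if row]
    if not rows:
        return []
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        padded = [(cell + ",").ljust(width + 2) for cell, width in zip(row[:-1], widths)]
        lines.append("".join(padded) + row[-1])
    return lines


def main(name):
    source = open_input(name)
    try:
        for line in pretty(csv.reader(source)):
            print(line)
    finally:
        if source is not sys.stdin:
            source.close()


# Only run as a command when given a file name or "-", e.g. python3 pretty.py -
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved as pretty.py (a hypothetical name), printf 'ACE,96\nVERT-SCO,1430\n' | python3 pretty.py - prints ACE, padded to line up with VERT-SCO, much like the provider table above.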

grep '^$'

That’s wizardry for you. How many times have you typed grep '^$'?

Do you grep?

grep, the Unix command that searches files for patterns, is one of the most useful Unix utilities. The example grep '^$' passes a regular expression of just two characters: a caret (^), which matches the beginning of a line, and a dollar sign ($), which matches the end of a line. Taken together they say: return every blank line, that is, every line with nothing between its beginning and its end.

Here it is in action: show me everything meaningful in my Apache configuration file, that is, every line that is neither a comment nor blank. Each -e supplies another pattern, and -v inverts the match, so grep shows only the lines that match none of the patterns.

   grep -ve '^#' -e '^$' apache2.conf | more
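For comparison, here is a minimal Python sketch of the same filter, with the two grep patterns as regular expressions; meaningful is an illustrative name. Like grep, ^# only catches comments that start in the first column and ^$ only truly empty lines; looser patterns such as ^\s*# and ^\s*$ would also drop indented comments and whitespace-only lines.

```python
import re

# The same two patterns passed to grep: a comment line, and an empty line.
PATTERNS = [re.compile(r"^#"), re.compile(r"^$")]


def meaningful(lines):
    """Like grep -ve '^#' -e '^$': keep lines that match none of the patterns."""
    for line in lines:
        text = line.rstrip("\n")  # grep matches each line without its newline
        if not any(p.search(text) for p in PATTERNS):
            yield line
```

Over a file it would be used as with open("apache2.conf") as f: print("".join(meaningful(f)), end="").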

 

False Optimization

So often I hear tech folks talking about some implementation as being better than another because “It is more efficient.” Efficient? Efficient for whom? The computer? Are we shaving milliseconds off processing time? That’s ridiculous.

We should be in the business of optimizing human time, not computer time. Human time is expensive, while computer time is cheap.

We want code to be laid out clearly and be easy to read. We want code to be like a poem. We want something we can maintain over time without scratching our heads every time we come back to it.

For a simple example, have a look at this:

select
   id, process_date, qty
from
   usage
where
   coalesce(units, '') not in ('K1', 'K2', 'K3')
;

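What the heck is that coalesce doing in the where clause? If you think about it for a minute, it makes sense: coalesce turns a null units value into an empty string, so rows with no units pass the not in test instead of being quietly dropped. Oh, it’s more efficient for the computer that way. Who cares? How about: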

select
   id, process_date, qty
from
   usage
where
   units is null or
   units not in ('K1', 'K2', 'K3')
;

Here the where clause is immediately easy to understand and to maintain.  It speaks to the essence of the problem in clear pseudo English.
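Null handling is where the two styles can quietly diverge, so it is worth checking. This sketch uses Python's sqlite3 module and a handful of made-up rows (an in-memory usage table with only id and units) to compare the coalesce clause against two plain-English rewrites. The coalesce version keeps rows whose units is null; units is null or units not in (...) keeps them too, while units is not null and ... drops them.

```python
import sqlite3

# Made-up sample rows for illustration: (id, units); row 3 has no units.
ROWS = [(1, "K1"), (2, "K4"), (3, None), (4, "K2"), (5, "MW")]

CLAUSES = {
    "coalesce": "coalesce(units, '') not in ('K1', 'K2', 'K3')",
    "is null or": "units is null or units not in ('K1', 'K2', 'K3')",
    "is not null and": "units is not null and units not in ('K1', 'K2', 'K3')",
}


def matching_ids(where):
    """Return the ids of the sample rows that satisfy the given where clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table usage (id integer, units text)")
    conn.executemany("insert into usage values (?, ?)", ROWS)
    # The clauses are fixed strings above, so formatting them in is safe here.
    ids = [row[0] for row in conn.execute(f"select id from usage where {where} order by id")]
    conn.close()
    return ids


if __name__ == "__main__":
    for name, where in CLAUSES.items():
        print(f"{name:16} {matching_ids(where)}")
```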

PHP vs Python

$What is $easier to $read?  {A $document of $commands in a $pseudo $English $language $strewed with ($dollar $sign $symbols ($$), $French $Braces ({), and $semi-colons)?  Or a $document $without all $that?}

What is easier to read?  A document of commands in a pseudo English language strewed with dollar sign symbols ($), Flower Braces ({), and semi-colons?  Or a document without all that?

PHP

$logger = new Logger();
if ($error) {
   $logger->info('Problemo');
}

4 lines. About 70 characters.

Python

logger = Logger()
if error:
   logger.info('Problemo')

3 lines. About 50 characters.

sav, savdiff, and unsav

vbin has a very useful command called sav.

You should always make a backup copy of a file before messing with it. A simple

   $ cp -p book.xml book.xml.sav

will do. The -p preserves the original file’s permissions and timestamps, and its ownership where you have the privileges to set it.

sav does this for you. In other words:

   $ sav book.xml
   cp -p book.xml book.xml.sav

It’s that simple. It’s just a convenient way to make a quick backup before you edit something. If you want to see the changes since you backed it up, use savdiff:

   $ savdiff book.xml
   diff book.xml.sav book.xml

If you want to revert, use unsav:

   $ unsav book.xml
   cp -p book.xml.sav book.xml
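Here is a minimal Python sketch of the same three commands, assuming the FILE.sav naming shown above. It is not vbin’s implementation: shutil.copy2 keeps timestamps and permission bits but, unlike cp -p run with sufficient privileges, not ownership, and savdiff here returns a unified diff rather than plain diff output.

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def sav(path):
    """Back up FILE as FILE.sav, keeping timestamps and permission bits."""
    backup = Path(f"{path}.sav")
    shutil.copy2(path, backup)
    return backup


def savdiff(path):
    """Return the changes from FILE.sav to FILE as a unified diff (empty if none)."""
    old = Path(f"{path}.sav").read_text().splitlines(keepends=True)
    new = Path(path).read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, fromfile=f"{path}.sav", tofile=str(path)))


def unsav(path):
    """Restore FILE from FILE.sav, again keeping timestamps and permission bits."""
    shutil.copy2(f"{path}.sav", path)


if __name__ == "__main__":
    # Demo in a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp) / "book.xml"
        book.write_text("<title>Moby Dick</title>\n")
        sav(book)
        book.write_text("<title>Moby-Dick</title>\n")
        print(savdiff(book), end="")
        unsav(book)
        print(book.read_text(), end="")
```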