Libraries over Frameworks

What is the benefit of using a framework like Ruby on Rails, Pylons or Drupal? Simply put, it helps us start from nothing and develop code quickly. But it does not have long-lasting staying power. If you plan to be using the system for years to come, the advantage of the quick startup is outweighed by the restrictions the framework places upon you.

So often, in my experience working on legacy systems in various industries, the framework itself becomes the enemy of quick bug fixes and feature enhancements.

I prefer to use libraries rather than frameworks. We all need libraries, because we don’t want to reinvent things like database connectivity, YAML parsing and JSON parsing, or more interesting things like ZIP code distance calculations.

But frameworks are something we can write ourselves very easily. The advantage of doing so is that we understand, and have complete control over, everything in the system, and we have only the things we really need and nothing else.

Lightweight “frameworks” like CherryPy and Flask for Python and Sinatra for Ruby, which can be considered libraries for HTTP routing rather than frameworks, are more elegant solutions than full-fledged MVC frameworks.
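To show how little a routing library really needs, here is a minimal sketch in Python that uses only the WSGI calling convention. The Router class and its route decorator are hypothetical names for illustration. This is not how CherryPy, Flask or Sinatra are actually implemented.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical handler signature: takes the WSGI environ, returns (status, body).
Handler = Callable[[dict], Tuple[str, str]]

class Router:
    """A tiny routing library: maps (method, path) pairs to plain functions."""

    def __init__(self) -> None:
        self.routes: Dict[Tuple[str, str], Handler] = {}

    def route(self, path: str, method: str = "GET") -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for an exact method and path."""
        def register(func: Handler) -> Handler:
            self.routes[(method.upper(), path)] = func
            return func
        return register

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        """WSGI entry point: find the handler for this request, or answer 404."""
        key = (environ.get("REQUEST_METHOD", "GET"), environ.get("PATH_INFO", "/"))
        handler = self.routes.get(key)
        if handler is None:
            status, body = "404 Not Found", "not found"
        else:
            status, body = handler(environ)
        data = body.encode("utf-8")
        start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                                ("Content-Length", str(len(data)))])
        return [data]

app = Router()

@app.route("/hello")
def hello(environ: dict) -> Tuple[str, str]:
    return "200 OK", "hello"
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() is enough. The sketch supports exact-match routes only. Path parameters, templates and sessions can be added one at a time, only when the application actually needs them.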



Data Layer is King

It is far better to have an excellent Database Design and a crappy Application Layer than an excellent Application Layer with a crappy Database Design.

Why is that? Simply put, you can always skim your Application code off the top and write a new one. But you cannot simply swap out the Data Layer without completely undermining the Application.

The more closely we model our business logic in the database itself, the better it is for the business. When new, unforeseen questions about the business arise, we can always run ad hoc queries against a well-designed database schema to get the answers.

For example:

  • Who are our top customers?
  • What is revenue month over month?  Year over year?
  • What is the distribution of sales by product types across all income channels?
  • What percentage of our customers pay more than 30 days late?
  • What time of day do we have the most volume?
  • What percentage of our customers are repeat customers vs. first-time buyers?
  • Has revenue gone up since the last website revamp?

All of these questions are very easily answered with a little SQL magic, when the database is well designed.  And all of these questions can be horrendously hard to answer when it is not.
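As a rough illustration, here is a tiny orders schema in SQLite, driven from Python's sqlite3 module, with three of the questions above answered in plain SQL. The tables, columns and sample rows are made up for this example. A real schema will look different, but the queries stay this short only if the schema is sound.

```python
import sqlite3

# Hypothetical schema: customers and their orders, totals stored in cents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (
    id   integer primary key,
    name text not null
);
create table orders (
    id          integer primary key,
    customer_id integer not null references customers(id),
    ordered_on  text    not null,   -- ISO date, e.g. '2013-04-01'
    total_cents integer not null
);
""")
conn.executemany("insert into customers (id, name) values (?, ?)",
                 [(1, "Acme"), (2, "Globex"), (3, "Initech")])
conn.executemany(
    "insert into orders (customer_id, ordered_on, total_cents) values (?, ?, ?)",
    [(1, "2013-01-15", 50000), (1, "2013-02-03", 20000),
     (2, "2013-01-20", 90000), (3, "2013-02-11", 10000)])

# Who are our top customers?
top_customers = conn.execute("""
    select c.name, sum(o.total_cents) as revenue_cents
      from customers c
      join orders o on o.customer_id = c.id
     group by c.id
     order by revenue_cents desc
""").fetchall()

# What is revenue month over month?
monthly_revenue = conn.execute("""
    select strftime('%Y-%m', ordered_on) as month, sum(total_cents)
      from orders
     group by month
     order by month
""").fetchall()

# How many repeat customers vs. first-time buyers?
repeat_vs_first = conn.execute("""
    select sum(order_count > 1), sum(order_count = 1)
      from (select customer_id, count(*) as order_count
              from orders
             group by customer_id)
""").fetchone()

print(top_customers)     # [('Globex', 90000), ('Acme', 70000), ('Initech', 10000)]
print(monthly_revenue)   # [('2013-01', 140000), ('2013-02', 30000)]
print(repeat_vs_first)   # (1, 2)
```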

For all important systems, the Data Layer must be King.



Why SHOUT in your SQL?

Somewhere, someone started the tradition of capitalizing keywords in SQL. It’s like shouting all the unimportant parts of the code, which makes what’s important harder to read.

   SELECT b.id as book_id, a.name as author, b.pub_date as publication_date FROM books b LEFT JOIN authors a on b.id = a.book_id WHERE foo_bar = 'Fibonacci';

I say stop doing that, and use generous indentation and formatting instead. Pseudo-English is easier to read when it is not capitalized.

   select b.id       as book_id,
          a.name     as author,
          b.pub_date as publication_date
     from books b
     left join authors a on b.id = a.book_id
    where foo_bar = 'Fibonacci';

See: sql_pretty
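How sql_pretty works isn't described here, so the following is only a naive Python sketch of the same idea, not its implementation. It lower-cases the keywords and starts each clause on its own line, with the clause keyword right-aligned. It relies on regular expressions and does not understand string literals or comments, so a keyword inside quotes would be lower-cased too.

```python
import re

# Clause keywords that start a new line. The multi-word joins are listed
# before plain "join" so that "LEFT JOIN" is matched as one keyword.
CLAUSES = ["select", "from", "left join", "inner join", "join", "where",
           "group by", "order by", "having", "limit"]
# Smaller keywords that are only lower-cased, never moved.
MINOR = ["as", "on", "and", "or", "in", "not", "is", "null", "like", "distinct"]

CLAUSE_RE = re.compile(
    r"\b(" + "|".join(c.replace(" ", r"\s+") for c in CLAUSES) + r")\b",
    re.IGNORECASE)
MINOR_RE = re.compile(r"\b(" + "|".join(MINOR) + r")\b", re.IGNORECASE)

def pretty_sql(sql: str) -> str:
    """Lower-case keywords and put each clause on its own line, with the
    clause keyword right-aligned in a six-character gutter."""
    text = " ".join(sql.split())      # collapse newlines and runs of spaces
    parts = CLAUSE_RE.split(text)     # [lead, keyword, body, keyword, body, ...]
    lines = [parts[0].strip()] if parts[0].strip() else []
    for keyword, body in zip(parts[1::2], parts[2::2]):
        keyword = " ".join(keyword.lower().split())
        body = MINOR_RE.sub(lambda m: m.group(0).lower(), body.strip())
        lines.append(f"{keyword:>6} {body}")
    return "\n".join(lines)

print(pretty_sql("SELECT b.id AS book_id, a.name AS author FROM books b "
                 "LEFT JOIN authors a ON b.id = a.book_id "
                 "WHERE foo_bar = 'Fibonacci';"))
# select b.id as book_id, a.name as author
#   from books b
# left join authors a on b.id = a.book_id
#  where foo_bar = 'Fibonacci';
```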

Pull Requests improve Quality

A Pull Request, which might more sensibly be called a Merge Request, is the act of someone asking another developer to merge his or her code into theirs. It is all so civil.

It solves an age-old problem, how to do code review, in a more effective and less disruptive way. The simple rule of thumb is “Nobody merges in their own code.” You get someone else to do it.

That simple act of asking another to merge implies that they sign off on it. They may simply say, “Oh, Zac knows what he’s doing here, and he knows this code better than I do,” and accept the Pull Request and merge it into the main branch.

Or they might download the branch, test it out in isolation, ask why the coder did this or that, suggest changes based on coding standards, conventions or logic, or point out bugs.

Either extreme adds value, improves communication, and does wonders for the overall quality of the team’s output.

And Quality is the name of the game.

Take some lines

Do you ever want to take some lines from the middle of a file, say lines 300-400? You can do that in a kludgey way with Unix head and tail:

   cat ids.csv | head -400 | tail -101

The take command from vbin tools does just that.

   cat ids.csv |take 300 400
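The same idea in Python is a thin wrapper around itertools.islice, which stops reading as soon as the last wanted line has been reached. This sketch assumes that take's two arguments are 1-based and inclusive, as "lines 300-400" suggests. The take function here is hypothetical and is not vbin's code.

```python
from itertools import islice
from typing import Iterable, Iterator

def take(lines: Iterable[str], first: int, last: int) -> Iterator[str]:
    """Return lines first..last, counting from 1 and including both ends
    (the assumed meaning of `take 300 400`)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    # islice is 0-based and end-exclusive, hence first - 1 and last.
    return islice(lines, first - 1, last)
```

For example, with open("ids.csv") as f: print("".join(take(f, 300, 400)), end="") prints the same 101 lines as take 300 400.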

Here we take lines 20001-30000, flatten them onto a single line, and build the comma-separated list for an SQL IN clause:

   cat ids.csv |take 20001 30000 |tr '\n' ' ' | sed "s/ /, /g" > get_customers.sql

By adding a little SQL before and after the list we get something like this:

   select distinct u.customer_id, u.state
     from imports i
     join usage u on i.id = u.import_id
    where i.id in (
   40722, 41483, 50364, 52623, 53049, 54795, 73451,
    ... (thousands of ids here) ...
   986764, 986764, 986764, 986764, 986764, 986764
   );
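The tr and sed pipeline leaves a trailing ", " after the last id, which must be trimmed by hand before the query will run. A small Python helper can build the list cleanly. This in_clause function is a hypothetical sketch. It checks that every id is an integer, so stray text cannot end up in the SQL. It also drops duplicates such as the repeated 986764, which are harmless in an IN list but add noise.

```python
from typing import Iterable

def in_clause(ids: Iterable[str]) -> str:
    """Build "in (40722, 41483, ...)" from raw id lines."""
    # int() rejects anything that is not a whole number, so stray text
    # (or an injected statement) raises ValueError instead of reaching the SQL.
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = dict.fromkeys(int(line) for line in ids if line.strip())
    # join() puts separators only between items: no trailing comma.
    return "in (" + ", ".join(str(n) for n in unique) + ")"
```

For example, with open("ids.csv") as f: print(in_clause(f)) prints the whole IN list, ready to paste after a where-clause column.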


So, go ahead, take some lines.

Also see clip, for panning right to left.

MVC is Overrated

The Model-View-Controller pattern (MVC) is not the only game in town. In fact, it is awkward and cumbersome in some cases.

MVC is a Design Pattern, and Design Patterns show us how to solve common problems with common solutions. However, we should not apply them mechanically every time, but rather learn from them and use them as reference points for solving problems.

The most worthwhile component of MVC is the Model (or Data Layer). It makes good sense to isolate and encapsulate it, because doing so makes for very robust and reusable code. All your SQL-related calls should be in one place.

Code above the Data Layer can then express itself in more business-friendly terms, e.g. book = Book('Moby Dick'), rather than in raw SQL such as select * from books where title like '%moby dick%' and status_id = 1;
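Here is a minimal sketch of such a Data Layer in Python with sqlite3. The Book and Books names and the find_by_title method are illustrative assumptions. So is the reading of status_id = 1 as "active". The point is that callers ask for a book by title and never see SQL. The title is also passed as a query parameter instead of being pasted into the statement.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class Book:
    id: int
    title: str

class Books:
    """Hypothetical data-layer class: all books-related SQL lives here."""

    ACTIVE = 1  # assumption: status_id = 1 means "active"

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find_by_title(self, title: str) -> Optional[Book]:
        """Return the first active book whose title contains `title`, or None.
        SQLite's LIKE is case-insensitive for ASCII text by default."""
        row = self.conn.execute(
            "select id, title from books"
            " where title like ? and status_id = ?"
            " order by id limit 1",
            (f"%{title}%", self.ACTIVE),
        ).fetchone()
        return Book(*row) if row else None
```

Business code then reads as book = Books(conn).find_by_title('Moby Dick'). If the table or the status rules change, only this one class needs to change.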

In MVC, the View and the Controller components are also isolated and encapsulated. I disagree with this, because it can make simple tasks harder. View and Controller functionality should be able to commingle. This is where a lot of the interesting and sophisticated coding happens.

For Web Applications, it’s HTML, CSS, JavaScript, and AJAX that we use to communicate with the User. We should come up with our own patterns that best suit our needs.

What is in a Name?

What should we call it?   Call it whatever you like.

I hear that a lot. As coders and builders of systems, we should take the time needed to come up with the best names for the systems, databases, tables, columns, filenames, modules, classes, methods, functions, and variables we create.

To name something is to know it.

Good names mean less confusion and better understanding. They make talking about the system more natural and improve dialog between business users and the tech team. In fact, we should think like the business user when naming our components.

It makes it easier for you and others to maintain your code.

So take your time, step back, think hard about what something truly is, and name it accurately.

Stitch in Time

“A stitch in time saves nine” is a fundamental principle we should follow when we write code and build systems. It might be said that a stitch in time saves you from being stuck forever with unruly software.

People just want to get the job done. But any small mistake or shortsightedness up front in critical parts of the system will trip you up right away in the next wave of feature enhancements and bug fixes.

An ounce of good design is worth a million dollars a few years down the road, for the people who will be depending on it later.

And good design — takes a little more time.

csv as a database table

CSV files can be used on the Unix command line like database tables: grep can act as a where-clause, and awk can be used as a column selector. However, the csv utility from vbin makes this easier.

Here it is.

   $ csv -p books.csv
   id, isbn,          name,                                    publicated, type
   1,  9781557427960, The Picture of Dorian Gray,              1891,       Novel
   2,  9780140283297, On the Road,                             1957,       Novel
   3,  9781851243969, Frankenstein; or, The Modern Prometheus, 1818,       Novel
   4,  9780345347954, Childhood's End,                         1966,       Novel
   5,  9780451457998, A Clockwork Orange,                      1962,       Novel
   6,  9780440184621, Tai-Pan,                                 1982,       Novel
   7,  9780486266848, Another Turn of the Screw,               1898,       Novel
   8,  9780486280615, Adventures of Huckleberry Finn,          1884,       Novel
   9,  9780143104889, A Princess of Mars,                      1917,       Novel
   13, 9781614270621, The Prophet,                             1923,       Poetry
   21, 9780374528379, Brothers Karamazov,                      1880,       Novel

The -p option makes the output pretty and easy to read, similar to MySQL’s desc output. Here is another example.
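The -p layout can be approximated in a few lines of Python. Every cell except the last is padded to its column's width, with the comma kept right after the value. This pretty function is a hypothetical sketch that mimics the style of the output above. It is not vbin's implementation. Note that Python's csv module would split an unquoted field like Frankenstein; or, The Modern Prometheus at its comma. A real file needs quotes around such a field.

```python
import csv
import io
from typing import List

def pretty(rows: List[List[str]]) -> str:
    """Align parsed CSV rows into columns, keeping each comma attached to
    its value and leaving the last column unpadded."""
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    # Widest cell in each column.
    widths = [max((len(r[i]) for r in rows if i < len(r)), default=0)
              for i in range(ncols)]
    out = []
    for r in rows:
        cells = [(c + ",").ljust(widths[i] + 1) for i, c in enumerate(r[:-1])]
        out.append(" ".join(cells + r[-1:]))
    return "\n".join(out)

sample = ("id,isbn,name\n"
          "1,9781557427960,The Picture of Dorian Gray\n"
          "13,9781614270621,The Prophet\n")
print(pretty(list(csv.reader(io.StringIO(sample)))))
```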

   $ csv -s books.csv
   1. id
   2. isbn
   3. name
   4. publicated
   5. type

The -s option lists the header columns by number, which is useful for choosing or rearranging columns:

   $ csv -c4,3 books.csv |grep ^18 |sort -n
   1818,Frankenstein; or, The Modern Prometheus
   1880,Brothers Karamazov
   1884,Adventures of Huckleberry Finn
   1891,The Picture of Dorian Gray
   1898,Another Turn of the Screw

This example chooses the “publicated” and “name” columns (switching their order, something Unix cut cannot do) and keeps only the books published in the 1800s.
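Once rows are parsed into lists, selecting columns by number in any order is simple. Below is a hypothetical Python equivalent of csv -c4,3 books.csv | grep ^18 | sort -n, run on a few rows from books.csv. The select_columns name is made up for this sketch.

```python
from typing import Iterable, Iterator, List, Sequence

def select_columns(rows: Iterable[List[str]], numbers: Sequence[int]) -> Iterator[List[str]]:
    """Yield the requested 1-based columns in the requested order.
    Unlike cut -f4,3, which always keeps file order, the order given is honoured."""
    for row in rows:
        yield [row[n - 1] for n in numbers]

# A few rows from books.csv, already split into fields (header first).
rows = [
    ["id", "isbn", "name", "publicated", "type"],
    ["1", "9781557427960", "The Picture of Dorian Gray", "1891", "Novel"],
    ["2", "9780140283297", "On the Road", "1957", "Novel"],
    ["21", "9780374528379", "Brothers Karamazov", "1880", "Novel"],
]

# Columns 4 then 3, keep years starting with 18, sort numerically by year.
picked = [r for r in select_columns(rows[1:], [4, 3]) if r[0].startswith("18")]
picked.sort(key=lambda r: int(r[0]))
for year, name in picked:
    print(f"{year},{name}")
```

With a real file, csv.reader(open("books.csv", newline="")) would supply the rows. The header is skipped here with rows[1:] rather than filtered out by the year test.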

Here’s a humdinger:

   ./ list | grep -ve '^id' -e '^$' |csv - -c2| sort | while read p; do echo -n "$p,"; ./ customers $p | wc -l ; done | csv - -p

This takes the list output of some script, removes the header line (the one beginning with id) and any blank lines, grabs the second column, sorts it, and then feeds each value back into the script to count the customers for that provider. The output might look something like this:

   ACE,      96
   NYSE,     1300
   OPC,      1400
   PGEG,     560
   VERT,     131
   VERT-SCO, 1430

The dash (-) tells csv to read from standard input (STDIN) rather than from a named file, as Unix’s gzip does.
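Supporting the same dash convention in your own scripts takes only a couple of lines. The open_input helper below is a hypothetical Python sketch of it.

```python
import sys
from typing import TextIO

def open_input(name: str) -> TextIO:
    """Return standard input for "-", otherwise open the named file.
    Callers should not close the returned stream when it is sys.stdin."""
    if name == "-":
        return sys.stdin
    # newline="" is what the csv module expects for files it will parse.
    return open(name, newline="")
```

A script can then accept either books.csv or - as its argument and process both the same way, which is what makes pipelines like the one above possible.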