Unix House Keeping

Where is everything? Where do files go in Unix? There are no bonafide rules. But we’re free to invent some. So here are mine.

The Heap

/data – Everything starts off in this subdirectory. Consider it the Heap from which all other areas are carved out of. We’re using the term data in its most broadest sense.

/data/apps – Installed apps that don’t install themselves into the woodworks.

/data/backups – Back up files

/data/downloads – Downloaded files

/data/www – Website Root dir


We then symlink them to root to make them easier to get to

# ln -s /data/apps /apps
# ln -s /data/backups /backups
# ln -s /data/downloads /downloads
# ln -s /data/www /www

This gives:



/apps/crowfly – Production git master branch

/home/dlink/crowfly – Development version – git feature branches

/www/crowfly.net –> /apps/crowfly/www – A symlink to the web portion or the public portion of a production application

/www/dev.crowfly.net –> /home/dlink/crowfly/www – A symlink to the web portion of a development branch.

/www/crowfly.net/purple-bat.com –> /apps/purple-bat/www – A symlink to the web portion of an application that does not have its own domain name.   It lives over the parent domain.   The source code of the parent needs to list the symlink in its .gitignore file.


/backups/flower_images_20141203.tgz – A backup of images tarred and zipped – with timestamp in name.

/backups/db/books_20150220.dump.gz – A database dump, zipped – with timestamp in name.  These are created like this:

mysqldump -uuser -ppass -hhost db | gzip - > /backups/db/books/20150220.dump.gz

and they are recovered like this:

alias bookdb='mysql -uuser -ppass -hhost db'
echo 'create database books' | booksdb
gunzip -c /backups/db/books_20150220.dump.gz | booksdb


There is such a long tradition of placing logs in /var/log that we don’t  moved to /data.

Application Logs

/var/log/plant/plant.log – Production log
/var/log/plant/stage_plant.log – Staging Log
/var/log/plant/dev_plant.log – Development Log
/var/log/plant/dlink_plant.log – Alternative Development Log (using username)
/home/dlink/log/dev_plant.log – Alternative Development Log (using user homedir)

/var/log/plant/plant_traceback.log – Production Traceback Log
/var/log/plant/stage_plant_traceback.log – Development Traceback log

Web Logs

/var/log/apache2/crowfly.net.log – Apache Log for site.
/var/log/apache2/crowfly.net.error.log – Apache Error Log for site (periods (.) not dashes)


What’s nice about adopting a standard for where files go on all your systems (Your local mac included) is you can easily move files around.  Here is an examples.

cd /backups/db
scp flyingcrows.com:$PWD/world_2014*.dump.gz .

Programs should be like Gardens

I believe programs should be like gardens.   They should be lovely to see.   Gardens are things we walk around in and return to often.  Gardens are things we wish to spend time in with others – to relax in and enjoy its beauty.

If we strive to make your programming systems like gardens, then our days as programmers will be filled with pleasure.


Variable Naming Conventions

A simple rule to help improve convention and maintainability: Camelcasing should be used when dealing with Objects,  Everything else should use underscores.

  • Class definitions should be CapCase
  • Objects should be mixedCase
  • Class Methods should be mixedCase
class BookAuthors(object):    # <-- Class Definition
   def getAuthors(self, book_id):  # <-- Class Method
      return results

bookAuthors = BookAuthors()   # <-- Object
Everything Else

When not dealing with objects we should use underscores and lowercase

  • Simple instance variables
  • Functions
  • Filenames / Modules
  • SQL schema, table and column names
   return sql2

num_books = len(books)
fp_debug = open('/tmp/debug.log', 'a') 

sql = 'select book_id, author_id from books.book_authors where book_id = ?'

This varies slightly from the PEP8 Naming conventions.   The PEP8 suggest class methods and non-class methods (or functions) be treated the same way, with underscores. However I prefer naming class methods with mixedCase to remind us we’re in the domain of objects.

With regard to sql schemas, table and column names, because column names can become very long, underscores makes them more readable.

Defensive Programming is the way to go.

Defensive programming is like defensive driving: Anticipate everything that might go wrong. If a function is passed an Id to a database table, do not assume that it is a valid Id, or an integer, or even that it has a value.

What is most important in defensive programming is to communicate clear and precise error messages when things are not as they aught be.

Here are some examples error messages:

Less then ideal error messages:

AttributeError: 'NoneType' object has no attribute 'last_name' 

_mysql_exceptions.OperationalError: (1054, "Unknown column 'Jerry' in 'where clause'")

IndexError: list index out of range

KeyError: 'Jerry'

Better ones:

BookError: Book not found: id = Jerry

AuthorError: Author not found: id = 506

getCustomers command: Expected 3 parameters, only 2 given.

FoomWebsiteError: Unable to read from http://foom.com: HTTP 500

These better error messages are not hard to do if we think about it ahead of time. Here are some examples:

class BookError(Exception): pass

class Book(object):
   def get(self, id):
      results = self.db.query(select * from books where id = ?, id)
      if not results:
         raise Bookerror('Book not found: id = %s' % id
   return results[0]

Another example:

URL = 'foom.com'
class FoomWebsiteError(Exception): pass

class FoomWebsite(object):
   def scrapePage(path, params):
      website = Website(URL)
      page = website.go(path, params)
      if website.error:
         raise WebsiteError('Unable to read from %s: %s'
            % (URL, website.error)
      lines = page.split('<p>')
      name = lines[3]
      return page


It is okay to have bugs if they are easy to find and easy to fix. Applying a little defensive programming to everything we write make debugging a breeze, and helps everyone using the system.

Working Towards the Ideal

Whether building a new system, or trying to untangle some untenable rats nest of code — One should have an image of the ideal state in their mind.

If we draw up the best plan we can, thinking in terms of the perfect – regardless of its immediate feasibility – we can then put that on the wall and work towards it.

All changes to code come in two forms:  Bug Fixes and Feature Enhancements.   There are many decisions to make in each case.  By holding up an Ideal, it helps us with those decisions.   We choose the path that helps the code converge on some well conceived plan  rather than letting it oscillating around, as is often the case.

The ideal plan is rarely realized but that’s not the point.  By aiming at one target we help the general direction of all the arrows.

So we must spend a lot of time designing the ideal.  To do that we need to create documents such as these:

– Problem Statements
– Use Case Diagrams
– Data Model Entity Relationship Diagrams (ERD),
– Class Diagrams
– Sequence Diagrams
– Wireframes, and
– Mock Reports

It is not enough to simple understand a single aspect of the system and go to work implementing it.   We must take the extra time to see how that component fits into the larger whole. The benefits of which creates flexible, easy to use and fun to maintain code.

It was Abe Lincoln who said: Give me six hours to chop down a tree and I will spend the first four sharpening the axe.

Libraries over Frameworks

What is the benefit of using a framework like Ruby-on-Rails, Pylons or Drupal?  Simply put, it helps us start and develop code from nothing quickly.   But it does not have long lasting staying power.   If you plan to be using the system for years to come – the advantage of the quick start up is out weighed by the restrictions placed upon you by the framework.

So often in my experience working on legacy systems in various industries the Framework itself becomes the enemy of quick bug fixes and feature enhancements.

I prefer to use libraries rather than using frameworks.  We all need libraries because we don’t want to reinvent everything like database connectivity, yaml parsing, json parsing, and interesting things like zipcode distance calculations, etc.

But frameworks are something we can write ourselves very easily.   The advantage of doing so is the ability to understand and have complete control over everything in the system, and only have those things we really need, and nothing else.

Lightweight “frameworks” like Cherrypy and Flask for python and Sinatra for ruby, which can be considered libraries for http-routing, rather than frameworks, are more elegant solutions than full fledged MVC frameworks.



Data Layer is King

It is far better to have an excellent Database Design and a crappy Application Layer, than an excellent Application Layer with a crappy Database design.

Why is that?  Simply put, you can always skim your Application code off the top and write a new one.   But you can not simply swap out the Data Layer without completely undermining the Application.

The closer we can model our business logic in the database itself – the better it is for the business.  When new unforseen questions arise about the business, we can always run adhoc queries from a well designed database schema to get the answers.

For example:

  • Who are our top customers?
  • What is revenue month over month?  Year over year?
  • What is the distribution of sales by product types across all income channels?
  • What percentage of our customers pay late over 30 days?
  • What time of day do we have the most volume?
  • What is the percentage of repeat customers vs first time buyers?
  • Has revenue gone up since the last website revamp?

All of these questions are very easily answered with a little SQL magic, when the database is well designed.  And all of these questions can be horrendously hard to answer when it is not.

For all important systems, the Data Layer must be King.



Why SHOUT in your SQL?

Somewhere someone started the tradition of capitalizing keywords in SQL.   Its like shouting all the unimportant parts of the code and making what’s important harder to read.

   SELECT b.id as book_id, a.name as author, b.pub_date as publication_date FORM books b LEFT JOIN authors a on b.id = a.book_id WHERE foo_bar = 'Fibonacci';

I say stop doing that.  And use instead super indentation and formatting.  Pseudo English is easier to read when it is not capitalized.

      b.id       as book_id,
      a.name     as author,
      b.pub_date as publication_date
      left join authors a on b.id = a.book_id
      foo_bar = 'Fibonacci'

See: sql_prettyHelp

Pull Requests improve Quality

A Pull Request, which might more sensibly be called a Merge Request, is the act of a someone asking another to merge his or her code into theirs.  It is all so civil.

It solves an age old problem of how to do code review, in a more effective less disruptive way.  The simple rule of thumb is “Nobody merges in their own code.”  You get someone else to do it.

That simple act of asking another to merge implies they sign off on it.   They may simply say, “Oh, Zac knows what he’s doing here, and he knows this code better then I do”, and accept the Pull Request and Merge it in to the main branch.

Or one might download the branch, test it out in isolation, ask questions about why the coder did this or that, suggest changes based on coding standards, convention, coding logic or point out bugs.

Either of the two extremes add value, improves communication and does wonders to improve the overall quality of the output of the team.

And Quality is the name of the game.

Take some lines

Do you ever want to take some lines from the middle of a file say, lines 300-400?   You can do that in a kludgey way with Unix head and tail:

   cat ids.csv | head -400 | tail -100

The take command from vbin tools does just that.

   cat ids.csv |take 300 400

Here we flatten the results and build an SQL IN statement:

   cat ids.csv |take 20001 30000 |tr '\n' ' ' | sed "s/ /, /g" > get_customers.sql

By adding a little SQL before and after the list we get something like this:

   distinct u.customer_id, u.state
   imports i
   join usage u on i.id = u.import_id
   i.id in (
40722, 41483, 50364, 52623, 53049, 54795, 73451, 
 ... (thousands of ids here) ... 
986764, 986764, 986764, 986764, 986764, 986764


So, go ahead, takes some lines.

Also see clip, for panning right to left.