Category Archives: SQL

SQL Formatting

Let’s standardize how we write SQL.  What joy.   Rarely, have I seen consistency in how SQL is formatted, and often it is not even considered.

Here is the standard I use which makes it  easy to read both simple and complex queries.  Here is an example:

select
   b.id                                       as book_id,
   b.title                                    ,
   group_concat(a.first_name, ' ', 
                a.last_name, separator ', ')  as authors,
   b.published                                ,
   b.isbn
from
   books b
   left join book_authors ba on b.id = ba.book_id
   left join authors a on ba.author_id = a.id 
where
   b.published < '1900' and
   a.nationality = 'English'
group by
   b.id
order by
   b.published desc,
   b.title
limit
   100
;

Rules

  1. Lower case all keywords. See Why Shout in your SQL?
  2. Keywords that group sections – select, from, where, group by, having, order by, and limit should be on their own line with no leading spaces.
  3. Code between the keyword groupings should be indented
  4. Indentation should be 3 spaces.
  5. No line should exceed 79 characters
  6. All table names must have an alias, if more than one table is used
  7. All field names should be prefixed with an alias,  if more than one table is used
  8. Select, group by, and order by clauses should have each field on it’s own line, but may not if the list is small.
  9. Select fields should be broken up into multiple properly indented lines when long or complicated.  For example, when using functions like concat_ws, substring, etc.
  10. Select clauses may horizontally align ‘as’ names
  11. From clause should have each table on its own line
  12. From join clauses may want to have each join condition on its own line indented in, if they are long
  13. Where and having clauses should have each ‘and’ condition on its own line
  14. Conditions may horizontally align on =, <, <=, >, >=, !=
  15. Joining keywords on, and, and or should be never start a new line.
  16. Nested queries should be formated the same way with additional indentation necessary to line them up
  17. Semicolon should be on it’s own line with no leading spaces.

More Examples

Line up field ‘as’ names
select
   a.col              as name1
   a.long_column_name as name2,
   func(a.other)      as name3
Long join statements
from
   usage u
   left join usage_detail ud on
      u.usage_id = ud.usage_id and
      u.status_id <= 500 and
      (ud.trans_type = 'TD' or ud.position = 'R26')
The case statement
select
   case
      when t.trans_type is null then
         0
      when t.trans_type in (20, 50) then
         (-1) * qty * amount
      else
          qty * amount
      end as total
Subqueries
select
   o.id as order_id
   o.order_date,
   ois.orders
from
   orders o
   left join (
      select
         order_id
         count(*) as orders
      from
         order_details
      ) ois on o.id = ois.order_id
where
   o.order_date       >= '2015-02-01 and
   date(o.order_date) <  '2015-03-01
;
The insert statement

Same general rules apply

insert into audits
   (table_name, column_name, id, old_value, new_value)
values
   ('customers', 'status_id', 8056, 10, 200)
;
The update statement

Same general rules apply

update
   customers
set
   status_id = 500
where
   customer_id = 8011
;
Table creation scripts
set foreign_key_checks = 0;

-- drop table /*! if exists */ parties;

create table parties (
  party_id          integer unsigned  not null  auto_increment primary key ,
  party_type_id     integer unsigned  not null ,
  institution_id    integer unsigned  ,
  name              varchar(255)      ,
  first_name        varchar(255)      ,
  middle_name       varchar(255)      ,
  last_name         varchar(255)      ,
  username          varchar(255)      ,
  email             varchar(255)      ,
  phone             varchar(255)      ,
  phone2            varchar(255)      ,
  source            varchar(255)      ,
  comments          text              ,

  created           datetime          not null ,
  last_updated      timestamp         not null
        default current_timestamp on update current_timestamp ,

  unique key name_institution (name, party_type_id, institution_id),
  foreign key (party_type_id)  references party_types (party_type_id),
  foreign key (institution_id) references institutions (party_id)
)
engine InnoDB default charset=utf8;
;

show warnings;

set foreign_key_checks = 1;

desc parties;

Unix utility: sql_pretty

Here is a simple utility that will reformat SQL. It does most of it. It is a work in progress.  It uses regex but needs to do a full SQL parse to do everything: sql_pretty

Concluding

These conventions have made my life easier. Easy to read code is good code is easy living. So go clean up your SQL scripts!  And have fun.

Variable Naming Conventions

A simple rule to help improve convention and maintainability: Camelcasing should be used when dealing with Objects,  Everything else should use underscores.

Objects
  • Class definitions should be CapCase
  • Objects should be mixedCase
  • Class Methods should be mixedCase
class BookAuthors(object):    # <-- Class Definition
   def getAuthors(self, book_id):  # <-- Class Method
      ...
      return results

bookAuthors = BookAuthors()   # <-- Object
Everything Else

When not dealing with objects we should use underscores and lowercase

  • Simple instance variables
  • Functions
  • Filenames / Modules
  • SQL schema, table and column names
sql_pretty(sql):
   ...
   return sql2

num_books = len(books)
fp_debug = open('/tmp/debug.log', 'a') 

sql = 'select book_id, author_id from books.book_authors where book_id = ?'

This varies slightly from the PEP8 Naming conventions.   The PEP8 suggest class methods and non-class methods (or functions) be treated the same way, with underscores. However I prefer naming class methods with mixedCase to remind us we’re in the domain of objects.

With regard to sql schemas, table and column names, because column names can become very long, underscores makes them more readable.

Why SHOUT in your SQL?

Somewhere someone started the tradition of capitalizing keywords in SQL.   Its like shouting all the unimportant parts of the code and making what’s important harder to read.

   SELECT b.id as book_id, a.name as author, b.pub_date as publication_date FORM books b LEFT JOIN authors a on b.id = a.book_id WHERE foo_bar = 'Fibonacci';

I say stop doing that.  And use instead super indentation and formatting.  Pseudo English is easier to read when it is not capitalized.

   select
      b.id       as book_id,
      a.name     as author,
      b.pub_date as publication_date
   from
      books
      left join authors a on b.id = a.book_id
   where
      foo_bar = 'Fibonacci'

See: sql_prettyHelp

Take some lines

Do you ever want to take some lines from the middle of a file say, lines 300-400?   You can do that in a kludgey way with Unix head and tail:

   cat ids.csv | head -400 | tail -100

The take command from vbin tools does just that.

   cat ids.csv |take 300 400

Here we flatten the results and build an SQL IN statement:

   cat ids.csv |take 20001 30000 |tr '\n' ' ' | sed "s/ /, /g" > get_customers.sql

By adding a little SQL before and after the list we get something like this:

select
   distinct u.customer_id, u.state
from
   imports i
   join usage u on i.id = u.import_id
where
   i.id in (
40722, 41483, 50364, 52623, 53049, 54795, 73451, 
 ... (thousands of ids here) ... 
986764, 986764, 986764, 986764, 986764, 986764
);

 

So, go ahead, takes some lines.

Also see clip, for panning right to left.