Category Archives: database

Saleor Products Data Model

I’m looking to replace our Magento Ecommerce Server with something “Nice” – Something that is not complicated, not over abstracted, something that is well … fun to work with.

Saleor is an elegant solution. It is an Ecommerce app written in Python and Django.  I was able to download it and get it up and running in about 2 hours.    That says a lot.

Now, I need to write a product upload script.  I could not find a diagram of the Product Schema, so I drew one.  I did this by looking at products/models/base.py and images.py and the postgresql database tables.  It is open for comments and corrections.  Here it is:

 

 

DB, Config and Logging

The Database, Config and Logging are at the core of all application development.  Everything starts there.

A Database System stores all of the applications data.  The storage itself can be anything from a loose collection of files like jpegs, a nosql db like mongodb,  or a full fledge relational database like mysql or postgresql.   This subsystem sits right below any Model Objects.

A Configuration System allows you to configure your application without changing code.  It primarily determines where things are, such as: subdirectories, external web services, database credentials, log files etc.    You will have a different configuration file for each type of environment: dev, stage, and prod

A Logging System should be built in from the word go.   All backend tasks should be logged.  One philosophy is that all events write a single line entry of Success or Failure.   Each event can optionally also write multiple Warning or Debug logging messages.  When done this way your logs are themselves databases of all system events.   You can can easily answer questions like how many transactions processed today, and how many of them failed.

These three subsystems are needed for all application development. Here is a link to a library I have written that provides these functions.  Which I reuse it all the time.

Python Application Development – Core Library Classes – https://github.com/dlink/vlib

 

 

SQL Formatting

Let’s standardize how we write SQL.  What joy.   Rarely, have I seen consistency in how SQL is formatted, and often it is not even considered.

Here is the standard I use which makes it  easy to read both simple and complex queries.  Here is an example:

select
   b.id                                       as book_id,
   b.title                                    ,
   group_concat(a.first_name, ' ', 
                a.last_name, separator ', ')  as authors,
   b.published                                ,
   b.isbn
from
   books b
   left join book_authors ba on b.id = ba.book_id
   left join authors a on ba.author_id = a.id 
where
   b.published < '1900' and
   a.nationality = 'English'
group by
   b.id
order by
   b.published desc,
   b.title
limit
   100
;

Rules

  1. Lower case all keywords. See Why Shout in your SQL?
  2. Keywords that group sections – select, from, where, group by, having, order by, and limit should be on their own line with no leading spaces.
  3. Code between the keyword groupings should be indented
  4. Indentation should be 3 spaces.
  5. No line should exceed 79 characters
  6. All table names must have an alias, if more than one table is used
  7. All field names should be prefixed with an alias,  if more than one table is used
  8. Select, group by, and order by clauses should have each field on it’s own line, but may not if the list is small.
  9. Select fields should be broken up into multiple properly indented lines when long or complicated.  For example, when using functions like concat_ws, substring, etc.
  10. Select clauses may horizontally align ‘as’ names
  11. From clause should have each table on its own line
  12. From join clauses may want to have each join condition on its own line indented in, if they are long
  13. Where and having clauses should have each ‘and’ condition on its own line
  14. Conditions may horizontally align on =, <, <=, >, >=, !=
  15. Joining keywords on, and, and or should be never start a new line.
  16. Nested queries should be formated the same way with additional indentation necessary to line them up
  17. Semicolon should be on it’s own line with no leading spaces.

More Examples

Line up field ‘as’ names
select
   a.col              as name1
   a.long_column_name as name2,
   func(a.other)      as name3
Long join statements
from
   usage u
   left join usage_detail ud on
      u.usage_id = ud.usage_id and
      u.status_id <= 500 and
      (ud.trans_type = 'TD' or ud.position = 'R26')
The case statement
select
   case
      when t.trans_type is null then
         0
      when t.trans_type in (20, 50) then
         (-1) * qty * amount
      else
          qty * amount
      end as total
Subqueries
select
   o.id as order_id
   o.order_date,
   ois.orders
from
   orders o
   left join (
      select
         order_id
         count(*) as orders
      from
         order_details
      ) ois on o.id = ois.order_id
where
   o.order_date       >= '2015-02-01 and
   date(o.order_date) <  '2015-03-01
;
The insert statement

Same general rules apply

insert into audits
   (table_name, column_name, id, old_value, new_value)
values
   ('customers', 'status_id', 8056, 10, 200)
;
The update statement

Same general rules apply

update
   customers
set
   status_id = 500
where
   customer_id = 8011
;
Table creation scripts
set foreign_key_checks = 0;

-- drop table /*! if exists */ parties;

create table parties (
  party_id          integer unsigned  not null  auto_increment primary key ,
  party_type_id     integer unsigned  not null ,
  institution_id    integer unsigned  ,
  name              varchar(255)      ,
  first_name        varchar(255)      ,
  middle_name       varchar(255)      ,
  last_name         varchar(255)      ,
  username          varchar(255)      ,
  email             varchar(255)      ,
  phone             varchar(255)      ,
  phone2            varchar(255)      ,
  source            varchar(255)      ,
  comments          text              ,

  created           datetime          not null ,
  last_updated      timestamp         not null
        default current_timestamp on update current_timestamp ,

  unique key name_institution (name, party_type_id, institution_id),
  foreign key (party_type_id)  references party_types (party_type_id),
  foreign key (institution_id) references institutions (party_id)
)
engine InnoDB default charset=utf8;
;

show warnings;

set foreign_key_checks = 1;

desc parties;

Unix utility: sql_pretty

Here is a simple utility that will reformat SQL. It does most of it. It is a work in progress.  It uses regex but needs to do a full SQL parse to do everything: sql_pretty

Concluding

These conventions have made my life easier. Easy to read code is good code is easy living. So go clean up your SQL scripts!  And have fun.

Data Layer is King

It is far better to have an excellent Database Design and a crappy Application Layer, than an excellent Application Layer with a crappy Database design.

Why is that?  Simply put, you can always skim your Application code off the top and write a new one.   But you can not simply swap out the Data Layer without completely undermining the Application.

The closer we can model our business logic in the database itself – the better it is for the business.  When new unforseen questions arise about the business, we can always run adhoc queries from a well designed database schema to get the answers.

For example:

  • Who are our top customers?
  • What is revenue month over month?  Year over year?
  • What is the distribution of sales by product types across all income channels?
  • What percentage of our customers pay late over 30 days?
  • What time of day do we have the most volume?
  • What is the percentage of repeat customers vs first time buyers?
  • Has revenue gone up since the last website revamp?

All of these questions are very easily answered with a little SQL magic, when the database is well designed.  And all of these questions can be horrendously hard to answer when it is not.

For all important systems, the Data Layer must be King.

 

 

csv as a database table

CSV files can be used on the Unix command line like database tables.  grep can act as a where-clause, and awk can be used as column selector.  However the csv utility from vbin makes it easier.

Here it is.

   $csv -p books.csv
   id, isbn,          name,                                    publicated, type
   1,  9781557427960, The Picture of Dorian Gray,              1891,       Novel
   2,  9780140283297, On the Road,                             1957,       Novel
   3,  9781851243969, Frankenstein; or, The Modern Prometheus, 1818,       Novel
   4,  9780345347954, Childhood's End,                         1966,       Novel
   5,  9780451457998, A Clockwork Orange,                      1962,       Novel
   6,  9780440184621, Tai-Pan,                                 1982,       Novel
   7,  9780486266848, Another Turn of the Screw,               1898,       Novel
   8,  9780486280615, Adventures of Huckleberry Finn,          1884,       Novel
   9,  9780143104889, A Princess of Mars,                      1917,       Novel
   13, 9781614270621, The Prophet,                             1923,       Poetry
   21, 9780374528379, Brothers Karamazov,                      1880,       Novel

The -p makes the output pretty and easy to read. Similar to MySQL’s desc output.  Here is another example.

   $ csv -s books.csv
   1. id
   2. isbn
   3. name
   4. publicated
   5. type

The -s shows header info. Useful for choosing or rearranging columns by number:

   $ csv -c4,3 books.csv |grep ^18 |sort -n
   1818,Frankenstein; or, The Modern Prometheus
   1880,Brothers Karamazov
   1884,Adventures of Huckleberry Finn
   1891,The Picture of Dorian Gray
   1898,Another Turn of the Screw

This example chooses “published” and “name” columns (switching their order – something Unix cut can not do), and selects only those in the 1800s.

Here’s a humdinger:

   ./providers.py list | grep -ve '^id' -e '^$' |csv - -c2| sort | while read p; do echo -n "$p,"; ./providers.py customers $c | wc -l ; done | csv - -p

This takes the list output of some providers.py script, remove lines not beginning with id, and blank lines, graps the second column, sorts them, and then sends them back into the providers.py script to get a count of customers for that provider. The output might look something like this:

   ACE,      96
   NYSE,     1300
   OPC,      1400
   PGEG,     560
   VERT,     131
   VERT-SCO, 1430

The dash (-) allows csv to process <STDIN> rather than a given filename, like Unix’s gzip does.