GIS burdGIS logo

SQL Masterclass - 1.0

Sequenced query language (SQL - pronounced "ess-que-el" or "sequel") is one of those things. It's useful, it's necessary, it's ubiquitous. But often it is underused. As Shachar Meir described in his SQL Masterclass (part of the #Datasciencefest program) SQL is often applied to extract data and once exported we can settle into our favourite method (R, Python, even Excel) of working with it further.

As Shachar pointed out, much if not all data transformation can be done database side. This weekly series covers Shachar's presentation and (on a personal level) helps me cement my SQL knowledge. The sample data, code and themes are all Shachar's and if you haven't yet checked out the #Datasciencefest program ... do.

Setting up

It's useful to have tables to practice SQL on. This can be achieved many ways and this practical will use SQL fiddle. We'll be using postgres 9.3 and to build a schema you can paste this code into the schema panel and then hit "build schema". If you prefer by all means make a local database using postGres.

Ethos

It is very easy to get bogged down in syntax with SQL. However, this workshop was aimed more at thinking about processes. And the principles (good to stick by for any programming-type work) are as follows:

  • Start very very small
  • Increment one step at a time
  • Always have an expected result in mind

Ethos in practice

We have our tables set up so let's start very very small and very very simple:

SELECT * FROM products;

The above SQL will select all columns from our products table. This is a simple as it gets in SQL. Our expected result would be all the columns from the products table and so it is. Excellent! Don't forget that starting small also means taking a subset of your data to work on. We're fine with the current tables as they are already a subset but testing code on millions of records will be time consuming and make it harder to predict outcomes.

Processes - Using CASE Statements

SQL has two main types of case statement; simple cases and searched cases. Looking at a simple case statement first let's imagine we'd like to create discounts for each product. The discounts will be as follows:

  • bananas: 10%
  • nuts: 12%
  • milk: 8%

For this we'll need to use the sales table (price in the products table is multiplied by quantity in sales table already to give the sales price). So let's start small:

SELECT * FROM sales;

Expected output is all the columns from sales. What columns do we actually need? Probably sale date, product and price will do:

SELECT sale_dt, product, price FROM sales;

Now we'd like to add a column to show the percentage discount for each product so we'll use a simple case statement:

SELECT sale_dt, product, price, CASE product WHEN 'Bananas' THEN 0.1 WHEN 'Nuts' THEN 0.12 WHEN 'Milk' THEN 0.08 ELSE 0 END As percentage FROM sales;

Here we're using the case statement to create an extra field called percentage dependent on the product. We just need to employ some simple maths to calculate the discounted price now:

SELECT sale_dt, product, price, price - CASE product WHEN 'Bananas' THEN 0.1 WHEN 'Nuts' THEN 0.12 WHEN 'Milk' THEN 0.08 ELSE 0 END * price As discounted_price FROM sales;

In the next installment we'll look at searched cases.

...archive

all articles
tutorials
map stories

sign-up...

Sign-up for our monthly newsletter.

What now?

If you enjoyed this post and would like to learn more about GIS or improve your skills check out our training courses. If you'd like to contribute a story, tutorial or anything else, please contact us.

Thanks for reading and if you'd like to support our content please feel free to buy us a coffee.