This document contains examples of SQL queries with explanations and formatting guidelines. The queries select data from tables to return item details, revenues, and ratios. Comments provide best practices for formatting, commenting code, and modeling data. The document also discusses data structures, common table expressions, views, functions and technologies.
3. select i_ITEM_id,i_item_desc,i_category,i_class,i_CURRENT_price, sum(cs_EXT_sales_price) as
itemrevenue, sum(cs_ext_sales_price) *100/sum(sum(cs_ext_sales_price)) over (partition by
I_class) as revenueratio
From catalog_sales,item,date_dim
where cs_item_sk = i_item_sk AND i_category in ('Jewelry', 'Sports', 'Books')
and cs_sold_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2001-01-12' as
timestamp)
and (cast('2001-01-12' as timestamp) + interval 30 days)
group by i_item_id,i_item_desc,i_category,i_class,i_current_price
order by i_category,i_class,i_item_id,i_item_desc,revenueratio
limit 100;
6. select
i_ITEM_id,
I_item_desc,
I_category,
I_class,
i_CURRENT_price,
sum(cs_EXT_sales_price) as itemrevenue,
sum(cs_ext_sales_price) *100/sum(sum(cs_ext_sales_price))
over (partition by I_class) as revenueratio
FROM catalog_sales,item,date_dim
where cs_item_sk = i_item_sk
AND i_category in ('Jewelry', 'Sports', 'Books')
and cs_sold_date_sk = d_date_sk
and cast(d_date as timestamp) between cast('2001-01-12' as timestamp)
and (cast('2001-01-12' as timestamp) + interval 30 days)
group by i_item_id,i_item_desc,i_category,i_class,i_current_price
order by i_category,i_class,i_item_id,i_item_desc,revenueratio
limit 100;
7. select i_ITEM_id,
I_item_desc,
I_category,
I_class,
i_CURRENT_price,
sum(cs_EXT_sales_price) as itemrevenue,
sum(cs_ext_sales_price) *100/sum(sum(cs_ext_sales_price))
over (partition by I_class) as revenueratio
FROM catalog_sales,item,date_dim
where cs_item_sk = i_item_sk
AND i_category in ('Jewelry', 'Sports', 'Books')
and cs_sold_date_sk = d_date_sk
and cast(d_date as timestamp)
between cast('2001-01-12' as timestamp)
and (cast('2001-01-12' as timestamp) + interval 30 days)
group by i_item_id,
I_item_desc,
I_category,
I_class,
i_current_price
order by i_category,
I_class,
I_item_id,
I_item_desc,
revenueratio
limit 100;
8. SELECT i_item_id
,i_item_desc
,i_category
,i_class
,i_current_price
,SUM(cs_ext_sales_price) AS item_revenue
,SUM(cs_ext_sales_price) * 100
/ SUM(SUM(cs_ext_sales_price))
OVER (PARTITION BY i_class) AS revenue_ratio
FROM catalog_sales
JOIN item ON cs_item_sk = i_item_sk
JOIN date_dim ON cs_sold_date_sk = d_date_sk
WHERE i_category IN ('Jewelry', 'Sports', 'Books')
AND CAST(d_date AS TIMESTAMP)
BETWEEN CAST('2001-01-12' AS TIMESTAMP)
AND CAST('2001-01-12' AS TIMESTAMP) + INTERVAL 30 DAYS
GROUP BY 1, 2, 3, 4, 5
ORDER BY 3, 4, 1
LIMIT 100
9. SELECT item.i_item_id,
item.i_item_desc,
item.i_category,
item.i_class,
item.i_current_price,
SUM(catalog_sales.cs_ext_sales_price) AS item_revenue,
SUM(catalog_sales.cs_ext_sales_price) * 100
/ SUM(SUM(catalog_sales.cs_ext_sales_price))
OVER (PARTITION BY item.i_class) AS revenue_ratio
FROM catalog_sales
JOIN item
ON catalog_sales.cs_item_sk = item.i_item_sk
JOIN date_dim
ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
WHERE item.i_category IN ('Jewelry', 'Sports', 'Books')
AND CAST(date_dim.d_date AS TIMESTAMP)
BETWEEN CAST('2001-01-12' AS TIMESTAMP)
AND CAST('2001-01-12' AS TIMESTAMP) + INTERVAL 30 DAYS
GROUP BY 1, 2, 3, 4, 5
ORDER BY 3, 4, 1
LIMIT 100
10. Over-commenting
-- get id, name and open only once
SELECT DISTINCT t1.id , t1.name , t3.open
-- select names from table 1, order them by date and return first 3
FROM (SELECT id , name FROM table1 ORDER BY date DESC LIMIT 3) AS t1
-- get open from table 2 IF id is there, open = 1 and type =2
LEFT JOIN (
( SELECT open , name_id FROM table2 WHERE open=1 AND type=2 ) AS t3 )
ON t1.id = t3.name_id
-- order by name from A-Z
ORDER BY t1.name ASC
13. Follow patterns
Indentation
Don’t over-comment - write clear code instead
Remove commented-out lines of code
Use aliases on all columns when joining
Know your PKs and unique keys
Make future maintenance easy
Execution time
Select ONLY the tables/columns that are actually in use
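As a minimal sketch of two of these points (aliasing every column in a join and selecting only the columns in use), the query below runs against an in-memory SQLite database. The `customer` and `orders` tables and their contents are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical tables, created only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ana', 'Lisbon'), (2, 'Bo', 'Oslo');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Every column carries its table alias, and only the columns
# the result needs are selected (no SELECT *).
query = """
    SELECT c.name,
           SUM(o.amount) AS total_amount
      FROM customer AS c
      JOIN orders   AS o
        ON o.customer_id = c.customer_id
     GROUP BY c.name
     ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the aliases in place, a reader can tell which table each column comes from without looking up the schema.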
16. WITH
patient_data AS
(SELECT patient_id,
patient_name,
hospital,
drug_dosage
FROM hospital_registry
WHERE (COALESCE(last_visit,NOW()) > NOW() - INTERVAL '14 days')
AND city = 'Los Angeles'
),
average_dosage AS
(SELECT hospital,
AVG(drug_dosage) AS Average
FROM patient_data
GROUP BY hospital
)
SELECT count(hospital)
FROM average_dosage
WHERE Average > 1000
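A chained-CTE query of this shape can be tried end to end with SQLite. The sketch below is an adaptation, not the original query: the sample rows are invented, the date arithmetic uses SQLite's `date('now', '-14 days')` in place of `NOW() - INTERVAL '14 days'`, and the final filter is applied to the averaged column exposed by the second CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; a NULL last_visit is treated as "today" by COALESCE.
conn.executescript("""
    CREATE TABLE hospital_registry (
        patient_id  INTEGER PRIMARY KEY,
        hospital    TEXT,
        drug_dosage REAL,
        city        TEXT,
        last_visit  TEXT
    );
    INSERT INTO hospital_registry VALUES
        (1, 'General', 1200, 'Los Angeles', NULL),
        (2, 'General', 1000, 'Los Angeles', NULL),
        (3, 'Mercy',    800, 'Los Angeles', NULL),
        (4, 'Mercy',   9000, 'Los Angeles', '2000-01-01'),
        (5, 'Bay',     5000, 'San Diego',   NULL);
""")

query = """
    WITH patient_data AS (
        SELECT patient_id, hospital, drug_dosage
          FROM hospital_registry
         WHERE COALESCE(last_visit, date('now')) > date('now', '-14 days')
           AND city = 'Los Angeles'
    ),
    average_dosage AS (
        SELECT hospital, AVG(drug_dosage) AS Average
          FROM patient_data
         GROUP BY hospital
    )
    SELECT COUNT(hospital)
      FROM average_dosage
     WHERE Average > 1000
"""
hospital_count = conn.execute(query).fetchone()[0]
print(hospital_count)
```

Each CTE reads only from the one before it, so the steps (filter patients, average per hospital, count) can be checked one at a time.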
17. Master the use of:
Functions
Window Functions (OLAP functions)
CTE - Common Table Expression
Views
UDF - User Defined Function
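Three of the features listed above can be sketched with Python's built-in `sqlite3` module: a view, a window function, and a user-defined function. The `sales` table, the `class_revenue` view, and the `price_band` function are all hypothetical names chosen for this illustration, and the window function assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (item_class TEXT, item_id TEXT, price REAL);
    INSERT INTO sales VALUES ('a', 'x', 30.0), ('a', 'y', 10.0), ('b', 'z', 20.0);

    -- View: a named, reusable query.
    CREATE VIEW class_revenue AS
    SELECT item_class, SUM(price) AS revenue
      FROM sales
     GROUP BY item_class;
""")

view_rows = conn.execute(
    "SELECT item_class, revenue FROM class_revenue ORDER BY item_class"
).fetchall()

# Window function: each item's share of its class revenue,
# computed without collapsing the rows.
ratio_rows = conn.execute("""
    SELECT item_id,
           price * 100 / SUM(price) OVER (PARTITION BY item_class) AS revenue_ratio
      FROM sales
     ORDER BY item_id
""").fetchall()


# UDF: a Python function registered so SQL can call it by name.
def price_band(price):
    return "high" if price >= 20 else "low"


conn.create_function("price_band", 1, price_band)
band_rows = conn.execute(
    "SELECT item_id, price_band(price) FROM sales ORDER BY item_id"
).fetchall()

print(view_rows, ratio_rows, band_rows)
```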
28. Minimise usage of non-standard abbreviations
Don’t use overly long names - you will need to type them one day
PK & FK names should follow a pattern
ID & table name, FK and link table
Maybe:
Data type suffixes in names, e.g. price_amt, tax_pct
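One way to read these naming rules is sketched below as a small SQLite schema. Every table and column name here is a hypothetical choice: primary keys are named `<table>_id`, foreign keys reuse the name of the key they point to, the link table joins the two table names, and suffixes such as `_amt`, `_pct`, and `_cnt` hint at what a column holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,   -- PK: table name + _id
        product_name TEXT NOT NULL,
        price_amt    REAL NOT NULL,         -- _amt: a monetary amount
        tax_pct      REAL NOT NULL          -- _pct: a percentage
    );
    CREATE TABLE customer_order (
        customer_order_id INTEGER PRIMARY KEY,
        order_date        TEXT NOT NULL
    );
    -- Link table: named after the two tables it connects;
    -- FK columns keep the name of the PK they reference.
    CREATE TABLE customer_order_product (
        customer_order_id INTEGER NOT NULL REFERENCES customer_order (customer_order_id),
        product_id        INTEGER NOT NULL REFERENCES product (product_id),
        quantity_cnt      INTEGER NOT NULL,
        PRIMARY KEY (customer_order_id, product_id)
    );
    INSERT INTO product VALUES (1, 'Ring', 200.0, 25.0);
""")

# Column 3 of each foreign_key_list row is the referencing column name.
fk_columns = sorted(
    row[3] for row in conn.execute("PRAGMA foreign_key_list(customer_order_product)")
)
gross = conn.execute(
    "SELECT price_amt * (1 + tax_pct / 100) FROM product"
).fetchone()[0]
print(fk_columns, gross)
```

Because the foreign keys share the primary key's name, join conditions read the same on both sides, for example `cop.product_id = p.product_id`.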