Understanding Databases and Querying
USMAN SHARIF
History of databases?
 Need to structurally organize data.
 Various different models to fulfill this need.
 The most common technique is called relational modelling.
 Databases that support the relational model are called Relational Database Management Systems (RDBMS).
Relational Model
 All data is represented in terms of tuples.
 A tuple generalizes a pair: a pair holds two items, and a tuple holds N items, where N is a finite whole number.
 Tuples are grouped into relations.
 In mathematical terms, the relational model is based on first-order predicate logic.
Example of Tuples and Relations
 Assume a road repair company wants to track their activities on
different roads.
 Let's restrict their activities to ‘Patching’, ‘Overlay’ and ‘Crack Sealing’.
 The company overlaid I-95 on 1/12/01 and I-66 on 2/8/01.
 How can we represent this information in a relational model using
tuples?
 First, we see that there are two distinct things here:
 Activities
 Works
 Next we define tuples for both of these items as follows:
 Activities = {activityName}
 Works = {activity, date, routeNumber}
Example of Tuples and Relations
 We see a relation between Activities and Works: the activity that is to be performed.
 In the relational model we use the concept of ‘keys’ to describe the relationship between different tuples.
 In our example, activityName can act as a ‘key’ to describe the relation, which can be named ‘ActivityWorks’.
 For optimization reasons, keys are generally of a numeric type. Therefore we modify Activities and Works to add a numeric ID:
 Activities = {activityId, activityName}
 Works = {activityId, date, routeNumber}
Describing the example graphically
Relational Databases
 Relational Modelling is a mathematical concept.
 When we translate this mathematical concept into an RDBMS, we describe tuples as rows, items in tuples as columns, and a group of tuples as a table. Relations are called relations in RDBMS terminology as well.
 The example of our road repair company when translated into RDBMS would have two tables as follows:
 Table Name: Activities. Columns:
 activityId (Primary Key, number type)
 activityName (string type)
 Table Name: Works. Columns:
 activityId (Foreign Key, number type)
 date (date type)
 routeNumber (string type)
 It would have the following relation:
 Relation Name: ActivityWorks. Participating Columns:
 Primary Table: Activities. Key Column: activityId
 Secondary Table: Works. Key Column: activityId
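The two tables and the ActivityWorks relation can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the ID values (23, 24, 25), the ISO date format, and the extra route 'I-81' are hypothetical and not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Activities (
    activityId   INTEGER PRIMARY KEY,
    activityName TEXT NOT NULL
);
CREATE TABLE Works (
    activityId  INTEGER NOT NULL REFERENCES Activities (activityId),
    date        TEXT NOT NULL,      -- ISO dates stored as text (assumption)
    routeNumber TEXT NOT NULL
);
""")

# Hypothetical IDs; 24 is used for 'Overlay'.
conn.executemany(
    "INSERT INTO Activities (activityId, activityName) VALUES (?, ?)",
    [(23, "Patching"), (24, "Overlay"), (25, "Crack Sealing")],
)
# The two overlay jobs from the example, reading the dates as month/day/year.
conn.executemany(
    "INSERT INTO Works (activityId, date, routeNumber) VALUES (?, ?, ?)",
    [(24, "2001-01-12", "I-95"), (24, "2001-02-08", "I-66")],
)

# A Works row that points at a non-existent activity violates the relation.
try:
    conn.execute("INSERT INTO Works VALUES (99, '2001-03-01', 'I-81')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

work_count = conn.execute("SELECT COUNT(*) FROM Works").fetchone()[0]
print(orphan_rejected, work_count)
```

The foreign key is what turns the shared activityId column into an enforced relation: a Works row cannot exist without its parent Activities row.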
More on Relations
 The relation in the previous example is commonly called a ‘one-to-
many’ or a ‘Master-Child’ relationship.
 There are three types of relationships:
 One-to-One: For a row in the primary table there can be at most one row in the secondary table. Commonly used to spread a single tuple across two tables based on logical reasoning.
 One-to-Many: For a row in the primary table there can be multiple rows in the secondary table. Commonly used to reduce redundancy or duplication of the same data.
 Many-to-Many: For multiple rows in the primary table there can be multiple rows in the secondary table. Used to describe complex relationships.
 Relations are always directional.
Querying databases
 Databases provide an interface to define and manipulate data. This interface is made up of queries.
 There are two types of queries:
 Data Definition Language (DDL) queries. They are used to create and modify the database structure. The DB structure is called a schema definition.
 Data Manipulation Language (DML) queries. They are used to retrieve and modify the data in the database.
 There are four major DML queries:
 SELECT
 INSERT
 UPDATE
 DELETE
SELECT Query
 A SELECT query is the way to fetch data from a database.
 At a minimum, it has two parts (called clauses):
 The SELECT clause
 The FROM clause
 For example:
SELECT activityId, activityName
FROM Activities;
 This query would return all rows in the Activities table.
 Apart from the SELECT and FROM clauses, there are a number of other clauses that are optional. These include (but are not limited to):
 WHERE
 ORDER BY
 GROUP BY
SELECT Query – The SELECT Clause
 It enables you to define the columns you want.
 Sometimes you want all columns. In those cases you can use the wildcard operator (*). For example, the previous query can be modified as:
SELECT *
FROM Activities;
 A good practice is to name the columns rather than using *
 The primary use of SELECT clause is to define a projection – a subset
of columns, so that the result can be restricted to such columns only.
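As a small illustration of projection, the sketch below runs both forms of the query against an in-memory SQLite database. The table contents and ID values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activityId INTEGER PRIMARY KEY, activityName TEXT);
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay'), (25, 'Crack Sealing');
""")

# SELECT * returns every column of every row.
all_columns = conn.execute(
    "SELECT * FROM Activities ORDER BY activityId"
).fetchall()

# Naming a column projects the result down to that column only.
names_only = conn.execute(
    "SELECT activityName FROM Activities ORDER BY activityId"
).fetchall()

print(all_columns)
print(names_only)
```

Each row comes back as a tuple, so the projected query returns one-element tuples.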
SELECT Query – FROM Clause
 This is where you tell the database the name of table(s) where it should
look for the columns you named in the SELECT clause.
 When fetching data from multiple tables, list all tables and describe the
relation between them. For example, let us try to fetch data for all the
activities that have been performed on various routes along with dates.
SELECT Activities.activityName, Works.date, Works.routeNumber
FROM Activities INNER JOIN
Works ON Activities.activityId = Works.activityId
 Notice the keywords ‘INNER JOIN’ and the part ‘ON Activities.activityId
= Works.activityId’.
 The ON … part tells the database which columns to match on. It is also called the join condition. There can be more than one join condition, depending on the underlying database schema.
SELECT Query – FROM Clause – JOINs
 JOIN is a keyword that allows you to let the database know that
there are multiple tables you intend to fetch data from.
 There is a table mentioned before JOIN and another after it.
 The one before is called the left table and the one after is called the
right table.
 Three types of joins are covered here (SQL also defines FULL OUTER JOIN and CROSS JOIN):
 INNER JOIN
 LEFT OUTER JOIN
 RIGHT OUTER JOIN
INNER JOIN
 INNER JOIN is also sometimes called a ‘strict’ join.
 Some RDBMSs allow the ‘INNER’ keyword to be dropped and assume it implicitly.
 This type of join means that for each row in the left table, the database finds the matching rows in the right table and skips the row if no match is found.
 This type of join helps in eliminating empty records.
 For example, in our road repair example, it would omit all Activities rows that don’t have records in the Works table.
OUTER JOINs
 In case we don’t want to omit empty records, we can use OUTER JOINs.
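 A LEFT OUTER JOIN keeps every row of the left table and attaches all matching rows of the right table; where there is no match, the right-hand columns are NULL.
 A RIGHT OUTER JOIN keeps every row of the right table and attaches all matching rows of the left table; where there is no match, the left-hand columns are NULL.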
 For example, let us find all Activities and related Works. We can do this
by:
SELECT Activities.activityName, Works.date, Works.routeNumber
FROM Activities LEFT OUTER JOIN
Works ON Activities.activityId = Works.activityId
 This query would return all Activities along with their associated Works.
For the Activities that don’t have corresponding Works it would put
‘NULL’ under date and routeNumber columns.
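The difference between the two join types can be seen by running the same query with each of them. The sketch below uses SQLite with assumed ID values; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activityId INTEGER PRIMARY KEY, activityName TEXT);
CREATE TABLE Works (activityId INTEGER, date TEXT, routeNumber TEXT);
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay'), (25, 'Crack Sealing');
INSERT INTO Works VALUES (24, '2001-01-12', 'I-95'), (24, '2001-02-08', 'I-66');
""")

# Same query text; only the join type changes.
query = """
SELECT Activities.activityName, Works.date, Works.routeNumber
FROM Activities {join_type} JOIN
     Works ON Activities.activityId = Works.activityId
ORDER BY Activities.activityName, Works.date
"""

inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()
left_rows = conn.execute(query.format(join_type="LEFT OUTER")).fetchall()

print(inner_rows)  # only activities that have at least one Works row
print(left_rows)   # every activity; None stands for SQL NULL
```

Python's sqlite3 module reports SQL NULL as None, so activities without works show up in the LEFT OUTER JOIN result with None in the date and routeNumber positions.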
The JOIN Conditions
 The ON … part is called the joining condition.
 It is essentially a condition that names columns from the left and right tables and states how they are to be compared.
 In most circumstances, columns from the left and right tables are matched with an = operator; however, in some cases that might not be true.
 Other comparison operators such as not equal, greater than, less than, etc. are also supported.
 There can be more than one join condition.
 QUESTION: What would happen if you skip the ON … part?
SELECT Query – WHERE clause
 WHERE clause allows you to describe conditions on the data you want fetched.
 For example, if we are interested in all Overlaying Works we’ll write a query:
SELECT *
FROM Works
WHERE activityId = 24
 Another way to do the same without using an ID is:
SELECT Works.*
FROM Works INNER JOIN
Activities ON Works.activityId = Activities.activityId
WHERE Activities.activityName = 'Overlay'
 However, the second example may be somewhat slower, because it adds the overhead of a join and of matching on a string column.
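A quick check that both forms of the filter select the same rows, sketched in SQLite. The ID 24 for 'Overlay' follows the example above; the other IDs and the extra Patching job on 'I-81' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activityId INTEGER PRIMARY KEY, activityName TEXT);
CREATE TABLE Works (activityId INTEGER, date TEXT, routeNumber TEXT);
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay'), (25, 'Crack Sealing');
INSERT INTO Works VALUES
    (24, '2001-01-12', 'I-95'),
    (24, '2001-02-08', 'I-66'),
    (23, '2001-03-05', 'I-81');
""")

# Filter directly on the numeric key.
by_id = conn.execute(
    "SELECT * FROM Works WHERE activityId = 24 ORDER BY date"
).fetchall()

# Filter on the activity name, which requires a join to Activities.
by_name = conn.execute("""
    SELECT Works.*
    FROM Works INNER JOIN
         Activities ON Works.activityId = Activities.activityId
    WHERE Activities.activityName = 'Overlay'
    ORDER BY Works.date
""").fetchall()

print(by_id == by_name, by_id)
```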
SELECT Query - ORDER BY Clause
 Theoretically speaking, the records in a table are unordered. However, most
RDBMS usually store them in some kind of ordering (usually in the order of Primary
key column).
 In any case, there might be a requirement to order the results in a particular
way.
 ORDER BY clause allows you to describe data ordering and the direction of
ordering.
 For example, if we want all Activities along with their associated Works ordered
alphabetically and sorted by date in a descending order, we can do that by:
SELECT Activities.activityName, Works.date, Works.routeNumber
FROM Activities INNER JOIN
Works ON Activities.activityId = Works.activityId
ORDER BY Activities.activityName ASC, Works.date DESC
 The ASC keyword is implicit and can be skipped.
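The ordering can be tried out in SQLite as follows. The sample rows are assumptions, and dates are stored as ISO text so that text ordering matches chronological ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activityId INTEGER PRIMARY KEY, activityName TEXT);
CREATE TABLE Works (activityId INTEGER, date TEXT, routeNumber TEXT);
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay');
INSERT INTO Works VALUES
    (24, '2001-01-12', 'I-95'),
    (23, '2001-03-05', 'I-81'),
    (24, '2001-02-08', 'I-66');
""")

# Names ascending (ASC left implicit), dates descending within each name.
ordered_rows = conn.execute("""
    SELECT Activities.activityName, Works.date, Works.routeNumber
    FROM Activities INNER JOIN
         Works ON Activities.activityId = Works.activityId
    ORDER BY Activities.activityName, Works.date DESC
""").fetchall()

for row in ordered_rows:
    print(row)
```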
Aggregating Results
 Sometimes we want to fetch aggregated results. For example, we
want to find out the number of times each Activity has been carried
out from the road repair example.
 The GROUP BY clause provides this functionality.
SELECT Activities.activityName, COUNT(Works.routeNumber) AS
countActivity
FROM Activities INNER JOIN
Works ON Activities.activityId = Works.activityId
GROUP BY Activities.activityName
 COUNT is an aggregate function. Other commonly used aggregate functions are SUM, AVG, MIN and MAX.
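The counting query can be run against a small SQLite database like this. The rows are assumed sample data, and an ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activityId INTEGER PRIMARY KEY, activityName TEXT);
CREATE TABLE Works (activityId INTEGER, date TEXT, routeNumber TEXT);
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay'), (25, 'Crack Sealing');
INSERT INTO Works VALUES
    (24, '2001-01-12', 'I-95'),
    (24, '2001-02-08', 'I-66'),
    (23, '2001-03-05', 'I-81');
""")

# One output row per activity name, with the number of joined Works rows.
activity_counts = conn.execute("""
    SELECT Activities.activityName, COUNT(Works.routeNumber) AS countActivity
    FROM Activities INNER JOIN
         Works ON Activities.activityId = Works.activityId
    GROUP BY Activities.activityName
    ORDER BY Activities.activityName
""").fetchall()

print(activity_counts)
```

Because this is an INNER JOIN, 'Crack Sealing' does not appear at all; it has no Works rows to count.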
SELECT Query – GROUP BY Clause
 When a GROUP BY clause is defined, every column in the SELECT and ORDER BY clauses needs to be either part of an aggregate function or mentioned in the GROUP BY clause.
 For example, the following query is invalid in standard SQL, because Works.date is neither aggregated nor listed in GROUP BY:
SELECT Activities.activityName, Works.date,
COUNT(Works.routeNumber) AS countActivity
FROM Activities INNER JOIN
Works ON Activities.activityId = Works.activityId
GROUP BY Activities.activityName
Sub-queries
 A SELECT query works on a table or a group of tables, meaning
tables are the operands for a SELECT operation.
 The output of a SELECT query is (a kind of) a table.
 Therefore, an output of a SELECT query can act as an
input/operand for another SELECT query.
Why use sub-queries?
 Query optimization by breaking a large/complex query into smaller
queries that use WHERE clauses to reduce the data size.
 Retrieving a single value per row from a related table, based on the values of other columns. For example, retrieving the most recent (or oldest) record from a table that holds the updates made to a single record over a period of time.
 This is a common data warehousing use case: storing data that changes over time while preserving the history of those changes.
 Sometimes also referred to as Slowly Changing Dimension (SCD).
 Using a sub-query in a WHERE clause to specify a match on a range
of values.
Sub-queries for optimization
 Assume that we have a
service with one million
users.
 There are only about
100,000 users that have
spent money on our
service.
 Of the 100,000 users, only
about 1,000 users have
ever spent 100 dollars or
more in one go.
 We would most likely have
a database with the
tables as shown in the
diagram
Sub-queries for optimization
 You are required to analyze transactions with an amount greater than 100 dollars.
 Write down the query that fetches users (userId, name, gender, country) and their transactions (transactionDate, amount).
 A sub-optimal query follows on the next slide, but don’t peek ahead. Write one yourself and compare with it later.
Sub-queries for optimization
SELECT users.userId, users.username, users.gender, users.country,
transactions.transactionDate, transactions.transactionAmount
FROM users INNER JOIN
transactions ON users.userId = transactions.userId
WHERE transactions.transactionAmount > 100;
 Problems:
 There were 100,000 users who had spent money. Among their transactions there were only 1,000 instances where the amount spent was greater than 100.
 Assume that on average there are 2 transactions per user.
 Executed as written, the query above would retrieve 200,000 joined records and only then apply the WHERE clause to pick out the 1,000 records where the amount was greater than 100. (A query optimizer may apply the filter earlier on its own.)
 This means that 99.5% of the data fetched initially was of no use and wasted server resources (time and memory).
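The 99.5% figure follows directly from the numbers assumed on the slides, as this short calculation shows:

```python
# Figures assumed in the example above.
paying_users = 100_000
avg_transactions_per_user = 2
large_transactions = 1_000  # transactions worth more than 100 dollars

# Rows produced by joining users to all of their transactions.
joined_rows = paying_users * avg_transactions_per_user

# Share of joined rows that the WHERE clause throws away afterwards.
wasted_fraction = (joined_rows - large_transactions) / joined_rows

print(joined_rows, f"{wasted_fraction:.1%}")
```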
Sub-queries for optimization
 First, we know that we are only interested in transactions worth more than 100 dollars. The following query gets us only these transactions:
SELECT transactions.userId, transactions.transactionDate, transactions.transactionAmount
FROM transactions
WHERE transactions.transactionAmount > 100
 Since the output of the above query is a table, we can JOIN it with the users table. The resulting query would be:
SELECT users.userId, users.username, users.gender, users.country,
t1.transactionDate, t1.transactionAmount
FROM users INNER JOIN
(SELECT transactions.userId, transactions.transactionDate,
transactions.transactionAmount
FROM transactions
WHERE transactions.transactionAmount > 100) AS t1 ON users.userId = t1.userId
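Both formulations should return the same rows; only the order of filtering and joining differs. The sketch below checks this on a tiny SQLite database. The column names come from the queries above, while the users and amounts are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, gender TEXT, country TEXT);
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
INSERT INTO users VALUES (1, 'alice', 'F', 'US'), (2, 'bob', 'M', 'UK'), (3, 'carol', 'F', 'DE');
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-01', 20.0),
    (2, '2024-01-10', 99.0),  (2, '2024-03-01', 300.0);
""")

# Join first, filter afterwards.
join_then_filter = conn.execute("""
    SELECT users.userId, users.username, users.gender, users.country,
           transactions.transactionDate, transactions.transactionAmount
    FROM users INNER JOIN
         transactions ON users.userId = transactions.userId
    WHERE transactions.transactionAmount > 100
    ORDER BY users.userId
""").fetchall()

# Filter inside a sub-query, then join its (smaller) result.
filter_then_join = conn.execute("""
    SELECT users.userId, users.username, users.gender, users.country,
           t1.transactionDate, t1.transactionAmount
    FROM users INNER JOIN
         (SELECT userId, transactionDate, transactionAmount
          FROM transactions
          WHERE transactionAmount > 100) AS t1 ON users.userId = t1.userId
    ORDER BY users.userId
""").fetchall()

print(join_then_filter == filter_then_join, filter_then_join)
```

Whether the second form is actually faster depends on the database and its optimizer; the example only confirms that the results match.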
Sub-queries for Retrieving SCD
 From the previous example, assume that now we’re interested in
knowing when was the last time each of our users spent money
along with their gender and country.
 How can we go about doing this?
 The query that does that is on the next slide, but first try to work out how you would do it.
Sub-queries for Retrieving SCD
 First, let's write a query that retrieves the latest transaction.
SELECT MAX(transactions.transactionDate) AS lastTransactionDate
FROM transactions
OR
SELECT transactions.transactionDate
FROM transactions
ORDER BY transactions.transactionDate DESC
LIMIT 1
 But we want to know the last transaction for each user. We can modify the first example as:
SELECT transactions.userId, MAX(transactions.transactionDate) AS lastTransactionDate
FROM transactions
GROUP BY transactions.userId
 The second one cannot be modified in a way that would give us the desired result. Why?
SELECT transactions.userId, transactions.transactionDate
FROM transactions
ORDER BY transactions.transactionDate DESC
LIMIT 1
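The queries for the overall latest transaction and for the latest transaction per user can be run in SQLite like this. The transaction rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-01', 20.0),
    (2, '2024-01-10', 99.0),  (2, '2024-03-01', 300.0);
""")

# Latest transaction overall, using an aggregate function.
latest_by_max = conn.execute(
    "SELECT MAX(transactionDate) AS lastTransactionDate FROM transactions"
).fetchone()[0]

# Latest transaction overall, using ordering plus LIMIT.
latest_by_limit = conn.execute("""
    SELECT transactionDate
    FROM transactions
    ORDER BY transactionDate DESC
    LIMIT 1
""").fetchone()[0]

# Latest transaction for each user, using GROUP BY.
latest_per_user = conn.execute("""
    SELECT userId, MAX(transactionDate) AS lastTransactionDate
    FROM transactions
    GROUP BY userId
    ORDER BY userId
""").fetchall()

print(latest_by_max, latest_by_limit, latest_per_user)
```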
Sub-queries for Retrieving SCD
 Now, we need to combine the result with user’s gender and
country.
SELECT users.userId, users.gender, users.country,
MAX(transactions.transactionDate) AS lastTransactionDate
FROM users LEFT OUTER JOIN
transactions ON users.userId = transactions.userId
GROUP BY users.userId, users.gender, users.country
 The query above gives us the desired result, but it has one problem.
What?
Sub-queries for Retrieving SCD
 We can use the discarded query two slides back if we can parameterize it somehow so that it
evaluates for each user and gives us the last date. The following query does that:
SELECT users.userId, users.gender, users.country,
(SELECT transactions.transactionDate
FROM transactions
WHERE transactions.userId = users.userId
ORDER BY transactionDate DESC
LIMIT 1) AS lastTransactionDate
FROM users
 The query above does not have a join.
 It does not use an aggregate function in the main query and enables us to easily add more
columns without worrying about the GROUP BY clause.
 Modify the query above (or the one on previous slide) so that we now get the last transaction
dates for transactions worth more than 50 dollars for each user. (Answer on next slide)
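The correlated sub-query (the version without an amount filter) can be tried out in SQLite as follows. The users and transactions are invented sample data; 'carol' has no transactions on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, gender TEXT, country TEXT);
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
INSERT INTO users VALUES (1, 'alice', 'F', 'US'), (2, 'bob', 'M', 'UK'), (3, 'carol', 'F', 'DE');
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-01', 20.0),
    (2, '2024-01-10', 99.0),  (2, '2024-03-01', 300.0);
""")

# The inner query is evaluated once per users row and refers to that row's userId.
last_dates = conn.execute("""
    SELECT users.userId, users.gender, users.country,
           (SELECT transactions.transactionDate
            FROM transactions
            WHERE transactions.userId = users.userId
            ORDER BY transactions.transactionDate DESC
            LIMIT 1) AS lastTransactionDate
    FROM users
    ORDER BY users.userId
""").fetchall()

for row in last_dates:
    print(row)
```

A user without transactions still produces a row; the sub-query yields NULL for that user, which Python shows as None.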
Sub-queries for Retrieving SCD
SELECT users.userId, users.gender, users.country,
(SELECT transactions.transactionDate
FROM transactions
WHERE transactions.userId = users.userId
AND transactions.transactionAmount > 50
ORDER BY transactions.transactionDate DESC
LIMIT 1) AS lastTransactionDate
FROM users
Handling NULL Values
 The query on previous slide would return rows for all one million users
with most of them having lastTransactionDate as NULL.
 NULLs don’t look good on a result set and are of no value for further
analysis. We can resolve this situation in two ways.
 Assume that we do need to see all one million users and would like
to put a default value for the users that don’t have a transaction
(such as 1.Jan.1900). Such values are called ‘sentinels’.
 To replace a NULL with a sentinel value, we can use the standard COALESCE function. (SQL Server also offers ISNULL, and MySQL and SQLite offer IFNULL, for the same purpose.)
Handling NULL Values
SELECT users.userId, users.gender, users.country,
COALESCE((SELECT transactions.transactionDate
FROM transactions
WHERE transactions.userId = users.userId
AND transactions.transactionAmount > 50
ORDER BY transactions.transactionDate DESC
LIMIT 1), '1900-01-01') AS lastTransactionDate
FROM users
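A runnable sketch of the sentinel approach in SQLite, using COALESCE and ISO-formatted dates. The sample users and amounts are invented, and the sentinel '1900-01-01' corresponds to the 1 Jan 1900 default mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, gender TEXT, country TEXT);
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
INSERT INTO users VALUES (1, 'alice', 'F', 'US'), (2, 'bob', 'M', 'UK'), (3, 'carol', 'F', 'DE');
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-01', 20.0),
    (2, '2024-01-10', 99.0),  (2, '2024-03-01', 300.0);
""")

# COALESCE returns its first non-NULL argument, so users without a
# qualifying transaction get the sentinel date instead of NULL.
with_sentinel = conn.execute("""
    SELECT users.userId, users.gender, users.country,
           COALESCE((SELECT transactions.transactionDate
                     FROM transactions
                     WHERE transactions.userId = users.userId
                       AND transactions.transactionAmount > 50
                     ORDER BY transactions.transactionDate DESC
                     LIMIT 1), '1900-01-01') AS lastTransactionDate
    FROM users
    ORDER BY users.userId
""").fetchall()

for row in with_sentinel:
    print(row)
```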
Sub-queries in WHERE clause
 Or, we can modify the same query as:
SELECT users.userId, users.gender, users.country,
(SELECT transactions.transactionDate
FROM transactions
WHERE transactions.userId = users.userId
AND transactions.transactionAmount > 50
ORDER BY transactions.transactionDate DESC
LIMIT 1) AS lastTransactionDate
FROM users
WHERE users.userId IN (SELECT transactions.userId
FROM transactions
WHERE transactions.transactionAmount > 50)
 However, this is (and in general, queries that use a sub-query in the WHERE clause are) sub-optimal, to the point that it is quite a bad query.
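For completeness, the IN-based filter can be run in SQLite as shown below. The sample data is invented: 'carol' has no transactions and 'dave' has only a small one, so both should drop out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, gender TEXT, country TEXT);
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
INSERT INTO users VALUES
    (1, 'alice', 'F', 'US'), (2, 'bob', 'M', 'UK'),
    (3, 'carol', 'F', 'DE'), (4, 'dave', 'M', 'FR');
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-01', 20.0),
    (2, '2024-01-10', 99.0),  (2, '2024-03-01', 300.0),
    (4, '2024-02-15', 10.0);
""")

# The WHERE ... IN sub-query keeps only users with a qualifying transaction,
# so no NULL dates appear in the result.
users_with_large_spend = conn.execute("""
    SELECT users.userId, users.gender, users.country,
           (SELECT transactions.transactionDate
            FROM transactions
            WHERE transactions.userId = users.userId
              AND transactions.transactionAmount > 50
            ORDER BY transactions.transactionDate DESC
            LIMIT 1) AS lastTransactionDate
    FROM users
    WHERE users.userId IN (SELECT transactions.userId
                           FROM transactions
                           WHERE transactions.transactionAmount > 50)
    ORDER BY users.userId
""").fetchall()

print(users_with_large_spend)
```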
Many-to-Many Relation Example
 We are tasked to design a system for a college.
 There are students and there are courses.
 We need to provide a basic model that can store data for students,
courses and enrollment of students in courses over years and
semesters.
 A student may have enrolled in multiple courses.
 A course may have enrollment of multiple students.
 A student may enroll in a course only once in a given semester of a year.
 Try modelling the above scenario. The slide following this shows a
common way to go about doing this.
Many-to-Many Relation Example
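One common way to model a many-to-many relation is a third table that holds one row per enrollment and points at both sides. The sketch below shows this in SQLite; all table names, column names and sample rows are assumptions made for illustration and may differ from the slide's diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

conn.executescript("""
CREATE TABLE Students (studentId INTEGER PRIMARY KEY, studentName TEXT NOT NULL);
CREATE TABLE Courses  (courseId  INTEGER PRIMARY KEY, courseTitle TEXT NOT NULL);

-- Junction table: one row per student, course, year and semester.
CREATE TABLE Enrollments (
    studentId INTEGER NOT NULL REFERENCES Students (studentId),
    courseId  INTEGER NOT NULL REFERENCES Courses (courseId),
    year      INTEGER NOT NULL,
    semester  TEXT    NOT NULL,
    PRIMARY KEY (studentId, courseId, year, semester)
);

INSERT INTO Students VALUES (1, 'Student A'), (2, 'Student B');
INSERT INTO Courses  VALUES (10, 'Databases'), (20, 'Algorithms');
INSERT INTO Enrollments VALUES
    (1, 10, 2023, 'Fall'),    -- one student in several courses
    (1, 20, 2023, 'Fall'),
    (2, 10, 2023, 'Fall'),    -- one course with several students
    (1, 10, 2024, 'Spring');  -- same course again in a later semester
""")

# The composite primary key rejects a second enrollment in the same
# course within the same semester of the same year.
try:
    conn.execute("INSERT INTO Enrollments VALUES (1, 10, 2023, 'Fall')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM Enrollments").fetchone()[0]
print(duplicate_rejected, enrollment_count)
```

The composite primary key encodes the "only once in a given semester" rule, while still allowing the same student to retake a course in a different semester.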
Many-to-Many Relation Example
 Write a query that retrieves records of enrollment for all students
ordered chronologically.
 Write a query that retrieves semester-wise enrollment count for all
courses
 Write a query that displays students that have enrolled in the same
course more than once along with the number of times they had
enrolled.
 Write a query to display last enrollment for all students.
Questions

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Understanding databases and querying

  • 2. History of databases?  Need to structurally organize data.  Various different models to fulfill this need.  Most common technique is called Relational Modelling  The databases supporting relational model are called Relational Database Management Systems (RDBMS).
  • 3. Relational Model  All data is represented in terms of tuples.  A tuple is an extension to a pair. A pair is between two items, and a tuple is between N items where N is a countable number.  Tuples are grouped into relations.  In mathematical terms, a relational model is based on first-order predicate logic.
  • 4. Example of Tuples and Relations  Assume a road repair company wants to track its activities on different roads.  Let's restrict its activities to ‘Patching’, ‘Overlay’ and ‘Crack Sealing’.  The company overlaid I-95 on 1/12/01 and I-66 on 2/8/01.  How can we represent this information in a relational model using tuples?  First, we see that there are two distinct things here:  Activities  Works  Next we define tuples for both of these items as follows:  Activities = {activityName}  Works = {activity, date, routeNumber}
  • 5. Example of Tuples and Relations  We see a relation between Activities and Work – the activity that is to be performed.  In relational model we use the concept of ‘keys’ to describe the relationship between different tuples.  In our example, activityName can act as a ‘key’ to describe the relation that can be named as ‘ActivityWorks’.  For optimization reasons, keys are generally of numeric type. Therefore we modify Activities and Works to add a numeric ID  Activities = {activityId, activityName}  Works = {activityId, date, routeNumber}
  • 7. Relational Databases  Relational Modelling is a mathematical concept.  When we translate this mathematical concept into an RDBMS, we describe tuples as rows, items in tuples as columns and a group of tuples as a table. Relations between tables are called relations in RDBMS terminology as well.  The example of our road repair company, when translated into an RDBMS, would have two tables as follows:  Table Name: Activities. Columns:  activityId (Primary Key, number type)  activityName (string type)  Table Name: Works. Columns:  activityId (Foreign Key, number type)  date (date type)  routeNumber (string type)  It would have the following relation:  Relation Name: ActivityWorks. Participating Columns:  Primary Table: Activities. Key Column: activityId  Secondary Table: Works. Key Column: activityId
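The two tables and the ActivityWorks relation described above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The numeric IDs (23, 24, 25) are assumptions chosen for illustration, the dates are assumed to be month/day/year and stored as ISO text, and SQLite's column types only approximate the "number", "string" and "date" types named in the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Activities (
    activityId   INTEGER PRIMARY KEY,
    activityName TEXT NOT NULL
);
CREATE TABLE Works (
    -- Foreign key: this is the ActivityWorks relation.
    activityId  INTEGER NOT NULL REFERENCES Activities(activityId),
    date        TEXT NOT NULL,
    routeNumber TEXT NOT NULL
);
-- The IDs below are hypothetical; only the names and works come from the example.
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay'), (25, 'Crack Sealing');
INSERT INTO Works VALUES (24, '2001-01-12', 'I-95'), (24, '2001-02-08', 'I-66');
""")

for row in conn.execute("SELECT activityId, activityName FROM Activities"):
    print(row)
```

With the foreign key in place, inserting a Works row whose activityId does not exist in Activities is rejected, which is what makes Activities the primary table of the relation.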
  • 8. More on Relations  The relation in the previous example is commonly called a ‘one-to-many’ or a ‘Master-Child’ relationship.  There are three types of relationship in total:  One-to-One: For a row in the primary table there can be at most one row in the secondary table. Commonly used to spread a single tuple across two tables based on logical reasoning.  One-to-Many: For a row in the primary table there can be multiple rows in the secondary table. Commonly used to reduce redundancy or duplication of the same data.  Many-to-Many: For multiple rows in the primary table there can be multiple rows in the secondary table. Used to describe complex relationships.  Relations are always directional.
  • 9. Querying databases  Databases provide an interface to define and manipulate data. This interface is used through queries.  There are two types of queries:  Data Definition Language (DDL) queries. They are used to create and modify the database structure. The DB structure is called a schema definition.  Data Manipulation Language (DML) queries. They are used to read and change the data stored in the database.  There are four major DML queries:  SELECT  INSERT  UPDATE  DELETE
  • 10. SELECT Query  A SELECT query is the way to fetch data from a database.  At a minimum, it has two parts (called clauses):  The SELECT clause  The FROM clause  For example: SELECT activityId, activityName FROM Activities;  This query would return all rows in the Activities table.  Apart from the SELECT and FROM clauses, there are a number of other clauses that are optional. These include (but are not limited to):  WHERE  ORDER BY  GROUP BY
  • 11. SELECT Query – The SELECT Clause  It enables you to define the columns you want.  Sometimes you want all columns, in those cases you can use the wildcard operator (*). For example, the previous query can be modified as: SELECT * FROM Activities;  A good practice is to name the columns rather than using *  The primary use of SELECT clause is to define a projection – a subset of columns, so that the result can be restricted to such columns only.
  • 12. SELECT Query – FROM Clause  This is where you tell the database the name of table(s) where it should look for the columns you named in the SELECT clause.  When fetching data from multiple tables, list all tables and describe the relation between them. For example, let us try to fetch data for all the activities that have been performed on various routes along with dates. SELECT Activities.activityName, Works.date, Works.routeNumber FROM Activities INNER JOIN Works ON Activities.activityId = Works.activityId  Notice the keywords ‘INNER JOIN’ and the part ‘ON Activities.activityId = Works.activityId’.  The ON … part tells the database what are the columns to match results on. It is also called the join condition. There can be more than one joining condition depending on the underlying database schema.
  • 13. SELECT Query – FROM Clause - JOINs  JOIN is a keyword that lets the database know that there are multiple tables you intend to fetch data from.  There is a table mentioned before JOIN and another after it.  The one before is called the left table and the one after is called the right table.  The three most commonly used types of join are:  INNER JOIN  LEFT OUTER JOIN  RIGHT OUTER JOIN
  • 14. INNER JOIN  INNER JOIN is also sometimes called a ‘strict’ join.  Some RDBMSs support dropping the ‘INNER’ and implicitly assume it.  This type of join means that for each row in the left table the database finds the matching rows in the right table, and skips the row if no match is found.  This type of join helps in eliminating empty records.  For example, in our road repair example, it would omit all Activities rows that don’t have records in the Works table.
  • 15. OUTER JOINs  In case we don’t want to omit empty records, we can use OUTER JOINs.  A LEFT OUTER JOIN means that every row of the left table is kept, together with all matching rows from the right table.  A RIGHT OUTER JOIN means that every row of the right table is kept, together with all matching rows from the left table.  For example, let us find all Activities and related Works. We can do this by: SELECT Activities.activityName, Works.date, Works.routeNumber FROM Activities LEFT OUTER JOIN Works ON Activities.activityId = Works.activityId  This query would return all Activities along with their associated Works. For the Activities that don’t have corresponding Works it would put ‘NULL’ under the date and routeNumber columns.
  • 16. The JOIN Conditions  The ON … part is called the joining condition.  It is essentially a condition that names columns from the left and right tables and states how they are to be compared.  In most circumstances, the columns (from the left and right tables) are matched with an = operator; however, in some cases that might not be true.  Other conditional operators such as not equal, greater than, less than, etc. are also supported.  There can be more than one joining condition.
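The difference between the INNER JOIN and the LEFT OUTER JOIN can be seen by running both against the road repair data. This is a small sketch using Python's built-in sqlite3 module; the activity IDs are assumed values, and the table aliases `a` and `w` are added only for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activityId INTEGER PRIMARY KEY, activityName TEXT);
CREATE TABLE Works (activityId INTEGER, date TEXT, routeNumber TEXT);
-- Hypothetical IDs; only 'Overlay' has any work recorded against it.
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay'), (25, 'Crack Sealing');
INSERT INTO Works VALUES (24, '2001-01-12', 'I-95'), (24, '2001-02-08', 'I-66');
""")

query = """
    SELECT a.activityName, w.date, w.routeNumber
    FROM Activities AS a
    {join} Works AS w ON a.activityId = w.activityId
    ORDER BY a.activityName, w.date
"""
inner = conn.execute(query.format(join="INNER JOIN")).fetchall()
left = conn.execute(query.format(join="LEFT OUTER JOIN")).fetchall()

print(inner)  # only the two 'Overlay' rows
print(left)   # also 'Crack Sealing' and 'Patching', with None (NULL) for date and routeNumber
```

Python's sqlite3 module reports SQL NULL as `None`, so the unmatched activities show up as rows such as `('Patching', None, None)`.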
  • 17.  QUESTION: What would happen if you skip the ON … part?
  • 18. SELECT Query – WHERE clause  The WHERE clause allows you to describe conditions on the data you want fetched.  For example, if we are interested in all Overlay Works we’ll write a query: SELECT * FROM Works WHERE activityId = 24  Another way to do the same without using an ID is: SELECT Works.* FROM Works INNER JOIN Activities ON Works.activityId = Activities.activityId WHERE Activities.activityName = 'Overlay'  However, the second example would be somewhat slower, because joining and matching on string columns carries a certain overhead.
  • 19. SELECT Query - ORDER BY Clause  Theoretically speaking, the records in a table are unordered. However, most RDBMS usually store them in some kind of ordering (usually in the order of Primary key column).  In any case, there might be a requirement to order the results in a particular way.  ORDER BY clause allows you to describe data ordering and the direction of ordering.  For example, if we want all Activities along with their associated Works ordered alphabetically and sorted by date in a descending order, we can do that by: SELECT Activities.activityName, Works.date, Works.routeNumber FROM Activities INNER JOIN Works ON Activities.activityId = Works.activityId ORDER BY Activities.activityName ASC, Works.date DESC  The ASC keyword is implicit and can be skipped.
  • 20. Aggregating Results  Sometimes we want to fetch aggregated results. For example, we want to find out the number of times each Activity has been carried out in the road repair example.  The GROUP BY clause provides this functionality. SELECT Activities.activityName, COUNT(Works.routeNumber) AS countActivity FROM Activities INNER JOIN Works ON Activities.activityId = Works.activityId GROUP BY Activities.activityName  COUNT is an aggregate function. Other commonly used aggregate functions are SUM, AVG, MIN and MAX.
  • 21. SELECT Query – GROUP BY Clause  When a GROUP BY clause is defined, every column in the SELECT and ORDER BY clauses needs to be either part of an aggregate function or mentioned in the GROUP BY clause.  For example, the following query is invalid, because Works.date is neither aggregated nor listed in the GROUP BY clause: SELECT Activities.activityName, Works.date, COUNT(Works.routeNumber) AS countActivity FROM Activities INNER JOIN Works ON Activities.activityId = Works.activityId GROUP BY Activities.activityName
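The GROUP BY rule can be illustrated with a short sketch in Python's built-in sqlite3 module, using assumed activity IDs. One caveat: SQLite is more permissive than most systems and accepts the invalid query above (it picks an arbitrary date per group), so the sketch shows the counting query and one valid way to include Works.date, namely adding it to the GROUP BY clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activityId INTEGER PRIMARY KEY, activityName TEXT);
CREATE TABLE Works (activityId INTEGER, date TEXT, routeNumber TEXT);
-- Hypothetical IDs for the three activities of the example.
INSERT INTO Activities VALUES (23, 'Patching'), (24, 'Overlay'), (25, 'Crack Sealing');
INSERT INTO Works VALUES (24, '2001-01-12', 'I-95'), (24, '2001-02-08', 'I-66');
""")

# Number of times each activity was carried out.
counts = conn.execute("""
    SELECT Activities.activityName, COUNT(Works.routeNumber) AS countActivity
    FROM Activities
    INNER JOIN Works ON Activities.activityId = Works.activityId
    GROUP BY Activities.activityName
""").fetchall()

# Valid variant of the invalid query: Works.date is now a grouping column.
per_date = conn.execute("""
    SELECT Activities.activityName, Works.date, COUNT(Works.routeNumber) AS countActivity
    FROM Activities
    INNER JOIN Works ON Activities.activityId = Works.activityId
    GROUP BY Activities.activityName, Works.date
    ORDER BY Works.date
""").fetchall()

print(counts)
print(per_date)
```

Grouping by the date as well changes the question being asked: the count is now per activity and day rather than per activity.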
  • 22. Sub-queries  A SELECT query works on a table or a group of tables, meaning tables are the operands for a SELECT operation.  The output of a SELECT query is (a kind of) a table.  Therefore, an output of a SELECT query can act as an input/operand for another SELECT query.
  • 23. Why use sub-queries?  Query optimization: breaking a large or complex query into smaller queries that use WHERE clauses to reduce the amount of data processed.  Retrieving a single value per record from a related table, based on the values of columns in the outer query, such as the most recent (or oldest) row in a table that holds successive updates to one record over a period of time.  The above point refers to a common data warehousing use case: the data changes over time and you want to preserve the history of those changes.  It is sometimes also referred to as a Slowly Changing Dimension (SCD).  Using a sub-query in a WHERE clause to specify a match on a range of values.
  • 24. Sub-queries for optimization  Assume that we have a service with one million users.  There are only about 100,000 users that have spent money on our service.  Of the 100,000 users, only about 1,000 users have ever spent 100 dollars or more in one go.  We would most likely have a database with the tables as shown in the diagram
  • 25. Sub-queries for optimization  You are required to analyze transactions with an amount greater than 100 dollars.  Write down the query that fetches users (userId, name, gender, country) and their transactions (transactionDate, amount).  A sub-optimal query follows on the next slide, but don’t peek ahead. Write down one yourself and compare with it later.
  • 26. Sub-queries for optimization SELECT users.userId, users.username, users.gender, users.country, transactions.transactionDate, transactions.transactionAmount FROM users INNER JOIN transactions ON users.userId = transactions.userId WHERE transactions.transactionAmount > 100;  Problems:  There were 100,000 users that had spent money. Among their transactions there were only 1,000 instances where the amount spent was greater than 100.  Assume that on average there are 2 transactions per user.  The query above would result in the retrieval of 200,000 records, and then the WHERE clause would be applied to pick out the 1,000 records where the amount was greater than 100.  This means that 99.5% of the data fetched initially was of no use and wasted server resources (time and memory).
  • 27. Sub-queries for optimization  First, we know that we are only interested in transactions worth more than 100 dollars. The following query gets us only these transactions: SELECT transactions.userId, transactions.transactionDate, transactions.transactionAmount FROM transactions WHERE transactions.transactionAmount > 100  Since the output of the above query is a table, we can JOIN it with the users table. The resulting query would be: SELECT users.userId, users.username, users.gender, users.country, t1.transactionDate, t1.transactionAmount FROM users INNER JOIN (SELECT transactions.userId, transactions.transactionDate, transactions.transactionAmount FROM transactions WHERE transactions.transactionAmount > 100) AS t1 ON users.userId = t1.userId
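To check that the sub-query version returns the same rows as the join-then-filter version, both can be run side by side. This is a minimal sketch with Python's built-in sqlite3 module; the column names are taken from the queries above, while the table definitions and the sample rows are assumptions, since the schema diagram is not reproduced in the text. It demonstrates equivalence of results only, not relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, gender TEXT, country TEXT);
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, 'alice', 'F', 'US'), (2, 'bob', 'M', 'PK'), (3, 'carol', 'F', 'DE');
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-10', 20.0),
    (2, '2024-01-20', 99.0),  (1, '2024-03-01', 300.0);
""")

# Join everything first, then filter with WHERE.
join_then_filter = conn.execute("""
    SELECT users.userId, users.username, transactions.transactionDate, transactions.transactionAmount
    FROM users
    INNER JOIN transactions ON users.userId = transactions.userId
    WHERE transactions.transactionAmount > 100
    ORDER BY transactions.transactionDate
""").fetchall()

# Filter inside a sub-query first, then join the smaller result.
filter_then_join = conn.execute("""
    SELECT users.userId, users.username, t1.transactionDate, t1.transactionAmount
    FROM users
    INNER JOIN (SELECT userId, transactionDate, transactionAmount
                FROM transactions
                WHERE transactionAmount > 100) AS t1
        ON users.userId = t1.userId
    ORDER BY t1.transactionDate
""").fetchall()

print(filter_then_join)
```

Whether the second form is actually faster depends on the database engine and its query planner, so it is worth measuring on the real system rather than assuming.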
  • 28. Sub-queries for Retrieving SCD  From the previous example, assume that now we’re interested in knowing when was the last time each of our users spent money along with their gender and country.  How can we go about doing this?  The query that does that is on the next slide, but first try thinking out how you can do that.
  • 29. Sub-queries for Retrieving SCD  First, let’s write a query that retrieves the latest transaction. SELECT MAX(transactions.transactionDate) AS lastTransactionDate FROM transactions OR SELECT transactions.transactionDate FROM transactions ORDER BY transactions.transactionDate DESC LIMIT 1  But we want to know the last transaction for each user. We can modify the first example as: SELECT transactions.userId, MAX(transactions.transactionDate) AS lastTransactionDate FROM transactions GROUP BY transactions.userId  The second one cannot be modified in a way that would give us the desired result. Why? SELECT transactions.userId, transactions.transactionDate FROM transactions ORDER BY transactions.transactionDate DESC LIMIT 1
  • 30. Sub-queries for Retrieving SCD  Now, we need to combine the result with the user’s gender and country. SELECT users.userId, users.gender, users.country, MAX(transactions.transactionDate) AS lastTransactionDate FROM users LEFT OUTER JOIN transactions ON users.userId = transactions.userId GROUP BY users.userId, users.gender, users.country  The query above gives us the desired result, but it has one problem. What?
  • 31. Sub-queries for Retrieving SCD  We can use the discarded query two slides back if we can parameterize it somehow so that it evaluates for each user and gives us the last date. The following query does that: SELECT users.userId, users.gender, users.country, (SELECT transactions.transactionDate FROM transactions WHERE transactions.userId = users.userId ORDER BY transactionDate DESC LIMIT 1) AS lastTransactionDate FROM users  The query above does not have a join.  It does not use an aggregate function in the main query and enables us to easily add more columns without worrying about the GROUP BY clause.  Modify the query above (or the one on previous slide) so that we now get the last transaction dates for transactions worth more than 50 dollars for each user. (Answer on next slide)
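The GROUP BY approach and the correlated sub-query approach can be compared directly. The sketch below uses Python's built-in sqlite3 module with assumed table definitions and hypothetical sample rows; the sub-query is re-evaluated for each row of users because it refers to users.userId from the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, gender TEXT, country TEXT);
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
-- Hypothetical sample data; carol has no transactions.
INSERT INTO users VALUES (1, 'alice', 'F', 'US'), (2, 'bob', 'M', 'PK'), (3, 'carol', 'F', 'DE');
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-10', 20.0),
    (2, '2024-01-20', 99.0),  (1, '2024-03-01', 300.0);
""")

# Join plus aggregate: every selected users column must appear in GROUP BY.
grouped = conn.execute("""
    SELECT users.userId, MAX(transactions.transactionDate) AS lastTransactionDate
    FROM users
    LEFT OUTER JOIN transactions ON users.userId = transactions.userId
    GROUP BY users.userId
    ORDER BY users.userId
""").fetchall()

# Correlated sub-query: no join and no GROUP BY in the main query.
correlated = conn.execute("""
    SELECT users.userId,
           (SELECT transactions.transactionDate
            FROM transactions
            WHERE transactions.userId = users.userId
            ORDER BY transactions.transactionDate DESC
            LIMIT 1) AS lastTransactionDate
    FROM users
    ORDER BY users.userId
""").fetchall()

print(correlated)
```

Both forms return one row per user, with NULL (shown as `None` in Python) for a user without transactions.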
  • 32. Sub-queries for Retrieving SCD SELECT users.userId, users.gender, users.country, (SELECT transactions.transactionDate FROM transactions WHERE transactions.userId = users.userId AND transactions.transactionAmount > 50 ORDER BY transactions.transactionDate DESC LIMIT 1) AS lastTransactionDate FROM users
  • 33. Handling NULL Values  The query on the previous slide would return rows for all one million users, with most of them having lastTransactionDate as NULL.  NULLs don’t look good in a result set and are of no value for further analysis. We can resolve this situation in two ways.  Assume that we do need to see all one million users and would like to put a default value for the users that don’t have a transaction (such as 1.Jan.1900). Such values are called ‘sentinels’.  To replace a NULL with a sentinel value, we can use a function such as ISNULL (SQL Server); the standard COALESCE function and MySQL’s IFNULL do the same job.
  • 34. Handling NULL Values SELECT users.userId, users.gender, users.country, ISNULL((SELECT transactions.transactionDate FROM transactions WHERE transactions.userId = users.userId AND transactions.transactionAmount > 50 ORDER BY transactions.transactionDate DESC LIMIT 1), '1900-01-01') AS lastTransactionDate FROM users
  • 35. Sub-queries in WHERE clause  Or, we can modify the same query as: SELECT users.userId, users.gender, users.country, (SELECT transactions.transactionDate FROM transactions WHERE transactions.userId = users.userId AND transactions.transactionAmount > 50 ORDER BY transactions.transactionDate DESC LIMIT 1) AS lastTransactionDate FROM users WHERE users.userId IN (SELECT transactions.userId FROM transactions WHERE transactions.transactionAmount > 50)  However, this query (like queries that use a sub-query in the WHERE clause in general) is sub-optimal, to the point that it is quite a bad query.
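The sentinel replacement can be sketched with Python's built-in sqlite3 module. SQLite has no two-argument ISNULL, so this example swaps in the standard COALESCE function, which returns its first non-NULL argument; the table definitions, the sample rows and the ISO form of the sentinel date are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, gender TEXT, country TEXT);
CREATE TABLE transactions (userId INTEGER, transactionDate TEXT, transactionAmount REAL);
-- Hypothetical sample data; carol has no transactions at all.
INSERT INTO users VALUES (1, 'alice', 'F', 'US'), (2, 'bob', 'M', 'PK'), (3, 'carol', 'F', 'DE');
INSERT INTO transactions VALUES
    (1, '2024-01-05', 150.0), (1, '2024-02-10', 20.0),
    (2, '2024-01-20', 99.0),  (1, '2024-03-01', 300.0);
""")

rows = conn.execute("""
    SELECT users.userId,
           COALESCE(
               (SELECT transactions.transactionDate
                FROM transactions
                WHERE transactions.userId = users.userId
                  AND transactions.transactionAmount > 50
                ORDER BY transactions.transactionDate DESC
                LIMIT 1),
               '1900-01-01'  -- sentinel for users without a qualifying transaction
           ) AS lastTransactionDate
    FROM users
    ORDER BY users.userId
""").fetchall()

print(rows)
```

A sentinel should be a value that can never occur as real data, otherwise later analysis cannot tell a placeholder from a genuine transaction date.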
  • 36. Many-to-Many Relation Example  We are tasked to design a system for a college.  There are students and there are courses.  We need to provide a basic model that can store data for students, courses and enrollment of students in courses over years and semesters.  A student may have enrolled in multiple courses.  A course may have enrollment of multiple students.  A student may enroll in a course only once in a given semester of a year.  Try modelling the above scenario. The slide following this shows a common way to go about doing this.
  • 38. Many-to-Many Relation Example  Write a query that retrieves records of enrollment for all students ordered chronologically.  Write a query that retrieves semester-wise enrollment count for all courses  Write a query that displays students that have enrolled in the same course more than once along with the number of times they had enrolled.  Write a query to display last enrollment for all students.