CS 542 Database Management Systems Relational Database Programming J Singh  January 24, 2011
Simple SQL Queries (p1) Relation BROWSER_TABLE SELECT * FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select (σ) Rows
Simple SQL Queries (p2) Relation BROWSER_TABLE SELECT BROWSER, PLATFORM FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select (σ) Rows Project (π) Columns
Simple SQL Queries (p3) Relation BROWSER_TABLE SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select (σ) Rows Project (π) Columns Rename (ρ) Columns
SQL Conditions
In WHERE clause:
  String1 = String2, String1 > String2 and other comparison operators
    Comparisons are controlled by 'collations', e.g., COLLATE Latin1_General_CI_AS (Latin1 collation, case insensitive, accent sensitive)
    For other available collations, check your database
    Collations can be specified at three levels
      For the entire database
      For an attribute in CREATE TABLE
      In the WHERE clause
  LIKE String (pattern matching), e.g.,
    'John Wayne' LIKE 'John%'
    'John Wayne' LIKE '% W_yne'
SQL Special Data Types (p1)
Dates and Times (look them up)
NULL values (⊥ in Relational Algebra)
  Can mean one of three things:
    Value is unknown
    Value is inapplicable (e.g., spouse name for a single person)
    Value not shown – perhaps because of security concerns
  Regardless of the cause, NULL cannot be treated as a constant
  Operations with NULLs
    NULL + number → NULL
    NULL × number → NULL
    NULL = NULL → UNKNOWN
    X IS NULL → TRUE or FALSE (depending on X)
    NULL × 0 → NULL (not 0)
    NULL - NULL → NULL (not 0)
SQL Special Data Types (p2)
UNKNOWN values
  Result from comparison with NULLs
    Other comparisons yield TRUE or FALSE
    UNKNOWN means neither TRUE nor FALSE
  Operations when combined with other logical values
    UNKNOWN AND TRUE → UNKNOWN
    UNKNOWN AND FALSE → FALSE
    UNKNOWN OR TRUE → TRUE
    UNKNOWN OR FALSE → UNKNOWN
    NOT UNKNOWN → UNKNOWN
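The NULL and UNKNOWN rules on these two slides can be checked against a real engine. The sketch below uses Python's standard sqlite3 module with an in-memory database; the helper name `evaluate` is an illustrative choice, not part of the lecture. SQLite has no separate UNKNOWN value and reports it as NULL (None in Python), with TRUE and FALSE shown as 1 and 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate one SQL expression and return its single value."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


# Arithmetic with NULL yields NULL, even where ordinary algebra would give 0.
null_plus_number = evaluate("NULL + 5")
null_times_zero = evaluate("NULL * 0")

# Comparing NULL with anything, including NULL, yields UNKNOWN (None here).
null_equals_null = evaluate("NULL = NULL")

# IS NULL is the test that always answers TRUE (1) or FALSE (0).
null_is_null = evaluate("NULL IS NULL")

# Three-valued logic: UNKNOWN combined with ordinary truth values.
unknown_and_true = evaluate("(NULL = NULL) AND 1")
unknown_and_false = evaluate("(NULL = NULL) AND 0")
unknown_or_true = evaluate("(NULL = NULL) OR 1")
unknown_or_false = evaluate("(NULL = NULL) OR 0")
not_unknown = evaluate("NOT (NULL = NULL)")
```

A WHERE clause keeps a row only when its condition is TRUE, so rows whose condition evaluates to UNKNOWN are dropped just like rows where it is FALSE.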
Ordering Results Relation BROWSER_TABLE SELECT BROWSER, PLATFORM FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' ORDER BY ENGINE_VERSION, BROWSER Start with the Relation Select (σ) Rows Order Rows Project (π) Columns
Detour: World Database
A sample MySQL database downloadable from the web
Has 3 tables: City, Country, CountryLanguage
  City: ID, Name, CountryCode, District, Population
  Country: Code, Name, Continent, Region, SurfaceArea, IndepYear, Population, LifeExpectancy, GNP, GNPOld, LocalName, GovernmentForm, HeadOfState, Capital, Code2
  CountryLanguage: CountryCode, Language, IsOfficial, Percentage
The three tables are 'connected' by the country code: City.CountryCode and CountryLanguage.CountryCode both refer to Country.Code.
Joins Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia'     AND City.CountryCode = Country.Code ; Find all countries where Dutch is the official language SELECT Country.Name FROM Country, CountryLanguage WHERE CountryLanguage.CountryCode = Country.Code     AND CountryLanguage.Language = 'Dutch'     AND CountryLanguage.isOfficial = 'T' ;
Join Semantics – Nested Loops
Find all cities in Estonia
  SELECT City.Name
  FROM City, Country
  WHERE Country.Name = 'Estonia'
    AND City.CountryCode = Country.Code
Is equivalent to
  For each tuple t1 in City:
    For each tuple t2 in Country:
      If the WHERE clause is satisfied:
        Accumulate <t1, t2> into a result set
  Project City.Name from the accumulated result set
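The nested-loop reading of a join translates almost line for line into Python. This is a sketch of the semantics only, not of how a query optimizer actually executes the join; the three City rows and two Country rows are small illustrative samples rather than the contents of the World database, and the function name `cities_in` is made up for the example.

```python
# Illustrative sample rows; each tuple is modeled as a dict keyed by attribute.
city = [
    {"Name": "Tallinn", "CountryCode": "EST"},
    {"Name": "Tartu", "CountryCode": "EST"},
    {"Name": "Helsinki", "CountryCode": "FIN"},
]
country = [
    {"Code": "EST", "Name": "Estonia"},
    {"Code": "FIN", "Name": "Finland"},
]


def cities_in(country_name):
    """Nested-loop semantics of the City/Country join on the slide."""
    result = []
    for t1 in city:                      # for each tuple t1 in City
        for t2 in country:               # for each tuple t2 in Country
            # The WHERE clause: name filter plus the join condition.
            if t2["Name"] == country_name and t1["CountryCode"] == t2["Code"]:
                result.append((t1, t2))  # accumulate <t1, t2>
    # Project City.Name from the accumulated result set.
    return [t1["Name"] for t1, _ in result]


estonian_cities = cities_in("Estonia")
```

The loop visits every pair of tuples, so its cost is the product of the two table sizes; that is why the form is useful for defining the result but rarely for computing it.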
Join Semantics – Relational Algebra
Find all cities in Estonia
  SELECT City.Name
  FROM City, Country
  WHERE Country.Name = 'Estonia'
    AND City.CountryCode = Country.Code
Is equivalent to
  π_A1( σ_(B1 = 'Estonia' AND A2 = B2) (A × B) )
Where A = City, B = Country, A1 = City.Name, A2 = City.CountryCode, B1 = Country.Name, B2 = Country.Code
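Self-Joins
Find all districts in Kenya that have more than one city
  SELECT distinct c1.district
  FROM city c1, city c2, country
  WHERE c1.name != c2.name
    AND c1.district = c2.district
    AND country.code = c1.countrycode
    AND country.code = c2.countrycode
    AND country.name = 'kenya';
The same table (city) gets used with two names, c1 and c2
The condition c1.district = c2.district makes the two different cities share a district; without it, every district would qualify as soon as the country has two cities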
Set Operators
Find all districts in Kenya that have exactly one city
  ( SELECT distinct city.district
    FROM city, country
    WHERE country.code = city.countrycode
      AND country.name = 'kenya' )
  EXCEPT
  ( SELECT distinct c1.district
    FROM city c1, city c2, country
    WHERE c1.name != c2.name
      AND c1.district = c2.district
      AND country.code = c1.countrycode
      AND country.code = c2.countrycode
      AND country.name = 'kenya' );
Both sides must yield tuples with the same attributes
UNION and INTERSECT are used the same way
Subqueries A different way to structure queries (without using joins) SELECT	___________________ FROM	_____Subquery 3____ WHERE	_____Subquery 1____ 			_____Subquery 2____
Subqueries Returning Scalars Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia'     AND City.CountryCode = Country.Code Can also be written as SELECT Name FROM City WHERE CountryCode =  	(SELECT Code FROM Country WHERE Name = 'Estonia') The two forms are equivalent except when…
Conditions Returning Relations
Find all countries where Dutch is the official language
  SELECT Country.Name
  FROM Country, CountryLanguage
  WHERE CountryLanguage.CountryCode = Country.Code
    AND CountryLanguage.Language = 'Dutch'
    AND isOfficial = 'T' ;
Can also be written as
  SELECT Name FROM Country
  WHERE Code IN
    ( SELECT CountryCode FROM CountryLanguage
      WHERE Language = 'Dutch' AND isOfficial = 'T' );
Conditions Returning Tuples
Find all countries where Dutch is the official language
  SELECT Name FROM Country
  WHERE Code IN
    ( SELECT CountryCode FROM CountryLanguage
      WHERE Language = 'Dutch' AND isOfficial = 'T' );
Can also be written as
  SELECT Name FROM Country
  WHERE (Code, 'T') IN
    ( SELECT CountryCode, isOfficial FROM CountryLanguage
      WHERE Language = 'Dutch' );
Subqueries in FROM clauses
Total population of all countries with Dutch as the official language
  SELECT SUM(Population)
  FROM
    ( SELECT Name, Population FROM Country
      WHERE Code IN
        ( SELECT CountryCode FROM CountryLanguage
          WHERE Language = 'Dutch' AND isOfficial = 'T' ) ) AS DutchCountries;
Cross Joins
Populations of cities in Finland relative to Aruba & Singapore
  SELECT
    city.name as City, city.population as Population,
    cntry.name as Country,
    (city.population * 100 / cntry.population) as 'Percent'
  FROM
    (SELECT * FROM City WHERE CountryCode = 'fin') AS city
    CROSS JOIN
    (SELECT * FROM Country WHERE Code = 'abw' OR Code = 'sgp') AS cntry;
Theta Joins
Cross Join with a condition
The most common form of JOIN
All cities in Finland with more than double the population of Aruba
  SELECT
    cty.name as City, cty.population as Population,
    cntry.name as Country,
    (cty.population * 100 / cntry.population) as 'Percent'
  FROM
    (SELECT * FROM City WHERE CountryCode = 'fin') AS cty
    JOIN
    (SELECT * FROM Country WHERE Code = 'abw') AS cntry
    ON cty.population > 2 * cntry.population;
Outer Joins Selecting elements of a table regardless of whether they are present in the other table. Cities starting with 'TOK' and countries starting with 'J' SELECT c.*, r.name as Country  FROM  (select * from city where city.name like 'tok%') as c  LEFT OUTER JOIN  (select * from country where country.code like 'j%') as r  ON (c.countrycode=r.code); Yields 6 cities, 5 in Japan and Tokat in Turkey What if we had done RIGHT OUTER JOIN?
Review and Contrast Joins MySQL does not implement FULL OUTER JOIN How can we get it if we need it? Are CROSS JOIN and FULL OUTER JOIN the same thing? Table A has 3 rows, table B has 5 rows. How many rows does A CROSS JOIN B have? How many rows does A LEFT OUTER JOIN B have? How about A RIGHT OUTER JOIN B? A FULL OUTER JOIN B? A INNER JOIN B?
Reading Assignment Section 6.4 Section 6.5 Keep timing considerations in mind: SQL completely evaluates the query before making any changes
Transactions ACID Atomicity Sets of database operations that need to be accomplished atomically, either they all get done or none do. E.g., during money transfer, If money is taken out of one account, it must be added to the other Consistency Enforce constraints on types, values, foreign keys Maintain relationships among data elements (see Atomicity) Isolation Each transaction must appear to be executed as if no other transaction is executing at the same time. Durability Once committed, the change is permanent.
Detour: Transaction Scenario
Real Time Bank (RTB) is an on-line bank.
  RTB executes money transfers as soon as requests are entered
  RTB shows up-to-the-minute account balances
  Transactions that would create a negative balance are denied
Scenario
  Initially, Alice has $250, Bob has $100, Cathy has $150
  Transactions:
    1. Alice pays Bob $200
    2. Bob pays Cathy $150
    3. Cathy pays Alice $250
Interesting aside: only transaction order 1, 2, 3 will succeed
At a Nightly Processing Bank, transaction order would be irrelevant
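The aside about transaction order can be verified by brute force. The sketch below replays the three transfers in every possible order under the rule that a transfer is denied if it would leave the payer with a negative balance; the names `INITIAL`, `TRANSFERS`, and `run` are illustrative choices.

```python
from itertools import permutations

# Opening balances and the three numbered transfers from the scenario.
INITIAL = {"Alice": 250, "Bob": 100, "Cathy": 150}
TRANSFERS = {
    1: ("Alice", "Bob", 200),
    2: ("Bob", "Cathy", 150),
    3: ("Cathy", "Alice", 250),
}


def run(order):
    """Apply transfers in the given order; return final balances and denied ids."""
    balances = dict(INITIAL)
    denied = []
    for number in order:
        payer, payee, amount = TRANSFERS[number]
        if balances[payer] < amount:
            denied.append(number)   # would go negative, so RTB denies it
            continue
        balances[payer] -= amount
        balances[payee] += amount
    return balances, denied


# Orders in which no transfer at all is denied.
fully_successful = [
    order for order in permutations(TRANSFERS) if not run(order)[1]
]
final_balances, _ = run((1, 2, 3))
```

Every order other than 1, 2, 3 starts with, or quickly reaches, a payer who does not yet hold enough money, so at least one transfer is denied.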
Transaction Atomicity
Work by example: Alice pays Bob $200
  BEGIN TRANSACTION
  UPDATE Accounts SET balance = balance - 200 WHERE Owner = 'Alice'
  IF (0 > SELECT balance FROM Accounts WHERE Owner = 'Alice', ROLLBACK TRANSACTION)  -- Note: Pidgin SQL Syntax
  UPDATE Accounts SET balance = balance + 200 WHERE Owner = 'Bob'
  COMMIT TRANSACTION
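Since the slide's IF is pidgin SQL, the check has to live in the host language in practice. The following is a minimal sketch using Python's sqlite3 module, assuming an in-memory Accounts table shaped like the one on the slide; the function names `transfer` and `balances` are hypothetical. The connection is opened with `isolation_level=None` so that BEGIN, COMMIT, and ROLLBACK are issued explicitly rather than by the driver.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the ones we write.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Accounts (Owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)", [("Alice", 250), ("Bob", 100)]
)


def transfer(payer, payee, amount):
    """Move money atomically; return False if the payer would go negative."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE Owner = ?",
            (amount, payer),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM Accounts WHERE Owner = ?", (payer,)
        ).fetchone()
        if balance < 0:
            conn.execute("ROLLBACK")   # undo the debit; nothing was credited
            return False
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE Owner = ?",
            (amount, payee),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")       # any database error aborts the transfer
        raise


def balances():
    """Return the current balances as a dict keyed by owner."""
    return dict(conn.execute("SELECT Owner, balance FROM Accounts"))


first_ok = transfer("Alice", "Bob", 200)    # Alice has 250: succeeds
second_ok = transfer("Alice", "Bob", 200)   # Alice has 50: rolled back
after = balances()
```

The second call shows atomicity at work: the debit was executed inside the transaction, but the rollback leaves both rows exactly as they were before BEGIN.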
Transaction Isolation
Isolation levels and the problems they leave behind:
  READ UNCOMMITTED
    Dirty Read: data of an uncommitted transaction is visible to others
  READ COMMITTED: only committed data is visible
    Non-repeatable Read: a transaction re-reads some data and finds that it has changed because another transaction committed
  REPEATABLE READ: place locks on all data that are used in the transaction
    Phantom Read: a transaction re-executes a subquery returning a set of rows and finds a different set of rows
  SERIALIZABLE: as if all transactions occur in a completely isolated fashion
    Can be too restrictive to support a high transaction volume
Note: Not every database offers each isolation level. Choose the isolation level with care!
CS 542 Database Management Systems Database Logic – The Foundation of Datalog
About Datalog
Intellectual debt to Prolog, the logic programming language
Responsible for the addition of recursion to SQL-99
  Extends SQL but still leaves it Turing-incomplete
Introductory example:
  Facts: Par(sally, john), Par(martha, mary), Par(mary, peter), Par(john, peter)
  Rules:
    Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
    Cousin(x, y) <- Sib(x, y)
    Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)
  => Cousin(sally, martha)
Why Data Logic? Why is SQL not sufficient? Deductive rules express things that go in both FROM and WHERE clauses Allow for stating general requirements that are more difficult to state correctly in SQL Allow us to take advantage of research in logic programming and AI
The Formalism of Rules
  Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)
Head = consequent, a single subgoal: Ancestor(x, y)
Read the symbol <- as "if"
Body = antecedent = AND of subgoals: Parent(x, z) AND Ancestor(z, y)
The Head is true if all the subgoals are true
The rule applies for all values of its arguments
A variable appearing in the head is distinguished; otherwise it is nondistinguished.
Interpreting Rules For the head to be true, all variables must appear in some non-negated subgoal of the body Unsafe examples:
IDB/EDB Convention: Predicates begin with a capital, variables begin with lowercase e.g., Ancestor (x, y) Fact predicates are atoms represented as relations If a tuple exists, that fact is true Otherwise, false A predicate representing a stored relation is called an extensional database (EDB). Subgoals of a rule may be facts or may themselves be rules EDB when it is a fact Intensional database (IDB) when it is a “derived relation” Rule heads are always IDBs
Computing IDB Relations Bottom-up
  empty out all IDB relations
  REPEAT
    FOR (each IDB predicate p) DO
      evaluate p using current values of all relations;
  UNTIL (no IDB relation is changed)
As long as there is no negation of IDB subgoals, each IDB relation grows with each iteration
  At least, it does not shrink
  Since relations are finite, the loop eventually terminates
Some rules make it impossible to predict that the loop has a chance to terminate.
  Considered unsafe
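The bottom-up loop can be sketched in Python for the Sib and Cousin rules from the introductory Datalog example. This is a naive fixpoint evaluation under the assumption that relations are modeled as sets of tuples; the function names are illustrative. Within one round Cousin is evaluated against the Sib value computed in that same round, a small variation on the pseudocode that reaches the same fixpoint.

```python
# EDB relation from the introductory example: Par(child, parent).
PAR = {
    ("sally", "john"),
    ("martha", "mary"),
    ("mary", "peter"),
    ("john", "peter"),
}


def sib_rule(par):
    """Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y"""
    return {(x, y) for (x, p) in par for (y, q) in par if p == q and x != y}


def cousin_rules(par, sib, cousin):
    """Both Cousin rules; each rule contributes tuples to the relation."""
    derived = set(sib)  # Cousin(x, y) <- Sib(x, y)
    # Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)
    derived |= {
        (x, y) for (x, xp) in par for (y, yp) in par if (xp, yp) in cousin
    }
    return derived


def evaluate(par):
    """Repeat until no IDB relation changes, starting from empty relations."""
    sib, cousin = set(), set()
    while True:
        new_sib = sib_rule(par)
        new_cousin = cousin_rules(par, new_sib, cousin)
        if new_sib == sib and new_cousin == cousin:
            return sib, cousin
        sib, cousin = new_sib, new_cousin


sib, cousin = evaluate(PAR)
```

With no negation in the rules, each round can only add tuples, and there are finitely many pairs of names, so the loop is guaranteed to stop.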
Computing IDB Relations Top-Down (p1) EDB: Par(c,p) = p  is a parent of c. Generalized cousins: people with common ancestors one or more generations back: Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y Cousin(x,y) <- Sib(x,y) Cousin(x,y) <- Par(x,xp) AND Par(y,yp) 				AND Cousin(xp,yp) Form a dependency graph  whose nodes = IDB predicates. Arc X ->Y  if and only if there is a rule with X  in the head and Y  in the body. Cycle = recursion; no cycle = no recursion.
Computing IDB Relations Top-down (p2)
  for IDB predicate p(x, y, …)
    FOR EACH subgoal of p DO
      IF subgoal is IDB, recursive call;
      IF subgoal is EDB, look up
The recursion eventually terminates unless:
  A distinguished variable
    does not appear in a subgoal
    only appears in a negated subgoal
    only appears in an arithmetic subgoal
  Same 3 conditions for variables in an arithmetic subgoal
  Same 3 conditions for variables in a negated subgoal
Safe Rules A rule is safe  if: Each distinguished variable, Each variable in an arithmetic subgoal, and Each variable in a negated subgoal, 	also appears in a nonnegated, 	relational subgoal. Safe rules prevent infinite results.
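The safety condition is mechanical enough to check in code. The sketch below assumes a deliberately simple rule encoding invented for this example: the head is a list of variable names, and each subgoal is a tuple `(kind, negated, variables)` with `kind` either `"relational"` or `"arithmetic"`. The predicates P and Q in the unsafe rules are hypothetical and do not come from the slides.

```python
def is_safe(head_vars, subgoals):
    """Return True if the rule satisfies the safety condition.

    subgoals is a list of (kind, negated, variables) tuples, where kind is
    "relational" or "arithmetic" and negated is a bool.
    """
    # Variables bound by some nonnegated relational subgoal.
    positive = set()
    for kind, negated, variables in subgoals:
        if kind == "relational" and not negated:
            positive |= set(variables)

    # Variables that must be bound: distinguished variables, plus every
    # variable in an arithmetic subgoal or in a negated subgoal.
    must_be_bound = set(head_vars)
    for kind, negated, variables in subgoals:
        if kind == "arithmetic" or negated:
            must_be_bound |= set(variables)

    return must_be_bound <= positive


# Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
sib_is_safe = is_safe(
    ["x", "y"],
    [
        ("relational", False, ["x", "p"]),
        ("relational", False, ["y", "p"]),
        ("arithmetic", False, ["x", "y"]),
    ],
)

# Hypothetical P(x) <- Q(y): x never appears in the body.
unbound_head = is_safe(["x"], [("relational", False, ["y"])])

# Hypothetical P(x) <- Q(x) AND NOT Q(y): y appears only in a negated subgoal.
negated_only = is_safe(
    ["x"], [("relational", False, ["x"]), ("relational", True, ["y"])]
)

# Hypothetical P(x) <- Q(x) AND x < z: z appears only in an arithmetic subgoal.
arithmetic_only = is_safe(
    ["x"], [("relational", False, ["x"]), ("arithmetic", False, ["x", "z"])]
)
```

Each unsafe rule leaves some variable free to range over infinitely many values, which is what would make the resulting relation infinite.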
Evaluating Datalog Programs As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of its rules have already been evaluated. If an IDB predicate has more than one rule, each rule contributes tuples to its relation.
Expressive Power of Datalog Without recursion, Datalog can express all and only the queries of core relational algebra. The same as SQL select-from-where, without aggregation and grouping. But with recursion, Datalog can express more than these languages. Yet still not Turing-complete.
SQL Rule Definitions & Usage Definition of Datalog Rules: WITH [RECURSIVE] <RuleName> (<arguments>) AS <query>; Invocation of Datalog Rules: <SQL query about EDB, IDB>
SQL Recursion Example (p1)
Find Sally's cousins
  Using the recursive definition introduced earlier
  Par (child, parent) is the EDB
Expected SQL Query
  SELECT y FROM Cousin WHERE x = 'Sally';
But first, we need to define the IDB Cousin
SQL Recursion Example (p2)
WITH Clause (non-recursive)
  WITH Sib(x, y) AS
    (SELECT p1.child, p2.child
     FROM Par p1, Par p2
     WHERE p1.parent = p2.parent
     AND p1.child <> p2.child);
WITH Clause (recursive)
  RECURSIVE Cousin(x, y) AS
    (SELECT * FROM Sib)
      UNION
    (SELECT p1.child, p2.child
     FROM Par p1, Par p2, Cousin
     WHERE p1.parent = Cousin.x
     AND p2.parent = Cousin.y);
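The two definitions can be combined into one runnable statement. The sketch below uses SQLite through Python's sqlite3 module, which is an assumption of this example rather than the engine used in the lecture; dialects differ in details, and SQLite in particular wants a single WITH RECURSIVE list and no parentheses around the branches of the UNION. The Par rows are the facts from the introductory Datalog example, which spell the names in lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO Par VALUES (?, ?)",
    [("sally", "john"), ("martha", "mary"), ("mary", "peter"), ("john", "peter")],
)

# Sib is an ordinary (non-recursive) definition; Cousin refers to itself.
QUERY = """
WITH RECURSIVE
  Sib(x, y) AS (
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent
      AND p1.child <> p2.child
  ),
  Cousin(x, y) AS (
    SELECT x, y FROM Sib
    UNION
    SELECT p1.child, p2.child
    FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x
      AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = ?
"""

sallys_cousins = [row[0] for row in conn.execute(QUERY, ("sally",))]
johns_cousins = [row[0] for row in conn.execute(QUERY, ("john",))]
```

Using UNION rather than UNION ALL makes the engine discard tuples it has already derived, which is what lets the recursion stop once no new Cousin pairs appear.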
Next meeting January 31 Sections 7.1 – 7.3 Sections 8.1, 8.3 – 8.4 Discussion of presentation topic proposals

Mais conteúdo relacionado

Semelhante a CS 542 Overview of query processing

CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index StructuresJ Singh
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceJ Singh
 
Learning sql from w3schools
Learning sql from w3schoolsLearning sql from w3schools
Learning sql from w3schoolsfarhan516
 
Into to DBI with DBD::Oracle
Into to DBI with DBD::OracleInto to DBI with DBD::Oracle
Into to DBI with DBD::Oraclebyterock
 
The ultimate-guide-to-sql
The ultimate-guide-to-sqlThe ultimate-guide-to-sql
The ultimate-guide-to-sqlMcNamaraChiwaye
 
Sql basics
Sql basicsSql basics
Sql basicsKumar
 
php basic sql
php basic sqlphp basic sql
php basic sqltumetr1
 
Language Integrated Query By Nyros Developer
Language Integrated Query By Nyros DeveloperLanguage Integrated Query By Nyros Developer
Language Integrated Query By Nyros DeveloperNyros Technologies
 
Relational Database to Apache Spark (and sometimes back again)
Relational Database to Apache Spark (and sometimes back again)Relational Database to Apache Spark (and sometimes back again)
Relational Database to Apache Spark (and sometimes back again)Ed Thewlis
 

Semelhante a CS 542 Overview of query processing (20)

CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index Structures
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and Performance
 
Crash course in sql
Crash course in sqlCrash course in sql
Crash course in sql
 
Learning sql from w3schools
Learning sql from w3schoolsLearning sql from w3schools
Learning sql from w3schools
 
SQL
SQLSQL
SQL
 
Into to DBI with DBD::Oracle
Into to DBI with DBD::OracleInto to DBI with DBD::Oracle
Into to DBI with DBD::Oracle
 
The ultimate-guide-to-sql
The ultimate-guide-to-sqlThe ultimate-guide-to-sql
The ultimate-guide-to-sql
 
Introduction to sql
Introduction to sqlIntroduction to sql
Introduction to sql
 
Ch3rerevised
Ch3rerevisedCh3rerevised
Ch3rerevised
 
Sql General
Sql General Sql General
Sql General
 
SQL Basics
SQL BasicsSQL Basics
SQL Basics
 
Sql basics
Sql basicsSql basics
Sql basics
 
Sql basics
Sql basicsSql basics
Sql basics
 
Sql basics
Sql basicsSql basics
Sql basics
 
php basic sql
php basic sqlphp basic sql
php basic sql
 
PDBC
PDBCPDBC
PDBC
 
Dbms
DbmsDbms
Dbms
 
Language Integrated Query By Nyros Developer
Language Integrated Query By Nyros DeveloperLanguage Integrated Query By Nyros Developer
Language Integrated Query By Nyros Developer
 
Relational Database to Apache Spark (and sometimes back again)
Relational Database to Apache Spark (and sometimes back again)Relational Database to Apache Spark (and sometimes back again)
Relational Database to Apache Spark (and sometimes back again)
 
Lecture5-SQL.docx
Lecture5-SQL.docxLecture5-SQL.docx
Lecture5-SQL.docx
 

Mais de J Singh

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashingJ Singh
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big dataJ Singh
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 updateJ Singh
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engineJ Singh
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)J Singh
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsJ Singh
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceJ Singh
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceJ Singh
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data AnalysisJ Singh
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitJ Singh
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlJ Singh
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query OptimizationJ Singh
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
CS 542 Introduction
CS 542 IntroductionCS 542 Introduction
CS 542 IntroductionJ Singh
 
Cloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's ViewpointCloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's ViewpointJ Singh
 

Mais de J Singh (20)

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashing
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big data
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 update
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engine
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and Tradeoffs
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/Reduce
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map Reduce
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data Analysis
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed Commit
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
CS 542 Introduction
CS 542 IntroductionCS 542 Introduction
CS 542 Introduction
 
Cloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's ViewpointCloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's Viewpoint
 

CS 542 Overview of query processing

  • 1. CS 542 Database Management Systems Relational Database Programming J Singh January 24, 2011
  • 2. Simple SQL Queries (p1) Relation BROWSER_TABLE SELECT * FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select () Rows
  • 3. Simple SQL Queries (p2) Relation BROWSER_TABLE SELECT BROWSER, PLATFORM FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select () Rows Project () Columns
  • 4. Simple SQL Queries (p3) Relation BROWSER_TABLE SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select () Rows Project () Columns Rename () Columns
  • 5. SQL Conditions In WHERE clause: String1 = String2, String1 > String2 and other comparison operators Comparisons are controlled by ‘collations’, e.g., COLLATE Latin1_General_CI_AS (Latin1 collation, case insensitive, accent sensitive) For other available collations, check your database Collations can be specified at three levels For the entire database For an attribute during in CREATE TABLE In the WHERE clause LIKE String (pattern matching), e.g., 'John Wayne' LIKE 'John%' 'John Wayne' LIKE ‘% W_yne'
  • 6. SQL Special Data Types (p1) Dates and Times (look them up) NULL values ( in Relational Algebra) Can mean one of three things: Value is unknown Value is inapplicable (e.g., spouse name for a single person) Value not shown – perhaps because of security concerns Regardless of the cause, NULL can not be treated as a constant Operations with NULLs NULL + number  NULL NULL  number  NULL NULL = NULL  UNKNOWN X IS NULL  TRUE or FALSE (depending on X) NULL  0  NULL - NULL  NULL NULL
  • 7. SQL Special Data Types (p2) UNKNOWN values Result from comparison with NULLs Other comparisons yield TRUE or FALSE UNKNOWN means neither TRUE nor FALSE Operations when combined with other logical values UNKNOWN AND TRUE  UNKNOWN UNKNOWN AND FALSE  FALSE UNKNOWN OR TRUE  TRUE UNKNOWN OR FALSE  UNKNOWN NOT UNKNOWN  UNKNOWN
  • 8. Ordering Results Relation BROWSER_TABLE SELECT BROWSER, PLATFORM FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' ORDER BY ENGINE_VERSION, BROWSER Start with the Relation Select () Rows Order Rows Project () Columns
  • 9. Detour: World Database A sample MySQL database downloadable from the web Has 3 tables: City, Country, CountryLanguage City ID, Name,CountryCode, District, Population Country Code, Name, Continent, Region, SurfaceArea, IndepYear, Population, LifeExpectancy, GNP, GNPOld, LocalName, GovernmentForm, HeadOfState, Capital, Code2 CountryLanguage CountryCode, Language, IsOfficial, Percentage The three tables are ‘connected’ by the CountryCode attribute.
  • 10. Joins Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code ; Find all countries where Dutch is the official language SELECT Country.Name FROM Country, CountryLanguage WHERE CountryLanguage.CountryCode = Country.Code AND CountryLanguage.Language = 'Dutch' AND CountryLanguage.isOfficial = 'T' ;
  • 11. Join Semantics – Nested Loops Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia’ AND City.CountryCode = Country.Code Is equivalent to For each tuple t1 in City: For each tuple t2 in Country: If the WHERE clause is satisfied: Accumulate <t1, t2> into a result set Project City.Name from the accumulated result set
  • 12. Join Semantics – Relational Algebra Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code Is equivalent to A1( B1='Estonia'ANDA2= B2(A  B) ) Where A = City, B = Country, A1 = City.Name, A2 = City.CountryCode, A3 = Country.Code
  • 13. Self-Joins Find all districts in Kenya that have more than one city SELECT distinct c1.district FROM city c1, city c2, country WHERE c1.name != c2.name AND country.code = c1.countrycode AND country.code = c2.countrycode AND country.name = 'kenya'; The same table (city) gets used with two names, c1 and c2
  • 14. Set Operators Find all districts in Kenya that have exactly one city ( SELECT distinct city.district FROM city, country WHERE country.code = city.countrycode AND country.name = 'kenya' ) EXCEPT ( SELECT distinct c1.district FROM city c1, city c2, country WHERE c1.name != c2.name AND country.code = c1.countrycode AND country.code = c2.countrycode AND country.name = 'kenya' ); Both sides must yield the same tuples Or UNION or INTERSECT
  • 15. Subqueries A different way to structure queries (without using joins) SELECT ___________________ FROM _____Subquery 3____ WHERE _____Subquery 1____ _____Subquery 2____
  • 16. Subqueries Returning Scalars Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code Can also be written as SELECT Name FROM City WHERE CountryCode = (SELECT Code FROM Country WHERE Name = 'Estonia') The two forms are equivalent except when…
  • 17. Conditions Returning Relations Find all countries where Dutch is the official language SELECT Country.Name FROM Country, CountryLanguage WHERE CountryLanguage.CountryCode = Country.Code AND CountryLanguage.Language = 'Dutch' AND isOfficial = 'T' ; Can also be written as SELECT Name FROM Country WHERE Code IN ( SELECT CountryCode IN CountryLanguage WHERE Language = 'Dutch' AND isOfficial = 'T' );
  • 18. Conditions Returning Tuples Find all countries where Dutch is the official language SELECT Name FROM Country WHERE Code IN ( SELECT CountryCode IN CountryLanguage WHERE Language = 'Dutch' AND isOfficial = 'T' ); Can also be written as SELECT Name FROM Country WHERE (Code, 'T') IN ( SELECT CountryCode, isOfficial FROM CountryLanguage WHERE Language = 'Dutch' );
  • 19. Subqueries in FROM clauses Total population of all countries with Dutch as the official language SELECT Name FROM Country WHERE Code IN ( SELECT CountryCode IN CountryLanguage WHERE Language = 'Dutch' AND isOfficial = 'T' );
  • 20. Cross Joins Populations of cities in Finland relative to Aruba & Singapore SELECT city.name as City, city.population as Population, cntry.name as Country, (city.population * 100 / cntry.population) as 'Percent' FROM (SELECT * FROM CITY WHERE CountryCode = 'fin') AS city CROSS JOIN (SELECT * FROM Country WHERE Code='abw' OR Code=‘sgp') AS cntry;
  • 21. Theta Joins Cross Join with a condition The most common form of JOIN All cities in Finland with a population at least double of Aruba SELECT cty.name as City, cty.population as Population, cntry.name as Country, (cty.population * 100 / cntry.population) as 'Percent' FROM ( SELECT * FROM City WHERE CountryCode = 'fin') AS cty JOIN (SELECT * FROM Country WHERE Code='abw') AS cntry ONcty.population > 2*cntry.population;
  • 22. Outer Joins Selecting elements of a table regardless of whether they are present in the other table. Cities starting with 'TOK' and countries starting with 'J' SELECT c.*, r.name as Country FROM (select * from city where city.name like 'tok%') as c LEFT OUTER JOIN (select * from country where country.code like 'j%') as r ON (c.countrycode=r.code); Yields 6 cities, 5 in Japan and Tokat in Turkey What if we had done RIGHT OUTER JOIN?
  • 23. Review and Contrast Joins MySQL does not implement FULL OUTER JOIN How can we get it if we need it? Are CROSS JOIN and FULL OUTER JOIN the same thing? Table A has 3 rows, table B has 5 rows. How many rows does A CROSS JOIN B have? How many rows does A LEFT OUTER JOIN B have? How about A RIGHT OUTER JOIN B? A FULL OUTER JOIN B? A INNER JOIN B?
  • 24. Reading Assignment Section 6.4 Section 6.5 Keep timing considerations in mind SQL completely evaluates the query before effecting changes
  • 25. Transactions ACID Atomicity Sets of database operations that need to be accomplished atomically: either they all get done or none do. E.g., during a money transfer, if money is taken out of one account, it must be added to the other Consistency Enforce constraints on types, values, foreign keys Maintain relationships among data elements (see Atomicity) Isolation Each transaction must appear to be executed as if no other transaction is executing at the same time. Durability Once committed, the change is permanent.
  • 26. Detour: Transaction Scenario Real Time Bank (RTB) is an on-line bank. RTB executes money transfers as soon as requests are entered RTB shows up-to-the-minute account balances Transactions that would create a negative balance are denied Scenario Initially, Alice has $250, Bob has $100, Cathy has $150 Transactions: (1) Alice pays Bob $200 (2) Bob pays Cathy $150 (3) Cathy pays Alice $250 Interesting aside: only transaction order 1, 2, 3 will succeed At a Nightly Processing Bank, transaction order would be irrelevant
  • 27. Transaction Atomicity Work by example: Alice pays Bob $200 BEGIN TRANSACTION UPDATE Accounts SET balance = balance - 200 WHERE Owner = 'Alice' IF (0 > SELECT balance FROM Accounts WHERE Owner = 'Alice', ROLLBACK TRANSACTION) -- Note: Pidgin SQL Syntax UPDATE Accounts SET balance = balance + 200 WHERE Owner = 'Bob' COMMIT TRANSACTION
  • 28. Transaction Isolation Isolation levels and the problems they leave behind: READ UNCOMMITTED Dirty Read – data of an uncommitted transaction is visible to others READ COMMITTED: only committed data is visible Non-repeatable Read – a transaction re-reads some data and finds that it has changed because another transaction committed REPEATABLE READ: places locks on all data used in the transaction Phantom Read – a transaction re-executes a subquery returning a set of rows and finds a different set of rows SERIALIZABLE: As if all transactions occur in a completely isolated fashion Too restrictive, not able to support enough transaction volume Note: Not every database offers each isolation level. Choose the isolation level with care!
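The pidgin SQL for the transfer can be turned into a runnable sketch. This version assumes Python's sqlite3 module with explicit BEGIN / ROLLBACK / COMMIT; the helper names `new_bank` and `transfer` are hypothetical. It also replays the three Real Time Bank transfers in every possible order to check which orders go through without a denial.

```python
import sqlite3
from itertools import permutations

def new_bank():
    """Create a fresh in-memory bank with the scenario's opening balances."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # transaction boundaries are exactly the ones issued below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE Accounts (Owner TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO Accounts VALUES (?, ?)",
        [("Alice", 250), ("Bob", 100), ("Cathy", 150)],
    )
    return conn

def transfer(conn, payer, payee, amount):
    """Move money atomically; return False (and undo) if the payer would go negative."""
    conn.execute("BEGIN")
    conn.execute(
        "UPDATE Accounts SET balance = balance - ? WHERE Owner = ?", (amount, payer)
    )
    (balance,) = conn.execute(
        "SELECT balance FROM Accounts WHERE Owner = ?", (payer,)
    ).fetchone()
    if balance < 0:
        conn.execute("ROLLBACK")  # the debit above is undone
        return False
    conn.execute(
        "UPDATE Accounts SET balance = balance + ? WHERE Owner = ?", (amount, payee)
    )
    conn.execute("COMMIT")
    return True

TRANSFERS = [("Alice", "Bob", 200), ("Bob", "Cathy", 150), ("Cathy", "Alice", 250)]

# Try all six orders; keep those in which no transfer is denied.
successful_orders = []
for order in permutations(range(3)):
    bank = new_bank()
    if all(transfer(bank, *TRANSFERS[i]) for i in order):
        successful_orders.append(tuple(i + 1 for i in order))

# A denied transfer leaves the balances untouched.
bank = new_bank()
denied = not transfer(bank, "Bob", "Cathy", 150)
balances_after_denial = dict(bank.execute("SELECT Owner, balance FROM Accounts"))

print(successful_orders)
```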
  • 29. CS 542 Database Management Systems Database Logic – The Foundation of Datalog
  • 30. About Datalog Intellectual debt to Prolog, the logic programming language Responsible for the addition of recursion to SQL-99. Extends SQL but still leaves it Turing-incomplete Introductory example: Facts: Par(sally, john), Par(martha, mary), Par(mary, peter), Par(john, peter) Rules: Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y Cousin(x, y) <- Sib(x, y) Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp) Derived fact: Cousin(sally, martha)
  • 31. Why Data Logic? Why is SQL not sufficient? Deductive rules express things that go in both FROM and WHERE clauses Allow for stating general requirements that are more difficult to state correctly in SQL Allow us to take advantage of research in logic programming and AI
  • 32. The Formalism of Rules Example rule: Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y) Head = consequent, a single subgoal: Ancestor(x, y) Read the symbol <- as "if" Body = antecedent = AND of subgoals: Parent(x, z) AND Ancestor(z, y) The Head is true if all the subgoals are true The rule applies for all values of its arguments A variable appearing in the head is distinguished; otherwise it is nondistinguished.
  • 33.
  • 34. IDB/EDB Convention: Predicates begin with a capital, variables begin with lowercase e.g., Ancestor (x, y) Fact predicates are atoms represented as relations If a tuple exists, that fact is true Otherwise, false A predicate representing a stored relation is called an extensional database (EDB). Subgoals of a rule may be facts or may themselves be rules EDB when it is a fact Intensional database (IDB) when it is a “derived relation” Rule heads are always IDBs
  • 35. Computing IDB Relations Bottom-up Empty out all IDB relations; REPEAT FOR (each IDB predicate p) DO evaluate p using current values of all relations; UNTIL (no IDB relation is changed) As long as there is no negation of IDB subgoals, each IDB relation grows with each iteration At least, it does not shrink Since relations are finite, the loop eventually terminates For some rules it cannot be guaranteed that the loop terminates; such rules are considered unsafe
  • 36. Computing IDB Relations Top-Down (p1) EDB: Par(c,p) = p is a parent of c. Generalized cousins: people with common ancestors one or more generations back: Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y Cousin(x,y) <- Sib(x,y) Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp) Form a dependency graph whose nodes = IDB predicates. Arc X -> Y if and only if there is a rule with X in the head and Y in the body. Cycle = recursion; no cycle = no recursion.
  • 37. Computing IDB Relations Top-down (p2) For IDB predicate p(x, y, …): FOR EACH subgoal of p DO IF subgoal is IDB, recursive call; IF subgoal is EDB, look up The recursion eventually terminates unless a distinguished variable: (a) does not appear in a subgoal, (b) only appears in a negated subgoal, or (c) only appears in an arithmetic subgoal Same 3 conditions for variables in an arithmetic subgoal Same 3 conditions for variables in a negated subgoal
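The REPEAT ... UNTIL loop can be sketched for the Sib/Cousin rules. This is a naive fixpoint written in plain Python with relations as sets of tuples; the function names are hypothetical, and each round evaluates every IDB predicate against the values from the previous round.

```python
# EDB facts Par(child, parent) from the introductory Datalog example.
par = {("sally", "john"), ("martha", "mary"), ("mary", "peter"), ("john", "peter")}

def eval_sib(par):
    """Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y"""
    return {(x, y) for (x, p) in par for (y, q) in par if p == q and x != y}

def eval_cousin(par, sib, cousin):
    """Cousin(x, y) <- Sib(x, y)
    Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)"""
    derived = set(sib)
    derived |= {
        (x, y) for (x, xp) in par for (y, yp) in par if (xp, yp) in cousin
    }
    return derived

# Empty out all IDB relations, then iterate until nothing changes.
sib, cousin = set(), set()
while True:
    new_sib = eval_sib(par)
    new_cousin = eval_cousin(par, sib, cousin)
    if new_sib == sib and new_cousin == cousin:
        break  # fixpoint reached: no IDB relation changed in this round
    sib, cousin = new_sib, new_cousin

print(sorted(cousin))
```

Because neither rule negates an IDB subgoal, each round can only add tuples, so the loop stops once a round adds nothing.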
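The dependency-graph test for recursion can be sketched as follows. The rule encoding (a head name plus the list of predicate names in its body) is an assumption made for this illustration, not a standard format.

```python
# Each rule: (head predicate, predicates appearing in its body).
rules = [
    ("Sib", ["Par", "Par"]),
    ("Cousin", ["Sib"]),
    ("Cousin", ["Par", "Par", "Cousin"]),
]

# IDB predicates are exactly those that occur as a rule head.
idb = {head for head, _ in rules}

# Arc X -> Y iff some rule has X in the head and IDB predicate Y in the body.
graph = {p: set() for p in idb}
for head, body in rules:
    graph[head] |= {q for q in body if q in idb}

def reachable(start):
    """Return all nodes reachable from start by following one or more arcs."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

# A predicate is recursive if it lies on a cycle, i.e. it can reach itself.
recursive = {p for p in idb if p in reachable(p)}
print(recursive)
```

Par never becomes a node because it is an EDB predicate; the only cycle is the self-loop on Cousin.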
  • 38. Safe Rules A rule is safe if: Each distinguished variable, Each variable in an arithmetic subgoal, and Each variable in a negated subgoal, also appears in a nonnegated, relational subgoal. Safe rules prevent infinite results.
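The safety condition lends itself to a mechanical check. In the sketch below a rule is given as its head variables plus a list of subgoals, each tagged as relational (`"rel"`), negated (`"neg"`), or arithmetic (`"arith"`); this encoding and the function name `is_safe` are hypothetical.

```python
def is_safe(head_vars, subgoals):
    """Check the safety condition for one rule.

    subgoals is a list of (kind, variables) pairs, where kind is
    "rel" (nonnegated relational), "neg" (negated) or "arith" (arithmetic).
    """
    bound = set()                 # variables in nonnegated relational subgoals
    must_be_bound = set(head_vars)  # distinguished variables
    for kind, variables in subgoals:
        if kind == "rel":
            bound |= set(variables)
        else:                     # negated or arithmetic subgoal
            must_be_bound |= set(variables)
    return must_be_bound <= bound

# Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
sib_safe = is_safe(
    ["x", "y"],
    [("rel", ["x", "p"]), ("rel", ["y", "p"]), ("arith", ["x", "y"])],
)

# P(x, y) <- Q(x): y is distinguished but never bound by a relational subgoal.
unbound_head = is_safe(["x", "y"], [("rel", ["x"])])

# P(x) <- NOT Q(x): x appears only in a negated subgoal.
only_negated = is_safe(["x"], [("neg", ["x"])])

# P(x) <- Q(x) AND x < z: z appears only in an arithmetic subgoal.
only_arith = is_safe(["x"], [("rel", ["x"]), ("arith", ["x", "z"])])

print(sib_safe, unbound_head, only_negated, only_arith)
```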
  • 39. Evaluating Datalog Programs As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of its rules have already been evaluated. If an IDB predicate has more than one rule, each rule contributes tuples to its relation.
  • 40. Expressive Power of Datalog Without recursion, Datalog can express all and only the queries of core relational algebra. The same as SQL select-from-where, without aggregation and grouping. But with recursion, Datalog can express more than these languages. Yet still not Turing-complete.
  • 41. SQL Rule Definitions & Usage Definition of Datalog Rules: WITH [RECURSIVE] <RuleName> (<arguments>) AS <query>; Invocation of Datalog Rules: <SQL query about EDB, IDB>
  • 42. SQL Recursion Example (p1) Find Sally's cousins Using Recursive definition introduced earlier Par (child, parent) is the EDB Expected SQL Query SELECT y FROM Cousin WHERE x = 'Sally'; But first, we need to define the IDB Cousin
  • 43. SQL Recursion Example (p2) WITH Clause (non-recursive) WITH Sib(x, y) AS (SELECT p1.child, p2.child FROM Par p1, Par p2 WHERE p1.parent = p2.parent AND p1.child <> p2.child), WITH Clause (recursive) RECURSIVE Cousin(x, y) AS (SELECT * FROM Sib) UNION (SELECT p1.child, p2.child FROM Par p1, Par p2, Cousin WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y)
  • 44. Next meeting January 31 Sections 7.1 – 7.3 Sections 8.1, 8.3 – 8.4 Discussion of presentation topic proposals
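The Sib and Cousin definitions can be run end to end as a recursive common table expression. This sketch assumes Python's sqlite3 module and SQLite's dialect, which places the RECURSIVE keyword once after WITH rather than before the individual definition; the Par rows are the facts from the introductory Datalog example, stored in lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO Par VALUES (?, ?)",
    [("sally", "john"), ("martha", "mary"), ("mary", "peter"), ("john", "peter")],
)

query = """
WITH RECURSIVE
  -- Non-recursive IDB: two different children of the same parent.
  Sib(x, y) AS (
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND p1.child <> p2.child
  ),
  -- Recursive IDB: siblings, or children of two people who are cousins.
  Cousin(x, y) AS (
    SELECT x, y FROM Sib
    UNION
    SELECT p1.child, p2.child
    FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = 'sally'
"""

sally_cousins = conn.execute(query).fetchall()
print(sally_cousins)
```

UNION (rather than UNION ALL) discards duplicate tuples, which is what lets the recursion stop once no new cousin pairs appear.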