1. CS 542 Database Management Systems Relational Database Programming J Singh January 24, 2011
2. Simple SQL Queries (p1) Relation BROWSER_TABLE SELECT * FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select (σ) Rows
3. Simple SQL Queries (p2) Relation BROWSER_TABLE SELECT BROWSER, PLATFORM FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select (σ) Rows Project (π) Columns
4. Simple SQL Queries (p3) Relation BROWSER_TABLE SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' Start with the Relation Select (σ) Rows Project (π) Columns Rename (ρ) Columns
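The three steps (select rows, project columns, rename a column) can be tried in a few lines. A minimal sketch using Python's built-in `sqlite3`; the slides do not list BROWSER_TABLE's rows, so the table contents (and the column types) below are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; only the column names come from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BROWSER_TABLE "
             "(BROWSER TEXT, PLATFORM TEXT, ENGINE TEXT, ENGINE_VERSION REAL)")
conn.executemany("INSERT INTO BROWSER_TABLE VALUES (?, ?, ?, ?)", [
    ("Firefox", "Win", "Gecko", 1.9),
    ("Camino", "OSX", "Gecko", 1.8),
    ("Safari", "OSX", "WebKit", 525.0),
])

# Select (σ): keep only the Gecko rows, all columns.
all_gecko = conn.execute("SELECT * FROM BROWSER_TABLE WHERE ENGINE = 'Gecko'").fetchall()

# Select, then project (π) onto two columns and rename (ρ) PLATFORM to OS.
cur = conn.execute("SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE WHERE ENGINE = 'Gecko'")
column_names = [d[0] for d in cur.description]  # the result's column names
rows = cur.fetchall()
print(column_names)  # ['BROWSER', 'OS']
```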
5. SQL Conditions In the WHERE clause: String1 = String2, String1 > String2, and the other comparison operators. Comparisons are controlled by 'collations', e.g., COLLATE Latin1_General_CI_AS (Latin1 collation, case-insensitive, accent-sensitive). For the other available collations, check your database. Collations can be specified at three levels: for the entire database, for an attribute in CREATE TABLE, and in the WHERE clause. LIKE performs pattern matching (% matches any sequence of characters, _ matches exactly one character), e.g., 'John Wayne' LIKE 'John%' and 'John Wayne' LIKE '% W_yne'
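The LIKE patterns and a case-insensitive comparison can be checked directly. A sketch assuming SQLite: its built-in NOCASE collation stands in for a case-insensitive collation such as Latin1_General_CI_AS, since collation names differ from one DBMS to another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def matches(text, pattern):
    """Evaluate `text LIKE pattern` in SQLite (% = any run of characters, _ = one)."""
    return conn.execute("SELECT ? LIKE ?", (text, pattern)).fetchone()[0] == 1

print(matches("John Wayne", "John%"))    # True
print(matches("John Wayne", "% W_yne"))  # True
print(matches("John Wayne", "% W_ne"))   # False: _ stands for exactly one character

# SQLite's default BINARY collation is case-sensitive; NOCASE folds ASCII case.
binary = conn.execute("SELECT 'wayne' = 'WAYNE'").fetchone()[0]
nocase = conn.execute("SELECT 'wayne' = 'WAYNE' COLLATE NOCASE").fetchone()[0]
print(binary, nocase)  # 0 1
```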
6. SQL Special Data Types (p1) Dates and Times (look them up) NULL values can mean one of three things: the value is unknown; the value is inapplicable (e.g., spouse name for a single person); the value is not shown, perhaps because of security concerns. Regardless of the cause, NULL cannot be treated as a constant. Operations with NULLs: NULL + number → NULL; NULL × number → NULL; NULL × 0 → NULL; NULL − NULL → NULL; NULL = NULL → UNKNOWN; X IS NULL → TRUE or FALSE (depending on X)
7. SQL Special Data Types (p2) UNKNOWN values result from comparisons with NULLs; other comparisons yield TRUE or FALSE. UNKNOWN means neither TRUE nor FALSE. Operations when combined with other logical values: UNKNOWN AND TRUE → UNKNOWN; UNKNOWN AND FALSE → FALSE; UNKNOWN OR TRUE → TRUE; UNKNOWN OR FALSE → UNKNOWN; NOT UNKNOWN → UNKNOWN
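Both tables can be reproduced in SQLite, which reports UNKNOWN as NULL (Python `None`) and TRUE/FALSE as 1/0. A sketch; the table `T` is hypothetical and only shows that a WHERE clause keeps a row only when its condition is TRUE, so `x = NULL` never matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate one SQL expression; NULL and UNKNOWN both come back as None."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

arith = {e: evaluate(e) for e in ["NULL + 5", "NULL * 5", "NULL * 0", "NULL - NULL"]}
print(arith)  # every value is None

print(evaluate("NULL = NULL"), evaluate("NULL IS NULL"))  # None 1

# Three-valued logic: UNKNOWN written as NULL, TRUE as 1, FALSE as 0.
print(evaluate("NULL AND 1"), evaluate("NULL AND 0"))  # None 0
print(evaluate("NULL OR 1"), evaluate("NULL OR 0"))    # 1 None
print(evaluate("NOT (NULL = 1)"))                      # None

# WHERE keeps only rows whose condition is TRUE, so UNKNOWN rows are dropped.
conn.execute("CREATE TABLE T (x INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (None,)])
eq_null = conn.execute("SELECT COUNT(*) FROM T WHERE x = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM T WHERE x IS NULL").fetchone()[0]
print(eq_null, is_null)  # 0 1
```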
8. Ordering Results Relation BROWSER_TABLE SELECT BROWSER, PLATFORM FROM BROWSER_TABLE WHERE ENGINE = 'Gecko' ORDER BY ENGINE_VERSION, BROWSER Start with the Relation Select (σ) Rows Order Rows Project (π) Columns
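A sketch of the ordering step, again on hypothetical BROWSER_TABLE rows: rows are sorted by ENGINE_VERSION and ties are broken by BROWSER, even though ENGINE_VERSION is not among the projected columns.

```python
import sqlite3

# Hypothetical rows; only the schema follows the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BROWSER_TABLE "
             "(BROWSER TEXT, PLATFORM TEXT, ENGINE TEXT, ENGINE_VERSION REAL)")
conn.executemany("INSERT INTO BROWSER_TABLE VALUES (?, ?, ?, ?)", [
    ("Firefox", "Win", "Gecko", 1.9),
    ("Camino", "OSX", "Gecko", 1.8),
    ("Epiphany", "Gnome", "Gecko", 1.8),
    ("Safari", "OSX", "WebKit", 525.0),
])

# ENGINE_VERSION orders the rows first; BROWSER breaks the 1.8 tie.
ordered = conn.execute("""
    SELECT BROWSER, PLATFORM FROM BROWSER_TABLE
    WHERE ENGINE = 'Gecko'
    ORDER BY ENGINE_VERSION, BROWSER
""").fetchall()
print(ordered)  # [('Camino', 'OSX'), ('Epiphany', 'Gnome'), ('Firefox', 'Win')]
```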
9. Detour: World Database A sample MySQL database downloadable from the web. It has 3 tables: City, Country, CountryLanguage. City: ID, Name, CountryCode, District, Population. Country: Code, Name, Continent, Region, SurfaceArea, IndepYear, Population, LifeExpectancy, GNP, GNPOld, LocalName, GovernmentForm, HeadOfState, Capital, Code2. CountryLanguage: CountryCode, Language, IsOfficial, Percentage. The three tables are 'connected' by the country code: City.CountryCode and CountryLanguage.CountryCode both refer to Country.Code.
10. Joins Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code ; Find all countries where Dutch is the official language SELECT Country.Name FROM Country, CountryLanguage WHERE CountryLanguage.CountryCode = Country.Code AND CountryLanguage.Language = 'Dutch' AND CountryLanguage.isOfficial = 'T' ;
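Both joins can be run on a toy copy of the World schema. A sketch assuming SQLite and a tiny hypothetical subset of rows (most columns omitted); the ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

# Tiny hypothetical subset of the World database; rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT);
CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, IsOfficial TEXT);
INSERT INTO Country VALUES ('EST', 'Estonia'), ('FIN', 'Finland'),
                           ('NLD', 'Netherlands'), ('ABW', 'Aruba');
INSERT INTO City VALUES (1, 'Tallinn', 'EST'), (2, 'Tartu', 'EST'), (3, 'Helsinki', 'FIN');
INSERT INTO CountryLanguage VALUES ('NLD', 'Dutch', 'T'), ('ABW', 'Dutch', 'T'),
                                   ('FIN', 'Finnish', 'T');
""")

estonian_cities = conn.execute("""
SELECT City.Name FROM City, Country
WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code
ORDER BY City.Name
""").fetchall()

dutch_official = conn.execute("""
SELECT Country.Name FROM Country, CountryLanguage
WHERE CountryLanguage.CountryCode = Country.Code
  AND CountryLanguage.Language = 'Dutch' AND CountryLanguage.IsOfficial = 'T'
ORDER BY Country.Name
""").fetchall()
print(estonian_cities)  # [('Tallinn',), ('Tartu',)]
print(dutch_official)   # [('Aruba',), ('Netherlands',)]
```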
11. Join Semantics – Nested Loops Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code Is equivalent to: For each tuple t1 in City: For each tuple t2 in Country: If the WHERE clause is satisfied: Accumulate <t1, t2> into a result set. Then project City.Name from the accumulated result set
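The nested-loop reading of the query translates almost line for line into Python. A sketch on hypothetical in-memory relations, where each tuple is a dict keyed by attribute name.

```python
# Hypothetical relations; each tuple is a dict keyed by attribute name.
City = [
    {"Name": "Tallinn", "CountryCode": "EST"},
    {"Name": "Tartu", "CountryCode": "EST"},
    {"Name": "Helsinki", "CountryCode": "FIN"},
]
Country = [
    {"Code": "EST", "Name": "Estonia"},
    {"Code": "FIN", "Name": "Finland"},
]

def nested_loop_join(outer, inner, predicate):
    """Pair every outer tuple with every inner tuple; keep pairs satisfying predicate."""
    result = []
    for t1 in outer:
        for t2 in inner:
            if predicate(t1, t2):
                result.append((t1, t2))  # accumulate <t1, t2>
    return result

# WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code
pairs = nested_loop_join(
    City, Country,
    lambda city, country: country["Name"] == "Estonia"
    and city["CountryCode"] == country["Code"],
)
names = [city["Name"] for city, _ in pairs]  # project City.Name
print(names)  # ['Tallinn', 'Tartu']
```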
12. Join Semantics – Relational Algebra Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code Is equivalent to $\pi_{A_1}\big(\sigma_{B_1 = \text{'Estonia'} \wedge A_2 = B_2}(A \times B)\big)$ where A = City, B = Country, A1 = City.Name, A2 = City.CountryCode, B1 = Country.Name, B2 = Country.Code
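The expression can be evaluated with three tiny operator functions. A sketch with hypothetical relations stored as plain tuples: City as (Name, CountryCode), Country as (Name, Code), so after the product A1, A2, B1, B2 sit at positions 0 to 3. Projection here uses set semantics, as in relational algebra (SQL without DISTINCT would keep duplicates).

```python
from itertools import product

def cross(r, s):
    """A × B: concatenate every tuple of r with every tuple of s."""
    return [t1 + t2 for t1, t2 in product(r, s)]

def select(cond, r):
    """σ_cond(R): keep the tuples for which cond is true."""
    return [t for t in r if cond(t)]

def project(positions, r):
    """π(R) with set semantics: keep the listed positions, drop duplicates."""
    return sorted({tuple(t[i] for i in positions) for t in r})

City = [("Tallinn", "EST"), ("Tartu", "EST"), ("Helsinki", "FIN")]  # (Name, CountryCode)
Country = [("Estonia", "EST"), ("Finland", "FIN")]                  # (Name, Code)
A1, A2, B1, B2 = 0, 1, 2, 3  # attribute positions in the product A × B

result = project([A1], select(lambda t: t[B1] == "Estonia" and t[A2] == t[B2],
                              cross(City, Country)))
print(result)  # [('Tallinn',), ('Tartu',)]
```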
13. Self-Joins Find all districts in Kenya that have more than one city SELECT DISTINCT c1.district FROM city c1, city c2, country WHERE c1.name != c2.name AND c1.district = c2.district AND country.code = c1.countrycode AND country.code = c2.countrycode AND country.name = 'kenya'; The same table (city) is used under two names, c1 and c2; the condition c1.district = c2.district pairs each city only with other cities of its own district
14. Set Operators Find all districts in Kenya that have exactly one city ( SELECT DISTINCT city.district FROM city, country WHERE country.code = city.countrycode AND country.name = 'kenya' ) EXCEPT ( SELECT DISTINCT c1.district FROM city c1, city c2, country WHERE c1.name != c2.name AND c1.district = c2.district AND country.code = c1.countrycode AND country.code = c2.countrycode AND country.name = 'kenya' ); Both sides must be union-compatible: the same number of columns, with compatible types. The other set operators are UNION and INTERSECT
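The self-join and the EXCEPT query can be checked on a handful of hypothetical Kenyan cities. A sketch assuming SQLite, with two adjustments: SQLite compares strings case-sensitively by default, so it matches 'Kenya' rather than 'kenya' (MySQL's default collations are case-insensitive), and SQLite does not accept parentheses around the operands of EXCEPT, so they are dropped.

```python
import sqlite3

# Hypothetical rows: two cities share the Coast district, Nairobi stands alone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT, name TEXT);
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, countrycode TEXT, district TEXT);
INSERT INTO country VALUES ('KEN', 'Kenya'), ('TZA', 'Tanzania');
INSERT INTO city VALUES (1, 'Nairobi', 'KEN', 'Nairobi'), (2, 'Mombasa', 'KEN', 'Coast'),
                        (3, 'Malindi', 'KEN', 'Coast'), (4, 'Dodoma', 'TZA', 'Dodoma');
""")

# Self-join: two copies of city (c1, c2) paired within the same Kenyan district.
MULTI_CITY = """
SELECT DISTINCT c1.district FROM city c1, city c2, country
WHERE c1.name != c2.name AND c1.district = c2.district
  AND country.code = c1.countrycode AND country.code = c2.countrycode
  AND country.name = 'Kenya'
"""
ALL_DISTRICTS = """
SELECT DISTINCT city.district FROM city, country
WHERE country.code = city.countrycode AND country.name = 'Kenya'
"""
multi = conn.execute(MULTI_CITY).fetchall()
# Set difference: all Kenyan districts minus those with two or more cities.
single = conn.execute(ALL_DISTRICTS + " EXCEPT " + MULTI_CITY).fetchall()
print(multi, single)  # [('Coast',)] [('Nairobi',)]
```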
15. Subqueries A different way to structure queries (without using joins). Subqueries can appear in the FROM clause and in the WHERE clause: SELECT ___ FROM <Subquery 3> WHERE <Subquery 1> <Subquery 2>
16. Subqueries Returning Scalars Find all cities in Estonia SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia' AND City.CountryCode = Country.Code Can also be written as SELECT Name FROM City WHERE CountryCode = (SELECT Code FROM Country WHERE Name = 'Estonia') The two forms are equivalent except when…
17. Conditions Returning Relations Find all countries where Dutch is the official language SELECT Country.Name FROM Country, CountryLanguage WHERE CountryLanguage.CountryCode = Country.Code AND CountryLanguage.Language = 'Dutch' AND isOfficial = 'T' ; Can also be written as SELECT Name FROM Country WHERE Code IN ( SELECT CountryCode FROM CountryLanguage WHERE Language = 'Dutch' AND isOfficial = 'T' );
18. Conditions Returning Tuples Find all countries where Dutch is the official language SELECT Name FROM Country WHERE Code IN ( SELECT CountryCode FROM CountryLanguage WHERE Language = 'Dutch' AND isOfficial = 'T' ); Can also be written as SELECT Name FROM Country WHERE (Code, 'T') IN ( SELECT CountryCode, isOfficial FROM CountryLanguage WHERE Language = 'Dutch' );
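The scalar-subquery, IN, and tuple-IN forms can be compared against the join forms on hypothetical data. A sketch assuming SQLite 3.15 or later, which is needed for the row value `(Code, 'T')`; the inner query in the scalar example happens to return exactly one value.

```python
import sqlite3

# Hypothetical rows; the 'F' row shows the IsOfficial filter doing real work.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT);
CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, IsOfficial TEXT);
INSERT INTO Country VALUES ('EST', 'Estonia'), ('NLD', 'Netherlands'),
                           ('ABW', 'Aruba'), ('CAN', 'Canada');
INSERT INTO City VALUES (1, 'Tallinn', 'EST'), (2, 'Tartu', 'EST');
INSERT INTO CountryLanguage VALUES ('NLD', 'Dutch', 'T'), ('ABW', 'Dutch', 'T'),
                                   ('CAN', 'Dutch', 'F');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Scalar subquery versus the equivalent join.
scalar_form = rows("""SELECT Name FROM City WHERE CountryCode =
    (SELECT Code FROM Country WHERE Name = 'Estonia') ORDER BY Name""")
join_form = rows("""SELECT City.Name FROM City, Country WHERE Country.Name = 'Estonia'
    AND City.CountryCode = Country.Code ORDER BY City.Name""")

# IN with a one-column subquery, then a row value compared against two columns.
in_form = rows("""SELECT Name FROM Country WHERE Code IN (SELECT CountryCode
    FROM CountryLanguage WHERE Language = 'Dutch' AND IsOfficial = 'T') ORDER BY Name""")
tuple_form = rows("""SELECT Name FROM Country WHERE (Code, 'T') IN (SELECT CountryCode,
    IsOfficial FROM CountryLanguage WHERE Language = 'Dutch') ORDER BY Name""")
print(scalar_form == join_form, in_form == tuple_form)  # True True
print(in_form)  # [('Aruba',), ('Netherlands',)]
```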
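19. Subqueries in FROM clauses Total population of all countries with Dutch as the official language SELECT SUM(Population) AS TotalPopulation FROM ( SELECT Country.Population FROM Country, CountryLanguage WHERE CountryLanguage.CountryCode = Country.Code AND CountryLanguage.Language = 'Dutch' AND CountryLanguage.IsOfficial = 'T' ) AS DutchCountries; MySQL requires every subquery in FROM (a derived table) to have an alias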
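A sketch of the FROM-clause subquery in SQLite, with hypothetical placeholder populations rather than the real World figures: the subquery acts as a temporary relation, and the outer query aggregates over it.

```python
import sqlite3

# Hypothetical placeholder populations; not the real World data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT, Population INTEGER);
CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, IsOfficial TEXT);
INSERT INTO Country VALUES ('NLD', 'Netherlands', 15000000), ('ABW', 'Aruba', 100000),
                           ('CAN', 'Canada', 30000000);
INSERT INTO CountryLanguage VALUES ('NLD', 'Dutch', 'T'), ('ABW', 'Dutch', 'T'),
                                   ('CAN', 'Dutch', 'F');
""")

# The derived table DutchCountries is built first; SUM then runs over it.
total = conn.execute("""
SELECT SUM(Population) FROM (
    SELECT Country.Population FROM Country, CountryLanguage
    WHERE CountryLanguage.CountryCode = Country.Code
      AND CountryLanguage.Language = 'Dutch' AND CountryLanguage.IsOfficial = 'T'
) AS DutchCountries
""").fetchone()[0]
print(total)  # 15100000
```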
20. Cross Joins Populations of cities in Finland relative to Aruba & Singapore SELECT city.name as City, city.population as Population, cntry.name as Country, (city.population * 100 / cntry.population) as 'Percent' FROM (SELECT * FROM CITY WHERE CountryCode = 'fin') AS city CROSS JOIN (SELECT * FROM Country WHERE Code='abw' OR Code='sgp') AS cntry;
21. Theta Joins Cross Join with a condition The most common form of JOIN All cities in Finland with more than twice the population of Aruba SELECT cty.name as City, cty.population as Population, cntry.name as Country, (cty.population * 100 / cntry.population) as 'Percent' FROM ( SELECT * FROM City WHERE CountryCode = 'fin') AS cty JOIN (SELECT * FROM Country WHERE Code='abw') AS cntry ON cty.population > 2*cntry.population;
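The difference between the cross join and the theta join shows up in the row counts. A sketch assuming SQLite and hypothetical placeholder populations; country codes are written in uppercase because SQLite's `=` is case-sensitive by default, and `100.0` forces real division because SQLite's `/` truncates when both operands are integers.

```python
import sqlite3

# Hypothetical placeholder populations; not the real World data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City (Name TEXT, CountryCode TEXT, Population INTEGER);
CREATE TABLE Country (Code TEXT, Name TEXT, Population INTEGER);
INSERT INTO City VALUES ('Helsinki', 'FIN', 500000), ('Espoo', 'FIN', 200000),
                        ('Porvoo', 'FIN', 50000);
INSERT INTO Country VALUES ('ABW', 'Aruba', 100000), ('SGP', 'Singapore', 4000000),
                           ('FIN', 'Finland', 5000000);
""")

# Cross join: every Finnish city paired with every selected country (3 x 2 rows).
cross = conn.execute("""
SELECT city.Name, cntry.Name, city.Population * 100.0 / cntry.Population AS Percent
FROM (SELECT * FROM City WHERE CountryCode = 'FIN') AS city
CROSS JOIN (SELECT * FROM Country WHERE Code = 'ABW' OR Code = 'SGP') AS cntry
""").fetchall()

# Theta join: same pairing, kept only where the ON condition holds
# (Espoo has exactly twice Aruba's population, so '>' excludes it).
theta = conn.execute("""
SELECT cty.Name FROM (SELECT * FROM City WHERE CountryCode = 'FIN') AS cty
JOIN (SELECT * FROM Country WHERE Code = 'ABW') AS cntry
  ON cty.Population > 2 * cntry.Population
""").fetchall()
print(len(cross), theta)  # 6 [('Helsinki',)]
```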
22. Outer Joins Keep the rows of one table whether or not they have a matching row in the other table; the missing columns are filled with NULLs. Cities whose name starts with 'TOK' and countries whose code starts with 'J' SELECT c.*, r.name as Country FROM (select * from city where city.name like 'tok%') as c LEFT OUTER JOIN (select * from country where country.code like 'j%') as r ON (c.countrycode=r.code); Yields 6 cities, 5 in Japan and Tokat in Turkey. What if we had done RIGHT OUTER JOIN?
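A sketch of the LEFT OUTER JOIN in SQLite on a few hypothetical rows: every 'tok%' city survives, and the one whose country fails the 'j%' filter gets NULL (`None`) in the country column. SQLite's LIKE ignores ASCII case by default, so 'tok%' matches 'Tokyo'.

```python
import sqlite3

# Hypothetical rows chosen so that one city has no matching country.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT, name TEXT);
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, countrycode TEXT);
INSERT INTO country VALUES ('JPN', 'Japan'), ('JAM', 'Jamaica'), ('TUR', 'Turkey');
INSERT INTO city VALUES (1, 'Tokyo', 'JPN'), (2, 'Tokorozawa', 'JPN'),
                        (3, 'Tokat', 'TUR'), (4, 'Osaka', 'JPN');
""")

# Tokat's country code (TUR) fails LIKE 'j%', so its r.name comes back NULL.
rows = conn.execute("""
SELECT c.name, r.name AS Country
FROM (SELECT * FROM city WHERE city.name LIKE 'tok%') AS c
LEFT OUTER JOIN (SELECT * FROM country WHERE country.code LIKE 'j%') AS r
  ON (c.countrycode = r.code)
ORDER BY c.name
""").fetchall()
print(rows)  # [('Tokat', None), ('Tokorozawa', 'Japan'), ('Tokyo', 'Japan')]
```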
23. Review and Contrast Joins MySQL does not implement FULL OUTER JOIN How can we get it if we need it? Are CROSS JOIN and FULL OUTER JOIN the same thing? Table A has 3 rows, table B has 5 rows. How many rows does A CROSS JOIN B have? How many rows does A LEFT OUTER JOIN B have? How about A RIGHT OUTER JOIN B? A FULL OUTER JOIN B? A INNER JOIN B?
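One common way to get a FULL OUTER JOIN from a DBMS that lacks it: take the LEFT JOIN rows and add the rows of the other table that matched nothing. A sketch in SQLite (which itself only gained RIGHT and FULL joins in version 3.39) on two hypothetical one-column tables; UNION ALL plus the `IS NULL` anti-join avoids both double-counting matched pairs and merging legitimate duplicate rows, assuming the join column itself holds no NULLs.

```python
import sqlite3

# Hypothetical one-column tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (k INTEGER);
CREATE TABLE B (k INTEGER);
INSERT INTO A VALUES (1), (2);
INSERT INTO B VALUES (2), (3), (4);
""")

# All LEFT JOIN rows, plus the B rows with no partner in A.
full = conn.execute("""
SELECT A.k, B.k FROM A LEFT JOIN B ON A.k = B.k
UNION ALL
SELECT A.k, B.k FROM B LEFT JOIN A ON A.k = B.k WHERE A.k IS NULL
""").fetchall()
print(len(full))  # 4
```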
24. Reading Assignment Section 6.4 Section 6.5 Keep timing considerations in mind: SQL completely evaluates the query before effecting changes
25. Transactions ACID Atomicity: Sets of database operations that need to be accomplished atomically; either they all get done or none do. E.g., during a money transfer, if money is taken out of one account, it must be added to the other. Consistency: Enforce constraints on types, values, foreign keys. Maintain relationships among data elements (see Atomicity). Isolation: Each transaction must appear to be executed as if no other transaction is executing at the same time. Durability: Once committed, the change is permanent.
26. Detour: Transaction Scenario Real Time Bank (RTB) is an online bank. RTB executes money transfers as soon as requests are entered. RTB shows up-to-the-minute account balances. Transactions that would create a negative balance are denied. Scenario: Initially, Alice has $250, Bob has $100, Cathy has $150. Transactions: (1) Alice pays Bob $200, (2) Bob pays Cathy $150, (3) Cathy pays Alice $250. Interesting aside: only the order 1, 2, 3 lets all three transactions succeed. At a Nightly Processing Bank, transaction order would be irrelevant
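The aside about ordering can be checked by trying all six orders. A sketch that assumes a denied transfer is simply skipped (not retried later).

```python
from itertools import permutations

INITIAL = {"Alice": 250, "Bob": 100, "Cathy": 150}
TRANSFERS = {1: ("Alice", "Bob", 200), 2: ("Bob", "Cathy", 150), 3: ("Cathy", "Alice", 250)}

def run_real_time(order):
    """Apply transfers immediately, in order; deny any that would overdraw the payer."""
    balances = dict(INITIAL)
    succeeded = []
    for tid in order:
        payer, payee, amount = TRANSFERS[tid]
        if balances[payer] >= amount:  # would not go negative
            balances[payer] -= amount
            balances[payee] += amount
            succeeded.append(tid)
    return succeeded, balances

all_ok = [order for order in permutations(TRANSFERS) if len(run_real_time(order)[0]) == 3]
print(all_ok)  # [(1, 2, 3)]
```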
27. Transaction Atomicity Work by example: Alice pays Bob $200 BEGIN TRANSACTION UPDATE Accounts SET balance = balance - 200 WHERE Owner = 'Alice' IF (0 > SELECT balance FROM Accounts WHERE Owner = 'Alice', ROLLBACK TRANSACTION ) -- Note: Pidgin SQL Syntax UPDATE Accounts SET balance = balance + 200 WHERE Owner = 'Bob' COMMIT TRANSACTION
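A runnable counterpart to the pidgin SQL, assuming Python's `sqlite3` module: used as a context manager, the connection commits when the block finishes normally and rolls back if an exception escapes, which gives the all-or-nothing behavior. The `Accounts` table layout and the `InsufficientFunds` exception are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (Owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("Alice", 250), ("Bob", 100), ("Cathy", 150)])
conn.commit()

class InsufficientFunds(Exception):
    """Hypothetical signal that the payer's balance went negative."""

def transfer(conn, payer, payee, amount):
    """Move `amount` atomically: both UPDATEs commit together or neither does."""
    try:
        with conn:  # commit on normal exit, roll back if an exception escapes
            conn.execute("UPDATE Accounts SET balance = balance - ? WHERE Owner = ?",
                         (amount, payer))
            (balance,) = conn.execute("SELECT balance FROM Accounts WHERE Owner = ?",
                                      (payer,)).fetchone()
            if balance < 0:
                raise InsufficientFunds(payer)  # undoes the first UPDATE
            conn.execute("UPDATE Accounts SET balance = balance + ? WHERE Owner = ?",
                         (amount, payee))
        return True
    except InsufficientFunds:
        return False

def balances(conn):
    return dict(conn.execute("SELECT Owner, balance FROM Accounts"))

ok = transfer(conn, "Alice", "Bob", 200)      # Alice 250 -> 50
after_ok = balances(conn)
denied = transfer(conn, "Alice", "Bob", 200)  # would leave Alice at -150
after_denied = balances(conn)
print(ok, denied, after_denied)
```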
28. Transaction Isolation Isolation levels and the problems they leave behind: READ UNCOMMITTED leaves Dirty Read: data of an uncommitted transaction is visible to others. READ COMMITTED: only committed data is visible; leaves Non-repeatable Read: a transaction re-reads some data and finds that it has changed because another transaction committed. REPEATABLE READ: places locks on all data used in the transaction; leaves Phantom Read: re-executing a query that returns a set of rows yields a different set of rows. SERIALIZABLE: as if all transactions occurred in a completely isolated fashion; can be too restrictive to support enough transaction volume. Note: Not every database offers each isolation level. Choose the isolation level with care!
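The level-by-anomaly summary above fits in a small lookup table. A sketch of the minimum guarantees as the slide lists them; a particular DBMS may prevent more at a given level. The helper `weakest_level_preventing` is a hypothetical name for illustration.

```python
# Anomalies each isolation level still permits, weakest level first.
PERMITTED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}

def weakest_level_preventing(anomalies):
    """Return the weakest isolation level that rules out every listed anomaly."""
    anomalies = set(anomalies)
    unknown = anomalies - PERMITTED["READ UNCOMMITTED"]
    if unknown:
        raise ValueError(f"not modeled here: {sorted(unknown)}")
    for level, permitted in PERMITTED.items():  # dicts keep insertion order
        if not permitted & anomalies:
            return level

print(weakest_level_preventing({"dirty read"}))    # READ COMMITTED
print(weakest_level_preventing({"phantom read"}))  # SERIALIZABLE
```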
29. CS 542 Database Management Systems Database Logic – The Foundation of Datalog
30. About Datalog Intellectual debt to Prolog, the logic programming language. Responsible for the addition of recursion to SQL-99. Extends the expressive power of core SQL, yet is itself still not Turing-complete. Introductory example: Facts: Par(sally, john), Par(martha, mary), Par(mary, peter), Par(john, peter) Rules: Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y; Cousin(x, y) <- Sib(x, y); Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp) Derived fact: Cousin(sally, martha)
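How Cousin(sally, martha) is derived can be traced by applying the rules to the four facts until nothing new appears. A sketch using Python sets for relations, with Par(child, parent) as on the later slides.

```python
# EDB: Par(child, parent), the facts from the slide.
Par = {("sally", "john"), ("martha", "mary"), ("mary", "peter"), ("john", "peter")}

# Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
Sib = {(x, y) for (x, p) in Par for (y, q) in Par if p == q and x != y}

# Cousin(x, y) <- Sib(x, y)
# Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)
Cousin = set(Sib)
while True:
    derived = {(x, y) for (x, xp) in Par for (y, yp) in Par if (xp, yp) in Cousin}
    if derived <= Cousin:  # nothing new: fixpoint reached
        break
    Cousin |= derived

print(sorted(Sib))                    # [('john', 'mary'), ('mary', 'john')]
print(("sally", "martha") in Cousin)  # True
```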
31. Why Data Logic? Why is SQL not sufficient? Deductive rules express things that go in both FROM and WHERE clauses Allow for stating general requirements that are more difficult to state correctly in SQL Allow us to take advantage of research in logic programming and AI
32. The Formalism of Rules The head is true if all the subgoals are true. The rule applies for all values of its arguments. A variable appearing in the head is distinguished; otherwise it is nondistinguished. Example: Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y). Head = consequent, a single subgoal: Ancestor(x, y). Read the symbol <- as “if”. Body = antecedent = AND of subgoals: Parent(x, z) AND Ancestor(z, y). Here x and y are distinguished; z is nondistinguished.
34. IDB/EDB Convention: predicates begin with a capital letter, variables with a lowercase letter, e.g., Ancestor(x, y). Fact predicates are represented as relations: if a tuple exists, that fact is true; otherwise, it is false. A predicate representing a stored relation is called an extensional database (EDB) predicate. Subgoals of a rule may be facts or may themselves be defined by rules: a subgoal is EDB when it is a fact, and intensional database (IDB) when it is a “derived relation”. Rule heads are always IDBs
35. Computing IDB Relations Bottom-up: empty out all IDB relations; REPEAT FOR (each IDB predicate p) DO evaluate p using the current values of all relations; UNTIL (no IDB relation is changed). As long as there is no negation of IDB subgoals, no IDB relation shrinks from one iteration to the next (each grows or stays the same). Since the relations are finite, the loop eventually terminates. Some rules make it impossible to guarantee that the loop terminates; such rules are considered unsafe
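The REPEAT/UNTIL loop can be sketched for the Ancestor rule. The sketch assumes a base rule Ancestor(x, y) <- Parent(x, y), which the recursive rule on the formalism slide needs but does not show, and reads Parent(x, z) as "z is a parent of x". The Parent facts are hypothetical.

```python
def naive_ancestor(parent):
    """Naive bottom-up evaluation of the (assumed) program
         Ancestor(x, y) <- Parent(x, y)
         Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)
    Returns the fixpoint and the number of passes it took."""
    ancestor = set()  # empty out the IDB relation
    passes = 0
    while True:
        passes += 1
        # Evaluate both rules using the current value of Ancestor.
        new = set(parent) | {(x, y) for (x, z) in parent for (w, y) in ancestor if z == w}
        if new == ancestor:  # UNTIL no IDB relation is changed
            return ancestor, passes
        ancestor = new  # without negation this never loses tuples

result, passes = naive_ancestor({("sally", "john"), ("john", "peter")})
print(sorted(result), passes)
```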
36. Computing IDB Relations Top-Down (p1) EDB: Par(c,p) = p is a parent of c. Generalized cousins: people with common ancestors one or more generations back: Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y Cousin(x,y) <- Sib(x,y) Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp) Form a dependency graph whose nodes = IDB predicates. Arc X -> Y if and only if there is a rule with X in the head and Y in the body. Cycle = recursion; no cycle = no recursion.
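The dependency-graph test can be sketched in Python. Rules are written as (head, body predicates), an assumed encoding; predicates that appear in some rule head count as IDB, and a predicate is recursive when it can reach itself.

```python
# The three rules from this slide, as (head, body predicates); Par is the only EDB.
RULES = [
    ("Sib", ["Par", "Par"]),
    ("Cousin", ["Sib"]),
    ("Cousin", ["Par", "Par", "Cousin"]),
]

def dependency_graph(rules):
    """Arc X -> Y iff some rule has X in the head and IDB predicate Y in the body."""
    idb = {head for head, _ in rules}  # predicates defined by rules
    graph = {p: set() for p in idb}
    for head, body in rules:
        graph[head].update(b for b in body if b in idb)
    return graph

def on_a_cycle(graph, start):
    """True if `start` can reach itself by following arcs (depth-first search)."""
    stack, seen = list(graph[start]), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False

graph = dependency_graph(RULES)
recursive = {p for p in graph if on_a_cycle(graph, p)}
print(recursive)  # {'Cousin'}
```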
37. Computing IDB Relations Top-down (p2) To evaluate IDB predicate p(x,y, …): FOR EACH subgoal of p DO IF the subgoal is IDB, make a recursive call; IF the subgoal is EDB, look it up. The recursion eventually terminates unless a distinguished variable: does not appear in a subgoal, only appears in a negated subgoal, or only appears in an arithmetic subgoal. The same 3 conditions apply to variables in an arithmetic subgoal and to variables in a negated subgoal
38. Safe Rules A rule is safe if: Each distinguished variable, Each variable in an arithmetic subgoal, and Each variable in a negated subgoal, also appears in a nonnegated, relational subgoal. Safe rules prevent infinite results.
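The safety definition translates into a one-line subset test. A sketch with an assumed rule encoding (a `Subgoal` dataclass, arguments limited to variables); the predicates P, Q, and R in the unsafe examples are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subgoal:
    kind: str    # "rel" = nonnegated relational, "neg" = negated relational, "arith"
    args: tuple  # variable names only (constants are left out of this sketch)

def is_safe(head_vars, body):
    """Every head variable, and every variable of an arithmetic or negated
    subgoal, must also appear in some nonnegated relational subgoal."""
    bound = {v for g in body if g.kind == "rel" for v in g.args}
    must_be_bound = set(head_vars)
    must_be_bound |= {v for g in body if g.kind in ("arith", "neg") for v in g.args}
    return must_be_bound <= bound

# Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y : safe
sib_safe = is_safe(("x", "y"), [Subgoal("rel", ("x", "p")), Subgoal("rel", ("y", "p")),
                                Subgoal("arith", ("x", "y"))])
# P(x, y) <- Q(x) : unsafe, y is not bound by any relational subgoal
head_unbound = is_safe(("x", "y"), [Subgoal("rel", ("x",))])
# P(x) <- Q(y) AND NOT R(x) : unsafe, x occurs only in a negated subgoal
neg_only = is_safe(("x",), [Subgoal("rel", ("y",)), Subgoal("neg", ("x",))])
print(sib_safe, head_unbound, neg_only)  # True False False
```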
39. Evaluating Datalog Programs As long as there is no recursion, we can pick an order in which to evaluate the IDB predicates such that, for each predicate, all the predicates in the bodies of its rules have already been evaluated. If an IDB predicate has more than one rule, each rule contributes tuples to its relation.
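Picking such an order is a topological sort of the dependency graph. A sketch using `graphlib.TopologicalSorter` from the Python 3.9+ standard library; the predicates FirstCousin and Related are hypothetical, and the second call shows that a recursive program (Cousin depends on itself) has no such order.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nonrecursive program: predicate -> IDB predicates in its rule bodies.
depends_on = {
    "Sib": set(),
    "FirstCousin": {"Sib"},
    "Related": {"Sib", "FirstCousin"},
}
# static_order() lists each predicate after every predicate it depends on.
order = list(TopologicalSorter(depends_on).static_order())
print(order)  # ['Sib', 'FirstCousin', 'Related']

# With recursion (Cousin uses Cousin), no such order exists.
try:
    list(TopologicalSorter({"Sib": set(), "Cousin": {"Sib", "Cousin"}}).static_order())
    recursion_detected = False
except CycleError:
    recursion_detected = True
print(recursion_detected)  # True
```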
40. Expressive Power of Datalog Without recursion, Datalog can express all and only the queries of core relational algebra. The same as SQL select-from-where, without aggregation and grouping. But with recursion, Datalog can express more than these languages. Yet still not Turing-complete.
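41. SQL Rule Definitions & Usage Definition of Datalog Rules: WITH [RECURSIVE] <RuleName>(<arguments>) AS (<query>) Invocation of Datalog Rules: <SQL query about EDB, IDB> The WITH clause and the query that invokes its rules form a single statement; several rules can be defined in one WITH clause, separated by commas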
42. SQL Recursion Example (p1) Find Sally's cousins using the recursive definition introduced earlier. Par(child, parent) is the EDB. Expected SQL query: SELECT y FROM Cousin WHERE x = 'Sally'; But first, we need to define the IDB Cousin
43. SQL Recursion Example (p2) Sib is non-recursive and Cousin is recursive; in standard SQL the RECURSIVE keyword is written once and covers every definition in the WITH clause: WITH RECURSIVE Sib(x, y) AS (SELECT p1.child, p2.child FROM Par p1, Par p2 WHERE p1.parent = p2.parent AND p1.child <> p2.child), Cousin(x, y) AS (SELECT * FROM Sib UNION SELECT p1.child, p2.child FROM Par p1, Par p2, Cousin WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y) SELECT y FROM Cousin WHERE x = 'Sally';
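The recursive query runs as-is in SQLite, which supports WITH RECURSIVE. A sketch that loads the four Par facts from the introductory Datalog example (capitalized to match 'Sally' in the query) into an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO Par VALUES (?, ?)",
                 [("Sally", "John"), ("Martha", "Mary"), ("Mary", "Peter"), ("John", "Peter")])

COUSINS_OF_SALLY = """
WITH RECURSIVE
  Sib(x, y) AS (
    SELECT p1.child, p2.child FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND p1.child <> p2.child
  ),
  Cousin(x, y) AS (
    SELECT * FROM Sib               -- Cousin(x, y) <- Sib(x, y)
    UNION                           -- UNION drops repeats, so recursion stops
    SELECT p1.child, p2.child FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = 'Sally'
"""
result = conn.execute(COUSINS_OF_SALLY).fetchall()
print(result)  # [('Martha',)]
```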
44. Next meeting January 31 Sections 7.1 – 7.3 Sections 8.1, 8.3 – 8.4 Discussion of presentation topic proposals