CS 542 Database Management Systems
Controlling Database Integrity and Performance
J Singh
January 31, 2011

Today's Topics
- Database Integrity
  - Primary Key Constraints: prevent duplicates
  - Foreign Key Constraints: prevent dangling references
  - Attribute Constraints: prevent inconsistent attribute values
  - Tuple Constraints: more vigilant checking of attribute values
  - Assertions: paranoid integrity checking
- Views
- Performance Topics
  - Indexes
- Discussion of presentation topic proposals

Primary Key Constraints
- What are Primary Keys good for?
  - Uniquely identify the subject of each tuple
  - Ensure that there are no duplicates
  - Cannot be null: that would imply a NULL subject
- A table may not have more than one primary key
  - A Primary Key may consist of one or more columns
  - Multiple Unique keys are OK
- For Table R, <P1, P2, …, Pm> together constitute a primary key if, for each tuple in R,
  - <P1, P2, …, Pm> are unique
  - P1, P2, …, Pm are non-null
- <U1, U2, …, Um> together constitute a unique key if, for each tuple in R,
  - <U1, U2, …, Um> are unique
  - But U1, U2, …, Um can be null

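The difference between a primary key and a unique key can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table layout and the `Code2` column are assumptions made for illustration, not part of the slides. `NOT NULL` is spelled out because SQLite, for historical reasons, tolerates NULLs in non-integer primary keys unless told otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Country (
        Code  TEXT NOT NULL PRIMARY KEY,  -- primary key: unique and non-null
        Code2 TEXT UNIQUE                 -- unique key: unique, but NULL is allowed
    )
""")
conn.execute("INSERT INTO Country VALUES ('FIN', 'FI')")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_primary = rejected("INSERT INTO Country VALUES ('FIN', 'XX')")   # duplicate primary key
null_primary = rejected("INSERT INTO Country VALUES (NULL, 'YY')")   # NULL primary key
dup_unique = rejected("INSERT INTO Country VALUES ('SWE', 'FI')")    # duplicate unique key
# NULLs in a unique key do not collide with each other
null_unique_1 = rejected("INSERT INTO Country VALUES ('NOR', NULL)")
null_unique_2 = rejected("INSERT INTO Country VALUES ('DNK', NULL)")
row_count = conn.execute("SELECT COUNT(*) FROM Country").fetchone()[0]
```

The first three attempts should be rejected, while both rows with a NULL `Code2` are accepted.
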
Foreign Key Constraints (p1)
- Main Idea: Prevent Dangling Tuples
- A Foreign Key must point to a Key Reference
    CREATE TABLE City (
      ::
      CountryCode char(3) REFERENCES Country(Code)
    )
- The Key Reference must be a unique or primary key
- The Key Reference must already exist before a referencing tuple can be added
  - Try: INSERT INTO city (Name, CountryCode) VALUES ('xyzzy', 'XYZ');
  - Try: UPDATE city SET CountryCode='XYZ' WHERE CountryCode='FIN';

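The two "Try" statements can be run against a cut-down schema. This is a sketch with sqlite3 rather than the course's database, so two things are assumptions: the two-column tables, and the `PRAGMA foreign_keys = ON` line, which is needed because SQLite leaves foreign key enforcement off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE Country (Code CHAR(3) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE City (
        Name        TEXT,
        CountryCode CHAR(3) REFERENCES Country(Code)
    )
""")
conn.execute("INSERT INTO Country VALUES ('FIN')")
conn.execute("INSERT INTO City VALUES ('Helsinki', 'FIN')")


def fails(sql):
    """Return True if the statement is rejected as an integrity violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Neither statement may leave a City row pointing at the non-existent country 'XYZ'.
insert_dangling = fails("INSERT INTO City (Name, CountryCode) VALUES ('xyzzy', 'XYZ')")
update_dangling = fails("UPDATE City SET CountryCode = 'XYZ' WHERE CountryCode = 'FIN'")
```
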
Foreign Key Constraints (p2)
- Alternative methods of defining a foreign key
  - As part of the column definition:
      CREATE TABLE City ( CountryCode char(3) REFERENCES COUNTRY(Code), …)
  - As a separate table constraint:
      CREATE TABLE City ( CountryCode char(3), …,
        [CONSTRAINT [ctyREFcntry]]
        FOREIGN KEY (CountryCode) REFERENCES COUNTRY(Code))
  - Added after the table has been created:
      CREATE TABLE City ( CountryCode char(3), …)
    Then, later,
      ALTER TABLE City ADD [CONSTRAINT [ctyREFcntry]]
        FOREIGN KEY (CountryCode) REFERENCES COUNTRY(Code);
- Notation: [] signifies optional

Foreign Key Constraints (p3)
- Referential Integrity Options (what happens to the Foreign Key when its Key Reference changes)
  - Restrict (default): reject the request
  - Cascade: propagate the change to the referencing tuples
  - Set Null: set the foreign key to NULL
- Changes to Key References
  - Try: DELETE FROM country WHERE Code='FIN';
  - Try: UPDATE country SET Code='XYZ' WHERE Code='FIN';

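The three options can be compared side by side. The sketch below is one way to do it with sqlite3. The helper `delete_country` and the tiny schema are hypothetical, and each call rebuilds the tables with a different `ON DELETE` action before deleting 'FIN'.

```python
import sqlite3


def delete_country(action):
    """Delete 'FIN' under the given ON DELETE action and report what happened to City."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    conn.execute("CREATE TABLE Country (Code CHAR(3) PRIMARY KEY)")
    conn.execute(f"""
        CREATE TABLE City (
            Name        TEXT,
            CountryCode CHAR(3) REFERENCES Country(Code) ON DELETE {action}
        )
    """)
    conn.execute("INSERT INTO Country VALUES ('FIN')")
    conn.execute("INSERT INTO City VALUES ('Helsinki', 'FIN')")
    try:
        conn.execute("DELETE FROM Country WHERE Code = 'FIN'")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT Name, CountryCode FROM City").fetchall()


restrict_result = delete_country("RESTRICT")   # the delete is refused
cascade_result = delete_country("CASCADE")     # Helsinki is deleted along with FIN
set_null_result = delete_country("SET NULL")   # Helsinki stays, CountryCode becomes NULL
```

The same actions can be attached to `ON UPDATE` to cover the second "Try" statement.
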
Foreign Key Constraints (p4)
- Chicken and Egg definitions
    CREATE TABLE chicken (
      cID INT PRIMARY KEY,
      eID INT REFERENCES egg(eID));
    CREATE TABLE egg (
      eID INT PRIMARY KEY,
      cID INT REFERENCES chicken(cID));
- Consistently fails
  - Can't define a foreign key to a table before it has been defined
- Solution
  - Define the tables w/o constraints
      CREATE TABLE chicken (cID INT PRIMARY KEY, eID INT);
      CREATE TABLE egg (eID INT PRIMARY KEY, cID INT);
  - And then add foreign keys
      ALTER TABLE chicken
        ADD CONSTRAINT c_e
        FOREIGN KEY (eID)
        REFERENCES egg(eID);
      ALTER TABLE egg
        ADD CONSTRAINT e_c
        FOREIGN KEY (cID)
        REFERENCES chicken(cID);

Foreign Key Constraints (p5)
- Chicken and Egg insertion
    INSERT INTO chicken VALUES(1, 1001);
    INSERT INTO egg VALUES(1001, 1);
- Still consistently fails
  - Need a way to postpone constraint checking
  - How long to postpone? Until transaction commit
- Solution
  - Define the constraints with deferred constraint-checking
      ALTER TABLE chicken
        ADD CONSTRAINT c_e
        FOREIGN KEY (eID)
        REFERENCES egg(eID)
        INITIALLY DEFERRED DEFERRABLE;
      ALTER TABLE egg
        ADD CONSTRAINT e_c
        FOREIGN KEY (cID)
        REFERENCES chicken(cID)
        INITIALLY DEFERRED DEFERRABLE;
  - And then
      INSERT INTO chicken VALUES(1, 1001);
      INSERT INTO egg VALUES(1001, 1);
      COMMIT;

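Deferred checking can be exercised with the following sketch. It uses sqlite3, which differs from the slides in two ways that should be read as assumptions of this example. SQLite has no `ALTER TABLE … ADD CONSTRAINT`, so the deferred foreign keys are declared inline. SQLite also accepts a `REFERENCES` clause that names a table not yet created, so the definition problem from the previous slide does not show up here.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT below
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE chicken (
        cID INT PRIMARY KEY,
        eID INT REFERENCES egg(eID) DEFERRABLE INITIALLY DEFERRED);
    CREATE TABLE egg (
        eID INT PRIMARY KEY,
        cID INT REFERENCES chicken(cID) DEFERRABLE INITIALLY DEFERRED);
""")

# Both rows arrive in one transaction; the check waits until COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO chicken VALUES (1, 1001)")  # egg 1001 does not exist yet
conn.execute("INSERT INTO egg VALUES (1001, 1)")
conn.execute("COMMIT")

# A transaction that never supplies the missing egg is rejected at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO chicken VALUES (2, 2002)")
try:
    conn.execute("COMMIT")
    bad_commit_rejected = False
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")  # the failed COMMIT leaves the transaction open
    bad_commit_rejected = True

chicken_rows = conn.execute("SELECT cID, eID FROM chicken").fetchall()
```
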
Attribute-Based Constraints
- NOT NULL
  - The most common
- Reasonability Constraints
  - Validate incoming data, e.g., Population Density <= 30000
  - Specification:
      Population INT(11) NOT NULL
        CHECK (Population <= 30000 * SurfaceArea),
- The condition in CHECK(cond) can be anything that a condition in WHERE(cond) can be
  - Including subqueries
- The attribute constraint is checked only when the attribute is assigned
  - Can be violated underneath as long as it is not re-evaluated
  - For example, if we update SurfaceArea, the violation won't be flagged
- Not implemented in all databases, e.g., MySQL (at the time of this lecture)

Tuple-Based Constraints
- Validate the entire tuple whenever anything in that tuple is updated
  - More integrity enforcement than with attribute-based constraints
  - e.g., Population Density <= 30000
  - Specification:
      Population INT(11) NOT NULL,
      CHECK (Population <= 30000 * SurfaceArea),
- The condition in CHECK(cond) can be anything that a condition in WHERE(cond) can be
  - Including subqueries
- The tuple constraint is checked whenever the tuple is inserted or updated
  - If we update SurfaceArea, the violation will be flagged
  - But a violation of
      CHECK (Population > (
        SELECT SUM(Population)
        FROM City WHERE City.CountryCode = Code))
    which specifies a subquery involving another table, will not be flagged when that other table changes
- Not implemented in all databases, e.g., MySQL (at the time of this lecture)

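A tuple-based CHECK can be observed with a minimal sqlite3 sketch. The three-column `Country` table and the sample figures are illustrative assumptions. SQLite re-evaluates a CHECK on every insert or update of the row, and it does not accept subqueries inside CHECK, so only the single-table case is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Country (
        Code        CHAR(3) PRIMARY KEY,
        SurfaceArea REAL NOT NULL,
        Population  INT  NOT NULL,
        CHECK (Population <= 30000 * SurfaceArea)  -- density of at most 30000
    )
""")
conn.execute("INSERT INTO Country VALUES ('FIN', 338145.0, 5171300)")


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Rejected on insert: 30001 people on 1 unit of area.
bad_insert = violates("INSERT INTO Country VALUES ('XYZ', 1.0, 30001)")
# Rejected on update of the *other* attribute: shrinking the area breaks the density limit.
bad_update = violates("UPDATE Country SET SurfaceArea = 100.0 WHERE Code = 'FIN'")
surface_area = conn.execute("SELECT SurfaceArea FROM Country WHERE Code = 'FIN'").fetchone()[0]
```
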
Assertions
- Validate the entire database whenever anything in the database is updated
  - Part of the database, not any specific table
- Specification: table-like
    CREATE ASSERTION CountryPop CHECK (
      NOT EXISTS
        (SELECT * FROM Country
         WHERE Population <
           (SELECT SUM(Population)
            FROM City WHERE City.CountryCode = Code)))
- Difficult to implement efficiently
  - Often not implemented
  - I don't know of any implementations
- Can be implemented for specific cases using Triggers, see Section 7.5

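One specific case of the CountryPop assertion can be approximated with a trigger. The sketch below uses sqlite3 and guards only inserts into City. The trigger name, the cut-down tables, and the sample numbers are assumptions, and updates to City or Country would need triggers of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Code CHAR(3) PRIMARY KEY, Population INT NOT NULL);
    CREATE TABLE City (Name TEXT, CountryCode CHAR(3), Population INT NOT NULL);

    -- Abort any insert that pushes the sum of city populations above the country's.
    CREATE TRIGGER CountryPopOnCityInsert
    AFTER INSERT ON City
    BEGIN
        SELECT RAISE(ABORT, 'CountryPop violated')
        WHERE (SELECT SUM(Population) FROM City WHERE CountryCode = NEW.CountryCode)
            > (SELECT Population FROM Country WHERE Code = NEW.CountryCode);
    END;

    INSERT INTO Country VALUES ('FIN', 5000000);
    INSERT INTO City VALUES ('Helsinki', 'FIN', 600000);
""")

try:
    conn.execute("INSERT INTO City VALUES ('Bigville', 'FIN', 9000000)")
    oversize_rejected = False
except sqlite3.DatabaseError:
    oversize_rejected = True

city_names = [row[0] for row in conn.execute("SELECT Name FROM City")]
```
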
Views
- Also called Virtual Views
  - Don't actually exist in the database but behave as if they do
- Can be subsets of the data or joins; actually, arbitrary queries
- Subset example
    CREATE VIEW ct AS
      SELECT c.Name AS nm, c.countrycode AS cntry
      FROM city c
      WHERE population > 0
- Join example
    CREATE VIEW CityLanguage AS
      SELECT city.name, city.countrycode, lang.language AS Language
      FROM city, countrylanguage AS lang
      WHERE city.countrycode = lang.countrycode
        AND lang.isOfficial = 'T';

Operations on Views (p1)
- SELECT
    SELECT * FROM CityLanguage WHERE Language='Dutch';
- Shouldn't 'temporarily' create the table and SELECT from it
- Should use the definition of CityLanguage to make a query, i.e.,
    SELECT *
    FROM
      (SELECT …blabla…
       FROM city, countrylanguage AS lang
       WHERE city.countrycode = lang.countrycode
         AND lang.isOfficial = 'T')
    WHERE Language='Dutch';

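The claim that a query on a view is equivalent to a query on its definition can be checked with a small sqlite3 sketch. The sample rows are made up for illustration. The select list elided in the slide is filled in here with the one from the CityLanguage definition on the Views slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT, countrycode CHAR(3));
    CREATE TABLE countrylanguage (countrycode CHAR(3), language TEXT, isOfficial CHAR(1));
    INSERT INTO city VALUES ('Amsterdam', 'NLD'), ('Helsinki', 'FIN');
    INSERT INTO countrylanguage VALUES
        ('NLD', 'Dutch', 'T'), ('FIN', 'Finnish', 'T'), ('FIN', 'Russian', 'F');

    CREATE VIEW CityLanguage AS
        SELECT city.name, city.countrycode, lang.language AS Language
        FROM city, countrylanguage AS lang
        WHERE city.countrycode = lang.countrycode
          AND lang.isOfficial = 'T';
""")

# Query against the view ...
via_view = conn.execute(
    "SELECT * FROM CityLanguage WHERE Language = 'Dutch'").fetchall()

# ... and the same query with the view name replaced by its definition.
expanded = conn.execute("""
    SELECT *
    FROM (SELECT city.name, city.countrycode, lang.language AS Language
          FROM city, countrylanguage AS lang
          WHERE city.countrycode = lang.countrycode
            AND lang.isOfficial = 'T')
    WHERE Language = 'Dutch'
""").fetchall()
```
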
Operations on Views (p2)
- UPDATE and INSERT are not always possible, with these exceptions:
  - Can sometimes be implemented using INSTEAD OF triggers
  - Modifications are permitted when the view is derived from a single table R and
    - The WHERE clause does not involve R in a subquery
    - The FROM clause consists of only one occurrence of R
    - The values of all attributes not specified in the view definition can be 'manufactured' by the database
- Example. For the view ct
    CREATE VIEW ct AS
      SELECT c.Name AS nm, c.countrycode AS cntry
      FROM city c
      WHERE population > 0
  the query
    INSERT INTO ct (nm, cntry) VALUES ('FirSPA', 'FIN')
  can be automatically rewritten as
    INSERT INTO CITY (Name, CountryCode) VALUES ('FirSPA', 'FIN')

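Where the database does not rewrite view inserts automatically, the INSTEAD OF trigger route can be sketched as follows. The example uses sqlite3, whose views are read-only without such a trigger. The trigger name `ct_insert` and the `DEFAULT 0` on Population (the value the database 'manufactures') are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (Name TEXT, CountryCode CHAR(3), Population INT DEFAULT 0);

    CREATE VIEW ct AS
        SELECT c.Name AS nm, c.CountryCode AS cntry
        FROM city c
        WHERE Population > 0;

    -- Redirect inserts on the view to the underlying table.
    CREATE TRIGGER ct_insert INSTEAD OF INSERT ON ct
    BEGIN
        INSERT INTO city (Name, CountryCode) VALUES (NEW.nm, NEW.cntry);
    END;
""")

conn.execute("INSERT INTO ct (nm, cntry) VALUES ('FirSPA', 'FIN')")

in_base_table = conn.execute("SELECT Name, CountryCode, Population FROM city").fetchall()
in_view = conn.execute("SELECT * FROM ct").fetchall()
```

With these assumptions the new row lands in `city` with the manufactured population 0, so it does not satisfy the view's own `Population > 0` filter and is not visible through `ct`.
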
Top-Down Datalog Recursion Revisited
- IDB's are conceptualized (and implemented) as Views
    for IDB predicate p(x, y, …)
      FOR EACH subgoal of p DO
        IF subgoal is IDB, recursive call;
        IF subgoal is EDB, look up

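The top-down scheme can be illustrated with a small Python sketch. The `parent` facts and the `ancestor` predicate are a hypothetical example, not taken from the slides, and the recursion only terminates because the sample data is acyclic.

```python
# EDB: stored facts parent(x, y), meaning "x is a parent of y" (made-up sample data)
PARENT = {("ann", "bob"), ("bob", "cat")}


def ancestor(x):
    """Top-down evaluation of the IDB predicate ancestor(x, y).

    Rules assumed for this sketch:
        ancestor(x, y) :- parent(x, y).
        ancestor(x, y) :- parent(x, z), ancestor(z, y).
    Returns the set of all y such that ancestor(x, y) holds.
    """
    result = set()
    for p, child in PARENT:           # EDB subgoal: look up stored facts
        if p == x:
            result.add(child)         # first rule
            result |= ancestor(child)  # IDB subgoal: recursive call (second rule)
    return result


ann_descendants = ancestor("ann")
cat_descendants = ancestor("cat")
```
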
Indexes
- Main Idea: Data Structures for Fast Search
- Motivation: Preventing the need for linear search through a big table
  - Example query:
      SELECT * FROM City WHERE CountryCode = 'FIN';
  - Another:
      SELECT * FROM City
      WHERE Population > (0.4 * (
        SELECT Population FROM Country
        WHERE CountryCode = Code));
  - Expected time without indexes: O(n) for the first example, O(n^2) for the second
- Declaration
    CREATE INDEX CityIndex ON City(CountryCode);
    CREATE INDEX CityPopIndex ON City(Population);
    CREATE INDEX CountryPopIndex ON Country(Population);

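Whether a query actually uses an index can be checked by asking the database for its plan. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` on an empty, cut-down City table. The helper `plan` is hypothetical, and the exact wording of the plan text varies between SQLite versions, so the example only looks for keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (Name TEXT, CountryCode CHAR(3), Population INT)")
query = "SELECT * FROM City WHERE CountryCode = 'FIN'"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


plan_before = plan(query)  # without an index: a full scan of City
conn.execute("CREATE INDEX CityIndex ON City(CountryCode)")
plan_after = plan(query)   # with the index: a search that names CityIndex
```
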
Selection of Indexes (p1)
- Why not create an index for every attribute?
- Useful indexes, and not so useful ones
  - Primary key?
  - Unique key?
  - From previous examples,
    - CityIndex?
    - CityPopIndex?
    - CountryPopIndex?

Selection of Indexes (p2)
- The Mantra:
  - Don't define indexes too early: know your workload first
  - Be as empirical as is practical
- The Greedy approach to index selection:
  - Start with no indexes
  - Evaluate candidate indexes, choose the one potentially most effective
  - Repeat
- Query execution will take advantage of defined indexes

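The greedy loop can be sketched in Python as below. Everything numeric here is an assumption: the workload frequencies, the per-query costs, and the fixed upkeep charge per index are invented for illustration. In practice the cost function would come from measurements or from the optimizer's estimates.

```python
def greedy_index_selection(candidates, workload_cost):
    """Repeatedly add the candidate index that lowers total workload cost the most."""
    chosen = []
    remaining = list(candidates)
    current = workload_cost(frozenset(chosen))  # start with no indexes
    while remaining:
        best = min(remaining, key=lambda ix: workload_cost(frozenset(chosen + [ix])))
        best_cost = workload_cost(frozenset(chosen + [best]))
        if best_cost >= current:  # no candidate helps any more: stop
            break
        chosen.append(best)
        remaining.remove(best)
        current = best_cost
    return chosen, current


# Hypothetical workload: (frequency, cost with a full scan, {index: cost if that index exists})
QUERIES = [
    (100, 4000, {"CityIndex": 40}),
    (5, 4000, {"CityPopIndex": 400}),
    (1, 240, {"CountryPopIndex": 200}),
]
MAINTENANCE = 300  # assumed upkeep cost charged once per index


def toy_cost(indexes):
    """Total cost of the workload given a set of index names."""
    total = MAINTENANCE * len(indexes)
    for freq, scan_cost, with_index in QUERIES:
        options = [scan_cost] + [c for ix, c in with_index.items() if ix in indexes]
        total += freq * min(options)
    return total


selected, final_cost = greedy_index_selection(
    ["CityIndex", "CityPopIndex", "CountryPopIndex"], toy_cost)
```

Under these made-up numbers the loop picks CityIndex first, then CityPopIndex, and stops before CountryPopIndex because its upkeep outweighs the saving.
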
CS 542 Database Management Systems
Report Proposals
J Singh
January 31, 2011

Next meeting: February 7
- Index Structures, Chapter 14
