Revision

  1. ADVANCED DATABASES REVISION 1
  2. REVISION TACTICS • Watch the videos from emails and Moodle • Take notes • Follow these slides • Visit the web resources • Learn the keywords and concepts • Learn SQL • Use Cmap tools to link concepts/terms • Revisit your patchwork
  3. THE BUILDING BLOCKS TERMS AND CONCEPTS YOU SHOULD KNOW BY NOW… • XML • NoSQL • Graph • ODBC • Relational Database • MySQL • SQL • Linked Data • RDF • Trigger • Database Index
  4. REVISION TOOL: CMAP TOOLS • http://cmap.ihmc.us/download/ • Tool for creating concept maps
  5. TOPICS • Relational Databases (MySQL) • SQL • Triggers • Transactions • Webservices • XML • NoSQL / Alternative Database systems
  6. RELATIONAL DATABASES MYSQL
  7. RELATIONAL DATABASES ‘FORMALLY DESCRIBED TABLES’ • This module focused on MySQL: an open source implementation of a relational database • Other implementations: Oracle, PostgreSQL, SQLite • Most patchworks should be done in MySQL (triggers, indexes) • ODBC component • Looked at alternatives: NoSQL (Not Only SQL), graph database, triplestore
  8. RELATIONAL DATABASES: SQL (STRUCTURED QUERY LANGUAGE) • Language to manage data in relational database management systems • There should be examples in your patchwork:
       CREATE TABLE example_autoincrement (
         id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
         data VARCHAR(100)
       );
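The CREATE TABLE statement above can be tried without a MySQL server. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in; SQLite spells the auto-increment key differently from MySQL, and the inserted values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; stands in for a MySQL server in this sketch.
conn = sqlite3.connect(":memory:")

# SQLite's spelling of MySQL's "INT NOT NULL AUTO_INCREMENT PRIMARY KEY".
conn.execute(
    "CREATE TABLE example_autoincrement ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " data VARCHAR(100))"
)

# Insert rows without giving an id; the database assigns the ids itself.
conn.executemany(
    "INSERT INTO example_autoincrement (data) VALUES (?)",
    [("first",), ("second",)],
)
conn.commit()

rows = conn.execute(
    "SELECT id, data FROM example_autoincrement ORDER BY id"
).fetchall()
print(rows)
```

The same INSERT and SELECT statements should work unchanged in MySQL once the table is created with the MySQL syntax shown on the slide.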
  9. RELATIONAL DATABASES TRANSACTIONS
  10. TRANSACTIONS ‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’ • A transaction is a unit of work • Transactions are treated independently of each other
  11. TRANSACTIONS ‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’ • In a relational database each transaction must have ACID properties • Ideas date from the late 1970s (the acronym ACID was coined in 1983) • Key idea in relational databases • Atomicity • Consistency • Isolation • Durability • A transaction needs to reach these 4 goals to be reliable
  12. TRANSACTIONS ‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’ • Atomicity • All or nothing • Both pay for and reserve a seat, OR neither pay for nor reserve a seat • Consistency • Only ever writes valid data • Isolation • Transactions will not interfere with each other • Durability • Once a transaction is complete its changes will always remain, even in the event of a power loss
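The seat example for atomicity can be sketched as follows. This is an illustration only, assuming SQLite (via Python's sqlite3) instead of MySQL; the tables `seats` and `payments`, the helper names and the CHECK constraint are hypothetical and exist just to force a failure halfway through the unit of work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, passenger TEXT)")
conn.execute(
    "CREATE TABLE payments (passenger TEXT, amount INTEGER CHECK (amount > 0))"
)
conn.execute("INSERT INTO seats VALUES ('12A', NULL)")
conn.commit()


def book_seat(conn, seat, passenger, amount):
    """Reserve a seat and record the payment as one unit of work."""
    try:
        # "with conn" commits if the block succeeds and rolls back on an error.
        with conn:
            conn.execute(
                "UPDATE seats SET passenger = ? WHERE seat = ?", (passenger, seat)
            )
            conn.execute("INSERT INTO payments VALUES (?, ?)", (passenger, amount))
        return True
    except sqlite3.IntegrityError:
        return False


def seat_holder(conn, seat):
    """Return the passenger currently holding the seat, or None."""
    row = conn.execute(
        "SELECT passenger FROM seats WHERE seat = ?", (seat,)
    ).fetchone()
    return row[0]


# The payment of 0 violates the CHECK constraint, so the reservation made
# a moment earlier in the same transaction is rolled back as well.
failed = book_seat(conn, "12A", "David", 0)
holder_after_failure = seat_holder(conn, "12A")

succeeded = book_seat(conn, "12A", "David", 50)
holder_after_success = seat_holder(conn, "12A")
payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

After the failed attempt the seat is still free: neither the reservation nor the payment survived, which is the "all or nothing" behaviour.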
  13. TRANSACTIONS ‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’ • Sometimes we can’t use ACID • CAP THEOREM • Proposed by Eric Brewer in 2000: a distributed computer system can guarantee only 2 of the following 3 at the same time • Consistency • Availability • Partition Tolerance
  14. TRANSACTIONS ‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’ • Consistency: all the nodes in the distributed system see the same data at the same time • Availability: a guarantee that every request gets a response (even if that response reports a failure) • Partition tolerance: the system as a whole continues to operate even if messages between nodes are lost or part of the network fails
  15. TRANSACTIONS ‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’ • So what do large companies/distributed computer systems do? • Use alternatives to ACID • Most popular alternative to ACID is BASE • Basically Available • Soft State • Eventual Consistency • For when it’s OK to use stale data, and it’s OK to give approximate answers. http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  16. TRANSACTIONS ‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’ • Basically Available • Availability is achieved through multiple data stores rather than one fault tolerant system • Soft state • Consistency is abandoned, or at least is the problem of the application and not the database • Eventual Consistency • At some point in the future data will converge so that data on nodes is in a consistent state
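Eventual consistency can be illustrated with a toy model. The sketch below is not how any particular NoSQL product works: the `Replica` class, the integer version numbers and the `sync` function are assumptions made for illustration, using a simple "highest version wins" rule.

```python
class Replica:
    """A toy data store node holding key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep the entry with the higher version number.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


node1, node2 = Replica(), Replica()
node1.write("stock", 10, version=1)
sync(node1, node2)

node1.write("stock", 9, version=2)   # accepted by one node only
stale_read = node2.read("stock")     # node2 still answers, but with old data
sync(node1, node2)                   # background synchronisation runs later
fresh_read = node2.read("stock")     # both nodes now agree
```

Between the write and the sync, `node2` stays available but returns stale data (10); after the sync both nodes return 9. That gap is the "soft state" the application has to tolerate.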
  17. RELATIONAL DATABASES: ODBC OPEN DATABASE CONNECTIVITY
  18. RELATIONAL DATABASES: ODBC OPEN DATABASE CONNECTIVITY • Standard database access method • SQL Access Group • Independent of database system http://shivasoft.in/blog/microsoft/csharp/what-is-odbc-and-oledb-interview-question/
  19. RELATIONAL DATABASES: TRIGGERS • SQL statement or set of statements fired when an event occurs (for example INSERT, UPDATE or DELETE)
        CREATE TRIGGER `event_name`
        BEFORE/AFTER INSERT/UPDATE/DELETE
        ON `database`.`table`
        FOR EACH ROW BEGIN
          -- trigger body
          -- this code is applied to every
          -- inserted/updated/deleted row
        END;
      http://www.sitepoint.com/how-to-create-mysql-triggers/
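A concrete instance of the trigger template might look like the sketch below. It assumes SQLite (through Python's sqlite3) rather than MySQL, and the `accounts` and `audit_log` tables and the trigger name are hypothetical; the AFTER UPDATE ... FOR EACH ROW structure with OLD and NEW row values follows the same pattern as the MySQL template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (
        account_id INTEGER, old_balance INTEGER, new_balance INTEGER
    );

    -- Fired once for every row changed by an UPDATE on accounts.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE ON accounts
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES (1, 100);
    UPDATE accounts SET balance = 80 WHERE id = 1;
    """
)

# The application never wrote to audit_log; the trigger did.
audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit_rows)
```

The audit row appears even though the application only issued an UPDATE, which is the point of a trigger: the database itself reacts to the event.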
  20. DATABASE INDEX • Improves the speed of data retrieval operations • Avoids searching through each row one by one • Created on columns • Most common types • B-tree (the default for MySQL's InnoDB and MyISAM storage engines) • Hash • Really good: http://20bits.com/article/interview-questions-database-indexes http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
  21. B TREE INDEXING • B-Tree • Stores keys in sorted order • If we want people younger than 13, look to the left of 13
  22. INDEXES • Hash Tables • Speed up = or <=> comparisons • Do not help with > or < (range) comparisons • B-tree vs Hash Tables http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
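The difference between the two index types can be sketched in plain Python. This is an analogy rather than MySQL's implementation: a sorted list searched with `bisect` stands in for the ordered keys of a B-tree, a `dict` stands in for a hash index, and the `people` data is made up.

```python
from bisect import bisect_left

people = [("Ann", 9), ("Bob", 35), ("Cat", 12), ("Dan", 13), ("Eve", 7)]

# Ordered index on age: (age, name) pairs kept sorted, as a B-tree keeps its keys.
ordered_index = sorted((age, name) for name, age in people)

# Hash index on age: maps each exact age to the names with that age.
hash_index = {}
for name, age in people:
    hash_index.setdefault(age, []).append(name)


def younger_than(limit):
    """Range query: binary-search for the cut-off, then take everything left of it."""
    cut = bisect_left(ordered_index, (limit,))
    return [name for _, name in ordered_index[:cut]]


def exactly(age):
    """Equality query: a single hash lookup, no ordering needed."""
    return hash_index.get(age, [])


print(younger_than(13))
print(exactly(13))
```

The hash index answers `exactly(13)` directly, but it has no notion of "left of 13", so a range query against it would have to visit every key. The sorted structure handles both, which is one reason B-trees are the usual default.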
  23. WEBSERVICE
  24. WEBSERVICE • A way to communicate between systems (machine to machine interaction) • Service Provider • Service Requester
  25. WEB SERVICES • 3 types of nodes • Registries (Service Broker) • Providers • Requesters
  26. XML • XML: • EXtensible Markup Language • Designed to store and transport data • (whereas html was designed to display data) http://www.w3schools.com/xml/xml_whatis.asp
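To show XML storing and transporting data rather than displaying it, here is a small sketch using Python's standard xml.etree.ElementTree module. The element names (`students`, `student`, `name`, `module`) and the values are invented for illustration.

```python
import xml.etree.ElementTree as ET

document = """
<students>
  <student id="1"><name>David</name><module>Advanced Databases</module></student>
  <student id="2"><name>Sara</name><module>Web Services</module></student>
</students>
"""

# Parse the text into a tree and pull the data back out of it.
root = ET.fromstring(document)
records = [
    (s.get("id"), s.findtext("name"), s.findtext("module"))
    for s in root.findall("student")
]

# Add a record and serialise the tree again, ready to send to another system.
new_student = ET.SubElement(root, "student", id="3")
ET.SubElement(new_student, "name").text = "Lee"
serialised = ET.tostring(root, encoding="unicode")

print(records)
```

Nothing in the document says how the data should look on screen; the tags only describe what the data is, so any system that understands the tag names can consume it.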
  27. WEB SERVICES ADVANTAGES • Advantages • Work outside of private networks • Interoperability • Could be the content processing/logic module in Three-tier architecture?
  28. WEB SERVICES DISADVANTAGES • Availability? • Based on a stateless (unreliable?) protocol: HTTP • Security?
  29. NOSQL
  30. NOSQL • Not Only SQL • Databases that are not like relational database management systems • Not built around the idea of tables • Not likely to use SQL • Usually built around BASE style principles (not ACID) • Examples : Graph Databases
  31. GRAPH DATABASE • Every Element has a pointer to another element
  32. TRIPLE STORE • Similar to a graph database • Built to store and retrieve triples (David eats chocolate bars, Mars is a chocolate bar, etc.) • Data is stored in a standardized way (such as RDF/XML) • Has a querying service (SPARQL)
  33. LINKED DATA • Method of publishing structured data • Different datasets can be interlinked • Built on the following technologies • URIs • HTTP • Structured formats such as RDF/XML • Sometimes this data is stored in triplestores • Served by a website (content negotiation) • Like prod.cetis.ac.uk • Could have a relational database behind it • Example: DBpedia
  34. LINKED DATA • Linked Data is made up of triples! • Subject, predicate, object • David -> eats -> cake • David (subject) eats (predicate) cake (object)
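The triple idea can be sketched as a tiny in-memory store. This is only an illustration: a real triplestore would hold RDF and be queried with SPARQL, whereas here the triples are plain Python tuples and the helper functions `match` and `eats_a` are hypothetical.

```python
# Each triple is (subject, predicate, object).
triples = {
    ("David", "eats", "Mars"),
    ("David", "eats", "cake"),
    ("Mars", "is_a", "chocolate bar"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def eats_a(kind):
    """Who eats something that is a `kind`? Joins two patterns on the shared item."""
    items = {s for s, _, _ in match(predicate="is_a", obj=kind)}
    return sorted({s for s, _, o in match(predicate="eats") if o in items})


print(match(subject="David"))
print(eats_a("chocolate bar"))
```

The second query never mentions "Mars" directly; it follows the link from "David eats Mars" to "Mars is a chocolate bar", which is the kind of question that linking datasets makes possible.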
  35. DATA JOURNALISM • Explosion of visual analytic tools • Gephi • Visualise a network/graph • Visually Identify complex patterns / markets