REVISION TACTICS
• Watch the videos linked from emails and Moodle
• Take notes
• Follow these slides
• Visit the web resources
• Learn the keywords and concepts
• Learn SQL
• Use Cmap tools to link concepts/terms
• Revisit your patchwork
THE BUILDING BLOCKS
TERMS AND CONCEPTS YOU SHOULD KNOW BY NOW…
• XML
• Graph
• Relational Database
• NoSQL
• ODBC
• MySQL
• SQL
• Linked Data
• RDF
• Trigger
• Database Index
RELATIONAL DATABASES
‘FORMALLY DESCRIBED TABLES’
• This module focused on MySQL: an open-source
implementation of a relational database
• Other implementations: Oracle, PostgreSQL, SQLite
• Most patchworks should be done in MySQL
(triggers, indexes)
• ODBC component
• Looked at alternatives: NoSQL (Not Only
SQL), graph databases, triplestores
RELATIONAL DATABASES: SQL
(STRUCTURED QUERY LANGUAGE)
• Language to manage data in relational
database management systems
• There should be examples in your patchwork
CREATE TABLE example_autoincrement (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    data VARCHAR(100)
);
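The table above can be exercised end to end with a short script. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a MySQL server (an assumption made only so the example runs without any setup). SQLite spells the auto-increment column slightly differently from MySQL, as noted in the comments.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for a MySQL server).
conn = sqlite3.connect(":memory:")

# SQLite writes MySQL's AUTO_INCREMENT as AUTOINCREMENT on an
# INTEGER PRIMARY KEY column.
conn.execute(
    "CREATE TABLE example_autoincrement ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " data VARCHAR(100))"
)

# INSERT without an id: the database assigns 1, 2, ... automatically.
conn.executemany(
    "INSERT INTO example_autoincrement (data) VALUES (?)",
    [("first",), ("second",)],
)
conn.commit()

rows = conn.execute(
    "SELECT id, data FROM example_autoincrement ORDER BY id"
).fetchall()
print(rows)
```

The ids are never supplied by the application; the database generates them in insertion order.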
TRANSACTIONS
‘UNIT OF WORK PERFORMED WITHIN A DATABASE MANAGEMENT SYSTEM’
• A transaction is a unit of work
• Transactions are treated independently of each other
TRANSACTIONS
• In a relational database each transaction must have
ACID properties
• Ideas date from the late 1970s; the acronym was coined in 1983
• Key idea in relational databases
• Atomicity
• Consistency
• Isolation
• Durability
• A transaction needs to meet all 4 goals to be reliable
TRANSACTIONS
• Atomicity
• All or nothing
• Example: either both pay for and reserve a seat, or neither
pay for nor reserve a seat
• Consistency
• Only ever writes valid data (all rules and constraints hold)
• Isolation
• Concurrent transactions will not interfere with each other
• Durability
• Once a transaction is committed its changes are permanent,
even in the event of a power loss
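The seat-booking example can be sketched in code to show atomicity in action. This is a minimal illustration, assuming Python's built-in sqlite3 module rather than MySQL; the table names `payments` and `seats` and the helper `book` are hypothetical. The `with conn:` block commits if both statements succeed and rolls back if either fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customer TEXT NOT NULL, amount INTEGER NOT NULL)")
conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute("INSERT INTO seats VALUES ('12A', 'alice')")  # seat 12A is already taken
conn.commit()

def book(customer, seat, amount):
    """Pay for and reserve a seat as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO payments VALUES (?, ?)", (customer, amount))
            conn.execute("INSERT INTO seats VALUES (?, ?)", (seat, customer))
        return True
    except sqlite3.IntegrityError:
        return False

ok_taken = book("bob", "12A", 50)  # seat clash: the payment is rolled back too
ok_free = book("bob", "12B", 50)   # both statements succeed and are committed
payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(ok_taken, ok_free, payment_count)
```

Only one payment is recorded: the failed booking left no trace, which is the "all or nothing" behaviour described above.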
TRANSACTIONS
• Sometimes we can’t use ACID
• CAP theorem
• Conjectured by Eric Brewer in 2000 (proved by Gilbert and Lynch in 2002)
It is only possible to guarantee 2 of the following 3 properties
at the same time in a distributed computer system
• Consistency
• Availability
• Partition Tolerance
TRANSACTIONS
• Consistency
All the nodes in the distributed system see the same
data at the same time
• Availability
A guarantee that every request gets a response
(whether it succeeded or failed)
• Partition tolerance
The system continues to operate even if messages between
nodes are lost or part of the network fails
TRANSACTIONS
• So what do large companies/distributed computer
systems do?
• Use alternatives to ACID
• The most popular alternative to ACID is BASE
• Basically Available
• Soft State
• Eventual Consistency
For when it’s OK to use stale data, and it’s OK to give
approximate answers.
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
TRANSACTIONS
• Basically Available
• Availability is achieved through multiple data stores rather
than one fault-tolerant system
• Soft state
• Stored state may change over time without new input; consistency
is abandoned, or at least is the problem of the application and
not the database
• Eventual Consistency
• At some point in the future data will converge so that data
on all nodes is in a consistent state
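Eventual consistency can be illustrated with a toy simulation. This is only a sketch under stated assumptions: three in-memory dictionaries play the role of replicas, each value carries a version number, and the hypothetical `sync` function stands in for whatever background replication a real system uses. It is not how any particular database implements it.

```python
# Each replica maps key -> (version, value); the higher version wins on merge.
replicas = [{}, {}, {}]

def write(replica, key, version, value):
    """Accept a write on a single replica without contacting the others."""
    replica[key] = (version, value)

def read(replica, key):
    """Return the value this replica currently holds (it may be stale)."""
    entry = replica.get(key)
    return entry[1] if entry else None

def sync(all_replicas):
    """Replication pass: every replica adopts the newest version of each key."""
    newest = {}
    for replica in all_replicas:
        for key, entry in replica.items():
            if key not in newest or entry[0] > newest[key][0]:
                newest[key] = entry
    for replica in all_replicas:
        replica.update(newest)

write(replicas[0], "stock", 1, 10)
sync(replicas)
write(replicas[1], "stock", 2, 9)   # only replica 1 has seen this update
stale = read(replicas[0], "stock")  # replica 0 still answers with the old value
sync(replicas)
converged = [read(r, "stock") for r in replicas]  # now every replica agrees
print(stale, converged)
```

Between the write and the next sync, a reader can get stale data; after the sync, all nodes converge on the same value.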
RELATIONAL DATABASES: ODBC
OPEN DATABASE CONNECTIVITY
• Standard database access method
• Developed by the SQL Access Group
• Independent of database system
http://shivasoft.in/blog/microsoft/csharp/what-is-odbc-and-oledb-interview-question/
RELATIONAL DATABASES: TRIGGERS
• A SQL statement or set of statements that fires automatically
when an event occurs (for example INSERT, UPDATE, or DELETE)
-- Template: choose one timing and one event
CREATE TRIGGER `trigger_name`
    {BEFORE | AFTER} {INSERT | UPDATE | DELETE}
    ON `database`.`table`
    FOR EACH ROW
BEGIN
    -- trigger body
    -- this code runs once for every
    -- inserted/updated/deleted row
END;
http://www.sitepoint.com/how-to-create-mysql-triggers/
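A concrete trigger makes the template easier to follow. The sketch below assumes Python's built-in sqlite3 module in place of MySQL (SQLite's trigger syntax is close to, but not identical with, MySQL's); the `accounts` and `audit_log` tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Fires once for every row changed by an UPDATE of accounts.balance.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES (1, 100);
    UPDATE accounts SET balance = 80 WHERE id = 1;
    """
)

# The application never wrote to audit_log directly; the trigger did.
audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit_rows)
```

`OLD` and `NEW` refer to the row before and after the update, which is what lets the trigger record both balances.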
DATABASE INDEX
• Improves the speed of data retrieval operations
• Avoids searching through each row one by one (a full table scan)
• Created on one or more columns
• Most common types
• B-tree (the default for MySQL's InnoDB and MyISAM storage engines)
• Hash
Recommended reading: http://20bits.com/article/interview-questions-database-indexes
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
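The effect of an index can be observed by asking the database how it plans to run a query. This sketch assumes Python's built-in sqlite3 module and SQLite's `EXPLAIN QUERY PLAN` statement (MySQL offers `EXPLAIN` for the same purpose); the `people` table, the index name, and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [(f"person{i}", i % 90) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT name FROM people WHERE age = 12"
plan_before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_people_age ON people (age)")
plan_after = plan(query)   # the planner can now search the index instead

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a scan of the table and the second names the new index.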
B-TREE INDEXING
• B-Tree
• Stores keys in sorted order, so range comparisons are fast
• Example: we want people younger than 13, so follow the left
branches (smaller keys)
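The "look left" idea can be sketched with a sorted list. This is a simplification: a sorted Python list stands in for the ordered keys of a B-tree (an assumption for illustration, not a real tree), and the names and ages are made up.

```python
from bisect import bisect_left

# (age, name) pairs kept in key order, as the keys of a B-tree would be.
# Assumption: a sorted list stands in for the tree itself.
index = sorted([(34, "Ann"), (9, "Ben"), (12, "Cat"), (13, "Dev"), (51, "Eve"), (7, "Fay")])
ages = [age for age, _ in index]

def younger_than(limit):
    """Binary-search for the first key >= limit; everything to its left matches."""
    cut = bisect_left(ages, limit)
    return [name for _, name in index[:cut]]

under_13 = younger_than(13)
print(under_13)
```

Because the keys are ordered, one binary search finds the boundary and every match lies on one side of it; no other rows need to be examined.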
INDEXES
• Hash Tables
• Speed up equality comparisons (= or <=>)
• Do not help range comparisons (> or <)
B-tree vs hash tables:
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
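Why a hash index helps equality but not ranges can be shown with a Python dictionary, which is itself a hash table. This is a minimal sketch with made-up data; the `age_index` name is hypothetical.

```python
from collections import defaultdict

people = [("Ann", 34), ("Ben", 9), ("Cat", 12), ("Dev", 13), ("Eli", 12)]

# Hash index on age: key -> list of matching names.
age_index = defaultdict(list)
for name, age in people:
    age_index[age].append(name)

# Equality (=) is a single hash lookup.
aged_12 = age_index.get(12, [])

# A range (<) gets no help from hashing: the keys are unordered, so every
# key in the index still has to be examined.
under_13 = sorted(n for age, names in age_index.items() if age < 13 for n in names)

print(aged_12, under_13)
```

The range query still works, but only by visiting every key, which is the same cost as having no index at all.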
WEB SERVICES
• A way to communicate between systems (machine
to machine interaction)
• Service Provider
• Service Requester
WEB SERVICES
• 3 types of nodes
• Registries (Service Broker)
• Providers
• Requesters
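The provider and requester roles can be sketched with Python's standard library. This is a deliberately small illustration, assuming a plain HTTP service that returns JSON on a local port; it has no registry, and the handler class, URL path, and `call_service` helper are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class GreetingHandler(BaseHTTPRequestHandler):
    """Service provider: answers every GET with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"message": "hello", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def call_service():
    """Start a provider on a free local port, call it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), GreetingHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/greeting"
        # Service requester: a direct connection, ignoring any proxy settings.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()

result = call_service()
print(result)
```

The requester only needs the URL and the message format; it does not need to know how the provider is implemented, which is the machine-to-machine interaction described above.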
XML
• eXtensible Markup Language
• Designed to store and transport data
• (whereas HTML was designed to display data)
http://www.w3schools.com/xml/xml_whatis.asp
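A short example shows XML being used to carry data rather than display it. This sketch uses Python's built-in xml.etree.ElementTree module; the element names and the sample records are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small XML document: the tags describe the data, not how to display it.
document = """
<people>
  <person id="1"><name>David</name><eats>cake</eats></person>
  <person id="2"><name>Maria</name><eats>chocolate bar</eats></person>
</people>
"""

root = ET.fromstring(document)

# Walk the tree and pull the data back out into a Python dictionary.
diets = {p.findtext("name"): p.findtext("eats") for p in root.findall("person")}
ids = [p.get("id") for p in root.findall("person")]

print(diets, ids)
```

Any program that understands the same tag names can read this document, which is what makes XML useful for transporting data between systems.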
WEB SERVICES: ADVANTAGES
• Work outside of private networks
• Interoperability
• Could serve as the content processing/logic tier in a three-tier
architecture
WEB SERVICES: DISADVANTAGES
• Availability?
• Based on a stateless protocol (HTTP): each request is independent,
so the application must manage any session state
• Security?
NOSQL
• Not Only SQL
• Databases that are not like relational database
management systems
• Not built around the idea of tables
• Not likely to use SQL
• Usually built around BASE-style principles (not ACID)
• Example: graph databases
TRIPLE STORE
• Similar to a graph database
• Built to store and retrieve triples (David eats
chocolate bars, Mars is a chocolate bar, etc.)
• Data is stored in a standardized way (such as
RDF/XML)
• Has a querying service (SPARQL)
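The idea of storing and retrieving triples can be sketched in a few lines. This is a toy in-memory store, not a real triplestore and not SPARQL; the `match` helper and the predicate names are hypothetical, and the facts are the ones used in these slides.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("David", "eats", "chocolate bar"),
    ("Mars", "is_a", "chocolate bar"),
    ("David", "eats", "cake"),
}

def match(subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# "What does David eat?" and "Which things are chocolate bars?"
what_david_eats = [o for _, _, o in match(subject="David", predicate="eats")]
chocolate_bars = [s for s, _, _ in match(predicate="is_a", obj="chocolate bar")]

print(what_david_eats, chocolate_bars)
```

A SPARQL query works on the same principle: it gives a pattern with some positions left as variables, and the store returns every triple that fits.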
LINKED DATA
• Method of publishing structured data
• Different datasets can be interlinked
• Built on the following technologies
• URIs
• HTTP
• Structured formats such as RDF/XML
• Sometimes this data is stored in triplestores
• Served by a website (using content negotiation)
• Like prod.cetis.ac.uk
• Could have a relational database behind it
• Example: DBpedia
LINKED DATA
• Linked Data is made up of triples!
• Subject, predicate, object
• David -> eats -> cake
• David (Subject) Eats (Predicate) Cake (Object)
DATA JOURNALISM
• Explosion of visual analytic tools
• Gephi
• Visualise a network/graph
• Visually identify complex patterns / markets