Invited Lecture on NoSQL databases and modern web-development frameworks.
JavaScript + JSON = easy parsing, less verbose code
NodeJS = asynchronous everything. Needs precise flow control
ElasticSearch = Scalable indexing, easy to use JSON API
GridFS = Transparent scaling for huge numbers of large files; querying using JSON-based API
Graph Databases = Model certain problems better than their relational counterparts. Simpler queries using SPARQL. Less mature than RDBMSs. No transactions.
Socket.io = Real-time library for client-server-client push communication
2. Contents
• Modeling limits of relational databases
• Entities with variable attributes
• Time-variant values
• Inheritance
• Hierarchies (parents of parents of parents…)
3. Contents (cont’d)
• Modeling problems in a graph
• Ontologies and SPARQL
• OpenLink Virtuoso
• Scalable file storage: GridFS within MongoDB
• Scalable document indexing: ElasticSearch
4. Contents (cont’d)
• NodeJS and asynchronous flow control
• AngularJS for dynamic web interfaces
• BONUS: Socket.io sneak peek
5. Relational databases
• Good when you know everything about the
problem at the time of modeling
• A column can only be of a single type (VARCHAR,
INT, etc.)
• Hard to document
• Model can become too attached to the code
6. Relational databases
• Handling historical values = complex SQL
• Hierarchies = Foreign Key loops
• Variable attributes, inheritance = either many NULL columns
plus chains of if checks (“if hell”), or many JOINs
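The "NULL columns plus if checks" problem can be sketched in a few lines of JavaScript. This is only an illustration: the `rows` and `entities` data, the column names, and both helper functions are hypothetical and do not come from the lecture.

```javascript
// Hypothetical wide-table rows: every column exists for every entity,
// so attributes that do not apply must be stored as null.
const rows = [
  { id: 1, kind: "book", title: "SPARQL Basics", pages: 120, voltage: null },
  { id: 2, kind: "lamp", title: "Desk Lamp", pages: null, voltage: 230 },
];

// Reading a row needs one branch per entity kind (the "if hell").
function describeRow(row) {
  if (row.kind === "book") {
    return `${row.title}: ${row.pages} pages`;
  } else if (row.kind === "lamp") {
    return `${row.title}: ${row.voltage} V`;
  }
  return row.title;
}

// Schema-free alternative: each entity lists only the attributes it has.
const entities = [
  { id: 1, title: "SPARQL Basics", attributes: { pages: 120 } },
  { id: 2, title: "Desk Lamp", attributes: { voltage: 230 } },
];

// No per-kind branching: iterate over whatever attributes are present.
function describeEntity(entity) {
  const parts = Object.entries(entity.attributes).map(([k, v]) => `${k}=${v}`);
  return `${entity.title}: ${parts.join(", ")}`;
}
```

Adding a new kind of entity to the first form means a new column and a new branch; in the second form it only means a new object.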
13. Graph databases
• Represent entities (Users, Products, Places…) as
vertices (entity types are called classes)
• Connections between them are directed graph
edges (edge types are called properties)
• The meaning of these connections is expressed in
ontologies that can be shared and reused
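A minimal in-memory sketch of this vertex-and-edge model, assuming each edge is stored as a (subject, property, object) triple. The `ex:` identifiers and the `outgoing` and `incoming` helpers are made up for illustration; a real graph store would index and persist these.

```javascript
// Each edge is a (subject, property, object) triple; all names are made up.
const edges = [
  { subject: "ex:alice", property: "rdf:type", object: "ex:User" },
  { subject: "ex:porto", property: "rdf:type", object: "ex:Place" },
  { subject: "ex:alice", property: "ex:livesIn", object: "ex:porto" },
  { subject: "ex:alice", property: "ex:bought", object: "ex:laptop" },
  { subject: "ex:laptop", property: "rdf:type", object: "ex:Product" },
];

// Follow outgoing edges from one vertex, optionally restricted to a property.
function outgoing(vertex, property) {
  return edges
    .filter((e) => e.subject === vertex &&
      (property === undefined || e.property === property))
    .map((e) => e.object);
}

// Edges are directed, so incoming edges are a separate lookup.
function incoming(vertex) {
  return edges.filter((e) => e.object === vertex).map((e) => e.subject);
}
```

Note that the class of a vertex is itself just another edge (`rdf:type`), which is what lets entities of different classes live in the same structure.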
16. Getting all the students
SELECT ?uri ?attribute ?value
FROM <http://myorganization.com/data>
WHERE
{
?uri rdf:type up:Student .
?uri ?attribute ?value .
}
• Will fetch all the students, regardless of their subtype (provided each one is
typed as up:Student, explicitly or through subclass inference)
• Will also return their attributes (“database columns”)
• Different types of students will have different attributes
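What the query's two triple patterns do can be imitated over a small in-memory array. This is a rough sketch, not how a SPARQL engine is implemented: the sample triples, the `up:` property names, and the `allStudents` function are assumptions for illustration, and no inference is performed.

```javascript
// Hypothetical data as [subject, predicate, object] triples.
const triples = [
  ["ex:ana", "rdf:type", "up:Student"],
  ["ex:ana", "up:name", "Ana"],
  ["ex:ana", "up:thesisTitle", "Graph stores"],
  ["ex:rui", "rdf:type", "up:Student"],
  ["ex:rui", "up:name", "Rui"],
  ["ex:rui", "up:exchangeCountry", "Spain"],
  ["ex:eva", "rdf:type", "up:Teacher"],
  ["ex:eva", "up:name", "Eva"],
];

function allStudents(data) {
  // Pattern 1: ?uri rdf:type up:Student
  const students = new Set(
    data
      .filter(([, p, o]) => p === "rdf:type" && o === "up:Student")
      .map(([s]) => s)
  );
  // Pattern 2: ?uri ?attribute ?value, joined on the same ?uri
  return data
    .filter(([s]) => students.has(s))
    .map(([uri, attribute, value]) => ({ uri, attribute, value }));
}
```

Each student contributes one result row per attribute, including the `rdf:type` triple itself, and the two students in the sample data come back with different attribute sets without any schema change.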
18. Nothing comes for free
• Aggregation operators slow
• Transactions are not supported in standard
SPARQL
• (“SPARQL 1.1 Query/Update Services should be atomic but that they are
not required to be atomic.”)
• Graph DBMS Solutions are in early stages (many
bugs, many “beta”s, many mailing lists…)
24. Conclusions
• JavaScript + JSON = easy parsing, less verbose code
• NodeJS = asynchronous everything. Needs precise flow control
• ElasticSearch = Scalable indexing, easy to use JSON API
• GridFS = Transparent scaling for huge numbers of large files;
querying using JSON-based API
• Graph Databases = Model certain problems better than their
relational counterparts. Simpler queries using SPARQL. Less
mature than RDBMSs. No transactions.
• Socket.io = Real-time library for client-server-client push
communication
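As a reminder of what "precise flow control" means in NodeJS, here is one possible sketch of running callback-style steps strictly one after another. The `series` helper and the two simulated steps are hypothetical and written for this note; they are not taken from the lecture or from a specific library.

```javascript
// Run callback-style tasks one after another; stop at the first error.
// Each task receives a Node-style callback: cb(err, value).
function series(tasks, done) {
  const results = [];
  function next(index) {
    if (index === tasks.length) return done(null, results);
    tasks[index]((err, value) => {
      if (err) return done(err, results);
      results.push(value);
      next(index + 1);
    });
  }
  next(0);
}

// Hypothetical asynchronous steps, simulated with setImmediate.
const loadUser = (cb) => setImmediate(() => cb(null, "user"));
const loadFiles = (cb) => setImmediate(() => cb(null, ["a.txt", "b.txt"]));

// Usage: loadFiles only starts after loadUser has called back.
// series([loadUser, loadFiles], (err, results) => { /* use results */ });
```

Without a helper of this kind (or promises), each dependent step has to be nested inside the previous step's callback.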
25. João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of
Engineering of the University of Porto. He specializes in research data management,
applying the latest Semantic Web Technologies to the adequate preservation and
discovery of research data assets.
He is experienced in many programming languages (JavaScript with Node, PHP with MVC
frameworks, Ruby on Rails, J2EE, etc.) running on the major operating systems
(everyday Mac user). Regardless of language, he is a quick learner who can adapt to any
new technology quickly and effectively.
He is also an experienced freelance iOS developer with several apps published on the
App Store, and a self-taught DIY mechanic with a special interest in classic cars,
particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.
Research Data Management and Semantic
Web Researcher, Web & iPhone Developer
João Rocha da Silva
joaorosilva@gmail.com