Data as a Service (DaaS) is a paradigm in which data is delivered on demand for consumption by platform and third-party services. As Open Data gains mainstream adoption at every level of government around the world, public sector organizations are increasingly looking to participate in data ecosystems and drive adoption of their data as fuel for innovation. In this context, open data speeds up the economy by combining not only government open data but also heterogeneous, large and rapidly changing datasets from other public sources such as social networks, DBpedia (Wikipedia) and many more. The business idea is to design and implement an effective platform that collects, aggregates and interlinks data and exposes it through APIs, enabling the creation of new apps and services for customers. The main pillar of the whole construction is the ability to abstract both the data store and the data access patterns, combining a big data architecture with a traditional one depending on data type and volume. Large datasets may live on a Hadoop or HBase cluster, while real-time data may use Cassandra as its store of record and relax transactional guarantees, but all of them keep the same APIs.
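The idea of keeping the same APIs over different stores can be sketched as follows. This is a minimal illustration, not the platform's actual design: `DataStore`, `BatchStore`, `RealtimeStore` and `DataApi` are hypothetical names, and both backends are in-memory stand-ins for a Hadoop/HBase cluster and a Cassandra store.

```python
from typing import Dict, List, Optional, Protocol


class DataStore(Protocol):
    """Hypothetical storage contract that every backend must satisfy."""

    def put(self, key: str, record: dict) -> None: ...

    def get(self, key: str) -> Optional[dict]: ...


class BatchStore:
    """In-memory stand-in for a large-dataset backend (e.g. Hadoop/HBase)."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class RealtimeStore:
    """In-memory stand-in for a real-time backend (e.g. Cassandra).

    Writes are appended and the last write wins; no transactions.
    """

    def __init__(self) -> None:
        self._versions: Dict[str, List[dict]] = {}

    def put(self, key: str, record: dict) -> None:
        self._versions.setdefault(key, []).append(dict(record))

    def get(self, key: str) -> Optional[dict]:
        versions = self._versions.get(key)
        return versions[-1] if versions else None


class DataApi:
    """Single API surface; the caller never sees which store is used."""

    def __init__(self, stores: Dict[str, DataStore]) -> None:
        self._stores = stores

    def publish(self, kind: str, key: str, record: dict) -> None:
        self._stores[kind].put(key, record)

    def fetch(self, kind: str, key: str) -> Optional[dict]:
        return self._stores[kind].get(key)


# Illustrative usage with made-up keys and values.
api = DataApi({"batch": BatchStore(), "realtime": RealtimeStore()})
api.publish("batch", "dataset-a", {"rows": 3})
api.publish("realtime", "stream-b", {"value": 1})
api.publish("realtime", "stream-b", {"value": 2})
```

The routing key (`"batch"` or `"realtime"`) is the only place where data type and volume influence the choice of store; `publish` and `fetch` stay identical for every client.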
4. 5
Public sector organizations are
increasingly looking to
participate in data ecosystems
and drive adoption of their
data as fuel for innovation
Data is everywhere
5. 6
Open data is the idea that
certain data should be freely
available to everyone to use and
republish as they wish, without
restrictions from copyright, patents
or other mechanisms of control.
In this context, open data speeds up
the economy by combining not only
government open data but also
heterogeneous, large and rapidly
changing datasets from other public
sources such as social
networks, DBpedia (Wikipedia) and
many more.
Open Data as main source
6. 7
Under the UK presidency, the G8 Summit of 17-18 June 2013
adopted an Open Data Charter
Open Data is a global drive:
• To promote transparency, innovation and exchange
between people and countries
• To fuel better outcomes in public services such as
health, education, public safety, environmental
protection, governance, etc.
• To provide a catalyst for innovation in the private
sector, supporting the creation of new markets,
businesses, and jobs.
[2013-2015] time for planning and implementation
G8 Open Data Charter
https://www.gov.uk/government/publications/open-data-charter
7. 8
Where Open Data is
http://census.okfn.org/
https://nycopendata.socrata.com/
https://dati.lombardia.it
..and counting
9. 10
Multiple legal or regulatory restrictions on the use of the data.
Legal Restrictions, Privacy, Licenses
10. 11
Third parties offer public data as valuable services
APIs freely available under certain
usage quota
Data owner and APIs
Source: John Musser, ProgrammableWeb
11. 12
5★ Open Data
★
make PUBLIC stuff available on the Web (whatever
format, .jpeg .pdf) under an open license
★★
make it available as structured data (e.g., Excel
instead of image scan of a table)
★★★
use non-proprietary formats
(e.g., CSV instead of Excel)
★★★★
use URIs to denote things, so that people
can point at your stuff
★★★★★ link your data to other data to provide context
Tim Berners-Lee, the inventor of the Web and Linked Data
initiator, suggested a 5 star deployment scheme for Open Data.
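The scheme is cumulative: each star presupposes the previous ones. A minimal sketch of that rule, assuming a hypothetical `Dataset` record with one flag per criterion (the field names are illustrative, not part of the scheme):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_license: bool            # 1 star: on the Web under an open license
    structured: bool              # 2 stars: structured data
    non_proprietary_format: bool  # 3 stars: e.g. CSV instead of Excel
    uses_uris: bool               # 4 stars: URIs denote things
    linked_to_other_data: bool    # 5 stars: links to other data


def star_rating(ds: Dataset) -> int:
    """Count stars in order and stop at the first unmet criterion."""
    criteria = [
        ds.open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for satisfied in criteria:
        if not satisfied:
            break
        stars += 1
    return stars


# An openly licensed scanned PDF, an Excel sheet and a CSV file.
scanned_pdf = Dataset(True, False, False, False, False)
excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)
```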
12. 13
Recommended best practice for exposing, sharing, and connecting
pieces of data, information, and knowledge on the Semantic Web
using URIs, OWL and RDF.
Linked Open Data
1. Requires Ontologies to be applied to
data
2. Allows heterogeneous Nodes to be
traversed in a semantically coherent
fashion
http://live.dbpedia.org/page/Adoration_of_the_Magi_of_1475_(Botticelli)
http://live.dbpedia.org/page/Primavera_(painting)
http://live.dbpedia.org/page/Sandro_Botticelli
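Traversing linked nodes can be sketched with plain subject-predicate-object triples. The three URLs are the ones listed above; the predicate name `ex:author` and the helper functions are assumptions made for illustration and are not taken from the DBpedia ontology or from any RDF library.

```python
BASE = "http://live.dbpedia.org/page/"
BOTTICELLI = BASE + "Sandro_Botticelli"
ADORATION = BASE + "Adoration_of_the_Magi_of_1475_(Botticelli)"
PRIMAVERA = BASE + "Primavera_(painting)"

# Hypothetical predicate used only in this sketch.
AUTHOR = "ex:author"

# Each triple is (subject, predicate, object).
TRIPLES = {
    (ADORATION, AUTHOR, BOTTICELLI),
    (PRIMAVERA, AUTHOR, BOTTICELLI),
}


def objects(triples, subject, predicate):
    """All objects reachable from `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All subjects that point to `obj` through `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def related_works(triples, work):
    """Hop work -> author -> other works by the same author."""
    result = set()
    for author in objects(triples, work, AUTHOR):
        result |= subjects(triples, AUTHOR, author)
    result.discard(work)
    return result
```

Because every node is a URI, the same two-hop traversal works across datasets published by different parties, provided they agree on the predicate.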
15. 16
Data modeling
Identify Integrate Store Process Visualize
• Unstructured source: forums, blogs, social networks and web data from which discussions are extracted.
• Structured source: operational databases, CRM, SCM, ERP and other tools from which information is collected.
• Metadata ingestion: the selected information, enriched with metadata, can create relationships between authors, websites, forums, etc.
• Information acquisition: information is collected through several connectors without any structure or filtering mechanism.
• Data organization: metadata and information are stored in a distributed environment.
• KPI generation: the data is processed to produce a KPI summary.
• Organization: the data is categorized through KPI calculations.
• Data calculation: the data is modeled using calculation and statistical tools.
• Analysis of data: applying statistical models improves the quality of the information.
• Use: the data is aggregated to create and summarize the results of the analysis.
• Display: the results are displayed through a reporting environment.
16. 17
Data as a Service (DaaS)
Develop an easy-to-use platform that offers data set management
(collecting, aggregating and interlinking data accessible via APIs) in order to
enable the creation of new apps and services for customers
17. 18
Google indexes and searches information from the web; we
index, collect and expose data through APIs.
Business case
Business case example
19. 20
Data source aggregations
Mashup diverse data sets: after shaping a table to the form you want, easily join it with another to uncover the hidden relationships between them.
Integrate heterogeneous data sources: many open data sets can feed a complex big data system.
Develop connectors: connectors ingest the data into the system.
…and so forth
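Joining two shaped tables can be sketched as a hash join on a shared key. The tables, column names and values below are made up for illustration only; `inner_join` is a hypothetical helper, not a platform API.

```python
def inner_join(left, right, key):
    """Join two lists of dict rows on `key` (inner join)."""
    # Index the right-hand table by key so each lookup is O(1).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Merge every left row with each matching right row.
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


# Made-up example rows: an open data table and a private data table.
areas = [
    {"area_code": "A1", "area_name": "North"},
    {"area_code": "A2", "area_name": "South"},
]
counts = [
    {"area_code": "A1", "schools": 4},
    {"area_code": "A3", "schools": 9},
]

joined = inner_join(areas, counts, "area_code")
```

Only rows whose key appears in both tables survive, which is what makes the relationship between the two sources visible.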
20. 21
Data models and technologies
Document
• Document-oriented storage
• Full index support
• Replication & high availability
• Auto-sharding
• Querying
• GridFS
Linked Data
• Graph model for data representation
• Full ACID transactions
• Native storage engine
• Massively scalable
• Multiple graph query languages
MapReduce and HDFS
• Distributed file system
• JobTracker
• TaskTracker
• Log and file streams
• Real-time analysis
• Fast data access
• Sensors and IoT
Relational
• Transactional operations
• ACID based
• Entity-Relationship
• Legacy systems
• User administration
21. 22
Architecture overview
[Diagram labels]
DATA SOURCE: Open Data, Private Data, Public Data, CRM, ERP, DWH, RDF
CONNECTORS: Spring Data, JDBC
STORAGE
PROCESS and ANALYTICS
DATA and APIs PROVIDER: APIs
Users, API clients: APIs and DATA
23. 24
We can recognize a few trends in our scenario:
• Exponential growth in data volumes
• Rise of connectedness
• Increase in the degree of semi-structure
• Structure and schema emerge rather than being defined
upfront
Key facts:
• Volume: the size of the stored data
• Velocity: the rate at which data changes over time
• Variety: the degree to which data is regularly or irregularly
structured, dense or sparse, and importantly connected or
disconnected
Enriching Open Linked Data
24. 25
Graph theory was pioneered by Euler in the 18th century and has received
multidisciplinary contributions over the centuries.
A graph is an ordered pair G = (V, E) comprising a set V of vertices or
nodes together with a set E of edges or lines, which are 2-element
subsets of V.
Graph Theory
One trick is to search for “graph based approach to” and your problem.
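The definition G = (V, E) maps directly onto two sets. A minimal sketch with an arbitrary four-vertex example; the vertex names and helper functions are illustrative.

```python
# V is a set of vertices; E is a set of 2-element subsets of V.
V = {"a", "b", "c", "d"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "d"})}


def is_graph(vertices, edges):
    """Check that every edge is a 2-element subset of the vertex set."""
    return all(len(edge) == 2 and edge <= vertices for edge in edges)


def neighbours(vertex, edges):
    """Vertices that share an edge with `vertex`."""
    return {other for edge in edges if vertex in edge for other in edge if other != vertex}


def degree(vertex, edges):
    """Number of edges incident to `vertex`."""
    return sum(1 for edge in edges if vertex in edge)
```

Using `frozenset` for edges makes them unordered, which matches the "2-element subsets" wording: this models an undirected graph without self-loops.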
27. 28
• Facebook, Google and Twitter have centered their business models
around their own proprietary distributed graph technologies
Graph databases store information in ways that much more closely
resemble the way the world is organized and the way humans “think
about” data.
Top 10 Gartner IT technologies in 2013 “[..] are designed to support
new transaction, interaction and observation use cases involving web
scale, mobile, cloud and clustered environments”
Storing Data in Graphs
• Facebook The Association and Objects (TAO) Data Store
https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920
• Twitter FlockDB
https://github.com/twitter/flockdb
29. 30
Graph databases treat relationships as first-class abstractions of the data
model
A Graph –[:RECORDS_DATA_IN]–> Nodes –[:WHICH_HAVE]–>
Properties.
Nodes –[:LINKED_BY]–> Relationships
From Relational to Graph based Modeling
• It contains nodes and relationships
• Nodes contain properties (key-
value pairs)
• Relationships are named, directed
and always have a start and end
node
• Relationships can also contain
properties
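The four rules above can be sketched as a tiny in-memory property graph. This is an illustration of the data model only, not how any graph database stores data; all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    name: str    # relationships are named
    start: Node  # ...directed, with a start node
    end: Node    # ...and an end node
    properties: dict = field(default_factory=dict)  # may carry properties too


class PropertyGraph:
    """A graph contains nodes and relationships."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.relationships: List[Relationship] = []

    def add_node(self, **properties) -> Node:
        node = Node(node_id=len(self.nodes), properties=properties)
        self.nodes[node.node_id] = node
        return node

    def relate(self, start: Node, name: str, end: Node, **properties) -> Relationship:
        relationship = Relationship(name, start, end, properties)
        self.relationships.append(relationship)
        return relationship

    def outgoing(self, node: Node, name: Optional[str] = None) -> List[Node]:
        """End nodes of relationships that start at `node`."""
        return [
            r.end
            for r in self.relationships
            if r.start.node_id == node.node_id and (name is None or r.name == name)
        ]


# Illustrative usage with made-up property values.
graph = PropertyGraph()
user = graph.add_node(kind="User", name="alice")
document = graph.add_node(kind="Document", title="report")
owns = graph.relate(user, "OWNS", document, since=2013)
```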
30. 31
Shake an RDBMS while keeping all the relationships, and you’ll see a
graph
Where RDBMSs are optimized for aggregated data, graph databases
are optimized for highly connected data
From Relational to Graph based Modeling
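One way to picture the move from tables to a graph: every row becomes a node and every foreign key becomes a relationship. A minimal sketch with made-up tables; the table layout, the `owner_id` column and the `OWNS` relationship name are assumptions for illustration.

```python
# Made-up relational rows: documents reference users through owner_id.
users = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
documents = [
    {"id": 10, "title": "report", "owner_id": 1},
    {"id": 11, "title": "notes", "owner_id": 1},
]


def rows_to_graph(user_rows, document_rows):
    """Turn rows into nodes and foreign keys into directed edges."""
    nodes = {}
    edges = []
    for row in user_rows:
        nodes[("User", row["id"])] = {"name": row["name"]}
    for row in document_rows:
        nodes[("Document", row["id"])] = {"title": row["title"]}
        # The foreign key column disappears; it becomes a relationship.
        edges.append((("User", row["owner_id"]), "OWNS", ("Document", row["id"])))
    return nodes, edges


nodes, edges = rows_to_graph(users, documents)
```

After the conversion, finding everything a user owns is a walk along `OWNS` edges instead of a join on `owner_id`.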
31. 32
Graph-directed Infrastructure
[Diagram labels]
DATA STORAGE AND TRAVERSING
DATA ACCESS AND PROCESSING
DATA IMPORT, Batch Import, Neo4j
ENTERPRISE MANAGEMENT
VISUALIZATION
API Connector
API Provider
32. 33
Queries for domain entities can be derived from finder method
names like Iterable<T>
@Indexed fields are converted into index lookups in the start
clause; navigation along relationships is reflected in the match
clause; properties with operators end up as expressions in the
where clause
Spring Data Neo4j
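The principle of deriving a query from a method name can be sketched as below. This is a toy illustration in Python, not Spring Data Neo4j's implementation: the `derive_query` function, the supported operator suffixes, the example method name and the generated query shape (a plain `MATCH ... WHERE` rather than a start clause with index lookups) are all assumptions of this sketch.

```python
import re

# Operator suffixes understood by this sketch (a small, assumed subset).
OPERATORS = {"GreaterThan": ">", "LessThan": "<"}


def derive_query(method_name: str, label: str) -> str:
    """Build a Cypher-like query string from a finder method name."""
    prefix = "findBy"
    if not method_name.startswith(prefix) or len(method_name) == len(prefix):
        raise ValueError("expected a method name of the form findBy<Property>...")

    # Split on 'And' only when it starts a new capitalized property name.
    parts = re.split(r"And(?=[A-Z])", method_name[len(prefix):])

    conditions = []
    for index, part in enumerate(parts):
        if not part:
            raise ValueError("empty property name in method name")
        operator = "="
        for suffix, symbol in OPERATORS.items():
            if part.endswith(suffix) and len(part) > len(suffix):
                part = part[: -len(suffix)]
                operator = symbol
                break
        # Property names start with a lowercase letter by convention.
        prop = part[0].lower() + part[1:]
        conditions.append(f"n.{prop} {operator} $p{index}")

    return f"MATCH (n:{label}) WHERE {' AND '.join(conditions)} RETURN n"


# Hypothetical finder name and label, used only for illustration.
query = derive_query("findByNameAndAgeGreaterThan", "Person")
```

Each property in the method name becomes one expression in the where clause, and the method's arguments would be bound to the positional parameters `$p0`, `$p1` in order.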
37. 38
Open Linked Graph
[Diagram: a graph of User, Document, Node, Venue and DBPedia URI elements connected by [:OWNS], [:INCLUDES], [:LOCATED] and [:DBP_LINKED] relationships, exposed through an Open API]