Online available data services: a primer
Silvano Galasso
Michele Piunti
Agenda
• Where is the data?
• The way to use Data
• APP-ify the Data
• Technological perspective
• Prototype example
Where is the Data?
Data is everywhere
Public sector organizations are increasingly looking to participate in data ecosystems and drive adoption of their data as fuel for innovation
Open Data as main source
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
In this context, open data can speed up the economy by combining not only government open data but also heterogeneous, large and rapidly changing datasets from other public sources such as social networks, DBpedia (Wikipedia) and many more.
G8 Open Data Charter
Under the UK presidency, during the recent G8 Summit (17-18 June 2013), an Open Data Charter was ratified
Open Data is the global drive:
• To enforce transparency, innovation, and exchange between people and countries
• To fuel better outcomes in public services such as health, education, public safety, environmental protection, governance, etc.
• To provide a catalyst for innovation in the private sector, supporting the creation of new markets, businesses, and jobs.
[2013-2015] time for planning and implementation
https://www.gov.uk/government/publications/open-data-charter
Where Open Data is
http://census.okfn.org/
https://nycopendata.socrata.com/
https://dati.lombardia.it
..and counting
The way to use Data
Legal Restrictions, Privacy, Licenses
Multiple legal or regulatory restrictions on the use of the data.
Data owner and APIs
Third parties offer public data as valuable services
APIs freely available within certain usage quotas
Source: John Musser, ProgrammableWeb
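A minimal sketch of what "freely available within a usage quota" can mean for an API client. The class names, the `daily_quota` parameter and the `fake_fetch` stand-in are all hypothetical; real providers enforce quotas on the server side and each one documents its own limits.

```python
class QuotaExceeded(Exception):
    """Raised when the (hypothetical) call allowance is used up."""


class QuotaLimitedClient:
    """Wraps a fetch function and counts calls against a fixed allowance."""

    def __init__(self, fetch, daily_quota):
        self._fetch = fetch            # callable that performs the real request
        self.remaining = daily_quota   # calls left in the current period

    def get(self, resource):
        if self.remaining <= 0:
            raise QuotaExceeded(f"quota exhausted before fetching {resource!r}")
        self.remaining -= 1
        return self._fetch(resource)


def fake_fetch(resource):
    # Stand-in for an HTTP call to a data provider (assumption).
    return {"resource": resource, "rows": []}


client = QuotaLimitedClient(fake_fetch, daily_quota=2)
client.get("/datasets/air-quality")
client.get("/datasets/transport")
print(client.remaining)
```

A third call on this client would raise QuotaExceeded, which is the point at which a paid plan or a wait until the next period usually comes in.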
5★ Open Data
Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5-star deployment scheme for Open Data.
★ make PUBLIC stuff available on the Web (whatever format, .jpeg .pdf) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
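The 5-star scheme is cumulative: each star assumes the previous ones. A small sketch of that rule follows; the `Dataset` fields are hypothetical names chosen for illustration and are not part of the scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_license: bool      # on the Web under an open license
    structured: bool        # machine-readable structure (e.g. a spreadsheet)
    non_proprietary: bool   # open format such as CSV
    uses_uris: bool         # things are identified by URIs
    linked: bool            # links out to other data


def star_rating(ds: Dataset) -> int:
    """Count stars in order and stop at the first level that is not met."""
    levels = [ds.open_license, ds.structured, ds.non_proprietary,
              ds.uses_uris, ds.linked]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


scanned_pdf = Dataset(True, False, False, False, False)
csv_export = Dataset(True, True, True, False, False)
print(star_rating(scanned_pdf), star_rating(csv_export))
```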
Linked Open Data
Recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs, OWL, and RDF.
1. Requires ontologies to be applied to data
2. Allows heterogeneous nodes to be traversed in a semantically coherent fashion
http://live.dbpedia.org/page/Adoration_of_the_Magi_of_1475_(Botticelli)
http://live.dbpedia.org/page/Primavera_(painting)
http://live.dbpedia.org/page/Sandro_Botticelli
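A rough sketch of how linked data can be traversed, using the three DBpedia pages listed above as identifiers. The triples are written by hand and the predicate names ("author", "type") are assumptions for illustration; they have not been checked against the actual DBpedia vocabulary.

```python
BASE = "http://live.dbpedia.org/page/"
PRIMAVERA = BASE + "Primavera_(painting)"
ADORATION = BASE + "Adoration_of_the_Magi_of_1475_(Botticelli)"
BOTTICELLI = BASE + "Sandro_Botticelli"

# (subject, predicate, object) triples; predicate names are illustrative only.
TRIPLES = [
    (PRIMAVERA, "author", BOTTICELLI),
    (ADORATION, "author", BOTTICELLI),
    (BOTTICELLI, "type", "Person"),
]


def objects(subject, predicate):
    """Follow a predicate forward from a subject."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Follow a predicate backward from an object."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


# Traverse: painting -> its author -> all works by that author.
painter = objects(PRIMAVERA, "author")[0]
works = subjects("author", painter)
print(painter)
print(works)
```

Because every thing is named by a URI, the second hop can land on data published by someone else, which is what the fifth star asks for.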
Open government data: Municipal, Regional, National
Community data: Geographic, Media, Scientific, Encyclopedic
Data from third parties: Facebook, Twitter, LinkedIn, Google
Data could be linked
Linked Open Data
Could be linked under certain conditions
APP-ify the Data
Data modeling
Identify, Integrate, Store, Process, Visualize
• Unstructured source: Forum, Blog, Social Network, Web Data from which to extract the discussions.
• Structured source: Operational database, CRM, SCM, ERP and other tools from which information is collected.
• Metadata ingestion: The selected information, enriched with metadata, can create relationships between authors, websites, forums, etc.
• Information acquisition: The information is collected through several connectors, without any structure or filtering mechanisms.
• Data organization: The metadata and information are stored in a distributed environment.
• KPI generation: The data are processed to produce a KPI summary.
• Organization: The data is categorized through KPI calculations.
• Data calculation: The data is modeled through calculation and statistical tools.
• Analysis of data: Applying statistical models enhances the quality of the information.
• Use: The data are aggregated to create and summarize the results of the analysis.
• Display: Through a reporting environment, data are displayed to visualize the results.
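The five stages (Identify, Integrate, Store, Process, Visualize) can be sketched as a chain of small functions. Everything below is a toy under stated assumptions: the two sample posts, the word-count metadata and the KPI names are invented for illustration and stand in for real sources and real metrics.

```python
def identify():
    """Identify: choose the sources (here, two hard-coded hypothetical posts)."""
    return [
        {"author": "anna", "site": "forum.example", "text": "open data is useful"},
        {"author": "bruno", "site": "blog.example", "text": "linked data links data"},
    ]


def integrate(records):
    """Integrate: enrich each record with metadata (a word count)."""
    return [dict(r, words=len(r["text"].split())) for r in records]


def store(records):
    """Store: keep the enriched records, indexed by author."""
    return {r["author"]: r for r in records}


def process(db):
    """Process: compute a small KPI summary over the stored records."""
    total_words = sum(r["words"] for r in db.values())
    return {"posts": len(db), "avg_words": total_words / len(db)}


def visualize(kpi):
    """Visualize: render the KPI summary as a plain-text report."""
    return "\n".join(f"{name}: {value}" for name, value in kpi.items())


kpi = process(store(integrate(identify())))
report = visualize(kpi)
print(report)
```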
Data as a Service (DaaS)
Develop an easy-to-use platform that offers data set management (collect, aggregate and interlink data accessible via APIs) in order to enable the creation of new apps and services for customers
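A minimal sketch of the DaaS idea: collect rows into named data sets and expose them through an API-like entry point. The `DataService` class, the `/datasets` paths and the sample row are assumptions made for illustration; a real platform would sit behind an HTTP server and a persistent store.

```python
import json


class DataService:
    """In-memory data set catalog with a tiny request handler."""

    def __init__(self):
        self._datasets = {}

    def collect(self, name, rows):
        """Collect: append rows to the named data set, creating it if needed."""
        self._datasets.setdefault(name, []).extend(rows)

    def handle(self, path):
        """Return (status, JSON body) for /datasets and /datasets/<name>."""
        parts = [p for p in path.split("/") if p]
        if parts == ["datasets"]:
            return 200, json.dumps(sorted(self._datasets))
        if len(parts) == 2 and parts[0] == "datasets" and parts[1] in self._datasets:
            return 200, json.dumps(self._datasets[parts[1]])
        return 404, json.dumps({"error": "not found"})


service = DataService()
service.collect("venues", [{"name": "Uffizi", "city": "Florence"}])
status, body = service.handle("/datasets/venues")
print(status, body)
```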
Business case
Google indexes and searches information from the web; we are able to index, collect and expose data with APIs.
Business case example
Technological perspective
collect data and produce APIs
Data source aggregations
Mashup Diverse Data Sets
After shaping a table to the form you want, easily join it with another to uncover the hidden relationships between them.
Integrate heterogeneous data sources
Many open data sets could be the source of a complex big data system
Develop connectors
Connectors ingest the data into the system
…and so forth
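A sketch of a connector plus a mashup join, using only the standard library. The CSV contents, column names and the `csv_connector` and `join` helpers are hypothetical; the point is only that once two sources share a key, joining them is a few lines.

```python
import csv
import io


def csv_connector(text):
    """Connector: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def join(left, right, key):
    """Inner-join two row lists on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Two made-up sources that share the "area" column.
residents = csv_connector("area,residents\nnorth,1200\nsouth,800\n")
libraries = csv_connector("area,libraries\nnorth,3\neast,1\n")

joined = join(residents, libraries, "area")
print(joined)
```

Only "north" appears in both sources, so it is the only row in the result; values stay strings because the CSV reader does not infer types.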
Data models and technologies
Document
• Document-Oriented Storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• GridFS
Linked Data
• Graph model for data representation
• Full ACID transactions
• Native storage engine
• Massively scalable
• Multiple graph query languages
MapReduce and HDFS
• Distributed File System
• JobTracker
• TaskTracker
• Log and file stream
• Real-time analysis
• Fast access data
• Sensors and IoT
Relational
• Transactional operation
• ACID based
• Entity-Relationship
• Legacy system
• User administration
Architecture overview
STORAGE
DATA SOURCE
Open Data, Private Data, Public Data
CRM, ERP, DWH, RDF
PROCESS and ANALYTICS
DATA and APIs PROVIDER
APIs
Users
API clients
APIs and DATA
CONNECTORS: Spring Data, JDBC
Prototype example
A first POC
Enriching Open Linked Data
We can recognize a few trends in our scenario:
• Exponential growth in data volumes
• Rise of connectedness
• Increase in degrees of semi-structure
• Structures and schemas emerge rather than being defined upfront
Key facts:
• Volume: the size of the stored data
• Velocity: the rate at which data changes over time
• Variety: the degree to which data is regularly or irregularly structured, dense or sparse, and importantly connected or disconnected
Graph Theory
Graph theory was pioneered by Euler in the 18th century and has received multidisciplinary contributions over the centuries
A graph is an ordered pair G = (V, E) comprising a set V of vertices or nodes together with a set E of edges or lines, which are 2-element subsets of V.
One trick is to search for “graph based approach to” and your problem.
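The definition G = (V, E) maps directly onto Python sets: V is a set of vertices and each edge is a frozenset with exactly two members of V. The vertex names and the helper functions below are illustrative choices, not taken from the slides.

```python
# G = (V, E): a set of vertices and a set of 2-element subsets of V.
V = {"a", "b", "c", "d"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "d"})}


def is_graph(vertices, edges):
    """Check that every edge is a 2-element subset of the vertex set."""
    return all(len(e) == 2 and e <= vertices for e in edges)


def neighbours(v, edges):
    """Vertices that share an edge with v."""
    return {u for e in edges if v in e for u in e if u != v}


def degree(v, edges):
    """Number of edges touching v."""
    return len(neighbours(v, edges))


print(is_graph(V, E), sorted(neighbours("b", E)), degree("a", E))
```

Using frozenset makes the edges undirected and rules out duplicates; a directed graph would use ordered pairs (tuples) instead.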
Six Degrees
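Assuming this slide refers to the "six degrees of separation" idea, the number of hops between two people is a shortest-path question, which breadth-first search answers. The friendship data and the function name are made up for illustration.

```python
from collections import deque


def degrees_of_separation(friends, start, goal):
    """Breadth-first search over an adjacency dict.

    Returns the number of hops from start to goal, or None if unreachable.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for other in friends.get(person, ()):
            if other == goal:
                return hops + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return None


# Hypothetical friendship graph.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cai"],
    "cai": ["bob", "dee"],
    "dee": ["cai"],
    "eve": [],
}
print(degrees_of_separation(friends, "ann", "dee"))
```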
Storing Data in Graphs
• Facebook, Google and Twitter have centered their business models around their own proprietary distributed graph technologies
Graph databases store information in ways that much more closely resemble the ways the world is organized and the way humans “think about” data.
Top 10 Gartner IT technologies in 2013: “[..] are designed to support new transaction, interaction and observation use cases involving web scale, mobile, cloud and clustered environments”
• Facebook: The Associations and Objects (TAO) Data Store
https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920
• Twitter: FlockDB
https://github.com/twitter/flockdb
Neo4j Stack
DATA STORAGE AND TRAVERSING
DATA ACCESS AND PROCESSING
DATA IMPORT Batch Import Neo4j
From Relational to Graph-based Modeling
Graph databases treat relationships as first-class abstractions of the data model
A Graph –[:RECORDS_DATA_IN] Nodes –[:WHICH_HAVE] Properties.
Nodes –[:LINKED_BY] Relationships
• It contains nodes and relationships
• Nodes contain properties (key-value pairs)
• Relationships are named, directed and always have a start and end node
• Relationships can also contain properties
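The four properties of the model listed above can be sketched with a few dataclasses. This is an in-memory illustration, not the Neo4j API; the class names, method names and sample properties are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    type: str            # relationships are named
    start: Node          # directed: always a start node...
    end: Node            # ...and an end node
    properties: dict = field(default_factory=dict)   # may carry properties too


@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def add_node(self, **properties):
        node = Node(dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type):
        """End nodes of relationships of the given type that start at node."""
        return [r.end for r in self.relationships
                if r.start is node and r.type == rel_type]


g = Graph()
user = g.add_node(name="Ann")
doc = g.add_node(title="Budget 2013")
owns = g.relate(user, "OWNS", doc, since=2013)
print(g.outgoing(user, "OWNS"))
```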
From Relational to Graph-based Modeling
Shake an RDBMS while keeping all the relationships, and you’ll see a graph
Where RDBMSs are optimized for aggregated data, graph databases are optimized for highly connected data
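One way to read "shake an RDBMS and you'll see a graph": every row becomes a node and every foreign key becomes a relationship. The tables, column names and the OWNS relationship below are hypothetical sample data used only to show the mapping.

```python
# Hypothetical relational rows; owner_id is a foreign key to users.id.
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
documents = [
    {"id": 10, "title": "Budget", "owner_id": 1},
    {"id": 11, "title": "Census", "owner_id": 1},
    {"id": 12, "title": "Map", "owner_id": 2},
]


def rows_to_graph(users, documents):
    """Turn rows into nodes and the foreign key into OWNS relationships."""
    nodes = {}
    for u in users:
        nodes[("User", u["id"])] = {"name": u["name"]}
    for d in documents:
        nodes[("Document", d["id"])] = {"title": d["title"]}
    edges = [(("User", d["owner_id"]), "OWNS", ("Document", d["id"]))
             for d in documents]
    return nodes, edges


nodes, edges = rows_to_graph(users, documents)

# What a join would answer in SQL becomes a walk along OWNS edges.
owned_by_ann = [nodes[end]["title"] for start, rel, end in edges
                if start == ("User", 1) and rel == "OWNS"]
print(owned_by_ann)
```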
Graph-directed Infrastructure
DATA STORAGE AND TRAVERSING
DATA ACCESS AND PROCESSING
DATA IMPORT Batch Import Neo4j
ENTERPRISE
MANAGEMENT
VISUALIZATION
API Connector
API Provider
Spring Data Neo4j
It is possible to derive queries for domain entities from finder method names like Iterable<T>
@Indexed fields will be converted into index lookups in the start clause, navigation along relationships will be reflected in the match clause, and properties with operators will end up as expressions in the where clause
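A toy illustration of the idea of deriving a where clause from a finder method name. This is not how Spring Data Neo4j is implemented and the output is not what it generates: the method name `findByNameAndAgeGreaterThan`, the two-entry operator table, the alias `n` and the `{0}` placeholder style are all assumptions chosen for the sketch, and the naive split on "And" would break on property names that contain it.

```python
# Toy operator table; a real framework supports many more keywords.
OPERATORS = {"GreaterThan": ">", "LessThan": "<"}


def derive_conditions(method_name):
    """Split a finder name into (property, operator) pairs."""
    if not method_name.startswith("findBy"):
        raise ValueError("not a finder method")
    conditions = []
    for part in method_name[len("findBy"):].split("And"):
        op = "="
        for suffix, symbol in OPERATORS.items():
            if part.endswith(suffix):
                part, op = part[:-len(suffix)], symbol
                break
        # Lower-case the first letter to get the property name.
        conditions.append((part[0].lower() + part[1:], op))
    return conditions


def to_where_clause(method_name, alias="n"):
    """Render the conditions as a where-clause string with numbered parameters."""
    conditions = derive_conditions(method_name)
    return " AND ".join(f"{alias}.{prop} {op} {{{i}}}"
                        for i, (prop, op) in enumerate(conditions))


print(to_where_clause("findByNameAndAgeGreaterThan"))
```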
Linking Data
Open Linked Graph
[Diagram, built up over four slides]
• Nodes: User
• Nodes: User, Document; relationships: [:OWNS]
• Nodes: User, Document, Node; relationships: [:OWNS], [:INCLUDES]
• Nodes: User, Document, Node, Venue, DBpedia URI; relationships: [:OWNS], [:INCLUDES], [:LOCATED], [:DBP_LINKED]
Open API
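A sketch of walking the Open Linked Graph by relationship type. The relationship names come from the slides, but the edge directions, the identifiers and which node types each relationship connects are assumptions, since the diagram itself is not recoverable from the text.

```python
# (start, relationship, end) edges; directions and ids are assumed.
EDGES = [
    ("user:1", "OWNS", "doc:1"),
    ("doc:1", "INCLUDES", "node:1"),
    ("doc:1", "INCLUDES", "node:2"),
    ("node:1", "LOCATED", "venue:1"),
    ("node:1", "DBP_LINKED", "dbpedia:uri-1"),
    ("node:2", "LOCATED", "venue:2"),
]


def follow(start, *rel_types):
    """Follow a sequence of relationship types and return the nodes reached."""
    frontier = {start}
    for rel in rel_types:
        frontier = {end for s, r, end in EDGES if s in frontier and r == rel}
    return frontier


venues = follow("user:1", "OWNS", "INCLUDES", "LOCATED")
links = follow("user:1", "OWNS", "INCLUDES", "DBP_LINKED")
print(sorted(venues), sorted(links))
```

An open API over such a graph could expose exactly this kind of path query, for example "venues reachable from a user's documents".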
Thanks
Silvano Galasso
Michele Piunti

Online Available Data Services: a Primer

  • 1. Online available data services: a primer Silvano Galasso Michele Piunti
  • 2. 3 • Where is the data? • The way to use Data • APP-ify the Data • Technological perspective • Prototype example Agenda
  • 3. Where is the Data?
  • 4. 5 Public sector organizations are increasingly looking to participate in data ecosystems and drive adoption of their data as fuel for innovation Data is everywhere
  • 5. 6 Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. In this context open data speed up economics combining not only government's open data but heterogeneous, large and rapidly changing dataset from every public sources like social networks, DBpedia (Wikipedia) and many more. Open Data as main source
  • 6. 7 Under the UK presidency during the recent G8 Summit (17-18 June) an Open Data Charter has been ratified Open Data is the global drive : • To enforce Transparency, Innovation, exchange between pepole and countries • To fuel better outcomes in public services such as health, education, public safety, environmental protection, governance, etc. • To provide a catalyst for innovation in the private sector, supporting the creation of new markets, businesses, and jobs. [2013-2015] time for planning and implementation G8 Open Data Charter https://www.gov.uk/government/publications/open-data-charter
  • 7. 8 Where Open Data is http://census.okfn.org/ https://nycopendata.socrata.com/ https://dati.lombardia.it ..and counting
  • 8. The way to use Data
  • 9. 10 Multiple legal or regulatory restrictions on the use of the data. Legal Restrictions, Privacy, Licenses
  • 10. 11 Third parties offers public data as valuable services APIs freely available under certain usage quota Data owner and APIs Source: Jonhn Musser, Programmable web
  • 11. 12 5★ Open Data ★ make PUBLIC stuff available on the Web (whatever format, .jpeg .pdf) under an open license ★★ make it available as structured data (e.g., Excel instead of image scan of a table) ★★★ use non-proprietary formats (e.g., CSV instead of Excel) ★★★★ use URIs to denote things, so that people can point at your stuff ★★★★★ link your data to other data to provide context Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5 star deployment scheme for Open Data.
  • 12. 13 Recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs OWL and RDF. Linked Open Data 1. Requires Ontologies to be applied to data 2. Allows heterogeneous Nodes to be traversed in a semantically coherent fashion http://live.dbpedia.org/page/Adoration_of_the_Magi_of_1475_(Botticelli) http://live.dbpedia.org/page/Primavera_(painting) http://live.dbpedia.org/page/Sandro_Botticelli
  • 13. 14 Open government data Municipal Regional National Community data Geographic Media Scientific Encyclopedic Data from third parties Facebook Twitter LinkedIn Google Data could be linked Linked Open Data Could be linked under certain conditions
  • 15. 16 Data modeling Identify Integrate Store Process Visualize • Unstructured source: Forum, Blog, Social Network, Web Data from which to extract the discussions. • Structured source: Operational database, CRM, SCM, ERP and other tools from which informations is collected. • Metadata ingestion: The selected information enriched with metadata can create relationships between the authors, websites, forums etc. • Information acquisition: The information is collected without any structure or filtration mechanisms with several connectors. • Data organization: The metadata and information are stored in a distributed environment • KPI Generation: The data are elaborated to produce KPI summary. • Organization: The Data is categorized through KPI calculations. • Data calculation: Through the calculation and statistical instruments the data is modeled • Analysis of data: Application of statistical models enhance the information in terms of quality • Use: The data are aggregated to create and summarize the results of the analysis. • Display: Trough a report environment data are displayed to visualize the results
  • 16. 17 Data as a Service (DaaS) Develop an easy to use Platform that offers data sets management (collect, aggregate and interlink data accessible via APIs) in order to enable the creation of the new apps and services for customers
  • 17. 18 Google index and search information from web, we are able to index, collect and expose data with APIs. Business case Business case example
  • 19. 20 Data source aggregations Mashup Diverse Data Sets After shaping a table to the form you want, easily join it with another to uncover the hidden relationships between them. Integrate heterogeneous data sources Many open data could be the source of a complex big data system Develop connectors The connectors allow to ingest the data into the system …and so forth
  • 20. 21 Data models and technologies. Document: • Document-Oriented Storage • Full Index Support • Replication & High Availability • Auto-Sharding • Querying • GridFS. Linked Data: • Graph model for data representation • Full ACID transactions • Native storage engine • Massively scalable • Multiple graph query languages. MapReduce and HDFS: • Distributed File System • JobTracker • TaskTracker • Log and file stream • Real-time analysis • Fast access data • Sensors and IoT. Relational: • Transactional operation • ACID based • Entity-Relationship • Legacy system • User administration
  • 21. 22 Architecture overview [diagram labels: DATA SOURCE (Open Data, Private Data, Public Data, CRM, ERP, DWH, RDF); STORAGE; PROCESS and ANALYTICS; DATA and APIs PROVIDER; APIs; Users; API clients; APIs and DATA CONNECTORS; Spring Data; JDBC]
  • 23. 24 Enriching Open Linked Data. We may recognize a few contingencies in our scenario: • Exponential growth in data volumes • Rise of connectedness • Increase in degrees of semi-structure • Structures and schemas emerge rather than being pre-defined upfront. Key facts: • Volume: the size of the stored data • Velocity: the rate at which data changes over time • Variety: the degree to which data is regularly or irregularly structured, dense or sparse, and, importantly, connected or disconnected
  • 24. 25 Graph Theory: Graph theory was pioneered by Euler in the 18th century and has received multidisciplinary contributions across the centuries. A graph is an ordered pair G = (V, E) comprising a set V of vertices or nodes together with a set E of edges or lines, which are 2-element subsets of V. One trick is to search for “graph based approach to” and your problem.
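The definition G = (V, E) maps directly onto Python sets. The sketch below is a minimal illustration, assuming a small made-up vertex set; each edge is a `frozenset` so that it is literally a 2-element subset of V (an undirected edge).

```python
# A graph as an ordered pair (V, E): V is a set of vertices,
# E is a set of 2-element subsets of V (undirected edges).
V = {"a", "b", "c", "d"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "d"})}
G = (V, E)

def is_valid_graph(vertices, edges):
    """Check that every edge is a 2-element subset of the vertex set."""
    return all(len(edge) == 2 and edge <= vertices for edge in edges)

def neighbours(vertex, edges):
    """Return the vertices that share an edge with the given vertex."""
    return {other for edge in edges if vertex in edge
            for other in edge if other != vertex}

def degree(vertex, edges):
    """Number of edges incident to the vertex."""
    return sum(1 for edge in edges if vertex in edge)

print(is_valid_graph(*G))
print(sorted(neighbours("b", E)))
print(degree("d", E))
```

Because edges are sets, `{"a", "b"}` and `{"b", "a"}` are the same edge; a directed graph would use ordered pairs (tuples) instead.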
  • 27. 28 Storing Data in Graphs • Facebook, Google and Twitter have centered their business models around their own proprietary distributed graph technologies. Graph databases store information in ways that much more closely resemble the way the world is organized and the way humans “think about” data. Top 10 Gartner IT technologies in 2013: “[..] are designed to support new transaction, interaction and observation use cases involving web scale, mobile, cloud and clustered environments” • Facebook: The Associations and Objects (TAO) Data Store https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920 • Twitter: FlockDB https://github.com/twitter/flockdb
  • 28. 29 Neo4j Stack [diagram labels: DATA STORAGE AND TRAVERSING; DATA ACCESS AND PROCESSING; DATA IMPORT; Batch Import; Neo4j]
  • 29. 30 From Relational to Graph based Modeling: Graph databases treat relationships as first-class abstractions of the data model. A Graph –[:RECORDS_DATA_IN] Nodes –[:WHICH_HAVE] Properties. Nodes –[:LINKED_BY] Relationships. • A graph contains nodes and relationships • Nodes contain properties (key-value pairs) • Relationships are named, directed and always have a start and end node • Relationships can also contain properties
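The four rules of the property graph model can be written down as a small in-memory data structure. This is a sketch for illustration only, not Neo4j's API: the class names, method names, and sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Nodes contain properties (key-value pairs).
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    # Relationships are named, directed (start -> end),
    # and can also contain properties.
    name: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

@dataclass
class Graph:
    # A graph contains nodes and relationships.
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def add_node(self, **properties):
        node = Node(dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start, name, end, **properties):
        rel = Relationship(name, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, name):
        """Follow relationships of the given name that start at node."""
        return [r.end for r in self.relationships
                if r.start is node and r.name == name]

# Hypothetical sample data.
graph = Graph()
user = graph.add_node(name="Alice")
document = graph.add_node(title="Report")
graph.relate(user, "OWNS", document, since=2013)

print(graph.outgoing(user, "OWNS"))
```

Relationships are stored as objects in their own right rather than as foreign-key columns, which is what "first-class" means here: following `OWNS` from `user` needs no join table.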
  • 30. 31 From Relational to Graph based Modeling: Shake an RDBMS while keeping all the relationships, and you’ll see a graph. Where RDBMSs are optimized for aggregated data, graph databases are optimized for highly connected data.
  • 31. 32 Graph-directed Infrastructure [diagram labels: DATA STORAGE AND TRAVERSING; DATA ACCESS AND PROCESSING; DATA IMPORT; Batch Import; Neo4j; ENTERPRISE MANAGEMENT; VISUALIZATION; API Connector; API Provider]
  • 32. 33 Spring Data Neo4j: It is possible to derive queries for domain entities from finder method names like Iterable<T>. @Indexed fields will be converted into index lookups in the start clause; navigation along relationships will be reflected in the match clause; properties with operators will end up as expressions in the where clause.
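The idea of deriving a query from a finder method name can be illustrated with a toy parser. This is not Spring Data Neo4j's implementation and does not reproduce its generated Cypher: the method name `findByNameAndAgeGreaterThan`, the operator table, the `Person` label, and the output format are all assumptions chosen for the sketch, and it only covers the "properties with operators end up in the where clause" part.

```python
# Hypothetical operator suffixes; the real framework supports many more keywords.
OPERATORS = {
    "GreaterThan": ">",
    "LessThan": "<",
}

def derive_query(method_name, label):
    """Turn a finder name such as 'findByNameAndAgeGreaterThan'
    into a Cypher-like query string (toy illustration only)."""
    prefix = "findBy"
    if not method_name.startswith(prefix) or method_name == prefix:
        raise ValueError("expected a name of the form findBy<Property>...")
    conditions = []
    # Naive split: property names that themselves contain "And" are not handled.
    for index, part in enumerate(method_name[len(prefix):].split("And")):
        operator = "="
        for suffix, symbol in OPERATORS.items():
            if part.endswith(suffix):
                part = part[: -len(suffix)]
                operator = symbol
                break
        if not part:
            raise ValueError("missing property name in finder method")
        prop = part[0].lower() + part[1:]
        # Each property becomes one expression in the where clause.
        conditions.append(f"n.{prop} {operator} $p{index}")
    return f"MATCH (n:{label}) WHERE {' AND '.join(conditions)} RETURN n"

print(derive_query("findByNameAndAgeGreaterThan", "Person"))
```

The printed string is `MATCH (n:Person) WHERE n.name = $p0 AND n.age > $p1 RETURN n`. Index lookups in a start clause and relationship navigation in the match clause, both mentioned on the slide, are left out of this sketch.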
  • 36. 37 Open Linked Graph [diagram: a User [:OWNS] Documents; each Document [:INCLUDES] several Nodes]
  • 37. 38 Open Linked Graph [diagram: a User [:OWNS] Documents; each Document [:INCLUDES] several Nodes; [:LOCATED] and [:DBP_LINKED] relationships connect the graph to Venue and DBPedia URI elements; the result is exposed through an Open API]
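A traversal over a graph of this shape can be sketched with plain triples. The relationship names come from the slide; everything else is an assumption made for the example, including the direction of `[:LOCATED]` and `[:DBP_LINKED]` (the transcript does not preserve which elements they connect), the identifiers, and the placeholder DBpedia URI.

```python
# Edges as (start, relationship, end) triples.
# Assumed layout: User -OWNS-> Document -INCLUDES-> Node -LOCATED-> Venue
# -DBP_LINKED-> DBpedia URI. Identifiers and the URI are placeholders.
EDGES = [
    ("user:1", "OWNS", "doc:1"),
    ("user:1", "OWNS", "doc:2"),
    ("doc:1", "INCLUDES", "node:1"),
    ("doc:1", "INCLUDES", "node:2"),
    ("doc:2", "INCLUDES", "node:3"),
    ("node:1", "LOCATED", "venue:1"),
    ("venue:1", "DBP_LINKED", "http://dbpedia.org/resource/Example_Venue"),
]

def follow(start, path, edges=EDGES):
    """Follow a sequence of relationship names from a start element
    and return the set of elements reached at the end of the path."""
    frontier = {start}
    for relationship in path:
        frontier = {end for (begin, name, end) in edges
                    if begin in frontier and name == relationship}
    return frontier

# Which DBpedia resources are reachable from the user's documents?
linked = follow("user:1", ["OWNS", "INCLUDES", "LOCATED", "DBP_LINKED"])
print(linked)
```

Each step narrows or widens the frontier by one relationship type, so enriching the graph with a new link type only means adding triples, not changing the traversal code.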