BDE SC6-hang out - technology part-SWC - Martin
1. BIG DATA EUROPE
PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL
EUROPE IN A CHANGING WORLD - INCLUSIVE, INNOVATIVE AND REFLECTIVE SOCIETIES
HANG OUT
28 SEPTEMBER 2016
MARTIN KALTENBOECK (CFO, SEMANTIC WEB COMPANY)
Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
BDE SC6 Hangout
2. Big Data Europe (CSA: 2015-17)
Show societal value of Big Data: 7 Domains
Lower barrier for using big data technologies
o Required effort and resources
o Limited data science skills
Help establish cross-lingual/organizational/domain Data Value Chains
26-oct.-16
3. Big Data Europe
CSA | Measures | Results
COORDINATION | Stakeholder Engagement (Requirements Elicitation) | Create and Manage Societal Big Data Interest Groups
SUPPORT | Design, Realise, Evaluate Big Data Aggregator Platform | Cloud-deployment ready Big Data Aggregator Platform
4. THE BDE PLATFORM
ARCHITECTURE & COMPONENTS
7. Adding a Semantic Layer to Data Lakes
Manufacturing, Marketing, Sales, Support, Accounting
Semantic Data Lake
• central place for model, schema and data historization
• Combination of Scale Out (cost reduction) and semantics (increased control & flexibility)
• grows incrementally (pay-as-you-go)
Inbound Data Sources
Outbound and Consumption
Inbound Raw Data Store
Data Lake (order of magnitude cheaper scalable data store)
Knowledge Graph for Relationship Definition and Meta Data
Frontend to Access Relationship and KPI Definition / Documentation
Frontend to Access (ad hoc) Reports
Outbound Data Delivery to Target Systems
JSON-LD, CSVW, R2RML, XML2RDF
8. Why use BDE Technology?
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File System | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | lightweight virtualization
Plug & play components (no rigid schema) | no | no | no | no | yes
High Availability | Single failure recovery (yarn) | Single failure recovery (yarn) | Self healing, mult. failure rec. | Single failure recovery (yarn) | Multiple Failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | yes | yes | yes | yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera manager | MapR Control system | - | Docker swarm UI + Custom
9. SC6 PILOT
CITIZENS BUDGET ON MUNICIPAL LEVEL
ARCHITECTURE & COMPONENTS
11. SC6: Social Sciences
26-oct.-16 www.big-data-europe.eu
Pilot focus area: Citizens budget spending on municipal level
Big Data Focus area: Statistical and research data linking & integration
Selected Key Data assets: Detailed budget execution data at city level, statistical data from public data portals and statistical offices, federated social sciences data
12. 4 Vs of Big Data in SC6 Pilot
Variety: budget data and budget execution data are harvested from several sources and arrive in different structures and formats.
Volume: the amount of available open budget data and budget execution data keeps growing.
Velocity: publishers provide budget execution data on a continuous basis (daily, weekly, monthly).
Veracity: refers to biases, noise and abnormality in data. Even within the same country, published data differ because they often come from different systems, or because public accounting standards are not enforced uniformly (e.g. across different municipal departments).
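The Variety requirement can be illustrated with a minimal sketch that maps two differently shaped budget exports onto one common record type. The field names, the semicolon-delimited CSV layout, the JSON structure and the sample values are all hypothetical and are not taken from the pilot's actual sources.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BudgetRecord:
    """Common target structure (an assumption made for this sketch)."""
    municipality: str
    year: int
    category: str
    amount_eur: float


def from_csv(text: str) -> List[BudgetRecord]:
    """Parse a hypothetical CSV export: city;year;category;amount (decimal comma)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        BudgetRecord(
            municipality=row["city"],
            year=int(row["year"]),
            category=row["category"],
            amount_eur=float(row["amount"].replace(",", ".")),
        )
        for row in reader
    ]


def from_json(text: str) -> List[BudgetRecord]:
    """Parse a hypothetical JSON export that nests spending lines per municipality."""
    doc = json.loads(text)
    return [
        BudgetRecord(
            municipality=doc["municipality"],
            year=int(doc["fiscalYear"]),
            category=line["label"],
            amount_eur=float(line["value"]),
        )
        for line in doc["lines"]
    ]


# Made-up sample payloads standing in for two publishers.
CSV_SAMPLE = "city;year;category;amount\nExampleTown;2016;Education;1200,50\n"
JSON_SAMPLE = (
    '{"municipality": "SampleCity", "fiscalYear": 2016, '
    '"lines": [{"label": "Education", "value": 900}]}'
)

records = from_csv(CSV_SAMPLE) + from_json(JSON_SAMPLE)
total_education = sum(r.amount_eur for r in records if r.category == "Education")
print(total_education)
```

Once every source is reduced to the same record type, downstream steps such as aggregation or linking no longer need to know which publisher a line came from.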
13. SC6 Pilot - Architecture
14. SC6 Pilot: Technical Components
Apache Flume, https://flume.apache.org/ (data ingestion)
Apache Kafka, http://kafka.apache.org (messaging service)
Apache Spark, http://spark.apache.org (distributed analysis, transformation)
Apache HDFS, http://hadoop.apache.org (raw data storage)
SWC's PoolParty Semantic Suite, http://poolparty.biz (data consolidation, curation, mapping)
OpenLink's Virtuoso, http://virtuoso.openlinksw.com (triple store, data storage)
Apache HTTP Server, http://httpd.apache.org (linked data serving)
Apache Avro, http://avro.apache.org/docs/current/ (intermediate data schema)
D3 JS Library, https://d3js.org/ (visualisation of RDF data using SPARQL queries)
SWC's PoolParty GraphSearch (SPARQL-based interface component for filter & faceted search)
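As a rough illustration of how a visualisation layer could ask the triple store for data, the sketch below builds a SPARQL SELECT query and encodes it as a SPARQL Protocol GET URL. The endpoint address, the `ex:` vocabulary and the property names are assumptions made for this example; the slides do not specify the pilot's data model. No request is sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary; neither is taken from the slides.
ENDPOINT = "http://localhost:8890/sparql"
EX = "http://example.org/budget#"


def spending_by_category_query(municipality: str, year: int) -> str:
    """Build a SPARQL SELECT that sums spending per category.

    Values are interpolated directly for brevity; real code should
    validate or escape user-supplied input first.
    """
    return f"""
PREFIX ex: <{EX}>
SELECT ?category (SUM(?amount) AS ?total)
WHERE {{
  ?line ex:municipality "{municipality}" ;
        ex:year {year} ;
        ex:category ?category ;
        ex:amount ?amount .
}}
GROUP BY ?category
ORDER BY DESC(?total)
""".strip()


def request_url(query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL Protocol GET request."""
    return f"{ENDPOINT}?{urlencode({'query': query})}"


query = spending_by_category_query("ExampleTown", 2016)
url = request_url(query)
print(url)
```

A client would fetch this URL with an `Accept: application/sparql-results+json` header and pass the result bindings to the charting code.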
16. SC6 Pilot: Pilot Evaluation
Evaluation Approach SC6 Pilot:
Invite municipalities to evaluate and use the system
Invite community (open data, data community, BDE community, W3C)
Evaluate within the 2 participating projects (BDE, YourDataStories)
BDE SC6 workshop in Cologne, 5.12.2016 + Overall BDE Tech WS (ApacheCon)
Additional evaluation – tests over time with
a growing amount of data
a growing number of different sources & formats docked onto the system
additional analytics in place
17. How to benefit best from BDE
Health | 19 October | Brussels | Standalone Workshop
Food&Agri | 30 September 2016 | Brussels | Collocated with DG AGRI WP2018-20 stakeholder consultation
Energy | 20 September 2016 | Brussels | Collocated with H2020 Energy InfoDay (19th)
Transport | 16 September 2016 | Brussels | Collocated with TM 2.0 Steering Body meeting
Climate | February 2017 | Brussels | Collocated with EC JRC ISPRA Workshop
Societies | 5 December 2016 | Cologne | Collocated with EDDI16 - 8th Annual European DDI User Conference: http://bde-sc6-2016.eventbrite.com (40 seats)
Security | 18 October 2016 | Brussels | Standalone Workshop
• BDE Workshops & Webinars
• Use & expand the BDE Platform
• Visit Website: news, events, community, …
• Big Data Europe W3C Community
18. Contacts:
CESSDA, http://cessda.net/
Ivana Ilijasic Versic, ivana.versic@cessda.net
Hossein Abroshan, hossein.abroshan@cessda.net
NCSR-D, http://www.demokritos.gr/?lang=en
Michalis Vafopoulos, vafopoulos@gmail.com
Semantic Web Company (SWC), http://www.semantic-web.at
Martin Kaltenböck, m.kaltenboeck@semantic-web.at
Jürgen Jakobitsch, j.jakobitsch@semantic-web.at
Project objectives:
For each of the seven Societal Challenge domains, we have a domain representative and a pilot instantiation of the BDE platform in progress.
One barrier to exploiting Big Data opportunities is the lack of data science skills; our aim is to provide out-of-the-box technology that requires little training to use and apply.
BDE technology can be applied in multiple domains and in different phases within Data Value Chains, working with different data providers and addressing multiple objectives (as opposed to current solutions, which tend to be very specific to one data source or domain and address one objective).
A Data Lake is a storage repository for big-data-scale raw data in its original formats.
Late-binding approach to schema: “Let us decide when we need it.”
Scale-out architecture on commodity infrastructure, mostly with HDFS/Hadoop/Spark, which gives a large cost advantage: roughly a factor of 10 compared to data warehouses.
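The late-binding idea can be sketched in a few lines: raw payloads are stored unchanged, and each reader applies its own schema only when it reads. The in-memory list standing in for the lake, the function names and the sample records are all hypothetical.

```python
import json
from typing import Callable, Dict, List

# Stand-in for the lake: raw payloads kept unchanged, tagged only with their format.
raw_store: List[Dict[str, str]] = []


def ingest(payload: str, fmt: str) -> None:
    """Store the payload as-is; no schema is checked or enforced on write."""
    raw_store.append({"format": fmt, "payload": payload})


def read_with_schema(project: Callable[[dict], dict]) -> List[dict]:
    """Parse JSON payloads at read time and apply the reader's own projection."""
    return [
        project(json.loads(item["payload"]))
        for item in raw_store
        if item["format"] == "json"
    ]


# Two made-up records with different sets of fields.
ingest('{"city": "ExampleTown", "amount": "1200.50", "note": "Q1"}', "json")
ingest('{"city": "SampleCity", "amount": "900"}', "json")

# Two consumers bind two different schemas to the same raw data.
totals_view = read_with_schema(
    lambda d: {"city": d["city"], "amount": float(d["amount"])}
)
notes_view = read_with_schema(
    lambda d: {"city": d["city"], "note": d.get("note")}
)
print(totals_view)
print(notes_view)
```

The trade-off is that malformed data is only discovered when someone reads it, which is one reason a metadata layer on top of the lake is useful.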
Semantic Data Lake = Data Lake + Knowledge Graph
Management of structure (vocabularies/schemas, KPI trees, metadata, …) on top of the Data Lake is performed in a knowledge graph, a data fabric representing all kinds of things and how they relate to each other.
A knowledge graph stands out for its flexibility, multiple views and metadata capabilities.
It is based on the Resource Description Framework (RDF) standard and Linked Data principles.
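A toy version of this split, assuming nothing about the actual BDE implementation: the raw files stay in the lake, while a small set of subject-predicate-object triples records what each file is, where it lives and which KPI is derived from it. The `ex:` identifiers and the `hdfs:///` path are illustrative strings, not a real RDF serialization or a real location.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata graph describing one raw file and one KPI.
graph: Set[Triple] = {
    ("ex:budget2016.csv", "rdf:type", "ex:Dataset"),
    ("ex:budget2016.csv", "ex:storedAt", "hdfs:///lake/raw/budget2016.csv"),
    ("ex:budget2016.csv", "ex:describes", "ex:ExampleTown"),
    ("ex:ExampleTown", "rdf:type", "ex:Municipality"),
    ("ex:spendingPerCapita", "rdf:type", "ex:KPI"),
    ("ex:spendingPerCapita", "ex:derivedFrom", "ex:budget2016.csv"),
}


def match(
    s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None
) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which KPIs are defined, and which raw files in the lake feed them?
kpis = [s for s, _, _ in match(p="rdf:type", o="ex:KPI")]
sources = [o for _, _, o in match(s="ex:spendingPerCapita", p="ex:derivedFrom")]
locations = [o for src in sources for _, _, o in match(s=src, p="ex:storedAt")]
print(kpis, locations)
```

New kinds of relationships can be added as further triples without changing any existing ones, which is the flexibility the notes refer to. In practice a triple store and SPARQL would replace the set and the `match` helper.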