BigDataEurope @BDVA Summit 2016 1: The BDE Platform
1. BIG DATA EUROPE'S
INTEGRATOR PLATFORM
A ONE-STOP SOLUTION FOR BIG AND
SMART DATA MANAGEMENT
BDVA Summit 2016, Valencia, 1 December 2016
2. Talk outline
The BigDataEurope Project, Mission & BDVA Synergies
The Big Data Integrator (BDI) platform
o Stakeholder Requirements
o Architecture
o Supported Components
o Beyond the State-of-the-Art
A look into the BDI platform [DEMO]
6 Dec 2016, www.big-data-europe.eu
3. Supporting the Societal Domains with Big Data Technology
BigDataEurope Project
4. BigDataEurope Action
EC Horizon 2020 Coordination & Support Action
o ~€5 million, 2015-2017
Show societal value of Big Data
o Across all societal challenges addressed by H2020
Lower barrier for using big data technologies
o Effort to set up and deploy use-case workflows
o Lack of skills & expertise
Help establish data value chains across domains & orgs.
6. Stakeholder Engagement Cycle
Present action, showcase
deployments
Raise awareness about BDE results,
what they mean for stakeholders
Collect requirements to drive
further development
Timeline: M6, M12, M18, M24, M30
7. Data Value Chain Evolution
[Figure: evolution of the data value chain over time]
Value chain steps: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis
Domains: Health, Transport, Security, Food, Societies, Climate, Energy
Data: Data Repositories, Linked Open Data
Tooling: Proprietary, 'locked-in' solutions; OS Solutions, Big Data Stacks
8. Parallels to BDVA Mission
Task Force 6 (Technical)
o SG1: Management
o SG2: Big Data Architectures and Infrastructures
The Big Data Integrator Platform (SG2)
o Generic Architecture (Blueprint) & Instances
Smart Big Data Management (SG1)
o Support for Semantic Components & Data Lakes
9. A flexible, generic platform for (Big) Data Value
Chain Deployment
1. Stakeholder Requirements
Big Data Integrator
14. A flexible, generic platform for (Big) Data Value
Chain Deployment
2. Architecture
Big Data Integrator
15. Big Data Integrator Architecture
Prototype developed by BDE
o Incorporates existing BD technology
o Facilitates integration and deployment
Main points of the architecture
o Dockerization
o Support layer, including integrated UI
o Semantification layer
17. Docker containers
Docker offers lightweight virtualization
o Containers can be shared/provisioned on different Linux variations/versions
Identical base system
o NOT Required
All BDI components
o Docker containers
22. BDE vs Hadoop distributions
BDE is not built on top of existing distributions
Targets
o Communities
o Research institutions
Bridges scientists and open data
Multi-Tier research efforts towards Smart Data
23. BDE vs Hadoop distributions
| | Hortonworks | Cloudera | MapR | Bigtop | BDE |
|---|---|---|---|---|---|
| File System | HDFS | HDFS | NFS | HDFS | HDFS |
| Installation | Native | Native | Native | Native | Lightweight virtualization |
| Plug & play components (no rigid schema) | No | No | No | No | Yes |
| High Availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery |
| Cost | Commercial | Commercial | Commercial | Free | Free |
| Scaling | Freemium | Freemium | Freemium | Free | Free |
| Addition of custom components | Not easy | No | No | No | Yes |
| Integration testing | Yes | Yes | Yes | Yes | -- |
| Operating systems | Linux | Linux | Linux | Linux | All |
| Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom |
24. A flexible, generic platform for (Big) Data Value
Chain Deployment
3. Supported Components
Big Data Integrator
25. Dockerized Components
Processing and storage components
o Re-used existing docker containers (where available)
o Dockerized by BDE otherwise
o Ensuring all can be provisioned through Docker Swarm
Other Components
o Semantic Layer
o Support Layer
27. A flexible, generic platform for (Big) Data Value
Chain Deployment
4. In-use: Deployment & Installation
Big Data Integrator
29. Platform installation
Manual installation guide
Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
Screencasts (Getting Started with the Platform)
30. Developing a component
Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable with a custom algorithm/data
Published components
o Responsibilities divided between partners
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki
31. Deploying a Big Data Stack
Stack: Collection of communicating components to solve
a specific problem
Described in Docker Compose
o Component configuration
o Application topology
Orchestrator required for initialization process
o Components may depend on each other
o Components may require manual intervention
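The ordering problem an orchestrator has to solve during initialization can be sketched as follows. This is a minimal illustration in Python, not the BDE orchestrator itself: the component names and the `depends_on` mapping are hypothetical and only mimic the kind of topology a Docker Compose file describes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stack: each component maps to the components it depends on.
# The names are illustrative, not taken from an actual BDE stack definition.
depends_on = {
    "namenode": [],
    "datanode": ["namenode"],
    "spark-master": ["namenode"],
    "spark-worker": ["spark-master"],
    "app": ["spark-worker", "datanode"],
}


def startup_order(dependencies):
    """Return components in an order where every dependency starts first."""
    return list(TopologicalSorter(dependencies).static_order())


order = startup_order(depends_on)
print(order)
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is one of the cases where manual intervention would be needed.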
32. Support Layer (User Interfaces)
Integrator UI
o Web UIs of BDE Docker containers (including 3rd-party components) follow the BDE stylesheets
Stack Monitor App
o Workflow Builder
o Workflow Monitor
Swarm UI
o Allows scaling up/down multiple Docker instances
37. Demonstrating the ease-of-use in deploying
custom instances of the BDI Platform
A recorded video showing an example is available:
https://www.youtube.com/watch?v=1zHIhFDDdCg
BDI Platform – A Demo
38. A flexible, generic platform for (Big) Data Value
Chain Deployment
5. Beyond the State-of-the-Art
Big Data Integrator
40. Variety – The most neglected V?
Source: Gesellschaft für Informatik
Data Source Heterogeneity
Lack of interoperability/semantics
41. Semantic Layer tools
BDE tooling for Semantic Data Lake:
o Swagger: Semantics of RESTful APIs
o Semantic Analytics Stack (SANSA): Distributed data processing over large-scale Knowledge Graphs
o Semagrow: SPARQL over Big Data stores
o Ontario: Querying over Semantic Data Lakes
42. Semantic Layer
Semantic Data Lakes
o Minimal ingestion pre-processing
o Semantic layer maintains metadata
o Add meaning when retrieving/processing
Data Lake: scalable unstructured data store
Relationship definitions and metadata
JSON-LD, CSVW, R2RML, XML2RDF
Ongoing Research for Semantic Big Data & Analytics
Knowledge Graphs
43. Ontario: Semantic Data Lakes
Repository of data in its raw format
o Structured, semi-structured, unstructured
Schema-less
o No schema is defined on write, it is defined only on read
Open to any kind of processing
Add a Semantic layer on top of the source datasets
o Semantic data is handled as-is
o Non-Semantic data is semantically lifted using existing
ontology terms
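The lifting step can be illustrated with a minimal sketch: raw records stay untouched in the lake, and a mapping to existing ontology terms is applied only when the data is read. It uses only the Python standard library. The sample data, the example.org namespace, the column mapping and the function names are assumptions for illustration and do not reflect Ontario's actual API; the two FOAF properties are existing ontology terms.

```python
import csv
import io

# Raw, non-semantic data as it might sit in the lake (no schema on write).
RAW_CSV = "id,name,nick\n1,Alice,al\n2,Bob,\n"

# Hypothetical mapping from column names to existing ontology terms (FOAF).
COLUMN_TO_PREDICATE = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "nick": "http://xmlns.com/foaf/0.1/nick",
}
BASE = "http://example.org/person/"  # assumed namespace for subject URIs


def lift_row(row):
    """Turn one CSV record into (subject, predicate, object) triples."""
    subject = BASE + row["id"]
    return [
        (subject, predicate, row[column])
        for column, predicate in COLUMN_TO_PREDICATE.items()
        if row.get(column)  # empty cells produce no triple
    ]


def read_as_triples(raw_text):
    """Apply the mapping at read time; the stored text is not modified."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [triple for row in reader for triple in lift_row(row)]


triples = read_as_triples(RAW_CSV)
for triple in triples:
    print(triple)
```

Because the mapping lives outside the data, it can be changed or extended later without re-ingesting anything.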
45. SANSA: Semantic Analytics Stack
Abundant machine readable structured information is
available (e.g. in RDF)
o Across SCs, e.g. Life Science Data (OpenPhacts)
o General: DBpedia, Google knowledge graph
o Social graphs: Facebook, Twitter
Need for scalable querying, inference & ML
o Link prediction
o Knowledge base completion
o Predictive analytics
48. More Information
Big Data Integrator:
https://github.com/big-data-europe
README includes extensive
documentation, instructions and
information on supported
components
50. 2nd round of Societal Workshops
| Domain | Date | Location | Event |
|---|---|---|---|
| Transport | 22 September 2016 | Brussels | Collocated with Big Data for Transport, Tisa workshop |
| Food&Agri | 30 September 2016 | Brussels | Collocated with DG AGRI WP2018-20 stakeholder consultation |
| Energy | 4 October 2016 | Brussels | Collocated with EC H2020 Info Day on “Smart Grids and Storage” |
| Climate | 11 October 2016 | Brussels | Collocated with Melodies Project Event – Exploiting Open Data |
| Security | 18 October 2016 | Brussels | Standalone Workshop |
| Societies | 5 December 2016 | Cologne | Collocated with EDDI16, 8th Annual European DDI User Conference |
| Health | 9 December 2016 | Brussels | Standalone Workshop |
51. Other Activities
Fresh set (7) of Societal Workshops in 2017
Various SC-focussed and general hangouts, follow!
o Apache Flink & BDE (20 Oct) – available online
o BDVA & BDE Webinar planned early next year
o Keep track on BDE Website (Events)
54. SANSA: Read Write Layer
Ingest RDF and OWL data in different formats
using Jena / OWL API style interfaces
Represent data in multiple formats (e.g. RDD, DataFrames, GraphX, Tensors)
Allow transformation among these formats
Compute dataset statistics and apply functions to
URIs, literals, subjects, objects → Distributed
LODStats
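As a rough idea of what "dataset statistics" means here, the sketch below counts a few simple properties of a list of triples. It is a single-process Python illustration under stated assumptions: the sample triples and the rule "objects starting with http are URIs" are made up for the example, and neither the distributed execution nor the actual LODStats criteria are reproduced.

```python
from collections import Counter

# Tiny hypothetical dataset of (subject, predicate, object) triples.
# Objects starting with "http" are treated as URIs, everything else as literals.
TRIPLES = [
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/knows", "http://example.org/b"),
    ("http://example.org/b", "http://xmlns.com/foaf/0.1/name", "Bob"),
]


def dataset_statistics(triples):
    """Compute a few simple counts over an in-memory list of triples."""
    predicate_usage = Counter(p for _, p, _ in triples)
    distinct_subjects = {s for s, _, _ in triples}
    literal_objects = sum(1 for _, _, o in triples if not o.startswith("http"))
    return {
        "triples": len(triples),
        "distinct_subjects": len(distinct_subjects),
        "literal_objects": literal_objects,
        "predicate_usage": dict(predicate_usage),
    }


stats = dataset_statistics(TRIPLES)
print(stats)
```

Each of these counts is a map-and-aggregate over the triples, which is why such statistics lend themselves to distributed computation.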
55. SANSA: Query Layer
Make generic queries efficient and fast using:
o Intelligent indexing
o Splitting strategies
o Distributed Storage
o Distributed/ Federated Querying
Early work in progress: query evaluation (SPARQL-to-SQL approaches, Virtual Views)
Provision of W3C SPARQL compliant endpoint
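To make the SPARQL-to-SQL idea concrete, here is a deliberately small sketch that translates a single triple pattern into SQL over one `triples(s, p, o)` table and runs it with SQLite. It is not SANSA's query layer: the table layout, the function `pattern_to_sql` and the sample data are assumptions for illustration, and real approaches also handle joins, filters and partitioned storage.

```python
import sqlite3


def pattern_to_sql(subject, predicate, obj):
    """Translate one triple pattern into SQL over a triples(s, p, o) table.

    Terms starting with '?' are variables and become selected columns;
    all other terms are constants and become WHERE conditions.
    """
    selected, conditions, params = [], [], []
    for column, term in (("s", subject), ("p", predicate), ("o", obj)):
        if term.startswith("?"):
            # Variable names are assumed to be plain identifiers.
            selected.append(f"{column} AS {term[1:]}")
        else:
            conditions.append(f"{column} = ?")
            params.append(term)
    sql = "SELECT " + (", ".join(selected) or "1") + " FROM triples"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Hypothetical in-memory triple table used to try the translation.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
connection.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:a", "foaf:name", "Alice"),
        ("ex:b", "foaf:name", "Bob"),
        ("ex:a", "foaf:knows", "ex:b"),
    ],
)

# SPARQL equivalent: SELECT ?person ?name WHERE { ?person foaf:name ?name }
sql, params = pattern_to_sql("?person", "foaf:name", "?name")
rows = connection.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Constants are passed as bound parameters rather than spliced into the SQL string, so only the variable names end up in the generated text.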
56. SANSA: Inference Layer
W3C Standards for Modelling: RDFS and
OWL
Parallel in-memory inference via rule-based
forward chaining
Beyond state of the art: dynamically build a
rule dependency graph for a rule set
→ Adjustable performance levels
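Rule-based forward chaining can be sketched in a few lines: rules are applied to the known triples again and again until nothing new is derived. The Python example below is sequential and covers only two RDFS entailment rules (rdfs9 and rdfs11 from the W3C RDF Semantics); it does not reproduce the parallel in-memory execution or the rule dependency graph mentioned above, and the sample facts are hypothetical.

```python
# Two RDFS rules, written as plain Python over (s, p, o) triples:
#   rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
#   rdfs9:  (x type A), (A subClassOf B)       -> (x type B)
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def forward_chain(facts):
    """Apply the two rules repeatedly until no new triple is derived."""
    closure = set(facts)
    while True:
        derived = set()
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for a, b in subclass_pairs:
            # rdfs11: chain subclass statements.
            for b2, c in subclass_pairs:
                if b == b2:
                    derived.add((a, SUBCLASS, c))
            # rdfs9: propagate instance types to the superclass.
            for x, p, cls in closure:
                if p == TYPE and cls == a:
                    derived.add((x, TYPE, b))
        new = derived - closure
        if not new:
            return closure
        closure |= new


# Hypothetical input facts.
facts = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
inferred = forward_chain(facts)
for triple in sorted(inferred - facts):
    print(triple)
```

The loop terminates because the set of derivable triples over a finite vocabulary is finite and each round either adds at least one triple or stops.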
57. SANSA: ML Layer
Distributed Machine Learning (ML) algorithms that
work on RDF data and make use of its structure /
semantics
Work in Progress:
o Tensor Factorization for e.g. KB completion (testing stage)
o Simple spatiotemporal analytics (idea stage)
o Graph Clustering (testing stage)
o Association rule mining (evaluation stage)
o Semantic Decision trees (idea stage)