This document discusses Big Data Europe (BDE), an open source big data platform. It provides an overview of BDE's goals, architecture, and applications. The key points are:
1) BDE's goals are to make it easy to install, develop for, deploy, and integrate big data applications. It aims to unlock the value of data through an open platform.
2) BDE supports a variety of frameworks and uses Docker to package components. Its architecture includes layers for resources, data, processing, and applications.
3) BDE is being applied to challenges in domains like health, transport, energy, and security. Examples analyze traffic patterns, perform predictive maintenance, and detect changes in infrastructure.
30. Platform installation
◎ Manual installation guide
◎ Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎ Screencasts
33. Platform development
◎ High level picture
o docker-compose.yml describes pipeline topology
◎ BDE provided components
o extend template image with your code
◎ New components
o build a Docker image for your component
o this is your own little "virtual machine" for your component (in practice a lightweight container)
◎ Sharing
o publish topology as Git repository
o publish new components on Docker Hub
34. Development
◎ Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable with custom algorithms/data
◎ Published components
o Image repositories on GitHub
o Automated builds on Docker Hub
o Documentation on BDE Wiki
38. Enhancing the Component
◎ Orchestrator required for initialization process (init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
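Since components may depend on each other, an orchestrator has to initialize them in a valid order. The following is a minimal sketch of that idea, not the actual init_daemon implementation: it assumes a hypothetical dependency map between illustrative component names and uses Python's standard `graphlib` module (Python 3.9+) to compute start-up waves.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each component maps to the set of components it depends on.
# These names are illustrative only, not BDE's actual init_daemon configuration.
DEPENDENCIES = {
    "hdfs-namenode": set(),
    "hdfs-datanode": {"hdfs-namenode"},
    "spark-master": {"hdfs-namenode"},
    "spark-worker": {"spark-master", "hdfs-datanode"},
    "my-job": {"spark-worker"},
}


def startup_waves(deps):
    """Group components into waves; all components in one wave can be
    initialized in parallel once every earlier wave has finished."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real orchestrator would call done() only after each component
        # reports that its initialization has finished.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Running it prints four waves: the namenode first, then the datanode and Spark master together, then the worker, then the job. A component that needs manual intervention could, as an assumption beyond what the slide describes, be modelled as an extra node that is only marked done once a user confirms it.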
40. Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology
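As a rough illustration of how one Compose file carries both aspects, the sketch below keeps a stack as a Python dictionary, with `environment` entries standing for component configuration and `depends_on` entries for the application topology, and renders it as Compose-style YAML. The service names, the `example/...` images and the environment variables are hypothetical placeholders, not published BDE images or their settings.

```python
import json

# Hypothetical three-component stack. Per service:
#   "environment" -> component configuration
#   "depends_on"  -> part of the application topology
STACK = {
    "namenode": {
        "image": "example/hadoop-namenode:latest",
        "environment": {"CLUSTER_NAME": "demo"},
    },
    "spark-master": {
        "image": "example/spark-master:latest",
        "depends_on": ["namenode"],
    },
    "my-app": {
        "image": "example/my-spark-app:latest",  # template image extended with your own code
        "environment": {"SPARK_MASTER_URL": "spark://spark-master:7077"},
        "depends_on": ["spark-master"],
    },
}


def to_compose_yaml(stack):
    """Render the stack as Compose-style YAML (services / image / environment / depends_on)."""
    # Reject topologies that point at services the stack does not define.
    missing = {dep for svc in stack.values() for dep in svc.get("depends_on", []) if dep not in stack}
    if missing:
        raise ValueError(f"unknown services in depends_on: {sorted(missing)}")
    lines = ["services:"]
    for name, svc in stack.items():
        lines.append(f"  {name}:")
        lines.append(f"    image: {svc['image']}")
        env = svc.get("environment", {})
        if env:
            lines.append("    environment:")
            for key, value in env.items():
                # json.dumps yields a double-quoted string, which is also valid YAML
                lines.append(f"      {key}: {json.dumps(value)}")
        deps = svc.get("depends_on", [])
        if deps:
            lines.append("    depends_on:")
            lines.extend(f"      - {dep}" for dep in deps)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_compose_yaml(STACK), end="")
```

Keeping configuration and topology in one description is what lets a whole stack be shared as a single file. Note that a plain `depends_on` list in Compose only orders container start-up and does not wait for a service to be ready, which is one reason a separate initialization step can be useful.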
48. Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data by adding meaning to it!
49. Semantic Data Lake (Ontario)
◎ Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎ Data Lake
o Add a semantic layer on top of the source datasets
o The data is semantically lifted using existing ontology terms
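The semantic lifting step can be pictured with a small sketch: raw, schema-less CSV rows from the swamp are turned into N-Triples statements whose predicates are existing vocabulary terms (schema.org `name`, W3C Basic Geo `lat`/`long`). This illustrates the general idea only and is not Ontario's actual mechanism; the column-to-term mapping, the sample rows and the `http://example.org/sensor/` base IRI are hypothetical.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/sensor/"  # hypothetical namespace for lifted records
CLASS_IRI = "http://schema.org/Place"

# Hypothetical mapping: raw column name -> (ontology term, optional XSD datatype)
COLUMN_TO_TERM = {
    "name": ("http://schema.org/name", None),
    "lat": ("http://www.w3.org/2003/01/geo/wgs84_pos#lat", XSD_DECIMAL),
    "lon": ("http://www.w3.org/2003/01/geo/wgs84_pos#long", XSD_DECIMAL),
}

# Hypothetical raw data as it might sit in the swamp (second row is incomplete)
RAW = "id,name,lat,lon\nA1,Ring road north,50.73,7.10\nA2,Old bridge,,\n"


def literal(value, datatype):
    """Build an N-Triples literal, escaping the characters that need it."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def lift(raw_csv):
    """Turn schema-less CSV rows into N-Triples lines that use ontology terms."""
    triples = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        subject = f"<{BASE}{quote(row['id'], safe='')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{CLASS_IRI}> .")
        for column, (predicate, datatype) in COLUMN_TO_TERM.items():
            if row.get(column):  # skip columns missing or empty in this record
                triples.append(f"{subject} <{predicate}> {literal(row[column], datatype)} .")
    return triples


if __name__ == "__main__":
    print("\n".join(lift(RAW)))
```

The raw file itself is left untouched; the lifted statements form the semantic layer on top of it, and records from other sources mapped to the same terms can then be queried together.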
54. BDE vs Hadoop distributions
| Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE |
|---|---|---|---|---|---|
| File system | HDFS | HDFS | NFS | HDFS | HDFS |
| Installation | Native | Native | Native | Native | Lightweight virtualization |
| Plug & play components (no rigid schema) | No | No | No | No | Yes |
| High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self-healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery |
| Cost | Commercial | Commercial | Commercial | Free | Free |
| Scaling | Freemium | Freemium | Freemium | Free | Free |
| Addition of custom components | Not easy | No | No | No | Yes |
| Integration testing | Yes | Yes | Yes | Yes | -- |
| Operating systems | Linux | Linux | Linux | Linux | All |
| Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom |
55. BDE vs Hadoop distributions
◎ BDE is not built on top of existing distributions
◎ Targets
o Communities
o Research institutions
◎ Bridges scientists and open data
◎ Multi-tier research efforts towards Smart Data