1. BIG DATA EUROPE
H2020 CSA (2015-17)
BDE PILOT INSTANTIATION
Ronald Siebes VU Amsterdam
Integrating Big Data, Software & Communities for Addressing
Europe’s Societal Challenges
09.12.2016
3. SC1: Life Sciences & Health
14-Dec-16 www.big-data-europe.eu
4. SC1: Life Sciences & Health
Partners:
A not-for-profit membership organization,
which supports and continues the development
of the information infrastructure created during
the Open PHACTS project of the
Innovative Medicines Initiative (IMI).
The VU Amsterdam was a key participant in the
Open PHACTS project, responsible for developing
the Linked-Data infrastructure.
Big Data Focus area: Large-scale heterogeneous pharma-research data
linking & integration
Selected Key Data assets: ACD Labs / ChemSpider, ChEBI, ChEMBL,
ConceptWiki, DrugBank, ENZYME, Gene Ontology, GO Annotation, SwissProt,
WikiPathways
8. SC1: Life Sciences & Health
Pilot 1: Duplicate Open PHACTS functionality on
the BDE infrastructure using Open Source
solutions
Reasons:
• In-house deployment becomes possible
• Applicable to other domains (e.g. agriculture)
• Access to extra BDE functionality (e.g. logging,
analysis)
9. SC1: Life Sciences & Health
BDE infrastructure
- Large scale RDF reasoning over 3 billion+ triples
- RESTful API
- Various front ends
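As a toy illustration of the kind of RDF reasoning listed above, the sketch below applies two RDFS-style rules (transitivity of subClassOf and inheritance of types) to a handful of in-memory triples. The example triples and the `infer` helper are illustrative assumptions, not part of the BDE or Open PHACTS code; reasoning over billions of triples would rely on a dedicated triple store rather than this nested loop.

```python
# Toy forward-chaining reasoner over (subject, predicate, object) triples.
# The triples below are made-up examples, not Open PHACTS data.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples):
    """Return the closure of `triples` under two RDFS-style rules."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if o != s2 or p2 != SUBCLASS:
                    continue
                # (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))
                # (x type B), (B subClassOf C) => (x type C)
                elif p == TYPE:
                    new.add((s, TYPE, o2))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


facts = {
    ("ex:Aspirin", TYPE, "ex:Drug"),
    ("ex:Drug", SUBCLASS, "ex:Chemical"),
    ("ex:Chemical", SUBCLASS, "ex:Entity"),
}
inferred = infer(facts)
print(("ex:Aspirin", TYPE, "ex:Entity") in inferred)
```

The loop stops once a full pass adds no new triple, so the result contains the original facts plus everything derivable from the two rules.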
11. SC2: Food & Agriculture
Partners:
FAO, the largest autonomous agency within the
United Nations system and one of the main
players in the agricultural information
community.
Big Data Focus area: Large-scale distributed agricultural data integration
Selected Key Data assets: INFOODS, AQUASTAT, Green Learning Network
(GLN), Agricultural Bibliography Network (ABN), AGROVOC, AquaMaps, FishBase
Semantic Web Company (SWC) is a technology provider
headquartered in Vienna (Austria). SWC supports
organizations from all industrial sectors worldwide in
improving their information management. Its core product
extracts meaning from big data by making use of linked
data technologies.
13. SC2: Food & Agriculture
Pilot focus area:
Viticulture
(from the Latin word for vine)
is the science, production,
and study of grapes.
It deals with the series of
events that occur in the vineyard.
14. SC2: Food & Agriculture
Pilot 2: Support advanced crop
data discovery, processing,
combining and visualization
from distributed and
heterogeneous data
repositories
Reasons:
• Vine and wine sector: emerging market in the EU
• Sustainability and biodiversity challenges: local varieties are being lost
• Exploitation of new grapevine varieties and clones for climate change
adaptation
• Quality and health status of viticultural products
• Contribution to human health (antioxidants, prevention of heart diseases
etc.)
• Wide variety of heterogeneous (and big) data
15. SC2: Food & Agriculture
BDE infrastructure tasks
- Large-scale data extraction and integration processing from external data sources
(tables, figures, texts)
- Analysis batch jobs for generating statistical data
- Rich query support combining various parameters (e.g. location, genotypes/phenotypes,
publications, soil data)
- Various front ends similar to PubMed
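The "rich query" task above, combining several parameters in one request, can be sketched as a filter over records. The field names, the sample rows and the `query` helper are hypothetical; they are not taken from the pilot's actual schema or API.

```python
# Hypothetical grapevine records; field names and values are illustrative only.
records = [
    {"variety": "A", "location": "Crete", "soil": "clay", "publications": 3},
    {"variety": "B", "location": "Crete", "soil": "sand", "publications": 0},
    {"variety": "C", "location": "Attica", "soil": "clay", "publications": 5},
]


def query(rows, **criteria):
    """Return rows matching every criterion.

    A criterion value may be a plain value (tested for equality) or a
    callable predicate applied to the field value.
    """
    def matches(row):
        for field, wanted in criteria.items():
            value = row.get(field)
            if callable(wanted):
                if not wanted(value):
                    return False
            elif value != wanted:
                return False
        return True

    return [row for row in rows if matches(row)]


hits = query(records, soil="clay", publications=lambda n: n >= 4)
print([r["variety"] for r in hits])
```

Accepting either a value or a predicate per field keeps simple equality queries short while still allowing range conditions.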
17. SC3: Energy
Partners:
A public entity supervised by the Ministry of
Environment, Energy and Climate Change in
Greece, founded in September 1987, active in the
fields of Renewable Energy Sources (RES),
Rational Use of Energy (RUE) and Energy Saving
(ES).
Big Data Focus area: Real-time turbine monitoring stream processing and
analytics
Selected Key Data assets: European Energy Exchange Data, smart meter
sensor data, gas/fuels market/price data, consumption statistics, stratigraphic
model data (geology, geophysics)
NCSR "Demokritos", the largest multidisciplinary research
centre of Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight Institutes.
19. SC3: Energy
Pilot 3: Operation,
maintenance and production
forecasting for wind turbines on
real-time sensor data.
Reasons:
• Current technology cannot deal with the full amount of available valuable data
• Economic benefit of predicting output and preventing damage (if one can
predict that a part is about to fail, damage to other parts can be
prevented)
• Large continuous stream of sensor data, well suited to testing our platform
20. SC3: Energy
Data:
- Raw sensor and SCADA data from a
given wind farm
- Third-party raw or synthetic data
- Analysis results from built-in analysis
modules
Processing:
• Near-real time execution of parameterized
models to return operational statistics,
including correlation analysis of data
across units
• Weekly execution of operational statistics
• Weekly execution of model
parametrization
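The "correlation analysis of data across units" mentioned under Processing could, in its simplest form, be a pairwise Pearson correlation between the readings of different turbines. The sketch below assumes that reading of the phrase; the turbine names and power values are made up, and the pilot's actual analysis modules are not described on the slide.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up power readings (kW) from three hypothetical turbines.
power = {
    "T1": [100.0, 120.0, 140.0, 160.0],
    "T2": [50.0, 60.0, 70.0, 80.0],
    "T3": [160.0, 140.0, 120.0, 100.0],
}

units = sorted(power)
for i, a in enumerate(units):
    for b in units[i + 1:]:
        print(a, b, round(pearson(power[a], power[b]), 3))
```

A unit whose readings stop correlating with its neighbours under the same wind conditions is one possible signal worth inspecting.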
22. SC4: Transport
Partners: The Fraunhofer Society is a German research organization
with 67 institutes spread throughout Germany, each
focusing on different fields of applied science.
Big Data Focus area: Real-time monitoring stream processing and analytics
Selected Key Data assets: European Energy Exchange Data, smart meter
sensor data, gas/fuels market/price data, consumption statistics, stratigraphic
model data (geology, geophysics)
The Centre for Research and Technology-Hellas (CERTH),
founded in 2000, is one of the leading research
centres in Greece. CERTH includes the Hellenic Institute of
Transport (HIT), which covers land, sea and air transportation as well
as sustainable mobility services.
ERTICO - ITS Europe is a partnership of around 100 companies
and institutions involved in the production of Intelligent Transport
Systems (ITS).
24. SC4: Transport
Pilot 4: Multisource data
collection for the provision of
accurate info-mobility and
advanced transport planning
service in Thessaloniki, Greece
Reasons:
• Congestion is a major problem in
Europe, especially in urban areas.
• Using real-time probe data to provide
accurate info-mobility services and
advanced transport planning leads to
better decisions.
• Using mobility data from multiple sources
presents significant challenges, both because
the datasets differ in content and in
spatio-temporal terms and because the data
must be collected and processed in real time.
25. SC4: Transport
Data:
• Traffic counts and speed (330 locations,
a data set every 1.5 – 5 minutes, 300k
records, 15 MB)
• Travel times from Bluetooth detectors
(43 locations, a data set every 15
minutes, 250k-300k records, 50 MB)
• Floating Car Data position and speed
(1200 vehicles, a data set every 2
minutes, 2M records, 200 MB)
• Check-in events from social networks
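To give a feel for how such periodic data sets might be combined, the sketch below averages floating-car speeds per road segment in two-minute windows. It assumes each position has already been matched to a road segment; the segment ids, the sample observations and the `mean_speed_per_window` helper are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 120  # two-minute windows, matching the reporting interval on the slide


def mean_speed_per_window(observations, window=WINDOW_SECONDS):
    """Average speed per (segment, window start) from (timestamp, segment, speed) tuples."""
    buckets = defaultdict(list)
    for timestamp, segment, speed in observations:
        window_start = timestamp - (timestamp % window)
        buckets[(segment, window_start)].append(speed)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


# Made-up observations: (seconds since midnight, road segment id, speed in km/h).
observations = [
    (0, "seg-1", 40.0),
    (60, "seg-1", 50.0),
    (130, "seg-1", 30.0),
    (90, "seg-2", 20.0),
]
averages = mean_speed_per_window(observations)
print(averages[("seg-1", 0)])
```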
27. SC5: Climate
Partners:
A public entity supervised by the Ministry of
Environment, Energy and Climate Change in
Greece, founded in September 1987, active in the
fields of Renewable Energy Sources (RES),
Rational Use of Energy (RUE) and Energy Saving
(ES).
Big Data Focus area: Enormous simulation time. Extremely complicated
computing model.
Selected Key Data assets: European Grid Infrastructure (EGI).
Access to several data centres hosted at CNRS-Lyon, NCSR-D Athens, INFN-Milan,
NIKhEF-Amsterdam.
NCSR "Demokritos", the largest multidisciplinary research
centre of Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight Institutes.
29. SC5: Climate
Pilot 5: Downscaling and retrieval
process on (raw) climate data
via user-defined parameters (e.g.
geographical areas, time
period, physical variables,
computational grids, time
steps)
Reasons:
• Providing climate model data serves an
important objective: assessing the potential
impacts of climate change on well-being, in
support of adaptation, prevention and mitigation
measures and other policy-making decisions.
• This awareness has led to the availability of
huge datasets
• Downscaling is a computationally intensive
process
30. SC5: Climate
Data:
• Earth System Grid Federation (ESGF) data:
• CMIP5 data (global climate model simulations)
• CORDEX data (regional climate model
simulations)
• NetCDF data
• European Centre for Medium-Range Weather
Forecasts (ECMWF) data
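The user-defined retrieval parameters of Pilot 5 (geographical area, time period, physical variable) amount to subsetting gridded data. The sketch below shows that selection on a few in-memory points. The `GridPoint` type, the `select` helper and all values are illustrative assumptions; real CMIP5 or CORDEX data would be read from NetCDF files with a suitable library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridPoint:
    lat: float
    lon: float
    year: int
    variable: str
    value: float


def select(points, variable, lat_range, lon_range, years):
    """Keep points of one variable inside a bounding box and a closed year range."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    first_year, last_year = years
    return [
        p for p in points
        if p.variable == variable
        and lat_min <= p.lat <= lat_max
        and lon_min <= p.lon <= lon_max
        and first_year <= p.year <= last_year
    ]


# Made-up values; "tas" is used here as a label for near-surface air temperature.
points = [
    GridPoint(38.0, 23.7, 2000, "tas", 291.2),
    GridPoint(38.0, 23.7, 2010, "tas", 291.9),
    GridPoint(52.4, 4.9, 2000, "tas", 283.5),
    GridPoint(38.0, 23.7, 2000, "pr", 1.1),
]
subset = select(points, "tas", lat_range=(34.0, 42.0), lon_range=(19.0, 29.0), years=(1995, 2005))
print(len(subset))
```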
33. SC6: Social Sciences
Partners:
CESSDA provides large-scale, integrated and
sustainable data services to the social sciences.
CESSDA is organised as a limited company under
Norwegian law, owned and financed by the
individual EU member states’ ministries of research
or delegated institutions.
Big Data Focus area: Statistical and research data linking & integration
Selected Key Data assets: Federated social sciences data catalogs, statistical data
from public data portals and statistical offices (e.g. Eurostat, UNESCO, World Bank)
NCSR "Demokritos", the largest multidisciplinary research
centre of Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight Institutes.
35. SC6: Social Sciences
Pilot 6: Citizens’ budget
at municipal level
Reasons:
• The budget is the most important document of
public policy
• Budget execution affects everyday lives
• Citizens are more involved at city level
• A platform that integrates
heterogeneous budget data (many
municipalities have their own data formats)
and generates infographics would benefit
citizens, the research community and
policy makers
36. SC6: Social Sciences
Data:
• Data stream from Greek municipalities, with codes that
are unique identifiers based on the national accounting
system for municipalities
• Data from 3 cities in Greece (highest detail)
• Updated several times a day (streams with
no memory), then converted into daily observations
• Available through an API or as CSV/XLS
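The conversion of intra-day updates into daily observations can be sketched as "keep the last update per budget code and day". This assumes that each update is a full snapshot of the current amount, which is one reading of "streams with no memory"; the budget codes, amounts and the `to_daily` helper are made up for the example.

```python
from datetime import date, datetime


def to_daily(updates):
    """Keep the latest value per (budget code, calendar day).

    `updates` is an iterable of (ISO timestamp, budget code, amount) tuples.
    Each update is treated as a full snapshot, so the last one of the day wins.
    """
    latest = {}
    for stamp, code, amount in updates:
        moment = datetime.fromisoformat(stamp)
        key = (code, moment.date())
        if key not in latest or moment >= latest[key][0]:
            latest[key] = (moment, amount)
    return {key: amount for key, (moment, amount) in latest.items()}


# Made-up updates: (timestamp, hypothetical budget code, amount in euros).
updates = [
    ("2016-12-01T09:00:00", "6011", 100.0),
    ("2016-12-01T17:30:00", "6011", 150.0),
    ("2016-12-02T08:15:00", "6011", 155.0),
    ("2016-12-01T12:00:00", "7135", 20.0),
]
daily = to_daily(updates)
print(daily[("6011", date(2016, 12, 1))])
```

If the stream instead carried increments rather than snapshots, the per-day values would need to be summed, not overwritten.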
38. SC7: Security
Partners:
The Centre supports the decision making of the
European Union in the field of the Common Foreign
and Security Policy (CFSP), by providing products
and services resulting from the exploitation of
relevant space assets and collateral data, including
satellite imagery and aerial imagery, and related
services.
NCSR "Demokritos", the largest multidisciplinary research
centre of Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight Institutes.
39. SC7: Security
Big Data Focus area: Image data analysis
Selected Key Data assets: Earth Observation data (e.g. Very High Resolution
Satellite Imagery acquired from commercial providers and governmental systems)
and collateral data for supporting CFSP/CSDP missions and operations
41. SC7: Security
Pilot 7: Ingestion of remote
sensing images and social
sensing data to detect and
verify man-made changes on
the Earth surface for security
applications
Reasons:
• Evacuation route planning
• Monitoring of critical infrastructures
• Border security
• Satellite image data is huge and
computationally intensive to compare
• Smart ‘focus’ algorithms are needed to
prioritize the analysis jobs
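One simple way to prioritize analysis jobs is a priority queue keyed on a relevance score, so that the most promising image tiles are compared first. The sketch below shows only that queueing step; the `JobQueue` class, the tile names and the scores are hypothetical, and how the pilot actually computes a "focus" score is not stated on the slide.

```python
import heapq
from itertools import count


class JobQueue:
    """Priority queue that hands out the highest-scoring analysis job first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: earlier submissions first

    def submit(self, job_id, score):
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._order), job_id))

    def next_job(self):
        if not self._heap:
            raise IndexError("no jobs queued")
        _, _, job_id = heapq.heappop(self._heap)
        return job_id


queue = JobQueue()
queue.submit("tile-A", score=0.2)
queue.submit("tile-B", score=0.9)
queue.submit("tile-C", score=0.9)
order = [queue.next_job() for _ in range(3)]
print(order)
```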
42. SC7: Security
Data:
• All data products are distributed in the SENTINEL
Standard Archive Format for Europe (SAFE)
• The SENTINEL-SAFE format wraps a folder
containing image data in a binary data format and
product metadata in XML
• Social media, demonstrated by consuming
Twitter streams
• News agencies, demonstrated by
consuming Reuters RSS feeds
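Consuming a news RSS feed comes down to reading the item elements of an RSS 2.0 document. The sketch below parses an inline sample instead of a live feed; the sample titles, links and the `parse_items` helper are invented for the example and say nothing about the actual Reuters feeds or the pilot's ingestion code.

```python
import xml.etree.ElementTree as ET

# Inline sample in RSS 2.0 layout; titles and links are invented for the example.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>
"""


def parse_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


items = parse_items(SAMPLE_FEED)
print(items[0])
```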