Prof. Yannis Ioannidis
“Athena” Research Center & University of Athens
BioMed
Oceans
Space & Earth
Culture Environment
OA Policies
Data Proc
OpenMinTeD
EXAREME
MaDIS
GRAPHOS
PAROS
CHESS
Optique
AITION/
TopMod
KDD/ML
MDP
OpenAIRE
MaDgIK Systems
DCV ML
ResAnal
HBP
Capsella
W-Dance
O-MinTeD
STE
G-kak^3
BB
EarthSrvr
V-Exhibit
EFG1914
Fut-TDM
OpenUP
WDAqua
RDA
StR-ESFRI
• Data provision Layer: Extract, Transform, Load (ETL), Anonymization & pre-processing of existing resources
– Imaging, Video
– Streaming Data
– Un/Semi/Structured Biomedical Data
– Legacy Data
– Simulation Models
– Digital Libraries (PubMed etc)
– Ontologies (UMLS, GO..)
– Clinician knowledge
• Federated data Layer & (open) research data infrastructures: Semantic Data modelling, Provenance & Integration
– Multi-modal, vertically integrated, distributed biomedical data
– Biomedical Info
– Registries & Metadata
– Simulation Models
– KDD Results
– Data Infrastructures: ESFRI Infrastructures (ICOS, EMSO, …); E-Infrastructures (OpenAIRE)
• Middleware Layer: Distributed execution of complex dataflows & distributed querying Engine
– Upper level declarative language and extensible UDFs
– Distributed execution on clouds and ad-hoc clusters
– Distributed Query Engine
– Ontology Based Data Access
– Data Processing: Distribution, Federation, Parallelism (EXAREME)
• Application Layer: Data (pre) processing and knowledge discovery platform
– MADRefine module: Data Preprocessing & Transformation; Curation & Validation
– AITION clustering & general KDD: SoA Machine Learning Algorithms; Latent Variable & Topic Modelling
– AITION simulation: Graphical Probabilistic modelling for Statistical simulation
– Data Analytics: Cleaning & curation (MADRefine); Modeling, Mining (AITION)
WHAT WHERE HOW WHY
OpenAIRE HUB
CERN
zenodo
Visualize - Manage
Enhanced Publications
Get support
(NOADs)
Linked Content
Statistics
+++
Search & Browse
Curate & collaborate
Deposit
Publications
& data
Research impact
Citations, usage
statistics
+++
Link Classify
De-duplicate Cite
Text Mine
APIs
Publication repositories
Institutional & Thematic
Open Access Journals
17,500,000 OA publications
700+ validated repositories
accessing >5K repos/OA journals
Data repositories
Data Journals
ResearchID (ORCID,
..)
OpenDOAR
…
CRIS
Systems
National funding
EC funding
Usage data
Metadata on publications
Metadata on data
Guidelines for Data
Providers & Open Data Pilot
Guidelines for Funding
Info
Guidelines for
Publications
OpenAIRE
• ICOS
• LIFEWATCH
• EMSO
• SIOS
• EURO-ARGO
• IAGOS
• EPOS
• EISCAT
• COPAL
• ACTRIS
• DANUBIUS_RI
• ICOS: Integrated Carbon Observation System
– Harmonized and high-precision scientific data on the carbon cycle and on greenhouse gas budget and perturbations
• EMSO: European Multi-disciplinary Seafloor and water-column Observatory
– Ocean observation systems for long-term, high-resolution, (near) real-time monitoring of environmental processes including natural hazards, climate change, and marine ecosystems
• SIOS: Svalbard Integrated Earth Observing System
– Arctic environmental and climate-related challenges
• EURO-ARGO: European contribution to ARGO
– Ocean observation for oceanography and climate
• IAGOS: In-service Aircraft for a Global Observing System
– Atmospheric composition, aerosol and cloud particles
• EISCAT_3D: European Incoherent Scatter
– Radar systems for the upper atmosphere, the ionosphere and the Aurora Borealis
• EUFAR-COPAL: European Facility for Airborne Research
– Airborne research for the environmental and geosciences in Europe
• ACTRIS: Aerosols, Clouds and Trace gases RI
– Supports models and forecast systems by offering high-quality data on atmospheric gases, clouds, and trace gases
• DANUBIUS-RI: International Center for Advanced Studies on River-Sea Systems
– Addressing conflicts between society’s demands, environmental change and environmental protection in river–sea systems worldwide
[Figure repeated: layered architecture (Data provision, Federated data, Middleware, and Application layers); here the ESFRI Infrastructures example is ELIXIR]
[Figure: system architecture. Components: Gateway, Master, Workers, Execution Engine, Optimization Engine, Data Connector. Networks: Fast Local Net, P2P Net]
• Parallel / distributed execution of complex data flows targeting data analysis and mining
• Data remain at source (hospital) – dataflow / query travels
• Privacy preserving: transmit only aggregated information from hospital (sufficient statistics)
• Advanced data compression, on the data partitioning
• Query Language: SQL + UDFs (in Python)
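To illustrate what "SQL + UDFs (in Python)" can look like in practice, here is a minimal sketch that uses Python's standard `sqlite3` module as a stand-in engine. It is not the middleware's own API: the table `measurements`, its columns, and the `bmi` function are hypothetical names chosen for the example.

```python
import sqlite3


def bmi(weight_kg, height_m):
    """Hypothetical user-defined function: body-mass index, rounded to 1 decimal."""
    if not weight_kg or not height_m:
        return None  # propagate missing values as SQL NULL
    return round(weight_kg / (height_m ** 2), 1)


# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")

# Register the Python function so that SQL can call it as bmi(w, h).
conn.create_function("bmi", 2, bmi)

conn.execute("CREATE TABLE measurements (id INTEGER, weight_kg REAL, height_m REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(1, 70.0, 1.75), (2, 90.0, 1.80)],
)

# The declarative part stays in SQL; the custom logic lives in the Python UDF.
rows = conn.execute(
    "SELECT id, bmi(weight_kg, height_m) FROM measurements ORDER BY id"
).fetchall()
print(rows)
```

The split shown here (declarative query plus small Python functions) is the general pattern; how a distributed engine ships and optimizes such functions is outside this sketch.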
Query Federation
• Decompose query into local and global parts
• Example query: “count, avg, std” of a measurement (m-name) over tables (id, m-name, m-value) held at sites 1 … N
• Run local queries, L: “Σx, Σx², N”: each site returns partial aggregated results (m-name, Σx, Σx², N)
• Run global queries, G: “N, avg, std”: the partial results are combined into the final result (m-name, N, avg, std)
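The local/global decomposition can be sketched in a few lines of Python. This is an illustrative sketch, not the engine's code: the function names are hypothetical, and the population standard deviation is assumed because the slides do not say which variant is used.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple


class Partial(NamedTuple):
    """Sufficient statistics returned by one site: N, Σx, Σx²."""
    n: int
    sum_x: float
    sum_x2: float


def local_query(values: Iterable[float]) -> Partial:
    """Runs at each site; only aggregates leave the site, never raw values."""
    values = list(values)
    return Partial(len(values), sum(values), sum(v * v for v in values))


def global_query(partials: List[Partial]) -> Tuple[int, float, float]:
    """Combines the partial aggregates into (N, avg, std)."""
    n = sum(p.n for p in partials)
    sum_x = sum(p.sum_x for p in partials)
    sum_x2 = sum(p.sum_x2 for p in partials)
    avg = sum_x / n
    # Population variance: E[x²] - (E[x])²; clamp tiny negatives from rounding.
    variance = max(sum_x2 / n - avg * avg, 0.0)
    return n, avg, math.sqrt(variance)


# Two hypothetical sites holding values of the same measurement.
site_values = [[1.0, 2.0, 3.0], [4.0, 5.0]]
partials = [local_query(v) for v in site_values]
n, avg, std = global_query(partials)
print(n, avg, std)
```

Because N, Σx and Σx² add up across sites, the global result equals what a single query over the pooled data would return, while each site transmits only three numbers per measurement.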
• Distributed elastic execution
– Parallel aggregations, unions, and joins
– Resources are reserved dynamically
• Iterative dataflow execution
– Support machine learning algorithms
• Novel query optimization techniques
– SQL with User Defined Functions
– Arbitrary user code with unknown properties
– Privacy-aware query optimization
• Time and money
• 2-dimensional optimization
• Quantum: 1 hour
• Simple map-reduce flow
– A: 1 hour, B: 10 minutes, C: 1 hour

Schedule                 Time (hours)   Money (resource hours)   Winner
One host for all ops     18.60          19                       5x cheaper
Different host per op    2.16           102                      9x faster
• Optimal dataflow scheduling
• Skyline of all Pareto optimal plans
[Figure: skyline plot of plans, axes Time and Money]
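A skyline over time and money keeps every plan that no other plan beats on both dimensions. The sketch below is a generic Pareto filter, not the optimizer described in the slides; the first two plans use the figures from the schedule table, and the third plan is a hypothetical addition that shows a dominated plan being dropped.

```python
from typing import List, Tuple

# (name, time in hours, money in resource hours)
Plan = Tuple[str, float, float]


def dominates(a: Plan, b: Plan) -> bool:
    """True if plan a is no worse than b on both dimensions and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def skyline(plans: List[Plan]) -> List[Plan]:
    """Returns the Pareto-optimal plans (those not dominated by any other plan)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


plans = [
    ("one host for all ops", 18.60, 19.0),
    ("different host per op", 2.16, 102.0),
    ("hypothetical wasteful plan", 20.0, 110.0),  # assumption, not from the slides
]

for name, time_h, money in skyline(plans):
    print(f"{name}: {time_h} h, {money} resource hours")
```

Both schedules from the table survive, since one is cheaper and the other is faster; choosing between them is a user preference rather than an optimization result.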
[Figure repeated: layered architecture, with the Middleware Layer labelled EXAREME; the ESFRI Infrastructures example is ELIXIR]
[Figure: analysis workflow, combining a BOTTOM-UP path and a TOP-DOWN path (domain knowledge & assumptions, clinical workflows)]
• Data Preprocessing, Curation & Validation: raw data from biomarker-based personalized acquisition; transformed & validated data
• Data Mining / Knowledge Discovery: disease signatures; patient grouping & similarity; variable dependencies & causality
• Model & Verification: create statistical simulation models; simulation models
• Reasoning & decision support: personalized model guided medicine for a particular patient; unknown / missing data (predict value of missing variable); individualized diagnosis, prognosis & treatment plan
Big Data Analytics
• Capture
• multi source
• multi modal
• multi system
Management
• Data provenance
• Sanitization
(Anonymization)
• Process
• aggregate
• distributed
Analysis
• Privacy preserving
• Algorithms
• Mechanisms
Modeling
• Personalized
• De-identified
Practice
• Ethics
• Privacy
SEX AgeOnSet
ILAR
JntActDis
GlbActDis
DisDur JntLOM GenEval
CHAQ ESR CRP ANA
MEFNIL2RAPoznanski
NSAID STEROID DMARD BIOLOGIC
JADI
JntLOMDiff CHAQDiff
ESRDiff CRPDiff
JntActDisDiff GlbActDisDiff
GenEvalDiff
BOXValidatedOut
Adapted Sharp/
van der Heijde
Score Out JADIOut
Extended BOX
Predictors
Medication
Outcome
demographics imaging genetics
clinical
lab
Synovial
volume
OTHER
• Extensible validation and data transformation engine
• Interactive and efficient web-based interface
• Data cleaning:
◦ Typographical error detection (numeric & alphanumeric)
◦ Data cleaning rules (functional dependencies, conditional functional dependencies, denial constraints)
◦ New/derived columns (discretization, computation of medical scores)
◦ Data visualisation (bar charts, pie charts, scatter plots, line charts, etc.)
• End-to-end data analysis workflow support (rerun experiments, reproduce results)
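As an illustration of a data cleaning rule, the following sketch checks a functional dependency: if the dependency `lhs → rhs` holds, rows that agree on the `lhs` columns must agree on `rhs`. This is a generic check written for this example, not the MADRefine interface; the column names `patient_id` and `birth_year` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Set, Tuple


def fd_violations(
    rows: List[Dict[str, Any]], lhs: Sequence[str], rhs: str
) -> Dict[Tuple[Any, ...], Set[Any]]:
    """Returns every lhs value combination that maps to more than one rhs value."""
    seen: Dict[Tuple[Any, ...], Set[Any]] = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        seen[key].add(row[rhs])
    # A key with two or more distinct rhs values violates lhs -> rhs.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical records: patient 1 appears with two different birth years.
records = [
    {"patient_id": 1, "birth_year": 2001},
    {"patient_id": 1, "birth_year": 2010},
    {"patient_id": 2, "birth_year": 1999},
]

violations = fd_violations(records, lhs=["patient_id"], rhs="birth_year")
print(violations)
```

A conditional functional dependency would apply the same check only to rows matching a condition, and a denial constraint generalizes it to arbitrary forbidden combinations of values.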
• Disease signatures: Latent factors (patterns) that characterize a disease
◦ Distribution of most relevant variables for disease (e.g., biomarkers)
◦ Multiple variables per signature, multiple signatures per disease
• Patient Cluster: Homogeneous patient group with common characteristics
• Patient Similarity: Patients “like” me or mine (patient or clinician role)
◦ “like” = according to different criteria (e.g., allocation on disease signatures)
Similarity &
Graph clustering
Topics & allocations
Modelling
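One way to make "patients like me" concrete is to compare patients by their allocation on disease signatures. The sketch below assumes each patient is described by a vector of allocation weights and uses cosine similarity as the criterion; the slides do not name a specific measure, so both the measure and the sample data are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two allocation vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    target_id: str, allocations: Dict[str, Sequence[float]], top: int = 1
) -> List[str]:
    """Ranks the other patients by similarity to the target and returns the top ids."""
    target = allocations[target_id]
    scored = [
        (cosine_similarity(target, vector), patient_id)
        for patient_id, vector in allocations.items()
        if patient_id != target_id
    ]
    scored.sort(reverse=True)
    return [patient_id for _, patient_id in scored[:top]]


# Hypothetical allocations of three patients over three disease signatures.
allocations = {
    "p1": [0.8, 0.1, 0.1],
    "p2": [0.7, 0.2, 0.1],
    "p3": [0.1, 0.1, 0.8],
}
print(most_similar("p1", allocations))
```

The same pairwise scores could feed a graph clustering step to form patient groups, with an edge between two patients whenever their similarity exceeds a chosen threshold.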
• Bayesian Net: Directed Acyclic Graph + Conditional Probability Distributions
◦ Features (Nodes) & Dependencies (Edges)
◦ Compact representation of joint data distribution

Given: patient data + domain knowledge

Patient  X1  X2  X3  X4  X5  X6  X7  X8
1        Y   N   N   Y   Y   Y   N   Y
:
1000     N   N   Y   N   N   Y   N   N

Find: a network over X1 … X8
[Figure: example network; node labels include Smoking, Lung cancer, Chronic bronchitis, Genetic Factor, Allergy]
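The "compact representation" comes from factorizing the joint distribution along the graph: each node needs only a conditional distribution given its parents. The sketch below shows this for a three-node network. The edges (Smoking as the parent of both other nodes) and all probability values are assumptions made for illustration; they are not read off the slide's figure.

```python
from itertools import product
from typing import Dict, Tuple

# Assumed structure: node -> tuple of parent nodes.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "Smoking": (),
    "LungCancer": ("Smoking",),
    "ChronicBronchitis": ("Smoking",),
}

# Assumed conditional probability tables: P(node = True | parent values).
CPT: Dict[str, Dict[Tuple[bool, ...], float]] = {
    "Smoking": {(): 0.3},
    "LungCancer": {(True,): 0.1, (False,): 0.01},
    "ChronicBronchitis": {(True,): 0.4, (False,): 0.1},
}


def joint_probability(assignment: Dict[str, bool]) -> float:
    """P(assignment) as the product of P(node | parents) over all nodes."""
    probability = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(assignment[p] for p in parents)
        p_true = CPT[node][parent_values]
        probability *= p_true if assignment[node] else 1.0 - p_true
    return probability


example = {"Smoking": True, "LungCancer": True, "ChronicBronchitis": False}
print(joint_probability(example))

# Sanity check: the probabilities of all 2^3 assignments sum to 1.
nodes = list(PARENTS)
total = sum(
    joint_probability(dict(zip(nodes, values)))
    for values in product([True, False], repeat=len(nodes))
)
print(total)
```

Here five numbers describe a distribution that would otherwise need seven (2³ − 1) free parameters; the saving grows quickly with the number of variables when each node has few parents.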
Final DAG (based on MCMC & DP, threshold = 0.5)
[Figure: DAG and matrix plot over the variables Age, ParCHD, Procedures, ExIntoler, Cyanosis, CPBP, CPArrhy, CPConcl, CPTermRsn, BSA, TPVRegurg, TriRegurg, RVD, RedRV, PSMotion, RestrPatt, AVBlock, SupravArrhy, VentricArrhy; scale 0 to 1]
Modelling Dependency Analysis
Inference
Increased RVD is associated with worse values in every MR aspect (TVPRegurg, PSMotion, RedRV, AV_Block, TriRegurg)
Brussels – 6-7 May 2014
MyHealthMyData
[Figure: data access levels]
• Data: Raw Personal Data; Raw Anonymised; Summary Anonymised
• Access: Private; Controlled Access; Public
• Users: Bioinformatics services for All Users; Doctors (and Patients?); Researchers
• Obtaining consent is not straightforward
• Anonymisation: necessary, rather complicated, ensuring neither privacy nor data value
• “Blending in a crowd” and k-anonymity: privacy is a property of the sanitization, not of its output
• How do we define privacy?
• data publishing: “Sanitization” (anonymisation) hiding individual info (k-anonymity) but preserving (sufficient) aggregated statistics
• data mining: Specific algorithms (usually operating in two phases) for classification, clustering, association rules, …
• mechanisms: Differential Privacy & Crowd-Blending Privacy perturb data or add noise, ensuring an ε-indistinguishable output distribution
• encryption: Fully Homomorphic Encryption (FHE), so that computations and queries run over encrypted data
• decentralization: Blockchain to protect personal data, with decentralized personal data management in which users own and control their data
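Two of the approaches listed above can be shown in miniature. The sketch below is illustrative only and assumes a simple setting: `is_k_anonymous` checks that every combination of quasi-identifier values is shared by at least k records, and `private_count` applies the Laplace mechanism to a counting query (sensitivity 1, so the noise scale is 1/ε). The function names and the sample records are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, Iterable, Sequence


def is_k_anonymous(records: Iterable[Sequence[Hashable]], k: int) -> bool:
    """True if each quasi-identifier combination occurs in at least k records."""
    counts = Counter(tuple(record) for record in records)
    return all(count >= k for count in counts.values())


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Samples Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query: sensitivity 1, noise scale 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical generalized quasi-identifiers: (age band, sex).
released = [("30-39", "F"), ("30-39", "F"), ("40-49", "M"), ("40-49", "M")]
print(is_k_anonymous(released, k=2))
print(is_k_anonymous(released, k=3))

# A seeded generator keeps the example reproducible; real use needs fresh randomness.
rng = random.Random(0)
print(private_count(100, epsilon=0.5, rng=rng))
```

Smaller ε means more noise and stronger protection, so the choice of ε is a trade-off between privacy and the usefulness of the released statistic.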
• Big data is not only about size
• Data is distributed, data is heterogeneous
• Processing goes to data, not data to processing
• ICT (Data management & processing) advances
◦ Data compression
◦ Federated / privacy-preserving processing
◦ Scalable parallel / distributed processing
◦ Data curation (otherwise: garbage in, garbage out)
◦ Text and data analytics
• http://www.madgik.di.uoa.gr
• https://www.humanbrainproject.eu
• http://www.md-paedigree.eu/
• http://www.openaire.eu
• http://www.optique-project.eu

BDE SC3.3 Workshop - Data management in WT testing and monitoring BigData_Europe
 
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition MonitoringBigData_Europe
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overviewBigData_Europe
 
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BigData_Europe
 
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics  BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics BigData_Europe
 
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...BigData_Europe
 
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)BigData_Europe
 
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)BigData_Europe
 
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)BigData_Europe
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
 
BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)
BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)
BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)BigData_Europe
 
SC1 Hangout: Updating public databases: Automation and other challenges for c...
SC1 Hangout: Updating public databases: Automation and other challenges for c...SC1 Hangout: Updating public databases: Automation and other challenges for c...
SC1 Hangout: Updating public databases: Automation and other challenges for c...BigData_Europe
 

Mais de BigData_Europe (20)

Luigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator PlatformLuigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator Platform
 
Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4
 
Rajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectRajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO Project
 
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
 
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
 
BDE SC3.3 Workshop - Agenda
 BDE SC3.3 Workshop - Agenda BDE SC3.3 Workshop - Agenda
BDE SC3.3 Workshop - Agenda
 
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
 BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re... BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
 
BDE SC3.3 Workshop - Data management in WT testing and monitoring
 BDE SC3.3 Workshop - Data management in WT testing and monitoring  BDE SC3.3 Workshop - Data management in WT testing and monitoring
BDE SC3.3 Workshop - Data management in WT testing and monitoring
 
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
 
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics  BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
 
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
 
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
 
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
 
BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)
BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)
BDE SC1 Workshop 3 - Big Data Europe (Simon Scerri)
 
SC1 Hangout: Updating public databases: Automation and other challenges for c...
SC1 Hangout: Updating public databases: Automation and other challenges for c...SC1 Hangout: Updating public databases: Automation and other challenges for c...
SC1 Hangout: Updating public databases: Automation and other challenges for c...
 

Último

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Último (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Big data in the research life cycle: technologies, infrastructures, policies

  • 1. Prof. Yannis Ioannidis “Athena” Research Center & University of Athens
  • 2. BioMed Oceans Space & Earth Culture Environment OA Policies Data Proc OpenMinTeD
  • 4. Layered architecture (diagram labels: WHAT, WHERE, HOW, WHY)
      ◦ Data provision layer: Extract, Transform, Load (ETL), anonymization & pre-processing of existing resources. Sources: Imaging, Video; Streaming Data; Un/Semi/Structured Biomedical Data; Legacy Data; Simulation Models; Digital Libraries (PubMed etc.); Ontologies (UMLS, GO, ..); Clinician knowledge
      ◦ Federated data layer & (open) research data infrastructures: semantic data modelling, provenance & integration. Multi-modal, vertically integrated, distributed biomedical data: Biomedical Info; Registries & Metadata; Simulation Models; KDD Results
      ◦ Middleware layer: distributed execution of complex dataflows & distributed querying engine. Upper-level declarative language and extensible UDFs; Distributed execution on clouds and ad-hoc clusters; Distributed Query Engine; Ontology Based Data Access
      ◦ Application layer: data (pre)processing and knowledge discovery platform. MADRefine module (Data Preprocessing & Transformation, Curation & Validation); AITION clustering & general KDD (SoA Machine Learning Algorithms, Latent Variable & Topic Modelling); AITION simulation (graphical probabilistic modelling for statistical simulation)
      ◦ Data Processing: Distribution, Federation, Parallelism (EXAREME)
      ◦ Data Analytics: Cleaning & curation (MADRefine); Modeling, Mining (AITION)
      ◦ Data Infrastructures: ESFRI Infrastructures (ICOS, EMSO, …); E-Infrastructures (OpenAIRE)
  • 5. OpenAIRE HUB (diagram labels: CERN, zenodo)
      ◦ User-facing functions: Search & Browse; Curate & collaborate; Deposit publications & data; Research impact (citations, usage statistics); Visualize - Manage; Enhanced Publications; Get support (NOADs); Linked Content; Statistics; +++
      ◦ Processing functions: Link; Classify; De-duplicate; Cite; Text Mine; APIs
      ◦ Content sources: Publication repositories (institutional & thematic); Open Access journals; Data repositories; Data journals; ResearchID (ORCID, ..); OpenDOAR; CRIS systems; National funding; EC funding
      ◦ Scale: 17,500,000 OA publications; 700+ validated repositories; accessing >5K repos/OA journals
      ◦ Collected information: Usage data; Metadata on publications; Metadata on data
      ◦ OpenAIRE guidelines: Guidelines for Data Providers & Open Data Pilot; Guidelines for Funding Info; Guidelines for Publications
  • 6. ICOS, LIFEWATCH, EMSO, SIOS, EURO-ARGO, IAGOS, EPOS, EISCAT, COPAL, ACTRIS, DANUBIUS_RI
  • 7. ICOS, EMSO
      ◦ ICOS: Integrated Carbon Observation System. Harmonized, high-precision scientific data on the carbon cycle and on greenhouse gas budgets and perturbations
      ◦ EMSO: European Multi-disciplinary Seafloor and water-column Observatory. Ocean observation systems for long-term, high-resolution, (near) real-time monitoring of environmental processes, including natural hazards, climate change, and marine ecosystems
  • 8. SIOS, EURO-ARGO, IAGOS
      ◦ SIOS: Svalbard Integrated Earth Observing System. Arctic environmental and climate-related challenges
      ◦ EURO-ARGO: European contribution to ARGO. Ocean observation for oceanography and climate
      ◦ IAGOS: In-service Aircraft for a Global Observing System. Atmospheric composition, aerosol and cloud particles
  • 9. EISCAT_3D, EUFAR-COPAL
      ◦ EISCAT_3D: European Incoherent Scatter. Radar systems for the upper atmosphere, the ionosphere and the Aurora Borealis
      ◦ EUFAR-COPAL: European Facility for Airborne Research. Airborne research for the environmental and geosciences in Europe
  • 10. ACTRIS, DANUBIUS-RI
      ◦ ACTRIS: Aerosols, Clouds and Trace gases RI. Supports models and forecast systems by offering high-quality data for atmospheric gases, clouds, and trace gases
      ◦ DANUBIUS-RI: Int’l Center for Advanced Studies on River-Sea Systems. Addressing conflicts between society’s demands, environmental change and environmental protection in river–sea systems worldwide
  • 11. Layered architecture
      ◦ Data provision layer: Extract, Transform, Load (ETL), anonymization & pre-processing of existing resources. Sources: Imaging, Video; Streaming Data; Un/Semi/Structured Biomedical Data; Legacy Data; Simulation Models; Digital Libraries (PubMed etc.); Ontologies (UMLS, GO, ..); Clinician knowledge
      ◦ Federated data layer & (open) research data infrastructures: semantic data modelling, provenance & integration. Multi-modal, vertically integrated, distributed biomedical data: Biomedical Info; Registries & Metadata; Simulation Models; KDD Results
      ◦ Application layer: data (pre)processing and knowledge discovery platform. MADRefine module (Data Preprocessing & Transformation, Curation & Validation); AITION clustering & general KDD (SoA Machine Learning Algorithms, Latent Variable & Topic Modelling); AITION simulation (graphical probabilistic modelling for statistical simulation)
      ◦ Data Analytics: Cleaning & curation (MADRefine); Modeling, Mining (AITION)
      ◦ Data Infrastructures: ESFRI Infrastructures (ELIXIR); E-Infrastructures (OpenAIRE)
      ◦ Middleware layer: distributed execution of complex dataflows & distributed querying engine. Upper-level declarative language and extensible UDFs; Distributed execution on clouds and ad-hoc clusters; Distributed Query Engine; Ontology Based Data Access
      ◦ Data Processing: Distribution, Federation, Parallelism (EXAREME)
  • 12. Architecture diagram (labels): Gateway; Master; Workers; Execution Engine; Optimization Engine; Data Connector; Fast Local Net; P2P Net
  • 13.
      ◦ Parallel / distributed execution of complex data flows targeting data analysis and mining
      ◦ Data remain at source (hospital); the dataflow / query travels
      ◦ Privacy preserving: transmit only aggregated information from the hospital (sufficient statistics)
      ◦ Advanced data compression, on the data partitioning
      ◦ Query language: SQL + UDFs (in Python)
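
To make the "SQL + UDFs (in Python)" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module as a stand-in engine. This is not the EXAREME API; the `measurements` table, its columns, and the `zscore` function are hypothetical names chosen for illustration only.

```python
import sqlite3

def zscore(value, mean, std):
    """Scalar UDF (hypothetical): standardize one measurement."""
    if value is None or not std:
        return None
    return (value - mean) / std

# In-memory database standing in for one data source.
conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL queries can call it by name.
conn.create_function("zscore", 3, zscore)

conn.execute("CREATE TABLE measurements (id INTEGER, m_name TEXT, m_value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(1, "crp", 2.0), (2, "crp", 4.0), (3, "crp", 6.0)],
)

# Plain SQL calling the user-defined function.
rows = conn.execute(
    "SELECT id, zscore(m_value, 4.0, 2.0) FROM measurements ORDER BY id"
).fetchall()
print(rows)
```

The point of the sketch is only the division of labour: the declarative part stays in SQL, while arbitrary per-row logic lives in a Python function registered with the engine.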
  • 14. Query Federation: decompose the query into local and global parts
      ◦ Example query: "count, avg, std" per m-name, over tables (id, m-name, m-value) held at sites 1..N
      ◦ Local part L: "Σx, Σx², N"; run the local queries at each site
      ◦ Partial aggregated results returned by each site: (m-name, Σx, Σx², N)
      ◦ Global part G: "N, avg, std"; run the global queries over the partial aggregated results
      ◦ Final result: (m-name, N, avg, std)
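
The local/global decomposition on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the engine's implementation: the function names and the two example sites are hypothetical, and the standard deviation is assumed to be the population form derived from Σx, Σx² and N.

```python
import math

def local_aggregate(values):
    """Runs at one site: return only the sufficient statistics (Σx, Σx², N)."""
    return (sum(values), sum(v * v for v in values), len(values))

def global_aggregate(partials):
    """Runs at the coordinator: merge the partial results into N, avg, std."""
    sum_x = sum(p[0] for p in partials)
    sum_x2 = sum(p[1] for p in partials)
    n = sum(p[2] for p in partials)
    avg = sum_x / n
    # Population variance from the sufficient statistics; clamp tiny
    # negative values caused by floating-point rounding.
    variance = max(sum_x2 / n - avg * avg, 0.0)
    return n, avg, math.sqrt(variance)

# Hypothetical raw values that never leave their site.
site_a = [2.0, 4.0, 4.0, 4.0]
site_b = [5.0, 5.0, 7.0, 9.0]

n, avg, std = global_aggregate([local_aggregate(site_a), local_aggregate(site_b)])
print(n, avg, std)
```

Only three numbers per site cross the network, which is what makes the scheme privacy preserving in the sense of the previous slide: the individual measurements are never transmitted.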
  • 15.
      ◦ Distributed elastic execution: parallel aggregations, unions, and joins; resources are reserved dynamically
      ◦ Iterative dataflow execution: support for machine learning algorithms
      ◦ Novel query optimization techniques: SQL with User Defined Functions; arbitrary user code with unknown properties; privacy-aware query optimization
  • 16. Time and money: 2-dimensional optimization
      ◦ Quantum: 1 hour
      ◦ Simple map-reduce flow with operators A: 1 hour, B: 10 minutes, C: 1 hour

      Schedule                 Time (hours)   Money (resource hours)   Winner
      One host for all ops     18.60          19                       5x cheaper
      Different host per op    2.16           102                      9x faster
  • 17. Optimal dataflow scheduling: skyline of all Pareto-optimal plans (plot axes: Time, Money)
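
A skyline of Pareto-optimal plans over (time, money) can be computed with a simple dominance test. The sketch below is illustrative only: the first two plans reuse the figures from the time/money comparison in the transcript, while the other two plans and all function names are hypothetical.

```python
from typing import NamedTuple

class Plan(NamedTuple):
    name: str
    time_hours: float
    money: float  # resource hours

def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    no_worse = a.time_hours <= b.time_hours and a.money <= b.money
    better = a.time_hours < b.time_hours or a.money < b.money
    return no_worse and better

def skyline(plans):
    """Keep every plan that no other plan dominates (the Pareto front)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]

plans = [
    Plan("one host for all ops", 18.60, 19),
    Plan("different host per op", 2.16, 102),
    Plan("hypothetical mixed plan", 6.0, 40),
    Plan("hypothetical wasteful plan", 10.0, 110),
]

front = skyline(plans)
for plan in front:
    print(plan)
```

The quadratic scan is fine for a handful of candidate plans; an optimizer exploring a large plan space would need something more selective, which this sketch does not attempt.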
  • 18. Layered architecture
      ◦ Data provision layer: Extract, Transform, Load (ETL), anonymization & pre-processing of existing resources. Sources: Imaging, Video; Streaming Data; Un/Semi/Structured Biomedical Data; Legacy Data; Simulation Models; Digital Libraries (PubMed etc.); Ontologies (UMLS, GO, ..); Clinician knowledge
      ◦ Middleware layer (EXAREME): distributed execution of complex dataflows & distributed querying engine. Upper-level declarative language and extensible UDFs; Distributed execution on clouds and ad-hoc clusters; Distributed Query Engine; Ontology Based Data Access
      ◦ Data Processing: Distribution, Federation, Parallelism (EXAREME)
      ◦ Federated data layer & (open) research data infrastructures: semantic data modelling, provenance & integration. Multi-modal, vertically integrated, distributed biomedical data: Biomedical Info; Registries & Metadata; Simulation Models; KDD Results
      ◦ Data Infrastructures: ESFRI Infrastructures (ELIXIR); E-Infrastructures (OpenAIRE)
      ◦ Application layer: data (pre)processing and knowledge discovery platform. MADRefine module (Data Preprocessing & Transformation, Curation & Validation); AITION clustering & general KDD (SoA Machine Learning Algorithms, Latent Variable & Topic Modelling); AITION simulation (graphical probabilistic modelling for statistical simulation)
      ◦ Data Analytics: Cleaning & curation (MADRefine); Modeling, Mining (AITION)
  • 19. Analysis pipeline diagram (labels): Data Mining; Disease signatures; Patient grouping & similarity; Raw data from biomarker based personalized acquisition; Personalized Model Guided Medicine; For a particular patient; Unknown / missing data; Predict value of missing variable; Variable dependencies & causality; Simulation Models; Create Statistical Simulation Models; Individualized diagnosis, prognosis & treatment plan; Model & Verification; Knowledge Discovery; Reasoning & decision support; Data Preprocessing; Curation & Validation; Transformed & Validated Data; Domain knowledge & assumptions; Clinical workflows; BOTTOM-UP; TOP-DOWN
      ◦ Big Data Analytics
      ◦ Capture: multi source; multi modal; multi system
      ◦ Management: Data provenance; Sanitization (Anonymization)
      ◦ Process: aggregate; distributed
      ◦ Analysis: Privacy preserving; Algorithms; Mechanisms
      ◦ Modeling: Personalized; De-identified
      ◦ Practice: Ethics; Privacy
  • 20. Variable diagram (labels): SEX; AgeOnSet; ILAR; JntActDis; GlbActDis; DisDur; JntLOM; GenEval; CHAQ; ESR; CRP; ANA; MEFNIL2RAPoznanski; NSAID; STEROID; DMARD; BIOLOGIC; JADI; JntLOMDiff; CHAQDiff; ESRDiff; CRPDiff; JntActDisDiff; GlbActDisDiff; GenEvalDiff; BOXValidatedOut; Adapted Sharp/van der Heijde Score Out; JADIOut; Extended BOX. Groups: Predictors; Medication; Outcome. Data categories: demographics; imaging; genetics; clinical; lab; Synovial volume; OTHER
  • 21. Analysis pipeline diagram (labels): Disease signatures; Patient grouping & similarity; Variable dependencies & causality; Simulation Models; Individualized diagnosis, prognosis & treatment plan; Data Mining; Personalized Model Guided Medicine; For a particular patient; Unknown / missing data; Predict value of missing variable; Create Statistical Simulation Models; Model & Verification; Knowledge Discovery; Reasoning & decision support; Domain knowledge & assumptions; Clinical workflows; Raw data from biomarker based personalized acquisition; Data Preprocessing; Curation & Validation; Transformed & Validated Data
  • 22.
      ◦ Extensible validation and data transformation engine
      ◦ Interactive and efficient web-based interface
      ◦ Data cleaning:
          – Typographical error detection (numeric & alphanumeric)
          – Data cleaning rules (functional dependencies, conditional functional dependencies, denial constraints)
          – New/derived columns (discretization, computation of medical scores)
          – Data visualisation (bar charts, pie charts, scatter plots, line charts, etc.)
      ◦ End-to-end data analysis workflow support (rerun experiments, reproduce results)
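
As an illustration of the "functional dependencies" cleaning rule named on this slide, the following sketch flags records that violate a dependency such as zip → city. It is not MADRefine code; the `fd_violations` helper, the column names, and the sample records are all hypothetical.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs.

    Returns {lhs_value: set_of_rhs_values} for every lhs value that maps
    to more than one distinct rhs value, i.e. every violation.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        seen[key].add(tuple(row[c] for c in rhs))
    return {k: v for k, v in seen.items() if len(v) > 1}

# Hypothetical patient records with one inconsistent spelling.
records = [
    {"patient_id": 1, "zip": "10431", "city": "Athens"},
    {"patient_id": 2, "zip": "10431", "city": "Athens"},
    {"patient_id": 3, "zip": "10431", "city": "Athina"},
    {"patient_id": 4, "zip": "54621", "city": "Thessaloniki"},
]

violations = fd_violations(records, lhs=["zip"], rhs=["city"])
print(violations)
```

A conditional functional dependency could be checked the same way by filtering `rows` on the condition first; denial constraints need a more general predicate over pairs of rows and are outside this sketch.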
  • 24. Analysis pipeline diagram (labels): Variable dependencies & causality; Simulation Models; Individualized diagnosis, prognosis & treatment plan; Transformed & Validated Data; Personalized Model Guided Medicine; For a particular patient; Unknown / missing data; Predict value of missing variable; Create Statistical Simulation Models; Model & Verification; Reasoning & decision support; Data Preprocessing; Curation & Validation; Domain knowledge & assumptions; Clinical workflows; Data Mining; Raw data from biomarker based personalized acquisition; Knowledge Discovery; Disease signatures; Patient grouping & similarity
  • 25.
      ◦ Disease signatures: latent factors (patterns) that characterize a disease
          – Distribution of the most relevant variables for the disease (e.g., biomarkers)
          – Multiple variables per signature, multiple signatures per disease
      ◦ Patient cluster: homogeneous patient group with common characteristics
      ◦ Patient similarity: patients “like” me or mine (patient or clinician role)
          – “like” = according to different criteria (e.g., allocation on disease signatures)
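
One possible reading of "patients like me according to allocation on disease signatures" is a nearest-neighbour ranking over per-patient allocation vectors. The sketch below assumes cosine similarity as the criterion, which the slides do not specify; the patient identifiers and allocation values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two allocation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(target, allocations):
    """Rank all other patients by similarity to the target, best first."""
    scores = [
        (pid, cosine_similarity(allocations[target], vec))
        for pid, vec in allocations.items()
        if pid != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical allocations of three patients over three disease signatures.
allocations = {
    "p1": [0.8, 0.1, 0.1],
    "p2": [0.7, 0.2, 0.1],
    "p3": [0.1, 0.1, 0.8],
}

ranking = most_similar("p1", allocations)
print(ranking)
```

Other criteria mentioned on the slide could be swapped in by replacing `cosine_similarity` with a different scoring function.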
  • 27. Similarity & Graph clustering Topics & allocations Modelling
  • 28. (Diagram) Disease signatures; Patient grouping & similarity; Individualized diagnosis, prognosis & treatment plan; Transformed & Validated Data; Personalized Model Guided Medicine; For a particular patient; Unknown / missing data; Predict value of missing variable; Reasoning & decision support; Clinical workflows; Data Mining; Raw data from biomarker-based personalized acquisition; Knowledge Discovery; Data Preprocessing; Curation & Validation; Create Statistical Simulation Models; Model & Verification; Domain knowledge & assumptions; Variable dependencies & causality; Simulation Models
  • 29.
      • Bayesian Net: Directed Acyclic Graph + Conditional Probability Distributions
          ◦ Features (nodes) & dependencies (edges)
          ◦ Compact representation of the joint data distribution
      • Given: patient data + domain knowledge

            Patient | X1 X2 X3 X4 X5 X6 X7 X8
            1       | Y  N  N  Y  Y  Y  N  Y
            :       |
            1000    | N  N  Y  N  N  Y  N  N

      • Find: network over the nodes X1 to X8 (figure labels: Smoking, Lung cancer, Chronic bronchitis, Genetic Factor, Allergy)
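To make "compact representation of the joint data distribution" concrete, the sketch below builds a four-node network and evaluates the joint probability as a product of per-node conditional probabilities. The edge structure and every probability value are assumptions for illustration only; the slide names the variables but its edges and tables are not recoverable from the transcript.

```python
from itertools import product

# Hypothetical DAG: each node lists its parents.
parents = {
    "smoking": [],
    "genetic_factor": [],
    "lung_cancer": ["smoking", "genetic_factor"],
    "chronic_bronchitis": ["smoking"],
}

# Hypothetical conditional probability tables:
# P(node = True | parent values), keyed by the tuple of parent values.
cpt = {
    "smoking": {(): 0.3},
    "genetic_factor": {(): 0.1},
    "lung_cancer": {
        (True, True): 0.4,
        (True, False): 0.1,
        (False, True): 0.05,
        (False, False): 0.01,
    },
    "chronic_bronchitis": {(True,): 0.3, (False,): 0.05},
}

def joint_probability(assignment):
    """P(assignment) as the product of P(node | parents) over all nodes."""
    probability = 1.0
    for node, node_parents in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in node_parents)]
        probability *= p_true if assignment[node] else 1.0 - p_true
    return probability

nodes = list(parents)
# Sanity check: the joint distribution sums to 1 over all 2^4 assignments.
total = sum(
    joint_probability(dict(zip(nodes, values)))
    for values in product([True, False], repeat=len(nodes))
)
# The factorised form needs 8 numbers; a full joint table needs 2^4 - 1 = 15.
n_parameters = sum(len(table) for table in cpt.values())
print(total, n_parameters)
```

The saving grows quickly with the number of variables as long as each node has few parents, which is what makes the representation compact.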
  • 30. Final DAG (based on MCMC & DP, threshold = 0.5). Figure with the following variables on both axes and a colour scale from 0 to 1: Age, ParCHD, Procedures, ExIntoler, Cyanosis, CPBP, CPArrhy, CPConcl, CPTermRsn, BSA, TPVRegurg, TriRegurg, RVD, RedRV, PSMotion, RestrPatt, AVBlock, SupravArrhy, VentricArrhy. Stages shown: Modelling; Dependency Analysis; Inference
  • 31. (Diagram) Disease signatures; Patient grouping & similarity; Variable dependencies & causality; Simulation Models; Transformed & Validated Data; Data Mining; Raw data from biomarker-based personalized acquisition; Knowledge Discovery; Data Preprocessing; Curation & Validation; Create Statistical Simulation Models; Model & Verification; Domain knowledge & assumptions; Personalized Model Guided Medicine; For a particular patient; Unknown / missing data; Predict value of missing variable; Reasoning & decision support; Clinical workflows; Individualized diagnosis, prognosis & treatment plan
  • 33. Increased RVD is associated with worse values in every MR aspect (TVPRegurg, PSMotion, RedRV, AV_Block, TriRegurg)
  • 34. Brussels – 6-7 May 2014
  • 36. (Diagram) Raw Personal Data; Raw Anonymised; Summary Anonymised; Private; Controlled Access; Public; Bioinformatics services for All Users; Doctors (and Patients?); Researchers
  • 37.
      • Obtaining consent is not straightforward
      • Anonymisation: necessary but rather complicated, and it guarantees neither privacy nor data value
      • “Blending in a crowd” and k-anonymity: privacy is a property, not an output, of sanitization
      • How do we define privacy?
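For readers unfamiliar with k-anonymity, the sketch below checks the property on a small table: every combination of quasi-identifier values must be shared by at least k records. The column names and rows are hypothetical, and the check says nothing about how the table was sanitized, which is the limitation the slide points at.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination occurs at least k times."""
    return k_anonymity(rows, quasi_identifiers) >= k

# Hypothetical, already generalised records (age bands, truncated postcodes).
rows = [
    {"age_band": "50-59", "zip_prefix": "115", "diagnosis": "A"},
    {"age_band": "50-59", "zip_prefix": "115", "diagnosis": "B"},
    {"age_band": "60-69", "zip_prefix": "541", "diagnosis": "A"},
    {"age_band": "60-69", "zip_prefix": "541", "diagnosis": "A"},
    {"age_band": "60-69", "zip_prefix": "541", "diagnosis": "C"},
]

k = k_anonymity(rows, ["age_band", "zip_prefix"])
print("table is", k, "-anonymous on (age_band, zip_prefix)")
```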
  • 38.
      • Data publishing: “sanitization” (anonymisation) that hides individual information (k-anonymity) while preserving sufficient aggregate statistics
      • Data mining: specific algorithms (usually operating in two phases) for classification, clustering, association rules, …
      • Mechanisms: Differential Privacy and Crowd-Blending Privacy perturb the data or add noise, ensuring an ε-indistinguishable output distribution
      • Encryption: Fully Homomorphic Encryption (FHE) lets computations and queries run over encrypted data
      • Decentralization: blockchain to protect personal data through decentralized personal data management, where users own and control their data
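The "add noise" mechanism can be illustrated with the standard Laplace mechanism for a counting query, which satisfies ε-differential privacy because adding or removing one person changes a count by at most 1. This is a textbook sketch rather than the system described in the talk; the function names and the sample ages are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng):
    """Noisy count of values matching predicate.

    A count has sensitivity 1, so Laplace noise with scale 1 / epsilon
    gives epsilon-differential privacy for this single query.
    """
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical patient ages; the query asks how many are 60 or older.
ages = [34, 61, 47, 72, 58, 66, 29]
rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = private_count(ages, lambda age: age >= 60, epsilon=0.5, rng=rng)
print("noisy count:", noisy)
```

A smaller ε means a larger noise scale and therefore stronger privacy at the cost of accuracy; answering many queries consumes the privacy budget additively.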
  • 39.
      • Big data is not only about size
      • Data is distributed, data is heterogeneous
      • Processing goes to data, not data to processing
      • ICT (data management & processing) advances
          ◦ Data compression
          ◦ Federated / privacy-preserving processing
          ◦ Scalable parallel / distributed processing
          ◦ Data curation (otherwise: garbage in, garbage out)
          ◦ Text and data analytics
  • 40.
      • http://www.madgik.di.uoa.gr
      • https://www.humanbrainproject.eu
      • http://www.md-paedigree.eu/
      • http://www.openaire.eu
      • http://www.optique-project.eu