Amarnath Gupta
Univ. of California San Diego
If There is a Data Deluge, Where are the Data?
 Assembled the largest searchable
collation of neuroscience data on the
web
 The largest catalog of biomedical
resources (data, tools, materials,
services) available
 The largest ontology for
neuroscience
 NIF search portal: simultaneous
search over data, NIF catalog and
biomedical literature
 Neurolex Wiki: a community wiki
serving neuroscience concepts
 A unique technology platform
 Cross-neuroscience analytics
 A reservoir of cross-disciplinary
biomedical data expertise
Formal Knowledge / Ontologies
Extracted / Analyzed Fact Collections
Least Shared
Most Shared
Useful for
Deep (Re-) Analysis
Useful for
Comprehension,
Discovery
Uneven distribution of data volume, velocity,
variability, location and availability
Raw Data (in files) and Data Sets (in directories)
LOCAL OFFLINE/ONLINE STORAGE, IRs, PRs?
Data Collections and Databases
SPECIALIZED & GENERAL PRs, DBs
Processed Data
Products,
Processes
DBs, WEB-PRs, PUBS
Papers
w/ or w/o Data
PUBs
Aggregates and Resource Hubs
NIF is aware of 761 repositories
 47 of 53 major published preclinical
cancer studies could not be replicated
 “The scientific community
assumes that the claims in a
preclinical study can be taken at
face value-that although there
might be some errors in detail, the
main message of the paper can be
relied on and the data will, for the
most part, stand the test of time.
Unfortunately, this is not always
the case.”
 Getting data out sooner in a
form where they can be exposed
to many eyes and many analyses,
and easily compared, may allow
us to expose errors and develop
better metrics to evaluate the
validity of data
Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012
 “There are no guidelines that require
all data sets to be reported in a
paper; often, original data are
removed during the peer review and
publication process.”
 “There must be more opportunities
to present negative data.”
 Significant cross-linking between
original papers, supporting/refuting
papers/data
Courtesy: Maryann Martone
Hello All,
Thank you for the people who are taking a look at
the data in tera15 :-)
There are a whole lot of data (about +8TB) that can
be looked at and/or removed.
If you had assistants, students, or volunteers who
assisted you in processing data, please locate those
folders and remove any duplicate or unused data.
This will help EVERYONE have space to process
new data.
Any old data that has been sitting in tera15
untouched in more than 4 years will be removed to
a different area for deletion.
Please take a look carefully!
 For every neuroscientist
 For every experiment he/she runs
 For every data set that leads to positive or negative results
 Store the data in some shared or on-demand repository
 Annotate the data with experimental and other contextual
information
 Perform some analysis and contribute your analysis method to the
repository where the data is being stored
 For every analysis result
 Keep the complete processing provenance of the result
 Point back to the data set or data element that contributes to the
analysis, specifically marking positively and negatively contributing data
 If an error is pointed out in some result,
 Provide an explanation of the error
 Create a pointer back to the part of the publication and to that part
of the data set or data element that produced the error
 For every publication
 For every result reported
 Create a pointer back to all data used in that section
 For every experimental object (e.g., reagents, or auxiliary data from
another group) used,
 Create an appropriate, if needed time-stamped, pointer to the correct
version of the data
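The pointer requirements in the lists above can be sketched as a small set of record types. This is a minimal illustration only, not a schema proposed in the talk: the class names, field names and identifiers (`DataPointer`, `AnalysisResult`, `PublicationSection`, `dataset-A`, ...) are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DataPointer:
    """Pointer to a data set or data element, optionally pinned to a version."""
    data_id: str                      # hypothetical identifier, e.g. an extended DOI
    version: Optional[str] = None     # version label, if the repository provides one
    as_of: Optional[datetime] = None  # timestamp to use when no version label exists


@dataclass
class AnalysisResult:
    """An analysis result with its processing provenance and contributing data."""
    result_id: str
    processing_steps: list[str] = field(default_factory=list)
    supporting: list[DataPointer] = field(default_factory=list)     # positively contributing
    contradicting: list[DataPointer] = field(default_factory=list)  # negatively contributing


@dataclass
class PublicationSection:
    """A section of a publication and the results it reports."""
    section: str
    results: list[AnalysisResult] = field(default_factory=list)

    def data_used(self) -> set[str]:
        """Return the ids of all data referenced by results in this section."""
        return {
            pointer.data_id
            for result in self.results
            for pointer in result.supporting + result.contradicting
        }


# Illustrative values only.
result = AnalysisResult(
    result_id="result-1",
    processing_steps=["spike sorting", "firing-rate estimate"],
    supporting=[DataPointer("dataset-A", version="v2")],
    contradicting=[
        DataPointer("dataset-B", as_of=datetime(2013, 1, 15, tzinfo=timezone.utc))
    ],
)
section = PublicationSection(section="Results 2.1", results=[result])
print(sorted(section.data_used()))  # ['dataset-A', 'dataset-B']
```

Keeping supporting and contradicting pointers in separate lists is one way to mark positively and negatively contributing data; a single list with a sign flag would work as well.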
 For every repository/database … that holds the data
 Ensure rapid availability
 Allow scientists to download or perform in-place analyses
 Adhere to appropriate data standards
 Keep consistency of all data + references
 Should permit multiple simultaneous analyses by different
users
 Should allow searching/browsing/querying all possible metadata
Diverse distributed infrastructures consisting of individual researchers in different
institutions, institutional repositories, public data centers, publishers, annotators and
aggregators, bioinformaticians …
 Scalable, Elastic Storage and Computation
 Service Expectations
 Scalable Search and Query across structured/semi-
structured/unstructured data
 Facts – What neurons do Purkinje cells project to?
 Resources – What are recent data sets on biomarkers for SMA?
 Analytical Results -- What animal models have similar phenotypes to
Parkinson’s disease?
 Landscape Surveys – Who has what data holdings on neurodegenerative
diseases?
 Active Analyses
 Combining these data and mine, compute how the connectivity of the
human brain differs from that of non-human primates
 Perform GO-enrichment analysis on all genes upregulated in Alzheimer’s on
all available data and compare with my results
 Tracebacks
 What data and processing have been used to reach this result in this paper?
Which publication refuted the claims in this paper and how?
 If all neuroscientists wanted to comply with these data-
sharing practices today, would the current infrastructure
be able to support it?
 Is enough attention being paid to an overarching
architecture and interoperation protocol for data
sharing?
 Is today’s technology properly harnessed to create a holistic
data sharing infrastructure?
 What would motivate neuroscientists and other
players to really play their parts in data sharing?
 Should there be a “monitoring scheme” to ensure
proper data sharing practices are actually happening?
 The Data-Sharing Ecosystem is a distributed system
that can be viewed as an operating system where
 Each object has a set of unique structured ids (e.g., extended
DOIs) that identify
 any data set, data object, or any interval of a data object
 The semantic category of the data element
 Any human/software agent
 Any parameter set of a software invocation
 A log is maintained and transmitted for each activity by any
agent on any data element
 Submission, transfer to repository, pickup by aggregator, creating
derived product, being crawled by search services, …
 These logs can be accessed by a central monitoring system
covering the ecosystem using a Twitter Storm-like infrastructure
Think of Facebook maintaining a log of the different actions such as being present at the
system, sending and accepting friend requests, posting comments and photos, starting and
ending chat sessions, …
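The logging idea above can be sketched as an append-only event log keyed by structured ids. This is a hedged, in-memory illustration: the event fields, the id formats (`agent:lab-42`, `doi:10.9999/example.1`) and the action names are assumptions, and a plain Python list stands in for the streaming infrastructure the slide mentions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ActivityEvent:
    """One activity by one agent on one data element."""
    agent_id: str   # hypothetical structured id of a human or software agent
    action: str     # e.g. "submission", "transfer_to_repository"
    data_id: str    # hypothetical structured id of the data element
    timestamp: datetime


class ActivityLog:
    """Append-only, in-memory log indexed by data element."""

    def __init__(self) -> None:
        self._events: list[ActivityEvent] = []
        self._by_data: dict[str, list[ActivityEvent]] = defaultdict(list)

    def record(self, agent_id: str, action: str, data_id: str,
               timestamp: Optional[datetime] = None) -> ActivityEvent:
        """Append an event; events are never modified or deleted."""
        event = ActivityEvent(agent_id, action, data_id,
                              timestamp or datetime.now(timezone.utc))
        self._events.append(event)
        self._by_data[data_id].append(event)
        return event

    def history(self, data_id: str) -> list[str]:
        """Return the actions performed on a data element, in recorded order."""
        return [event.action for event in self._by_data.get(data_id, [])]


log = ActivityLog()
log.record("agent:lab-42", "submission", "doi:10.9999/example.1")
log.record("agent:repo-x", "transfer_to_repository", "doi:10.9999/example.1")
log.record("agent:aggregator-y", "pickup_by_aggregator", "doi:10.9999/example.1")
print(log.history("doi:10.9999/example.1"))
```

In a distributed setting each repository or aggregator would keep its own log and transmit events to the monitoring system; the per-data-element index is what makes traceback queries cheap.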
 Update activities on data elements from Data
Centers and Repositories
 Resource References from literature and web sites,
including opinion sites like blogs and forums
 Citation categories from automated/human-driven
annotation systems like DataCite or DOMEO
 Provenance chains from workflow systems like
Kepler
 Data derivation changes from rule-based metadata
management systems like iRODS
 Frequency and regularity of data creation vis-à-vis
submission to the data-sharing ecosystem
 Frequency and regularity of data usage of various
kinds
 Viewing, downloading, replication, uptake by a software, …
 Number of derived data products
 Compounding by cascades of derived data
 Cross-referencing of data and resources in
publications
 Compounding by publication data citation cascades
 Human and programmatic access to data
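As one possible reading of "compounding by cascades of derived data", the sketch below counts every product derived, directly or transitively, from a given data set. The `derived_from` mapping and the function name `cascade_size` are hypothetical; the slides do not specify how derivation records are stored.

```python
from collections import deque


def cascade_size(derived_from: dict[str, list[str]], root: str) -> int:
    """Count all data products derived, directly or transitively, from root.

    derived_from maps each derived product to the list of its direct sources.
    """
    # Invert the mapping so we can walk from a source to its derived products.
    children: dict[str, list[str]] = {}
    for child, parents in derived_from.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    # Breadth-first traversal; the seen set ensures each product counts once.
    seen: set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)

    seen.discard(root)  # guard against a cyclic record pointing back at root
    return len(seen)


# Illustrative derivation records: B and C come from A, D from B and C, E from D.
derived_from = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
print(cascade_size(derived_from, "A"))  # 4
print(cascade_size(derived_from, "D"))  # 1
```

A weighted variant, in which deeper derivations count less, would be an equally plausible metric.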
 Accountability Score: a measure of “good data
citizenship”
 Of People
 Increases with contribution of data and analyses
 Decays (slowly) with time
 Increases with references and citations
 Increases with supporting work by others
 Decreases with refutation
 Decreases (rapidly) with paper retraction
 Of Publications
 Increases with addition of reference-able data
 Increases with data access
 Increases with keeping updated with data updates
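The slides give only the direction of each effect on a person's score, not a formula. The sketch below is one hypothetical way to combine them: each event carries a signed weight, and every event's influence decays exponentially with age. The weights, the five-year half-life and all names (`WEIGHTS`, `ScoreEvent`, `accountability_score`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical weights: positive events raise the score, refutation lowers it,
# and retraction lowers it much more sharply.
WEIGHTS = {
    "contribution": 1.0,
    "citation": 0.5,
    "support": 0.5,
    "refutation": -1.0,
    "retraction": -5.0,
}

HALF_LIFE_DAYS = 5 * 365.0  # assumed value for a "slow" decay


@dataclass(frozen=True)
class ScoreEvent:
    kind: str        # one of the keys of WEIGHTS
    age_days: float  # time elapsed since the event


def accountability_score(events: list[ScoreEvent],
                         half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Sum the signed event weights, each discounted by exponential decay."""
    decay = math.log(2) / half_life_days
    return sum(
        WEIGHTS[event.kind] * math.exp(-decay * event.age_days)
        for event in events
    )


events = [
    ScoreEvent("contribution", age_days=0),
    ScoreEvent("citation", age_days=365),
    ScoreEvent("refutation", age_days=30),
]
print(round(accountability_score(events), 3))
```

Note that in this sketch penalties fade at the same rate as credit; whether a retraction should ever decay is a policy question the parameters would have to encode.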
 Influence: A
classification and
measure of the
professional
engagement one has
in terms of data
activity
 Longer-term measure
compared to
accountability score
 Applies to all types of
players in the
ecosystem including
just users
 These measures do not hold for scientists who do
not produce data
 The measures are mostly designed for online
activities and must be modified to match the
dynamics of different scientific communities
 Parameters like decay constants
 Time-window for score revision
 Global scores should be
 supplemented by community scores where a community is
defined by ontological regions where one’s research lies
 per activity type rather than a single overall score
 This is the Big Brother for science
 This is going to create a bias against “non-
performers”
 Scientific errors will be penalized more than
necessary
 The algorithms can be manipulated to the
advantage of some people over others
 Smaller individuals/organizations will be penalized
with respect to better-funded, higher-throughput
organizations
 This will be hard to implement due to opposition
from different groups and institutions
 My speculations
 If the community decides that it needs data sharing, it will naturally
gravitate toward some degree of judgment of those who don’t
comply
 Technology frameworks similar to what we discussed will be adopted
within individual e-infrastructures
 As more data become available and data sharing efforts become
successful, third-party watchers like credit bureaus that monitor
scientists’ products with respect to data will emerge
 Such scores would be used for community perception and in-kind
incentives earlier than their adoption for formal evaluations
 The real question is “How do we promote data
sharing?”
 Creating infrastructural elements and reusing
today’s (tomorrow’s) technological capabilities is not
enough
 We need a more holistic approach that factors in the
human component
 Using social activity analysis as a starting point we
should be able to build a monitoring-cum-incentivizing
scheme for data sharing

Mais conteúdo relacionado

Mais procurados

Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Tom Plasterer
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositoriesChris Rusbridge
 
On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...Susanna-Assunta Sansone
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeTom Plasterer
 
Data management (1)
Data management (1)Data management (1)
Data management (1)SM Lalon
 
BIOMAG2018 - Denis Engemann - MNE-HCP
BIOMAG2018 - Denis Engemann - MNE-HCPBIOMAG2018 - Denis Engemann - MNE-HCP
BIOMAG2018 - Denis Engemann - MNE-HCPRobert Oostenveld
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageTom Plasterer
 
Donders Repository - removing barriers for management and sharing of research...
Donders Repository - removing barriers for management and sharing of research...Donders Repository - removing barriers for management and sharing of research...
Donders Repository - removing barriers for management and sharing of research...Robert Oostenveld
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical DataPaul Agapow
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsBeth Plale
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for BiopharmaTom Plasterer
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...GarethKnight
 
CuttingEEG - Open Science, Open Data and BIDS for EEG
CuttingEEG - Open Science, Open Data and BIDS for EEGCuttingEEG - Open Science, Open Data and BIDS for EEG
CuttingEEG - Open Science, Open Data and BIDS for EEGRobert Oostenveld
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Robert Oostenveld
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-cziPaul Groth
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Amanda Whitmire
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsBeth Plale
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsTom Plasterer
 
Donders neuroimage toolkit - open science and good practices
Donders neuroimage toolkit -  open science and good practicesDonders neuroimage toolkit -  open science and good practices
Donders neuroimage toolkit - open science and good practicesRobert Oostenveld
 

Mais procurados (20)

Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositories
 
On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
Data management (1)
Data management (1)Data management (1)
Data management (1)
 
BIOMAG2018 - Denis Engemann - MNE-HCP
BIOMAG2018 - Denis Engemann - MNE-HCPBIOMAG2018 - Denis Engemann - MNE-HCP
BIOMAG2018 - Denis Engemann - MNE-HCP
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
 
Donders Repository - removing barriers for management and sharing of research...
Donders Repository - removing barriers for management and sharing of research...Donders Repository - removing barriers for management and sharing of research...
Donders Repository - removing barriers for management and sharing of research...
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure Commons
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...
 
CuttingEEG - Open Science, Open Data and BIDS for EEG
CuttingEEG - Open Science, Open Data and BIDS for EEGCuttingEEG - Open Science, Open Data and BIDS for EEG
CuttingEEG - Open Science, Open Data and BIDS for EEG
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
Donders neuroimage toolkit - open science and good practices
Donders neuroimage toolkit -  open science and good practicesDonders neuroimage toolkit -  open science and good practices
Donders neuroimage toolkit - open science and good practices
 

Destaque

Salida de campo cartagena, ararca, santa diapositivas
Salida de campo cartagena, ararca, santa diapositivasSalida de campo cartagena, ararca, santa diapositivas
Salida de campo cartagena, ararca, santa diapositivasAngie_Sanchez
 
Elizabeth warren
Elizabeth warrenElizabeth warren
Elizabeth warrenbankernews
 
Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...
Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...
Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...facs_ort
 
APS1015 Class 6: Validation of Market-Based Solutions
APS1015 Class 6: Validation of Market-Based SolutionsAPS1015 Class 6: Validation of Market-Based Solutions
APS1015 Class 6: Validation of Market-Based SolutionsSocial Entrepreneurship
 
Speak Up 2012 Digital Learners Student National Report
Speak Up 2012 Digital Learners Student National ReportSpeak Up 2012 Digital Learners Student National Report
Speak Up 2012 Digital Learners Student National ReportJulie Evans
 
Love Learning - A content marketing approach to learning at work
Love Learning - A content marketing approach to learning at workLove Learning - A content marketing approach to learning at work
Love Learning - A content marketing approach to learning at workSam Burrough
 
Coastline Challenges Girls Program _ NEW
Coastline Challenges Girls Program _ NEWCoastline Challenges Girls Program _ NEW
Coastline Challenges Girls Program _ NEWCoastlineChallenges
 
Perfil de egreso de la educación básica
Perfil de egreso de la educación básicaPerfil de egreso de la educación básica
Perfil de egreso de la educación básicaMarco Colli
 
Coaching-Contribuições para o desenvolvimento de competências
Coaching-Contribuições para o desenvolvimento de competênciasCoaching-Contribuições para o desenvolvimento de competências
Coaching-Contribuições para o desenvolvimento de competênciasMauricio Sampaio
 
ROBOT HAZARD AND PREVENTION
ROBOT HAZARD AND PREVENTIONROBOT HAZARD AND PREVENTION
ROBOT HAZARD AND PREVENTIONCt Nurul Jannah
 

Destaque (20)

Salida de campo cartagena, ararca, santa diapositivas
Salida de campo cartagena, ararca, santa diapositivasSalida de campo cartagena, ararca, santa diapositivas
Salida de campo cartagena, ararca, santa diapositivas
 
Grupo puentes
Grupo puentesGrupo puentes
Grupo puentes
 
Using Health Literacy Principles for Virtual World Exhibits
Using Health Literacy Principles for Virtual World ExhibitsUsing Health Literacy Principles for Virtual World Exhibits
Using Health Literacy Principles for Virtual World Exhibits
 
Elizabeth warren
Elizabeth warrenElizabeth warren
Elizabeth warren
 
Ídolos del Rock
Ídolos del RockÍdolos del Rock
Ídolos del Rock
 
How tos of UX
How tos of UXHow tos of UX
How tos of UX
 
Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...
Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...
Presentación de Pablo Caputi, Catedrático de Agronegocios de Universidad ORT ...
 
Crowdsourcing
CrowdsourcingCrowdsourcing
Crowdsourcing
 
APS1015 Class 6: Validation of Market-Based Solutions
APS1015 Class 6: Validation of Market-Based SolutionsAPS1015 Class 6: Validation of Market-Based Solutions
APS1015 Class 6: Validation of Market-Based Solutions
 
Speak Up 2012 Digital Learners Student National Report
Speak Up 2012 Digital Learners Student National ReportSpeak Up 2012 Digital Learners Student National Report
Speak Up 2012 Digital Learners Student National Report
 
Love Learning - A content marketing approach to learning at work
Love Learning - A content marketing approach to learning at workLove Learning - A content marketing approach to learning at work
Love Learning - A content marketing approach to learning at work
 
Coastline Challenges Girls Program _ NEW
Coastline Challenges Girls Program _ NEWCoastline Challenges Girls Program _ NEW
Coastline Challenges Girls Program _ NEW
 
Quiero ser libre cdf
Quiero ser libre cdfQuiero ser libre cdf
Quiero ser libre cdf
 
Perfil de egreso de la educación básica
Perfil de egreso de la educación básicaPerfil de egreso de la educación básica
Perfil de egreso de la educación básica
 
Quad copter
Quad copterQuad copter
Quad copter
 
Coaching-Contribuições para o desenvolvimento de competências
Coaching-Contribuições para o desenvolvimento de competênciasCoaching-Contribuições para o desenvolvimento de competências
Coaching-Contribuições para o desenvolvimento de competências
 
ROBOT HAZARD AND PREVENTION
ROBOT HAZARD AND PREVENTIONROBOT HAZARD AND PREVENTION
ROBOT HAZARD AND PREVENTION
 
Guillermo mesa actividad1_mapa_c
Guillermo mesa actividad1_mapa_cGuillermo mesa actividad1_mapa_c
Guillermo mesa actividad1_mapa_c
 
INCF 2013 - Uniform Resource Layer
INCF 2013 - Uniform Resource LayerINCF 2013 - Uniform Resource Layer
INCF 2013 - Uniform Resource Layer
 
Lls
LlsLls
Lls
 

Semelhante a In Search of a Missing Link in the Data Deluge vs. Data Scarcity Debate

Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsfBrad Houston
 
Data management plans
Data management plansData management plans
Data management plansBrad Houston
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
INSERM - Data Management & Reuse of Health Data - May 2017
INSERM - Data Management & Reuse of Health Data - May 2017INSERM - Data Management & Reuse of Health Data - May 2017
INSERM - Data Management & Reuse of Health Data - May 2017Susanna-Assunta Sansone
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data ManagementCarole Goble
 
Emerging Data Citation Infrastructure
Emerging Data Citation InfrastructureEmerging Data Citation Infrastructure
Emerging Data Citation InfrastructureMicah Altman
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxfPhilippe Rocca-Serra
 
My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018Susanna-Assunta Sansone
 
Sansone bio sharing introduction
Sansone bio sharing introductionSansone bio sharing introduction
Sansone bio sharing introductionMIBBI Checklists
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataSusanna-Assunta Sansone
 
Management of Data Collections
Management of Data CollectionsManagement of Data Collections
Management of Data Collectionsabedejesus
 
Presentation from Code Camp 2017
Presentation from Code Camp 2017Presentation from Code Camp 2017
Presentation from Code Camp 2017Mitch Miller
 
Inteligent Catalogue Final
Inteligent Catalogue FinalInteligent Catalogue Final
Inteligent Catalogue Finalguestcaef1d
 

Semelhante a In Search of a Missing Link in the Data Deluge vs. Data Scarcity Debate (20)

Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
 
Jonathan Breeze, Symplectic
Jonathan Breeze, SymplecticJonathan Breeze, Symplectic
Jonathan Breeze, Symplectic
 
BLC & Digital Science: Jonathan Breeze, Symplectic
BLC & Digital Science: Jonathan Breeze, SymplecticBLC & Digital Science: Jonathan Breeze, Symplectic
BLC & Digital Science: Jonathan Breeze, Symplectic
 
Data management plans
Data management plansData management plans
Data management plans
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Sansone mibbi-intro
Sansone mibbi-introSansone mibbi-intro
Sansone mibbi-intro
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
INSERM - Data Management & Reuse of Health Data - May 2017
INSERM - Data Management & Reuse of Health Data - May 2017INSERM - Data Management & Reuse of Health Data - May 2017
INSERM - Data Management & Reuse of Health Data - May 2017
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data Management
 
Emerging Data Citation Infrastructure
Emerging Data Citation InfrastructureEmerging Data Citation Infrastructure
Emerging Data Citation Infrastructure
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
 
My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018
 
Sansone bio sharing introduction
Sansone bio sharing introductionSansone bio sharing introduction
Sansone bio sharing introduction
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
 
داده های پژوهشی
داده های پژوهشیداده های پژوهشی
داده های پژوهشی
 
Management of Data Collections
Management of Data CollectionsManagement of Data Collections
Management of Data Collections
 
Va sla nov 15 final
Va sla nov 15 finalVa sla nov 15 final
Va sla nov 15 final
 
Presentation from Code Camp 2017
Presentation from Code Camp 2017Presentation from Code Camp 2017
Presentation from Code Camp 2017
 
Inteligent Catalogue Final
Inteligent Catalogue FinalInteligent Catalogue Final
Inteligent Catalogue Final
 
The FAIR Principles and FAIRsharing
The FAIR Principles and FAIRsharingThe FAIR Principles and FAIRsharing
The FAIR Principles and FAIRsharing
 

Mais de Neuroscience Information Framework

Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...Neuroscience Information Framework
 
The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...Neuroscience Information Framework
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework Neuroscience Information Framework
 
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Neuroscience Information Framework
 


In Search of a Missing Link in the Data Deluge vs. Data Scarcity Debate

  • 1. Amarnath Gupta Univ. of California San Diego If There is a Data Deluge, Where are the Data?
  • 2.
      ◦ Assembled the largest searchable collation of neuroscience data on the web
      ◦ The largest catalog of biomedical resources (data, tools, materials, services) available
      ◦ The largest ontology for neuroscience
      ◦ NIF search portal: simultaneous search over data, NIF catalog and biomedical literature
      ◦ Neurolex Wiki: a community wiki serving neuroscience concepts
      ◦ A unique technology platform
      ◦ Cross-neuroscience analytics
      ◦ A reservoir of cross-disciplinary biomedical data expertise
  • 3. Figure labels: Formal Knowledge/Ontologies; Extracted/Analyzed Fact Collections; Least Shared; Most Shared; Useful for Deep (Re-)Analysis; Useful for Comprehension, Discovery; Uneven distribution of data volume, velocity, variability, location and availability; Raw Data (in files) and Data Sets (in directories); LOCAL OFFLINE/ONLINE STORAGE, IRs, PRs?; Data Collections and Databases; SPECIALIZED & GENERAL PRs, DBs; Processed Data Products, Processes; DBs, WEB-PRs, PUBS; Papers w, w/o Data; PUBs; Aggregates and Resource Hubs; NIF is aware of 761 repositories
  • 4.
      ◦ 47/50 major preclinical published cancer studies could not be replicated
      ◦ “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
      ◦ Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data
      Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531
      ◦ “There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
      ◦ “There must be more opportunities to present negative data.”
      ◦ Significant cross-linking between original papers, supporting/refuting papers/data
      Courtesy: Maryann Martone
  • 5. Hello All, Thank you for the people who are taking a look at the data in tera15 :-) There are a whole lot of data (about +8TB) that can be looked at and/or removed. If you had assistants, students, or volunteers who assisted you in processing data, please locate those folders and remove any duplicate or unused data. This will help EVERYONE have space to process new data. Any old data that has been sitting in tera15 untouched in more than 4 years will be removed to a different area for deletion. Please take a look carefully!
  • 6.
      ◦ For every neuroscientist
      ◦ For every experiment he/she runs
      ◦ For every data set that leads to positive or negative results
      ◦ Store the data in some shared or on-demand repository
      ◦ Annotate the data with experimental and other contextual information
      ◦ Perform some analysis and contribute your analysis method to the repository where the data is being stored
      ◦ For every analysis result
      ◦ Keep the complete processing provenance of the result
      ◦ Point back to the data set or data element that contributes to the analysis; specifically mark positively and negatively contributing data
      ◦ If an error is pointed out in some result,
      ◦ Provide an explanation of the error
      ◦ Create a pointer back to the part of the publication and to that part of the data set or data element that produced the error
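The bookkeeping asked for on slide 6 (keep the processing provenance of every result, point back to contributing data, mark each contribution as positive or negative) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Result` record, the `trace_back` helper, and the ids `d1`, `r1`, and so on are hypothetical and do not come from the talk or from any NIF system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Result:
    """Hypothetical record for one analysis result."""
    result_id: str
    method: str  # name of the analysis method that produced the result
    # input id -> "positive" or "negative" contribution to the result
    inputs: Dict[str, str] = field(default_factory=dict)


def trace_back(result_id: str, results: Dict[str, Result]) -> Tuple[List[str], List[str]]:
    """Walk the provenance pointers from a result back to its source data.

    Any input id that is not itself a recorded result is treated as source data.
    Returns (sorted source data ids, methods in the order they were visited).
    """
    data_ids, methods, seen = set(), [], set()
    stack = [result_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        node = results.get(current)
        if node is None:
            data_ids.add(current)  # no producing analysis recorded: source data
            continue
        methods.append(node.method)
        stack.extend(node.inputs)  # follow pointers to every input
    return sorted(data_ids), methods


# Made-up example: r2 was derived from r1 and d3; r1 from d1 and d2.
results = {
    "r1": Result("r1", "filter", {"d1": "positive", "d2": "positive"}),
    "r2": Result("r2", "stats", {"r1": "positive", "d3": "negative"}),
}
sources, steps = trace_back("r2", results)
```

With records like these, an error found in `r2` can be traced to the exact inputs and methods behind it, which is the kind of pointer the slide asks for.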
  • 7.
      ◦ For every publication
      ◦ For every result reported
      ◦ Create a pointer back to all data used in that section
      ◦ For every experimental object (e.g., reagents, or auxiliary data from another group) used,
      ◦ Create an appropriate pointer, time-stamped if needed, to the correct version of the data
      ◦ For every repository/database … that holds the data
      ◦ Ensure rapid availability
      ◦ Allow scientists to download or perform in-place analyses
      ◦ Adhere to appropriate data standards
      ◦ Keep consistency of all data + references
      ◦ Should permit multiple simultaneous analyses by different users
      ◦ Should allow searching/browsing/querying all possible metadata
      Diverse distributed infrastructures consisting of individual researchers in different institutions, institutional repositories, public data centers, publishers, annotators and aggregators, bioinformaticians …
  • 8.
      ◦ Scalable, Elastic Storage and Computation
      ◦ Service Expectations
      ◦ Scalable Search and Query across structured/semi-structured/unstructured data
      ◦ Facts – What neurons do Purkinje cells project to?
      ◦ Resources – What are recent data sets on biomarkers for SMA?
      ◦ Analytical Results – What animal models have similar phenotypes to Parkinson’s disease?
      ◦ Landscape Surveys – Who has what data holdings on neurodegenerative diseases?
      ◦ Active Analyses
      ◦ Combining these data and mine, compute how the connectivity of the human brain differs from that of non-human primates
      ◦ Perform GO-enrichment analysis on all genes upregulated in Alzheimer’s on all available data and compare with my results
      ◦ Tracebacks
      ◦ What data and processing have been used to reach this result in this paper? Which publication refuted the claims in this paper and how?
  • 9.
  • 10.
      ◦ If all neuroscientists want to comply with this data sharing today, will the current infrastructure be able to support it?
      ◦ Is enough attention being paid to an overarching architecture and interoperation protocol for data sharing?
      ◦ Is today’s technology properly harnessed to create a holistic data sharing infrastructure?
      ◦ What would motivate neuroscientists and other players to really play their parts in data sharing?
      ◦ Should there be a “monitoring scheme” to ensure proper data sharing practices are actually happening?
  • 11.
      ◦ The Data-Sharing Ecosystem is a distributed system that can be viewed as an operating system where
      ◦ Each object has a set of unique structured ids (e.g., extended DOIs) that identify
      ◦ Any data set, data object, or any interval of a data object
      ◦ The semantic category of the data element
      ◦ Any human/software agent
      ◦ Any parameter set of a software invocation
      ◦ A log is maintained and transmitted for each activity by any agent on any data element
      ◦ Submission, transfer to repository, pickup by aggregator, creating derived product, being crawled by search services, …
      ◦ These logs can be accessed by a central monitoring system covering the ecosystem using a Twitter Storm-like infrastructure
      Think of Facebook maintaining a log of the different actions such as being present at the system, sending and accepting friend requests, posting comments and photos, starting and ending chat sessions, …
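As a rough illustration of slide 11, the sketch below models a structured id that can address a whole data object or an interval of it, plus an append-only log of agent activities. It is a minimal in-memory sketch under stated assumptions: the class names, the `base#start-end` id syntax, and the example identifiers are all hypothetical, and it does not model the distributed, stream-processing infrastructure the slide mentions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StructuredId:
    """Hypothetical structured id: a base identifier plus an optional interval."""
    base: str                                    # DOI-like string (made up here)
    interval: Optional[Tuple[int, int]] = None   # sub-range of the data object

    def __str__(self) -> str:
        if self.interval is None:
            return self.base
        start, end = self.interval
        return f"{self.base}#{start}-{end}"      # assumed syntax, not a standard


@dataclass(frozen=True)
class ActivityEvent:
    agent: str            # id of the human or software agent
    action: str           # e.g. "submission", "transfer", "derive", "crawl"
    target: StructuredId
    timestamp: datetime


@dataclass
class ActivityLog:
    """Append-only log that a monitoring system could read."""
    events: List[ActivityEvent] = field(default_factory=list)

    def record(self, agent: str, action: str, target: StructuredId,
               when: Optional[datetime] = None) -> ActivityEvent:
        event = ActivityEvent(agent, action, target,
                              when or datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def history(self, base: str) -> List[ActivityEvent]:
        """All events touching a data object or any interval of it."""
        return [e for e in self.events if e.target.base == base]


log = ActivityLog()
whole = StructuredId("doi:10.0000/example")             # placeholder id
part = StructuredId("doi:10.0000/example", (0, 100))    # an interval of it
log.record("lab-a", "submission", whole)
log.record("aggregator-x", "crawl", part)
log.record("lab-b", "submission", StructuredId("doi:10.0000/other"))
```

Keying the history on the base id means activity on an interval still counts toward the parent object, which is one way to read the slide's "any interval of a data object".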
  • 12.
  • 13.
      ◦ Update activities on data elements from Data Centers and Repositories
      ◦ Resource References from literature and web sites, including opinion sites like blogs and forums
      ◦ Citation categories from automated/human-driven annotation systems like DataCite or DOMEO
      ◦ Provenance chains from workflow systems like Kepler
      ◦ Data derivation changes from rule-based metadata management systems like iRODS
  • 14.
      ◦ Frequency and regularity of data creation vis-à-vis submission to the data-sharing ecosystem
      ◦ Frequency and regularity of data usage of various kinds
      ◦ Viewing, downloading, replication, uptake by a software, …
      ◦ Number of derived data products
      ◦ Compounding by cascades of derived data
      ◦ Cross-referencing of data and resources in publications
      ◦ Compounding by publication data citation cascades
      ◦ Human and programmatic access to data
  • 15.
      ◦ Accountability Score: a measure of “good data citizenship”
      ◦ Of People
      ◦ Increases with contribution of data and analyses
      ◦ Decays (slowly) with time
      ◦ Increases with references and citations
      ◦ Increases with supporting work by others
      ◦ Decreases with refutation
      ◦ Decreases (rapidly) with paper retraction
      ◦ Of Publications
      ◦ Increases with addition of reference-able data
      ◦ Increases with data access
      ◦ Increases with keeping updated with data updates
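The slide gives only the direction in which each factor moves a person's accountability score, not a formula. One possible reading, offered purely as a sketch, is a weighted sum of events with exponential time decay. The event kinds mirror the slide, but the weights, the ten-year half-life, and the choice of exponential decay are assumptions invented here for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable

# Illustrative weights only; the talk does not specify any numbers.
WEIGHTS = {
    "contribution": 1.0,   # data or analysis contributed
    "citation": 0.5,       # reference or citation received
    "support": 0.75,       # supporting work by others
    "refutation": -1.0,    # result refuted
    "retraction": -5.0,    # paper retracted: large penalty ("rapidly")
}
HALF_LIFE_DAYS = 3650.0    # assumed slow decay: weight halves in about 10 years


@dataclass(frozen=True)
class ScoreEvent:
    kind: str        # one of the keys of WEIGHTS
    age_days: float  # how long ago the event happened


def accountability_score(events: Iterable[ScoreEvent],
                         half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Sum of event weights, each discounted by exponential time decay."""
    decay = math.log(2) / half_life_days
    return sum(WEIGHTS[e.kind] * math.exp(-decay * e.age_days) for e in events)


fresh = accountability_score([ScoreEvent("contribution", 0.0)])
aged = accountability_score([ScoreEvent("contribution", HALF_LIFE_DAYS)])
after_retraction = accountability_score(
    [ScoreEvent("contribution", 0.0), ScoreEvent("retraction", 0.0)]
)
```

In this sketch penalties decay at the same rate as credits, so an old retraction weighs less than a recent one. Whether that is desirable is a policy choice that the slide leaves open.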
  • 16.
      ◦ Influence: A classification and measure of the professional engagement one has in terms of data activity
      ◦ Longer-term measure compared to accountability score
      ◦ Applies to all types of players in the ecosystem including just users
  • 17.
  • 18.
      ◦ These measures do not hold for scientists who do not produce data
      ◦ The measures are mostly designed for online activities and must be modified to match the dynamics of different scientific communities
      ◦ Parameters like decay constants
      ◦ Time-window for score revision
      ◦ Global scores should be
      ◦ supplemented by community scores where a community is defined by ontological regions where one’s research lies
      ◦ per activity type rather than a single overall score
  • 19.
      ◦ This is the Big Brother for science
      ◦ This is going to create a bias against “non-performers”
      ◦ Scientific errors will be penalized more than necessary
      ◦ The algorithms can be manipulated to the advantage of some people over others
      ◦ Smaller individuals/organizations will be penalized with respect to better-funded, higher-throughput organizations
      ◦ This will be hard to implement due to opposition from different groups and institutions
  • 20.
      ◦ My speculations
      ◦ If the community decides that it needs data sharing, it will naturally gravitate toward some degree of judgment of those who don’t comply
      ◦ Technology frameworks similar to what we discussed will be adopted within individual e-infrastructures
      ◦ As more data become available and data sharing efforts become successful, third-party watchers like credit bureaus that monitor scientists’ products with respect to data will emerge
      ◦ Such scores would be used for community perception and in-kind incentives earlier than their adoption for formal evaluations
  • 21.
      ◦ The real question is “How do we promote data sharing?”
      ◦ Creating infrastructural elements and reusing today’s (tomorrow’s) technological capabilities is not enough
      ◦ We need a more holistic approach that factors in the human component
      ◦ Using social activity analysis as a starting point we should be able to build a monitoring-cum-incentivizing scheme for data sharing