What Does Responsible Data Science Mean?
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
08/09/19 Data Science for the Public Good
@pebourne
Thanks to Claudia Scholz for some slides
Context: our new School of Data Science intends to make the practice of responsible data science its hallmark
From our draft strategic plan:
The practice of data science through education, research and service whereby all aspects of these endeavors consider the ethical, legal and policy aspects of all we do such that the reputation and integrity of the SDS are never in question.
Opportunity: in more than 40 years in academia I have never seen anything as transformative as what is happening today
Data Science Initiatives Nationwide
[Figure labels: Effect, Cause]
https://surgery.duke.edu/divisions/trauma-and-critical-care-surgery
The story of the trauma surgeon
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
Of course, this was all predicted by smart people…
What is happening now is across all verticals, but there is a precedent we can learn from…
https://avora.com/blog/rise-of-the-data-warehouse/
https://individualizedmedicineblog.mayoclinic.org/2013/04/16/celebrating-10th-anniversary-of-human-genome-project/
https://science.sciencemag.org/content/291/5507/1304
https://avora.com/blog/rise-of-the-data-warehouse/
DNA Sequence Data Since the Human Genome
http://synbio.info/display/synbio/Genetic+data+likely+to+become+the+biggest+big+data+in+2025
What can we learn from what came before?
Lesson 1
Responsible data science means recognizing that exponential growth of data leads to unexpected consequences
https://www.montana.edu/news/17886/public-forum-exploring-the-science-and-ethics-of-gene-editing-set-for-aug-7
http://theconversation.com/five-things-to-consider-before-ordering-an-online-dna-test-92504
https://www.cnbc.com/2019/05/02/ubiome-what-really-happened-at-health-start-up-raided-by-fbi.html
Accuracy
Do you want to know?
You can do it at home
What is ethical in the research lab is not necessarily ethical when commercialized
The 6 D's provide one description of the consequences.
Lesson 1
Exponential growth of data leads to unexpected consequences
Responsible data science anticipates, or at least prepares to deal with, such consequences ahead of time
Lesson 2 – It's all too easy to forget the negative consequences when…
[Courtesy Eric Green, NHGRI]
Lesson 3 – Policies and laws lag…
http://www.navajo-nsn.gov/News%20Releases/OPVP/2019/may/FOR%20IMMEDIATE%20RELEASE%20-%20Navajo%20Nation%20signs%20data%20sharing%20agreement%20to%20advance%20uranium%20exposure%20research%20efforts.pdf
Lesson 4 – Data sharing is a double-edged sword…
On the plus side, data sharing can save lives…
Use case: Diffuse Intrinsic Pontine Gliomas (DIPG)
• Occur in 1 in 100,000 individuals
• Peak incidence at 6-8 years of age
• Median survival 9-12 months
• Surgery is not an option
• Chemotherapy is ineffective and radiotherapy offers only transient benefit
[From Adam Resnick]
Timeline of genomic studies in DIPG
• 2012: landmark studies identify histone mutations as recurrent driver mutations in DIPG
• The data were not shared for 3 years
• 2015: using largely the same datasets, others identify ACVR1 mutations as secondary, co-occurring mutations
• ACVR1 is targetable by a drug
• 3 years = 180 lives
[From Adam Resnick]
NIH Strategic Plan for Data Science
• Support a Highly Efficient and Effective Biomedical Research Data Infrastructure
• Promote Modernization of the Data-Resources Ecosystem
• Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools
• Enhance Workforce Development for Biomedical Data Science
• Enact Appropriate Policies to Promote Stewardship and Sustainability
https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
Lesson 4 – Data sharing is a double-edged sword…
STATE HEALTH SURVEILLANCE: NEWBORN SCREENING CASE STUDY
From Bonnie R and Bernheim R, Public Health Law, Policy and Ethics, Foundation Press (2015)
Variables by category:
• Infant: Patient ID, birth date, birth time, ethnicity, weight in grams, feeding type, transfusion status, zip code of mother
• Sample: Sample ID, collection date, received date, disposition code for sample (satisfactory/not satisfactory)
• Submitter: Submitter ID, submitter name
• Test: 36 different tests
• Diagnosis: Diagnosis, diagnosis date, sample ID
The final dataset contained more than 1.6 million sample records and nearly 29,000 diagnosis records.
Zip Code Level Sickle Cell Prevalence
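In the newborn screening case study, the downside of linkage is that once sample and diagnosis records are joined on the mother's zip code, an uncommon diagnosis can yield counts small enough to point to individual families. One common mitigation is small-cell suppression before aggregates are published. The Python sketch below is illustrative only: the `(zip_code, has_diagnosis)` record format, the function name `zip_prevalence`, the threshold of 11 and the example data are all hypothetical assumptions, not details of the case study.

```python
from collections import Counter

# Hypothetical suppression threshold; real agencies set their own rules.
MIN_CELL_COUNT = 11


def zip_prevalence(records, min_count=MIN_CELL_COUNT):
    """Return {zip_code: cases per 1,000 screened, or None if suppressed}.

    `records` is an iterable of (zip_code, has_diagnosis) pairs, a
    hypothetical simplification of linked sample and diagnosis records.
    """
    screened = Counter()
    cases = Counter()
    for zip_code, has_diagnosis in records:
        screened[zip_code] += 1
        if has_diagnosis:
            cases[zip_code] += 1

    prevalence = {}
    for zip_code, n in screened.items():
        k = cases[zip_code]
        # Small nonzero case counts, or small denominators, could point to
        # identifiable families, so those cells are withheld.
        if 0 < k < min_count or n < min_count:
            prevalence[zip_code] = None
        else:
            prevalence[zip_code] = 1000 * k / n
    return prevalence


# Hypothetical example data: ZIP-A has 12 cases among 2,012 infants,
# ZIP-B has 3 cases among 503 infants.
records = (
    [("ZIP-A", False)] * 2000 + [("ZIP-A", True)] * 12
    + [("ZIP-B", False)] * 500 + [("ZIP-B", True)] * 3
)
result = zip_prevalence(records)
print(result["ZIP-B"])  # None (3 cases is below the threshold)
```

The choice to suppress rather than perturb counts is only one option; it trades completeness of the published map for a lower risk of re-identification.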
Given these lessons from just one vertical (there are many others), what should we be doing as a School of Data Science to act responsibly while undertaking data science for the public good?
Guiding Principles …
Be open, transparent & collaborative in all we do
• Make ourselves known: use persistent identifiers, e.g., ORCID
• Use preprints to accelerate progress
• Only publish Open Access (OA)
• Recognize openness, transparency & collaboration in hiring and promotion & tenure (P&T)
• Promote institutional openness, e.g., Open Data Lab, Wikimedian in residence
• Support institutional open data governance
Guiding Principles …
Consider the ethical consequences across the complete data workflow
Data workflow: Acquisition, Engineering, Analysis, Communication, Dissemination, Ethics
Data Acquisition: Information → Data
● Census, surveys
● Data mining, digitization
● Sensors, Internet of Things (IoT)
Ethical Issues:
● Mass surveillance
● Privacy, terms of service
● Data sovereignty
Job titles:
● IoT engineer
● Chief privacy officer
● Survey designer
https://www.wired.com/story/all-of-us-launches/
Data Engineering: Data → Value
● Integration of data sources
● Data wrangling & cleaning
● Data structures
● Cloud & parallel computing
Ethical Issues:
● Intellectual property
● Consequences of integration
Job titles:
● Data engineer
● Information systems engineer
Data Analysis: Data → Knowledge
● Machine learning
○ supervised, unsupervised
● Models & simulations
Ethical Issues:
● Algorithmic bias
● Accountability & transparency
Job titles:
● Data Scientist or Analyst
● Machine Learning Engineer
Data Communication: Data → Insight
● Visualization
● Storytelling
Ethical Issues:
● Confidentiality
● Distortion of facts
Job titles:
● Data Journalist
● Information Designer
● Dashboard Manager
Data Dissemination: Data → Future Use
● Data preservation
● Reproducibility of research
● FAIR (Findable, Accessible, Interoperable, Reusable) & open
Ethical Issues:
● Cybersecurity
● Dual use
Job titles:
● Data Steward
● Repository manager
● Open Science advocate
Take home
• The fourth paradigm is upon us and will change society
• Forming a new school is an opportunity to do it right; we need help!
• Look to fields like genomics that have been doing data science for some time, and consider their best (and worst) practices
• Responsible data science involves working by a set of guiding principles and considering the consequences of what we do across the complete data lifecycle
Only then will we truly be undertaking data science for the public good.
Acknowledgements
The BD2K Team at NIH
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
Thank You
peb6a@virginia.edu

Speaker Notes

  1. Does data wrangling fall here? Or in acquisition? Does wrangling = munging = cleaning?
  2. Data Steward: https://librarianresources.taylorandfrancis.com/a-day-in-the-life-of-a-data-steward/