SlideShare uma empresa Scribd logo
1 de 40
Designing a Synergistic Relationship Between
Undergraduate Data Science Education and
Usability of Biodiversity Databases
Ciera Martinez, PhD
Berkeley Institute of Data Science
Mozilla Foundation
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
How do
organisms get
their shape?
Evolution!
Motivation
Learning to perform computational research is hard
- Attempting to gain the correct skills is frustrating
- They don’t teach the skills you need in classes
- PIs don’t teach these skills to their students
How I learned
- Using online tutorials and documentation
- Talking with peers, largely online
- Teaching others
Motivation
Now committed to programming education
Chitwood et al, 2014, Martinez et al, 2019, Martinez et al, 2016a, Martinez et al., 2016b,
Utilizing data made my research faster and better
Motivation
fieldmuseum.org
Motivation
ag.com/smithsonian-institution/story-behind-those-jaw-dropping-photos-collections-natur
Motivation
Motivation
Ect
Yay!
So much cool data!
Motivation
Ect
Yay!
So much cool data!
Nay! It’s hard to:
Navigate these databases
Understand what is in these databases
Access data within these databases
Motivation
Data Science
Education
Using Biodiversity data
To answer research questions+ =
Curiosity Data Project
Motivation
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
Guided by mentors, students are given free rein to explore a
database of their choosing with the end goal of creating a
tutorial outlining their process.
Curiosity Data Project
To use biodiversity databases and inquiry based learning to teach
data literacy skills to undergraduate and graduate students.
Project Overview
Project set-up
Synergistic Relationship
Project set-up
1.Got team together
• Recruited eight undergraduates from pool of 30
• Some programming knowledge (R or Python)
• Majors: CS, Stats, Biology
• Recruited mentor help: 1 PhD student and 1
industry data scientist
Project set-up
2. The students were asked to pick one
database and document how they access,
clean, and analyze the data from that database
Project set-upProject set-up
2. The students were asked to pick one
database and document how they access,
clean, and analyze the data from that database
List of prompt questions to begin exploration
• What features you find most useful?
• What kind of questions can we ask when using this database?
• What tools are available to use this database?
• What skills are required to use this database?
• How do you access the data?
• What tutorials help you access the database?
• What are the challenges to using this data?
• What are the rules for using this database?
Project set-upProject set-up
• We are giving back to the community by teaching others
• They get over their fears of posting their code online
• Expose them to open science philosophy and
computational reproducibility
• Focus on code readability
• Allow them to learn how to present a story
• Teaches them computational research notebook
management
• Regularly use Git and Github
• Start online presence and portfolio
3. Student end project goal is a tutorial that will
be posted online
Project set-up
Reasoning
Project set-up
4. We met for 20 min individually each week
to discuss what they worked on, challenges,
and plan next weeks work. (~3 hours a
week)
Project set-upProject set-up
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
I am so impressed with their work!
Results of the project
curiositydata.org
Ask how animals interact with network graph visualization
Yikang Li
GloBI
Results of the project
curiositydata.org
Tracking animal movement in time and space
Caryn
MoveBank
Results of the project
curiositydata.org
Mapping dinosaur and plant fossil data in time and
space
Paleobiology Database (PBDB)
Results of the project
curiositydata.org
Coming Soon: Explore 3D CT Scans using open source tools
Morphosource
Results of the project
Samantha
curiositydata.org
Exploring the data of natural history
Results of the project
What worked?
What didn’t work?
What did the students like about the project?
What did the students dislike about the project?
Results of the project
- They all thrived in self directed inquiry
-These tutorials can be coopted for workshops AND be
remixed for lesson modules
-Feedback works if directly in contact with developer
- They all enjoyed it and stuck with it, I didn’t lose one
student
- Students became extremely interested in the data
What worked?
Results of the project
- The students skills did improve!
What worked?
Results of the project
-They sometimes floundered with so much freedom
-It was hard to assess how much work they did each week
-They could not create tools like I originally planned
-The limiting factor is me, hard to scale.
-Reviewing the code
-Dealing with notebook to website
-Individual meetings
Results of the project
What didn’t work?
- “Ability to take the project in whatever direction I
wanted”
- “I love manipulating data programmatically and making
plots with the data.”
-“I enjoyed learning new skills outside of the classroom and
actually being able to create a tutorial using data that I was
really interested in!”
-“Exploring different data visualization methods and tailor
them to fit the data set”
Results of the project
What did they like about the project?
Results of the project
What did they dislike about the project?
-“Debugging code and figuring out how to make an
interesting…story out of the data”
-“Downloading data using API and storing big data.”
-"Lack of structure”
-“Trying to figure out how to solve the problems like dealing
with missing and "bad" data”
-“That some visualization methods/packages are difficult to
install and use “
-“Debugging code and figuring out how to make an
interesting…story out of the data”
-“Downloading data using API and storing big data.”
-"Lack of structure”
-“Trying to figure out how to solve the problems like dealing
with missing and "bad" data”
-“That some visualization methods/packages are difficult to
install and use “
But everything they listed is normal frustrations about
working with code, so they learned what is common
frustrations with the code
Results of the project
What did they dislike about the project?
CURIOSITY
DATA
TEAM
Caryn
CieraWinifred
Samantha Shirley
Sara
Zoe
Yikang OllieLynn
*Not pictured Yihui
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
- Is there something here with gender and this data? Why? How do
we leverage for bridging gender gap in STEM?
- Can this scale as an academic course?
- Can this scale in the online community?
- Do you need to know how to program to run a similar project?
- Will people use these tutorials?
- Where is an appropriate place to house these tutorials?
End questions
Thank You!
Neotoma, PBDB, GloBI, Movebank, Google Earth,
iDigBio, BHL, Morphosource
All the museums, collections, and universities that hosted me to visit, and
provided insightful discussion and critical feedback
Berkeley Museum of Vertebrate and Zoology, New York Botanical Garden,
National Museum of Denmark, Kew Gardens, Natural History Museum of Utah,
American Museum of Natural History, Cooper-Hewitt Design Museum, UT
Austin Vertebrate Paleontology Laboratory Collection, Digimorph Facility, UT
Austin Ichthyology Collection, Lewis and Clark College, Natural History
Museum (London)
Special Thanks: Deb Paul
Learning Objectives
Link to exact document
- identify assumptions about data
- formulate research questions
- tell a story with data
- visualize data
- build data science portfolio
- write collaborative code
- employ version control
- apply data / file management
- think with reproducibility in mind
- handle and merge data
- clean messy data
Overlapping Data Literacy Skills
1. Manipulation of data frames
• subset
• merge
• transform long vs wide
• apply formulas to values
2. Map points onto a geographic map, understand assumptions
3. Clean taxonomic names, understand assumptions
1. Species concept
2. Spelling errors
3. Species synonyms
4. Hierarchical classification
5. Species databases available and differences between them
4. Time stamp data formats
1. Time collected vs time measured
2. Time formats
3. Visualization techniques for time and space
5. Look into biology for sanity checks
6. Understand the value of plotting data and which visualization is important
Minimum

Mais conteúdo relacionado

Mais procurados

A Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationA Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationUniversity of South Africa (Unisa)
 
22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI 22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI Brigitte Borja de Mozota
 
Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Nicola Osborne
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research PaperAnita de Waard
 
Towards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service InnovationTowards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service InnovationJack Park
 
Chapter1 introduction
Chapter1 introductionChapter1 introduction
Chapter1 introductionDinesh K
 
Executing the Research Paper
Executing the Research PaperExecuting the Research Paper
Executing the Research PaperAnita de Waard
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinarJim Jansen
 
googlization of information
googlization of informationgooglization of information
googlization of informationrajat00001in
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automationbenosteen
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformaticsc.titus.brown
 
The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)David De Roure
 
Experiments in Educational Research & Practice
Experiments in Educational Research & PracticeExperiments in Educational Research & Practice
Experiments in Educational Research & PracticeJoseph Jay Williams
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behaviorJames Howison
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science Carole Goble
 

Mais procurados (17)

A Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationA Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) Education
 
22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI 22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI
 
Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research Paper
 
Towards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service InnovationTowards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service Innovation
 
Chapter1 introduction
Chapter1 introductionChapter1 introduction
Chapter1 introduction
 
Executing the Research Paper
Executing the Research PaperExecuting the Research Paper
Executing the Research Paper
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinar
 
Context Aware Harassment Detection in Social Media [Overview]
Context Aware Harassment Detection in Social Media [Overview]Context Aware Harassment Detection in Social Media [Overview]
Context Aware Harassment Detection in Social Media [Overview]
 
googlization of information
googlization of informationgooglization of information
googlization of information
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automation
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)
 
Experiments in Educational Research & Practice
Experiments in Educational Research & PracticeExperiments in Educational Research & Practice
Experiments in Educational Research & Practice
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behavior
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science
 

Semelhante a Designing a synergistic relationship between undergraduate Data Science education and usability of Biodiversity databases

Technology Integration
Technology IntegrationTechnology Integration
Technology Integrationlxshelby
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
Organizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptxOrganizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptxCelia Emmelhainz
 
Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...Keith Webster
 
Ms paul s_biology_class1
Ms paul s_biology_class1Ms paul s_biology_class1
Ms paul s_biology_class1liscricket
 
/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1liscricket
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesCelia Emmelhainz
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...Shawna Sadler
 
Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24tracykteal
 
Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...Robin Rice
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New ScienceAnita de Waard
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Carly Strasser
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxelisarosa29
 
Start making sense: Talking data management with researchers
Start making sense: Talking data management with researchersStart making sense: Talking data management with researchers
Start making sense: Talking data management with researchersIncremental Project
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...University of California Curation Center
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅kulibrarians
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?Daniel S. Katz
 

Semelhante a Designing a synergistic relationship between undergraduate Data Science education and usability of Biodiversity databases (20)

Technology Integration
Technology IntegrationTechnology Integration
Technology Integration
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
Organizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptxOrganizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptx
 
Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...
 
Ms paul s_biology_class1
Ms paul s_biology_class1Ms paul s_biology_class1
Ms paul s_biology_class1
 
/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1
 
2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social Sciences
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...
 
Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24
 
Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New Science
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptx
 
Start making sense: Talking data management with researchers
Start making sense: Talking data management with researchersStart making sense: Talking data management with researchers
Start making sense: Talking data management with researchers
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
 

Último

Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 

Último (20)

Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 

Designing a synergistic relationship between undergraduate Data Science education and usability of Biodiversity databases

  • 1. Designing a Synergistic Relationship Between Undergraduate Data Science Education and Usability of Biodiversity Databases Ciera Martinez, PhD Berkeley Institute of Data Science Mozilla Foundation
  • 2. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 3. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 4. How do organisms get their shape? Evolution! Motivation
  • 5. Learning to perform computational research is hard - Attempting to gain the correct skills is frustrating - They don’t teach the skills you need in classes - PIs don’t teach these skills to their students How I learned - Using online tutorials and documentation - Talking with peers, largely online - Teaching others Motivation Now committed to programming education
  • 6. Chitwood et al, 2014, Martinez et al, 2019, Martinez et al, 2016a, Martinez et al., 2016b, Utilizing data made my research faster and better Motivation
  • 10. Ect Yay! So much cool data! Motivation
  • 11. Ect Yay! So much cool data! Nay! It’s hard to: Navigate these databases Understand what is in these databases Access data within these databases Motivation
  • 12. Data Science Education Using Biodiversity data To answer research questions+ = Curiosity Data Project Motivation
  • 13. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 14. Guided by mentors, students are given free rein to explore a database of their choosing with the end goal of creating a tutorial outlining their process. Curiosity Data Project To use biodiversity databases and inquiry based learning to teach data literacy skills to undergraduate and graduate students. Project Overview Project set-up
  • 16. 1.Got team together • Recruited eight undergraduates from pool of 30 • Some programming knowledge (R or Python) • Majors: CS, Stats, Biology • Recruited mentor help: 1 PhD student and 1 industry data scientist Project set-up
  • 17. 2. The students were asked to pick one database and document how they access, clean, and analyze the data from that database Project set-upProject set-up
  • 18. 2. The students were asked to pick one database and document how they access, clean, and analyze the data from that database List of prompt questions to begin exploration • What features you find most useful? • What kind of questions can we ask when using this database? • What tools are available to use this database? • What skills are required to use this database? • How do you access the data? • What tutorials help you access the database? • What are the challenges to using this data? • What are the rules for using this database? Project set-upProject set-up
  • 19. • We are giving back to the community by teaching others • They get over their fears of posting their code online • Expose them to open science philosophy and computational reproducibility • Focus on code readability • Allow them to learn how to present a story • Teaches them computational research notebook management • Regularly use Git and Github • Start online presence and portfolio 3. Student end project goal is a tutorial that will be posted online Project set-up Reasoning Project set-up
  • 20. 4. We met for 20 min individually each week to discuss what they worked on, challenges, and plan next weeks work. (~3 hours a week) Project set-upProject set-up
  • 21. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 22. I am so impressed with their work! Results of the project
  • 23. curiositydata.org Ask how animals interact with network graph visualization Yikang Li GloBI Results of the project
  • 24. curiositydata.org Tracking animal movement in time and space Caryn MoveBank Results of the project
  • 25. curiositydata.org Mapping dinosaur and plant fossil data in time and space Paleobiology Database (PBDB) Results of the project
  • 26. curiositydata.org Coming Soon: Explore 3D CT Scans using open source tools Morphosource Results of the project Samantha
  • 27. curiositydata.org Exploring the data of natural history Results of the project
  • 28. What worked? What didn’t work? What did the students like about the project? What did the students dislike about the project? Results of the project
  • 29. - They all thrived in self directed inquiry -These tutorials can be coopted for workshops AND be remixed for lesson modules -Feedback works if directly in contact with developer - They all enjoyed it and stuck with it, I didn’t lose one student - Students became extremely interested in the data What worked? Results of the project
  • 30. - The students skills did improve! What worked? Results of the project
  • 31. -They sometimes floundered with so much freedom -It was hard to assess how much work they did each week -They could not create tools like I originally planned -The limiting factor is me, hard to scale. -Reviewing the code -Dealing with notebook to website -Individual meetings Results of the project What didn’t work?
  • 32. - “Ability to take the project in whatever direction I wanted” - “I love manipulating data programmatically and making plots with the data.” -“I enjoyed learning new skills outside of the classroom and actually being able to create a tutorial using data that I was really interested in!” -“Exploring different data visualization methods and tailor them to fit the data set” Results of the project What did they like about the project?
  • 33. Results of the project What did they dislike about the project? -“Debugging code and figuring out how to make an interesting…story out of the data” -“Downloading data using API and storing big data.” -"Lack of structure” -“Trying to figure out how to solve the problems like dealing with missing and "bad" data” -“That some visualization methods/packages are difficult to install and use “
  • 34. -“Debugging code and figuring out how to make an interesting…story out of the data” -“Downloading data using API and storing big data.” -"Lack of structure” -“Trying to figure out how to solve the problems like dealing with missing and "bad" data” -“That some visualization methods/packages are difficult to install and use “ But everything they listed is normal frustrations about working with code, so they learned what is common frustrations with the code Results of the project What did they dislike about the project?
  • 36. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 37. - Is there something here with gender and this data? Why? How do we leverage for bridging gender gap in STEM? - Can this scale as an academic course? - Can this scale in the online community? - Do you need to know how to program to run a similar project? - Will people use these tutorials? - Where is an appropriate place to house these tutorials? End questions
  • 38. Thank You! Neotoma, PBDB, GloBI, Movebank, Google Earth, iDigBio, BHL, Morphosource All the museums, collections, and universities that hosted me to visit, and provided insightful discussion and critical feedback Berkeley Museum of Vertebrate and Zoology, New York Botanical Garden, National Museum of Denmark, Kew Gardens, Natural History Museum of Utah, American Museum of Natural History, Cooper-Hewitt Design Museum, UT Austin Vertebrate Paleontology Laboratory Collection, Digimorph Facility, UT Austin Ichthyology Collection, Lewis and Clark College, Natural History Museum (London) Special Thanks: Deb Paul
  • 39. Learning Objectives Link to exact document - identify assumptions about data - formulate research questions - tell a story with data - visualize data - build data science portfolio - write collaborative code - employ version control - apply data / file management - think with reproducibility in mind - handle and merge data - clean messy data
  • 40. Overlapping Data Literacy Skills 1. Manipulation of data frames • subset • merge • transform long vs wide • apply formulas to values 2. Map points onto a geographic map, understand assumptions 3. Clean taxonomic names, understand assumptions 1. Species concept 2. Spelling errors 3. Species synonyms 4. Hierarchical classification 5. Species databases available and differences between them 4. Time stamp data formats 1. Time collected vs time measured 2. Time formats 3. Visualization techniques for time and space 5. Look into biology for sanity checks 6. Understand the value of plotting data and which visualization is important Minimum

Notas do Editor

  1. So this is the outline of my talk. Today I am going to talk about a project I that has been the culmination of over ten years of interests of mine. So to start, I am going to take a few minutes to explain my motivations for building this project and how my background led me here.
  2. The short answer is evolution. I am obsessed with thinking about how organisms get their shape. The moment I learned that there are people who can devote their careers to asking such questions, is the exact moment I set a trajectory to become an evolutionary biologist. I vowed to attack this question, How do organism get their shape, to where ever it led me.
  3. Where it led me, is where almost every modern scientist is led. To a lot of data and a lot of frustrations on how to use that data. Learning to program is hard. It took many years to figure out I was not alone with this frustration, And now I am very committed to teaching programming and the pedagogy behind learning these programming and datasceince skills.
  4. And overtime I stopped being so hard on myself. And as my programming skills improved, my research questions became bigger and I started uncovering surprising patterns in my research that would not have been possible without programming. Describe slide. Instead of being frustrated by large amounts of data I became craving it
  5. Now I am hooked, I am currently a postdoc studying comparitive genomics at Berkeley. I consider myself a data scientist, just as much as I consider myself an evolutionary biologist and I began thinking about where the coolest data in the world is to answer my evolutionary questions, and it hit me, duh, Natural History Museums. I got my start in research at the Field Museum in Chicago (maybe you were there last week?),
  6. but from working there I knew the sheer magnitude of specimens not open to the public and I knew Natural History Museum Collections and Botanical Gardens must have amazing data.
  7. But lucky for me, I don’t have to literally go to these museums. I was thrilled to learn about the digitization efforts that this community in the past fifteen years. I could not believe how many databases their were. How did I not know about these before? So then I started devouring as much information I could about how to use this data. and realized THIS is hard. Navigating these databases is hard. Understanding what is in these databases is hard. Accessing the data is hard. But still, I could plainly see, the data from this community is extraordinary and immense, I just wish there was more information on how to use this data!
  8. But lucky for me, I don’t have to literally go to these museums. I was thrilled to learn about the digitization efforts that this community in the past fifteen years. I could not believe how many databases their were. How did I not know about these before? So then I started devouring as much information I could about how to use this data. and realized THIS is hard. Navigating these databases is hard. Understanding what is in these databases is hard. Accessing the data is hard. But still, I could plainly see, the data from this community is extraordinary and immense, I just wish there was more information on how to use this data!
  9. So this year, with the help of the Mozilla foundation, I decided to dive head first into my new obsession - biodiversity and natural history data, while also incorporating my second passion of teaching programming, data science, and computational reproducibility I call the project The Curiosity Data Project, After Cabinet of Curiosities.
  10. So I wanted to help not only undergraduates, but also increase the usability of these databases. So while the databases provide the data and point of contact for questions, The students
  11. I knew from working with undergraduates on my research that they are extremely well trained and hungry for projects that allows them to handle real data.
  12. They all started the same list of questions about what to ask about the data. This structure was built so that it will be useful for 1.Future users of the database and 2. Can be used as critical feedback for database maintainers. These are the sorts of questions that every database should have somewhere on their site, but often take days to figure out.
  13. Start online presence and portfolio (which is instrumental in them getting a job or get into graduate school) Regularly use Git and Github if you have ever tried to teach or learn git, you know that you can only learn with regular practice. A short course or learning module is boring and nearly impossible way to learn git this way
  14. - even with sometimes very little knowledge of biology. They do what we all do, Google. - Feedback works if directly in contact with developer (for example GloBI maintainer Jorrit Polean worked directly with Yikang Lee discussing problems she had. Gtihub issues were created and database improved) - Students became extremely interested in the data. Meeting with them was a delight, they would tell me all about dinosaurs, how fossils are collected, California fires, sea start diseases, whale migratory patterns. It was a delight.
  15. I should have limited them to the scope of their questions early on. because exploratory data analysis is not linear Understanding and cleaning the data was full time.
  16. Lastly I want to thank my team. Notice anything? Gender: 6 female, 1 male, 1 non-binary gendered student You might be thinking “oh, but maybe they were attracted to a female led data science team”. Not the case, I also have a team of undergraduates with my comparative genomics project: which is 5 male 2 females. The majority of applicants for this project was female - 24 out of the 30 applicant that applied were females. This is something that I have noticed about this community in general, a lot of females are attracted to this type of data. Why? I have some theories, but overall, I think it is data itself,
  17. So I end with a list of questions that I would love your opinion on. Please come find me if you have any thoughts on these questions or data science education in general. I can take any questions not tag
  18. They all started in the same place, and from there, they had to follow their own curiosity (the ultimate motivation).