A Data Scientist Perspective on Data Curation in the Digital Era
1. Symposium on Digital Curation in the Era of Big Data:
Career Opportunities and Educational Requirements:
A Data Scientist Perspective
Dr. Vicki Lynn Ferrini
Lamont-Doherty Earth Observatory
2. Background (What I do)
• Data Documentation (Metadata)
• Data Management
• Data Discovery & Access Tools
• Develop/Implement QA/QC
• Data Syntheses
• Data Compliance Tools
• Education Materials
• Delivery to National Data Centers, Libraries
• Data Publication & Links to Scientific Literature
• Data Integration, Visualization & Analysis Tools
• Best Practice Guidelines for Optimizing Acquisition
“Support, sustain, and advance the geosciences by providing data services for observational solid earth data from the Ocean, Earth, and Polar Sciences.”
rvdata.us
4. Perspective of Data Producers
Domain Specialists
• Goal: Scientific Discovery
• Data Acquisition & Reduction
• Data Assembly
• Visualization, Integration & Interpretation
• Scientific Standards
• Technical & Operational Limitations
• Data documentation
• Varies by domain
• Often difficult
• Heterogeneous
5. Perspective of Data Consumers
Domain Specialists & Public
• Goal: Discovery
• Data Discoverability & Access
• Cross-disciplinary
• Scientific Standards
• Interpretation
• Increased importance of documentation
• Data not self-generated
• Data Quality/Reliability
• Data Use/Misuse
6. Perspective of Data Providers
• Goal: Access/Preservation/Re-Use
• Data Formats & Standards
• Data Documentation & Preservation Techniques
• Scientific & Metadata Standards
• Data Citation
• Data Transfer Mechanisms
• System Usability
• Interoperability/Linked Data
• Needs of a Diverse User Community
• Knowledge of Content
Human & Digital Bridge between Producers & Consumers
9. Key Attributes of Data Scientists
• Knowledge spanning the full scientific data stewardship continuum
• Domain Experience
• Content & applications
• Data acquisition & reduction practices
• Nuances of Data
• Technical knowledge
• Evolving Technologies
• Data Acquisition & Management
• Metadata
10. Key Attributes of Data Scientists
• Other skills (seldom taught)
• Communication & Organization
• Understand cultural aspects of user community
• People/Project Management
• Balance between micro- and macro-perspectives
11. Key Attributes of Tech Team Members
• Basic knowledge of content OR interest/curiosity
• Experience with Data Production/Consumption
• Technical skills:
– web development & technology
– geospatially enabled data management tools
– experience with data analysis tools
– ability to work in a variety of tech environments
• Complementary skill sets
• Innovation & creativity
• Willingness to ask questions – assumptions can be dangerous
12. Challenges & Opportunities
• Difficult to find the right balance between technical skills and interest in content
– Team dynamics, management approaches evolving
– Increasing opportunities to engage/educate computer scientists in domain science
• Data producers are slow to join the digital era
– Educational opportunities
– Scientific benefits continue to grow
– New generation incorporating data sharing into scientific workflow
• Difficult to keep pace with evolving technologies
– Educational & Professional Development opportunities