SlideShare uma empresa Scribd logo
1 de 23
Best Practices
Creating Research Data




                         Sherry Lake
                         July 31, 2012 University of Florida Data Management Workshop
WHY?

Following these Best Practices…….
• Will improve the usability of the data by you
  or by others
• Your data will be “computer ready”
• Your data will be ready to share with others
Spreadsheet Examples
Spreadsheet Problems?
Problems

• Dates are not
  stored
  consistently
• Values are labeled inconsistently
• Data coding is inconsistent
• Order of values are different
Problems

• Confusion
  between
  numbers and
  text
• Different types of data are stored in the
  same columns
• The spreadsheet loses interpretability if it
  is sorted
Best Practices Data Organization
• Lines or rows of data should be complete
   – Designed to be machine readable, not human
     readable (sort)
Best Practices Data Organization


• Include a Header Line 1st line (or record)
• Label each Column with a short but
  descriptive name
  – Names should be unique
  – Use letters, numbers, or “_” (underscore)
  – Do not include blank spaces or symbols (+ - & ^ *)
Best Practices Data Organization


• Columns of data should be consistent
  – Use the same naming convention for text data
• Columns should include only a single kind of
  data
  – Text or “string” data
  – Integer numbers
  – Floating point or real numbers
Use Standardized Formats

• ISO 8601 Standard for Date and Time
  – YYYYMMDDThh:mmss.sTZD
               20091013T09:1234.9Z
       20091013T09:1234.9+05:00
• Spatial Coordinates for Latitute/Longitude
  – +/- DD.DDDDD
        -78.476 (longitude)
        +38.029 (latitude)
File Names
File Names
• Use descriptive names
• Not too long
• Don’t use spaces
• Try to include time,
  place & theme
• May use “-” or “_”
File Names

• String words together with
  Caps (VegBiodiv_2007)
• Think about using version
  numbers
• Don’t change default
  extensions (txt, jpg, csv,…)
Quantitative Assurance/Control
Dataset Creation & Integrity Errors
   • Use a data entry program
      – Program to catch typing errors
      – Program pull-down menu options
   • Perform double entry of the data
   • Manually check 5 – 10% of data records
   • Check for out-of-range values (plotting)
   • Check for missing or impossible values
   • Perform statistical summaries (random samples)
Analyzing Data - Notes
• Keep Original File
  – Uncorrected copy
  – Make “read-only”
• Make notes on transformations
• Any changes, save as a new file
• Use scripted code to transform and correct
  data
Analyzing Data
• Use a scripted program (R, SAS, SPSS, Matlab)
  – Steps are recorded in textual format
  – Can be easily revised and re-executed
  – Helps sharing and repetition
  – Easy to document
• GUI-bases analysis may be easier, but harder
  to reproduce
Document EVERYTHING!

• Create a Project Document File
  – More than a Lab Notebook
  – Data Management Plan
• Start at the beginning of the project and
  continue throughout data collection & analysis
  – Why you are collecting data
  – Exact details of methods of collecting & analyzing
Document EVERYTHING!
• Details such as:
  – Names of data & analysis files associated with
    study
  – Definitions for data and codes (include missing
    value codes, names) example
  – Units of measure (accuracy and precision)
  – Standards or instrument calibrations
Choosing File Formats

• Accessible Data (in the future)
  – Non-proprietary (software formats)
  – Open, documented standard
  – Common, used by the research community
  – Standard representation (ASCII, Unicode)
  – Unencrypted & Uncompressed
  – Media formats (hardware formats)
Preferred Format Choices
•   PDF, not Word
•   ASCII, not Excel
•   MPEG-4, not Quicktime
•   TIFF or JPEG2000, not GIF or JPG
•   XML or RDF, not RDBMS

Good if not software specific
Best Practices

1. Use Consistent Data Organization
2. Use Standardized Formats
3. Assign Descriptive File Names
4. Perform Basic Quality Assurance/ Quality Control
5. Use Scripted Program for Analysis and Keep Notes
6. Document EVERYTHING! (Define Contents of Data
   Files )
7. Use Consistent, Stable and Open File Formats
Best Practices Bibliography
Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some
   simple guidelines for effective data management. Bulletin of the Ecological
   Society of America, 90(2), 205-214.
Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E.
  (2010). Best Practices for Preparing Environmental Data Sets to Share and
  Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf)
  from Oak Ridge National Laboratory Distributed Active Archive Center, Oak
  Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010.
Inter-university Consortium for Political and Social Research (ICPSR). (2012).
    Guide to social science data preparation and archiving: Best practices
    throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved
    05/31/2012, from
    http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf.
Data Observation Network for Earth (DataONE). (2012). DataONE Best
   Practices database. Retrieved 07/21/12, from
   http://www.dataone.org/best-practices.
Questions? Discussion?

• Sherry Lake
  Senior Scientific Data Consultant, UVA Library
• shlake@virginia.edu
• Twitter: shlakeuva
• Slideshare: http://www.slideshare.net/shlake
• Web: http://www.lib.virginia.edu/brown/data




                                                   23

Mais conteúdo relacionado

Mais procurados

Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl conceptsjeshocarme
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsUmasree Raghunath
 
Differene between data and information
Differene between data and informationDifferene between data and information
Differene between data and informationSarah Ahmed
 
Decision Support Systems DSS
Decision Support Systems DSSDecision Support Systems DSS
Decision Support Systems DSSHussein Alshkhir
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
The data quality challenge
The data quality challengeThe data quality challenge
The data quality challengeLenia Miltiadous
 
Data Quality Strategies
Data Quality StrategiesData Quality Strategies
Data Quality StrategiesDATAVERSITY
 
Chapter 1: The Importance of Data Assets
Chapter 1: The Importance of Data AssetsChapter 1: The Importance of Data Assets
Chapter 1: The Importance of Data AssetsAhmed Alorage
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data GovernanceTuba Yaman Him
 
Foundation of information system in business
Foundation of information system in businessFoundation of information system in business
Foundation of information system in businessAmrit Banstola
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Matteo Manca
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing Girish Dhareshwar
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance StrategyAnalytics8
 
Data Engineering.pdf
Data Engineering.pdfData Engineering.pdf
Data Engineering.pdfDatacademy.ai
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyDATAVERSITY
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaVaibhav Khanna
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDATAVERSITY
 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data ScienceKenny Daniel
 
Introduction to information systems and the role of information systems in bu...
Introduction to information systems and the role of information systems in bu...Introduction to information systems and the role of information systems in bu...
Introduction to information systems and the role of information systems in bu...Ultraspectra
 

Mais procurados (20)

Data Governance Intro.pptx
Data Governance Intro.pptxData Governance Intro.pptx
Data Governance Intro.pptx
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl concepts
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Differene between data and information
Differene between data and informationDifferene between data and information
Differene between data and information
 
Decision Support Systems DSS
Decision Support Systems DSSDecision Support Systems DSS
Decision Support Systems DSS
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
The data quality challenge
The data quality challengeThe data quality challenge
The data quality challenge
 
Data Quality Strategies
Data Quality StrategiesData Quality Strategies
Data Quality Strategies
 
Chapter 1: The Importance of Data Assets
Chapter 1: The Importance of Data AssetsChapter 1: The Importance of Data Assets
Chapter 1: The Importance of Data Assets
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Foundation of information system in business
Foundation of information system in businessFoundation of information system in business
Foundation of information system in business
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
 
Data Engineering.pdf
Data Engineering.pdfData Engineering.pdf
Data Engineering.pdf
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schema
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data Science
 
Introduction to information systems and the role of information systems in bu...
Introduction to information systems and the role of information systems in bu...Introduction to information systems and the role of information systems in bu...
Introduction to information systems and the role of information systems in bu...
 

Semelhante a Best practices data collection

Best practices data management
Best practices data managementBest practices data management
Best practices data managementSherry Lake
 
Data Management for Graduate Students
Data Management for Graduate StudentsData Management for Graduate Students
Data Management for Graduate StudentsRebekah Cummings
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and SharingC. Tobin Magle
 
3.1 Database structure - designing a system.ppt
3.1 Database structure - designing a system.ppt3.1 Database structure - designing a system.ppt
3.1 Database structure - designing a system.pptAghaSyedNaqvi
 
Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing Mojtaba Lotfaliany
 
Data Management for Undergraduate Researchers (updated - 02/2016)
Data Management for Undergraduate Researchers (updated - 02/2016)Data Management for Undergraduate Researchers (updated - 02/2016)
Data Management for Undergraduate Researchers (updated - 02/2016)Rebekah Cummings
 
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217lyarmey
 
Creating Effective Data Visualizations in Excel 2016: Some Basics
Creating Effective Data Visualizations in Excel 2016:  Some BasicsCreating Effective Data Visualizations in Excel 2016:  Some Basics
Creating Effective Data Visualizations in Excel 2016: Some BasicsShalin Hai-Jew
 
Elements of Data Documentation
Elements of Data DocumentationElements of Data Documentation
Elements of Data Documentationssri-duke
 
Lec20.pptx introduction to data bases and information systems
Lec20.pptx introduction to data bases and information systemsLec20.pptx introduction to data bases and information systems
Lec20.pptx introduction to data bases and information systemssamiullahamjad06
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.FAIRDOM
 
IS L03 - Database Management
IS L03 - Database ManagementIS L03 - Database Management
IS L03 - Database ManagementJan Wong
 
Data Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersData Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersRebekah Cummings
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data ManagementC. Tobin Magle
 
Support Your Data, Kyoto University
Support Your Data, Kyoto UniversitySupport Your Data, Kyoto University
Support Your Data, Kyoto UniversityStephanie Simms
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Dios Kurniawan
 
Epidata presentation course for heath science
Epidata presentation course for heath scienceEpidata presentation course for heath science
Epidata presentation course for heath scienceMitikuTeka1
 

Semelhante a Best practices data collection (20)

Best practices data management
Best practices data managementBest practices data management
Best practices data management
 
Data Management for Graduate Students
Data Management for Graduate StudentsData Management for Graduate Students
Data Management for Graduate Students
 
6.2 software
6.2 software6.2 software
6.2 software
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
3.1 Database structure - designing a system.ppt
3.1 Database structure - designing a system.ppt3.1 Database structure - designing a system.ppt
3.1 Database structure - designing a system.ppt
 
Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing
 
Data Management for Undergraduate Researchers (updated - 02/2016)
Data Management for Undergraduate Researchers (updated - 02/2016)Data Management for Undergraduate Researchers (updated - 02/2016)
Data Management for Undergraduate Researchers (updated - 02/2016)
 
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217
 
Creating Effective Data Visualizations in Excel 2016: Some Basics
Creating Effective Data Visualizations in Excel 2016:  Some BasicsCreating Effective Data Visualizations in Excel 2016:  Some Basics
Creating Effective Data Visualizations in Excel 2016: Some Basics
 
Elements of Data Documentation
Elements of Data DocumentationElements of Data Documentation
Elements of Data Documentation
 
Digital data
Digital dataDigital data
Digital data
 
Digital Types
Digital TypesDigital Types
Digital Types
 
Lec20.pptx introduction to data bases and information systems
Lec20.pptx introduction to data bases and information systemsLec20.pptx introduction to data bases and information systems
Lec20.pptx introduction to data bases and information systems
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.
 
IS L03 - Database Management
IS L03 - Database ManagementIS L03 - Database Management
IS L03 - Database Management
 
Data Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersData Management for Undergraduate Researchers
Data Management for Undergraduate Researchers
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data Management
 
Support Your Data, Kyoto University
Support Your Data, Kyoto UniversitySupport Your Data, Kyoto University
Support Your Data, Kyoto University
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1
 
Epidata presentation course for heath science
Epidata presentation course for heath scienceEpidata presentation course for heath science
Epidata presentation course for heath science
 

Mais de Sherry Lake

Planning for Libra Data
Planning for Libra DataPlanning for Libra Data
Planning for Libra DataSherry Lake
 
Virginia Data Management Bootcamp: Building the Research Data Community of Pr...
Virginia Data Management Bootcamp: Building the Research Data Community of Pr...Virginia Data Management Bootcamp: Building the Research Data Community of Pr...
Virginia Data Management Bootcamp: Building the Research Data Community of Pr...Sherry Lake
 
Using a Case Study to Teach Data Management to Librarians
Using a Case Study to Teach Data Management to LibrariansUsing a Case Study to Teach Data Management to Librarians
Using a Case Study to Teach Data Management to LibrariansSherry Lake
 
Documentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampDocumentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampSherry Lake
 
DMTool-ASERL-Webinar
DMTool-ASERL-WebinarDMTool-ASERL-Webinar
DMTool-ASERL-WebinarSherry Lake
 
DMPTool Workshop University of Georgia
DMPTool Workshop University of GeorgiaDMPTool Workshop University of Georgia
DMPTool Workshop University of GeorgiaSherry Lake
 
Federal funder mandates
Federal funder mandatesFederal funder mandates
Federal funder mandatesSherry Lake
 
DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014
DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014
DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014Sherry Lake
 
Data Management Planning for Engineers
Data Management Planning for EngineersData Management Planning for Engineers
Data Management Planning for EngineersSherry Lake
 
DMPTool Webinar Environmental Scan
DMPTool Webinar Environmental ScanDMPTool Webinar Environmental Scan
DMPTool Webinar Environmental ScanSherry Lake
 
Lake dmp tool_i_conference
Lake dmp tool_i_conferenceLake dmp tool_i_conference
Lake dmp tool_i_conferenceSherry Lake
 
Lake us-canada policesupdate
Lake us-canada policesupdateLake us-canada policesupdate
Lake us-canada policesupdateSherry Lake
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-supportSherry Lake
 
Managing the research life cycle
Managing the research life cycleManaging the research life cycle
Managing the research life cycleSherry Lake
 
Dmp tool presentation
Dmp tool presentationDmp tool presentation
Dmp tool presentationSherry Lake
 
Funder requirements for Data Management Plans
Funder requirements for Data Management PlansFunder requirements for Data Management Plans
Funder requirements for Data Management PlansSherry Lake
 
Library support for life cycle
Library support for life cycleLibrary support for life cycle
Library support for life cycleSherry Lake
 

Mais de Sherry Lake (20)

Planning for Libra Data
Planning for Libra DataPlanning for Libra Data
Planning for Libra Data
 
Virginia Data Management Bootcamp: Building the Research Data Community of Pr...
Virginia Data Management Bootcamp: Building the Research Data Community of Pr...Virginia Data Management Bootcamp: Building the Research Data Community of Pr...
Virginia Data Management Bootcamp: Building the Research Data Community of Pr...
 
Using a Case Study to Teach Data Management to Librarians
Using a Case Study to Teach Data Management to LibrariansUsing a Case Study to Teach Data Management to Librarians
Using a Case Study to Teach Data Management to Librarians
 
Documentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampDocumentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM Bootcamp
 
Creating dmp
Creating dmpCreating dmp
Creating dmp
 
DMTool-ASERL-Webinar
DMTool-ASERL-WebinarDMTool-ASERL-Webinar
DMTool-ASERL-Webinar
 
DMPTool Workshop University of Georgia
DMPTool Workshop University of GeorgiaDMPTool Workshop University of Georgia
DMPTool Workshop University of Georgia
 
Federal funder mandates
Federal funder mandatesFederal funder mandates
Federal funder mandates
 
DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014
DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014
DMPTool2 demo for DMPTool-DMPonline Workshop IDCC 2014
 
Data Management Planning for Engineers
Data Management Planning for EngineersData Management Planning for Engineers
Data Management Planning for Engineers
 
DMPTool Webinar Environmental Scan
DMPTool Webinar Environmental ScanDMPTool Webinar Environmental Scan
DMPTool Webinar Environmental Scan
 
Lake dmp tool_i_conference
Lake dmp tool_i_conferenceLake dmp tool_i_conference
Lake dmp tool_i_conference
 
Lake us-canada policesupdate
Lake us-canada policesupdateLake us-canada policesupdate
Lake us-canada policesupdate
 
Why managedata
Why managedataWhy managedata
Why managedata
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-support
 
Web links
Web linksWeb links
Web links
 
Managing the research life cycle
Managing the research life cycleManaging the research life cycle
Managing the research life cycle
 
Dmp tool presentation
Dmp tool presentationDmp tool presentation
Dmp tool presentation
 
Funder requirements for Data Management Plans
Funder requirements for Data Management PlansFunder requirements for Data Management Plans
Funder requirements for Data Management Plans
 
Library support for life cycle
Library support for life cycleLibrary support for life cycle
Library support for life cycle
 

Último

Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 

Último (20)

Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 

Best practices data collection

  • 1. Best Practices Creating Research Data Sherry Lake July 31, 2012 University of Florida Data Management Workshop
  • 2. WHY? Following these Best Practices……. • Will improve the usability of the data by you or by others • Your data will be “computer ready” • Your data will be ready to share with others
  • 5. Problems • Dates are not stored consistently • Values are labeled inconsistently • Data coding is inconsistent • Order of values are different
  • 6. Problems • Confusion between numbers and text • Different types of data are stored in the same columns • The spreadsheet loses interpretability if it is sorted
  • 7. Best Practices Data Organization • Lines or rows of data should be complete – Designed to be machine readable, not human readable (sort)
  • 8. Best Practices Data Organization • Include a Header Line 1st line (or record) • Label each Column with a short but descriptive name – Names should be unique – Use letters, numbers, or “_” (underscore) – Do not include blank spaces or symbols (+ - & ^ *)
  • 9. Best Practices Data Organization • Columns of data should be consistent – Use the same naming convention for text data • Columns should include only a single kind of data – Text or “string” data – Integer numbers – Floating point or real numbers
  • 10. Use Standardized Formats • ISO 8601 Standard for Date and Time – YYYYMMDDThh:mmss.sTZD 20091013T09:1234.9Z 20091013T09:1234.9+05:00 • Spatial Coordinates for Latitute/Longitude – +/- DD.DDDDD -78.476 (longitude) +38.029 (latitude)
  • 12. File Names • Use descriptive names • Not too long • Don’t use spaces • Try to include time, place & theme • May use “-” or “_”
  • 13. File Names • String words together with Caps (VegBiodiv_2007) • Think about using version numbers • Don’t change default extensions (txt, jpg, csv,…)
  • 14. Quantitative Assurance/Control Dataset Creation & Integrity Errors • Use a data entry program – Program to catch typing errors – Program pull-down menu options • Perform double entry of the data • Manually check 5 – 10% of data records • Check for out-of-range values (plotting) • Check for missing or impossible values • Perform statistical summaries (random samples)
  • 15. Analyzing Data - Notes • Keep Original File – Uncorrected copy – Make “read-only” • Make notes on transformations • Any changes, save as a new file • Use scripted code to transform and correct data
  • 16. Analyzing Data • Use a scripted program (R, SAS, SPSS, Matlab) – Steps are recorded in textual format – Can be easily revised and re-executed – Helps sharing and repetition – Easy to document • GUI-bases analysis may be easier, but harder to reproduce
  • 17. Document EVERYTHING! • Create a Project Document File – More than a Lab Notebook – Data Management Plan • Start at the beginning of the project and continue throughout data collection & analysis – Why you are collecting data – Exact details of methods of collecting & analyzing
  • 18. Document EVERYTHING! • Details such as: – Names of data & analysis files associated with study – Definitions for data and codes (include missing value codes, names) example – Units of measure (accuracy and precision) – Standards or instrument calibrations
  • 19. Choosing File Formats • Accessible Data (in the future) – Non-proprietary (software formats) – Open, documented standard – Common, used by the research community – Standard representation (ASCII, Unicode) – Unencrypted & Uncompressed – Media formats (hardware formats)
  • 20. Preferred Format Choices • PDF, not Word • ASCII, not Excel • MPEG-4, not Quicktime • TIFF or JPEG2000, not GIF or JPG • XML or RDF, not RDBMS Good if not software specific
  • 21. Best Practices 1. Use Consistent Data Organization 2. Use Standardized Formats 3. Assign Descriptive File Names 4. Perform Basic Quality Assurance/ Quality Control 5. Use Scripted Program for Analysis and Keep Notes 6. Document EVERYTHING! (Define Contents of Data Files ) 7. Use Consistent, Stable and Open File Formats
  • 22. Best Practices Bibliography Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple guidelines for effective data management. Bulletin of the Ecological Society of America, 90(2), 205-214. Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E. (2010). Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010. Inter-university Consortium for Political and Social Research (ICPSR). (2012). Guide to social science data preparation and archiving: Best practices throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved 05/31/2012, from http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf. Data Observation Network for Earth (DataONE). (2012). DataONE Best Practices database. Retrieved 07/21/12, from http://www.dataone.org/best-practices.
  • 23. Questions? Discussion? • Sherry Lake Senior Scientific Data Consultant, UVA Library • shlake@virginia.edu • Twitter: shlakeuva • Slideshare: http://www.slideshare.net/shlake • Web: http://www.lib.virginia.edu/brown/data 23

Notas do Editor

  1. Have you ever collected data and had trouble remembering what you did at the start?Tried to share your data with someone and they (or you) couldn’t understand itUsing “Best Practices” when you collect and record your data will improve future usability and may save time.Preparing your data using these “Best Practices”Following these best practices (guidelines) will help you Following these best practices will improve the usability of the data by you or by others … use it with other data.
  2. Spreadsheets are widely used for simple analyses They are easy to use BUT They allow (encourage) users to structure data in ways that are hard to use with other softwareYou can use them like Word, with columns. These spreadsheets (in this format) are good for “human” interpretation, not computers – and since you probably will need either Write a program or use a software package, then the “human” format is not best.These formats are good for presenting your findings such as publishing…. But it will be harder to use with other software later on (if you need to do any analysis).It is betterto store the data in ways that it can be used in automated ways, with minimal human intervention
  3. This is some well data measurements, where a salinity meter was used to measure the salinity (top and bottom) and the conductivity (Top & bottom)Take a look at this spreadsheet… What’s wrong with it?Could this be easily automated? Sorted?
  4. Dates are not stored consistentlySometimes date is stored with a label (e.g., “Date:5/23/2005”) sometimes in its own cell (10/2/2005)Values are labeled inconsistentlySometimes “Conductivity Top” others “conductivity_top”For Salinity sometimes two cells are used for top and bottom, in others they are combined in one cellData coding is inconsistentSometimes YSI_Model_30, sometimes “YSI Model 30”---- sort of can’t tell if it’s a “label” or a data valueTide State is sometimes a text description, sometimes a numberThe order of values in the “mini-table” for a given sampling date are different“Meter Type” comes first in the 5/23 table and second in the 10/2 table
  5. Confusion between numbers and textFor most software 39% or <30 are considered TEXT not numbers (what is the average of 349 and <30?)Different types of data are stored in the same columnsMany software products require that a single column contain either TEXT or NUMBERS (but not both!)The spreadsheet loses interpretability if it is sortedDates are related to a set of attributes only by their position in the file. Once sorted that relationship is lost.Not sure why you would sort this.
  6. The spreadsheet loses interpretability if it is sortedDates are related to a set of attributes only by their position in the file. Once sorted that relationship is lost.Look what happens when we sort this….Look at the difference in this one… sort it..https://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdEZ2NzRhUWFLYy1nM2FMcDhaNGRVeWchttps://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdHpTMC1kdWREbTNlanBwM3J5WVE3ZFE
  7. Standard convention for many software programs (usually a “check” yes,no) is for the 1st line (record) to be a header line… lists the names of variables in the file. Rest of records (lines) are data.Not too long some software programs may not work with long variable names
  8. We’ve seen that a spreadsheet or word processor can create datasets that can only be interpreted by human interventionThe “ugly spreadsheet” example would be hard to analyze even in a spreadsheet, except with lots case-by-case human decisionsBut what are some principles that characterize good archival data?Keep in mind that good data formats for data and sharing may not be the ones you prefer for viewing or analysis!Same naming convention for text data – use a vocabulary, keep same… “slack-high”…. Not “slack high”
  9. There are already standards for certain types of data (like date/time, spatial coords). Use them, don’t invent your own.Can you think of others?(am/pm NOT allowed) T appears literally in the string. Min. for date is YYYY.YYYY = four-digit yearMM = two-digit month (01=January, etc.) DD = two-digit day of month (01 through 31)hh = two digits of hour (00 through 23)mm = two digits of minute (00 through 59)ss = two digits of second (00 through 59) s = one or more digits representing a decimal fraction of a second TZD = time zone designator (Z or +hh:mm or -hh:mm) Vs. DMS degree minutes seconds important when data field could have more than one type of unit.
  10. Guidelines for filenames will only help you with your files/research. Once they are “archived” they will get new names that fit with the systems, usually a permanent name based on computer “locating” the file.Look at the file names……Context.txt, DataFile1.txt, DataFile2.txt, word6doc.zipLong ones….Safari, Ray… good date, placeNote “_” and “-” Think about how the name will look in a directory with lots of other files, want to be able to “pick it out”.
  11. File names easiest way to indicate the contents of the file, use terse but indicative of their content. Want to uniquely id the data file.Don’t’ make them too long, some scripting programs have a filename limit for file importing (reading)Don’t use blanks, some software may not be able to read file names with blanks.Think about how the name will look in a directory with lots of other files, want to be able to “pick it out”.
  12. Maybe use version numbers…. Don’t forget the extension (3 char.) used to tell the file type
  13. Data Quality control takes place at various stages during data collection, data entry, and data checking. The quality of the collection methods has direct correlation to the quality of the data.Quality of data collection methods used has a significant bearing on data quality.Quality includes: equipment calibration (use instrument calibration to check precision) allows other researchers to look at your data and compare to theirs need to validate transcriptionTrain coders (different people doing this) – create handbook.Can create (program) data entry interfaces and verify data entry, use lists to choose fromVerification: out-of range values, random samples, double checking entriesMinimize manual entryVisual Basic can create forms for Excel. Access form creationRandom sample of dataConsistency checkseach record is keyed in and then re-keyed against the original. Several standard packages offer this feature. In the re-entry process, the program catches discrepancies immediately. Start before data collection, define standards – document in handbook
  14. Don’t want to change something (or delete something) that could be important later.If use a scripted language you could re-run analyses
  15. Analysis “scripted” software: R, SAS, SPSS, MatlabAnalysis scripts are written records of the various steps involved in processing and analyzing data (sort of “analytical metadata”).Easily revised and re-executed at any time if needs to modify analysisVS. GUI (easier) but does not leave a clear accounting of exactly what you have doneDocument scripted code with comments on why data is being changed.
  16. Important to repeat!!!!More documentation: Documentation can also be called metadataDescription of the data file names (especially if using acronyms and abbreviations).Record why you are collecting data, Details of methods of analysisNames of all data and analysis filesDefinitions for data (include coding keys)Missing value codesUnit of measures.Structured metadata (XML) format standards for discipline (Ecological Metadata language – EML)
  17. Can also be called metadataDescription of the data file names (especially if using acronyms and abbrevs.Record why you are collecting data, Details of methods of analysisNames of all data and analysis filesDefinitions for data (include coding keys)Missing value codesUnit of measures.Calibrations so others can compare their results with yours.Structured metadata (XML) format standards for discipline (Ecological Metadata language – EML)
  18. Spreadsheets are widely used for simple analysesBut they have poor archival qualities Different versions over time are not compatibleFormulas are hard to capture or displayPlan what type of data you will be collecting. Want to choose a file format that can be read well into the future and is independent of software changes.These are formats more likely to be accessible in the future. to replace old media, maintaining devices that can still read the proprietary formats or media typeFormat of the file is a major factor in the ability to use the data in the future. As technology changes, plan for software and hardware obsolescence. System files (SAS, SPSS) are compact and efficient, but not very portable. Use software to “export” data to a portable (or transport) file. Convert proprietary formats to non-proprietary. Check for data errors in conversion.
  19. Examples of preferred format choicesFormats for long-term digital preservation (open). Don’t expect you (won’t have time) or the archive to be able to convert older formats to new one.
  20. Remember create spreadsheet so it can be automated2. Date/Time standards, Geospatial coords, Species, other standards from discipline3. Descriptive File Names – File names can help id what’ inside 4. Quality Assurance – when planning on data entry can “program” data checks in forms (Access and Excel), create pick lists (codes), missing data values5. Make it easier to replicate data transformation, can be documented6. Document EVERYTHING, dataset details, database details, collection notes – conditions, You will not remember everything 20 years from now! What someone would need to know about your data to use it.7. Stable File Formats – easier if all files were same format, also knowing what formats are better in the long-term