Efficient & effective
data management for research projects
ILRI's Data Management
Platform
Carlos Quiros
June, 2015
Contents
• Back in 2011
• Current status
• How we did it
• Example of a process
• CKAN
• Key decisions made
• Technology and skills required
Back in 2011
Survey design
• Too many
• Not common indicators
• <> Variables
• <> Calculations
Survey implementation
• Too many tools
• No protocols
• Poor field data cleaning
• No standard process
Storage
• In files
• Too many formats
• Too many versions
• Messy data cleaning
• No accountability
Availability & accessibility
• Nothing
Now
Survey design
• Too many
• Common indicators
• = Variables
• = Calculations
Storage
• Server database
• No formats
• One version
• Central cleaning
• Accountability
Availability & accessibility
• CKAN
• OData
Survey implementation
• 2 tools (ODK, CSPro)
• Protocols
• Field data cleaning
• Standard process
• Standard tools
How we went about it
Storage
• Server database
• How to integrate ODK and CSPro?
• How to make it easy for scientists?
• How to manage user decentralization?
• How to increase accountability?
Availability and accessibility
• What to use? CKAN, Dataverse, etc. → CKAN
• How to extend it to serve our purpose?
• How to integrate it with a server database?
• How to manage our metadata and vocabularies?
• How to do this?
• Data interoperability? RDF, OData, GData, etc. → OData
• How to do it?
Survey implementation
• Support only two tools
• Wrote protocols
• Wrote field data cleaning applications
• Wrote policies and implementation plans
• Wrote standard processes and tools for processing the data
Survey design (ongoing)
• Worked closely with teams
• Created a central place for all the surveys
• Separated surveys into modules
• Worked on common indicators
• Management supports this process
Example of a process
Detailed breakdown of ILRI’s RMD workflow with ODK
[Flowchart; boxes listed in approximate flow order]
• Start
• Draft tool (.doc)
• Consultation
• Final tool (.doc)
• Coding (.doc → .xls)
• Testing & Review (.xls)
• Uploaded to Formhub test account
• Testing & Review (ODK Collect)
• Ok?
• Field Deployment
• Uploaded to Formhub project account
• Data collection
• Upload data to Formhub
• End of Data Collection
• Create MySQL schema with ODKToMySQL
• MySQL schema in server
• Convert data to JSON with FormhubToJSON
• Data in JSON format
• Upload JSON into MySQL schema with JSONToMySQL
• Initialize META in schema
• Metadata for portal
• Data Cleaning from server using MySQL for Excel
• Sharing in Data Portal
Legend
• Who (codes): RMG Staff, Project Team Member
• S = Scientist input / usage
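The "Data in JSON format" to "Upload JSON into MySQL schema" step of the workflow can be illustrated with a minimal sketch. This is not the code of ODK Tools or JSONToMySQL: the table layout, the field names (`_uuid`, `household_id`, `cattle_owned`) and the sample records are hypothetical, and Python's built-in `sqlite3` stands in for a MySQL server so that the example runs without any setup.

```python
import json
import sqlite3

# Hypothetical flattened submissions, standing in for the "Data in JSON format" step.
SAMPLE_JSON = """
[
  {"_uuid": "a1", "household_id": "HH001", "cattle_owned": 4},
  {"_uuid": "b2", "household_id": "HH002", "cattle_owned": 0},
  {"_uuid": "a1", "household_id": "HH001", "cattle_owned": 4}
]
"""


def count_rows(conn):
    """Return the number of rows currently in the survey table."""
    return conn.execute("SELECT COUNT(*) FROM survey").fetchone()[0]


def load_submissions(conn, raw_json):
    """Insert submissions into one central table; return how many were new."""
    # Assumed schema: one row per submission, keyed by its unique id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS survey ("
        "uuid TEXT PRIMARY KEY, household_id TEXT NOT NULL, cattle_owned INTEGER)"
    )
    before = count_rows(conn)
    for record in json.loads(raw_json):
        # The primary key makes a repeated upload of the same submission a no-op,
        # which keeps a single version of each record.
        conn.execute(
            "INSERT OR IGNORE INTO survey (uuid, household_id, cattle_owned) "
            "VALUES (?, ?, ?)",
            (record["_uuid"], record["household_id"], record["cattle_owned"]),
        )
    conn.commit()
    return count_rows(conn) - before


conn = sqlite3.connect(":memory:")  # stand-in for the server database
inserted_count = load_submissions(conn, SAMPLE_JSON)
row_total = count_rows(conn)
```

Keying the table on the submission identifier is one way to get the "one version" property listed for server storage; a production setup would presumably also record who loaded each row and when.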
ILRI’s data portal (CKAN) – http://data.ilri.org/portal/
• CKAN?
• The Open Knowledge Foundation
• Most widely deployed data portal software
• USA data portal
• UK data portal
• EU data portal
• Open Africa
• What do you get out of the box?
• Create datasets with minimum metadata
• Name, Abstract, Author, Date
• Tags into controlled vocabulary
• Powerful search engine
• Public / private access to datasets
• Able to attach resources (files) to a dataset
• Data interoperability through powerful API and RDF
• Arrange datasets into organizations and topics
• What can you do by creating extensions?
• Add new vocabularies (e.g., Language, Countries, etc.)
• Add new metadata fields
• Visualize different kinds of data (e.g., maps)
• Change theme (colors, logos, fonts, etc.)
• Create data hubs by harvesting other CKANs
• Whatever else you want…
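As an illustration of "data interoperability through powerful API", the sketch below builds a dataset search request against CKAN's Action API (`package_search`) and reads titles out of a response. It assumes the portal exposes the standard API under the base URL shown, which the slides do not confirm; the sample response is invented and only mimics the general shape CKAN returns (`success`, `result.count`, `result.results`). No network call is made.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard CKAN Action API is reachable under this base URL.
PORTAL_BASE = "http://data.ilri.org/portal"


def package_search_url(base, query, rows=10):
    """Build a package_search URL for a free-text dataset query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Invented response body used only to exercise the parser.
SAMPLE_RESPONSE = json.dumps(
    {
        "success": True,
        "result": {
            "count": 2,
            "results": [
                {"name": "example-survey-a", "title": "Example survey A"},
                {"name": "example-survey-b", "title": "Example survey B"},
            ],
        },
    }
)

search_url = package_search_url(PORTAL_BASE, "livestock survey", rows=5)
titles = dataset_titles(SAMPLE_RESPONSE)
```

To query a live portal, the URL could be fetched with `urllib.request.urlopen(search_url)` and the decoded body passed to `dataset_titles`.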
Key decisions made
• Use open source for all RDM
Pros:
• Bigger pool of tools
• Flexible
• Innovation
Cons:
• Complex skill set
• Learning curve
• Relational Database Management System (RDBMS)
Pros:
• Central place
• Auditing
Cons:
• DB management skill set
• Scientists have no idea how to work with an RDBMS
• CKAN
Pros:
• There is nothing better out there
• Flexible and extensible
Cons:
• Programming in several languages is required
• Learning curve
Technology and skills required
• Server
• Linux (Ubuntu server) [Linux administration]
• http://www.ubuntu.com/download/server
• Database server
• MySQL – An open source database system [DB administration, SQL]
• http://www.mysql.com/
• Data processing software [Linux, C++, Python]
• ODK – A toolset for collecting data on mobile devices.
• https://opendatakit.org/
• CSPro – Software for creating data entry applications.
• https://www.census.gov/population/international/software/cspro/
• Formhub – A software tool that collects ODK data.
• https://github.com/SEL-Columbia/formhub
• ODK Tools – A toolbox for processing ODK survey data into MySQL databases.
• https://github.com/ilri/odktools
• META – A toolbox for managing research data in MySQL databases.
• https://github.com/ilri/meta
• CSProTools – A toolbox for processing CSPro survey data into MySQL databases.
• https://github.com/ilri/csprotools
• Data sharing and interoperability
• CKAN – The open source data portal software. [Linux, Python, WebDev]
• http://ckan.org/
• http://docs.ckan.org/en/latest/maintaining/installing/index.html
• http://docs.ckan.org/en/latest/extensions/index.html
• OData – Allows the creation and consumption of queryable and interoperable data resources in a simple and standard way. [Linux, Java, WebDev]
• http://www.odata.org/
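What "queryable" means for OData can be shown with a short sketch that composes a query URL from the standard system query options `$filter`, `$select` and `$top`. The service root, the entity set name (`Households`) and the property names are hypothetical; the slides do not describe the layout of ILRI's OData service.

```python
from urllib.parse import quote


def odata_query_url(service_root, entity_set, filter_expr=None, select=None, top=None):
    """Compose an OData query URL from optional system query options."""
    options = []
    if filter_expr:
        # Percent-encode spaces and other reserved characters in the expression.
        options.append("$filter=" + quote(filter_expr, safe=""))
    if select:
        options.append("$select=" + ",".join(select))
    if top is not None:
        options.append("$top=" + str(int(top)))
    url = service_root.rstrip("/") + "/" + entity_set
    return url + ("?" + "&".join(options) if options else "")


# Hypothetical service root and entity set, for illustration only.
example_url = odata_query_url(
    "http://example.org/odata/",
    "Households",
    filter_expr="cattle_owned gt 2",
    select=["household_id", "cattle_owned"],
    top=10,
)
```

Here `gt` is the OData "greater than" operator, so the request asks for at most ten households owning more than two cattle and returns only the two selected properties.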
Thank you
Visit us @
http://data.ilri.org/
