
Iterative data discovery and transformation with OpenRefine


  1. Iterative data discovery and transformation with OpenRefine
     Martin Magdinier - @magdmartin
     OpenRefine - @OpenRefine, http://openrefine.org
  2. 80% of data analysis time is spent cleaning, transforming and integrating data.
  3. Data Quality & Integration Is Time Consuming
     • Duplicate values & typos
     • Multi-valued cells
     • Data in the wrong field
     • Missing / partial values
     • Encoding errors
     • Format changes (text, number, date)
     • Flat to relational data sets
     • Schema alignment
     • Transposing rows and columns
     • Joining data sets
     • Enrichment from other sources (MDM, API calls)
  4. OpenRefine Bridges The Skill Gap
     Diagram: axes "Understand The Data (Business Skills)" and "Know How To Transform Data (Technical Skills)"; groups DBA, ETL, Data Science and Spreadsheet User; stages Data Preparation and Data Visualization / Interpretation; a region labelled "User Base".
  5.
     • SaaS and on-premise solution for extra compute power, collaboration and lightweight ETL
     • On-demand training
     • Custom development
     • Free & open source
     • Community developed for 5 years
     • Available on a local machine only
     • 5,000+ downloads per month
     • Strong user base in Open Data, Libraries, the Semantic Web and Bio Science
  6. Diagram: an Agile Data Process with the Data Engineer role (Scale & Automate Processes, Data Quality, Manage Master Data).
  7. Diagram: the Agile Data Process with the Data Engineer role (Scale & Automate Processes, Data Quality, Manage Master Data) and the Data Scientist role (Develop Machine Learning & Data Analysis Model).
  8. Diagram: the Agile Data Process with four roles: Data Engineer (Scale & Automate Processes, Data Quality, Manage Master Data), Data Scientist (Develop Machine Learning & Data Analysis Model), IT Support (Governance, Access To Data) and Business Analyst (Sense Making, Data Exploration, Reporting, Analysis). Further labels: Discovery, Data Wrangling, Profiling, Preparation, Quality, Integration; Scale, Real-Time, Lightweight ETL, Migration.
  9. Diagram: the same roles (Business Analyst, Data Engineer, IT Support, Data Scientist) around the Agile Data Process, with the labels Governance, Access To Data, Scale & Automate Processes, Data Quality, Manage Master Data, Discovery, Data Wrangling, Profiling, Preparation, Quality, Integration, Develop Machine Learning & Data Analysis Model, and ETL Tools.
  10. Demo: 2014 Toronto Cleared Building Permits (http://ow.ly/Js8GD)
     Data Discovery
       1. What types of permits are issued?
       2. Explore previous usage, application date and dwelling units created.
     Data Preparation
       1. Geocode with the Google Maps API.
       2. Map construction that creates more than 10 new dwelling units.
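The first two items on the slide 3 list, duplicate values & typos and multi-valued cells, can be illustrated in a few lines. The sketch below is a simplified take on fingerprint key-collision clustering, one of the clustering methods OpenRefine offers; it is not OpenRefine's code, and `fingerprint`, `cluster` and `split_multivalued` are hypothetical helper names. The sample values are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key, loosely modelled on OpenRefine's keying idea."""
    # Trim surrounding whitespace and ignore case.
    s = value.strip().lower()
    # Fold accented characters to ASCII (e.g. "é" becomes "e").
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation: keep only ASCII letters, digits and whitespace.
    s = re.sub(r"[^a-z0-9\s]", "", s)
    # Tokenize, deduplicate and sort so word order no longer matters.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep groups with 2+ distinct spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


def split_multivalued(rows, column, sep):
    """Turn one row holding 'a; b' in `column` into one row per value.

    Unlike OpenRefine's records mode, the other columns are copied into every new row.
    """
    out = []
    for row in rows:
        for part in str(row[column]).split(sep):
            new_row = dict(row)
            new_row[column] = part.strip()
            out.append(new_row)
    return out


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the talk.
    print(cluster(["New Building", "new building ", "Building, New", "Renovation"]))
    print(split_multivalued([{"id": 1, "tags": "library; open data"}], "tags", ";"))
```

Sorting the tokens is what lets "Building, New" and "New Building" collide; a human still reviews each cluster before merging, which is the iterative part of the workflow.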
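The slide 10 demo starts by asking which permit types are issued and then narrows to construction creating more than 10 dwelling units. In OpenRefine that is roughly a text facet followed by a numeric selection; the sketch below reproduces the idea in plain Python. The column names `PERMIT_TYPE` and `DWELLING_UNITS_CREATED` and the sample rows are hypothetical assumptions, not the real dataset's schema, and the Google Maps geocoding step is left out because it needs network access and an API key.

```python
from collections import Counter


def text_facet(rows, column):
    """Count distinct values in a column, most frequent first (like a text facet)."""
    return Counter(row[column] for row in rows).most_common()


def more_than(rows, column, threshold):
    """Keep rows whose numeric value in `column` exceeds `threshold`; skip blanks."""
    kept = []
    for row in rows:
        try:
            value = int(row[column])
        except (TypeError, ValueError):
            continue  # blank or non-numeric cell: cannot be selected
        if value > threshold:
            kept.append(row)
    return kept


# Hypothetical rows and column names; the real dataset's schema may differ.
permits = [
    {"PERMIT_TYPE": "New Building", "DWELLING_UNITS_CREATED": "24"},
    {"PERMIT_TYPE": "Renovation", "DWELLING_UNITS_CREATED": "1"},
    {"PERMIT_TYPE": "New Building", "DWELLING_UNITS_CREATED": ""},
    {"PERMIT_TYPE": "Demolition", "DWELLING_UNITS_CREATED": "0"},
]

if __name__ == "__main__":
    print(text_facet(permits, "PERMIT_TYPE"))
    print(more_than(permits, "DWELLING_UNITS_CREATED", 10))
```

Blank cells are skipped rather than treated as zero, mirroring how a numeric facet reports blank and non-numeric values separately from the numeric range.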
