Data Science
Harnessing Open Data for high impact solutions
About:Me
Mohd Izhar Firdaus Ismail
- Current: Solution Architect @ ABYRES Enterprise
Technologies Sdn Bhd
- Open Source Activist & (self-proclaimed) Hacker, Open Data
Advocate, Fedora Ambassador, Data Architect, Data Engineer,
Consultant, Python Programmer, Analyst, Trainer, and a bunch of
other hats ;-)
- Contributing to Open Source projects for over 8 years
- Over 6 years building systems related to data, content,
information and knowledge management
- http://linkedin.com/in/kagesenshi
Disclaimer:
Some people call me a data scientist,
But I don't consider myself one (yet)
(( it's a personal integrity thing: Machine Learning & Stats are not (yet) my strong points ))
But I do work a lot with data: designing applications, infrastructure,
algorithms, processes and pipelines for big data workloads, from data
acquisition to visualization
"Real" Data Scientists
are one heck of a super(wo)man
Infographic source: MarketingDistillery.com
Open Data Apps Around The World
What you can do with quality Open Data
(and a glimpse of what nice stuff other people have ^.^)
Data.gov (United States)
- One of the earliest Government Open Data initiatives
- Over 159,576 datasets from all over US government agencies (as of 14th Aug 2015)
- NGOs such as Code For America
building apps using data from it
- Companies leveraging the data for their own startups and businesses
Data.gov : Alternative Fuels Station Locator
Benefit / Impact:
Help individuals locate nearby alternative fuel stations (electric, hydrogen, biodiesel, etc.)
Data from:
US Department of Energy
Data.gov : Climate.com
Benefit / Impact:
Help farmers plan their farming activities based on weather conditions
Data from:
- National Weather Service
- US Geological Survey
- National Aeronautics and Space Administration
Data.gov : College Affordability and Transparency Center
Benefit / Impact:
Enable students to make informed decisions about where to further their studies based on their budget
Data from:
Department of Education, National Center for Education Statistics
Data.gov.uk (United Kingdom)
- Ranked 1st in the international Open Data Institute (ODI)'s Open Data Barometer
- Over 22,946 datasets (as of 14th Aug 2015)
- 378 apps (as of 14th Aug 2015)
Data.gov.uk : CrimeInEngland.co.uk
Benefit / Impact:
Enable citizens to be more aware of the crime rate in their area and take necessary measures
Data from:
UK Home Office
Data.gov.uk : WhereDoesMyMoneyGo.org
Benefit / Impact:
Better government transparency. Citizens are better informed about how tax money is spent.
Data from:
UK Her Majesty's Treasury
Getting Started
Some tips for beginners
The bulk of your data-related work will be cleaning data
- Excel to JSON/CSV
- PDF to JSON/CSV
- Unstructured to structured
- Joining multiple data sources into one, where the join key is not obvious
- Normalizing duplicates, errors, typos, language, etc
- Dealing with inconsistent schema of historical data
- Extracting more features of data points
- Enriching data with more useful information (e.g. longitude, latitude)
- Dealing with data that was poorly collected
- Dealing with aggregated data that is not quite useful
- Real-life data is a mess: SNAFU ;-)
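Two of the chores above (normalizing typos and joining sources whose key is not obvious) can be sketched with nothing but the Python standard library. This is only an illustration under stated assumptions: the agency names, the figures, the helper names `normalize` and `fuzzy_join`, and the 0.8 similarity cutoff are all made up here, and real data would need a tuned cutoff plus a manual review of the matches.

```python
import difflib


def normalize(name):
    """Lowercase, trim, and collapse whitespace so near-identical labels compare equal."""
    return " ".join(name.lower().split())


def fuzzy_join(left, right, key, cutoff=0.8):
    """Join two lists of dicts on `key`, tolerating small spelling differences.

    Rows in `left` without a close match in `right` are kept with matched=False.
    """
    index = {normalize(row[key]): row for row in right}
    joined = []
    for row in left:
        candidates = difflib.get_close_matches(
            normalize(row[key]), list(index), n=1, cutoff=cutoff
        )
        match = index[candidates[0]] if candidates else {}
        # Values from `left` win on conflicts, so its spelling of the key is kept.
        joined.append({**match, **row, "matched": bool(candidates)})
    return joined


# Hypothetical sample data: the same agencies spelled differently in each source.
budgets = [
    {"agency": "Dept. of Energy ", "budget": 100},
    {"agency": "department of  education", "budget": 80},
    {"agency": "Ministry of Magic", "budget": 5},
]
staff = [
    {"agency": "Dept of Energy", "staff": 12},
    {"agency": "Department of Education", "staff": 9},
]

result = fuzzy_join(budgets, staff, "agency")
for row in result:
    print(row)
```

Unmatched rows are kept rather than dropped so that whoever cleans the data can see what still needs a manual decision.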
Analytic Tools & Platforms
Plenty of open source tools are available
- Simple data analysis can be done without a complex Big Data ecosystem. A ${YourFavouriteLanguage} executable is usually more than enough to transform, clean, and explore data to get initial insights and understanding
- I speak mostly in snake language, so naturally I prefer Python stuff ;-)
– Python is a strong language in scientific computing thanks to its history in mathematics, its rich open source library ecosystem, and its simplicity for rapid experimentation
– Pandas, numpy, scipy, pymapreduce, xlrd, pyexcel, scikit, luigi, vaderSentiment, etc.
- D3.js is highly recommended for developing data-driven visualizations for the web
– Plenty of other JavaScript libraries help render beautiful diagrams
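To illustrate the point that a single script is often enough for a first look at a dataset, here is a minimal sketch using only the Python standard library. The CSV content, the column names (`state`, `stations`) and the helpers `load_rows` and `summarize` are invented for this example; a library such as Pandas would likely do the same in fewer lines.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical raw export: inconsistent casing, stray spaces, and a missing value.
RAW = """state,stations
 Selangor,12
selangor,8
Johor,5
JOHOR,
Penang,7
"""


def load_rows(text):
    """Parse CSV text, normalize the state label, and skip rows without a count."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        count = record["stations"].strip()
        if not count:
            continue  # drop rows where the value was never collected
        rows.append(
            {"state": record["state"].strip().title(), "stations": int(count)}
        )
    return rows


def summarize(rows):
    """Return total stations per state plus the overall mean and median."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["state"]] += row["stations"]
    counts = [row["stations"] for row in rows]
    return dict(totals), statistics.mean(counts), statistics.median(counts)


totals, mean, median = summarize(load_rows(RAW))
print(totals, mean, median)
```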
My Personal Favourites:
- "Small" data: IPython Notebook & Python libraries
- "Big data": Apache Zeppelin, PySpark & Python libs
- Hortonworks HDP Sandbox (Pig, Hive, Spark, and friends)
- Amazon EMR (large cluster to crunch your data)
Good luck!!
And most importantly,
Have Fun!!
Izhar Firdaus <izhar@abyres.net>
http://linkedin.com/in/kagesenshi
