SlideShare uma empresa Scribd logo
1 de 46
Baixar para ler offline
Everyone can do 
data science" 
import.io webinar 23/9/14 
! 
Louis Dorard 
(@louisdorard)
US real estate portals:" 
- Realtor 
- Zillow 
- Trulia 
- …
Bedrooms Bathrooms Surface (foot²) Year built Type Price ($) 
3 1 860 1950 house 565,000 
3 1 1012 1951 house 
2 1.5 968 1976 townhouse 447,000 
4 1315 1950 house 648,000 
3 2 1599 1964 house 
3 2 987 1951 townhouse 790,000 
1 1 530 2007 condo 122,000 
4 2 1574 1964 house 835,000 
4 2001 house 855,000 
3 2.5 1472 2005 house 
4 3.5 1714 2005 townhouse 
2 2 1113 1999 condo 
1 769 1999 condo 315,000
Bedrooms Bathrooms Surface (foot²) Year built Type Price ($) 
3 1 860 1950 house 565,000 
3 1 1012 1951 house 
2 1.5 968 1976 townhouse 447,000 
4 1315 1950 house 648,000 
3 2 1599 1964 house 
3 2 987 1951 townhouse 790,000 
1 1 530 2007 condo 122,000 
4 2 1574 1964 house 835,000 
4 2001 house 855,000 
3 2.5 1472 2005 house 
4 3.5 1714 2005 townhouse 
2 2 1113 1999 condo 
1 769 1999 condo 315,000
Let’s create a real estate pricing 
model
Fabien Durand 
(@thefabiendurand) 
www.louisdorard.com/guest/everyone-can-do-data-science-importio
Data Science:" 
- domain knowledge 
- hacking abilities 
- machine learning
What the @#?~% is ML?
“Which type of email is this? 
— Spam/Ham”" 
-> Classification
“How much is this house worth? 
— X $” 
-> Regression
Bedrooms Bathrooms Surface (foot²) Year built Type Price ($) 
3 1 860 1950 house 565,000 
3 1 1012 1951 house 
2 1.5 968 1976 townhouse 447,000 
4 1315 1950 house 648,000 
3 2 1599 1964 house 
3 2 987 1951 townhouse 790,000 
1 1 530 2007 condo 122,000 
4 2 1574 1964 house 835,000 
4 2001 house 855,000 
3 2.5 1472 2005 house 
4 3.5 1714 2005 townhouse 
2 2 1113 1999 condo 
1 769 1999 condo 315,000
ML is a set of AI techniques 
where “intelligence” is built by 
referring to examples
??
“A significant constraint on 
realizing value from big data 
will be a shortage of talent, 
particularly of people with 
deep expertise in statistics 
and machine learning.” 
(McKinsey & Co.)
Making ML effortless 
(Bret Victor)
HTML / CSS / JavaScript
HTML / CSS / JavaScript
squarespace.com
The two phases of machine learning: 
• TRAIN a model 
• PREDICT with a model
The two methods of prediction APIs: 
• TRAIN a model 
• PREDICT with a model
The two methods of prediction APIs: 
• model = create_model(dataset)! 
• predicted_output = 
create_prediction(model, new_input)
from bigml.api import BigML 
! 
# create a model! 
api = BigML()! 
source = 
api.create_source('training_data.csv')! 
dataset = api.create_dataset(source)! 
model = api.create_model(dataset) 
! 
# make a prediction! 
prediction = 
api.create_prediction(model, new_input)! 
print "Predicted output value: 
",prediction['object']['output'] 
http://bit.ly/bigml_wakari
Recap
• Classification and regression 
• 2 phases in ML: train and predict 
• Prediction APIs make it easy to build 
models 
• Let’s use them on real estate data to 
predict price from house 
characteristics
• Encoding domain knowledge 
• Making our life easier: restricting 
data to only 1 city
BigML! 
• Look at data 
• Split into training and test 
• Build model from training 
• Evaluate model on test 
• Errors: mean absolute error (or 
percentage?)
Other import.io + BigML use cases:! 
- Predict ebook rating from description 
- Predict sales of etsy stores
Talk at #APIconUK! 
tomorrow in London
ML Algorithm API 
Automated Pred. API 
Text Classification API 
Vertical Pred. API 
Fixed-model Pred. API 
ABSTRACTION
www.louisdorard.com/machine-learning-book 
50% off for 24 hours with code “importio” 
! 
! 
@louisdorard
Everyone can do data science — import.io webinar

Mais conteúdo relacionado

Destaque

Russia 1917 41 revision notes
Russia 1917 41 revision notesRussia 1917 41 revision notes
Russia 1917 41 revision notesKatie B
 
Trusting AI with important decisions
Trusting AI with important decisionsTrusting AI with important decisions
Trusting AI with important decisionsLouis Dorard
 
Bootstrapping Machine Learning
Bootstrapping Machine LearningBootstrapping Machine Learning
Bootstrapping Machine LearningLouis Dorard
 
Future of AI-powered automation in business
Future of AI-powered automation in businessFuture of AI-powered automation in business
Future of AI-powered automation in businessLouis Dorard
 
割り当てゲームの考察
割り当てゲームの考察割り当てゲームの考察
割り当てゲームの考察stucon
 
100 content marketing examples
100 content marketing examples100 content marketing examples
100 content marketing examplesAshley Lenz
 
EMC World 2016 - cnaITL.06 Containers are not Cloud Native
EMC World 2016 - cnaITL.06 Containers are not Cloud NativeEMC World 2016 - cnaITL.06 Containers are not Cloud Native
EMC World 2016 - cnaITL.06 Containers are not Cloud Native{code}
 
What is data science
What is data scienceWhat is data science
What is data scienceJohn Spencer
 
今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of Malaysia
今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of Malaysia今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of Malaysia
今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of MalaysiaYama San
 
Star 8-solar-tile-presentation
Star 8-solar-tile-presentationStar 8-solar-tile-presentation
Star 8-solar-tile-presentationÔng Râu
 
Communicated deadlines = bad quality
Communicated deadlines = bad qualityCommunicated deadlines = bad quality
Communicated deadlines = bad qualityJohan Hoberg
 
Himpervinculos 1 km
Himpervinculos  1 kmHimpervinculos  1 km
Himpervinculos 1 kmKatia Vega
 
συναντηση υπουργειου πεχωδε
συναντηση υπουργειου πεχωδεσυναντηση υπουργειου πεχωδε
συναντηση υπουργειου πεχωδεATHANASIOS KAVVADAS
 
周宏桥产品创新实战体系
周宏桥产品创新实战体系周宏桥产品创新实战体系
周宏桥产品创新实战体系kevinlu
 

Destaque (14)

Russia 1917 41 revision notes
Russia 1917 41 revision notesRussia 1917 41 revision notes
Russia 1917 41 revision notes
 
Trusting AI with important decisions
Trusting AI with important decisionsTrusting AI with important decisions
Trusting AI with important decisions
 
Bootstrapping Machine Learning
Bootstrapping Machine LearningBootstrapping Machine Learning
Bootstrapping Machine Learning
 
Future of AI-powered automation in business
Future of AI-powered automation in businessFuture of AI-powered automation in business
Future of AI-powered automation in business
 
割り当てゲームの考察
割り当てゲームの考察割り当てゲームの考察
割り当てゲームの考察
 
100 content marketing examples
100 content marketing examples100 content marketing examples
100 content marketing examples
 
EMC World 2016 - cnaITL.06 Containers are not Cloud Native
EMC World 2016 - cnaITL.06 Containers are not Cloud NativeEMC World 2016 - cnaITL.06 Containers are not Cloud Native
EMC World 2016 - cnaITL.06 Containers are not Cloud Native
 
What is data science
What is data scienceWhat is data science
What is data science
 
今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of Malaysia
今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of Malaysia今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of Malaysia
今日1日おつかれさま、これはマレーシアのおいしい果物 Delicious Fruits of Malaysia
 
Star 8-solar-tile-presentation
Star 8-solar-tile-presentationStar 8-solar-tile-presentation
Star 8-solar-tile-presentation
 
Communicated deadlines = bad quality
Communicated deadlines = bad qualityCommunicated deadlines = bad quality
Communicated deadlines = bad quality
 
Himpervinculos 1 km
Himpervinculos  1 kmHimpervinculos  1 km
Himpervinculos 1 km
 
συναντηση υπουργειου πεχωδε
συναντηση υπουργειου πεχωδεσυναντηση υπουργειου πεχωδε
συναντηση υπουργειου πεχωδε
 
周宏桥产品创新实战体系
周宏桥产品创新实战体系周宏桥产品创新实战体系
周宏桥产品创新实战体系
 

Mais de Louis Dorard

From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...Louis Dorard
 
Predictive apps for startups
Predictive apps for startupsPredictive apps for startups
Predictive apps for startupsLouis Dorard
 
Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainLouis Dorard
 
Intro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMixIntro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMixLouis Dorard
 
A developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsA developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsLouis Dorard
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine LearningLouis Dorard
 
Using predictive APIs to create smarter apps
Using predictive APIs to create smarter appsUsing predictive APIs to create smarter apps
Using predictive APIs to create smarter appsLouis Dorard
 
Predictive APIs at APIdays Berlin
Predictive APIs at APIdays BerlinPredictive APIs at APIdays Berlin
Predictive APIs at APIdays BerlinLouis Dorard
 
Pragmatic machine learning for the real world
Pragmatic machine learning for the real worldPragmatic machine learning for the real world
Pragmatic machine learning for the real worldLouis Dorard
 

Mais de Louis Dorard (9)

From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
 
Predictive apps for startups
Predictive apps for startupsPredictive apps for startups
Predictive apps for startups
 
Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML Spain
 
Intro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMixIntro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMix
 
A developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsA developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIs
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine Learning
 
Using predictive APIs to create smarter apps
Using predictive APIs to create smarter appsUsing predictive APIs to create smarter apps
Using predictive APIs to create smarter apps
 
Predictive APIs at APIdays Berlin
Predictive APIs at APIdays BerlinPredictive APIs at APIdays Berlin
Predictive APIs at APIdays Berlin
 
Pragmatic machine learning for the real world
Pragmatic machine learning for the real worldPragmatic machine learning for the real world
Pragmatic machine learning for the real world
 

Último

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Último (20)

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Everyone can do data science — import.io webinar

  • 1. Everyone can do data science" import.io webinar 23/9/14 ! Louis Dorard (@louisdorard)
  • 2.
  • 3.
  • 4. US real estate portals:" - Realtor - Zillow - Trulia - …
  • 5. Bedrooms Bathrooms Surface (foot²) Year built Type Price ($) 3 1 860 1950 house 565,000 3 1 1012 1951 house 2 1.5 968 1976 townhouse 447,000 4 1315 1950 house 648,000 3 2 1599 1964 house 3 2 987 1951 townhouse 790,000 1 1 530 2007 condo 122,000 4 2 1574 1964 house 835,000 4 2001 house 855,000 3 2.5 1472 2005 house 4 3.5 1714 2005 townhouse 2 2 1113 1999 condo 1 769 1999 condo 315,000
  • 6. Bedrooms Bathrooms Surface (foot²) Year built Type Price ($) 3 1 860 1950 house 565,000 3 1 1012 1951 house 2 1.5 968 1976 townhouse 447,000 4 1315 1950 house 648,000 3 2 1599 1964 house 3 2 987 1951 townhouse 790,000 1 1 530 2007 condo 122,000 4 2 1574 1964 house 835,000 4 2001 house 855,000 3 2.5 1472 2005 house 4 3.5 1714 2005 townhouse 2 2 1113 1999 condo 1 769 1999 condo 315,000
  • 7. Let’s create a real estate pricing model
  • 8. Fabien Durand (@thefabiendurand) www.louisdorard.com/guest/everyone-can-do-data-science-importio
  • 9.
  • 10. Data Science:" - domain knowledge - hacking abilities - machine learning
  • 11. What the @#?~% is ML?
  • 12.
  • 13. “Which type of email is this? — Spam/Ham”" -> Classification
  • 14.
  • 15. “How much is this house worth? — X $” -> Regression
  • 16. Bedrooms Bathrooms Surface (foot²) Year built Type Price ($) 3 1 860 1950 house 565,000 3 1 1012 1951 house 2 1.5 968 1976 townhouse 447,000 4 1315 1950 house 648,000 3 2 1599 1964 house 3 2 987 1951 townhouse 790,000 1 1 530 2007 condo 122,000 4 2 1574 1964 house 835,000 4 2001 house 855,000 3 2.5 1472 2005 house 4 3.5 1714 2005 townhouse 2 2 1113 1999 condo 1 769 1999 condo 315,000
  • 17.
  • 18. ML is a set of AI techniques where “intelligence” is built by referring to examples
  • 19.
  • 20. ??
  • 21. “A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.” (McKinsey & Co.)
  • 22. Making ML effortless (Bret Victor)
  • 23.
  • 24. HTML / CSS / JavaScript
  • 25. HTML / CSS / JavaScript
  • 27.
  • 28.
  • 29. The two phases of machine learning: • TRAIN a model • PREDICT with a model
  • 30. The two methods of prediction APIs: • TRAIN a model • PREDICT with a model
  • 31. The two methods of prediction APIs: • model = create_model(dataset)! • predicted_output = create_prediction(model, new_input)
  • 32. from bigml.api import BigML ! # create a model! api = BigML()! source = api.create_source('training_data.csv')! dataset = api.create_dataset(source)! model = api.create_model(dataset) ! # make a prediction! prediction = api.create_prediction(model, new_input)! print "Predicted output value: ",prediction['object']['output'] http://bit.ly/bigml_wakari
  • 33.
  • 34. Recap
  • 35. • Classification and regression • 2 phases in ML: train and predict • Prediction APIs make it easy to build models • Let’s use them on real estate data to predict price from house characteristics
  • 36. • Encoding domain knowledge • Making our life easier: restricting data to only 1 city
  • 37.
  • 38. BigML! • Look at data • Split into training and test • Build model from training • Evaluate model on test • Errors: mean absolute error (or percentage?)
  • 39.
  • 40. Other import.io + BigML use cases:! - Predict ebook rating from description - Predict sales of etsy stores
  • 41. Talk at #APIconUK! tomorrow in London
  • 42. ML Algorithm API Automated Pred. API Text Classification API Vertical Pred. API Fixed-model Pred. API ABSTRACTION
  • 43.
  • 44.
  • 45. www.louisdorard.com/machine-learning-book 50% off for 24 hours with code “importio” ! ! @louisdorard