Small cars are dangerous!
Willem Hendriks
Data Scientist IBM
willem.hendriks@nl.ibm.com
https://github.com/willemhendriks
Nice to be in
Groningen again!
“More data usually beats better algorithms”
Anand Rajaraman (when teaching at Stanford)
http://anand.typepad.com/datawocky/2008/03/more-data-usual.html
What I learned in Groningen...
What I am doing now...
Parallel Computing is not easy....
Google Trends of “Apache Spark”
Apache Spark™ is a fast
and general engine for
large-scale data
processing.
Why Spark?
(4) Nice library!
Is it really that easy & quick?
Best deal if you want a Mercedes ML
Best place to have dinner in Brussels (and have a walk afterwards)
Let's combine police report datasets &
Marktplaats advertisements...
(not big data, just a toy example of Spark)
Do thieves like certain
neighborhoods with certain
items?
Download advertisement data with script
Find postal code of each neighborhood
Combine in Apache Spark
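
The three steps above can be sketched in plain Python. This is only an illustration of the combine step, not the code from the talk: the record fields (`postal_code`, `category`, `reports`) and the sample values are hypothetical, since the slides do not show the actual schema. In Spark the same idea would typically be a join on the postal-code column followed by a grouped count.

```python
from collections import Counter

# Hypothetical toy records. The talk used downloaded Marktplaats ads and
# police report datasets; their real schema is not shown on the slides.
ads = [
    {"postal_code": "9711", "category": "scale models"},
    {"postal_code": "9711", "category": "bicycles"},
    {"postal_code": "9712", "category": "scale models"},
    {"postal_code": "9711", "category": "scale models"},
]
burglaries = [
    {"postal_code": "9711", "reports": 12},
    {"postal_code": "9712", "reports": 3},
]


def combine(ads, burglaries):
    """Count ads per (postal code, category) and attach burglary reports."""
    ad_counts = Counter((ad["postal_code"], ad["category"]) for ad in ads)
    reports_by_code = {row["postal_code"]: row["reports"] for row in burglaries}
    combined = []
    for (code, category), n_ads in sorted(ad_counts.items()):
        # Inner join: keep only postal codes present in both datasets.
        if code in reports_by_code:
            combined.append((code, category, n_ads, reports_by_code[code]))
    return combined


for row in combine(ads, burglaries):
    print(row)
```

Each output row pairs an advertisement category with the number of ads and the number of burglary reports for one postal code, which is the table needed to ask whether certain items go with certain neighborhoods.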
Scale models are an indication for burglary!
Check marktplaats.nl if more than 70
advertisements are within a radius of 600
meters!
Maybe marktplaats.nl advertisements can predict...
House-pricing trends? Crime? Education level?
They have something!!!
If you were asked to build a model, on the Netherlands, what
tool would you use?
*dataset too small to make this statement
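
The radius check from this slide (more than 70 advertisements within 600 meters) can be sketched as follows. The slides do not say how distances were computed, so the haversine great-circle formula here is an assumption, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ads_within_radius(center, ad_locations, radius_m=600):
    """Count advertisement locations within radius_m meters of center."""
    lat0, lon0 = center
    return sum(
        1 for lat, lon in ad_locations
        if haversine_m(lat0, lon0, lat, lon) <= radius_m
    )


def is_flagged(center, ad_locations, radius_m=600, threshold=70):
    """True if more than `threshold` ads lie within the radius (the slide's rule)."""
    return ads_within_radius(center, ad_locations, radius_m) > threshold


# Hypothetical coordinates near Groningen, for illustration only.
home = (53.2194, 6.5665)
ad_locations = [(53.2194, 6.5665), (53.2204, 6.5665), (53.2294, 6.5665)]
print(ads_within_radius(home, ad_locations))
print(is_flagged(home, ad_locations))
```

The default values 600 and 70 come straight from the slide; as its footnote says, the dataset was too small to treat that threshold as a real finding.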
Try yourself! (GB's limited)
● Mix with various Services, e.g. Hadoop/NoSQL
● Free Trial & Paid (with Serious Power)
● Made for the App Developer
● Run Spark Online
● (Various) Notebook, to use for Python, Scala, & R
● Free, perfect to start & learn! (examples)
● Made for the Data Scientist
Try yourself! (GB's limited)
IBM Will: “Educate one million data scientists
and data engineers on Apache Spark through
extensive partnerships with AMPLab,
DataCamp, MetiStream, Galvanize and Big
Data University MOOC”
Join us, & start today at the BIG DATA
University! https://bigdatauniversity.com/
Spark Hackathon Coming soon in NL!
IBM Wants YOU to learn
Spark!
Questions about...
Start with Spark?
IBM & Spark?
Marktplaats.nl?
Code will be on GitHub
(after cleaning)
Thank you!
Willem Hendriks
06 2240 8900
Data Scientist IBM
willem.hendriks@nl.ibm.com
https://github.com/willemhendriks

Big data groningen
