Big data on a small budget
What do I know about big data?

- skobbler logs all positions from our users (100 billion+)
- More than 10 TB of data from users
- Products / revenues significantly improved with business intelligence
Big data on a small budget

@apphil #2
Why should you learn about big data?

- Harvard Business Review: “Data Scientist: The Sexiest Job of the 21st Century”
- Obama became president of the US in large part due to the use of big data
- World-class sports teams enhance their performance with big data
- Amazon, Google, Facebook, etc. now run all their development processes in a data-driven way

What are some great use cases for big data?
- Analyzing log files and user behavior (and predicting future behavior)
- A/B testing and automatic optimization of functionality
- Improving monetization (e.g. ad optimization)
- Checking adoption and usage of new features

When is it better not to rely on big data?
- When qualitative feedback is better than quantitative feedback (e.g. very early-stage companies)
- When you don’t have enough users yet to get statistically relevant results
- When you do not know what you are optimizing for
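As a rough illustration of the "enough users" point, the sketch below estimates how many users each variant of an A/B test would need before a difference between two rates becomes detectable. It uses the standard two-proportion sample-size approximation; the function name, the 5% significance level and the 80% power are assumptions for illustration, not values from the talk.

```python
import math
from statistics import NormalDist


def required_users_per_variant(p_base, p_new, alpha=0.05, power=0.80):
    """Approximate users needed per variant to tell p_base from p_new.

    Standard two-proportion approximation for a two-sided test; the
    alpha and power defaults are illustrative assumptions.
    """
    if p_base == p_new:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    n = (z_alpha + z_power) ** 2 * variance / (p_base - p_new) ** 2
    return math.ceil(n)


# Detecting a lift from 10% to 12% needs a few thousand users per variant.
users_needed = required_users_per_variant(0.10, 0.12)
print(users_needed)
```

The smaller the expected effect, the more users are needed, which is why very young products often cannot rely on this kind of test yet.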

What does a solid and simple workflow for big data analysis look like?

[Diagram: cycle of Log -> Process -> Analyse -> Improve -> Eval / Test, then back to Log]

Tools / technologies for a good big data setup
- Logging: MongoDB, VoltDB, Cassandra
- Processing & analyzing / storing: Hadoop & HBase (batch), Storm (real-time), Samza (real-time)
- Optimizing: Mahout (machine learning)

How can you build this without breaking the bank?

- Analyse / process asynchronously
- Cheap dedicated servers (vs. cloud)
- Use open / free software

Key cost factor: real-time, near-time vs. batch

- Real-time is much more expensive than batch
- Leverage as much preprocessing as possible
- Try using in-memory technology for real-time analytics
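One way to read the preprocessing advice: aggregate raw events into small summaries in a cheap batch step, so that later queries never have to touch the raw log. The sketch below is a minimal illustration; the event layout (`timestamp`, `country`) and the hourly bucket size are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone


def preaggregate_hourly(events):
    """Count events per (hour, country) so queries read the summary only.

    `events` is an iterable of dicts with a Unix `timestamp` and a
    `country` code; both field names are hypothetical.
    """
    counts = Counter()
    for event in events:
        ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        hour = ts.strftime("%Y-%m-%d %H:00")
        counts[(hour, event["country"])] += 1
    return counts


raw_events = [
    {"timestamp": 0, "country": "DE"},
    {"timestamp": 1800, "country": "DE"},
    {"timestamp": 3600, "country": "RO"},
]
summary = preaggregate_hourly(raw_events)
print(summary[("1970-01-01 00:00", "DE")])  # two events fall into the first hour
```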

#1 Log: Initially, as much data as feasible should be logged so it’s available later

- Define interesting data (rather log too much if unsure)
- Upload / collect data
- Decide on real-time, near-time or batch processing in the chain
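A minimal sketch of the logging step, assuming events are written as one JSON object per line so that fields can be added later without changing a schema. The helper name and all field names are hypothetical.

```python
import io
import json
import time


def log_event(stream, event_type, **fields):
    """Append one event as a JSON line; extra fields are kept as-is."""
    record = {"ts": time.time(), "type": event_type, **fields}
    stream.write(json.dumps(record) + "\n")
    return record


# An in-memory stream stands in for a log file or an upload buffer.
buffer = io.StringIO()
log_event(buffer, "position", user="u1", lat=46.77, lon=23.59, app_version="1.0")
log_event(buffer, "route_rated", user="u1", rating=4)

events = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(len(events), events[0]["type"])
```

Logging "too much" is cheap in this format: unknown fields are simply carried along until a later processing step decides whether to use them.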

#2 Process: Enhance the data and make it as rich as possible and easy to query

- Move data to processing environment
- Run logged data through processing chain so it can be queried
- Enhance the logged data with any additional data available (e.g. geography, social data, user data)
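The enrichment step can be sketched as a simple join between logged events and a lookup table. The profile fields (`country`, `is_paying`) and the sample data are assumptions for illustration only.

```python
def enrich(events, user_profiles, default_country="unknown"):
    """Return copies of the events with user attributes attached."""
    enriched = []
    for event in events:
        profile = user_profiles.get(event["user"], {})
        enriched.append({
            **event,
            "country": profile.get("country", default_country),
            "is_paying": profile.get("is_paying", False),
        })
    return enriched


# Hypothetical lookup table and logged events.
profiles = {"u1": {"country": "DE", "is_paying": True}}
logged = [{"user": "u1", "type": "drive"}, {"user": "u2", "type": "drive"}]

rows = enrich(logged, profiles)
for row in rows:
    print(row)
```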
#3 Analyse: Cluster the data into meaningful groups and compare it

- Define key performance indicators (KPIs)
- Cluster data in a meaningful way (e.g. by geography, time of day, customer's past behaviour)
- Compare data vs. reference sets
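A small sketch of this analysis step: average a KPI per group, then compare each group against the same group in a reference set. The KPI (session minutes), the grouping key and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def kpi_by_group(rows, group_key, kpi_key):
    """Average a KPI per group (e.g. per country or hour of day)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[kpi_key])
    return {group: mean(values) for group, values in groups.items()}


def compare_to_reference(current, reference):
    """Difference of each group's KPI to the same group in a reference set."""
    return {
        group: current[group] - reference[group]
        for group in current
        if group in reference
    }


# Hypothetical session lengths in minutes.
last_week = [{"country": "DE", "minutes": 20}, {"country": "RO", "minutes": 10}]
this_week = [
    {"country": "DE", "minutes": 26},
    {"country": "DE", "minutes": 22},
    {"country": "RO", "minutes": 9},
]
delta = compare_to_reference(
    kpi_by_group(this_week, "country", "minutes"),
    kpi_by_group(last_week, "country", "minutes"),
)
print(delta)
```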
#4 Improve: Learn from the analysis where your challenges are in order to optimize behavior

- Manually / automatically adjust features (e.g. lower prices in certain regions)
- Develop A/B testing scenarios and formulate improvement theories
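For A/B scenarios, each user has to land in the same variant every time. A common way to do this, shown here as an assumption rather than as the method used in the talk, is to hash the user ID together with the experiment name. The experiment and variant names are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to one variant of an experiment.

    Hashing user and experiment name together keeps the assignment
    stable across sessions and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical experiment name; the same user always gets the same variant.
print(assign_variant("user-42", "regional-pricing"))
```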

#5 Evaluate
- Check if the KPIs improve after applying the changes
- Accept changes that improved your users' behavior / reject changes that kept it the same
- Define which additional logs you might need to better cluster / identify behaviour
- Go back to step #1
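The accept / reject decision in the evaluation step can be sketched with a two-proportion z-test, which checks whether a KPI expressed as a rate really improved or only moved by chance. The choice of test, the 5% threshold and the sample numbers are assumptions, not details from the talk.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(successes_a, total_a, successes_b, total_b):
    """Two-sided p-value for the difference between two success rates."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 1.0  # both rates are exactly 0 or 1: no evidence of a change
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def accept_change(successes_a, total_a, successes_b, total_b, alpha=0.05):
    """Accept only if variant B is better and the difference is significant."""
    improved = successes_b / total_b > successes_a / total_a
    p_value = two_proportion_p_value(successes_a, total_a, successes_b, total_b)
    return improved and p_value < alpha


# Hypothetical KPI: share of drives that reached their destination.
clear_win = accept_change(800, 1000, 850, 1000)
no_real_change = accept_change(800, 1000, 805, 1000)
print(clear_win, no_real_change)
```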

#1 Log: Practical example of how this works at skobbler
- Software version
- Routing profile used
- Device
- Raw positions
- Geography (e.g. country)
- Rating of the route (optional)
- Destination reached (yes / no)
- Etc.

#2 Process: Enhance and split the data based on drives and segments
- Combine the data on a per-drive basis (= session)
- Combine the data on a per-segment basis (= how fast people are driving on a street versus our estimate)
- Identify key behavior across the route (e.g. re-routings)
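The per-segment combination can be sketched as follows: collect the observed speeds for every road segment and set them against the routing estimate. The tuple layout, the identifiers and the speeds are hypothetical and do not reflect skobbler's actual data model.

```python
from collections import defaultdict
from statistics import mean


def segment_speed_report(observations, estimated_kmh):
    """Compare observed speeds per road segment with the routing estimate.

    `observations` are (drive_id, segment_id, speed_kmh) tuples; the
    layout and all identifiers are hypothetical.
    """
    speeds = defaultdict(list)
    for _drive_id, segment_id, speed_kmh in observations:
        speeds[segment_id].append(speed_kmh)

    report = {}
    for segment_id, values in speeds.items():
        observed = mean(values)
        report[segment_id] = {
            "observed_kmh": observed,
            "estimated_kmh": estimated_kmh[segment_id],
            "ratio": observed / estimated_kmh[segment_id],
        }
    return report


observations = [
    ("drive-1", "seg-a", 48.0),
    ("drive-2", "seg-a", 52.0),
    ("drive-1", "seg-b", 90.0),
]
report = segment_speed_report(observations, {"seg-a": 40.0, "seg-b": 100.0})
print(report["seg-a"]["ratio"])  # above 1: people drive faster than estimated
```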

Example: Real-time analysis with the Twitter Storm framework to detect road changes

Example visualization of drives in the last five minutes (real-time)

Example: Historic driving patterns (processed with Hadoop / HBase)

#3 Analyse: Try to see in which areas our routing is not optimal
- KPIs are:
  - Route rating (if given)
  - # of re-routings (the smaller the better)
  - Time to destination vs. estimation by routing
- Cluster the data by:
  - Routing algorithm (and parameters used)
  - Geography

#4 Improve: Come up with strategies to improve the routing experience based on data
- For future routes, improve the estimate of the time taken on a segment vs. the time actually travelled
- Alter routing parameters based on country specifics to get better results (e.g. in Germany people drive faster on the Autobahn)
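One simple way to pull a segment's estimated travel time towards what drivers actually need is an exponential moving average. This is an illustrative assumption, not a description of skobbler's routing; the function name, the weight and the sample times are hypothetical.

```python
def update_segment_estimate(estimate_s, observed_s, weight=0.25):
    """Move the estimated travel time towards an observed travel time.

    Exponential moving average; `weight` (0..1) controls how quickly new
    drives override the old estimate and is an illustrative assumption.
    """
    if not 0 <= weight <= 1:
        raise ValueError("weight must be between 0 and 1")
    return (1 - weight) * estimate_s + weight * observed_s


# A segment estimated at 100 s that drivers actually cover in about 120 s.
estimate = 100.0
for observed in (120.0, 118.0, 122.0):
    estimate = update_segment_estimate(estimate, observed)
print(round(estimate, 1))
```

Country-specific parameters could be handled the same way by keeping one estimate per segment and country instead of a single global value.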

#5 Evaluate: Deploy the changes and compare them to reference data

- Deploy changes to production and compare ratings / timings vs. base values (~weekly)
- Verify whether other parameters, such as usage, also improve

Summary: Big data can drive big value but stay affordable

Simple formula:
Log -> Process -> Analyze -> Improve -> Evaluate = Success

Thank you for your attention!
Get in Touch: philipp.kandal@skobbler.com
Phone: +49-172-4597015
Follow me on
.com/apphil

Philipp Kandal, CTO, Skobbler - Big data on a small budget

  • 1. Big data on a small budget
  • 2. What do I know about big data? - skobbler logs all positions from our users (100 billion+) - More than 10 TB of data from users - Products / revenues significantly improved with business intelligence. Big data on a small budget @apphil #2
  • 3. Why should you learn about big data? - Harvard Business Review: “Data Scientist: The Sexiest Job of the 21st Century” - Obama became president of the US in large part due to the use of big data… - World-class sports teams enhance their performance with big data - Amazon, Google, Facebook, etc. have by now made all their dev processes data-driven
  • 4. What are some great use cases for big data? - Analyzing log files and user behavior (and predicting future behavior) - A/B testing and automatic optimization of functionality - Improving monetization (e.g. ad optimization) - Checking adoption and usage of new features
  • 5. When is it better not to rely on big data? - When qualitative feedback is better than quantitative feedback (e.g. very early-stage companies) - When you don’t have enough users yet to get statistically relevant results - When you do not know what you are optimizing for
  • 6. What does a solid and simple workflow for big data analysis look like? Log -> Process -> Analyse -> Improve -> Eval / Test
  • 7. Tools / technologies for a good big data setup - Logging: MongoDB, VoltDB, Cassandra - Processing & analyzing / storing: Hadoop & HBase (batch), Storm (real-time), Samza (real-time) - Optimizing: Mahout (machine learning)
  • 8. How can you build this without breaking the bank? - Analyse / process asynchronously - Use cheap dedicated servers (vs. cloud) - Use open / free software
  • 9. Key cost factor: real-time, near-time vs. batch - Real-time is much more expensive than batch - Leverage as much preprocessing as possible - Try using in-memory technology for real-time analytics
  • 10. #1 Log: Initially, as much data as feasible should be logged so it’s available later - Define interesting data (if unsure, rather log too much) - Upload / collect data - Decide on real-time, near-time or batch processing in the chain
  • 11. #2 Process: Enhance the data, make it as rich as possible and easy to query - Move data to the processing environment - Run logged data through the processing chain so it can be queried - Enhance the logged data with any additional data available (e.g. geography, social data, user data)
  • 12. #3 Analyse: Cluster the data in meaningful groups and compare it - Define key performance indicators (KPIs) - Cluster data in a meaningful way (e.g. by geography, time of day, customers' past behaviour) - Compare data vs. reference sets
  • 13. #4 Improve: Learn from the analysis where your challenges are in order to optimize behavior - Manually / automatically adjust features (e.g. lower prices in certain regions) - Develop A/B testing scenarios and formulate improvement theories
  • 14. #5 Evaluate - Check if the KPIs improve after applying the changes - Accept changes that improved your users' behavior / reject changes that left it the same - Define which additional logs you might need to better cluster / identify behaviour - Go back to step #1
  • 15. #1 Log: Practical example of how this works at skobbler - Software version - Routing profile used - Device - Raw positions - Geography (e.g. country) - Rating of the route (optional) - Destination reached (yes / no) - Etc.
  • 16. #2 Process: Enhance and split the data based on drives and segments - Combine the data on a per-drive basis (= session) - Combine the data on a per-segment basis (= how fast people drive on a street versus our estimate) - Identify key behavior across the route (e.g. reroutings)
  • 17. Example: Real-time analysis with the Twitter Storm framework to detect road changes. Example visualization of drives in the last five minutes (real-time)
  • 18. Example: Historic driving patterns (processed with Hadoop / HBase)
  • 19. #3 Analyse: Try to see in which areas our routing is not optimal - KPIs are: route rating (if given); number of re-routings (the fewer the better); time to destination vs. the estimate by routing - Cluster the data by: routing algorithm (and parameters used); geography
  • 20. #4 Improve: Come up with strategies to improve the routing experience based on data - For future routes, improve the estimate of the time taken on a segment by comparing it with the time actually travelled - Alter routing parameters based on country specifics to get better results (e.g. in Germany people drive faster on the Autobahn)
  • 21. #5 Evaluate: Deploy the changes and compare them to reference data - Deploy changes to production and compare ratings / timings vs. base values (~weekly) - Verify whether other parameters such as usage also improve
  • 22. Summary: Big data can drive big value but stay affordable. Simple formula: Log -> Process -> Analyze -> Improve -> Evaluate = Success
  • 23. Thank you for your attention! Get in Touch: philipp.kandal@skobbler.com Phone: +49-172-4597015 Follow me on .com/apphil
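
The Process, Analyse and Evaluate steps from the skobbler example (slides 16, 19 and 21) can be illustrated with a minimal Python sketch. It is not skobbler's actual pipeline, which the slides say runs on Hadoop / HBase and Storm. The record layout `SegmentLog`, the function names and the sample values are all assumptions made for illustration, and the sketch assumes every drive has a non-zero estimated time.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class SegmentLog:
    """One logged traversal of a road segment (hypothetical record layout)."""
    drive_id: str        # session identifier: one drive = one session
    country: str         # geography used for clustering
    segment_id: str      # road segment the positions were matched to
    estimated_s: float   # travel time predicted by the routing engine
    actual_s: float      # travel time derived from the raw positions


def per_drive_totals(logs):
    """Process step: combine segment logs on a per-drive basis."""
    totals = defaultdict(
        lambda: {"country": None, "estimated_s": 0.0, "actual_s": 0.0}
    )
    for log in logs:
        drive = totals[log.drive_id]
        drive["country"] = log.country
        drive["estimated_s"] += log.estimated_s
        drive["actual_s"] += log.actual_s
    return dict(totals)


def time_ratio_by_country(drives):
    """Analyse step: mean ratio of actual to estimated time per country."""
    ratios = defaultdict(list)
    for drive in drives.values():
        ratios[drive["country"]].append(drive["actual_s"] / drive["estimated_s"])
    return {country: mean(values) for country, values in ratios.items()}


def improved(kpi_before, kpi_after):
    """Evaluate step: accept a change only if the ratio moved closer to 1.0."""
    return abs(kpi_after - 1.0) < abs(kpi_before - 1.0)


# Made-up sample data: drivers in "DE" are faster than the routing estimate.
logs = [
    SegmentLog("d1", "DE", "s1", estimated_s=100.0, actual_s=80.0),
    SegmentLog("d1", "DE", "s2", estimated_s=100.0, actual_s=80.0),
    SegmentLog("d2", "DE", "s1", estimated_s=100.0, actual_s=80.0),
    SegmentLog("d3", "RO", "s9", estimated_s=50.0, actual_s=50.0),
]
drives = per_drive_totals(logs)
kpi = time_ratio_by_country(drives)
print(kpi)
```

A ratio below 1.0 for a country means drivers are faster than the routing estimate there, which is the kind of signal slide 20 suggests feeding back into country-specific routing parameters. The `improved` check mirrors slide 14: a change that leaves the KPI unchanged is rejected.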