Bigger Data. Better Insights.™
Is Bigger Data Really Better?
10 Facts from Theory and Practice
Alexander Gray, PhD
CTO, Skytree
Adj. Assoc. Prof., Georgia Tech
2
Is bigger data necessarily better?
If so, when and when not?
To what extent?
Even if it is, can we realize the gains?
3
First, what is the link between
bigger data and
bigger business value?
Let’s start with your high-value prediction problem
Healthcare:
Diagnosis
Prescription
Prognosis
Prevention
Drug screening
Drug efficacy
Cost optimization
Energy:
Remote sensing
Automatic equipment operation
Telco/data center:
Churn
Load prediction/provisioning
Asset-intensive:
Predictive maintenance
Prescriptive maintenance
Fault diagnosis
Dynamic allocation
Govt/law enforcement:
Association/phone call analysis
Threat scoring
Security:
Data loss prevention
Intrusion detection
Point-of-compromise identification
Malware identification
Marketing/sales:
Lead scoring
Recommendation
Personalized pricing
Personalized product/service
Product/service optimization
Optimal next action
Opportunity scoring
Retail:
Demand forecasting
Optimal pricing
Promo planning
Ensemble planning
Workforce allocation
Demand-driven supply chain
Insurance:
Loss model
Bind model
Claims leakage
Claims fraud
Bank/credit card:
Transaction fraud
Credit/loan scoring
Investing/trading
Money laundering
Advertising:
Ad selection
User/site bidding
Spend optimization
5
1. More $ ← Better prediction
Increasing business value is
achieved by
increasing predictive power.
Example: fraud detection
• False negative: Costs $2000
• False positive: Costs $100
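The asymmetric costs above can be turned into a quick calculation. The sketch below is illustrative only: the two dollar costs come from the slide, while the error counts are hypothetical, and the threshold formula assumes calibrated fraud probabilities and zero cost for correct decisions.

```python
# Costs from the slide; everything else here is an illustrative assumption.
COST_FALSE_NEGATIVE = 2000.0  # a fraudulent transaction that was missed
COST_FALSE_POSITIVE = 100.0   # a legitimate transaction that was flagged


def total_error_cost(false_negatives: int, false_positives: int) -> float:
    """Dollar cost of a batch of prediction errors."""
    return (false_negatives * COST_FALSE_NEGATIVE
            + false_positives * COST_FALSE_POSITIVE)


def cost_optimal_threshold() -> float:
    """Flag a transaction when P(fraud) exceeds this value.

    Flagging is cheaper when p * C_FN > (1 - p) * C_FP,
    i.e. when p > C_FP / (C_FP + C_FN).
    """
    return COST_FALSE_POSITIVE / (COST_FALSE_POSITIVE + COST_FALSE_NEGATIVE)


# Hypothetical error counts for two models scored on the same transactions.
baseline = total_error_cost(false_negatives=50, false_positives=400)
improved = total_error_cost(false_negatives=40, false_positives=380)

print(baseline - improved)                 # prints 22000.0
print(round(cost_optimal_threshold(), 4))  # prints 0.0476
```

With these costs, catching ten more frauds is worth far more than the twenty false alarms saved, which is why small gains in predictive power translate into dollars.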
6
The fundamental sources of error
7
2. Data size is a basic lever for predictive power
The training data size is one of
the main determinants of your
model’s predictive power.
8
3. More data → more predictive power
When you use
more training data,
you increase predictive power.
9
For realistic high-value models,
how do things work?
10
4. More sophisticated models → need more data
When you move to more
sophisticated models, you need
more training data.
e.g. for nonparametric regression,
density estimation:
e.g. nonparametric methods like k-NN
(or GBT, RF, SVM, NN, etc.)
converge to zero estimation error
for near-arbitrary data:
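A toy illustration of that convergence, not taken from the talk: 1-D k-NN regression on synthetic data, with k growing more slowly than n. The target function, noise level, and the k proportional to n^0.8 schedule are all assumptions chosen for the sketch.

```python
import numpy as np


def knn_regress(x_train, y_train, x_test, k):
    """Predict by averaging the targets of the k nearest training points (1-D)."""
    dist = np.abs(x_test[:, None] - x_train[None, :])       # shape (m, n)
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]   # k closest, unordered
    return y_train[nearest].mean(axis=1)


def knn_error(n, seed=0, noise=0.3):
    """Mean squared error against the noiseless target for a training set of size n."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, n)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise, n)
    x_test = np.linspace(0.05, 0.95, 200)
    # Illustrative schedule: k grows with n, but k / n shrinks.
    k = max(1, round(0.25 * n ** 0.8))
    pred = knn_regress(x_train, y_train, x_test, k)
    return float(np.mean((pred - np.sin(2 * np.pi * x_test)) ** 2))


for n in (100, 1_000, 10_000):
    print(n, knn_error(n))
```

The printed error should shrink as n grows; the exact values depend on the seed.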
11
5. More features → need more data
When you use more features,
you need more training data.
e.g. for nonparametric regression,
density estimation:
Note that more features improve
accuracy, speaking generally
(more on that in a different talk, or
ask me)
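One standard result of this kind, which may or may not be the one shown on the slide, is the minimax rate for nonparametric regression: for a target with s smooth derivatives in d dimensions, the best achievable squared error scales like n^(-2s / (2s + d)). The sketch below inverts that rate to show how the required sample size grows with the number of features. The constant factor is assumed to be 1 and the function name is hypothetical.

```python
def samples_needed(target_error: float, n_features: int,
                   smoothness: float = 2.0) -> float:
    """Sample size n at which n ** (-2s / (2s + d)) reaches target_error.

    Assumes the rate holds exactly with constant 1; real constants differ.
    """
    exponent = (2 * smoothness + n_features) / (2 * smoothness)
    return target_error ** (-exponent)


for d in (1, 10, 20):
    print(d, f"{samples_needed(0.01, d):.1e}")
# Roughly 3e2, 1e7, and 1e12 samples for 1, 10, and 20 features.
```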
12
6. More data → better prediction is real
Real empirical ML results
follow the math:
More training data increases
predictive power.
13
How else can
down-sampling the data be harmful,
creating poor results?
14
7. Down-sampling for CV → wrong parameters
The optimal hyperparameters of
a model are actually dependent
on the training set size.
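A simple case where the dependence is explicit, offered as an illustration rather than the talk's own example: the normal-reference rule of thumb for the bandwidth of a kernel density estimate, h = 1.06 * sigma * n^(-1/5). A bandwidth tuned on a subsample is too wide for the full dataset.

```python
def rule_of_thumb_bandwidth(std_dev: float, n: int) -> float:
    """Normal-reference rule of thumb for a Gaussian-kernel KDE bandwidth."""
    return 1.06 * std_dev * n ** (-1 / 5)


# Hypothetical sizes: a 1,000-point subsample of a 1,000,000-point dataset.
h_subsample = rule_of_thumb_bandwidth(std_dev=1.0, n=1_000)
h_full = rule_of_thumb_bandwidth(std_dev=1.0, n=1_000_000)

# The subsample bandwidth is about 4x too wide for the full data.
print(round(h_subsample / h_full, 2))  # prints 3.98
```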
15
8. Down-sampling may be throwing out gems
In many cases the important
data points are too rare to be
further reduced.
• High-interest outliers or small
clusters
• High-value but rare known objects
or events
• Rare but high-value discrete values
or classes
• Missing values mean each point is
less informative
• Natural systems with massive
variation
Another thing: non-uniform sampling,
without appropriate corrections, may warp
important probabilities
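A sketch of one such correction, under stated assumptions: all positives are kept, negatives are kept uniformly at random with probability keep_rate, and the model's scores are calibrated on the down-sampled data. Down-sampling divides the odds of the rare class by keep_rate, so multiplying the odds back recovers the original probability. The function names and the rates used are hypothetical.

```python
def sampled_positive_rate(true_rate: float, keep_rate: float) -> float:
    """Positive-class rate after keeping only a fraction of the negatives."""
    return true_rate / (true_rate + (1.0 - true_rate) * keep_rate)


def correct_for_negative_downsampling(p_sampled: float, keep_rate: float) -> float:
    """Map a probability learned on down-sampled data back to the original scale."""
    return keep_rate * p_sampled / (keep_rate * p_sampled + 1.0 - p_sampled)


true_rate = 0.001   # hypothetical: 0.1% of transactions are fraud
keep_rate = 0.01    # hypothetical: keep 1% of the non-fraud rows

warped = sampled_positive_rate(true_rate, keep_rate)
restored = correct_for_negative_downsampling(warped, keep_rate)

print(round(warped, 4))    # prints 0.091, about 90x the true rate
print(round(restored, 6))  # prints 0.001
```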
16
What else should we know,
toward best practices
for big data?
17
9. ML on big data is now possible*
It is now actually possible to
fully train models with very
large amounts of data.
*with Skytree!
18
9. ML on big data is now possible*
Even with full tuning at each size,
to find the optimal parameters!
*with Skytree!
19
Let’s look again at the basic sources of error
20
Let’s look again at the basic sources of error
If the error due to an insufficient model class
(e.g. linear models such as logistic regression)
dominates, adding more data won’t help:
the error due to the amount of data is not your
worst problem
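A synthetic illustration of this point, not from the talk: when the true relationship is quadratic, a straight-line model stays stuck at roughly the same error however much data it sees, while a quadratic model keeps improving. The data-generating function, noise level, and sample sizes are assumptions.

```python
import numpy as np


def fit_and_score(n, degree, seed=0, noise=0.1):
    """Fit a polynomial of the given degree to n noisy samples of y = x**2.

    Returns the mean squared error against the noiseless target on a grid.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    y = x ** 2 + rng.normal(0.0, noise, n)
    coeffs = np.polyfit(x, y, degree)
    grid = np.linspace(-1.0, 1.0, 2001)
    return float(np.mean((np.polyval(coeffs, grid) - grid ** 2) ** 2))


for n in (100, 100_000):
    print(n,
          "linear:", fit_and_score(n, degree=1),
          "quadratic:", fit_and_score(n, degree=2))
```

The linear model cannot go below the variance of x^2 on [-1, 1], which is 4/45 (about 0.089); only changing the model class removes that floor.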
21
Let’s look again at the basic sources of error
If the error due to incomplete model optimization
(e.g. stochastic gradient descent for parameters,
or a too-small grid in cross-validation for
hyperparameters) dominates, adding more data
won’t help: the error due to the amount of data
is not your worst problem
22
10. Your other errors may be holding you back
It is necessary to minimize all
the sources of error at the
same time.
• Training too-simple models in
order to handle large datasets may
yield no benefit
• Performing too-incomplete
training in order to handle large
datasets may yield no benefit
23
Summary: 10 facts from theory and practice
1. Better prediction → More $
2. Data size is a basic lever for predictive power
3. More data → more predictive power
4. More sophisticated models → need more data
5. More features → need more data
6. More data → better prediction is real
7. Down-sampling for CV → wrong parameters
8. Down-sampling may be throwing out gems
9. ML on big data is now possible*
10. Your other errors may be holding you back
A written form (14-page white paper)
is available at the Skytree booth.
24
Conclusions: 5 practical upshots
• Training on a subsample of the data gives up measurable predictive power,
and thus significant business value.
• When a dataset contains rare objects or values, which is common,
subsampling can be disastrous.
• Training too-simple models may block the benefit from the data size.
• Performing too-incomplete training may block the benefit from the data size.
• Performing cross-validation on a subsample is incorrect.
When you are ready to max out your data’s potential
with true state-of-the-art ML:
www.skytree.net
Thanks!

Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Is Bigger Data Really Better? 10 Facts from Theory and Practice

  • 1. Bigger Data. Better Insights.™ Is Bigger Data Really Better? 10 Facts from Theory and Practice. Alexander Gray, PhD; CTO, Skytree; Adj. Assoc. Prof., Georgia Tech
  • 2. Is bigger data necessarily better? If so, when and when not? To what extent? Even if it is, can we realize the gains?
  • 3. First, what is the link between bigger data and bigger business value?
  • 4. Let’s start with your high-value prediction problem. Healthcare: diagnosis, prescription, prognosis, prevention, drug screening, drug efficacy, cost optimization. Energy: remote sensing, automatic equipment operation. Telco/data center: churn, load prediction/provisioning. Asset-intensive: predictive maintenance, prescriptive maintenance, fault diagnosis, dynamic allocation. Govt/law enforcement: association/phone call analysis, threat scoring. Security: data loss prevention, intrusion detection, point-of-compromise identification, malware identification. Marketing/sales: lead scoring, recommendation, personalized pricing, personalized product/service, product/service optimization, optimal next action, opportunity scoring. Retail: demand forecasting, optimal pricing, promo planning, ensemble planning, workforce allocation, demand-driven supply chain. Insurance: loss model, bind model, claims leakage, claims fraud. Bank/credit card: transaction fraud, credit/loan scoring, investing/trading, money laundering. Advertising: ad selection, user/site bidding, spend optimization.
  • 5. Fact 1: More $ ← better prediction. Increasing business value is achieved by increasing predictive power. Example: fraud detection. A false negative costs $2000; a false positive costs $100.
  • 7. Fact 2: Data size is a basic lever for predictive power. The training data size is one of the main determinants of your model’s predictive power.
  • 8. Fact 3: More data → more predictive power. When you use more training data, you increase predictive power.
  • 9. For realistic high-value models, how do things work?
  • 10. Fact 4: More sophisticated models → need more data. When you move to more sophisticated models, you need more training data, e.g. for nonparametric regression and density estimation: [equation not shown]. Nonparametric methods like k-NN (or GBT, RF, SVM, NN, etc.) converge to zero estimation error for near-arbitrary data: [equation not shown].
  • 11. Fact 5: More features → need more data. When you use more features, you need more training data, e.g. for nonparametric regression and density estimation: [equation not shown]. Note that, generally speaking, more features improve accuracy (more on that in a different talk, or ask me).
  • 12. Fact 6: More data → better prediction is real. Real empirical ML results follow the math: more training data increases predictive power.
  • 13. How else can down-sampling the data be harmful, creating poor results?
  • 14. Fact 7: Down-sampling for CV → wrong parameters. The optimal hyperparameters of a model depend on the training set size.
  • 15. Fact 8: Down-sampling may be throwing out gems. In many cases the important data points are too rare to be further reduced: high-interest outliers or small clusters; high-value but rare known objects or events; rare but high-value discrete values or classes; missing values, which make each point less informative; natural systems with massive variation. In addition, non-uniform sampling without appropriate corrections may warp important probabilities.
  • 16. What else should we know, toward best practices for big data?
  • 17. Fact 9: ML on big data is now possible*. It is now actually possible to fully train models with very large amounts of data. *With Skytree!
  • 18. Fact 9 (continued): ML on big data is now possible*. It is now actually possible to fully train models with very large amounts of data, even with full tuning at each size to find the optimal parameters! *With Skytree!
  • 19. Let’s look again at the basic sources of error.
  • 20. Let’s look again at the basic sources of error. If the error due to an insufficient model class (e.g. linear models like logistic regression) dominates, adding more data won’t help: the error due to the amount of data is not your worst problem.
  • 21. Let’s look again at the basic sources of error. If the error due to incomplete model optimization (e.g. stochastic gradient descent for parameters, or a too-small grid in cross-validation for hyperparameters) dominates, adding more data won’t help: the error due to the amount of data is not your worst problem.
  • 22. Fact 10: Your other errors may be holding you back. It is necessary to minimize all the sources of error at the same time. Training too-simple models in order to handle large datasets may yield no benefit. Performing too-incomplete training in order to handle large datasets may yield no benefit.
  • 23. Summary: 10 facts from theory and practice. 1. Better prediction → more $. 2. Data size is a basic lever for predictive power. 3. More data → more predictive power. 4. More sophisticated models → need more data. 5. More features → need more data. 6. More data → better prediction is real. 7. Down-sampling for CV → wrong parameters. 8. Down-sampling may be throwing out gems. 9. ML on big data is now possible*. 10. Your other errors may be holding you back. A written version (a 14-page white paper) is available at the Skytree booth.
  • 24. Conclusions: 5 practical upshots. (1) Training on a subsample of the data gives up measurable predictive power, and thus significant business value. (2) When a dataset contains rare objects or values, which is common, subsampling can be disastrous. (3) Training too-simple models may block the benefit from the data size. (4) Performing too-incomplete training may block the benefit from the data size. (5) Performing cross-validation on a subsample is incorrect. When you are ready to max out your data’s potential with true state-of-the-art ML: www.skytree.net Thanks!
  • 25. Bigger Data. Better Insights.™ Thanks! Alexander Gray, PhD CTO, Skytree
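
The fraud-detection example on slide 5 can be turned into a small cost calculation. The sketch below uses the two per-error costs from the slide; the error counts for the two models are hypothetical numbers chosen only to illustrate how a modest gain in predictive power translates into dollars.

```python
# Per-error costs taken from the slide's fraud-detection example.
COST_FALSE_NEGATIVE = 2000  # dollars lost per missed fraud
COST_FALSE_POSITIVE = 100   # dollars lost per false alarm


def total_error_cost(false_negatives: int, false_positives: int) -> int:
    """Dollar cost of a model's mistakes over a batch of transactions."""
    return (false_negatives * COST_FALSE_NEGATIVE
            + false_positives * COST_FALSE_POSITIVE)


# Hypothetical error counts for two models on the same batch (illustrative only).
baseline_cost = total_error_cost(false_negatives=50, false_positives=400)
improved_cost = total_error_cost(false_negatives=40, false_positives=380)
savings = baseline_cost - improved_cost

print(f"baseline: ${baseline_cost}, improved: ${improved_cost}, saved: ${savings}")
```

Because a false negative costs twenty times as much as a false positive here, the missed-fraud count dominates the total, which is why small improvements in recall can carry most of the value.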
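
The equations on slides 10 and 11 did not survive in this transcript. As a hedged stand-in, and not necessarily the formula the speaker showed, the classical minimax rate for nonparametric regression illustrates both points. It assumes a regression function $f$ with smoothness $s$ in $d$ features, estimated from $n$ training points:

```latex
\[
  \inf_{\hat f_n} \sup_{f} \; \mathbb{E}\,\bigl\| \hat f_n - f \bigr\|_2^2
  \;\asymp\; n^{-\frac{2s}{2s+d}}
  \qquad\Longrightarrow\qquad
  n \;\gtrsim\; \varepsilon^{-\frac{2s+d}{2s}}
  \quad\text{to reach squared error } \varepsilon .
\]
```

The error goes to zero as $n$ grows (more data helps), while the exponent shrinks as $d$ grows, so reaching the same error with more features requires many more training points.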
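
Slide 14's claim that optimal hyperparameters depend on training set size can be illustrated with a different, simpler model than the ones in the deck: one-dimensional kernel density estimation, where Silverman's rule of thumb sets the bandwidth to 1.06 · σ · n^(-1/5). This is a swapped-in example, assuming roughly Gaussian data; the sample sizes are hypothetical.

```python
def silverman_bandwidth(std_dev: float, n: int) -> float:
    """Rule-of-thumb KDE bandwidth for 1-D, roughly Gaussian data."""
    if n <= 0 or std_dev <= 0:
        raise ValueError("n and std_dev must be positive")
    return 1.06 * std_dev * n ** (-1 / 5)


# Hypothetical sizes: the full dataset and a 1% subsample used for tuning.
n_full = 1_000_000
n_sub = 10_000

h_full = silverman_bandwidth(std_dev=1.0, n=n_full)
h_sub = silverman_bandwidth(std_dev=1.0, n=n_sub)

# How much too wide the subsample-tuned bandwidth is for the full data.
ratio = h_sub / h_full
print(f"h_sub = {h_sub:.4f}, h_full = {h_full:.4f}, ratio = {ratio:.2f}")
```

The ratio equals (n_full / n_sub)^(1/5), so a bandwidth tuned on the 1% subsample is about 2.5 times wider than the full data calls for and would oversmooth the final model.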
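
Slide 15 notes that non-uniform sampling without appropriate corrections may warp important probabilities. One standard correction, sketched here under the assumption that only the negative class was down-sampled and that each negative was kept independently with a known probability, rescales the odds predicted by the model. The function name and the numbers in the usage lines are illustrative.

```python
def correct_downsampled_probability(p_sampled: float, keep_rate: float) -> float:
    """Map a probability learned on negative-down-sampled data back to the
    original class balance.

    p_sampled: positive-class probability predicted by the model trained on
               the down-sampled data.
    keep_rate: fraction of negatives that were kept (0 < keep_rate <= 1).
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    if not 0.0 <= p_sampled <= 1.0:
        raise ValueError("p_sampled must be in [0, 1]")
    # Dropping negatives inflates the odds by 1 / keep_rate; undo that.
    return keep_rate * p_sampled / (keep_rate * p_sampled + (1.0 - p_sampled))


# A score of 0.5 from a model trained with only 1% of negatives kept.
corrected = correct_downsampled_probability(p_sampled=0.5, keep_rate=0.01)
print(f"corrected probability: {corrected:.4f}")
```

Without the correction, a score of 0.5 would be read as an even chance of the rare event, while under these assumptions the calibrated probability is closer to 1%.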