1
Big Data
Past, Present & Future
Where are We Headed?
Rob Peglar
CTO Americas
Isilon Storage Division
EMC Corporation
rob.peglar@emc.com
@peglarr
2
• In order to understand what’s coming, we must
understand our past
• We must also understand that
Big Data is fundamentally
different from what we’re used to
• Consider the difference between a still photograph
and a movie – and our human perception of them
– A movie is more than a collection of still photographs – why?
Prediction is Very Difficult -
Especially About the Future
– attributed to Niels Bohr
3
The Past –
and I Mean the Past
• Consider the census…
• From the Latin “censere”
– meaning “to estimate”
• “In those days a decree went out from Emperor Augustus that all
the world should be registered.” Luke 2:1
• The Domesday Book of 1086 – England
– Comprehensive tally of people, their land, and property
• The US Constitution mandates a decennial census
– The 1880 census took eight years (!) to complete
• This led to Hollerith’s punched card tabulator in 1890
– The beginning of automated data processing
– Reduced the census time to one year
4
Sampling – Good or Bad?
• Sampling precision improves most
with randomness
– Not with sample size
– Jerzy Neyman (Poland, 1934) proved this
• Neyman, J. (1934) "On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection", Journal of the Royal Statistical Society, 97 (4), 557–625
• Good - Sampling was a solution to information overload
• Bad - Systematic bias in sampling gives wrong conclusions
• A seismic shift is occurring – from
– Sampling, keeping datasets small on purpose, using them once… to
– N=all, keeping datasets large on purpose, using them many times
• Why? The outliers are the most interesting!
– Examples – credit card fraud, language translation, insurability
– Don’t just follow the rules, look for the exceptions
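The sampling trade-off above can be shown with a small simulation. This is a sketch under stated assumptions. The population (90% of values near 50, 10% near 80) and all function names are hypothetical. The "first k records" sample stands in for any systematically biased selection, in contrast to a simple random sample.

```python
import random
import statistics

def make_population(n=100_000, seed=0):
    """Hypothetical population: 90% of values cluster near 50, 10% near 80.
    The minority group is stored last, so taking the first k records is biased."""
    rng = random.Random(seed)
    majority = int(n * 0.9)
    population = [rng.gauss(50, 10) for _ in range(majority)]
    population += [rng.gauss(80, 10) for _ in range(n - majority)]
    return population

def random_sample_mean(population, k, seed=1):
    """Estimate the mean from a simple random sample of size k."""
    rng = random.Random(seed)
    return statistics.fmean(rng.sample(population, k))

def convenience_sample_mean(population, k):
    """Estimate the mean from the first k records: a systematically biased sample."""
    return statistics.fmean(population[:k])

if __name__ == "__main__":
    pop = make_population()
    print(f"true mean              {statistics.fmean(pop):.2f}")
    print(f"random sample, n=2000  {random_sample_mean(pop, 2_000):.2f}")
    print(f"biased sample, n=50000 {convenience_sample_mean(pop, 50_000):.2f}")
```

The small random sample lands close to the true mean. The biased sample, although 25 times larger, stays roughly 3 units off because it never sees the minority group. Adding more data does not remove systematic bias.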
(Image: Williams tube, 1946, 1024 bits)
5
The Journey from
Clean to Messy
• 1998 – Linden et al., collaborative filtering patent, filed while working at a Seattle startup selling books online
– G. Linden, J. Jacobi, and E. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to Amazon.com), Patent and Trademark Office, Washington, D.C., 2001
• “If it works perfectly, Amazon should show you just one
book – the next one you will buy.” (Linden)
• Hypothesis-driven approach becomes data-driven
– “Proving” something (causation) gives way to correlation
• McGregor et al. – using big data to improve the NICU
– 16 data streams, 1,260 data points/sec
– A genuine reduction in adverse outcomes for premature infants
– No “proof” – it helps doctors make better diagnostic decisions
– Carolyn McGregor, "Big Data in Neonatal Intensive Care," Computer, vol. 46, no. 6, pp. 54-59, June 2013, doi:10.1109/MC.2013.157
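The item-to-item idea can be illustrated with a toy sketch. Items count as similar when the same customers buy them, and a customer is recommended the unowned items most similar to what they already own. This sketch uses cosine similarity over binary purchase vectors. It is a simplified illustration under that assumption and is not the method claimed in the patent. The customers, items, and function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def item_similarities(purchases):
    """purchases maps customer -> set of items bought.
    Returns sims[a][b] = cosine similarity of items a and b,
    treating each item as a binary vector over customers."""
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        common = len(buyers[a] & buyers[b])
        if common:  # store only pairs that were bought together at least once
            score = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
            sims[a][b] = sims[b][a] = score
    return sims

def recommend(owned, sims, top_n=1):
    """Rank unowned items by their summed similarity to the items already owned."""
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Sort by descending score and break ties alphabetically so the result is deterministic
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

purchases = {
    "ann": {"A", "B"},
    "bob": {"A", "B", "C"},
    "cat": {"B", "C"},
    "dan": {"A", "D"},
}
sims = item_similarities(purchases)
print(recommend({"A"}, sims))       # ['B']
print(recommend({"A", "B"}, sims))  # ['C']
```

Setting `top_n=1` mirrors Linden's ideal of showing "just one book".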
6
Manholes and Raw Data - Correlations
• 94,000 miles of underground cable in NYC; 51,000 manholes and service boxes in Manhattan alone
• 1 in 20 cables laid before 1930; some Edison-era
• Records kept since the 1880s – 38 different terms
– All hand-written, paper, cards, ledgers, etc.
• 2008 – How to prevent fires and exploding manholes?
• Machine-correlate 106 predictors of imminent disaster
– The top 10% of manholes ranked by the model accounted for 44% of total failures
• Chris Anderson – “the data deluge makes the scientific method obsolete”
– http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• “Datafication” – everything is data
– Numbers to words to images to locations to relationships to feelings …
– Graph theory & graph analysis change the way we perceive the world
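The manhole figure ("top 10% ... 44% of total failures") is a ranking metric. You rank every item by model score, take the top fraction, and measure what share of all actual failures falls inside it. The sketch below computes that metric. The toy records and the function name are hypothetical.

```python
def top_fraction_capture(records, fraction=0.10):
    """records: (risk_score, failed) pairs, one per manhole.
    Returns the share of all failures that fall in the top `fraction`
    of records when ranked by risk score (highest first)."""
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))  # size of the "top" group
    total_failures = sum(1 for _, failed in ranked if failed)
    if total_failures == 0:
        return 0.0
    caught = sum(1 for _, failed in ranked[:k] if failed)
    return caught / total_failures

# Hypothetical toy data: 10 manholes with model scores and observed outcomes
records = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
           (0.60, False), (0.50, False), (0.40, True), (0.30, False),
           (0.20, False), (0.10, False)]
print(top_fraction_capture(records, 0.10))  # 0.25
print(top_fraction_capture(records, 0.20))  # 0.5
```

A random ranking would capture about 10% of failures in its top 10% on average. Capturing 44% is therefore roughly 4.4 times better than chance.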
7
© Copyright 2014 EMC Corporation. All rights reserved.
The Present - Architecture
(Figure: Big Data architecture diagram)
• Layers and roles: Business Process (End Users); Info Processing (Analysts / Scientists); Data Acquisition (Architects / Engineers); Data Creation (Producers)
• Infrastructure: Shared Nothing; Scale-out Storage + SSD; MPP + In-Memory Compute; Hadoop; Hi-Speed / Resiliency Networking; Converged Infrastructure; Cloud; Non-relational DWH
– Systems Integration; Volume, Velocity, Variety
• Objectives: Stream Processing; Event Management; Data Exploration; Contextualized Data; Modeling / Scenarios; Forecasting
• Delivery models
– On-demand: Access-Anywhere Analytics Services; Context-Aware Business Applications
– Push: Location-Based Services; Alert and Respond
– Embedded: Workflow and Interaction Automation; Smart devices and systems
• Data sources: Email and Messaging; Mobile Apps Data; Transaction and Usage Logs; Machine and Sensors; Geolocation; Relationships and Social Influence
• Also labeled: Real-time Events; Deep Insights; Value
8
The Present – Business Value of Data
• Data is valuable – re-use of data even more so
– Not ephemeral value – can be re-consumed ad infinitum
– Economists call this a “non-rivalrous” good
• Cost/benefit of storage ~ 0 – so keep everything
– Ewan Birney, European Bioinformatics Institute, “Hidden Treasures in Junk DNA” http://www.scientificamerican.com/article.cfm?id=hidden-treasures-in-junk-dna
– Over the last 50 years, cost per byte has roughly halved every 2 years
– Density has increased ~50 million times since 1956
• Consider electric cars:
– Battery level indicates when to “fill up” from the power grid
– Power utility monitors grid usage over time
– Correlate both data sets together
• Determine when/where to build recharge stations on which roads
• Recombinant data
– “Old” data combined into new forms for new insights
– “Noisy” datasets enable feedback loops – e.g. better/faster search/index
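Compounding is what pushes storage cost toward zero. The sketch below assumes a steady halving period, which real prices only approximate. The function name is hypothetical.

```python
def improvement_factor(years, halving_period_years):
    """How many times cheaper something becomes if its cost halves every
    `halving_period_years` for `years` years: 2 ** (years / period)."""
    return 2 ** (years / halving_period_years)

factor = improvement_factor(50, 2)
print(f"{factor:,.0f}x cheaper per byte")  # 33,554,432x cheaper per byte
```

Halving every 2 years for 50 years means 25 halvings, which is about a 33.6-million-fold drop in cost per byte. That is comparable in scale to the density figure above, although cost and density are different measures.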
9
The Future 1 – Wild, Wild West?
• Can we treat data as a corporate asset?
– A ledger entry, like “brand value” (intangible)
– Or is data a tangible asset to be kept on the books?
– Does data have “cash value”? Asset amortization?
– Can a business be legally “liable” for its data collection?
• Facebook book-valued at $6.3B. IPO value: $104B
– Why the difference? Facebook is essentially data
– Or, every FB user is worth ~$100 (~1B users)
• We will see much more “data value chain” ahead
– Ingest, analyze, sell results, analyze, sell results …downstreaming
– Licensing of data in its infancy – much more to come
– Think about the data from just your car – ~40 microprocessors (µPs)
10
The Future 2 – Data as Policy -
Can Data save Us from Us?
• “In God We Trust – all others bring data”
– Commonly attributed to W. Edwards Deming
• New jobs/titles coming out of the woodwork
– CAO (Chief Analytics Officer), CDO (Data)
– Data Scientist, Data Correlationist, Data Ethicist
• Knowing “what” not “why” is good enough. Is it?
• Remember Bayes’ “inductive probability” (250 yrs!)
– We update our beliefs about something as new data arrives
– Bayes T. (1763) "An Essay towards solving a Problem in the Doctrine of Chances". Phil. Trans., 53, 370–418.
• Data Policy in the immortal words of Yogi Berra:
– “We make too many wrong mistakes”
– “You can observe a lot just by watching.”
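Bayes' idea of updating beliefs as new data arrives can be shown with a minimal sketch of Bayes' rule: the posterior is proportional to prior × likelihood, then normalized. The two-hypothesis coin example, its probabilities, and the function names are hypothetical. Exact fractions are used so the updates are easy to check by hand.

```python
from fractions import Fraction

# Hypothetical example: is a coin fair (P(heads) = 1/2) or biased (P(heads) = 3/4)?
HEADS_PROB = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

def likelihood(hypothesis, outcome):
    """P(outcome | hypothesis) for a single flip; outcome is 'H' or 'T'."""
    p = HEADS_PROB[hypothesis]
    return p if outcome == "H" else 1 - p

def bayes_update(prior, outcome):
    """Bayes' rule: multiply each prior by its likelihood, then normalize."""
    unnormalized = {h: p * likelihood(h, outcome) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

belief = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}  # start undecided
for flip in "HH":
    belief = bayes_update(belief, flip)  # after H: fair 2/5; after HH: fair 4/13
    print(flip, belief)
```

Each observed head shifts belief toward the biased coin, while a tail would shift it back. No single flip settles the question.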
11
The Future 3 – N=all?
Keep Everything? Seriously?
• Data Silos or the Data Lake?
– HDFS presents a crisis: i.e. 危機, weiji
• dangerous ‘critical point’ (not crisis; mis-translation)
– Write-once, read-many, modify-never; delete-never?
– Time is not your friend when moving data
• (So, don’t move it between repositories; move it to the CPU)
• One 40GE NIC yields same rate on bus as 28 disks @ 140MB/s
• One million seconds is ~277.8 hours (~11.6 days)
• 1 PB @ 1 GB/sec is … 1 EB @ 1 TB/sec is …
• Non-shared (1 protocol) or shared (N protocols)?
• Time versus Space – the Essential Judgment
• Cost of Having Data vs. Cost of Not Having Data
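The data-movement arithmetic on this slide can be checked with a short sketch. It assumes decimal (SI) units and an idealized sustained rate with no protocol overhead. The function names are hypothetical.

```python
# Decimal (SI) units assumed: 1 GB = 10**9 bytes, 1 PB = 10**15 bytes, etc.
GB, TB, PB, EB = 10**9, 10**12, 10**15, 10**18

def transfer_seconds(size_bytes, rate_bytes_per_sec):
    """Idealized time to move a dataset at a sustained rate (no overhead)."""
    return size_bytes / rate_bytes_per_sec

def describe(seconds):
    return f"{seconds:,.0f} s = {seconds / 3600:,.1f} h = {seconds / 86400:.1f} days"

# Both cases come to 1,000,000 s, about 277.8 h or 11.6 days
print("1 PB @ 1 GB/s:", describe(transfer_seconds(PB, GB)))
print("1 EB @ 1 TB/s:", describe(transfer_seconds(EB, TB)))
```

Scaling the data and the bandwidth together by 1000x leaves the time unchanged, which is why "time is not your friend" at any scale. For the NIC comparison, 40 Gb/s at nominal line rate is 5,000 MB/s, or about 36 disks at 140 MB/s. The slide's 28 disks (3,920 MB/s) presumably allows for protocol and bus overhead.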
12
THANK YOU

Presented at StampedeCon 2014: Big Data Past, Present and Future – Where are We Headed?

 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014

  • 1. Big Data: Past, Present & Future – Where are We Headed?
    Rob Peglar, CTO Americas, Isilon Storage Division, EMC Corporation
    rob.peglar@emc.com, @peglarr
  • 2. Prediction Is Very Difficult – Especially About the Future (Niels Bohr)
    • To understand what’s coming, we must understand our past
    • We must also understand that Big Data is fundamentally different from what we’re used to
    • Consider the difference between a still photograph and a movie, and our human perception of each
      – A movie is more than a collection of still photographs. Why?
  • 3. The Past – and I Mean the Past
    • Consider the census…
    • From the Latin “censere”, meaning “to estimate”
    • “In those days a decree went out from Emperor Augustus that all the world should be registered.” (Luke 2:1)
    • The Domesday Book of 1086 (England)
      – A comprehensive tally of people, their land, and property
    • The US Constitution mandates a decennial census
      – The 1880 census took eight years (!) to complete
    • This led to Hollerith’s punched-card tabulator in 1890
      – The beginning of automated data processing
      – Reduced the census time to one year
  • 4. Sampling – Good or Bad?
    • Sampling precision improves most with randomness
      – Not with sample size
      – Jerzy Neyman (Poland, 1934) proved this
        • Neyman, J. (1934). “On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection”. Journal of the Royal Statistical Society, 97 (4), 557–625
    • Good: sampling was a solution to information overload
    • Bad: systematic bias in sampling leads to wrong conclusions
    • A seismic shift is occurring, from
      – Sampling: keeping datasets small on purpose and using them once… to
      – N=all: keeping datasets large on purpose and using them many times
    • Why? The outliers are the most interesting!
      – Examples: credit card fraud, language translation, insurability
      – Don’t just follow the rules; look for the exceptions
    [Image: Williams tube, 1946, 1024 bits]
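The claim that randomness matters more than sample size, and that systematic bias gives wrong conclusions, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Neyman’s stratified-sampling method: the population (the integers 0 to 99,999), the sample sizes, and the hypothetical “purposive” rule that only reaches the largest values are all illustrative choices.

```python
import random
import statistics

# Hypothetical population: 100,000 units with values 0..99,999 (true mean 49,999.5).
POPULATION = list(range(100_000))
TRUE_MEAN = statistics.fmean(POPULATION)

def random_sample_error(n, seed=0):
    """Absolute error of the mean of a simple random sample of size n."""
    rng = random.Random(seed)
    sample = rng.sample(POPULATION, n)
    return abs(statistics.fmean(sample) - TRUE_MEAN)

def biased_sample_error(n):
    """Absolute error when a 'purposive' selection only reaches the n largest units.

    Stands in for any systematic bias, e.g. surveying only the easiest-to-reach people.
    """
    sample = sorted(POPULATION)[-n:]
    return abs(statistics.fmean(sample) - TRUE_MEAN)

small_random = random_sample_error(1_000)
large_biased = biased_sample_error(10_000)
print(f"random n=1,000 error:  {small_random:,.1f}")
print(f"biased n=10,000 error: {large_biased:,.1f}")
```

The small random sample lands close to the true mean, while the biased sample, ten times larger, misses by 45,000: collecting more biased data does not remove the bias.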
  • 5. The Journey from Clean to Messy
    • 1998 – Linden et al., collaborative filtering patent, working at a Seattle startup selling books online
      – G. Linden, J. Jacobi, and E. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to Amazon.com), Patent and Trademark Office, Washington, D.C., 2001
    • “If it works perfectly, Amazon should show you just one book – the next one you will buy.” (Linden)
    • The hypothesis-driven approach becomes data-driven
      – From “proving” something (causation) → correlation
    • McGregor et al. – using big data to improve the NICU
      – 16 data streams, 1,260 data points/sec
      – Validated improvement in adverse outcomes for premature infants
      – No “proof”; it helps doctors make better diagnostic decisions
      – Carolyn McGregor, “Big Data in Neonatal Intensive Care,” Computer, vol. 46, no. 6, pp. 54–59, June 2013, doi:10.1109/MC.2013.157
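The item-to-item idea can be sketched in a few lines: two items are similar when the same customers buy both, and recommendations come from items similar to those already in the basket. This is a simplified illustration that assumes cosine similarity over binary purchase vectors; it is not the patented or production Amazon algorithm, and the customers and items are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarities(baskets):
    """Cosine similarity between items, each item being a binary vector over customers."""
    count = defaultdict(int)  # number of customers who bought item i
    co = defaultdict(int)     # number of customers who bought both i and j
    for items in baskets.values():
        unique = sorted(set(items))
        for i in unique:
            count[i] += 1
        for i, j in combinations(unique, 2):
            co[(i, j)] += 1
            co[(j, i)] += 1
    return {(i, j): n / sqrt(count[i] * count[j]) for (i, j), n in co.items()}

def recommend(basket, sims, k=1):
    """Score unseen items by summed similarity to items already in the basket."""
    scores = defaultdict(float)
    for (i, j), s in sims.items():
        if i in basket and j not in basket:
            scores[j] += s
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

# Hypothetical purchase history: customer -> items bought.
baskets = {
    "alice": ["A", "B"],
    "bob": ["A", "B", "C"],
    "carol": ["B", "C"],
    "dave": ["A", "D"],
}
sims = item_similarities(baskets)
print(recommend({"A", "B"}, sims))  # ['C']
```

Similarities are precomputed offline from co-purchases, so a recommendation at request time only sums a few table lookups, which is the practical appeal of the item-to-item approach.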
  • 6. Manholes and Raw Data – Correlations
    • 94,000 miles of underground cable in NYC; 51,000 manholes in Manhattan alone, with service boxes below
    • 1 in 20 cables was laid before 1930; some are Edison-era
    • Records kept since the 1880s
      – 38 different terms
      – All handwritten: paper, cards, ledgers, etc.
    • 2008 – How to prevent fires and exploding manholes?
    • Machine-correlate 106 predictors of imminent disaster
      – The top 10% ranked by predicted risk accounted for 44% of total failures
    • Chris Anderson – “data deluge makes scientific method obsolete”
      – http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
    • “Datafication” – everything is data
      – Numbers to words to images to locations to relationships to feelings …
      – Graph theory & graph analysis change the way we perceive the world
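“Top 10% predicted were 44% of total failures” is a capture rate: rank units by predicted risk, take the top slice, and count what share of all actual failures falls inside it. Dividing by the slice size gives the lift over inspecting at random, here 44% / 10% = 4.4. The sketch below computes both; the scores and failure labels in the example are hypothetical, not the NYC data.

```python
def capture_rate(scores, failed, top_fraction=0.10):
    """Share of all failures found in the top `top_fraction` of units ranked by score."""
    if len(scores) != len(failed):
        raise ValueError("scores and failed must have the same length")
    total_failures = sum(failed)
    if total_failures == 0:
        raise ValueError("no failures to capture")
    # Rank unit indices from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_n = max(1, round(len(scores) * top_fraction))
    caught = sum(failed[i] for i in ranked[:top_n])
    return caught / total_failures

def lift(capture, top_fraction=0.10):
    """How many times better the ranking is than inspecting a random top_fraction."""
    return capture / top_fraction

# Hypothetical example: 20 units with risk scores 0..19 and five failures (1 = failed).
scores = list(range(20))
failed = [1, 0, 1, 0, 1] + [0] * 13 + [1, 1]
rate = capture_rate(scores, failed)
print(f"capture rate {rate:.0%}, lift {lift(rate):.1f}x")
print(f"slide's figure: lift {lift(0.44):.1f}x")
```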
  • 7. The Present – Architecture
    [Architecture diagram. © Copyright 2014 EMC Corporation. All rights reserved.]
    • Stages and roles: Data Creation (Producers), Data Acquisition (Architects / Engineers), Info Processing (Analysts / Scientists), Business Process (End Users)
    • Data sources: email and messaging, mobile apps, data transaction and usage logs, machine and sensors, geolocation, relationships and social influence
    • Systems integration (volume, velocity, variety): shared-nothing scale-out storage + SSD, MPP + in-memory compute, Hadoop, hi-speed / resiliency networking, converged infrastructure, cloud, non-relational DWH
    • Objectives: stream processing, event management, data exploration, contextualized data, modeling / scenarios, forecasting
    • Delivery models: on-demand (access-anywhere analytics services, context-aware business applications); push (location-based services, alert and respond); embedded (workflow and interaction automation, smart devices and systems)
    • Value: real-time events, deep insights
  • 8. The Present – Business Value of Data
    • Data is valuable – re-use of data even more so
      – Its value is not ephemeral; it can be re-consumed ad infinitum
      – Economists call this a “non-rivalrous” good
    • The cost of storage relative to its benefit is ~0 – so keep everything
      – Ewan Birney, European Bioinformatics Institute, “Hidden Treasures In Junk DNA” http://www.scientificamerican.com/article.cfm?id=hidden-treasures-in-junk-dna
      – Over the last 50 years, cost/byte has roughly halved every 2 years
      – Density has increased ~50 million times since 1956
    • Consider electric cars:
      – Battery level indicates when to “fill up” from the power grid
      – The power utility monitors grid usage over time
      – Correlate both data sets together
        • Determine when and where to build recharge stations, and on which roads
    • Recombinant data
      – “Old” data combined into new forms for new insights
      – “Noisy” datasets enable feedback loops, e.g. better/faster search and indexing
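As a rough cross-check of the storage-economics bullets: halving the cost per byte every 2 years for 50 years is 25 halvings, a factor of 2^25 (about 33.6 million), the same order of magnitude as the ~50-million-fold density gain cited. The sketch idealizes the trend as a perfectly steady halving rate, which is an assumption.

```python
YEARS = 50
HALVING_PERIOD_YEARS = 2  # "cost/byte ~1/2x every 2 years", idealized as exact

halvings = YEARS / HALVING_PERIOD_YEARS           # 25.0
reduction_factor = 2 ** halvings                  # total fall in cost per byte
annual_factor = 2 ** (-1 / HALVING_PERIOD_YEARS)  # this year's cost/byte vs last year's

print(f"{halvings:.0f} halvings -> cost/byte down ~{reduction_factor:,.0f}x")
print(f"about {1 - annual_factor:.0%} cheaper per year")
```

This prints a reduction of 33,554,432x and a decline of about 29% per year.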
  • 9. The Future 1 – Wild, Wild West?
    • Can we treat data as a corporate asset?
      – A ledger entry, like “brand value” (intangible)?
      – Or is data a tangible asset to be kept on the books?
      – Does data have “cash value”? Can it be amortized as an asset?
      – Can a business be held legally “liable” for its data collection?
    • Facebook’s book value: $6.3B; IPO value: $104B
      – Why the difference? Facebook is essentially data
      – Put another way, every FB user is worth ~$100 (~1B subscribers)
    • We will see much more of the “data value chain” ahead
      – Ingest, analyze, sell results, analyze, sell results … downstreaming
      – Licensing of data is in its infancy – much more to come
      – Think about the data from your car alone – ~40 microprocessors (µPs)
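The per-user figure is simple division on the numbers given above ($104B IPO value, $6.3B book value, ~1B users, all as stated on the slide). The market valued the company at roughly 16.5 times its book value; the value not carried on the books is about $98 per user, and the full IPO value is about $104 per user, consistent with “~$100”.

```python
IPO_VALUE_USD = 104e9
BOOK_VALUE_USD = 6.3e9
USERS = 1e9  # "~1B subscribers", rounded as on the slide

market_to_book = IPO_VALUE_USD / BOOK_VALUE_USD
per_user_total = IPO_VALUE_USD / USERS
per_user_off_books = (IPO_VALUE_USD - BOOK_VALUE_USD) / USERS

print(f"market/book: {market_to_book:.1f}x")
print(f"IPO value per user: ${per_user_total:,.2f}")
print(f"value not on the books per user: ${per_user_off_books:,.2f}")
```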
  • 10. The Future 2 – Data as Policy – Can Data Save Us from Us?
    • “In God we trust – all others bring data”
      – Commonly attributed to W. Edwards Deming
    • New jobs and titles are coming out of the woodwork
      – CAO (Chief Analytics Officer), CDO (Chief Data Officer)
      – Data Scientist, Data Correlationist, Data Ethicist
    • Knowing “what”, not “why”, is good enough. Is it?
    • Remember Bayes’ “inductive probability” (250 yrs!)
      – We update our beliefs about something as new data arrives
      – Bayes, T. (1763). “An Essay towards solving a Problem in the Doctrine of Chances”. Phil. Trans., 53, 370–418
    • Data policy, in the immortal words of Yogi Berra:
      – “We make too many wrong mistakes”
      – “You can observe a lot just by watching.”
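“Updating our beliefs as new data arrives” is Bayes’ rule: multiply each hypothesis’s prior probability by the likelihood of the new observation under it, then renormalize. The sketch uses exact fractions and a hypothetical two-hypothesis coin example (a fair coin versus one assumed to land heads 3/4 of the time); the hypotheses and probabilities are illustrative assumptions.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """Return the posterior P(h | observation) for each hypothesis h.

    prior: dict hypothesis -> probability (summing to 1)
    likelihood: function (observation, hypothesis) -> P(observation | hypothesis)
    """
    unnormalized = {h: p * likelihood(observation, h) for h, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(observation)
    if evidence == 0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: u / evidence for h, u in unnormalized.items()}

# Hypothetical hypotheses: the probability of heads under each one.
P_HEADS = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

def coin_likelihood(observation, hypothesis):
    p = P_HEADS[hypothesis]
    return p if observation == "H" else 1 - p

belief = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
for obs in ["H", "H"]:  # each new observation updates the previous posterior
    belief = bayes_update(belief, coin_likelihood, obs)
print(belief)  # {'fair': Fraction(4, 13), 'biased': Fraction(9, 13)}
```

Each posterior becomes the prior for the next observation, so beliefs shift gradually rather than flipping on a single data point.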
  • 11. The Future 3 – N=all? Keep Everything? Seriously?
    • Data silos or the data lake?
      – HDFS presents a crisis: i.e. 危機, weiji
        • a dangerous ‘critical point’ (not ‘crisis’; a mistranslation)
      – Write-once, read-many, modify-never; delete-never?
      – Time is not your friend when moving data
        • (So don’t move it between repositories; move it to the CPU)
        • One 40GE NIC yields the same rate on the bus as 28 disks @ 140 MB/s
        • One million seconds is about 277.8 hours (~11.6 days)
        • 1 PB @ 1 GB/sec is … 1 EB @ 1 TB/sec is …
    • Non-shared (1 protocol) or shared (N protocols)?
    • Time versus Space – the Essential Judgment
    • Cost of Having Data vs. Cost of Not Having Data
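The bandwidth bullets are back-of-the-envelope arithmetic. The sketch reproduces them under two stated assumptions: decimal (SI) units and raw line rate with no protocol overhead. At raw line rate a 40GE link carries 5,000 MB/s, while 28 disks at 140 MB/s deliver 3,920 MB/s (about 78% of that), so we assume the slide’s equivalence allows for real-world overhead. The same arithmetic shows how long bulk moves take at these scales.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400

def gbit_to_mbyte_per_s(gbit_per_s):
    """Raw line rate in Gb/s -> MB/s (decimal units, no protocol overhead)."""
    return gbit_per_s * 1000 / 8

def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Idealized time to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s

nic_mb_s = gbit_to_mbyte_per_s(40)  # one 40GE NIC
disks_mb_s = 28 * 140               # 28 disks @ 140 MB/s
print(f"40GE raw: {nic_mb_s:,.0f} MB/s; 28 disks: {disks_mb_s:,} MB/s "
      f"({disks_mb_s / nic_mb_s:.0%} of line rate)")

GB, TB, PB, EB = 10**9, 10**12, 10**15, 10**18
for label, size, rate in [("1 PB @ 1 GB/s", PB, GB), ("1 EB @ 1 TB/s", EB, TB)]:
    secs = transfer_seconds(size, rate)
    print(f"{label}: {secs:,.0f} s = {secs / SECONDS_PER_HOUR:,.1f} h "
          f"= {secs / SECONDS_PER_DAY:.1f} days")
```

Scaling size and rate together by 1,000x leaves the transfer time unchanged, which is why moving data between repositories does not get cheaper as links get faster.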