MapR: The Next Generation Big Data Platform
©MapR Technologies - Confidential   1
Big is the next big thing

     Big data and Hadoop are exploding


     Companies are being funded


     Books are being written


     Applications are sprouting up everywhere




Slow Motion Explosion




Hadoop Explosion




Why Now?

        But Moore’s law has applied for a long time


        Why is Hadoop exploding now?


        Why not 10 years ago?


        Why not 20?




6/1/2012
Size Matters, but …

     If it were just the availability of data, then existing big companies
      would have adopted big data technology first


                       They didn’t




Or Maybe Cost

     If it were just a matter of net positive value, then finance companies
      should have adopted first, because they have a higher opportunity value per byte


                       They didn’t




Backwards adoption

     Under almost any threshold argument, startups would not be the first
      to adopt big data technology


                       They did




Everywhere at Once?

     Something very strange is happening
       –   Big data is being applied at many different scales
       –   At many value scales
       –   By large companies and small


                                    Why?




The Conventional Answer
More data is being produced more quickly
Data sizes are bigger than even a very large computer can hold
Cost to create and store continues to decrease




Analytics Scaling Laws

     Analytics scaling is all about the 80-20 rule
       –   Big gains for little initial effort
       –   Rapidly diminishing returns
     The key to net value is how costs scale
       –   Old school – exponential scaling
       –   Big data – linear scaling, low constant
     Cost/performance has changed radically
       –   IF you can use many commodity boxes
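The scaling argument above can be sketched numerically. This is only an illustration under stated assumptions: the saturating value curve, both cost functions, and every constant (`s0`, `c0`, `k`, `per_unit`) are invented here to show the shape of the argument and do not come from the slides.

```python
import math

def value(scale, s0=300.0):
    """Assumed diminishing-returns curve: big early gains, then a plateau."""
    return 1.0 - math.exp(-scale / s0)

def exponential_cost(scale, c0=0.01, k=400.0):
    """Assumed 'old school' cost: grows exponentially with scale."""
    return c0 * (math.exp(scale / k) - 1.0)

def linear_cost(scale, per_unit=0.00005):
    """Assumed 'big data' cost: linear in scale with a low constant."""
    return per_unit * scale

def best_scale(cost, max_scale=2000):
    """Return (scale, net value) that maximizes value - cost on an integer grid."""
    candidates = ((s, value(s) - cost(s)) for s in range(max_scale + 1))
    return max(candidates, key=lambda pair: pair[1])

old_scale, old_net = best_scale(exponential_cost)
new_scale, new_net = best_scale(linear_cost)
print(f"exponential cost: best scale {old_scale}, net value {old_net:.3f}")
print(f"linear cost:      best scale {new_scale}, net value {new_net:.3f}")
```

With these made-up constants, the net-value optimum under linear cost sits at a larger scale than under exponential cost. Different constants can move both optima, which is why the cost curve, not the value curve, decides how far it pays to scale.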




You’re kidding, people do that?


     We didn’t know that!
     We should have known that
     We knew that




[Figure: Value (0 to 1) versus Scale (0 to 2,000). Labels along the curve, in order of increasing value: Anybody with eyes; Intern with a spreadsheet; In-house analytics; Industry-wide data consortium; NSA, non-proliferation]
[Figure: Value (0 to 1) versus Scale (0 to 2,000). Annotation: Net value optimum has a sharp peak well before maximum effort]
But scaling laws are changing both slope and shape




[Figure: Value (0 to 1) versus Scale (0 to 2,000). Annotation: More than just a little]
[Figure: Value (0 to 1) versus Scale (0 to 2,000). Annotation: They are changing a LOT!]
[Figure: Value (0 to 1) versus Scale (0 to 2,000). Annotations: Initially, linear cost scaling actually makes things worse; A tipping point is reached and things change radically …]
Pre-requisites for Tipping

     To reach the tipping point,
     Algorithms must scale out horizontally
       –   On commodity hardware
       –   That can and will fail
     Data practice must change
       –   Denormalized is the new black
       –   Flexible data dictionaries are the rule
       –   Structured data becomes rare
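The first prerequisite, scaling out on hardware that can and will fail, can be illustrated with a toy scheduler. This is a sketch under assumptions, not any real framework's API: `WorkerFailure`, `make_flaky_task`, `run_partitions`, the 30% failure rate, and the retry limit are all hypothetical names and numbers chosen for the example.

```python
import random

class WorkerFailure(Exception):
    """Raised when a (simulated) commodity node dies mid-task."""

def make_flaky_task(rng, failure_rate=0.3):
    """Build a task that sums a partition but sometimes fails, like cheap hardware."""
    def task(partition):
        if rng.random() < failure_rate:
            raise WorkerFailure("node lost")
        return sum(partition)
    return task

def run_partitions(partitions, task, max_attempts=50):
    """Run the task once per partition, re-running a partition whenever it fails."""
    results = []
    for partition in partitions:
        for _ in range(max_attempts):
            try:
                results.append(task(partition))
                break
            except WorkerFailure:
                continue  # the task has no side effects, so re-running it is safe
        else:
            raise RuntimeError("partition failed on every attempt")
    return results

data = list(range(1000))
partitions = [data[i:i + 100] for i in range(0, len(data), 100)]
partial_sums = run_partitions(partitions, make_flaky_task(random.Random(7)))
total = sum(partial_sums)
print(total)
```

The point of the sketch is that work split into independent, side-effect-free pieces can simply be re-run when a node disappears, so individual failures do not change the final answer.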




But there is more

                                    Especially for large enterprises




Physics of startup companies




For startups

     History is always small
     The future is huge
     Must adopt new technology to survive
     Compatibility is not as important
       –   In fact, incompatibility is assumed




Physics of large companies



[Figure annotations: Startup phase; Absolute growth still very large]
For large businesses

     Present state is always large
     Relative growth is much smaller
     Absolute growth rate can be very large
     Must adopt new technology to survive
       –   Cautiously!
       –   But must integrate technology with legacy
     Compatibility is crucial




The startup technology picture

[Figure labels: Old computers and software; Current computers and software; Expected hardware and software growth. Annotation: No compatibility requirement]
The large enterprise picture
[Figure labels: Current hardware and software; Proof of concept Hadoop cluster; Long-term Hadoop cluster. Annotations: Must work together; ?]
So that is why, and why now



                                    What can you do with it?
                                          And how?




Scale-free Computing

     Map-reduce
       –   pure functions for practical batch parallel computation
       –   high level languages like Hive and Pig available
       –   MapR provides standard access systems via NFS and ODBC
     BSP (bulk synchronous parallel)
       –   pure functions for synchronous iterative actor-based compute
       –   Apache Giraph provides practical implementation
     Actors
       –   tuple passing with transformations
       –   Storm provides practical implementation
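To make the "pure functions" point about map-reduce concrete, here is a minimal single-process sketch of the model. It is not Hadoop's or MapR's API: `map_words`, `reduce_counts`, and `map_reduce` are hypothetical names, and the in-memory dictionary stands in for the distributed shuffle.

```python
from collections import defaultdict

def map_words(line):
    """Pure map function: one input record in, a list of (key, value) pairs out."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(word, counts):
    """Pure reduce function: all values for one key in, one (key, result) pair out."""
    return word, sum(counts)

def map_reduce(records, mapper, reducer):
    """Apply the mapper to every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # stand-in for the shuffle phase
    return dict(reducer(key, values) for key, values in groups.items())

lines = ["big data big clusters", "data moves slowly"]
counts = map_reduce(lines, map_words, reduce_counts)
print(counts)
```

Because neither function keeps state or touches anything outside its arguments, a framework is free to run them on any machine, in any order, and to re-run them after a failure.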




Future Proof Schemas

     Denormalize data where possible to avoid seeks
       –   use embedded lists
       –   duplicate data
     Flexible Schemas
       –   use standard system for data serialization
       –   must provide protocol migration without versioning
       –   Protobufs (Google), Avro (Apache) and Thrift can all be used
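The two ideas above, embedded lists and schemas that evolve without version numbers, can be sketched with plain JSON from the Python standard library. This is a stand-in only: Protobufs, Avro, and Thrift each have their own schema languages and resolution rules, and the field names, `READER_DEFAULTS`, and `decode` below are hypothetical.

```python
import copy
import json

# Fields the reader knows about, with defaults for records that lack them.
READER_DEFAULTS = {"name": "", "country": "unknown", "orders": []}

def decode(raw):
    """Decode a record, filling missing fields with defaults and ignoring unknown ones."""
    record = json.loads(raw)
    return {
        field: copy.deepcopy(record.get(field, default))
        for field, default in READER_DEFAULTS.items()
    }

# Denormalized record: the orders are embedded in the customer record,
# so reading a customer with their orders needs no join and no extra seek.
old_writer = '{"name": "ann", "orders": [{"sku": "a1", "qty": 2}]}'

# A newer writer added "country" and "loyalty_tier"; the reader still copes.
new_writer = '{"name": "bob", "country": "FR", "orders": [], "loyalty_tier": "gold"}'

ann = decode(old_writer)
bob = decode(new_writer)
print(ann)
print(bob)
```

Old records gain the default for the field they never had, and fields the reader does not know are dropped, so writers and readers can be upgraded independently without a coordinated version bump.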




Open Compute and Storage

     Big data has mass and inertia
       –   once it lands, it should not move


     Computation must move to the data
       –   map-reduce, Storm, Giraph … all OK
       –   conventional relational models … not OK


     One model is not enough
       –   must allow access by multiple models of computation




More Information


     Contact:
       –   tdunning@maprtech.com
       –   @ted_dunning


     Slides and such:
       –   http://info.mapr.com/ted-paris-05-2012




Thank You





From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Next Generation Big Data Platform Explosion

  • 1. MapR: The Next Generation Big Data Platform ©MapR Technologies - Confidential 1
  • 2. Big is the next big thing: Big data and Hadoop are exploding; Companies are being funded; Books are being written; Applications sprouting up everywhere
  • 3. Slow Motion Explosion
  • 5. Why Now? But Moore’s law has applied for a long time; Why is Hadoop exploding now? Why not 10 years ago? Why not 20? (6/1/2012)
  • 6. Size Matters, but …: If it were just availability of data then existing big companies would adopt big data technology first
  • 7. Size Matters, but …: If it were just availability of data then existing big companies would adopt big data technology first. They didn’t
  • 8. Or Maybe Cost: If it were just a net positive value then finance companies should adopt first because they have higher opportunity value / byte
  • 9. Or Maybe Cost: If it were just a net positive value then finance companies should adopt first because they have higher opportunity value / byte. They didn’t
  • 10. Backwards adoption: Under almost any threshold argument startups would not adopt big data technology first
  • 11. Backwards adoption: Under almost any threshold argument startups would not adopt big data technology first. They did
  • 12. Everywhere at Once? Something very strange is happening – Big data is being applied at many different scales – At many value scales – By large companies and small
  • 13. Everywhere at Once? Something very strange is happening – Big data is being applied at many different scales – At many value scales – By large companies and small. Why?
  • 14. The Conventional Answer: More data is being produced more quickly; Data sizes are bigger than even a very large computer can hold; Cost to create and store continues to decrease
  • 15. Analytics Scaling Laws: Analytics scaling is all about the 80-20 rule – Big gains for little initial effort – Rapidly diminishing returns; The key to net value is how costs scale – Old school – exponential scaling – Big data – linear scaling, low constant; Cost/performance has changed radically – IF you can use many commodity boxes
  • 16. You’re kidding, people do that? / We didn’t know that! / We should have known that / We knew that
  • 17. [Chart: Value (0 to 1) vs. Scale (0 to 2,000). Labels: NSA, non-proliferation; Industry-wide data consortium; In-house analytics; Intern with a spreadsheet; Anybody with eyes]
  • 18. [Chart: Value vs. Scale] Net value optimum has a sharp peak well before maximum effort
  • 19. But scaling laws are changing both slope and shape
  • 20. [Chart: Value vs. Scale] More than just a little
  • 21. [Chart: Value vs. Scale] They are changing a LOT!
  • 24. [Chart: Value vs. Scale]
  • 25. [Chart: Value vs. Scale]
  • 26. [Chart: Value vs. Scale] A tipping point is reached and things change radically …; Initially, linear cost scaling actually makes things worse
  • 27. Pre-requisites for Tipping: To reach the tipping point, algorithms must scale out horizontally – On commodity hardware – That can and will fail; Data practice must change – Denormalized is the new black – Flexible data dictionaries are the rule – Structured data becomes rare
  • 28. But there is more. Especially for large enterprises
  • 29. Physics of startup companies
  • 30. For startups: History is always small; The future is huge; Must adopt new technology to survive; Compatibility is not as important – In fact, incompatibility is assumed
  • 31. Physics of large companies: Absolute growth still very large; Startup phase
  • 32. For large businesses: Present state is always large; Relative growth is much smaller; Absolute growth rate can be very large; Must adopt new technology to survive – Cautiously! – But must integrate technology with legacy; Compatibility is crucial
  • 33. The startup technology picture: No compatibility requirement; Old computers and software; Expected hardware and software growth; Current computers and software
  • 34. The large enterprise picture: Must work together ?; Current hardware and software; Proof of concept Hadoop cluster; Long-term Hadoop cluster
  • 35. So that is why and why now
  • 36. So that is why, and why now. What can you do with it? And how?
  • 37. Scale-free Computing: Map-reduce – pure functions for practical batch parallel computation – high level languages like Hive and Pig available – MapR provides standard access systems via NFS and ODBC; BSP – pure functions for synchronous iterative actor-based compute – Apache Giraph provides practical implementation; Actors – tuple passing with transformations – Storm provides practical implementation
  • 38. Future Proof Schemas: Denormalize data where possible to avoid seeks – use embedded lists – duplicate data; Flexible Schemas – use standard system for data serialization – must provide protocol migration without versioning – Protobufs (Google), Avro (Apache) and Thrift can all be used
  • 39. Open Compute and Storage: Big data has mass and inertia – once it lands, it should not move; Computation must move to the data – map-reduce, Storm, Giraph … all OK – conventional relational models … not OK; One model is not enough – must allow access by multiple models of computation
  • 40. More Information: Contact: – tdunning@maprtech.com – @ted_dunning; Slides and such: – http://info.mapr.com/ted-paris-05-2012
  • 41. Thank You
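
The scaling argument in slides 15 to 26 can be tried out numerically. The sketch below is only an illustration, not the model behind the original charts: the saturating value curve, the exponential cost curve, the three linear slopes, and every constant are assumptions chosen to make the shapes visible. It finds the scale with the best net value (value minus cost) for each cost model.

```python
import math

# All curves and constants below are hypothetical; the slides give no formulas.

def value(scale, half_scale=100.0):
    """Saturating value: big early gains, then rapidly diminishing returns."""
    return scale / (scale + half_scale)

def exponential_cost(scale, base=0.01, rate=0.01):
    """'Old school' cost that grows exponentially with scale."""
    return base * math.expm1(rate * scale)

def linear_cost(slope):
    """Build a 'big data' cost function that grows linearly with scale."""
    def cost(scale):
        return slope * scale
    return cost

def best_scale(cost, scales):
    """Return (scale, net value) at the point where value minus cost is largest."""
    return max(((s, value(s) - cost(s)) for s in scales), key=lambda pair: pair[1])

SCALES = range(0, 2001, 10)  # same 0 to 2,000 axis as the charts

old_scale, old_net = best_scale(exponential_cost, SCALES)
by_slope = {slope: best_scale(linear_cost(slope), SCALES)
            for slope in (0.002, 0.0002, 0.00002)}

print(f"exponential cost: best scale {old_scale}, net value {old_net:.3f}")
for slope, (scale, net) in by_slope.items():
    print(f"linear cost, slope {slope}: best scale {scale}, net value {net:.3f}")
```

With these made-up parameters the steepest linear cost yields a lower best net value than the exponential cost, which matches the remark that linear scaling initially makes things worse. The two shallower slopes yield a higher best net value at a much larger scale, which is the tipping behaviour the slides describe.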

Editor's Notes

  1. Why is big data so fashionable with big and small companies from different industries? What has suddenly changed?
  2. Google searches are up 10x over just four years ago.
  3. Hadoop use is exploding. We chose this example, which shows job trends for Hadoop, as further evidence that you should pay attention during this talk.
  4. But we have seen constant growth for a long time. And simple growth would only explain some kinds of companies starting with big data (probably big ones) and then slow adoption. Databases started with big companies and took 20 years or more to reach everywhere because the need exceeded cost at different times for different companies. The internet, on the other hand, largely happened to everybody at the same time so it changed things in nearly all industries at all scales nearly simultaneously. Why is big data exploding right now and why is it exploding at all?
  5. The different kinds of scaling laws have different shapes, and I think that shape is the key.
  6. The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial quick increase.
  7. In classical analytics, the cost of doing analytics increases sharply.
  8. The result is a net value that has a sharp optimum in the area where value is increasing rapidly and cost is not yet increasing so rapidly.
  9. New techniques such as Hadoop result in linear scaling of cost. This is a change in shape and it causes a qualitative change in the way that costs trade off against value to give net value. As technology improves, the slope of this cost line is also changing rapidly over time.
  10. This next sequence shows how the net value changes with different slope linear cost models.
  11. Notice how the best net value has jumped up significantly.
  12. And as the line approaches horizontal, the highest net value occurs at dramatically larger data scale.
  13. Constant time implies a constant factor of growth. Thus the accumulation of all of history before 10 time units ago is less than half the accumulation in the last 10 units alone. This is true at all times.
  14. Startups use this fact to their advantage and completely change everything to allow time-efficient development initially with conversion to computer-efficient systems later.
  15. Here the later history is shown after the initial exponential growth phase. This changes the economics of the company dramatically.
  16. The startup can throw away history because it is so small. That means that the startup has almost no compatibility requirement because the data lost due to lack of compatibility is a small fraction of the total data.
  17. A large enterprise cannot do that. They have to have access to the old data and have to share between old data and Hadoop-accessible data. This doesn’t have to happen at the proof-of-concept level, but it really must happen when Hadoop first goes to production.
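
The growth argument in notes 13 to 17 can be checked with a little arithmetic. The sketch below assumes data volume grows by a fixed factor per time unit; the 15% rate is a hypothetical value picked for illustration and does not come from the talk. Under that assumption, the ratio of older history to the most recent window settles at 1 / (g - 1), where g is the growth factor over one window, so it stays below one half whenever g exceeds 3.

```python
def older_to_recent_ratio(growth_per_unit, now, window=10):
    """Ratio of data accumulated before the last `window` units to data
    accumulated within the last `window` units, for exponential growth.

    growth_per_unit is a hypothetical per-unit growth factor (1.15 = 15%).
    """
    volumes = [growth_per_unit ** t for t in range(now)]  # data produced per unit
    recent = sum(volumes[-window:])   # the last `window` time units
    older = sum(volumes[:-window])    # everything before that
    return older / recent

growth = 1.15  # assumed growth factor per time unit
limit = 1 / (growth ** 10 - 1)  # long-run ratio for a 10-unit window

for now in (30, 60, 90):
    ratio = older_to_recent_ratio(growth, now)
    print(f"t={now}: older/recent = {ratio:.3f} (long-run limit {limit:.3f})")
```

The ratio barely moves as time advances, which is why a fast-growing startup can discard old data at little cost, while a company whose relative growth has slowed sees the older history dominate.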