Big Data and MicroStrategy:
           Building a Bridge for the Elephant

                 Paul Groom, Chief Innovation Officer
Jan 2013
Let’s start at…

           The End.
Panacea
You…built the EDW
You…built the BICC
and yes you built…
lots of cool reports and dashboards
Epilogue
A comfortable status quo
How are you really judged?




                             • Fast?
                             • Consistent?
                             • All users?
Rrrrrriiiiiiinnnnnngggggg!




                 Back to the real world
Disruption
Disruptor: New Data
Disruptor: Social Media & Sentiment
Disruptor: Data ?
Disruptor: More Connected Users
Disruptor: Data Discovery Tools

Choices for engaging quickly with data




Business users’ heads distracted from core BI!
BI Wild West
Where it matters
Lots of variety in DW and EDW
The Reality of the DW


 analytical workload
EDW says no or not now!
…and CFO says no big upgrades
Pragmatism



…ok so you enable plenty of caching,
 limit drill anywhere
 and add Intelligent Cubes
And then came…
Distraction or Boon




http://oris-rake.deviantart.com/
Scalable, resilient, bit bucket
Experimenting




   © 20th Century Fox
The Hadoop stack
[Diagram: Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS]
Hadoop Performance Reality
• Hadoop is batch oriented
• HDFS access is fast but crude
• MapReduce is powerful but has overheads
     – ~30 second base response time
     – Too much latency in stack and processing model
     – Trade-off in optimization and latency
• MapReduce is complex
     – Typically multiple Java routines



https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
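To put the ~30 second base response time in BI terms, here is a rough back-of-the-envelope sketch. It assumes queries run one after another and counts only the fixed overhead; the function names are made up for illustration and the 30-second figure is the slide's estimate, not a benchmark.

```python
# Illustrative arithmetic only: 30 seconds is the slide's rough estimate of
# MapReduce's base response time, not a measured value.
BASE_OVERHEAD_S = 30.0


def min_session_seconds(num_queries, base_overhead_s=BASE_OVERHEAD_S):
    """Lower bound on waiting time when queries run one after another."""
    return num_queries * base_overhead_s


def max_queries_per_hour(base_overhead_s=BASE_OVERHEAD_S):
    """Upper bound on sequential queries per hour for a single user."""
    return int(3600 // base_overhead_s)


if __name__ == "__main__":
    print(min_session_seconds(20))  # 600.0 seconds of pure overhead
    print(max_queries_per_hour())   # 120
```

Even before any real work is done, a 20-click drill session would spend at least ten minutes waiting, which is the train-of-thought problem in numbers.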
SQL to the Rescue
• So MapReduce is complicated

    – use Hive (SQL) as the easy way out


[Diagram: the Hadoop stack (Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS)]
Hive
• Simplifies access
    “Hive is great, but Hadoop’s execution engine
    makes even the smallest queries take minutes!”
• Only basic SQL support
• Concurrency needs careful system admin
• It’s not a silver bullet for interactive BI usage
Conclusion


 Hadoop just too slow
 for interactive BI!
         “while hadoop shines as a processing
          platform, it is painfully slow as a query tool”

   …loss of train-of-thought
Hive is based on Hadoop which is a batch processing system. Accordingly,
this system does not and cannot promise low latencies on queries. The
paradigm here is strictly of submitting jobs and being notified when the jobs
are completed as opposed to real time queries. As a result it should not be
compared with systems like Oracle where analysis is done on a
significantly smaller amount of data but the analysis proceeds much more
iteratively with the response times between iterations being less than a few
minutes. For Hive queries response times for even the smallest jobs
can be of the order of 5-10 minutes and for larger jobs this may even
run into hours.

I remain skeptical on the practical performance of the Hive query approach
and have yet to talk to any beta customers. A more practical approach is
loading some of the Hadoop data into the in-memory cube with the new
Hadoop connector.
Why can’t Hadoop be in-memory?
Why can’t I have giant icubes?
Remember…


Lots of these
Hadoop inherently disk oriented



Not so many of these
Typically low ratio of CPU to Disk
Larger cubes

 Issues: Time to Populate, Proliferation
Alternative - In-memory Processing


  Analytics requires CPU,
    Cores do the work!
  RAM keeps the data close
    Scale with the data
Goals: Minimise Disruption, Cut Latency
• Don’t change the existing BI and analytics
• Support more creative and dynamic BI
• Don’t introduce yet more slow disk
     – Help the DW investment
• No complex ETL, just pull data as required
• Pull data simply and intelligently from Hadoop
• Simplify – fewer cubes, caches
• Improve sharing of data
• Increase concurrency and throughput
     – It’s all about queries per hour!
• Minimal DBA requirement
Kognitio Hadoop Connectors
HDFS Connector
• Connector defines access to the HDFS file system
• External table accesses row-based data in HDFS
• Dynamic access or “pin” data into memory
• Selected HDFS file(s) loaded into memory




Filter Agent Connector
• Connector uploads agent to Hadoop nodes
• Query passes selections and relevant predicates to agent
• Data filtering and projection takes place locally on each Hadoop node
• Only data of interest is loaded into memory via parallel load streams
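The filter-agent idea can be sketched in a few lines of plain Python. This is not Kognitio's API: the function names, the row layout, and the thread pool standing in for "parallel load streams" are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_and_project(rows, predicate, columns):
    """Runs "on the node": keep matching rows and only the requested columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def load_into_memory(node_partitions, predicate, columns):
    """Hypothetical loader: one stream per node, merged into a single list."""
    workers = max(1, len(node_partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        streams = pool.map(
            lambda part: filter_and_project(part, predicate, columns),
            node_partitions,
        )
        # pool.map yields results in the same order as node_partitions.
        return [row for stream in streams for row in stream]


if __name__ == "__main__":
    # Two made-up "nodes", each holding a slice of the data.
    partitions = [
        [{"id": 1, "region": "EU", "amount": 10},
         {"id": 2, "region": "NA", "amount": 5}],
        [{"id": 3, "region": "EU", "amount": 7}],
    ]
    in_memory = load_into_memory(
        partitions, lambda r: r["region"] == "EU", ["id", "amount"]
    )
    print(in_memory)  # [{'id': 1, 'amount': 10}, {'id': 3, 'amount': 7}]
```

The point of the design is that rows failing the predicate, and columns nobody asked for, never leave the node that stores them.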
BI – Central Governance

Centrally defined data models
Persist data in natural store
Fetch when needed, agile
Available to all tools
                 Analytical power
Engineering for Success




 Thomas Herbrich
connect
                                   NA: +1 855  KOGNITIO
www.kognitio.com                   EMEA: +44 1344 300 770

linkedin.com/companies/kognitio    twitter.com/kognitio

tinyurl.com/kognitio               youtube.com/kognitio

Big data and mstr bridge the elephant

  • 1. Big Data and MicroStrategy: Building a Bridge for the Elephant Paul Groom, Chief Innovation Officer Jan 2013
  • 6. and yes you built… lots of cool reports and dashboards
  • 8. How are you really judged? • Fast? • Consistent? • All users?
  • 10. Rrrrrriiiiiiinnnnnngggggg! Back to the real world
  • 13. Disruptor: Social Media & Sentiment
  • 14. Disruptor: Data ?
  • 16. Disruptor: Data Discovery Tools. Choices for engaging quickly with data. Business users’ heads distracted from core BI!
  • 20. Lots of variety of DW and EDW
  • 21. The Reality of the DW analytical workload
  • 22. EDW says no or not now! …and CFO says no big upgrades
  • 23. Pragmatism …ok so you enable plenty of caching, limit drill anywhere and add Intelligent Cubes
  • 26. Distraction or Boon http://oris-rake.deviantart.com/
  • 28. Experimenting © 20th Century Fox
  • 29. The Hadoop stack: Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS
  • 30. Hadoop Performance Reality • Hadoop is batch oriented • HDFS access is fast but crude • MapReduce is powerful but has overheads – ~30 second base response time – Too much latency in stack and processing model – Trade-off in optimization and latency • MapReduce complex – Typically multiple Java routines https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
  • 31. SQL to the Rescue • So MapReduce is complicated – use Hive (SQL) as the easy way out. Stack: Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS
  • 32. Hive • Simplifies access • “Hive is great, but Hadoop’s execution engine makes even the smallest queries take minutes!” • Only basic SQL support • Concurrency needs careful system admin • It’s not a silver bullet for interactive BI usage
  • 33. Conclusion: Hadoop is just too slow for interactive BI! “while hadoop shines as a processing platform, it is painfully slow as a query tool” …loss of train-of-thought
  • 34. “Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.” “I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.”
  • 36. Why can’t Hadoop be in-memory? Why can’t I have giant icubes?
  • 37. Remember… Hadoop inherently disk oriented: lots of these [disks], not so many of these [CPUs]. Typically low ratio of CPU to disk
  • 38. Larger cubes Issues: Time to Populate, Proliferation
  • 39. Alternative - In-memory Processing. Analytics requires CPU, cores do the work! RAM keeps the data close. Scale with the data
  • 40. Goals: Minimise Disruption, Cut Latency • Don’t change the existing BI and analytics • Support more creative and dynamic BI • Don’t introduce yet more slow disk – Help the DW investment • No complex ETL, just pull data as required • Pull data simply and intelligently from Hadoop • Simplify – fewer cubes, caches • Improve sharing of data • Increase concurrency and throughput – It’s all about queries per hour! • Minimal DBA requirement
  • 42. Kognitio Hadoop Connectors. HDFS Connector: • Connector defines access to HDFS file system • External table accesses row-based data in HDFS • Dynamic access or “pin” data into memory • Selected HDFS file(s) loaded into memory. Filter Agent Connector: • Connector uploads agent to Hadoop nodes • Query passes selections and relevant predicates to agent • Data filtering and projection takes place locally on each Hadoop node • Only data of interest is loaded into memory via parallel load streams
  • 43. BI – Central Governance Centrally defined data models Persist data in natural store Fetch when needed, agile Available to all tools Analytical power
  • 44. Engineering for Success Thomas Herbrich
  • 45. connect NA: +1 855  KOGNITIO www.kognitio.com EMEA: +44 1344 300 770 linkedin.com/companies/kognitio twitter.com/kognitio tinyurl.com/kognitio youtube.com/kognitio
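
The Filter Agent Connector idea on slide 42 (push the query's selections and predicates to an agent on each Hadoop node, filter and project locally, then load only the rows of interest in parallel) can be sketched roughly as below. This is only an illustration of the general pushdown pattern, not Kognitio's actual connector or API: the `NODES` data, `filter_agent`, and `load_into_memory` are all hypothetical names, and threads stand in for the parallel load streams.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for row-based data spread across three Hadoop nodes.
NODES = {
    "node1": [{"region": "EMEA", "product": "A", "sales": 120},
              {"region": "NA", "product": "B", "sales": 75}],
    "node2": [{"region": "EMEA", "product": "B", "sales": 40},
              {"region": "APAC", "product": "A", "sales": 310}],
    "node3": [{"region": "NA", "product": "A", "sales": 55},
              {"region": "EMEA", "product": "C", "sales": 200}],
}


def filter_agent(rows, predicate, columns):
    """Simulates the agent on one node: filter rows, then project columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def load_into_memory(nodes, predicate, columns):
    """Send the same predicate and column list to every node in parallel
    and gather only the reduced results (the 'parallel load streams')."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        streams = pool.map(
            lambda rows: filter_agent(rows, predicate, columns),
            nodes.values(),
        )
        return [row for stream in streams for row in stream]


# Assumed example query: sales by product for the EMEA region only.
in_memory = load_into_memory(
    NODES,
    predicate=lambda r: r["region"] == "EMEA",
    columns=["product", "sales"],
)
total_rows = sum(len(rows) for rows in NODES.values())
print(f"{len(in_memory)} of {total_rows} rows loaded into memory")
```

The point of the design is that the full rows never leave the nodes: only the filtered, projected subset travels to the in-memory layer, which is what keeps the load small enough to serve interactive BI queries.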