Wake up and smell the data

Big data is a big part of the disruption hitting this market, but not in the way most people think. It's not replacing the data warehouse, but it is changing the technology stack. It doesn't eliminate data management, but it does redefine enterprise data architecture. Big data is and isn't many things. It's important to understand which information uses are well supported and which have yet to be addressed. Otherwise you risk replacing one set of problems with another. Come to this session to hear some observations on what big data is, isn't and aspires to be.
A video is available; the talk starts at 1:03 into this Strata online event: http://www.youtube.com/watch?v=gLsHI1ZglKw

  1. Wake Up and Smell the Data. February 2013. Mark Madsen, www.ThirdNature.net, @markmadsen
  2. Caveat: The focus of this talk is on information processing and delivery, leaving out many aspects of big data in the automation / execution sense.
  3. Big Data, Big Hype. $876 Gajillion (analyst estimates of the big data market)
  4. We’ve been here before. Bill Schmarzo, EMC
  5. Big Data, Big Nonsense. Big data is subjective, based on bigness at a point in time? McKinsey focused on the least interesting aspect of big data. Source: McKinsey
  6. Data volume is the oldest, easiest problem. Image courtesy of Teradata
  7. Technology Capability and Data Volume. Source: Noumenal, Inc.
  8. Origin of BI and data warehouse concepts. The general concept of a separate architecture for BI has been around longer, but this paper by Devlin and Murphy is the first formal data warehouse architecture and definition published. “An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988). Copyright Third Nature, Inc.
  9. Our ideas about information and how it’s used are outdated.
  10. Metadata catalog
  11. Report
  12. Report library
  13. BI is using broken metaphors. We think of BI as publishing, which it isn’t.
  14. When you first give people access to information that was unavailable… OH GOD I can see into forever
  15. After a while the response is more measured
  16. User autonomy is a tradeoff. Autonomy is a tradeoff in most data warehouses: control at the expense of complexity. Complexity for casual users can lead to messes. So we err on the side of simplifying user access in three ways…
  17. Centralize: that solves all problems! Creates bottlenecks. Causes scale problems. Enforces a single model. In some organizations and areas of business “data warehouse” is a bad word.
  18. Standardize: it’s simpler for everyone
  19. The “E” in EDW was a lie…
  20. Measurement started with the convenient data. The convenient data is transactional data. ▪ Goes in the DW and is used, even if it isn’t the right measurement. The difficult and misleading data is declarative data. ▪ What people say and what they do require ground truth. The inconvenient data is observational data. ▪ It’s not neat, clean, or designed into most systems of operation. We need to build data systems that integrate all three.
  21. Value: There’s a pony in there somewhere
  22. Many current views miss the point. Using Big Data
  23. It’s not about “big”. Using Big Data. And “big” is often not as big as you think it is.
  24. It’s not really about data, either. Using Big Data. If there’s no process for applying information in a specific context then you are producing expensive trivia.
  25. Two keys to making big data worthwhile. Value: goal → solution, not solution → goal. Actionability: Simple “value” isn’t enough. Information has to be actionable, somehow.
  26. Planning data strategy means understanding the context of data use so we can provide infrastructure. (Diagram labels: Monitor, Analyze Exceptions, Analyze Causes, Decide, Act; No problem, No idea, Do nothing.) We need to focus on what people do with data as the primary task, not on the data or the technology.
  27. General model for organizational use of data. (Diagram labels: Collect new data, Monitor, Analyze Exceptions, Analyze Causes, Decide, Act; No problem, No idea, Do nothing.) Act on the process: usually days/longer timeframe. Act within the process: usually real-time to daily.
  28. You need to be able to support both paths. (Diagram labels: Collect new data, Monitor, Analyze Exceptions, Analyze Causes, Decide, Act; Act on the process; Act within the process; Conventional BI; Causal analysis, i.e. “data science”.)
  29. How do you manage the business in today’s environment? Our simplistic notions of BI with stable models, ordered data and predictability are being replaced by concepts from decision support and complex adaptive systems (CAS). Situational context governs data use:
      ▪ Simple. Assumption: order. Cause and effect is repeatable and predictable. Known. Standard processes, clear metrics, best practice. Sense, categorize, respond. Reporting, dashboards.
      ▪ Complicated. Assumption: unorder. Cause and effect is separated in time and space, repeatable, learnable. Knowable. Analytical techniques to determine options, effects. Sense, analyze, respond. Ad-hoc, OLAP, exploration.
      ▪ Complex. Assumption: disorder. Cause and effect is coherent in retrospect only, modelable but changing. Unpredictable. Experiment to create possible options. Test, sense, respond. Data science, causal analysis.
  30. BI/DW environment support varies for these contexts:
      ▪ Basic BI (handles this really well, most of the time). Assumption: order. Cause and effect is repeatable and predictable. Known. Standard processes, clear metrics, best practice. Sense, categorize, respond. Reporting, dashboards.
      ▪ Analysis (handles this sort of ok, sometimes). Assumption: unorder. Cause and effect is separated in time and space, repeatable, learnable. Knowable. Analytical techniques to determine options, effects. Sense, analyze, respond. Ad-hoc, OLAP, data discovery.
      ▪ Data science, analytics (this, not so much). Assumption: disorder. Cause and effect is coherent in retrospect only, modelable but changing. Unpredictable. Experiment to create possible options, test hypotheses. Test, sense, respond. Causal analysis, simulation.
  31. TANSTAAFL. Technologies are not perfect replacements for one another. When replacing the old with the new (or ignoring the new over the old) you always make tradeoffs, and usually you won’t see them for a long time.
  32. The usage models for conventional BI. (Diagram labels: Collect new data, Monitor, Analyze Exceptions, Analyze Causes, Decide, Act; No problem, No idea, Do nothing.) Act on the process: usually days/longer timeframe. Act within the process: usually real-time to daily. This is what we’ve been doing with BI so far: static reporting, dashboards, ad-hoc query, OLAP.
  33. The usage models for analytics and “big data”. (Diagram labels: Collect new data, Monitor, Analyze Exceptions, Analyze Causes, Decide, Act; No problem, No idea, Do nothing.) Act on the process: usually days/longer timeframe. Act within the process: usually real-time to daily. Analytics and big data are focused on new use cases: deeper analysis, causes, prediction, optimizing decisions. This isn’t ad-hoc, reporting, or OLAP.
  34. Analytics embiggens the data volume problem. Many of the processing problems are O(n²) or worse, so moderate data can be a problem for DB-based platforms.
  35. New and growing use cases drive the need to expand. The use cases are now interactive applications, lower latency data, complex analytics and discovery rather than reporting.
  36. Big Data Shift in a Nutshell. The old model for data: ▪ Centralized publishing ▪ Read only ▪ Integrate before use ▪ Record only important data ▪ Retrieval-focused ▪ Single method of access ▪ Human-level latency. The new model for data: ▪ Community creation ▪ Read-write ▪ Integrate at time of use ▪ Record all the data ▪ Processing-focused ▪ Multiple methods of access ▪ Machine-level latency. It’s an architectural reconfiguration, just like web 2.0.
  37. “The future, according to some scientists, will be exactly like the past, only far more expensive.” ~ John Sladek
  38. About the Presenter. Mark Madsen is president of Third Nature, a research and advisory firm focused on analytics, business intelligence and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor at Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
  39. About Third Nature. Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure then you’re at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
  40. CC Image Attributions. Thanks to the people who supplied the creative commons licensed images used in this presentation:
      ▪ Outdated gumshoe.jpg - http://flickr.com/photos/olivander/372385317/
      ▪ Card catalog - http://flickr.com/photos/deborahfitchett/2372385317/
      ▪ book of hours manuscript2.jpg - http://flickr.com/photos/jeffrey/89461374/
      ▪ royal library san lorenzo.jpg - http://flickr.com/photos/cuellar/370663920/
      ▪ uniform_umbrellas.jpg - http://www.flickr.com/photos/mortimer/221051561/
      ▪ ponies in field.jpg - http://www.flickr.com/photos/bulle_de/352732514/
      ▪ caged_tower_melbourne.jpg - http://www.flickr.com/photos/vermininc/2227512763
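
Slide 34's remark that many analytic processing problems are quadratic or worse can be made concrete with a rough sketch. The example is not from the talk. It assumes an all-pairs comparison (as in naive similarity or duplicate matching) as a stand-in for the workloads the slide has in mind, and the function names are hypothetical.

```python
def pairwise_comparisons(n: int) -> int:
    """Closed-form count of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def count_pairs_brute_force(records) -> int:
    """Count pairs the way a naive all-pairs job would visit them (assumed workload)."""
    count = 0
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            count += 1  # one comparison per unordered pair
    return count


if __name__ == "__main__":
    # Each 10x increase in record count costs roughly 100x more comparisons.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9,} records -> {pairwise_comparisons(n):>15,} comparisons")
```

Under this assumption, a million records, which is modest by storage standards, already implies about 5 × 10^11 comparisons. That is the sense in which "moderate data" becomes a processing problem long before it is a volume problem.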
