Big Data: Architectures and Approaches

ThoughtWorkers David Elliman and Ashok Subramanian present how quickly the big data world is moving, with predictions of amazing industry growth. For more information on how the 'Internet of Things' is playing an increasingly large role, read David's blog post or watch the video from the London-based event: http://www.thoughtworks.com/insights/blog/big-data-and-internet-things

Published in: Technology

  1. Welcome. BIG DATA: Architectures and Approaches. David Elliman & Ashok Subramanian
  2. Luke Barrett 1971-2014
  3. BIG DATA http://upload.wikimedia.org/wikipedia/commons/f/f0/DARPA_Big_Data.jpg
  4. https://www.flickr.com/photos/katerha/8380451137/
  5. Timeline (1940 to 2015): 1944 https://www.flickr.com/photos/timetrax/376152628/sizes/l
  6. Timeline: 1961
  7. Timeline: 1971
  8. Timeline: 1996 "…ge becomes more cost effective for storing da…" https://www.flickr.com/photos/epsos/8336691931
  9. Timeline: 1996
  10. Timeline: 1998
  11. Timeline: 1998 https://www.usenix.org/conference/1999-usenix-annual-technical-conference/big-data-and-next-wave-infrastress-problems
  12. Timeline: 2004
  13. Timeline: 2006
  14. Timeline: 2008
  15. Timeline: 2010
  16. Timeline: 2013 "alottabytes"
  17. Timeline: 2015 https://www.flickr.com/photos/will-lion/2595830716/
  18. https://www.flickr.com/photos/taedc/6998468974
  19. http://blogs.gartner.com/doug-laney/batman-on-big-data/
  20. https://www.flickr.com/photos/10ch/3347658610/
  21. THE OPPORTUNITY
  22. DATA / INSIGHT in three eras: <- 1990, 1990s - 2000, 2000 ->
  23. Key Takeaways • This isn’t a new problem • The problem isn’t going away • Remember to focus on the VALUE https://www.flickr.com/photos/djwtwo/8331524425/
  24. Where do we… https://www.flickr.com/photos/ekosystem/4334671818/
  25. https://www.flickr.com/photos/libraryacu/7695938410/
  26. Analytics - Goals (axes: Complexity, Value) • Descriptive Analytics: What happened? • Diagnostic Analytics: Why did it happen? • Predictive Analytics: What will happen? • Prescriptive Analytics: How can we make it happen?
  27. REAL TIME / BATCH https://www.flickr.com/photos/lopetz/3912416793/
  28. Volume / Velocity, REAL TIME / BATCH
  29. https://www.flickr.com/photos/ingythewingy/5510406450/
  30. THINK BIG, ACT SMALL. Small is the New Big (Seth Godin)
  31. https://www.flickr.com/photos/pauldineen/4529216647/
  32. “80% of the work in any data project is in cleaning the data” – D J Patil https://www.flickr.com/photos/desideratum/8595251348/
  33. https://www.flickr.com/photos/22280677@N07/2504310138/
  34. https://www.flickr.com/photos/jm3/4814208649/
  35. SQL
  36. https://www.flickr.com/photos/marc_smith/6793088143/
  37. Key Takeaways • Start small • Start with the ? • Iteratively follow the value • Using freely available tooling • Volume vs Velocity https://www.flickr.com/photos/djwtwo/8331524425/
  38. Scaling the Solution https://www.flickr.com/photos/auntiep/4310240/
  39. https://www.flickr.com/photos/111692634@N04/11407095913/
  40. “Amdahl’s law is used to find the maximum expected improvement to an overall system when only part of the system is improved.” (attributed to Gene Amdahl, 1967)
  41. https://twitter.com/PieCalculus/status/459485747842523136/photo/1
  42. https://www.flickr.com/photos/rofi/2097239111/
  43. Lambda Architecture: All Data; Batch, Speed, Serving; Query. query = function(all data)
  44. Lambda Architecture [diagram labels: All Data, Scaled Data Store, Event Processing Network, Batch Write, Random Write, Batch View, Realtime View, Query]
  45. Lambda Architecture: All Data; Batch, Speed, Serving; Query. query = function(all data)
  46. Batch - Hadoop (MR1) [diagram labels: Client; Master Node (JobTracker, Name Node); four Slave Nodes (Task Tracker, Data Node, Map, Reduce); Metadata Operations to Get Block Info; Job assignment to cluster; Data Write; Data Read; Data Replication on Multiple Nodes]
  47. Batch - MapReduce: Map, Shuffle, Reduce
  48. Batch - Cascading
  49. Batch - Spark
  50. Batch - MPP database [diagram labels: SQL or MapReduce; Master Servers (Query planning & dispatch); Network Interconnect; Segment Servers (Query processing and data storage); External Sources (Loading, streaming, etc.)]
  51. Lambda Architecture: All Data; Batch, Speed, Serving; Query. query = function(all data)
  52. Speed - Storm
  53. CEP
  54. Lambda Architecture: All Data; Batch, Speed, Serving; Query. query = function(all data)
  55. Lambda Architecture - Serving
  56. http://www.wallzhq.com/wp-content/uploads/2014/02/matrix_binary-wide.jpg
  57. Conventional Architectures • Pull-based Batch Loads • Enterprise Data Models • Complex ETL Logic • Poorly Suited to Non-Relational Data • Emergent design is difficult
  58. Pivotal Business Data Lake Architecture http://www.gopivotal.com/sites/default/files/Pivotal-Business-Data-Lake-Technical_Brochure_WEB.PDF
  59. DATA CORE • RAW FACTUAL DATA • HISTORIZED EVENTS • RETAIN BUSINESS KEY • DATA LINEAGE
  60. DATA INGESTION • EVENT DRIVEN • MESSAGE QUEUE • TRICKLE FEED • BATCH LOAD
  61. INFORMATION PUBLISHING • TOPICAL QUEUES • POST PROCESSING
  62. INFORMATION TIER • PURPOSE BUILT DATA SUBSETS • TRANSFORMATION • DATA GOVERNANCE • MDM CONCERNS • POST PROCESSING
  63. PRESENTATION TIER • BUSINESS VALUE APPLICATIONS • DATA SERVICES • AD HOC QUERYING • WRITE BACK?
  64. Emergent Design & Agile Delivery • Transformation Logic • Data Post Processing • Near Real Time Feed
  65. Apache Kafka, Apache Storm
  66. Micro-data-services
  67. Drive Towards In Memory Processing
  68. https://www.tele-task.de/archive/lecture/overview/5721/
  69. Remember https://www.flickr.com/photos/anjin/695894443/
  70. Data Structures, Algorithms https://www.flickr.com/photos/herrolsen/7645876896/
  71. Raw Data, Data Structure, Algorithm, Insight
  72. Key Takeaways • Embrace the cloud • Fit the Architecture to the problem • Remember Knuth https://www.flickr.com/photos/djwtwo/8331524425/
  73. SUMMARY https://www.flickr.com/photos/tim_norris/2789759648/
  74. Hadoop Ecosystem [bar chart; labels not recoverable] http://www.datameer.com/blog/uncategorized/the-hadoop-ecosystem-visualized-in-datameer.html
  75. Commercial / Open Source https://www.flickr.com/photos/classblog/5136926303/
  76. https://blog.cloudera.com/blog/2011/10/the-community-effect/
  77. https://www.flickr.com/photos/ctsi-global/6556284907/
  78. https://www.flickr.com/photos/will-lion/2597608152/
  79. https://www.flickr.com/photos/jurvetson/14105339228/
  80. Open Questions http://talkmarketing.co.uk/wp-content/uploads/2013/07/Open-Ended-Questions.jpg
  81. https://www.flickr.com/photos/typoatelier/5615759848/
  82. https://www.flickr.com/photos/rembcc/3802038945/
  83. https://www.flickr.com/photos/sidelong/246816211/
  84. No matter how much you speed up the computers or the way you put computers together, the real issues are at the DATA LEVEL
  85. https://www.flickr.com/photos/opensourceway/5556249000/
  86. Enterprise Master Data Management
  87. Localised Formats
  88. Single System of Record
  89. SoR is a process not a place
  90. Database Integration (by another name)
  91. Organisational Models http://www.bain.com/infographics/big-data/
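
Slide 40 quotes Amdahl's law without stating the formula. As a rough sketch, assuming the usual formulation in which a fraction p of the work can be spread over n workers while the rest stays serial, the overall speedup is 1 / ((1 - p) + p / n). The function names and example numbers below are illustrative assumptions and do not come from the deck.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when `parallel_fraction` of the work is spread over `workers`.

    Assumes the standard form of Amdahl's law: the serial part is unaffected
    and the parallel part scales perfectly with the number of workers.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload: 90% of the job parallelises.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
    print("limit", round(amdahl_limit(0.9), 2))
```

With 90% of the work parallelisable, ten workers give a speedup of roughly 5.3 and no cluster size pushes it past 10, which is why the serial part of a pipeline deserves attention before adding nodes.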

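
Slides 43 to 45 summarise the Lambda Architecture as query = function(all data), with a batch view and a realtime view both feeding the query. The toy sketch below illustrates that idea, assuming page-view events shaped as (timestamp, page) tuples and in-memory counters standing in for the batch and speed layers. The event shape, function names and cutoff are assumptions for illustration, not anything shown in the deck.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, page): a hypothetical event shape


def batch_view(all_data: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute page counts from every event up to `cutoff`."""
    return Counter(page for ts, page in all_data if ts <= cutoff)


def realtime_view(all_data: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: count only the events the last batch run has not seen."""
    return Counter(page for ts, page in all_data if ts > cutoff)


def query(page: str, batch: Counter, realtime: Counter) -> int:
    """Serving layer: merge both views to answer over all data."""
    return batch[page] + realtime[page]


if __name__ == "__main__":
    events = [(1, "home"), (2, "pricing"), (3, "home"), (4, "home"), (5, "pricing")]
    last_batch_run = 3  # hypothetical timestamp of the last batch job
    batch = batch_view(events, last_batch_run)
    realtime = realtime_view(events, last_batch_run)
    print(query("home", batch, realtime))
    print(query("pricing", batch, realtime))
```

The merged answer matches a full recomputation over every event. The design keeps the slow, exact batch job simple while the speed layer covers only the gap since the last run.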