O slideshow foi denunciado.
Seu SlideShare está sendo baixado. ×
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio

Confira estes a seguir

1 de 114 Anúncio

Mais Conteúdo rRelacionado

Diapositivos para si (20)

Semelhante a Big Data Tech Stack (20)

Anúncio

Mais recentes (20)

Anúncio

Big Data Tech Stack

  1. 1. Big Data Tech Stack Big Data 2015 by Abdullah Cetin CAVDAR
  2. 2. Me :)
  3. 3. Graduated from @HU
  4. 4. PhD Student @METU
  5. 5. Ex Entrepreneur I had 3 start-ups
  6. 6. Senior Software Engineer @Udemy
  7. 7. Founder and Organizer of meetup.com/ankara-big-data-meetup
  8. 8. What's Big Data Big data is data that exceeds the processing capacity of conventional database systems.
  9. 9. What's Big Data Big data is when the data itself becomes part of the problem.
  10. 10. 4V's of Big Data
  11. 11. Multitude of Data Types Structured Semi-structured Unstructured
  12. 12. Data Data Data
  13. 13. What We Need? Store Join Index Analytics Aggregate Visualize
  14. 14. Challenge The challenge in big data analytics is to dig deeply quickly (real time?) and widely
  15. 15. "ilities" or NFR? Availability Scalability Security Performance ...
  16. 16. Solution?
  17. 17. Big Data Tech Stack
  18. 18. What're essential components?
  19. 19. Data Sources
  20. 20. Multiple internal & external data sources
  21. 21. Creates a data lake
  22. 22. Different Volume, Variety, Velocity
  23. 23. Aim is to create a funnel after proper validation and cleaning
  24. 24. Ingestion Layer
  25. 25. Signal-to-Noise ratio 10:90
  26. 26. separate the noise from relevant info
  27. 27. It has capability to Validate Cleanse Transform Reduce Integrate
  28. 28. Distributed Storage Layer
  29. 29. Fault tolerance Parallelization
  30. 30. HDFS massively scalable distributed file system
  31. 31. HDFS
  32. 32. HDFS Architecture
  33. 33. Non-relational, distributed data?
  34. 34. NoSQL
  35. 35. CAP theorem Consistency, Availability, Partition Tolerance
  36. 36. Ingestion to DFS Sqoop, Flume, MapReduce, ETL
  37. 37. Infrastructure & Platform Layer
  38. 38. Computing & Scalability
  39. 39. Hadoop?
  40. 40. Vertical Scaling
  41. 41. Vertical Scaling
  42. 42. Vertical Scaling
  43. 43. Horizontal Scaling
  44. 44. Horizontal Scaling
  45. 45. Horizontal Scaling
  46. 46. MapReduce is the main computation paradigm
  47. 47. MapReduce
  48. 48. Hadoop 2
  49. 49. What's new?
  50. 50. What's new?
  51. 51. H1 vs. H2
  52. 52. One cluster, distributed storage, distributed scheduler, many types of applications.
  53. 53. Blueprints NoSQL with HBase Stream Processing with Storm/Spark Graph Processing with Giraph SQL on Hadoop with Impala Columnar Data Formats
  54. 54. Security Layer
  55. 55. Data need to be protected Meet compliance requirements Individual's privacy
  56. 56. Proper authorization and authentication needed
  57. 57. What can we do? Authentication protocol like Kerberos Enable file layer encryption Use SSL, certificates and trusted keys Provision with Chef, Puppet or Ansible like tools Log all the communication for detecting anomalies Monitor whole system
  58. 58. Monitoring Layer
  59. 59. Get a complete picture of our Big Data tech stack
  60. 60. Satisfy SLAs with min downtime
  61. 61. DataDog
  62. 62. New Relic (Overview)
  63. 63. New Relic (Databases)
  64. 64. Analytics Engine
  65. 65. Co-Existence with Traditional BI Data warehouse in the traditional way Distributed MR processing on big data stores
  66. 66. Mediate data in either direction i.e use Hive/HBase with Sqoop
  67. 67. Real-time analysis can leverage low-latency NoSQL stores i.e Cassandra, Vertica, ...
  68. 68. R may be used for complex statistical algorithms
  69. 69. Search Engines
  70. 70. Huge volume and variety of data
  71. 71. “needle in a haystack”
  72. 72. Need blazing fast search mechanism to index and search for big data analytics
  73. 73. Elastic Search, Solr, ...
  74. 74. Real-time Processing
  75. 75. In memory?
  76. 76. Apache Spark
  77. 77. Storm, Kinesis, Flink, ...
  78. 78. Visualization Layer
  79. 79. Gain insight faster Look at different aspects of data visually
  80. 80. Tableau
  81. 81. ChartIO
  82. 82. Lambda Architecture
  83. 83. Lambda Architecture / MapR
  84. 84. Don't forget
  85. 85. There is no "One Size Fits All" solution
  86. 86. We need Continuous Development
  87. 87. Thank You :)

×