The slides cover MapReduce and Hadoop as basic technologies for Big Data processing. Building on this, the Hadoop ecosystem is explained along with extensions and concepts such as the Lambda Architecture for real-time event processing. The presentation ends with an outlook on future technologies.
3 / 31
M/R: Motivation
● Process large amounts of data to produce other data
● Scale up vs. scale out
4 / 31
M/R: What is it?
● Different programming paradigm
● Based on a Google paper (2004)
● Automatic parallelization and distribution
● I/O Scheduling
● Fault tolerance
● Status and monitoring
5 / 31
M/R: The paradigm
● Input & output: sets of key/value pairs
● Large amounts of data are grouped & sorted
● Job = two phases = Mapper & Reducer
● Map(in_key, in_value) → list(interm_key, interm_value)
● Reduce(interm_key, list(interm_value)) → list(out_key, out_value)
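The two phases above can be sketched in plain Python (a word-count simulation, not Hadoop's actual Java API; the function and variable names are illustrative):

```python
from collections import defaultdict

def mapper(in_key, in_value):
    # Map(in_key, in_value) -> list(interm_key, interm_value)
    return [(word, 1) for word in in_value.split()]

def reducer(interm_key, interm_values):
    # Reduce(interm_key, list(interm_value)) -> list(out_key, out_value)
    return [(interm_key, sum(interm_values))]

def run_job(inputs):
    # Map phase: emit intermediate key/value pairs
    intermediate = []
    for key, value in inputs:
        intermediate.extend(mapper(key, value))
    # Shuffle phase: group & sort intermediate pairs by key
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: one reducer call per intermediate key
    output = []
    for k in sorted(groups):
        output.extend(reducer(k, groups[k]))
    return output

print(run_job([("doc1", "big data big compute")]))
# -> [('big', 2), ('compute', 1), ('data', 1)]
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; this sketch only shows the data flow of the paradigm.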
9 / 31
Hadoop: What is it?
● Framework based on Google MapReduce / GFS
● Apache project
● Developed in Java
● Multiple applications
● Used by many companies
● De-facto standard in community
11 / 31
Hadoop: HDFS concepts
● Distributed file system; a layer on top of ext3, xfs, ...
● Works best on huge files
● Redundancy (default replication factor: 3)
● Poor seeking, no append!
● Scales well at rack level, not at data-center level
● Files divided into 128 MB – 256 MB blocks
● Computation is sent to the data!
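A back-of-the-envelope sketch (plain Python arithmetic, not an HDFS API) of what the block size and replication factor above mean for storage:

```python
def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Blocks and raw cluster storage for one file under HDFS defaults."""
    # Number of blocks the file is split into (the last block may be partial)
    blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    # Raw storage consumed across the cluster, counting all replicas
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb

# A 1 GB file with defaults: 8 blocks of 128 MB, 3 GB of raw storage
print(hdfs_footprint(1024))  # -> (8, 3072)
```

This is also why HDFS "works best on huge files": a 1 KB file still occupies one block entry in the NameNode's metadata, so many tiny files waste metadata capacity even though they waste little disk.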
30 / 31
New trends: Spark
● Next-generation MapReduce
● Integrated with, but not dependent on, Hadoop
● Fast memory-optimized execution engine
● Avoids many Hadoop problems:
  ● Overhead
  ● High latency
  ● Many disk writes
● In-memory cache
● Flexible execution graph
● Much faster than MapReduce (up to 100x)
● Shark (SQL)
● Supports streaming (beta)
31 / 31
BIG DATA TECHNOLOGY
● Juanjo Mostazo
● juanj.mostazo@gmail.com
● http://www.slideshare.net/juanjmostazo/mr-hadoop-cbase