FEARLESS engineering
Acharya Institute of Technology, Bengaluru
A grand welcome to the world of “bigdata_community”
Introduction to BigData and Installation of a Single-Node Apache Hadoop Cluster
Presented by: Mahantesh C. Angadi, Nagarjuna DN, Manoj PT
Under the guidance of: Prof. Manjunath TN, Prof. Amogh PK, Dept. of ISE, AIT, Bengaluru
Session 1: 24 March 2014
Contents
• Purpose of This Talk
• Introduction
• Terminologies
• Cloud Computing
• BigData
• Traditional Approaches to Solve BigData Problems
• Hadoop and its Characteristics
• Architecture of Hadoop
• Why Only Hadoop?
• Advantages of Hadoop
• Limitations of Hadoop
• Job Opportunities in BigData & Hadoop
• Conclusion
• References
Purpose of This Talk
• To understand the terminology
• To introduce BigData and Hadoop
• To differentiate clearly between Cloud, BigData, and Hadoop
• To review traditional approaches to handling BigData
• To describe the characteristics of Hadoop
• To explain how Hadoop works
• To get familiar with MapReduce and HDFS
• To become aware of the Hadoop ecosystem
• To weigh the advantages and limitations of Hadoop
• To outline job opportunities in Hadoop
• Conclusion
Introduction
• Today we live in the Data Age.
• Driven by the Internet of Things (IoT), the rate at which data is ingested keeps increasing.
• As a result, the world is growing ever hungrier for data.
Terminologies
• Cloud Computing
• BigData
• Hadoop
• Distributed Computing
• Parallel Computing
• Utility Computing
• Data Scientist
Let's start the journey!
Cloud Computing
• “Computing may someday be organized as a public utility, just as the telephone system is organized as a public utility” - John McCarthy, 1961
• The word “Cloud” was first used in a technical sense by people at HP and Compaq.
• Cloud computing is a form of utility computing in which a large number of computers, connected through a communication network such as the Internet, provide services on demand.
Utility Computing
• Utility computing is the packaging of computing resources, such as computation, storage, and services, as a metered service.
• This model has the advantage of a low or no initial cost to acquire computing resources.
Distributed Computing
• Distributed computing refers to the use of distributed systems to solve computational problems.
• A problem is divided into many small tasks, each of which is solved by one or more computers that communicate with each other by passing messages.
• Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently.
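The divide-and-combine idea described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `count_words` and `parallel_word_total` are made up for this example, and threads on a single machine stand in for the separate computers of a real distributed system.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Solve one small task: count the words in a single chunk of text."""
    return len(chunk.split())


def parallel_word_total(chunks, workers=4):
    """Split the problem into per-chunk tasks, run them concurrently,
    then combine the partial results into one answer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)


# Hypothetical input: three chunks of one larger text.
chunks = ["big data is big", "hadoop stores data", "map then reduce"]
total = parallel_word_total(chunks)
print(total)
```

Each chunk is processed independently, which is what makes the work easy to spread over several workers; only the final sum needs all partial results.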
Why Does BigData Deserve Our Attention?
• Every day we create 2.5 quintillion bytes of data, and 90% of this data is unstructured.
• 90% of the data in the world today has been created in the last two years alone.
• Cisco estimates that global Internet traffic will reach 4.8 zettabytes a year by the end of 2015.
• BigData is expected to create 4.4 million jobs by 2015.
• There is a shortage of 140,000-190,000 BigData professionals in the United States alone.
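To put the volume figures on a common scale, the unit arithmetic can be worked through as below. The sketch uses only the numbers quoted on this slide and decimal (SI) units; the variable names are illustrative, and spreading the yearly traffic evenly over 365 days is a simplifying assumption.

```python
# Decimal (SI) byte units.
EXABYTE = 10**18
ZETTABYTE = 10**21

# Figures quoted on the slide.
bytes_per_day = 2.5 * 10**18          # 2.5 quintillion bytes created daily
traffic_per_year = 4.8 * ZETTABYTE    # projected yearly Internet traffic

# 1 quintillion = 10**18, so the daily figure converts directly to exabytes.
exabytes_per_day = bytes_per_day / EXABYTE

# Assumption: yearly traffic is spread evenly over 365 days.
traffic_exabytes_per_day = traffic_per_year / 365 / EXABYTE

print(f"Data created per day: {exabytes_per_day:.1f} EB")
print(f"Projected traffic per day: {traffic_exabytes_per_day:.1f} EB")
```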
What Happens in an Internet Minute?
What Is BigData?
• BigData is any collection of structured and/or unstructured data that is beyond the storage and processing capabilities of a single physical machine and of traditional database techniques.
• Data that has an extra-large Volume, comes from a Variety of sources in a Variety of formats, and arrives with great Velocity is normally referred to as BigData.
The 3 V’s of BigData
Rise of BigData Adoption
• Data Scientist is the hottest job of the 21st century (Harvard Business Review).
• Positions such as Data Scientist and Data Analyst did not exist a few years ago.
• Today companies are fighting to recruit these specialists.
• The market is not growing as fast as it could because of a looming skills shortage, which in turn is pushing salaries up.
• Data Scientists take huge amounts of raw data and attempt to pull useful “business insights” from it.
Traditional Approaches to Solve BigData Problems
Traditional Approach: Storage Area Network (SAN) with Application Servers
Characteristics of a Storage Area Network (SAN)
• A SAN can be visualized as one massive storage pool that appears to offer unlimited capacity.
• Data is moved to the computational nodes.
• There are multiple application servers.
• Programs run on each application server.
• All the data is stored in one SAN.
• Before execution, each server fetches its data from the SAN.
• After execution, each server writes its output back to the SAN.
Problems with a Storage Area Network (SAN)
• Heavy dependence on the network
• Huge bandwidth demand
• Scaling up and scaling down is not a smooth process
• Partial failures are difficult to handle
• A lot of processing power is spent on transferring data
• Data synchronization is required during exchange
Moore’s Law
• The number of transistors that can be placed on a silicon chip will double approximately every two years, for half the cost.
• It is named after Gordon Moore, a co-founder of Intel.
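The doubling rule on this slide can be written as N(t) = N0 · 2^(t/2), with t in years. A minimal sketch of that formula follows; the starting count of 1,000,000 transistors is a made-up number chosen only to make the arithmetic easy to follow, and the function name is illustrative.

```python
def projected_transistors(initial_count, years, doubling_period=2):
    """Project a transistor count that doubles every `doubling_period` years."""
    return initial_count * 2 ** (years / doubling_period)


# Hypothetical starting point, not a real chip.
start = 1_000_000
for years in (0, 2, 4, 10):
    print(years, int(projected_transistors(start, years)))
```

After ten years the count has doubled five times, a 32-fold increase over the starting value.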
Hadoop
• Inspired by Google.
• Google was founded in 1998.
• In the early 2000s it faced a serious challenge in handling BigData.
• Google then published two papers:
  - GFS: The Google File System (2003)
  - MapReduce: A Programming Model (2004)
What Is Hadoop?
• Apache Hadoop is an open-source software framework used to manage BigData.
• It is built and used by a global community of contributors and users.
• It is not a single tool but a framework of tools.
• Its guiding principle: moving computation is cheaper than moving data.
• The most important Hadoop sub-projects:
  i. HDFS: Hadoop Distributed File System
  ii. MapReduce: A Programming Model
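A back-of-envelope calculation shows why moving computation is cheaper than moving data. All figures below (dataset size, job size, link speed) are hypothetical and chosen only for illustration, and the estimate ignores protocol overhead and contention.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal time to push `size_bytes` over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


# Hypothetical figures chosen for illustration only.
DATASET_BYTES = 10**12        # 1 TB of input data
PROGRAM_BYTES = 10 * 10**6    # 10 MB job (code plus configuration)
LINK_BPS = 10**9              # 1 Gbit/s network link

move_data = transfer_seconds(DATASET_BYTES, LINK_BPS)
move_code = transfer_seconds(PROGRAM_BYTES, LINK_BPS)

print(f"Moving the data: {move_data / 3600:.1f} hours")
print(f"Moving the code: {move_code:.2f} seconds")
```

Under these assumptions, shipping the program to the node that already holds the data takes a fraction of a second, while shipping the data to the program takes hours.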
Founders of Hadoop
Why the Name Hadoop?
• “Hadoop” is simply the name of a stuffed toy elephant that belonged to the son of its creator, Doug Cutting.
Characteristics of Hadoop
• Scalable: New nodes can be added without changing data formats.
• Cost-effective: It processes huge datasets in parallel on large clusters of commodity computers.
• Efficient and flexible: It is schema-less and can absorb any type of data from any number of sources.
• Fault-tolerant and reliable: It handles node failures easily because of replication.
• Easy to use: It uses simple Map and Reduce functions to process the data.
• It is developed in Java, but it can support Python and other languages too.
Who Uses Hadoop?
Hadoop Ecosystem
Hadoop Core Components
• Hadoop core has two major components:
  1. HDFS
     a. Name Node
     b. Secondary Name Node
     c. Data Node
  2. MapReduce Engine
     a. Job Tracker
     b. Task Tracker
Architecture of Hadoop
Overview of Hadoop
Hadoop Distributed File System (HDFS)
• Modeled on the Google File System (GFS).
• It consists of three major components:
  i. Name Node
     • It is responsible for distributing the data throughout the Hadoop cluster.
  ii. Secondary Name Node (Backup Node)
     • It regularly contacts the Name Node and maintains up-to-date snapshots of the Name Node's directory information. These snapshots help with recovery; the Secondary Name Node does not take over automatically if the Name Node fails.
  iii. Data Node
     • It is responsible for storing the chunks of data assigned to it by the Name Node.
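The division of labour between the Name Node and the Data Nodes can be sketched as a toy placement routine. This is not the HDFS API: `plan_blocks` is a hypothetical function, the node names and file size are invented, the block size and replication factor are illustrative values, and the round-robin placement is a deliberate simplification (real HDFS placement is rack-aware).

```python
import math


def plan_blocks(file_size, block_size, data_nodes, replication=3):
    """Toy Name Node: split a file into blocks and pick Data Nodes for each.

    Round-robin placement is a simplification; real HDFS placement is
    rack-aware.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough Data Nodes for the requested replication")
    block_count = math.ceil(file_size / block_size)
    plan = {}
    for block_id in range(block_count):
        # Choose `replication` distinct nodes, rotating the starting node.
        replicas = [
            data_nodes[(block_id + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
        plan[block_id] = replicas
    return plan


MB = 1024 * 1024
# Hypothetical cluster and file size.
nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = plan_blocks(file_size=300 * MB, block_size=128 * MB, data_nodes=nodes)
for block_id, replicas in plan.items():
    print(block_id, replicas)
```

Because every block lives on several Data Nodes, losing one node does not lose any block, which is the replication property the deck relies on for fault tolerance.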
MapReduce
• Pioneered by Google, popularized by Yahoo (Apache).
• It consists of two major components:
  i. Job Tracker
     • It is responsible for scheduling tasks on the slave nodes.
     • To do so, it consults the Name Node and assigns each task to a node that holds the data on which the task will be performed.
  ii. Task Tracker
     • It runs the actual task logic: it performs the Map and Reduce functions on the data assigned to it by the master node (the Job Tracker).
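The data-locality rule described for the Job Tracker can be sketched as follows. This is a toy model, not Hadoop's scheduler: `assign_tasks`, the block names, and the node names are hypothetical, and each Task Tracker is assumed to run only one task at a time.

```python
def assign_tasks(block_locations, free_trackers):
    """Toy Job Tracker: give each block's task to a free Task Tracker that
    already holds the block, falling back to any free tracker otherwise."""
    assignments = {}
    available = list(free_trackers)
    for block_id, holders in block_locations.items():
        if not available:
            break  # no idle trackers left; remaining tasks must wait
        local = [tracker for tracker in available if tracker in holders]
        chosen = local[0] if local else available[0]
        assignments[block_id] = chosen
        available.remove(chosen)
    return assignments


# Hypothetical block-to-node map, as the Name Node might report it.
block_locations = {
    "block-0": ["node1", "node2"],
    "block-1": ["node2", "node3"],
    "block-2": ["node1", "node3"],
}
assignments = assign_tasks(block_locations, ["node1", "node2", "node3"])
print(assignments)
```

In this example every task lands on a node that already stores its block, so no input data has to cross the network.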
Distributed Model
Task Tracker and Data Nodes
Master/Slave Architecture
MapReduce Example: WordCount
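The figure for this slide is not available in the transcript, so the following is a minimal stand-in written in plain Python. It runs in a single process and does not use the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the two input lines are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


# Hypothetical input lines.
lines = ["Hadoop stores big data", "Hadoop processes big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())
print(word_counts)
```

On a cluster, the map calls would run on the nodes holding each input split and the reduce calls would run in parallel per key; here they simply run one after another.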
Advantages of Hadoop
• Moving computation is far cheaper than moving data
• It runs on commodity hardware
• It uses a Master/Slave architecture
• It detects node failures through periodic heartbeats and handles them
• It takes care of assigning tasks to nodes
• It is rack-aware: it knows which rack each node belongs to
• As a result, programmers only need to concentrate on extracting business value from BigData
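Heartbeat-based failure detection, mentioned in the list above, can be sketched as a simple timeout check. The function name, the timestamps, and the 30-second timeout are all hypothetical values for illustration and are not Hadoop's actual settings.

```python
def find_dead_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose most recent heartbeat is older than `timeout`."""
    return sorted(
        node for node, seen_at in last_heartbeat.items()
        if now - seen_at > timeout
    )


# Hypothetical timestamps in seconds; the timeout is an arbitrary example.
last_heartbeat = {"node1": 100.0, "node2": 97.0, "node3": 60.0}
dead = find_dead_nodes(last_heartbeat, now=101.0, timeout=30.0)
print(dead)
```

Once a node is declared dead, the master can reschedule its tasks elsewhere and re-replicate its blocks from the surviving copies.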
Limitations of Hadoop
• Is Hadoop a “silver bullet” that can solve all kinds of problems? The answer is no.
• Not suitable if the data is too small.
• Not suitable if there are dependencies within the data.
• Not suitable if the job cannot be divided into small chunks.
• Not suitable for real-time or stream-based processing.
Closer Home
• Aadhaar, the Government of India’s UIDAI project, is considered one of the largest BigData projects in the world.
• 14 February 2011: IBM’s supercomputer “Watson”, built using BigData technology.
• It is not online, and it processes information like a human brain.
Job Opportunities
Roles and profiles:
• Hadoop Administrator. Prerequisites: networking, administration
• Hadoop Developer. Prerequisites: programming expertise, preferably Java/Python
• Data Scientist and Data Analyst. Prerequisites: mathematics, statistical background, scripting languages such as Perl
A Common Perception of BigData
“BigData is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks that everyone else is doing it, so everyone claims they are doing it. But anyone who actually tries it will be terrible at it.” - Dan Ariely, behavioral economist
Conclusion
• Big Data brings new and exciting opportunities to companies that make use of the available platforms.
• In this Information Era, BigData technology has become important for businesses.
• It promises many opportunities in the coming years.
References
• www.apache.org
• www.bigdatauniversity.com
• www.cloudera.org
• www.hortonworks.com
• www.wikipedia.org
• www.edurekha.com
• www.explainingcomputers.com
• www.yahoo.com
• www.google.com
• www.informationweek.com
• www.cs.berkeley.edu
• www.ibm.com
• www.intel.com
• www.stackoverflow.com
• www.techtarget.com
• www.michael-noll.com
• www.meetup.com
• And many more
Any queries?
Thank you, one and all, for your patience.

Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData Community

  • 1. FEARLESS engineering Acharya institute of technology, Bengaluru. Grand Welcome to the world of, “bigdata_community”
  • 2. FEARLESS engineering Acharya institute of technology, Bengaluru. Introduction to bigdata and installation of single-node apache hadoop cluster Presented By, Mahantesh C. Angadi Nagarjuna DN Manoj PT Under the Guidance of, Prof. Manjunath tN Prof. Amogh pk Dept. of ISE AIT, Bengaluru Session-1: 24 March 2014
  • 3. FEARLESS engineering Contents  Purpose of This Talk  Introduction  Terminologies  Cloud Computing  BigData  Traditional Approaches to Solve BigData Problems  Hadoop and its Characteristics  Architecture of Hadoop
  • 4. FEARLESS engineering Contents Why Only Hadoop? Advantages of Hadoop Limitations of Hadoop Job Opportunities in BigData & Hadoop Conclusion References
  • 5. FEARLESS engineering Purpose of This talk To understand the terminologies To introduce you to BigData and Hadoop To be able to clearly differentiate between Cloud, BigData, Hadoop Traditional approaches to handle BigData Characteristics of Hadoop Explain how Hadoop works Get friendly with MapReduce and HDFS
  • 6. FEARLESS engineering continued… Become aware of the Hadoop Ecosystem Advantages and Limitations of Hadoop Job opportunities in Hadoop Conclusion
  • 7. FEARLESS engineering introduction Today we live in the Data Age. Due to the Internet of Things (IoT), the speed of data ingestion keeps increasing. So, the world is getting “hungrier and hungrier for data”.
  • 8. FEARLESS engineering terminologies • Cloud Computing • BigData • Hadoop • Distributed Computing • Parallel Computing • Utility Computing • Data Scientist
  • 9. FEARLESS engineering Let's start the journey…!
  • 10. FEARLESS engineering Cloud computing “Computing may someday be organized as a public utility, just as the telephone system is organized as a public utility” - John McCarthy, 1961 The word “Cloud” was first used in a technical sense by people at HP and Compaq. Cloud Computing is a form of Utility Computing in which a large number of computers, connected through a communication network such as the Internet, provide services on demand.
  • 11. FEARLESS engineering utility computing Utility computing is the packaging of computing resources, such as computation, storage and services, as a metered service. This model has the advantage of a low or no initial cost to acquire computer resources.
  • 12. FEARLESS engineering Distributed computing Distributed computing refers to the use of distributed systems to solve computational problems. Here a problem is divided into many small tasks, each of which is solved by one or more computers, which communicate with each other by passing messages. Parallel Computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently.
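The divide-and-combine idea behind parallel computing can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names `chunked` and `parallel_sum`, the chunk size, and the worker count are all assumptions chosen for the example. It uses threads so that it runs anywhere; in CPython, threads show the structure of the approach rather than a real speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_sum(numbers, chunk_size=1000, workers=4):
    """Sum a list by summing chunks concurrently, then combining the partial sums."""
    chunks = chunked(numbers, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent small problem handed to a worker.
        partial_sums = list(pool.map(sum, chunks))
    # Combining the partial results gives the answer to the large problem.
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1, 10001))
    print(parallel_sum(data))
```

The same split, solve, and combine shape reappears later in MapReduce, where the workers are separate machines instead of threads.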
  • 13. FEARLESS engineering Why bigdata deserves our attention? • Every day we create 2.5 quintillion bytes of data, and 90% of this data is unstructured. 90% of the data in the world today has been created in the last two years alone. • By the end of 2015, Cisco estimates that global Internet traffic will reach 4.8 zettabytes a year. • BigData would create 4.4 million jobs by 2015. There is a shortage of 140,000-190,000 BigData professionals in the United States alone…!
  • 14. FEARLESS engineering what happens in an internet minute…?
  • 15. FEARLESS engineering what is bigdata…? • BigData is any amount of structured and/or unstructured data that is beyond the storage and processing capabilities of a single physical machine and of traditional database techniques. • Data that has extra-large Volume, comes from a Variety of sources in a Variety of formats, and arrives with great Velocity is normally referred to as BigData.
  • 16. FEARLESS engineering 3 v’s of bigdata
  • 17. FEARLESS engineering Rise of bigdata adoption
  • 18. FEARLESS engineering Rise of bigdata adoption Data Scientist is the Hottest job of the 21st Century…! - Harvard Business Review Magazine Positions such as Data Scientist and Data Analyst did not exist a few years ago. Today companies are fighting to recruit these specialists. The market is not growing at the rate it wants to grow because a skills shortage is looming, which is pushing salaries up…! Data Scientists take huge amounts of data and attempt to pull useful “Business Insights” from that raw data.
  • 19. FEARLESS engineering Traditional approaches to solve bigdata problems
  • 20. FEARLESS engineering Traditional approach: Storage area network (san) Application Servers
  • 21. FEARLESS engineering Characteristics of Storage area network (san) A SAN can be visualized as one massive store that appears to offer unlimited storage. Data is moved to the computational nodes. It has multiple Application Servers. Programs run on each Application Server. All the data is stored in one SAN. Before execution, each server gets the data from the SAN. After execution, each server writes the output to the SAN.
  • 22. FEARLESS engineering Problems with Storage area network (san) Huge dependency between networks Huge bandwidth demand Scaling up and scaling down is not a smooth process Partial failures are also difficult to handle A lot of processing power is spent on Transferring the Data Data Synchronization is required during exchange
  • 23. FEARLESS engineering moore’s law continued… Moore’s Law: “The number of transistors per silicon chip that can be placed in a processor will double approximately every two years, for half the cost.” It is named after Gordon Moore, a co-founder of Intel.
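The arithmetic behind "doubling every two years" can be worked through with a short Python sketch. The function name `projected_transistors` and the starting count of 1,000 are made up for illustration and do not describe any real chip.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count that doubles once per doubling period."""
    # After `years`, the count has doubled (years / doubling_period_years) times.
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical chip with 1,000 transistors: ten years is five doublings.
    print(projected_transistors(1000, 10))
```

Ten years at this rate means five doublings, so the count grows by a factor of 2^5 = 32.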
  • 24. FEARLESS engineering hadoop Inspired by Google. Google was founded in 1998. In the early 2000s it faced a serious challenge in handling BigData. Google then published two papers: - GFS: The Google File System (2003) - MapReduce: A Programming Model (2004)
  • 25. FEARLESS engineering what is hadoop…? • Apache Hadoop is an open-source software framework used to manage BigData. • It's built and used by a global community of contributors and users. It’s not only a tool, it’s a framework of tools. • Moving computation is cheaper than moving data. • Most important Hadoop sub-projects: i. HDFS: Hadoop Distributed File System ii. MapReduce: A Programming Model
  • 27. FEARLESS engineering why the name hadoop…? “Hadoop” is simply the name of a stuffed toy elephant that belonged to the son of its creator, Doug Cutting.
  • 28. FEARLESS engineering Characteristics of hadoop Scalable: New nodes can be added without changing data formats. Cost-effective: It processes huge datasets in parallel on large clusters of commodity computers. Efficient and Flexible: It is schema-less and can absorb any type of data from any number of sources. Fault-tolerant and Reliable: It handles node failures easily because of replication. Easy to use: It uses simple Map and Reduce functions to process the data. It is developed in Java, but it can support Python and other languages too.
  • 29. FEARLESS engineering who uses hadoop…?
  • 31. FEARLESS engineering Hadoop core components Hadoop core has two major components: 1. HDFS a. Name Node b. Secondary Name Node c. Data Node 2. MapReduce Engine a. Job Tracker b. Task Tracker
  • 34. FEARLESS engineering Hadoop Distributed File System Inspired by the Google File System (GFS). It consists of three major components - i. Name Node • It manages the file system metadata and is responsible for the distribution of the data throughout the Hadoop cluster. ii. Secondary Name Node • It regularly contacts the Name Node and maintains up-to-date snapshots (checkpoints) of the Name Node's directory information. Despite the name, it is a checkpointing helper rather than a hot standby for the Name Node. iii. Data Node • It is responsible for storing the chunks of data assigned to it by the Name Node.
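The Name Node's job of splitting a file into blocks and deciding which Data Nodes hold each replica can be sketched as a toy model in Python. This is not how HDFS is implemented: real HDFS uses much larger blocks and a rack-aware placement policy, while this sketch uses a tiny block size and simple round-robin placement. The function names and the node names `dn1` to `dn4` are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, data_nodes, replication=3):
    """Toy Name Node: assign each block to `replication` distinct Data Nodes.

    Uses round-robin placement; this is an assumption for illustration,
    not the rack-aware policy of real HDFS.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough Data Nodes for the requested replication")
    node_count = len(data_nodes)
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % node_count] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical Data Node names
    for block_id, holders in place_blocks(len(blocks), nodes, replication=2).items():
        print(block_id, blocks[block_id], holders)
```

Because every block lives on more than one Data Node, losing a single node does not lose data, which is the replication-based fault tolerance the deck describes.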
  • 35. FEARLESS engineering Mapreduce Pioneered by Google, popularized by Yahoo (Apache). It consists of two major components - i. Job Tracker • It is responsible for scheduling tasks on the slave nodes. • To do so, it consults the Name Node and assigns each task to a node that holds the data on which the task will be performed. ii. Task Tracker • It has the actual logic to perform the task, so it performs the Map and Reduce functions on the data assigned to it by the Job Tracker.
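The Job Tracker's preference for running a task on a node that already holds the data can be sketched as a small greedy scheduler in Python. This is a simplified illustration under stated assumptions, not Hadoop's actual scheduler: the function name `assign_tasks`, the slot counts, and the node names are all hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Toy Job Tracker: give each block's task to a node holding that block.

    block_locations: {block_id: [nodes holding a replica]} (as reported by the Name Node)
    free_slots: {node: number of tasks it can still accept}
    Returns {block_id: chosen node}. Falls back to any free node when no
    data-local node has capacity, which means the data must then be moved.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block_id, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node = local[0]  # data-local: computation moves to the data
        else:
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                break  # no capacity left; remaining tasks would have to wait
            node = candidates[0]
        slots[node] -= 1
        assignment[block_id] = node
    return assignment


if __name__ == "__main__":
    locations = {0: ["dn1", "dn2"], 1: ["dn1"], 2: ["dn3"]}
    capacity = {"dn1": 1, "dn2": 1, "dn3": 0, "dn4": 1}
    print(assign_tasks(locations, capacity))
```

In the sample run only block 0 gets a data-local slot; the other two tasks land on nodes without the data, which is exactly the network transfer that data-local scheduling tries to avoid.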
  • 37. FEARLESS engineering Task tracker and data nodes
  • 41. FEARLESS engineering Mapreduce example: wordcount
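The WordCount example can be sketched in plain Python as three phases: map, shuffle, and reduce. This is a single-process illustration of the programming model, not Hadoop code; the function names and the sample input lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: collapse the list of counts for one word into a total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(sample))
```

In a real cluster each input line (or block) would be mapped on a different node and the grouped keys would be partitioned across reducers, but the programmer still only writes the map and reduce functions.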
  • 42. FEARLESS engineering Advantages of hadoop Moving computation is far better than moving data It runs on commodity hardware It’s a Master/Slave architecture It detects node failures through live heartbeats It handles assigning tasks to nodes It has rack awareness between nodes So, programmers only need to concentrate on getting business value from BigData
  • 43. FEARLESS engineering limitations of hadoop Do you think Hadoop is a “silver bullet” that can solve all kinds of problems…? - The answer is NO…!!! Not suitable if the data is too small. Not suitable if there are dependencies within the data. Not suitable if the job cannot be divided into small chunks. Not suitable for real-time and stream-based processing.
  • 44. FEARLESS engineering Closer home Aadhaar – the Government of India’s UIDAI project is considered one of the largest BigData projects in the world...!
  • 45. FEARLESS engineering Closer home Feb 14th 2011 – IBM’s supercomputer “WATSON” was built using BigData technology. It is not online, and it processes information like a human brain…!
  • 46. FEARLESS engineering Job opportunities Roles and Profiles: Hadoop Administrator (Pre-req: Networking, Admin); Hadoop Developer (Pre-req: Programming expertise, preferably Java/Python); Data Scientist and Data Analyst (Pre-req: Mathematics, statistical background, scripting languages like Perl etc.)
  • 47. FEARLESS engineering Conception about bigdata “BigData is like Teen_Age_Sex: Everyone talks about it, nobody really knows how to do it…??? Everyone thinks that everyone else is doing it, so everyone claims they are doing it…!!! But anyone who actually tries this will be Terrible at it...!!! ” -Dan Ariely, Behavioral Economics Guru.
  • 48. FEARLESS engineering conclusion • Big Data brings new and exciting opportunities to companies that utilize the available platforms. • In this Information Era, BigData technology has become important for businesses. • It offers a lot of opportunities in the coming days.
  • 49. FEARLESS engineering references WWW.APACHE.ORG WWW.BIGDATAUNIVERSITY.COM WWW.CLOUDERA.ORG WWW.HORTONWORKS.COM WWW.WIKIPEDIA.ORG WWW.EDUREKHA.COM WWW.EXPLAININGCOMPUTERS.COM WWW.YAHOO.COM WWW.GOOGLE.COM WWW.INFORMATIONWEEK.COM WWW.CS.BERKELEY.EDU WWW.IBM.COM WWW.INTEL.COM WWW.STACKOVERFLOW.COM WWW.TECHTARGET.COM WWW.MICHAEL-NOLL.COM WWW.MEETUP.COM AND MANY MORE…!!!
  • 50. FEARLESS engineering Any queries…???
  • 51. FEARLESS engineering thank you one and all For your patience  <3