Unit 3 intro.pptx

  1. HADOOP
     • Using the solution provided by Google, Doug Cutting and his team developed an open source project called Hadoop.
     • Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different nodes.
     • In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
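To make the MapReduce idea on this slide concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool stands in for the cluster nodes that would process the input splits in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run the three steps; the pool is an assumed stand-in for cluster nodes."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    splits = ["big data big cluster", "data cluster data"]
    print(word_count(splits))
```

Each split is mapped independently of the others, which is what lets a real cluster spread the map work over many machines before the results are grouped and reduced.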
  2. HADOOP
     • Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
     • An application built on the Hadoop framework works in an environment that provides distributed storage and computation across clusters of computers.
     • Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
  3. History of Hadoop
  4. History of Hadoop
     • Hadoop is an open-source software framework for storing and processing large datasets ranging in size from gigabytes to petabytes.
     • Hadoop was developed at the Apache Software Foundation.
     • In 2008, Hadoop defeated the supercomputers and became the fastest system on the planet for sorting terabytes of data.
     • There are basically two components in Hadoop:
       1. Hadoop Distributed File System (HDFS)
          - It allows you to store data of various formats across a cluster.
       2. YARN
          - For resource management in Hadoop.
          - It allows parallel processing over the data that is stored across HDFS.
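As a rough illustration of what "storing data across a cluster" can look like, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. Everything here is an assumption for illustration: `place_blocks` is a hypothetical helper, the block size and replication count are plain parameters, and the round-robin placement is a deliberate simplification (real HDFS uses a more involved placement policy).

```python
def place_blocks(file_size, block_size, nodes, replication):
    """Return {block_index: [nodes holding a replica]} for one file.

    Simplified round-robin placement, not the real HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    # Ceiling division: a partially filled last block still needs a slot.
    num_blocks = -(-file_size // block_size)
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + replica) % len(nodes)]
            for replica in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # A 300-unit file with 128-unit blocks needs three blocks.
    for block, holders in place_blocks(300, 128, ["n1", "n2", "n3", "n4"], 3).items():
        print(block, holders)
```

Because every block lives on more than one node, a processing framework can later choose among several machines that already hold the data it needs.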
  5. Basics of Hadoop
     • Hadoop is an open source software framework for storing data and running applications on clusters of commodity hardware.
     • It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
     • Just as data on a personal computer resides in a local file system, data in Hadoop resides in a distributed file system called the Hadoop Distributed File System (HDFS).
     • The processing model is based on the "data locality" concept, wherein computational logic is sent to the cluster nodes (servers) containing the data.
     • This computational logic is nothing but a compiled version of a program written in a high-level language such as Java.
     • Such a program processes data stored in HDFS.
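The data locality idea above can be sketched as a small scheduling function: for each block, prefer a node that already stores it, and fall back to any free node only when no such node has capacity. This is a hypothetical illustration, not Hadoop's scheduler; `assign_tasks`, the block names, and the slot counts are all made up for the example.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring nodes that hold the block.

    block_locations: {block_id: [nodes storing a replica]}
    free_slots:      {node: number of free task slots}
    Returns {block_id: (node, is_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, holders in block_locations.items():
        # Data-local choice: send the computation to where the data is.
        node = next((n for n in holders if slots.get(n, 0) > 0), None)
        is_local = node is not None
        if node is None:
            # Fallback: any node with a free slot; the data must travel.
            node = next((n for n, s in slots.items() if s > 0), None)
            if node is None:
                raise RuntimeError("no free task slots in the cluster")
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    print(assign_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

Tasks marked `is_local` read their input from the node's own storage, while the others have to pull the block over the network, which is the cost the data locality concept tries to avoid.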
  6. Advantages and Disadvantages of Hadoop
     Advantages
     • Varied data sources
     • Cost-effective
     • Performance
     • Fault-tolerant
     • Highly available
     • Low network traffic
     • High throughput
     • Open source
     • Scalable
     • Ease of use
     • Compatibility
     • Multiple languages supported
     Disadvantages
     • Issues with small files
     • Vulnerable by nature
     • Processing overhead
     • Supports only batch processing
     • Iterative processing
     • Security
