• Introduction to Distributed Programming
  › Background of Hadoop
  › What is Hadoop?
  › How does Hadoop work?
• Installing Hadoop
  › Setting up SSH
  › Setting up Environment Variables
  › Running Hadoop
  › Web-Based Cluster
• Components of Hadoop
  › Working with the Hadoop File System
  › Understanding Hadoop MapReduce
  › Reading and Writing
• Writing a Basic MapReduce Program
  › Getting the Patent Data Set
  › Constructing a Basic MapReduce Program
  › Working with Hadoop Streaming
  › Improving Performance with Combiners
• Advanced MapReduce
  › Summarization Patterns
  › Filtering Patterns
  › Data Organization Patterns
  › Join Patterns
  › Meta Patterns
  › Input and Output Patterns
• Programming Practices
  › Developing MapReduce Programs
  › Monitoring and Debugging on a Cluster
  › Tuning for Performance
• Hadoop Cookbook
  › Passing Job-Specific Parameters to Your Tasks
  › Probing for Task-Specific Parameters
  › Partitioning into Multiple Output Files
  › Inputting from and Outputting to a Database
  › Keeping Output in Sorted Order
• Managing Hadoop
  › Checking the System’s Health
  › Setting Permissions
  › Managing Quotas, Enabling Trash, Adding/Deleting Nodes, Recovering from a Failed NameNode
• Running Hadoop in the Cloud
  › Introducing Amazon Web Services
  › Setting up AWS and Setting up Cloud on EC2
  › Running MapReduce Programs on EC2
  › Cleaning up and Shutting down Your EC2 Instances
  › Amazon Elastic MapReduce and Other AWS Services
• Programming with Pig
  › Thinking Like a Pig
  › Installing Pig
  › Running Pig
  › Learning Pig Latin through Grunt
  › Pig Latin Syntax
  › Working with UDFs
  › Working with Scripts
• Getting Started with Hive
• Data Types and File Formats
• HiveQL – Data Definition
• HiveQL – Data Manipulation
• HiveQL – Queries, Views and Indexes
• Schema Design, Tuning & Record Formats
• Hive Integration with Oozie
• Hive and Amazon Web Services
• NoSQL Database
  › Why NoSQL?
  › Aggregate Data Models
  › Distribution Models
  › Consistency
• NoSQL DBs
  › Key-Value Databases
  › Document Databases
  › Column-Family Stores
  › Graph Databases
• MongoDB
  › Introduction
  › MongoDB through the JavaScript Shell
  › Writing Programs Using MongoDB
  › Document-Oriented Data
  › Queries and Aggregation
  › Updates, Atomic Operations and Deletes
  › Indexing, Replication and Sharding
• Mahout – Machine Learning
  › Introduction
  › Recommenders
    • Representing Recommender Data
    • Making Recommendations
  › Clustering
    • Clustering Algorithms in Mahout
  › Classification
    • Training a Classifier
    • Evaluating and Tuning a Classifier
• Moving Data in and out of Hadoop
  › Flume
  › Oozie
  › Sqoop
  › HBase
• Data Serialization Formats
  › XML, JSON
  › SequenceFiles, Protocol Buffers, Thrift and Avro
• Utilizing Data Structures and Algorithms
  › Modelling Data & Solving Problems with Graphs
  › Parallelized Bloom Filter Creation in MapReduce
• Programming Pipelines with Pig
  › Using Pig to Find Malicious Actors in Log Data
  › Optimizing User Workflow with Pig
• Crunch
• Cascading
• Puppet
• Unit Testing MapReduce
• Heavyweight Job Testing Using LocalJobRunner
• Debugging User-Space Problems

Hadoop Course Curriculum
