CLOUDACE TECHNOLOGIES, Regus Solitaire Business Centre (Hyderabad) Pvt Ltd, 4th Floor, Gumidelli Commercial Complex, 1-10-39 to 44, Old Airport Road, Begumpet, Hyderabad - 500016. Contact No. +91 9000798810, Email: trainings@cloudace.in, www.cloudace.in
Hadoop Developer
Course: Hadoop Developer
Duration: 4 Days of Training
Many enterprises in the modern IT industry must process and distribute large amounts of data on a regular basis. Basic database management systems and tools are ineffective at processing and storing such large volumes of data, so knowledge and expertise in Big Data management applications have become a necessity within the IT industry. CloudAce offers two separate Big Data Training programs that focus on Apache's Hadoop platform.
This 30-hour, instructor-led developer training course delivers the key concepts and expertise necessary to create robust data processing applications using Apache Hadoop. Through lectures and interactive, hands-on exercises, attendees will learn Hadoop and its ecosystem components.
The training is designed with a vendor-neutral approach; upon completion of the course, attendees can take the Hadoop developer certification exam from either Cloudera or Hortonworks. Certification is a great differentiator: it helps establish individuals as leaders in their field and provides customers with tangible evidence of skills and expertise.
About Our Trainers
By participating in our Big Data Training programs, you will be placed under the guidance of a certified cloud computing professional who has worked with us as a Technical Lead for over 9 years, dealing extensively with Big Data analytics, development, and implementation. Our trainer holds both the Hadoop developer and Hadoop administrator certifications and brings a wealth of teaching experience. Our trainer also has intensive hands-on experience implementing algorithms such as decision trees, support vector machines, random forests, naïve Bayes, neural networks, genetic algorithms, conjoint analysis, and principal component analysis.
Hadoop Developer Training
Our Hadoop Developer Big Data Training program consists of 14 modules that detail the platform's functionality, advantages, and drawbacks. Participants will gain an in-depth understanding of the Apache Hadoop platform and learn how to program and tune it to perform relevant analytics. Participants will learn how to set up Hadoop clusters and will be introduced to common and advanced algorithms and programs. The program also covers the various components of the Hadoop ecosystem extensively.
The duration of the program is 30 hours, completed over the course of 4 days.
The Hadoop Developer training program will be conducted in a classroom.
The fee for the Hadoop Developer program is 24,000 INR, exclusive of service taxes.
Upon completion of this training, successful participants will receive the certification of
“Hadoop Developer”.
The agenda for the course is outlined below:
• Module 1 : Big Data – An Overview
o What is Cloud Computing?
o What is Grid Computing?
o What is Virtualization?
o How the above three are inter-related
o What is Big Data?
o Introduction to Analytics and the need for big data analytics
o Hadoop Solutions - Big Picture
o Hadoop distributions
o Comparing Hadoop Vs. Traditional systems
o Volunteer Computing
o Data Retrieval - Random Access Vs. Sequential Access
o NoSQL Databases
• Module 2 : The Motivation for Hadoop
o Problems with traditional large-scale systems
o Requirements for a new approach
• Module 3 : Hadoop Basic Concepts
o What is Hadoop?
o The Hadoop Distributed File System
o How MapReduce Works
o Anatomy of a Hadoop Cluster
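The "How MapReduce Works" topic in Module 3 can be previewed with a few lines of plain Python: a single-process sketch of the map, shuffle/sort, and reduce phases using word count as the example job. This is an illustration only, not Hadoop code; the function names are invented for the sketch.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Emit a (word, 1) pair for every word, as a Hadoop Mapper would.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_sort(pairs):
    # Hadoop sorts map output by key and groups the values for each key
    # before handing them to the reducers.
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    # Sum the counts for each word, as a Hadoop Reducer would.
    for word, counts in grouped:
        yield word, sum(counts)

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog"]
    counts = dict(reduce_phase(shuffle_sort(map_phase(lines))))
    print(counts["the"])  # 2
```

In a real cluster the three phases run on different machines; the shuffle is what moves map output across the network to the reducers.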
• Module 4 : Hadoop Daemons
o Namenode
o Datanode
o Secondary namenode
o Job tracker
o Task tracker
• Module 5 : Hadoop in Detail
o Blocks and Splits
o Replication
o Data high availability
o Data Integrity
o Cluster architecture and block placement
• Module 6 : Programming Practices and Performance Tuning
o Developing MapReduce Programs in:
 Local Mode
 Pseudo-distributed Mode
 Fully Distributed Mode
• Module 7 : Writing a MapReduce Program
o Examining a Sample MapReduce Program
o Basic API Concepts
o The Driver Code
o The Mapper
o The Reducer
o Hadoop's Streaming API
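To give a flavour of Module 7's Streaming API topic: with Hadoop Streaming, the mapper and reducer are plain scripts that exchange tab-separated key/value lines over stdin/stdout. The sketch below is a minimal word-count pair under that convention; in a real job the two roles would be separate files passed via `-mapper` and `-reducer`.

```python
import sys

def mapper(stream):
    # Emit "word<TAB>1" for every word on every input line.
    for line in stream:
        for word in line.split():
            print(f"{word}\t1")

def reducer(stream):
    # Streaming delivers reducer input sorted by key, so all counts for one
    # word arrive contiguously and can be summed with a running total.
    current, total = None, 0
    for line in stream:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__" and len(sys.argv) > 1:
    # Pick a role with: python wordcount.py map|reduce
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```

Because the protocol is just lines of text, the same approach works in any language, which is the main selling point of the Streaming API.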
• Module 8 : Setting Up a Hadoop Cluster
o Install and configure Apache Hadoop
o Set up a fully distributed Hadoop cluster on a single laptop/desktop
o Install and configure the Cloudera Hadoop distribution in fully distributed mode
o Install and configure the Hortonworks Hadoop distribution in fully distributed mode
o Monitoring the cluster
o Getting used to the management consoles of Cloudera and Hortonworks
• Module 9 : Delving Deeper Into the Hadoop API
o Using Combiners
o The configure and close Methods
o SequenceFiles
o Partitioners
o Counters
o Directly Accessing HDFS
o ToolRunner
o Using The Distributed Cache
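The combiner covered in Module 9 (and revisited under performance tuning in Module 14) is easiest to appreciate with a count: it applies the reducer's logic to each mapper's local output, so far fewer records cross the network during the shuffle. The sketch below is illustrative Python, not the Hadoop API, with invented function names.

```python
from collections import Counter

def map_output(lines):
    # Raw mapper output: one (word, 1) record per word occurrence.
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    # Local pre-aggregation on one mapper's output, using the same
    # summing logic the reducer would apply later.
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())

if __name__ == "__main__":
    raw = map_output(["to be or not to be"] * 100)
    print(len(raw), len(combine(raw)))  # 600 records shrink to 4
```

A combiner is only safe when the reduce operation is associative and commutative (like summing); Hadoop may run it zero, one, or many times per mapper.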
• Module 10 : Common MapReduce Algorithms
o Sorting and Searching
o Indexing
o Classification/Machine Learning
o Term Frequency - Inverse Document Frequency
o Word Co-Occurrence
o Hands-On Exercise: Creating an Inverted Index
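The hands-on exercise in Module 10 builds an inverted index: a map from each word to the documents containing it, which is the core data structure behind search. On Hadoop, the mapper would emit (word, doc_id) pairs and the reducer would collect the doc ids per word; this sketch runs the same logic in-process, with invented names.

```python
from collections import defaultdict

def inverted_index(docs):
    # docs: {doc_id: text}; returns {word: sorted list of doc ids}.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}
```

With the index in hand, finding every document that mentions a term is a single dictionary lookup instead of a scan over all documents.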
• Module 11 : Debugging MapReduce Programs
o Testing with MRUnit
o Logging
o Other Debugging Strategies
• Module 12 : Advanced MapReduce Programming
o A Recap of the MapReduce Flow
o Custom Writables and WritableComparables
o The Secondary Sort
o Creating InputFormats and OutputFormats
o Pipelining Jobs With Oozie
o Map-Side Joins
o Reduce-Side Joins
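The reduce-side join from Module 12 works by having the mappers tag each record with its source and emit it under the join key; after the shuffle groups records by key, the reducer pairs rows from the two sides. Below is a single-process sketch with made-up "users" and "orders" inputs, not Hadoop API code.

```python
from itertools import groupby
from operator import itemgetter

def map_side(users, orders):
    # Tag each record with its source table so the reducer can tell
    # the two inputs apart after they are mixed together by the shuffle.
    for user_id, name in users:
        yield (user_id, ("user", name))
    for user_id, item in orders:
        yield (user_id, ("order", item))

def reduce_join(tagged):
    # Group by join key (as the shuffle would), then emit every
    # user x order combination within each group.
    for user_id, group in groupby(sorted(tagged), key=itemgetter(0)):
        records = [tag_val for _, tag_val in group]
        names = [v for tag, v in records if tag == "user"]
        items = [v for tag, v in records if tag == "order"]
        for name in names:
            for item in items:
                yield (user_id, name, item)
```

A map-side join avoids the shuffle entirely but requires one input to fit in memory or both inputs to be identically partitioned; the reduce-side variant shown here is the general-purpose fallback.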
• Module 13 : Monitoring and Debugging on Production Cluster
o Counters
o Skipping Bad Records
o Rerunning failed tasks with IsolationRunner
• Module 14 : Tuning For Performance
o Reducing network traffic with combiner
o Reducing the amount of input data
o Using Compression
o Reusing the JVM
o Running with speculative execution
o Refactoring code and rewriting algorithms
o Parameters affecting performance
o Other Performance Aspects
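On the "Using Compression" item in Module 14: compressing map output or job input trades a little CPU for much less disk and network I/O, and the log-like, repetitive data Hadoop often processes compresses extremely well. A quick illustration with Python's standard gzip module (the sample log line is invented):

```python
import gzip

def compressed_size(text):
    # Size in bytes of the gzip-compressed UTF-8 encoding of `text`.
    return len(gzip.compress(text.encode("utf-8")))

if __name__ == "__main__":
    log = "INFO mapreduce.Job: map 100% reduce 0%\n" * 1000
    print(len(log), compressed_size(log))
```

In a real job the codec (gzip, Snappy, LZO, etc.) is chosen per stage; splittable codecs matter for inputs, while fast codecs like Snappy are typical for intermediate map output.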
Hadoop Ecosystem components covered as part of Hadoop Developer training
• Ecosystem component: HBase
o HBase concepts
o Install and configure HBase on the cluster
o Create a database; develop and run sample applications
• Ecosystem component: ZooKeeper
o ZooKeeper concepts
o Install and configure ZooKeeper
o Use ZooKeeper for cluster maintenance
• Ecosystem component: Hive
o Hive concepts
o Install and configure Hive on the cluster
o Create a database and access it from the console
o Develop and run sample applications in Java/Python that access Hive
• Ecosystem component: Sqoop
o Install and configure Sqoop on the cluster
o Import data from Oracle/MySQL into Hive
• Ecosystem component: Pig
o Install and configure Pig
o Write sample Pig Latin scripts
• Ecosystem component: Flume and Chukwa
o Flume and Chukwa concepts
o Install and configure Flume on the cluster
o Create a sample application to capture Apache logs using Flume
• Overview of other Ecosystem components:
o Oozie, Avro, Thrift, REST, Mahout, Cassandra, YARN, MR2, etc.
• Analytics Basics
o Analytics and big data analytics
o Commonly used analytics algorithms
o Analytics tools like R and Weka
o Mahout
Training Duration - 4 days of classroom training
Course Fee - 24,000 INR + service taxes per participant (excludes exam fees)
For further information, please email us at trainings@cloudace.in or call Mr. Rohit at +91 9000798810