BIG DATA
The following topics will be covered in our BIG DATA Online Training:
Copyright @ 2015 Learntek. All Rights Reserved. 2
What is Hadoop?
Hadoop is a free, Java-based programming
framework that supports the processing of large data sets in a distributed
computing environment. It is part of the Apache project sponsored by the
Apache Software Foundation. Hadoop makes it possible to run applications on
systems with thousands of nodes involving thousands of terabytes of storage
capacity. Its distributed file system facilitates rapid data transfer rates among
nodes and allows the system to continue operating uninterrupted in case of a
node failure. This approach lowers the risk of catastrophic system failure, even
if a significant number of nodes become inoperative.
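To make the replication idea concrete, here is a minimal sketch in plain Python. It is a toy model, not HDFS code: the round-robin placement, the node names, and the helper names `place_blocks` and `readable_blocks` are assumptions made for illustration, and real HDFS uses its own rack-aware placement policy. It only shows why keeping several copies of each block lets reads continue after nodes fail.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin policy)."""
    n = len(nodes)
    return {
        block: {nodes[(block + i) % n] for i in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical five-node cluster storing ten blocks, three copies each.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=10, nodes=nodes, replication=3)

# With three copies per block, any two node failures leave every block readable.
survivors = readable_blocks(placement, failed={"node2", "node4"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

In this model, data is lost only when every node holding a given block fails at the same time, which is why a higher replication factor tolerates more simultaneous failures.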
Why Hadoop?
• Large Volumes of Data: Ability to store and process huge amounts of data of any variety
(structured, unstructured and semi-structured), quickly. With data volumes and varieties
constantly increasing, especially from social media and the Internet of Things (IoT), that’s a
key consideration.
• Computing Power: Hadoop’s distributed computing model processes big data fast. The more
computing nodes you use, the more processing power you have.
• Fault Tolerance: Data and application processing are protected against hardware failure. If a
node goes down, jobs are automatically redirected to other nodes to make sure the
distributed computing does not fail. Multiple copies of all data are stored automatically.
• Flexibility: Unlike a traditional relational database, you don’t have to process data before
storing it. You can store as much data as you want and decide how to use it later. That
includes unstructured data such as text, images and videos.
• Low Cost: The open-source framework is free and uses commodity hardware to store large
quantities of data.
• Scalability: You can easily grow your system to handle more data simply by adding nodes.
Little administration is required.
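The flexibility point above (store first, decide how to use the data later) is often called schema on read. The following is a small sketch of that idea in plain Python, not a Hadoop API: the sample records, the field names, and the helper `read_with_schema` are all hypothetical and exist only for illustration.

```python
import json

# Raw records are kept exactly as they arrived; nothing is parsed at write time.
raw_store = [
    '{"user": "ana", "action": "click", "ms": 120}',
    '{"user": "raj", "action": "view"}',
    "free-form log line with no structure",
]


def read_with_schema(lines, fields):
    """Apply a schema at read time; skip lines that are not JSON objects."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured line: ignored by this particular reader
        if not isinstance(record, dict):
            continue
        # Missing fields become None instead of blocking the read.
        yield tuple(record.get(field) for field in fields)


# Two different "schemas" applied later to the same stored data.
actions = list(read_with_schema(raw_store, ["user", "action"]))
timings = list(read_with_schema(raw_store, ["user", "ms"]))
print(actions)
print(timings)
```

The same raw store serves both readers, and the unstructured line stays available for some other reader that knows how to interpret it.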
Hadoop Introduction
• Introduction to Data and Systems
• Types of Data
• Traditional ways of dealing with large data and their problems
• Types of Systems & Scaling
• What is Big Data
• Challenges in Big Data
• Challenges in Traditional Applications
• New Requirements
• What is Hadoop? Why Hadoop?
• Brief history of Hadoop
• Features of Hadoop
• Hadoop and RDBMS
• Hadoop ecosystem overview
Hadoop Installation
• Installation in detail
• Creating Ubuntu image in VMware
• Downloading Hadoop
• Installing SSH
• Configuring Hadoop, HDFS &
MapReduce
• Download, installation & configuration of Hive
• Download, installation & configuration of Pig
• Download, installation & configuration of Sqoop
• Configuring Hadoop in Different
Modes
Hadoop Distributed File System (HDFS)
• File System – Concepts
• Blocks
• Replication Factor
• Version File
• Safe mode
• Namespace IDs
• Purpose of NameNode
• Purpose of DataNode
• Purpose of Secondary NameNode
• Purpose of JobTracker
• Purpose of TaskTracker
• HDFS Shell Commands – copy, delete, create directories, etc.
• Reading and Writing in HDFS
• Differences between Unix Commands and HDFS Commands
• Hadoop Admin Commands
• Hands-on exercise with Unix and HDFS commands
• Read / Write in HDFS – Internal Process between Client, NameNode & DataNodes
• Accessing HDFS using Java API
• Various Ways of Accessing HDFS
• Understanding HDFS Java classes and methods
• Admin: Commissioning / Decommissioning DataNodes
• Balancer
• Replication Policy
• Network Distance / Topology Script
MapReduce Programming
• About MapReduce
• Understanding blocks and input splits
• MapReduce Data types
• Understanding Writable
• Data Flow in MapReduce
Application
• Understanding MapReduce
problem on datasets
• MapReduce and Functional
Programming
• Writing MapReduce
Application
• Understanding Mapper
function
• Understanding Reducer
Function
• Understanding Driver
• Usage of Combiner
• Understanding Partitioner
• Usage of Distributed Cache
• Passing the parameters to
mapper and reducer
• Analysing the Results
• Log files
• Input Formats and Output
Formats
• Counters, Skipping Bad and Unwanted Records
• Writing Joins in MapReduce with two input files. Join Types.
• Executing a MapReduce Job – Insights
• Exercises on MapReduce
• Job Scheduling: Types of Schedulers
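As a preview of the mapper, shuffle and reducer topics listed above, here is a minimal word-count sketch in plain Python. It runs in a single process and does not use the Hadoop Java API; the function names `mapper`, `shuffle`, `reducer` and `run_job` are illustrative assumptions, chosen only to mirror the phases of a MapReduce job.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: combine all values seen for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


counts = run_job(["big data big clusters", "data nodes store data"])
print(counts)
```

On a real cluster the map and reduce calls would run on different nodes and the shuffle would move data across the network; the sketch keeps only the data flow.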
Hive
• Hive concepts
• Schema on Read vs. Schema on Write
• Hive architecture
• Install and configure Hive on a cluster
• Metastore – Purpose & Types of Configurations
• Different types of tables in Hive
• Buckets
• Partitions
• Joins in Hive
• Hive Query Language
• Hive Data Types
• Data Loading into Hive Tables
• Hive Query Execution
• Hive library functions
• Hive UDF
• Hive Limitations
Pig
• Pig basics
• Install and configure Pig on a cluster
• Pig library functions
• Pig vs. Hive
• Write sample Pig Latin scripts
• Modes of running Pig
• Running in Grunt shell
• Running as Java program
• Pig UDFs
HBase
• HBase concepts
• HBase architecture
• Region server architecture
• File storage architecture
• HBase basics
• Column access
• Scans
• HBase use cases
• Install and configure HBase on a multi-node cluster
• Create database, develop and run sample applications
• Access data stored in HBase using Java API
Sqoop
• Install and configure Sqoop on a cluster
• Connecting to RDBMS
• Installing MySQL
• Import data from MySQL to Hive
• Export data to MySQL
• Internal mechanism of import/export
Oozie
• Introduction to Oozie
• Oozie architecture
• XML file specifications
• Specifying workflow
• Control nodes
• Oozie job coordinator
Flume
• Introduction to Flume
• Configuration and Setup
• Flume Sink with example
• Channel
• Flume Source with example
• Complex flume architecture
ZooKeeper
• Introduction to ZooKeeper
• Challenges in distributed Applications
• Coordination
• ZooKeeper: Design Goals
• Data Model and Hierarchical namespace
• Client APIs
YARN
• Hadoop 1.0 Limitations
• MapReduce Limitations
• History of Hadoop 2.0
• HDFS 2: Architecture
• HDFS 2: Quorum based storage
• HDFS 2: High availability
• HDFS 2: Federation
• YARN Architecture
• Classic vs YARN
• YARN Apps
• YARN multitenancy
• YARN Capacity Scheduler
Prerequisites:
• Knowledge of any programming language, databases and the
Linux operating system. Core Java or Python knowledge is helpful.
Copyright @ 2015 Learntek. All Rights Reserved. 17
Copyright @ 2015 Learntek. All Rights Reserved. 18

Mais conteúdo relacionado

Mais procurados

Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environmentsDataWorks Summit
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructuredatastack
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with SupersetDataWorks Summit
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationsStrata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationshadooparchbook
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialhadooparchbook
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceNeev Technologies
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 

Mais procurados (18)

Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructure
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationsStrata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applications
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Big data course
Big data  courseBig data  course
Big data course
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 

Semelhante a Big data - Online Training

Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL David Smelker
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete informationbhargavi804095
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Getting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsightGetting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsightNilesh Gule
 

Semelhante a Big data - Online Training (20)

Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Apache Hadoop Hive
Apache Hadoop HiveApache Hadoop Hive
Apache Hadoop Hive
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete information
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Getting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsightGetting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsight
 

Mais de Learntek1

Aws sys ops administrator
Aws sys ops administratorAws sys ops administrator
Aws sys ops administratorLearntek1
 
Angular js Online Training
Angular js Online TrainingAngular js Online Training
Angular js Online TrainingLearntek1
 
Selenium Online Training
Selenium  Online TrainingSelenium  Online Training
Selenium Online TrainingLearntek1
 
React js Online Training
React js Online TrainingReact js Online Training
React js Online TrainingLearntek1
 
Machine learning using spark Online Training
Machine learning using spark Online TrainingMachine learning using spark Online Training
Machine learning using spark Online TrainingLearntek1
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online TrainingLearntek1
 
Scala & Spark Online Training
Scala & Spark Online TrainingScala & Spark Online Training
Scala & Spark Online TrainingLearntek1
 

Mais de Learntek1 (7)

Aws sys ops administrator
Aws sys ops administratorAws sys ops administrator
Aws sys ops administrator
 
Angular js Online Training
Angular js Online TrainingAngular js Online Training
Angular js Online Training
 
Selenium Online Training
Selenium  Online TrainingSelenium  Online Training
Selenium Online Training
 
React js Online Training
React js Online TrainingReact js Online Training
React js Online Training
 
Machine learning using spark Online Training
Machine learning using spark Online TrainingMachine learning using spark Online Training
Machine learning using spark Online Training
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online Training
 
Scala & Spark Online Training
Scala & Spark Online TrainingScala & Spark Online Training
Scala & Spark Online Training
 

Último

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 

Último (20)

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 

Big data - Online Training

  • 2. The following topics will be covered in our BIG DATA Online Training: Copyright @ 2015 Learntek. All Rights Reserved. 2
  • 3. What is Hadoop? Big Data Hadoop Training: Hadoop is a free, Java -based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes of storage capacity. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative. Copyright @ 2015 Learntek. All Rights Reserved.
  • 4. Why Hadoop? • Large Volumes of Data: Ability to store and process huge amounts of variety (structure, unstructured and semi structured) of data, quickly. With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that’s a key consideration. • Computing Power: Hadoop’s distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have. • Fault Tolerance: Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple copies of all data are stored automatically. • Flexibility: Unlike traditional relational database, you don’t have to process data before storing it, You can store as much data as you want and decide how to use it later. That includes unstructured data like text, images and videos etc. • Low Cost: The open-source framework is free and used commodity hardware to store large quantities of data. • Scalability: You can easily grow your system to handle more data simply by adding nodes. Little administration is required. Copyright @ 2015 Learntek. All Rights Reserved. 4
  • 5. Big Data Hadoop Training: Hadoop Introduction • Big Data Hadoop Training: Introduction to Data and System • Types of Data • Traditional way of dealing with large data and its problems • Types of Systems & Scaling • What is Big Data • Challenges in Big Data • Challenges in Traditional Application • New Requirements • What is Hadoop? Why Hadoop? • Brief history of Hadoop • Features of Hadoop • Hadoop and RDBMS • Hadoop ecosystem overview
  • 6. Hadoop Installation • Installation in detail • Creating Ubuntu image in VMware • Downloading Hadoop • Installing SSH • Configuring Hadoop, HDFS & MapReduce • Download, Installation & Configuration Hive • Download, Installation & Configuration Pig • Download, Installation & Configuration Sqoop • Configuring Hadoop in Different Modes
  • 7. Hadoop Distributed File System (HDFS) • File System – Concepts • Blocks • Replication Factor • Version File • Safe mode • Namespace IDs • Purpose of Name Node • Purpose of Data Node • Purpose of Secondary Name Node • Purpose of Job Tracker • Purpose of Task Tracker • HDFS Shell Commands – copy, delete, create directories etc. • Reading and Writing in HDFS • Differences between Unix commands and HDFS commands • Hadoop Admin Commands • Hands-on exercise with Unix and HDFS commands • Read / Write in HDFS – Internal Process between Client, NameNode & DataNodes • Accessing HDFS using Java API • Various Ways of Accessing HDFS • Understanding HDFS Java classes and methods • Admin: Commissioning / Decommissioning DataNode • Balancer • Replication Policy • Network Distance / Topology Script
  • 8. MapReduce Programming • About MapReduce • Understanding block and input splits • MapReduce Data types • Understanding Writable • Data Flow in MapReduce Application • Understanding MapReduce problem on datasets • MapReduce and Functional Programming • Writing MapReduce Application • Understanding Mapper function • Understanding Reducer Function • Understanding Driver • Usage of Combiner • Understanding Partitioner • Usage of Distributed Cache • Passing the parameters to mapper and reducer • Analysing the Results • Log files • Input Formats and Output Formats • Counters, Skipping Bad and Unwanted Records • Writing Joins in MapReduce with 2 Input files. Join Types. • Execute MapReduce Job – Insights • Exercises on MapReduce • Job Scheduling: Types of Schedulers
  • 9. Hive • Hive concepts • Schema on Read vs Schema on Write • Hive architecture • Install and configure Hive on cluster • Meta Store – Purpose & Types of Configurations • Different types of tables in Hive • Buckets • Partitions • Joins in Hive • Hive Query Language • Hive Data Types • Data Loading into Hive Tables • Hive Query Execution • Hive library functions • Hive UDF • Hive Limitations
  • 10. Pig • Pig basics • Install and configure Pig on a cluster • Pig library functions • Pig vs Hive • Write sample Pig Latin scripts • Modes of running Pig • Running in Grunt shell • Running as Java program • Pig UDFs
  • 11. HBase • HBase concepts • HBase architecture • Region server architecture • File storage architecture • HBase basics • Column access • Scans • HBase use cases • Install and configure HBase on a multi-node cluster • Create database, develop and run sample applications • Access data stored in HBase using Java API
  • 12. Sqoop • Install and configure Sqoop on cluster • Connecting to RDBMS • Installing MySQL • Import data from MySQL to Hive • Export data to MySQL • Internal mechanism of import/export
  • 13. Oozie • Introduction to Oozie • Oozie architecture • XML file specifications • Specifying workflow • Control nodes • Oozie job coordinator
  • 14. Flume • Introduction to Flume • Configuration and Setup • Flume Sink with example • Channel • Flume Source with example • Complex Flume architecture
  • 15. ZooKeeper • Introduction to ZooKeeper • Challenges in distributed Applications • Coordination • ZooKeeper: Design Goals • Data Model and Hierarchical namespace • Client APIs
  • 16. YARN • Hadoop 1.0 Limitations • MapReduce Limitations • History of Hadoop 2.0 • HDFS 2: Architecture • HDFS 2: Quorum based storage • HDFS 2: High availability • HDFS 2: Federation • YARN Architecture • Classic vs YARN • YARN Apps • YARN multitenancy • YARN Capacity Scheduler
  • 17. Prerequisites: • Knowledge of any programming language, databases, and the Linux operating system. Core Java or Python knowledge is helpful.
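The "Fault Tolerance" point in the "Why Hadoop?" slide says that multiple copies of all data are stored automatically, so a node failure does not make data unavailable. The sketch below is a toy Python model of that idea, not the HDFS API. The round-robin placement, the node names, and the helper functions `place_blocks` and `readable_blocks` are all assumptions made for illustration; real HDFS uses its own rack-aware replica placement policy. The replication factor of 3 matches the usual HDFS default.

```python
# Toy model of block replication. This is NOT the HDFS API and does not
# reproduce the real HDFS placement policy; it only illustrates the idea.


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (round-robin, illustrative)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        # Take `replication` consecutive nodes on a ring, so replicas are distinct.
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the set of blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    }


nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical node names
placement = place_blocks(num_blocks=10, nodes=nodes, replication=3)

# With 3 distinct replicas per block, losing any 2 nodes leaves every block readable.
survivors = readable_blocks(placement, failed_nodes=["node2", "node4"])
print(len(survivors) == len(placement))
```

In this model a block becomes unreadable only when every node holding one of its replicas fails, which is why a higher replication factor tolerates more simultaneous node failures at the cost of more storage.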