Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions & Answers | Simplilearn
General Hadoop Questions
1. What are the different vendor-specific distributions of Hadoop?

Common vendor-specific distributions include Cloudera (CDH), Hortonworks (HDP), MapR, Amazon EMR, and Microsoft Azure HDInsight.

2. What are the different Hadoop configuration files?

hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
masters and slaves (worker list) files
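As a quick illustration (a hedged sketch, not part of the original deck), the Hadoop Java API reads these XML files through the Configuration class; the config path below is an assumption for a typical installation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfigFiles {
    public static void main(String[] args) {
        // Configuration automatically loads core-default.xml and core-site.xml from the classpath
        Configuration conf = new Configuration();
        // Additional site files can be added explicitly (path is an assumption)
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        System.out.println("fs.defaultFS  = " + conf.get("fs.defaultFS"));   // set in core-site.xml
        System.out.println("dfs.blocksize = " + conf.get("dfs.blocksize"));  // set in hdfs-site.xml
    }
}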
3. What are the 3 modes in which Hadoop can run?

1. Standalone mode: This is the default mode. It uses the local filesystem and a single Java process to run the Hadoop services.
2. Pseudo-distributed mode: It uses a single-node Hadoop deployment to execute all the Hadoop services.
3. Fully-distributed mode: It uses separate nodes to run the Hadoop master and slave services.
4. What are the differences between a regular file system and HDFS?

Regular File System:
1. Data is maintained in a single system.
2. If the machine crashes, data recovery is very difficult due to low fault tolerance.
3. Seek time is more, and hence it takes more time to process the data.

HDFS:
1. Data is distributed and maintained on multiple systems.
2. If a datanode crashes, data can still be recovered from other nodes in the cluster.
3. Time taken to read data is comparatively more, as there is local data read to disk and coordination of data from multiple systems.
HDFS Questions
5. Why is HDFS fault tolerant?

HDFS is fault tolerant as it replicates data on different datanodes. By default, a block of data gets replicated on 3 datanodes.

Data gets divided into multiple blocks (Data block 1, Data block 2, Data block 3, and so on), and the blocks are stored on different datanodes. If one node crashes, the data can still be retrieved from other datanodes. This makes HDFS fault tolerant.
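As a small illustration (not part of the original deck), the replication factor of a stored file can be inspected through the Java FileSystem API; the file path below is a hypothetical example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // /user/test/data.csv is a hypothetical file used only for illustration
        FileStatus status = fs.getFileStatus(new Path("/user/test/data.csv"));
        System.out.println("Replication factor: " + status.getReplication()); // 3 by default
    }
}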
6. Explain the architecture of HDFS.

[Architecture diagram: a Namenode holding metadata (Name, replicas, ...: /home/foo/data, 3, ...) handles metadata and block operations for clients, while Datanodes on Rack 1 and Rack 2 serve reads, writes, and replication.]

• The NameNode is the master server that hosts metadata in disk and in RAM. It holds information about the various datanodes, their location, the size of each block, etc.
• On disk, the Namenode keeps the Edit log and the Fsimage; in RAM, it keeps the live metadata (Name, replicas, ...: /home/foo/data, 3, ...).
• Datanodes hold the actual data blocks and send heartbeats and periodic block reports to the Namenode.
• A Datanode stores and retrieves blocks when asked. It serves clients' read and write requests and performs block creation, deletion, and replication on instruction from the Namenode.
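To make the Namenode's role concrete, here is a hedged sketch (not from the original deck) that asks the Namenode for the block locations of a file via the Java API; the path reuses the slide's example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/home/foo/data");          // path taken from the slide's example
        FileStatus status = fs.getFileStatus(file);
        // The Namenode answers this metadata query; no Datanode is read here
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset() + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}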
7. What are the 2 types of metadata a Namenode server holds?

• Metadata in disk: the Edit log and the Fsimage.
• Metadata in RAM: the live namespace metadata (Name, replicas, ...: /home/foo/data, 3, ...).
8. What is the difference between HDFS Federation and HDFS High Availability?

HDFS Federation:
• There is no limit to the number of namenodes, and the namenodes are not related to each other.
• All the namenodes share a pool of metadata, in which each namenode has its own dedicated pool.
• Provides fault tolerance, i.e., if one namenode goes down, it does not affect the data of another namenode.

HDFS High Availability:
• There are 2 namenodes that are related to each other. Both active and standby namenodes work all the time.
• At any given time, the active namenode is up and running while the standby namenode is idle, updating its metadata once in a while.
• Requires two separate machines: the active namenode is configured on one, and the standby namenode on the other.
9. If you have an input file of 350 MB, how many input splits will be created by HDFS, and what is the size of each input split?

• By default, HDFS breaks the data into blocks of 128 MB each.
• The size of every block except the last one will be 128 MB.
• So, a 350 MB file yields 3 input splits in total.
• The sizes of the splits are 128 MB, 128 MB, and 94 MB (128 + 128 + 94 = 350).
10. How does Rack Awareness work in HDFS?

HDFS Rack Awareness is about having knowledge of the different datanodes and how they are distributed across the racks of a Hadoop cluster.

• By default, each block of data (e.g., Block A, Block B, Block C) is replicated three times on various datanodes present on different racks.
• Two identical blocks cannot be placed on the same datanode.
• When a cluster is rack aware, all the replicas of a block cannot be placed on the same rack.
• If a datanode crashes, you can retrieve the data block from a different datanode.
11. How can you restart the Namenode and all the daemons in Hadoop?

Following are the methods to do so:

1. Stop the Namenode with ./sbin/hadoop-daemon.sh stop namenode and then start it again using ./sbin/hadoop-daemon.sh start namenode
2. Stop all the daemons with ./sbin/stop-all.sh and then start them again using ./sbin/start-all.sh
12. Which command will help you find the status of blocks and filesystem health?

To check the status of the blocks:
hdfs fsck <path> -files -blocks

To check the health status of the filesystem:
hdfs fsck / -files -blocks -locations > dfs-fsck.log
13. What would happen if you store too many small files in a cluster on HDFS?

• Storing a lot of small files on HDFS generates a lot of metadata.
• Storing this metadata in RAM is a challenge, as each file, block, or directory takes about 150 bytes of metadata.
• Thus, the cumulative size of all the metadata becomes too big for the Namenode.
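A rough worked estimate (illustrative numbers, not from the original deck): with 1,000,000 small files, each stored as a single block, the Namenode would hold roughly

1,000,000 files x (150 bytes per file object + 150 bytes per block object) ≈ 300 MB

of heap for metadata alone, before any actual data is read.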
14. How do you copy data from the local system on to HDFS?

The following command copies data from the local file system into HDFS:

hadoop fs -copyFromLocal [source] [destination]

Example: hadoop fs -copyFromLocal /tmp/data.csv /user/test/data.csv
15. When do you use the dfsadmin -refreshNodes and rmadmin -refreshNodes commands?

These commands are used to refresh the node information while commissioning or decommissioning nodes.

dfsadmin -refreshNodes: runs the HDFS client and refreshes the node configuration for the NameNode.
rmadmin -refreshNodes: performs administrative tasks for the ResourceManager and refreshes its node information.
16. Is there any way to change the replication of files on HDFS after they are already written to HDFS?

Following are the ways to change the replication of files on HDFS:

1. Set the dfs.replication value to the desired number in the $HADOOP_HOME/conf/hdfs-site.xml file; any new content that comes in will then be replicated to that factor.
2. To change the replication factor of a particular file or directory, use:
   $HADOOP_HOME/bin/hadoop dfs -setrep -w 4 /path/of/the/file
   Example: $HADOOP_HOME/bin/hadoop dfs -setrep -w 4 /user/temp/test.csv
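A hedged Java sketch of the same operation (the example file path is reused from above and assumed to exist):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChangeReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Raise the replication factor of an existing file to 4 (same effect as -setrep -w 4)
        boolean changed = fs.setReplication(new Path("/user/temp/test.csv"), (short) 4);
        System.out.println("Replication change requested: " + changed);
    }
}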
17. Who takes care of replication consistency in a Hadoop cluster, and what do you mean by under/over-replicated blocks?

The Namenode takes care of replication consistency in a Hadoop cluster, and the fsck command gives the information regarding over- and under-replicated blocks.

Under-replicated blocks:
• These are blocks that do not meet their target replication for the file they belong to.
• HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication.

Over-replicated blocks:
• These are blocks that exceed their target replication for the file they belong to.
• Normally, over-replication is not a problem, and HDFS will automatically delete excess replicas.
MapReduce Questions
18. What is distributed cache in MapReduce?

It is a mechanism supported by the Hadoop MapReduce framework. Data coming from the disk can be cached and made available to all worker nodes where the map/reduce tasks of a given job are running.

Once a file is cached for our job, Hadoop makes it available on each datanode where map/reduce tasks are running.

1. Copy the file to HDFS:
   $ hdfs dfs -put /user/Simplilearn/lib/jar_file.jar
2. Set up the application's JobConf by adding it in the Driver class:
   DistributedCache.addFileToClasspath(new Path("/user/Simplilearn/lib/jar_file.jar"), conf)
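DistributedCache is deprecated in the newer MapReduce API; a hedged sketch of the equivalent setup through the Job class (reusing the slide's file URI) might look like this:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cache example");
        // Ships the HDFS file to every node that runs tasks for this job
        job.addCacheFile(new URI("/user/Simplilearn/lib/jar_file.jar"));
        // For jars that must also be on the task classpath:
        job.addFileToClassPath(new Path("/user/Simplilearn/lib/jar_file.jar"));
    }
}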
19. What roles do RecordReader, Combiner, and Partitioner play in a MapReduce operation?

RecordReader: communicates with the InputSplit and converts the data into key-value pairs suitable for reading by the mapper.

Combiner: also known as the mini reducer; for every mapper there is a combiner. It aggregates the intermediate key-value pairs produced by the mapper and passes the combined output on to the partitioner.

Partitioner: decides which reduce task will process each key, i.e., how outputs from the combiners are sent to the reducers. It controls the partitioning of the keys of the intermediate map outputs.
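A hedged sketch of a custom Partitioner (the class name is hypothetical; key/value types are assumed to be Text/IntWritable):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative partitioner: routes keys to reducers by hash, like the default HashPartitioner
public class ExamplePartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
// Registered in the driver with: job.setPartitionerClass(ExamplePartitioner.class);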
20. Why is MapReduce slower in processing data in comparison to other processing frameworks?

• MapReduce uses batch processing to process data.
• It mostly uses the Java language, which is harder to program as it requires multiple lines of code.
• It reads data from the disk and, after each iteration, writes results back to HDFS. Such a process increases latency and makes graph processing slow.
21. For a MapReduce job, is it possible to change the number of mappers to be created?

By default, the number of mappers is always equal to the number of input splits, so it cannot be set directly.

Example: if you have a 1 GB file that is split into 8 blocks (of 128 MB each), there will be 8 mappers running on the cluster.

However, there are different ways in which you can either set a property or customize your code to change the number of mappers, as in the sketch below.
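A hedged sketch of one such property-based approach: shrinking the maximum split size increases the number of splits, and therefore mappers (the 64 MB value is an assumption for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MapperCountTuning {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mapper count tuning");
        // Cap each input split at 64 MB: a 1 GB input then yields ~16 splits, hence ~16 mappers
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        // Raising the minimum split size instead would reduce the number of mappers:
        // FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
    }
}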
22. Name some Hadoop-specific data types that are used in a MapReduce program.

Following are some Hadoop-specific data types used in a MapReduce program:

IntWritable, FloatWritable, LongWritable, DoubleWritable, BooleanWritable, ArrayWritable, MapWritable, ObjectWritable
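A hedged sketch showing where these Writable types appear in practice, in a minimal word-count-style mapper (class name and logic are illustrative only):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key/value: LongWritable/Text; output key/value: Text/IntWritable
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}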
23. What is speculative execution in Hadoop?

• If a datanode is executing a task slowly, the master node can redundantly launch another instance of the same task (a speculative duplicate) on another node.
• The task that finishes first is accepted, and the other task is killed.
• For example, if the task on Node A is slow, the scheduler launches a duplicate on Node B; if Node A's task remains slower, the output is accepted from Node B.
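Speculative execution is controlled per job through configuration properties; a hedged sketch of toggling it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enabled by default; set to false to disable speculative duplicates
        conf.setBoolean("mapreduce.map.speculative", true);
        conf.setBoolean("mapreduce.reduce.speculative", false);
        Job job = Job.getInstance(conf, "speculative execution example");
        System.out.println("Map speculation: " + job.getConfiguration().get("mapreduce.map.speculative"));
    }
}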
24. How is the identity mapper different from the chain mapper?

Identity Mapper:
• It is the default mapper, chosen when no mapper is specified in the MapReduce driver class.
• It implements the identity function, writing all of its key-value pairs directly to the output.
• It is defined in the old MapReduce API (MR1) in the org.apache.hadoop.mapred.lib package.

Chain Mapper:
• This class is used to run multiple mappers in a single map task.
• The output of the first mapper becomes the input to the second mapper, the second to the third, and so on.
• It is defined in org.apache.hadoop.mapreduce.lib.chain.ChainMapper.
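A hedged sketch of chaining two hypothetical mappers in one map task (both mapper classes below are illustrative, not from the deck):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

public class ChainDriver {

    // Hypothetical first stage: splits each line into (word, 1) pairs
    public static class TokenizeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                context.write(new Text(token), new IntWritable(1));
            }
        }
    }

    // Hypothetical second stage: lower-cases the words emitted by the first stage
    public static class LowerCaseMapper extends Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void map(Text key, IntWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString().toLowerCase()), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "chain mapper example");
        // The output types of the first mapper must match the input types of the second
        ChainMapper.addMapper(job, TokenizeMapper.class, LongWritable.class, Text.class,
                Text.class, IntWritable.class, new Configuration(false));
        ChainMapper.addMapper(job, LowerCaseMapper.class, Text.class, IntWritable.class,
                Text.class, IntWritable.class, new Configuration(false));
    }
}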
25. What are the major configuration parameters required in a MapReduce program?

1. Input location of the job in HDFS
2. Output location of the job in HDFS
3. Input and output formats
4. Classes containing the map and reduce functions
5. The .jar file containing the mapper, reducer, and driver classes

A minimal driver that wires these together is sketched below.
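A hedged driver sketch setting all five parameters; the identity Mapper and Reducer classes are used here only as compilable stand-ins for your own map and reduce classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ExampleDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "example job");
        job.setJarByClass(ExampleDriver.class);                  // 5. jar holding mapper, reducer, driver
        job.setMapperClass(Mapper.class);                        // 4. replace with your map class
        job.setReducerClass(Reducer.class);                      //    replace with your reduce class
        job.setInputFormatClass(TextInputFormat.class);          // 3. input format
        job.setOutputFormatClass(TextOutputFormat.class);        //    output format
        job.setOutputKeyClass(LongWritable.class);               //    matches the identity classes used here
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // 1. input location in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // 2. output location in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}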
26. What do you mean by map-side join and reduce-side join in MapReduce?

Map-side join:
• The join is performed by the mapper.
• Each input dataset must be divided into the same number of partitions.
• The input to each map is in the form of a structured partition and is in sorted order.

Reduce-side join:
• The join is performed by the reducer.
• There is no need to have the dataset in a structured form (or partitioned).
• It is easier to implement than the map-side join, as the sorting and shuffling phase sends the values having identical keys to the same reducer.
27. What is the role of the OutputCommitter class in a MapReduce job?

OutputCommitter describes the commit of task output for a MapReduce job.

Example: org.apache.hadoop.mapreduce.OutputCommitter is the abstract base class; file-based output formats use its FileOutputCommitter implementation.

The MapReduce framework relies on the OutputCommitter of the job to:
• Set up the job during initialization
• Clean up the job after job completion
• Set up the task's temporary output
• Check whether a task needs a commit
• Commit the task output
• Discard the task commit
28. Explain the process of spilling in MapReduce.

• Spilling is the process of copying data from the memory buffer to disk when the content of the buffer reaches a certain threshold size.
• Spilling happens when there is not enough memory to fit all of the mapper output.
• By default, a background thread starts spilling the content from memory to disk after 80% of the buffer size is filled.
• For a 100 MB buffer, spilling will start once the content of the buffer reaches 80 MB.
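The buffer size and the spill threshold are configurable; a hedged sketch of the relevant properties (the values shown mirror the defaults described above):

import org.apache.hadoop.conf.Configuration;

public class SpillConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Size of the in-memory sort buffer used by map tasks, in MB (100 MB here)
        conf.setInt("mapreduce.task.io.sort.mb", 100);
        // Fraction of the buffer at which the background spill thread starts (80%)
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
        System.out.println("Spilling starts at about " + (100 * 0.80) + " MB");
    }
}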
29. How can you set the mappers and reducers for a MapReduce job?

The number of mappers and reducers can be set on the command line using:

-D mapred.map.tasks=5 -D mapred.reduce.tasks=2

In the code, one can configure them through the JobConf variables:

job.setNumMapTasks(5); // 5 mappers (a hint only; the actual count depends on the input splits)
job.setNumReduceTasks(2); // 2 reducers
30. What happens when a node running a map task fails before sending the output to the reducer?

If such a case happens, the map task is assigned to a new node, and the whole task is run again to re-create the map output.
31. Can we write the output of MapReduce in different formats?

Yes, we can write the output of MapReduce in different formats. Following are some examples:

• TextOutputFormat: the default output format; it writes records as lines of text.
• SequenceFileOutputFormat: useful for writing sequence files when the output files need to be fed into another MapReduce job as input files.
• MapFileOutputFormat: used to write the output as map files.
• SequenceFileAsBinaryOutputFormat: a variant of SequenceFileOutputFormat that writes keys and values to a sequence file in binary format.
• DBOutputFormat: used for writing to relational databases and HBase; this format sends the reduce output to a SQL table.
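A hedged sketch of selecting one of these formats in the driver (the output directory passed via args is an assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class OutputFormatExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sequence file output");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Write the reduce output as a sequence file instead of plain text
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setOutputPath(job, new Path(args[0])); // output directory supplied by the user
    }
}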
YARN Questions
32. What benefits did YARN bring in Hadoop 2.0, and how did it solve the issues of MapReduce v1?

In Hadoop 1.0, MapReduce performed both data processing and resource management; it consisted of a Job Tracker and Task Trackers. Managing jobs with a single Job Tracker and utilizing computational resources efficiently was not possible in MapReduce v1.

Following were some major issues with MapReduce v1:
• Scalability
• Availability issues
• Poor resource utilization
• Inability to run non-MapReduce jobs

YARN addressed these issues:
• Scalability: a cluster can have more than 10,000 nodes and run more than 100,000 concurrent tasks.
• Resource utilization: allows dynamic allocation of cluster resources to improve resource utilization.
• Multitenancy: can use open-source and proprietary data-access engines, perform real-time analysis, and run ad-hoc queries.
• Compatibility: applications developed for Hadoop 1 run on YARN without any disruption or availability issues.
33. Explain how YARN allocates resources to an application with the help of its architecture.

[Architecture diagram: a Client submits a job to the Resource Manager; Node Managers on each machine host containers and Application Masters; node status, MapReduce status, and resource requests flow back to the Resource Manager.]

• Resource Manager: manages resource allocation in the cluster. It has two main components:
  - Scheduler: allocates resources to the various running applications based on their requirements; it does not monitor or track the status of the applications.
  - Applications Manager: accepts job submissions and monitors and restarts Application Masters in case of failure.
• Node Manager: a per-node tracker that tracks the jobs running on that node and monitors each container's resource utilization.
• Application Master: manages the resource needs of an individual application. It interacts with the Scheduler to acquire the required resources and with the Node Manager to execute and monitor tasks.
• Container: a collection of resources such as RAM, CPU, and network bandwidth. It grants an application the right to use a specific amount of resources.
34. Which of the following has occupied the place of the JobTracker of MapReduce v1?

(a) NodeManager
(b) ApplicationMaster
(c) ResourceManager
(d) Scheduler

Answer: (c) ResourceManager
35. Write the YARN commands to check the status of an application and to kill an application.

To check the status of an application:
yarn application -status ApplicationID

To kill or terminate an application:
yarn application -kill ApplicationID
36. Can we have more than 1 ResourceManager in a YARN-based cluster?

Yes, there can be more than one ResourceManager in the case of a High Availability cluster: an active ResourceManager and a standby ResourceManager. At any particular time, only one ResourceManager is active. If the active ResourceManager fails, the standby ResourceManager comes to the rescue.
37. What are the different schedulers available in YARN?

The different schedulers available in YARN are:

1. FIFO Scheduler: places applications in a queue and runs them in the order of submission (first in, first out).
2. Capacity Scheduler: a separate, dedicated queue allows a small job to start as soon as it is submitted; large jobs finish later than they would under the FIFO Scheduler.
3. Fair Scheduler: there is no need to reserve a set amount of capacity, since it dynamically balances resources between all running jobs.
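The scheduler is selected in the ResourceManager's configuration; a hedged sketch of switching to the Fair Scheduler (the class name is quoted from memory, so treat it as an assumption to verify against your Hadoop version):

import org.apache.hadoop.conf.Configuration;

public class SchedulerConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Normally set in yarn-site.xml on the ResourceManager host
        conf.set("yarn.resourcemanager.scheduler.class",
                 "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
        System.out.println(conf.get("yarn.resourcemanager.scheduler.class"));
    }
}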
38. What happens if a ResourceManager fails while executing an application in a high availability cluster?

1. In a high availability cluster, if one ResourceManager fails, another ResourceManager becomes active.
2. The newly active ResourceManager then instructs the ApplicationMasters to abort.
3. The ResourceManager recovers its running state by taking advantage of the container statuses sent from all Node Managers.
39. In a cluster of 10 datanodes, each having 16 GB RAM and 10 cores, what would be the total processing capacity of the cluster?

• Every node in a Hadoop cluster has multiple processes running, and these processes need RAM.
• The machine also has its own OS-level processes, which need some RAM as well.
• So, for each of the 10 datanodes, you need to set aside at least 20-30% of the resources for these overheads, Cloudera-based services, etc.
• That leaves roughly 11-12 GB of RAM and 6-7 cores available on every machine for processing. Multiply that by 10, and that is the processing capacity of the cluster (see the worked estimate below).
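A worked estimate under the 20-30% overhead assumption stated above:

RAM: 16 GB x (100% - 25-30%) ≈ 11-12 GB usable per node; 10 nodes x 11-12 GB ≈ 110-120 GB for processing
Cores: 10 cores - 3-4 cores of overhead ≈ 6-7 usable cores per node; 10 nodes x 6-7 cores ≈ 60-70 cores for processing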
40. What happens if the requested memory or CPU cores go beyond the size of the container allocation?

If an application needs more memory or CPU cores than the maximum container allocation allows, it cannot fit into a container, so the application fails.
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions & Answers | Simplilearn

More Related Content

What's hot

The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...Databricks
Ā 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark ArchitectureAlexey Grishchenko
Ā 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
Ā 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
Ā 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoDatabricks
Ā 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
Ā 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Edureka!
Ā 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop securitybigdatagurus_meetup
Ā 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
Ā 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
Ā 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
Ā 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
Ā 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureDatabricks
Ā 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFSBhavesh Padharia
Ā 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overviewDataArt
Ā 
Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveDataWorks Summit
Ā 
What to Expect From Oracle database 19c
What to Expect From Oracle database 19cWhat to Expect From Oracle database 19c
What to Expect From Oracle database 19cMaria Colgan
Ā 

What's hot (20)

The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
Ā 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
Ā 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Ā 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Ā 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
Ā 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Ā 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
Ā 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Ā 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
Ā 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
Ā 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Ā 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
Ā 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
Ā 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
Ā 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
Ā 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
Ā 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Ā 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
Ā 
Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache Hive
Ā 
What to Expect From Oracle database 19c
What to Expect From Oracle database 19cWhat to Expect From Oracle database 19c
What to Expect From Oracle database 19c
Ā 

Similar to Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions & Answers | Simplilearn

big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing datapreetik9044
Ā 
Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxsunithachphd
Ā 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersMindsMapped Consulting
Ā 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
Ā 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
Ā 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptxssuser6e8e41
Ā 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFSSiddharth Mathur
Ā 
Introduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxIntroduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxSakthiVinoth78
Ā 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFSSatyaHadoop
Ā 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
Ā 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop ArchitectureDelhi/NCR HUG
Ā 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesKelly Technologies
Ā 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryIJRESJOURNAL
Ā 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
Ā 
HDFS+basics.pptx
HDFS+basics.pptxHDFS+basics.pptx
HDFS+basics.pptxAyush .
Ā 

Similar to Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions & Answers | Simplilearn (20)

big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing data
Ā 
Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptx
Ā 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
Ā 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
Ā 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
Ā 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
Ā 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Ā 
Introduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxIntroduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptx
Ā 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFS
Ā 
Hdfs
HdfsHdfs
Hdfs
Ā 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
Ā 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
Ā 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
Ā 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
Ā 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Ā 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Ā 
HDFS+basics.pptx
HDFS+basics.pptxHDFS+basics.pptx
HDFS+basics.pptx
Ā 
Hdfs
HdfsHdfs
Hdfs
Ā 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
Ā 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
Ā 

More from Simplilearn

ChatGPT in Cybersecurity
ChatGPT in CybersecurityChatGPT in Cybersecurity
ChatGPT in CybersecuritySimplilearn
Ā 
Whatis SQL Injection.pptx
Whatis SQL Injection.pptxWhatis SQL Injection.pptx
Whatis SQL Injection.pptxSimplilearn
Ā 
Top 5 High Paying Cloud Computing Jobs in 2023
 Top 5 High Paying Cloud Computing Jobs in 2023  Top 5 High Paying Cloud Computing Jobs in 2023
Top 5 High Paying Cloud Computing Jobs in 2023 Simplilearn
Ā 
Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Simplilearn
Ā 
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Simplilearn
Ā 
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...Simplilearn
Ā 
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Simplilearn
Ā 
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...Simplilearn
Ā 
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Simplilearn
Ā 
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...Simplilearn
Ā 
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Simplilearn
Ā 
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Simplilearn
Ā 
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Simplilearn
Ā 
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...Simplilearn
Ā 
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...Simplilearn
Ā 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...Simplilearn
Ā 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...Simplilearn
Ā 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Simplilearn
Ā 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...Simplilearn
Ā 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...Simplilearn
Ā 

More from Simplilearn (20)

ChatGPT in Cybersecurity
ChatGPT in CybersecurityChatGPT in Cybersecurity
ChatGPT in Cybersecurity
Ā 
Whatis SQL Injection.pptx
Whatis SQL Injection.pptxWhatis SQL Injection.pptx
Whatis SQL Injection.pptx
Ā 
Top 5 High Paying Cloud Computing Jobs in 2023
 Top 5 High Paying Cloud Computing Jobs in 2023  Top 5 High Paying Cloud Computing Jobs in 2023
Top 5 High Paying Cloud Computing Jobs in 2023
Ā 
Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024Types Of Cloud Jobs In 2024
Types Of Cloud Jobs In 2024
Ā 
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Top 12 AI Technologies To Learn 2024 | Top AI Technologies in 2024 | AI Trend...
Ā 
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
What is LSTM ?| Long Short Term Memory Explained with Example | Deep Learning...
Ā 
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Top 10 Chat GPT Use Cases | ChatGPT Applications | ChatGPT Tutorial For Begin...
Ā 
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
Ā 
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Ā 
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
Ā 
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Ā 
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Ā 
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Ā 
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
Ā 
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
Ā 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
Ā 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
Ā 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Ā 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
Ā 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
Ā 

Recently uploaded

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
Ā 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
Ā 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
Ā 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
Ā 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
Ā 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
Ā 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
Ā 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
Ā 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
Ā 
Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...
Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...
Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...Nguyen Thanh Tu Collection
Ā 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
Ā 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
Ā 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
Ā 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
Ā 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
Ā 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
Ā 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
Ā 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
Ā 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
Ā 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
Ā 

Recently uploaded (20)

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
Ā 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
Ā 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
Ā 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
Ā 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
Ā 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Ā 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
Ā 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
Ā 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
Ā 
Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...
Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...
Tį»”NG ƔN Tįŗ¬P THI VƀO Lį»šP 10 MƔN TIįŗ¾NG ANH NĂM Hį»ŒC 2023 - 2024 CƓ ĐƁP ƁN (NGį»® Ƃ...
Ā 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
Ā 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
Ā 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
Ā 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
Ā 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Ā 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
Ā 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
Ā 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
Ā 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Ā 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
Ā 

Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions & Answers | Simplilearn

  • 1.
  • 3. What are the different vendor specific distributions of Hadoop?1
  • 4. What are the different Hadoop configuration files? hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml Master and slaves 2
  • 5. What are the 3 modes in which Hadoop can run? Standalone mode Pseudo-distributed mode 1 2 3 Fully-distributed mode This is the default mode. It uses the local filesystem and a single Java process to run the Hadoop services It uses a single node Hadoop deployment to execute all the Hadoop services It uses separate nodes to run Hadoop master and slave services 3
  • 6. What are the differences between Regular file system and HDFS?4 1 2 3 1 2 3 Regular File System HDFS Data is maintained in a single system If the machine crashes, data recovery is very difficult due to low fault tolerance Seek time is more and hence it takes more time to process the data Data is distributed and maintained on multiple systems If a datanode crashes, data can still be recovered from other nodes in the cluster Time taken to read data is comparatively more as there is local data read to disc and coordination of data from multiple systems
  • 8. Why is HDFS fault tolerant? HDFS is fault tolerant as it replicates data on different datanodes. By default, a block of data gets replicated on 3 datanodes. Data Data block1 Data block2 Data block3 Data gets divided into multiple blocks Data blocks are stored in different datanodes. If one node crashes, the data can still be retrieved from other datanodes. This makes HDFS fault tolerant 5
  • 9. Explain the architecture of HDFS. [Architecture diagram: a Client performs metadata ops against the Namenode (metadata such as Name, replicas, …: /home/foo/data, 3, …) and block ops, reads and writes against Datanodes on Rack 1 and Rack 2, with blocks replicated across racks] 6
  • 10. Explain the architecture of HDFS. NameNode is the master server that hosts metadata in disk and RAM. It holds information about the various datanodes, their location, the size of each block, etc. The metadata in disk consists of the edit log and the FsImage, while the metadata in RAM holds entries such as (Name, replicas, …): /home/foo/data, 3, … 6
  • 11. Explain the architecture of HDFS. • Datanodes hold the actual data blocks and regularly send heartbeats and block reports to the Namenode • A Datanode stores and retrieves blocks when asked by the Namenode. It serves clients' read and write requests and performs block creation, deletion and replication on instruction from the Namenode 6
  • 12. What are the 2 types of metadata a Namenode server holds? Metadata in disk (the edit log and the FsImage) and metadata in RAM (entries such as (Name, replicas, …): /home/foo/data, 3, …) 7
  • 13. What is the difference between Federation and High Availability? HDFS Federation: There is no limitation to the number of namenodes and the namenodes are not related to each other; all the namenodes share a pool of metadata in which each namenode has its own dedicated pool; it provides fault tolerance, i.e. if one namenode goes down, that will not affect the data of the other namenode. HDFS High Availability: There are 2 namenodes which are related to each other; both active and standby namenodes work all the time; at a time, the active namenode is up and running while the standby namenode is idle and updates its metadata once in a while; it requires two separate machines: the active namenode is configured on one, and the standby namenode on the other 8
  • 14. If you have an input file of 350 MB, how many input splits will be created by HDFS and what is the size of each input split? [Diagram: 350 MB of data divided into blocks of 128 MB, 128 MB and 94 MB] • By default, data is divided into blocks of 128 MB. • The size of all blocks except the last block will be 128 MB. • So, there are 3 input splits in total. • The sizes of the splits are 128 MB, 128 MB and 94 MB. 9
  • 15. How does Rack Awareness work in HDFS? HDFS Rack Awareness is about having knowledge of the different datanodes and how the data is distributed across the racks of a Hadoop cluster (Block A, Block B, Block C). By default, each block of data gets replicated thrice on various datanodes present on different racks 10
  • 16. How does Rack Awareness work in HDFS? HDFS Rack Awareness is about having knowledge of the different datanodes and how the data is distributed across the racks of a Hadoop cluster (Block A, Block B, Block C). Two identical blocks cannot be placed on the same datanode 10
  • 17. How does Rack Awareness work in HDFS? HDFS Rack Awareness is about having knowledge of the different datanodes and how the data is distributed across the racks of a Hadoop cluster (Block A, Block B, Block C). When a cluster is rack aware, all the replicas of a block cannot be placed on the same rack 10
  • 18. How does Rack Awareness work in HDFS? HDFS Rack Awareness is about having knowledge of the different datanodes and how the data is distributed across the racks of a Hadoop cluster (Block A, Block B, Block C). If a datanode crashes, you can retrieve the data block from different datanodes 10
  • 19. How can you restart the Namenode and all the daemons in Hadoop? Following are the methods to do so: 1 Stop the Namenode with ./sbin/hadoop-daemon.sh stop namenode and then start it again using ./sbin/hadoop-daemon.sh start namenode 2 Stop all the daemons with ./sbin/stop-all.sh and then start them again using ./sbin/start-all.sh 11
  • 20. Which command will help you find the status of blocks and filesystem health? To check the status of the blocks: hdfs fsck <path> -files -blocks To check the health status of the filesystem: hdfs fsck / -files -blocks -locations > dfs-fsck.log 12
  • 21. What would happen if you store too many small files in a cluster on HDFS? • Storing a lot of small files on HDFS generates a lot of metadata • Storing this metadata in RAM is a challenge, as each file, block or directory takes about 150 bytes of metadata • Thus, the cumulative size of all the metadata will be too big 13
  • 22. How do you copy data from the local system on to HDFS? The following command copies data from the local file system into HDFS: hadoop fs -copyFromLocal [source] [destination] Example: hadoop fs -copyFromLocal /tmp/data.csv /user/test/data.csv 14
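The same copy can also be done from Java. Below is a minimal sketch using the Hadoop FileSystem API, reusing the /tmp/data.csv and /user/test/data.csv paths from the example above; the class name is illustrative only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);        // handle to the default (HDFS) filesystem
            // Same operation as 'hadoop fs -copyFromLocal'
            fs.copyFromLocalFile(new Path("/tmp/data.csv"),        // local source
                                 new Path("/user/test/data.csv")); // HDFS destination
            fs.close();
        }
    }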
  • 23. When do you use the dfsadmin -refreshNodes and rmadmin -refreshNodes commands? dfsadmin -refreshNodes: run via the HDFS admin client, it refreshes the node configuration for the NameNode. rmadmin -refreshNodes: performs the corresponding administrative task for the ResourceManager. 15 These commands are used to refresh the node information while commissioning or decommissioning of nodes is done
  • 24. Is there any way to change the replication of files on HDFS after they are already written to HDFS? 16 Following are the ways to change the replication of files on HDFS: We can set the dfs.replication value to a particular number in the $HADOOP_HOME/conf/hdfs-site.xml file, which will start replicating to that factor for any new content that comes in. If you want to change the replication factor for a particular file or directory, use: $HADOOP_HOME/bin/hadoop fs -setrep -w 4 <path of the file> Example: $HADOOP_HOME/bin/hadoop fs -setrep -w 4 /user/temp/test.csv
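For a file that is already written, the replication factor can also be changed programmatically. A minimal sketch using FileSystem.setReplication, reusing the /user/temp/test.csv path from the example above (class name illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChangeReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Equivalent of 'hadoop fs -setrep 4 <file>' for a single, already-written file
            boolean requested = fs.setReplication(new Path("/user/temp/test.csv"), (short) 4);
            System.out.println("Replication change requested: " + requested);
            fs.close();
        }
    }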
  • 25. Who takes care of replication consistency in a Hadoop cluster and what do you mean by under/over-replicated blocks? 17 The Namenode takes care of replication consistency in a Hadoop cluster, and the fsck command gives the information regarding over- and under-replicated blocks. Under-replicated blocks: • These are blocks that do not meet their target replication for the file they belong to • HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication Over-replicated blocks: • These are blocks that exceed their target replication for the file they belong to • Normally, over-replication is not a problem, and HDFS will automatically delete excess replicas
  • 27. What is distributed cache in MapReduce? It is a mechanism supported by the Hadoop MapReduce framework. Data coming from the disk can be cached and made available to all worker nodes where the map/reduce tasks are running for a given job. Once a file is cached for our job, Hadoop makes it available on each datanode where map/reduce tasks are running. Copy the file to HDFS: $ hdfs dfs -put jar_file.jar /user/Simplilearn/lib/jar_file.jar Set up the application's JobConf (add it in the Driver class): DistributedCache.addFileToClassPath(new Path("/user/Simplilearn/lib/jar_file.jar"), conf); 18
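The snippet above uses the older DistributedCache helper class. Newer MapReduce releases expose the same feature through the Job API; a minimal sketch is shown below, reusing the /user/Simplilearn/lib/jar_file.jar path from the slide (class name illustrative):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheFileDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "distributed cache example");
            // File must already be in HDFS (e.g. uploaded with 'hdfs dfs -put')
            job.addCacheFile(new URI("/user/Simplilearn/lib/jar_file.jar"));
            // For jars that should also be on the task classpath, the Job API offers:
            // job.addFileToClassPath(new org.apache.hadoop.fs.Path("/user/Simplilearn/lib/jar_file.jar"));
            // ... set mapper/reducer/input/output here before submitting
        }
    }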
  • 28. What role do RecordReader, Combiner and Partitioner play in a MapReduce operation? RecordReader: communicates with the InputSplit and converts the data into key-value pairs suitable for reading by the mapper. Combiner: also known as the mini reducer; it runs on each mapper's output, combining the intermediate key-value pairs locally before passing them to the partitioner. Partitioner: decides which reduce task each intermediate key-value pair is sent to and thereby how the outputs from the combiners reach the reducers; it controls the partitioning of the keys of the intermediate map outputs 19
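As an illustration of the Partitioner's role, here is a minimal custom partitioner sketch that routes keys to reducers by their first letter; the class name and the routing rule are made up for the example:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Routes intermediate (word, count) pairs to reducers based on the word's first letter.
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            if (numReduceTasks == 0) {
                return 0; // single output partition, nothing to spread
            }
            char first = key.toString().isEmpty() ? 'a' : key.toString().charAt(0);
            return (Character.toLowerCase(first) & Integer.MAX_VALUE) % numReduceTasks;
        }
    }
    // Registered in the driver with: job.setPartitionerClass(FirstLetterPartitioner.class);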
  • 29. Why is MapReduce slower in processing data in comparison to other processing frameworks? 20 • MapReduce uses batch processing to process data • It mostly uses the Java language, which is difficult to program as it involves many lines of code • It reads data from the disk and, after a particular iteration, sends results to HDFS. Such a process increases latency and makes graph processing slow
  • 30. For a MapReduce job, is it possible to change the number of mappers to be created? 21 By default, the number of mappers is always equal to the number of input splits, so it cannot be set directly. Example: if you have a 1 GB file that is split into 8 blocks (of 128 MB each), there will be only 8 mappers running on the cluster. However, there are different ways in which you can either set a property or customize your code to change the number of input splits, and hence the number of mappers (see the sketch below)
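One common way to influence the mapper count is to change the input split size, since the number of mappers follows the number of splits. A minimal sketch using FileInputFormat's split-size setter is shown below; the 64 MB figure and the class name are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "tune mapper count");
            // Mappers follow the number of input splits, so the split size is what you tune.
            // Forcing splits no larger than 64 MB roughly doubles the mapper count for large files.
            FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
            // The opposite knob, FileInputFormat.setMinInputSplitSize(job, ...), reduces the mapper count.
            // ... the rest of the driver (mapper, reducer, paths) is configured as usual
        }
    }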
  • 31. Name some Hadoop specific data types that are used in a MapReduce program. 22 Following are some Hadoop specific data types used in a MapReduce program: IntWritable FloatWritable LongWritable DoubleWritable BooleanWritable ArrayWritable MapWritable ObjectWritable
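To show a few of these types in context, here is a minimal mapper sketch that reads (LongWritable, Text) input and emits (Text, IntWritable) pairs; the class name and the tokenizing logic are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Typical use of Hadoop writables: LongWritable/Text in, Text/IntWritable out.
    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE); // emits (word, 1) pairs
                }
            }
        }
    }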
  • 32. What is speculative execution in Hadoop?23 ā€¢ If a datanode is executing any task slowly, the master node can redundantly execute another instance of the same task on another node ā€¢ The task that finishes first will be accepted and the other task is killed
  • 33. What is speculative execution in Hadoop? Scheduler Node A Task slow Task progress 23 ā€¢ If a datanode is executing any task slowly, the master node can redundantly execute another instance of the same task on another node ā€¢ The task that finishes first will be accepted and the other task is killed
  • 34. What is speculative execution in Hadoop? Scheduler Node A Task slow Task progress Node B Task duplicate Launch speculative 23 ā€¢ If a datanode is executing any task slowly, the master node can redundantly execute another instance of the same task on another node ā€¢ The task that finishes first will be accepted and the other task is killed
  • 35. What is speculative execution in Hadoop? Output Node A Task slow Node B Task duplicate 23 ā€¢ If a datanode is executing any task slowly, the master node can redundantly execute another instance of the same task on another node ā€¢ The task that finishes first will be accepted and the other task is killed
  • 36. What is speculative execution in Hadoop? Output Node A Task slow Node B Task duplicate If Node A task is slower, then the output is accepted from Node B 23 ā€¢ If a datanode is executing any task slowly, the master node can redundantly execute another instance of the same task on another node ā€¢ The task that finishes first will be accepted and the other task is killed
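Speculative execution is enabled by default and can be toggled per job through configuration properties; a minimal sketch is shown below (the class name is illustrative, and switching speculation off is only an example of when you might change the setting, e.g. for tasks that write to external systems):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SpeculationToggle {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Disable duplicate (speculative) attempts for both map and reduce tasks
            conf.setBoolean("mapreduce.map.speculative", false);
            conf.setBoolean("mapreduce.reduce.speculative", false);
            Job job = Job.getInstance(conf, "no speculation");
            // ... configure mapper/reducer/paths as usual
        }
    }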
  • 37. How is the identity mapper different from the chain mapper? Identity Mapper: It is the default mapper, chosen when no mapper is specified in the MapReduce driver class. It implements the identity function, directly writing all its key-value pairs to the output. It is defined in the old MapReduce API (MR1) in the org.apache.hadoop.mapred.lib package. Chain Mapper: This class is used to run multiple mappers in a single map task. It is defined as org.apache.hadoop.mapreduce.lib.chain.ChainMapper. The output of the first mapper becomes the input to the second mapper, the second to the third, and so on 24
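A minimal sketch of chaining two mappers with the new-API ChainMapper is shown below; UpperCaseMapper and FilterMapper are made-up example mappers, not classes from the Hadoop library:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

    public class ChainDriver {

        // First mapper: upper-cases each line (example logic only)
        public static class UpperCaseMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text(key.toString()), new Text(value.toString().toUpperCase()));
            }
        }

        // Second mapper: drops empty lines, consuming the first mapper's output
        public static class FilterMapper extends Mapper<Text, Text, Text, Text> {
            @Override
            protected void map(Text key, Text value, Context context)
                    throws IOException, InterruptedException {
                if (value.getLength() > 0) {
                    context.write(key, value);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "chained mappers");
            // The key/value classes describe each mapper's input and output types
            ChainMapper.addMapper(job, UpperCaseMapper.class,
                    LongWritable.class, Text.class, Text.class, Text.class,
                    new Configuration(false));
            ChainMapper.addMapper(job, FilterMapper.class,
                    Text.class, Text.class, Text.class, Text.class,
                    new Configuration(false));
            // ... reducer, input/output formats and paths are configured as usual
        }
    }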
  • 38. What are the major configuration parameters required in a MapReduce program? 1. Input location of the job in HDFS 2. Output location of the job in HDFS 3. Input and output format 4. Classes containing map and reduce functions 5. .jar file for mapper, reducer and driver classes 25
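A minimal driver sketch wiring up these five parameters is shown below; it uses Hadoop's built-in identity Mapper and Reducer classes as placeholders, so the mapper/reducer choices, job name and argument positions are illustrative only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class ExampleDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "example job");
            job.setJarByClass(ExampleDriver.class);            // 5. jar containing mapper, reducer and driver
            job.setMapperClass(Mapper.class);                  // 4. class with the map function (identity placeholder)
            job.setReducerClass(Reducer.class);                //    class with the reduce function (identity placeholder)
            job.setInputFormatClass(TextInputFormat.class);    // 3. input format
            job.setOutputFormatClass(TextOutputFormat.class);  //    output format
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // 1. input location in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // 2. output location in HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }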
  • 39. What do you mean by map-side join and reduce-side join in MapReduce? Map-side join: the join is performed by the mapper; each input must be divided into the same number of partitions; the input to each map is a structured partition in sorted order. Reduce-side join: the join is performed by the reducer; there is no need to have the dataset in a structured (or partitioned) form; it is easier to implement than the map-side join, as the sorting and shuffling phase sends the values having identical keys to the same reducer 26
  • 40. What is the role of the OutputCommitter class in a MapReduce job? OutputCommitter describes the commit of task output for a MapReduce job. Example: org.apache.hadoop.mapreduce.OutputCommitter (public abstract class OutputCommitter) 27 The MapReduce framework relies on the OutputCommitter of the job to: • Set up the job during initialization • Clean up the job after job completion • Set up the task's temporary output • Check whether a task needs a commit • Commit the task output • Discard the task commit
  • 41. Explain the process of spilling in MapReduce. • Spilling is the process of copying data from the memory buffer to disk when the content of the buffer reaches a certain threshold size • Spilling happens when there is not enough memory to fit all of the mapper output 28 • By default, a background thread starts spilling the content from memory to disk after 80% of the buffer size is filled • For a 100 MB buffer, spilling will start once the content of the buffer reaches a size of 80 MB
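The buffer size and spill threshold mentioned above correspond to per-job configuration properties that can be tuned; a minimal sketch follows (the 200 MB value and the class name are illustrative, while 0.80 matches the default threshold described above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SpillTuning {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Size of the in-memory sort buffer used on the map side, in MB
            conf.setInt("mapreduce.task.io.sort.mb", 200);
            // Fraction of the buffer at which a background thread starts spilling to disk
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
            Job job = Job.getInstance(conf, "spill tuning");
            // ... the rest of the job configuration follows as usual
        }
    }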
  • 42. How can you set the mappers and reducers for a MapReduce job? The number of mappers and reducers can be set in the command line using: -D mapred.map.tasks=5 -D mapred.reduce.tasks=2 In the code, one can configure JobConf variables: job.setNumMapTasks(5); // 5 mappers job.setNumReduceTasks(2); // 2 reducers 29
  • 43. What happens when a node running a map task fails before sending the output to the reducer? If such a case happens, map tasks will be assigned to a new node and the whole task will be run again to re-create the map output Map Task New node Run again to create the map output 30
  • 44. Can we write the output of MapReduce in different formats? Yes, we can write the output of MapReduce in different formats. Following are the examples: TextOutputFormat: the default output format; it writes records as lines of text. SequenceFileOutputFormat: useful for writing sequence files when the output files need to be fed into another MapReduce job as input files. MapFileOutputFormat: used to write the output as map files. SequenceFileAsBinaryOutputFormat: another variant of SequenceFileOutputFormat; it writes keys and values to a sequence file in binary format. DBOutputFormat: used for writing to relational databases and HBase; this format also sends the reduce output to a SQL table 31
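Choosing one of these formats is done in the driver. A minimal sketch that switches a job from the default text output to sequence files (class and method names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class OutputFormatChoice {
        // Called from a driver to emit sequence files instead of the default text output
        static void useSequenceFileOutput(Job job) {
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "sequence file output");
            useSequenceFileOutput(job);
            // ... mapper, reducer and paths are configured as usual
        }
    }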
  • 45. YARN
  • 46. What benefits did YARN bring in Hadoop 2.0 and how did it solve the issues of MapReduce V1? Managing jobs using a single job tracker and utilization of computational resources was inefficient in MapReduce 1 In Hadoop 1.0, MapReduce performed both data processing and resource management Data processing Resource management MapReduce consisted of Job Tracker and Task Tracker 32
  • 47. What benefits did YARN bring in Hadoop 2.0 and how did it solve the issues of MapReduce V1? Managing jobs using a single job tracker and utilization of computational resources was inefficient in MR 1. In Hadoop 1.0, MapReduce performed both data processing and resource management, and consisted of the Job Tracker and Task Tracker 32 Following were some major issues: Scalability, Availability issues, Resource utilization, Can't run non-MapReduce jobs
  • 48. What benefits did YARN bring in Hadoop 2.0 and how did it solve the issues of MapReduce V1? Scalability: can have a cluster size of more than 10,000 nodes and can run more than 100,000 concurrent tasks. Resource utilization: allows dynamic allocation of cluster resources to improve resource utilization. Multitenancy: can use open-source and proprietary data access engines and perform real-time analysis and run ad-hoc queries. Compatibility: applications developed for Hadoop 1 run on YARN without any disruption or availability issues 32
  • 49. Explain how YARN allocates resources to an application with the help of its architecture. [Architecture diagram: a Client submits a job request to the Resource Manager; Node Managers host App Masters and containers and report node status, MapReduce status and resource requests] The Resource Manager manages the resource allocation in the cluster 33
  • 50. Explain how YARN allocates resources to an application with the help of its architecture. The Resource Manager consists of the Scheduler and the Applications Manager. • The Scheduler allocates resources to the various running applications • It schedules resources based on the requirements of the applications • It does not monitor or track the status of the applications • The Applications Manager accepts job submissions • It monitors and restarts application masters in case of failure 33
  • 51. Explain how YARN allocates resources to an application with the help of its architecture. • The Node Manager is a tracker that tracks the jobs running on its node • It monitors each container's resource utilization 33
  • 52. Explain how YARN allocates resources to an application with the help of its architecture. • The Application Master manages the resource needs of individual applications • It interacts with the Scheduler to acquire the required resources • It interacts with the Node Manager to execute and monitor tasks 33
  • 53. Explain how YARN allocates resources to an application with the help of its architecture. • A Container is a collection of resources like RAM, CPU and network bandwidth • It grants an application the right to use a specific amount of resources 33
  • 54. (a) NodeManager (b) ApplicationMaster (c) ResourceManager (d) Scheduler 34 Which of the following has occupied the place of JobTracker of MapReduce V1?
  • 55. Which of the following has occupied the place of JobTracker of MapReduce V1? (a) NodeManager (b) ApplicationMaster (c) ResourceManager (d) Scheduler 34
  • 56. Write the YARN commands to check the status of an application and kill an application. To check the status of an application: yarn application -status ApplicationID To kill or terminate an application: yarn application -kill ApplicationID 35
  • 57. Can we have more than 1 ResourceManager in a YARN based cluster? Yes, there can be more than 1 ResourceManager in case of a High Availability cluster Active ResourceManager Standby ResourceManager 36
  • 58. Can we have more than 1 ResourceManager in a YARN based cluster? Yes, there can be more than 1 ResourceManager in the case of a High Availability cluster: an active ResourceManager and a standby ResourceManager 36 At a particular time, there can only be one active ResourceManager. If the active ResourceManager fails, the standby ResourceManager comes to the rescue
  • 59. What are the different schedulers available in YARN? The different schedulers available in YARN are: 1. FIFO scheduler: places applications in a queue and runs them in the order of submission (first in, first out) 2. Capacity scheduler: a separate dedicated queue allows a small job to start as soon as it is submitted; the large job finishes later than it would when using the FIFO Scheduler 3. Fair scheduler: there is no need to reserve a set amount of capacity, since it dynamically balances resources between all the running jobs 37
  • 60. What happens if a ResourceManager fails while executing an application in a high availability cluster? 38 1. In a high availability cluster, if one ResourceManager fails, another (standby) ResourceManager becomes active 2. The newly active ResourceManager instructs the ApplicationMaster to abort 3. The ResourceManager recovers its running state by taking advantage of the container statuses sent from all the Node Managers
  • 61. In a cluster of 10 datanodes, each having 16 GB RAM and 10 cores, what would be the total processing capacity of the cluster? 39 • Every node in a Hadoop cluster has multiple processes running, and these processes need RAM • The machine also has its own OS and services, which need some RAM as well • So, if you have 10 datanodes, you need to set aside at least 20-30% of each node towards these overheads, Cloudera-based services, etc. • That leaves roughly 11-12 GB of RAM and 6-7 cores available on every machine for processing. Multiply that by 10 and that's the processing capacity of the cluster
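As a rough worked example, assuming about 30% of each node is reserved for overheads: usable RAM is roughly 16 GB x 0.7, i.e. about 11 GB per node, or about 110 GB across the 10 datanodes; usable cores are roughly 7 per node, or about 70 cores for the whole cluster.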
  • 62. What happens if the requested memory or CPU cores go beyond the size of the container allocation? 40 If an application needs more memory or CPU cores than the maximum container allocation size, the request cannot be satisfied by any container, so the application fails
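The per-task memory request and the cluster-side maximum container size are both configuration properties; a minimal sketch of setting the per-task request is shown below (the 2048/4096 MB values and the class name are illustrative, and yarn.scheduler.maximum-allocation-mb, mentioned in the comment, is the cluster-side limit being referred to):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ContainerSizing {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Memory requested per map/reduce task container, in MB.
            // If this exceeds yarn.scheduler.maximum-allocation-mb on the cluster,
            // the request cannot be satisfied and the job fails.
            conf.setInt("mapreduce.map.memory.mb", 2048);
            conf.setInt("mapreduce.reduce.memory.mb", 4096);
            Job job = Job.getInstance(conf, "container sizing");
            // ... the rest of the job configuration follows as usual
        }
    }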
