DataTorrent
HADOOP
Interacting with HDFS
→ What's the “Need”? ←
❏ Big data Ocean
❏ Expensive hardware
❏ Frequent Failures and Difficult recovery
❏ Scaling up with more machines
→ Hadoop ←
Open-source software
- a Java framework
- version 1.0 released in December 2011
It provides both,
Storage → [HDFS]
Processing → [MapReduce]
HDFS: Hadoop Distributed File System
→ How does Hadoop address the need? ←
Big data Ocean
Have multiple machines. Each will store some portion of data, not the entire data.
Expensive hardware
Use commodity hardware. Simple and cheap.
Frequent Failures and Difficult recovery
Have multiple copies of data. Keep the copies on different machines.
Scaling up with more machines
If more processing is needed, add new machines on the fly.
→ HDFS ←
Runs on Commodity hardware: Doesn't require expensive machines
Large Files; Write-once, Read-many (WORM)
Files are split into blocks
Actual blocks go to DataNodes
The metadata is stored at NameNode
Replicate blocks to different nodes
Default configuration:
Block size = 128 MB (Hadoop 2; 64 MB in Hadoop 1)
Replication factor = 3
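These defaults live in hdfs-site.xml and vary per cluster; a quick way to check what a cluster actually uses is the getconf command covered later (property names are the standard Hadoop 2 ones):
hdfs getconf -confKey dfs.blocksize    # e.g. 134217728 (128 MB)
hdfs getconf -confKey dfs.replication  # e.g. 3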
→ Where NOT TO use Hadoop/HDFS ←
Low latency data access
HDFS is optimized for high throughput of data at the expense of latency.
Large number of small files
The NameNode keeps the entire file-system metadata in memory.
With many small files, metadata grows out of proportion to the actual data.
Multiple writers / Arbitrary file modifications
No support for multiple concurrent writers on a file
Writes always append to the end of a file (see the append example below)
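Appending is the one supported mutation; a minimal sketch using the standard -appendToFile shell command (file names are illustrative):
hdfs dfs -appendToFile extra-lines.txt /user/USERNAME/demo/data/file.txt
# contents of the local file extra-lines.txt are appended to the end of the HDFS file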
→ Some Key Concepts ←
❏ NameNode
❏ DataNodes
❏ JobTracker (MR v1)
❏ TaskTrackers (MR v1)
❏ ResourceManager (MR v2)
❏ NodeManagers (MR v2)
❏ ApplicationMasters (MR v2)
→ NameNode & DataNodes ←
❏ NameNode:
Centerpiece of HDFS: the Master
Stores only the block metadata: block names, block locations, etc.
Critical component; when down, the whole cluster is considered down; single point of failure
Should be configured with higher RAM
❏ DataNode:
Stores the actual data: the Slave
In constant communication with the NameNode
When down, it does not affect the availability of data/cluster, thanks to replication
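To see this master/slave split on a live cluster, the dfsadmin report shows the NameNode's view of every DataNode (a sketch; requires a running cluster):
hdfs dfsadmin -report
# prints total/used capacity, then one section per live or dead DataNode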
→ JobTracker & TaskTrackers ←
❏ JobTracker:
Talks to the NameNode to determine the location of the data
Monitors all TaskTrackers and submits the status of the job back to the client
When down, HDFS is still functional; no new MR jobs can be submitted, and running jobs halt
Replaced by ResourceManager/ApplicationMaster in MRv2
❏ TaskTracker:
Runs on all DataNodes
Reports task progress to the JobTracker
TaskTracker failure is not considered fatal
→ ResourceManager & NodeManager ←
❏ Present in Hadoop v2.0
❏ Equivalent of JobTracker & TaskTracker in v1.0
❏ ResourceManager (RM):
Usually runs on the NameNode machine; distributes resources among applications
Two main components: Scheduler and ApplicationsManager
❏ NodeManager (NM):
Per-node framework agent
Responsible for launching and managing containers
Monitors their resource usage
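The YARN counterpart of the dfsadmin report: the yarn CLI lists the NodeManagers registered with the ResourceManager (a sketch, assuming a running Hadoop 2 cluster):
yarn node -list         # one line per NodeManager: id, state, running containers
yarn application -list  # running applications (id, state, tracking URL)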
→ Hadoop 1.0 vs. 2.0 ←
HDFS 1.0:
Single point of failure
Horizontal scaling limited by the single NameNode's memory
HDFS 2.0:
HDFS High Availability
HDFS Snapshot
Improved performance
HDFS Federation
(Diagram: HDFS Federation)
→ Interacting with HDFS ←
Command prompt:
Similar to Linux terminal commands
Unix is the model, POSIX is the API
Web Interface:
Similar to browsing an FTP site on the web
Interacting With HDFS
On Command Prompt
→ Notes ←
File Paths on HDFS:
hdfs://<namenode>:<port>/path/to/file.txt
hdfs://127.0.0.1:8020/user/USERNAME/demo/data/file.txt
hdfs://localhost:8020/user/USERNAME/demo/data/file.txt
/user/USERNAME/demo/file.txt
demo/file.txt
File System:
Local: the local (Linux) file system
HDFS: the Hadoop distributed file system
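Note that the path forms above can all name the same file: a path with no scheme uses the default file system from core-site.xml (fs.defaultFS in Hadoop 2), and a relative path is resolved against the user's HDFS home directory, /user/<username>. So, for user USERNAME these are equivalent:
hdfs dfs -ls demo/file.txt
hdfs dfs -ls /user/USERNAME/demo/file.txt
hdfs dfs -ls hdfs://localhost:8020/user/USERNAME/demo/file.txt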
→ Before we start ←
Command:
hdfs
Usage:
hdfs [--config confdir] COMMAND
Example:
hdfs dfs
hdfs dfsadmin
hdfs fsck
hdfs `dfs` commands
→ General syntax for `dfs` commands ←
hdfs
dfs
-<COMMAND>
-[OPTIONS]
<PARAMETERS>
e.g.
hdfs dfs -ls -R /user/USERNAME/demo/data/
0. Do it yourself
Syntax:
hdfs dfs -help [COMMAND … ]
hdfs dfs -usage [COMMAND … ]
Example:
hdfs dfs -help cat
hdfs dfs -usage cat
1. List the file/directory
Syntax:
hdfs dfs -ls [-d] [-h] [-R] <hdfs-dir-path>
Example:
hdfs dfs -ls
hdfs dfs -ls /
hdfs dfs -ls /user/USERNAME/demo/list-dir-example
hdfs dfs -ls -R /user/USERNAME/demo/list-dir-example
2. Create a directory
Syntax:
hdfs dfs -mkdir [-p] <hdfs-dir-path>
Example:
hdfs dfs -mkdir /user/USERNAME/demo/create-dir-example
hdfs dfs -mkdir -p /user/USERNAME/demo/create-dir-example/dir1/dir2/dir3
3. Create a file on local & put it on HDFS
Syntax:
vi filename.txt
hdfs dfs -put [options] <local-file-path> <hdfs-dir-path>
Example:
vi file-copy-to-hdfs.txt
hdfs dfs -put file-copy-to-hdfs.txt /user/USERNAME/demo/put-example/
4. Get a file from HDFS to local
Syntax:
hdfs dfs -get <hdfs-file-path> [local-dir-path]
Example:
hdfs dfs -get /user/USERNAME/demo/get-example/file-copy-from-hdfs.txt ~/demo/
5. Copy From LOCAL To HDFS
Syntax:
hdfs dfs -copyFromLocal <local-file-path> <hdfs-file-path>
Example:
hdfs dfs -copyFromLocal file-copy-to-hdfs.txt /user/USERNAME/demo/copyFromLocal-example/
6. Copy To LOCAL From HDFS
Syntax:
hdfs dfs -copyToLocal <hdfs-file-path> <local-file-path>
Example:
hdfs dfs -copyToLocal /user/USERNAME/demo/copyToLocal-example/file-copy-from-hdfs.txt ~/demo/
7. Move a file from local to HDFS
Syntax:
hdfs dfs -moveFromLocal <local-file-path> <hdfs-dir-path>
Example:
hdfs dfs -moveFromLocal /path/to/file.txt /user/USERNAME/demo/moveFromLocal-example/
8. Copy a file within HDFS
Syntax:
hdfs dfs -cp <hdfs-source-file-path> <hdfs-dest-file-path>
Example:
hdfs dfs -cp /user/USERNAME/demo/copy-within-hdfs/file-copy.txt /user/USERNAME/demo/data/
9. Move a file within HDFS
Syntax:
hdfs dfs -mv <hdfs-source-file-path> <hdfs-dest-file-path>
Example:
hdfs dfs -mv /user/USERNAME/demo/move-within-hdfs/file-move.txt /user/USERNAME/demo/data/
10. Merge HDFS files into one local file
Syntax:
hdfs dfs -getmerge [-nl] <hdfs-dir-path> <local-file-path>
Examples:
hdfs dfs -getmerge -nl /user/USERNAME/demo/merge-example/ /path/to/all-files.txt
11. View file contents
Syntax:
hdfs dfs -cat <hdfs-file-path>
hdfs dfs -tail <hdfs-file-path>
hdfs dfs -text <hdfs-file-path>
Examples:
hdfs dfs -cat /user/USERNAME/demo/data/cat-example.txt
hdfs dfs -cat /user/USERNAME/demo/data/cat-example.txt | head
12. Remove files/dirs from HDFS
Syntax:
hdfs dfs -rm [options] <hdfs-file-path>
Examples:
hdfs dfs -rm /user/USERNAME/demo/remove-example/remove-file.txt
hdfs dfs -rm -R /user/USERNAME/demo/remove-example/
hdfs dfs -rm -R -skipTrash /user/USERNAME/demo/remove-example/
13. Change file/dir properties
Syntax:
hdfs dfs -chgrp [-R] <NewGroupName> <hdfs-file-path>
hdfs dfs -chmod [-R] <permissions> <hdfs-file-path>
hdfs dfs -chown [-R] <NewOwnerName> <hdfs-file-path>
Examples:
hdfs dfs -chmod -R 777 /user/USERNAME/demo/data/file-change-properties.txt
14. Check the file size
Syntax:
hdfs dfs -du <hdfs-file-path>
Examples:
hdfs dfs -du /user/USERNAME/demo/data/file.txt
hdfs dfs -du -s -h /user/USERNAME/demo/data/
15. Create a zero byte file in HDFS
Syntax:
hdfs dfs -touchz <hdfs-file-path>
Examples:
hdfs dfs -touchz /user/USERNAME/demo/data/zero-byte-file.txt
16. File test operations
Syntax:
hdfs dfs -test -[defsz] <hdfs-file-path>
Examples:
hdfs dfs -test -e /user/USERNAME/demo/data/file.txt
echo $?
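Because -test reports only through the exit status, it is mostly used in scripts; a minimal sketch (path illustrative):
if hdfs dfs -test -e /user/USERNAME/demo/data/file.txt; then
  echo "file exists on HDFS"
fi
# the other flags: -d directory, -f regular file, -s non-empty, -z zero length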
17. Get FileSystem Statistics
Syntax:
hdfs dfs -stat [format] <hdfs-file-path>
Format Options:
%b - file size in bytes
%g - group name of owner
%n - filename
%o - block size
%r - replication
%u - user name of owner
%y - modification date
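The specifiers can be combined into one format string; an illustrative invocation:
hdfs dfs -stat "%n %r %o %y" /user/USERNAME/demo/data/file.txt
# prints filename, replication, block size, and modification date on one line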
18. Get File/Dir Counts
Syntax:
hdfs dfs -count [-q] [-h] [-v] <hdfs-file-path>
Example:
hdfs dfs -count -v /user/USERNAME/demo/
19. Set replication factor
Syntax:
hdfs dfs -setrep [-w] [-R] <n> <hdfs-file-path>
Examples:
hdfs dfs -setrep -w -R 2 /user/USERNAME/demo/data/file.txt
20. Set Block Size
Syntax:
hdfs dfs -D dfs.blocksize=blocksize -copyFromLocal <local-file-path> <hdfs-file-path>
Examples:
hdfs dfs -D dfs.blocksize=67108864 -copyFromLocal /path/to/file.txt /user/USERNAME/demo/block-example/
21. Empty the HDFS trash
Syntax:
hdfs dfs -expunge
Trash location: /user/<username>/.Trash
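Note: the trash is active only when fs.trash.interval (minutes, set in core-site.xml) is greater than zero; a quick check, reusing getconf:
hdfs getconf -confKey fs.trash.interval
# 0 (the default) means trash is disabled and -rm deletes immediately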
Other hdfs commands (admin)
22. HDFS Admin Commands: fsck
Syntax:
hdfs fsck <hdfs-file-path>
Options:
[-list-corruptfileblocks |
[-move | -delete | -openforwrite]
[-files [-blocks [-locations | -racks]]]]
[-includeSnapshots]
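Example (illustrative; checking the whole namespace typically requires HDFS superuser rights):
hdfs fsck / -files -blocks -locations
# reports total blocks, replication health, and any corrupt or under-replicated blocks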
23. HDFS Admin Commands: dfsadmin
Syntax:
hdfs dfsadmin
Options:
[-report [-live] [-dead] [-decommissioning]]
[-safemode enter | leave | get | wait]
[-refreshNodes]
[-refresh <host:ipc_port> <key> [arg1..argn]]
[-shutdownDatanode <datanode:port> [upgrade]]
[-getDatanodeInfo <datanode_host:ipc_port>]
[-help [cmd]]
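Examples (illustrative invocations; these require HDFS admin privileges):
hdfs dfsadmin -report         # cluster capacity and per-DataNode status
hdfs dfsadmin -safemode get   # prints whether safe mode is ON or OFF
hdfs dfsadmin -safemode leave # force the NameNode out of safe mode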
24. HDFS Admin Commands: namenode
Syntax:
hdfs namenode
Options:
[-checkpoint] |
[-format [-clusterid cid ] [-force] [-nonInteractive] ] |
[-upgrade [-clusterid cid] ] |
[-rollback] |
[-recover [-force] ] |
[-metadataVersion ]
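Examples (illustrative; note that -format initializes a brand-new NameNode and erases existing metadata, so it is run only when first setting up a cluster):
hdfs namenode -format -nonInteractive
hdfs namenode -recover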
25. HDFS Admin Commands: getconf
Syntax:
hdfs getconf [-options]
Options:
[ -namenodes ] [ -secondaryNameNodes ]
[ -backupNodes ] [ -includeFile ]
[ -excludeFile ] [ -nnRpcAddresses ]
[ -confKey [key] ]
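Example (illustrative invocations):
hdfs getconf -namenodes              # hostnames of the NameNode(s)
hdfs getconf -confKey dfs.blocksize  # value of any configuration key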
Again: THE most important commands!
Syntax:
hdfs dfs -help [options]
hdfs dfs -usage [options]
Examples:
hdfs dfs -help help
hdfs dfs -usage usage
Interacting With HDFS
In Web Browser
Web HDFS
URL:
http://namenode:50070/explorer.html
Examples:
http://localhost:50070/explorer.html
http://ec2-52-23-214-111.compute-1.amazonaws.com:50070/explorer.html
References
1. http://www.hadoopinrealworld.com
2. http://www.slideshare.net/sanjeeb85/hdfscommandreference
3. http://www.slideshare.net/jaganadhg/hdfs-10509123
4. http://www.slideshare.net/praveenbhat2/adv-os-presentation
5. http://www.tomsitpro.com/articles/hadoop-2-vs-1,2-718.html
6. http://www.snia.org/sites/default/files/Hadoop2_New_And_Noteworthy_SNIA_v3.pdf
7. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
8. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
Thank You!!
APPENDIX
Copy data from one cluster to another
Description:
Copy data between Hadoop clusters
Syntax:
hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo
hadoop distcp hdfs://nn1:8020/foo/a hdfs://nn1:8020/foo/b hdfs://nn2:8020/bar/foo
hadoop distcp -f hdfs://nn1:8020/srclist.file hdfs://nn2:8020/bar/foo
Where srclist.file contains
hdfs://nn1:8020/foo/a
hdfs://nn1:8020/foo/b
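distcp also takes flags controlling what happens when files already exist at the destination (by default they are skipped); two common alternatives:
hadoop distcp -update hdfs://nn1:8020/foo hdfs://nn2:8020/foo
# copies only files that differ from the destination
hadoop distcp -overwrite hdfs://nn1:8020/foo hdfs://nn2:8020/foo
# unconditionally overwrites files at the destination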
Editor's Notes
  1. Commodity hardware: affordable and easy to obtain; capable of running Windows, Linux, or MS-DOS without requiring any special devices or equipment; broadly compatible and able to function on a plug-and-play basis; a low-end but functional product without distinctive features.
  2. BLOCK: A physical storage disk has a block size - the minimum amount of data it can read or write, normally 512 bytes. File systems for a single disk also deal with data in blocks, normally a few kilobytes (4 KB). Hadoop has a much larger block size: 64 MB by default in Hadoop 1 (128 MB in Hadoop 2). Files in HDFS are broken down into block-sized chunks and stored as independent units. However, files smaller than a block do not occupy the entire block. Why such large blocks?
  3. > NameNode does not store the actual data or the dataset; the data itself is stored in the DataNodes. > NameNode knows the list of blocks and their locations for any given file in HDFS. > With this information, NameNode knows how to construct the file from blocks.
  4. JobTracker finds the best TaskTracker nodes to execute tasks based on: -data locality -available slots to execute a task on a given node
  5. HDFS High Availability: NameNode metadata is written to shared storage (Journal Manager). Only one active NN can write to the shared storage. Passive NNs read & replay metadata from the shared storage. When the active NN fails, one of the passive NNs is promoted to active. Snapshot: able to store a checkpointed state of HDFS. Improved performance: multithreaded random read - HDFS v1: 264 MB/sec, HDFS v2: 1395 MB/sec (about 5X!). Federation: the NameNode stores metadata in memory; for very large numbers of files, the NameNode could exhaust memory; spread metadata over multiple NameNodes.
  6. Details about HDFS Federation.
  7. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] Generic options supported are -conf <configuration file> specify an application configuration file -D <property=value> use value for given property -fs <local|namenode:port> specify a namenode -jt <local|resourcemanager:port> specify a ResourceManager -files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
  9. Everything you need to know about hdfs commands: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
  10. Description: List the contents that match the specified pattern. If no path is specified, the contents of /user/<current_user> are listed.
Options:
-d : directories are listed as plain files
-h : formats file sizes in a human-readable fashion rather than a number of bytes
-R : recursively list the contents of directories
Output: <permissions> <-/#replicas> <userid> <groupid> <size (bytes)> <modification_date> <directory/file name>
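A minimal usage sketch, assuming a running cluster (the demo path is hypothetical):
  hdfs dfs -ls demo         # lists /user/<current_user>/demo
  hdfs dfs -ls -R -h demo   # recurse, with human-readable sizes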
  11. Description: Create a directory at the specified location.
Options:
-p : create intermediate directories along the specified path if they do not exist
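For example (hypothetical path):
  hdfs dfs -mkdir demo
  hdfs dfs -mkdir -p demo/data/2016   # creates intermediate directories as needed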
  12. Description: Copy files into HDFS.
Options:
-f : if the file already exists, copying does not fail and the destination is overwritten
-p : preserves access time, modification time, ownership and the mode
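A possible invocation (file name and path are hypothetical):
  hdfs dfs -put file.txt demo/
  hdfs dfs -put -f file.txt demo/   # overwrite if the destination already exists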
  13. Description: Copy files from HDFS to the local filesystem. When copying multiple files, the destination must be a directory.
Options:
-p : preserves access time, modification time, ownership and the mode
-ignorecrc : files that fail the CRC check may be copied with this option
-crc : files and CRCs may be copied using this option
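For example (hypothetical paths):
  hdfs dfs -get demo/file.txt .
  hdfs dfs -get -p demo/file.txt /tmp/   # preserve times, ownership and mode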
  14. Description: Copy files from the local filesystem into HDFS. It is the same as the “put” command but more specific w.r.t. the local filesystem.
Options:
-f : if the file already exists, copying does not fail and the destination is overwritten
-p : preserves access time, modification time, ownership and the mode
  15. Description: Copy files from HDFS to the local filesystem. When copying multiple files, the destination must be a directory. It is the same as the “get” command but more specific w.r.t. the local filesystem.
Options:
-p : preserves access time, modification time, ownership and the mode
-ignorecrc : files that fail the CRC check may be copied with this option
-crc : files and CRCs may be copied using this option
  16. Description: Same as -copyFromLocal, except that the source is deleted after it's copied.
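Illustrative invocations of the three local-transfer variants above (file names are hypothetical):
  hdfs dfs -copyFromLocal -f data.csv demo/
  hdfs dfs -copyToLocal demo/data.csv /tmp/
  hdfs dfs -moveFromLocal staging.csv demo/   # the local copy is deleted afterwards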
  17. Description: Copy files within HDFS. A file pattern can be specified. When copying multiple files, the destination must be a directory.
Options:
-f : if the file already exists, copying does not fail and the destination is overwritten
-p : preserves access time, modification time, ownership and the mode
  18. Description: Move files within HDFS. A file pattern can be specified. When moving multiple files, the destination must be a directory.
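For example (hypothetical paths):
  hdfs dfs -cp -f demo/a.txt demo/backup/
  hdfs dfs -mv demo/old.txt demo/new.txt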
  19. Description: Get all the files in the directories that match the source file pattern, then merge and sort them into a single file on the local filesystem. <src> is kept.
Options:
-nl : add a newline character at the end of each file
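A sketch of typical usage (paths are hypothetical):
  hdfs dfs -getmerge -nl demo/logs merged.txt   # adds a newline after each part file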
  20.
-cat [-ignoreCrc] <src> ... : fetch all files that match the file pattern <src> and display their content on stdout
-tail [-f] <file> : show the last 1KB of the file; -f shows appended data as the file grows
-text [-ignoreCrc] <src> ... : takes a source file and outputs the file in text format; the allowed formats are zip, TextRecordInputStream, and Avro
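For example (hypothetical paths):
  hdfs dfs -cat demo/file.txt
  hdfs dfs -tail -f demo/app.log    # follow appended data
  hdfs dfs -text demo/events.avro   # decode to text on stdout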
  21. Description: Delete all files that match the specified file pattern. Equivalent to the Unix command "rm <src>".
Options:
-skipTrash : bypasses the trash, if enabled, and immediately deletes <src>
-f : if the file does not exist, do not display a diagnostic message or modify the exit status to reflect an error
-[rR] : recursively deletes directories
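For example (hypothetical paths):
  hdfs dfs -rm demo/file.txt
  hdfs dfs -rm -r -skipTrash demo/tmp   # delete a directory, bypassing the trash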
  22.
-chgrp [-R] GROUP PATH... : equivalent to -chown ... :GROUP ...
-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH... : changes permissions of a file; works like the shell's chmod command with a few exceptions. -R modifies the files recursively and is the only option currently supported. <MODE> is the same as the mode used for the shell's command.
-chown [-R] [OWNER][:[GROUP]] PATH... : changes owner and group of a file; similar to the shell's chown command with a few exceptions. -R modifies the files recursively and is the only option currently supported.
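Illustrative invocations (user, group and paths are hypothetical):
  hdfs dfs -chmod -R 750 demo
  hdfs dfs -chown -R alice:analytics demo
  hdfs dfs -chgrp analytics demo/file.txt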
  23. -du [-s] [-h] <path> ... : Show the amount of space, in bytes, used by the files that match the specified file pattern.
Options:
-s : rather than showing the size of each individual file that matches the pattern, show the total (summary) size
-h : formats file sizes in a human-readable fashion rather than a number of bytes
Note that, even without the -s option, this only shows size summaries one level deep into a directory. The output is of the form: size disk_space_consumed name(full path)
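For example (hypothetical path):
  hdfs dfs -du -h demo      # per-entry sizes, human-readable
  hdfs dfs -du -s -h demo   # one summary line for the whole tree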
  24. -touchz <path> ... : Creates a file of zero length at <path> with current time as the timestamp of that <path>. An error is returned if the file exists with non-zero length
  25. Options:
-d : if the path is a directory, return 0
-e : if the path exists, return 0
-f : if the path is a file, return 0
-s : if the path is not empty, return 0
-z : if the file is zero length, return 0
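A sketch combining -touchz and -test, relying on the exit status in the shell (paths are hypothetical):
  hdfs dfs -touchz demo/_SUCCESS
  hdfs dfs -test -e demo/_SUCCESS && echo "marker exists"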
  26. -stat [format] <path> ... : Print statistics about the file/directory at <path> in the specified format. The format accepts: filesize in blocks (%b), group name of owner (%g), filename (%n), block size (%o), replication (%r), user name of owner (%u), and modification date (%y, %Y).
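For example (hypothetical path; the format string picks name, block size, replication and modification time):
  hdfs dfs -stat "%n %o %r %y" demo/file.txt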
  27. -count [-q] [-h] [-v] <path> ... : Count the number of directories, files and bytes under the paths that match the specified file pattern.
-h : shows file sizes in human-readable format
-v : displays a header line
The output columns are: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
or, with the -q option: QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
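For example (hypothetical path):
  hdfs dfs -count -h demo
  hdfs dfs -count -q demo   # include the quota columns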
  28. -setrep [-R] [-w] <rep> <path> ... : Set the replication level of a file. If <path> is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at <path>.
-w : requests that the command wait for the replication to complete; this can potentially take a very long time
-R : accepted for backwards compatibility; it has no effect
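For example (hypothetical path):
  hdfs dfs -setrep -w 2 demo/file.txt   # wait until re-replication completes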
  29. The block size specified by dfs.blocksize must be a multiple of 512 (the checksum chunk size). For example, -copyFromLocal fails with an error if the block size is not valid: "Invalid values: dfs.bytes-per-checksum (=512) must divide block size (=104857601)."
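For illustration, a per-command override using the -D generic option (the value and path are hypothetical):
  hdfs dfs -D dfs.blocksize=134217728 -put big.log demo/   # 128 MB, a valid multiple of 512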
  30. -expunge : Empty the trash.
To enable the HDFS trash, set fs.trash.interval to a value greater than 0 (in minutes) in core-site.xml.
Deleted data goes to the HDFS folder /user/<username>/.Trash/
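A sketch of the trash in action, assuming fs.trash.interval has been set (path is hypothetical):
  hdfs dfs -rm demo/file.txt   # moved to /user/<username>/.Trash/ rather than deleted
  hdfs dfs -expunge            # empty the trash now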
  31. Options:
-move : move corrupted files to /lost+found
-delete : delete corrupted files
-files : print out files being checked
-openforwrite : print out files opened for write
-includeSnapshots : include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
-list-corruptfileblocks : print out a list of missing blocks and the files they belong to
-blocks : print out a block report
-locations : print out locations for every block
-racks : print out network topology for data-node locations
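For example (hypothetical path):
  hdfs fsck /user/USERNAME/demo -files -blocks -locations
  hdfs fsck / -list-corruptfileblocks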
  32. Command: hdfs dfsadmin -help report
Description: Reports basic filesystem information and statistics. The dfs usage can differ from "du" usage because it measures raw space used by replication, checksums, snapshots, etc. on all the DataNodes. Optional flags may be used to filter the list of displayed DataNodes.
Options: -report [-live] [-dead] [-decommissioning]
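For example:
  hdfs dfsadmin -report          # all datanodes
  hdfs dfsadmin -report -live    # only live datanodes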
  33. hdfs getconf
-namenodes : gets the list of namenodes in the cluster
-secondaryNameNodes : gets the list of secondary namenodes in the cluster
-backupNodes : gets the list of backup nodes in the cluster
-includeFile : gets the include file path that defines the datanodes that can join the cluster
-excludeFile : gets the exclude file path that defines the datanodes that need to be decommissioned
-nnRpcAddresses : gets the namenode RPC addresses
-confKey [key] : gets a specific key from the configuration
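For example:
  hdfs getconf -namenodes
  hdfs getconf -confKey dfs.blocksize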