4. HDFS Clients
• DFSClient: the native client
– High performance (using RPC)
– Java binding
• libhdfs: a C client interface
– Using JNI => large overhead
– Also a Java binding (requires a Hadoop installation)
Architecting the Future of Big Data Page 4
5. HFTP
• Designed for cross-version copying (DistCp)
– High performance (using HTTP)
– Read-only
– The HTTP API is proprietary
– Clients must use HftpFileSystem (hftp://)
• WebHDFS is a rewrite of HFTP
6. Design Goals
• Support a public HTTP API
• Support Read and Write
• High Performance
• Cross-version
• Security
7. WebHDFS features
• HTTP REST API
– Defines a public API
– Permits non-Java client implementations
– Supports common tools like curl/wget
• Wire Compatibility
– The REST API will be maintained for wire compatibility
– WebHDFS clients can talk to different Hadoop versions.
8. WebHDFS features (2)
• A Complete HDFS Interface
– Supports all user operations: reading files, writing to files, mkdir, chmod, chown, mv, rm, …
• High Performance
– Uses HTTP redirection to provide data locality
– File reads and writes are redirected to the corresponding datanodes
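The user operations listed above are expressed as an HTTP method plus an op query parameter. The following is a minimal sketch in Python, assuming the operation names commonly documented for the WebHDFS REST API (OPEN, CREATE, MKDIRS, SETPERMISSION, SETOWNER, RENAME, DELETE); the table OPERATIONS and the helper request_line are illustrative names, not part of WebHDFS, and the names should be checked against the documentation for the Hadoop version in use.

```python
from urllib.parse import quote, urlencode

# Shell-style command -> (HTTP method, WebHDFS op name).
# Assumption: op names as documented for the WebHDFS REST API.
OPERATIONS = {
    "read":  ("GET", "OPEN"),
    "write": ("PUT", "CREATE"),
    "mkdir": ("PUT", "MKDIRS"),
    "chmod": ("PUT", "SETPERMISSION"),
    "chown": ("PUT", "SETOWNER"),
    "mv":    ("PUT", "RENAME"),
    "rm":    ("DELETE", "DELETE"),
}


def request_line(command, path, **params):
    """Return 'METHOD /webhdfs/v1<path>?op=...' for a shell-style command."""
    method, op = OPERATIONS[command]
    # op goes first, followed by any operation-specific parameters.
    query = urlencode({"op": op, **params})
    return f"{method} /webhdfs/v1{quote(path)}?{query}"


print(request_line("mkdir", "/user/szetszwo/dir"))
print(request_line("chmod", "/user/szetszwo/w.txt", permission="644"))
```

Reads and writes (OPEN, CREATE) are the operations that the namenode redirects to a datanode; the metadata operations in this table are answered by the namenode directly.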
9. WebHDFS features (3)
• Secure Authentication
– Same as Hadoop authentication: Kerberos (SPNEGO) and Hadoop delegation tokens
– Supports proxy users
• An HDFS Built-in Component
– WebHDFS is a first-class built-in component of HDFS.
– Runs inside Namenodes and Datanodes
• Apache Open Source
– Available in Apache Hadoop 1.0 and above.
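On the wire, the authentication modes above mostly show up as query parameters. The sketch below assumes the parameter names documented for the WebHDFS REST API: user.name (when security is off), delegation (a Hadoop delegation token), and doas (the user a proxy user acts for). With Kerberos SPNEGO the negotiation happens in HTTP headers, so no query parameter is added for it. The helpers auth_params and open_url and the token value are hypothetical.

```python
from urllib.parse import urlencode


def auth_params(user=None, delegation_token=None, proxy_for=None):
    """Build the authentication-related query parameters for one request."""
    if delegation_token is not None:
        # Assumption: a delegation token is sent on its own.
        return {"delegation": delegation_token}
    params = {}
    if user is not None:
        params["user.name"] = user   # only meaningful when security is off
    if proxy_for is not None:
        params["doas"] = proxy_for   # proxy user acting for this user
    return params


def open_url(host, port, path, **auth):
    """URL that opens a file, with authentication parameters placed first."""
    query = urlencode({**auth_params(**auth), "op": "OPEN"})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


print(open_url("namenode", 50070, "/user/szetszwo/w.txt", user="szetszwo"))
print(open_url("namenode", 50070, "/user/szetszwo/w.txt",
               delegation_token="TOKEN"))  # "TOKEN" is a placeholder
```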
10. WebHDFS URI & URL
• FileSystem scheme:
webhdfs://
• FileSystem URI:
webhdfs://<HOST>:<HTTP_PORT>/<PATH>
• HTTP URL:
http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=..
– Path prefix: /webhdfs/v1
– Query: ?op=..
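The URL layout above (host, HTTP port, the /webhdfs/v1 path prefix, then an op query) can be assembled mechanically. A minimal Python sketch; the function name webhdfs_url and the extra offset parameter in the usage lines are illustrative choices, not a client library API.

```python
from urllib.parse import quote, urlencode

PATH_PREFIX = "/webhdfs/v1"  # path prefix from the slide


def webhdfs_url(host, http_port, path, op, **params):
    """Build http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OP>[&...]."""
    if not path.startswith("/"):
        path = "/" + path
    # op always comes first, then any operation-specific parameters.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{http_port}{PATH_PREFIX}{quote(path)}?{query}"


print(webhdfs_url("namenode", 50070, "/user/szetszwo/w.txt", "OPEN"))
print(webhdfs_url("namenode", 50070, "/user/szetszwo/w.txt", "OPEN", offset=0))
```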
11. URI/URL Examples
• Suppose we have the following file
hdfs://namenode:8020/user/szetszwo/w.txt
• WebHDFS FileSystem URI
webhdfs://namenode:50070/user/szetszwo/w.txt
• WebHDFS HTTP URL
http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=..
• WebHDFS HTTP URL to open the file
http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=OPEN
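The mapping from the WebHDFS FileSystem URI to the HTTP URL in this example is a fixed rewrite: same host and HTTP port, the /webhdfs/v1 prefix in front of the path, and an op query. A short Python sketch of that rewrite follows; to_http_url is a hypothetical helper. It deliberately rejects hdfs:// URIs, because those carry the RPC port (8020 in the example) rather than the HTTP port (50070).

```python
from urllib.parse import urlsplit


def to_http_url(fs_uri, op):
    """Rewrite webhdfs://host:port/path as the matching WebHDFS HTTP URL."""
    parts = urlsplit(fs_uri)
    if parts.scheme != "webhdfs":
        raise ValueError(f"expected a webhdfs:// URI, got {fs_uri!r}")
    # netloc keeps host:port exactly as written in the FileSystem URI.
    return f"http://{parts.netloc}/webhdfs/v1{parts.path}?op={op}"


print(to_http_url("webhdfs://namenode:50070/user/szetszwo/w.txt", "OPEN"))
```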
12. Example: curl
• Use curl to open a file
$ curl -i -L "http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=OPEN"
HTTP/1.1 307 TEMPORARY_REDIRECT
Content-Type: application/octet-stream
Location: http://192.168.5.2:50075/webhdfs/v1/user/szetszwo/w.txt?op=OPEN&offset=0
Content-Length: 0
Server: Jetty(6.1.26)
13. Example: curl (2)
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 21
Server: Jetty(6.1.26)
Hello, WebHDFS user!
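The two responses above are the namenode's 307 redirect followed by the datanode's 200 with the file contents; curl -L follows the Location header automatically. A client that handles the redirect itself needs to pull the datanode address out of that header. The Python sketch below parses the first response as plain text, using only the standard library; parse_redirect is a hypothetical helper and the sample text is copied from the curl output.

```python
from urllib.parse import urlsplit, parse_qs

SAMPLE = """HTTP/1.1 307 TEMPORARY_REDIRECT
Content-Type: application/octet-stream
Location: http://192.168.5.2:50075/webhdfs/v1/user/szetszwo/w.txt?op=OPEN&offset=0
Content-Length: 0
Server: Jetty(6.1.26)
"""


def parse_redirect(raw_response):
    """Return datanode host, port, path, op and offset, or None if no 307."""
    lines = raw_response.strip().splitlines()
    status = int(lines[0].split()[1])      # "HTTP/1.1 307 ..." -> 307
    headers = {}
    for line in lines[1:]:
        if not line.strip():               # blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    location = headers.get("location")
    if status != 307 or location is None:
        return None
    parts = urlsplit(location)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "op": query["op"][0],
        "offset": int(query.get("offset", ["0"])[0]),
    }


print(parse_redirect(SAMPLE))
```

The second request then goes to the datanode address from the Location header, which is what gives reads their data locality.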
14. Example: wget
• Use wget to open the same file
$ wget "http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=OPEN" -O w.txt
Resolving ...
Connecting to ... connected.
HTTP request sent, awaiting response...
307 TEMPORARY_REDIRECT
Location: http://192.168.5.2:50075/webhdfs/v1/user/szetszwo/w.txt?op=OPEN&offset=0 [following]
15. Example: wget (2)
--2012-06-13 01:42:10-- http://192.168.5.2:50075/webhdfs/v1/user/szetszwo/w.txt?op=OPEN&offset=0
Connecting to 192.168.5.2:50075... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21 [application/octet-stream]
Saving to: `w.txt'
100%[=================>] 21 --.-K/s in 0s
2012-06-13 01:42:10 (3.34 MB/s) - `w.txt' saved [21/21]