1. ABOUT THE COMPANY:
• LINUX WORLD-TRAINING AND DEVELOPING CENTER.
• An ISO 9001:2008 certified organization dedicated to Linux
& Open Source Technologies across Rajasthan.
• The centre with the highest number of students scoring 100% in
the RHCSA & RHCE global exams.
2. CONTENTS:
• 1. INTRODUCTION TO BIG DATA.
• 2. SOLUTIONS TO BIG DATA PROBLEM.
• 3. HADOOP.
• 4. HDFS(DISTRIBUTED STORAGE).
• 5. MAPREDUCE(DISTRIBUTED COMPUTING).
• 6. HADOOP AUTOMATION TOOL FOR LINUX(TUI).
• 7. HADOOP AUTOMATION TOOL FOR iOS(TUI).
• 8. HADOOP AUTOMATION TOOL FOR LINUX(GUI).
• 9. ON DEMAND CLUSTER.
• 10. OPENSTACK-SAVANNA(HADOOP WITH CLOUD).
• 11. INNOVATION.
3. INTRODUCTION TO HADOOP:
• Hadoop is a free, Java-based programming framework that supports
the processing of large data sets in a distributed computing
environment. It is an Apache project sponsored by the
Apache Software Foundation.
• Open-source software. Open-source software is created and
maintained by a network of developers from around the globe. It's
free to download, use and contribute to, though more and more
commercial versions of Hadoop are becoming available.
• Framework. In this case, it means that everything you need to
develop and run software applications is provided – programs,
connections, etc.
• Massive storage. The Hadoop framework breaks big data into blocks,
which are stored on clusters of commodity hardware.
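As a rough illustration of how a file is broken into blocks, the sketch below counts the blocks and stored copies for a file of a given size. The 64 MB block size and the replication factor of 3 are assumed defaults for Hadoop version 1, not values stated in these slides, and `plan_blocks` is a hypothetical helper name.

```python
import math

BLOCK_SIZE_MB = 64   # assumption: default HDFS block size in Hadoop 1
REPLICATION = 3      # assumption: default number of copies per block


def plan_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, total block copies stored on the cluster)."""
    # The last block may be only partly filled, so round up.
    blocks = int(math.ceil(float(file_size_mb) / block_size_mb))
    return blocks, blocks * replication


# A 200 MB file needs 4 blocks (3 full, 1 partial), stored as 12 copies.
print(plan_blocks(200))  # (4, 12)
```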
4. BENEFITS OF HADOOP:
• Computing power. Its distributed computing model quickly processes
big data. The more computing nodes you use, the more processing
power you have.
• Flexibility. Unlike traditional relational databases, you don’t have to
preprocess data before storing it. You can store as much data as you
want and decide how to use it later. That includes unstructured data
like text, images and videos.
• Fault tolerance. Data and application processing are protected
against hardware failure. If a node goes down, jobs are automatically
redirected to other nodes so that the distributed computation
does not fail. Hadoop also automatically stores multiple copies of all data.
• Low cost. The open-source framework is free and uses commodity
hardware to store large quantities of data.
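The fault-tolerance idea above can be sketched in a few lines. This is a simplified illustration only, not Hadoop's actual scheduler: `reassign_tasks` and the node and task names are hypothetical, and the "least-loaded node" rule is an assumption chosen for clarity.

```python
def reassign_tasks(assignments, failed_node):
    """Move every task of failed_node to the least-loaded surviving node.

    assignments maps a node name to the list of tasks running on it.
    """
    survivors = dict(
        (node, list(tasks))
        for node, tasks in assignments.items()
        if node != failed_node
    )
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the work")
    for task in assignments.get(failed_node, []):
        # Ties are broken by node name so the result is deterministic.
        target = min(sorted(survivors), key=lambda node: len(survivors[node]))
        survivors[target].append(task)
    return survivors


cluster = {"node1": ["t1", "t2"], "node2": ["t3"], "node3": ["t4"]}
print(reassign_tasks(cluster, "node1"))
```

After `node1` fails, its two tasks are spread over `node2` and `node3`, so the job can still finish.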
8. ABOUT THE PROJECT:
• Implements high-performance distributed computing for Big Data using the Hadoop
framework, running applications on a large cluster.
• This project deals with distributed storage and distributed computing.
• Using this project we can create a supercomputing environment.
• This environment can also be accessed from an iOS device.
9. HARDWARE REQUIREMENTS:
• PROCESSOR : Pentium 4, i3 or later.
• MEMORY(RAM) : 1GB normally, 512MB in virtual machine.
• HARD DRIVE SPACE : 512MB
10. SOFTWARE REQUIREMENTS:
• OPERATING SYSTEM : Red Hat Enterprise Linux version 5 or later, Fedora, CentOS,
iOS 8.4 or later.
• Python 2.6 or later.
• JDK.
• Hadoop RPM.
• PIG, HIVE, SQOOP RPMs.
• iTunes(for iOS).
• Cydia(for iOS).
• Serverauditor(for iOS).
• CLOUDSERVICES(for iOS).
• DIALOG BOX(for TUI).
12. HADOOP AUTOMATION TOOL:
• This tool can be used in two modes: Custom and Typical.
• In Custom mode, you build the whole cluster according to your own choices.
• In Typical mode, the code builds the whole cluster automatically.
• It also supports the use of frameworks such as PIG, HIVE and SQOOP.
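A minimal sketch of what the Typical mode might automate is generating the Hadoop configuration files for a cluster. The slides do not show the tool's code, so everything here is an assumption: the helper names `render_site_xml` and `typical_cluster_config`, the host name, and the ports 9000 and 9001 are illustrative. The property names `fs.default.name`, `dfs.replication` and `mapred.job.tracker` are the standard Hadoop version 1 settings.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a dict of settings as a Hadoop *-site.xml document."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in sorted(properties.items()):
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(str(value)))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


def typical_cluster_config(namenode_host, replication=3):
    """Return {file name: XML text} for a default Hadoop 1 style cluster.

    Ports 9000 and 9001 are assumed values, not taken from the slides.
    """
    return {
        "core-site.xml": render_site_xml(
            {"fs.default.name": "hdfs://%s:9000" % namenode_host}),
        "hdfs-site.xml": render_site_xml(
            {"dfs.replication": replication}),
        "mapred-site.xml": render_site_xml(
            {"mapred.job.tracker": "%s:9001" % namenode_host}),
    }


configs = typical_cluster_config("master.example.com")  # hypothetical host
print(configs["core-site.xml"])
```

A Custom mode could reuse `render_site_xml` and simply ask the user for each value instead of filling in defaults.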
52. Access Automation Tool Using iOS Device:
• Requirement: the iOS device must be connected to the same network as
the namenode.
• The iOS device must be jailbroken.
• A security file (.pem) must be created on the namenode.
• Tweaks must be installed on the device to read and access this security
file.
• Using the .pem file, we can access the namenode.
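Assuming the .pem file is used as an SSH identity file (the slides do not name the protocol, though Serverauditor is an SSH client), the access step could look like the sketch below. `build_ssh_command`, the key path, user and address are hypothetical; `ssh -i` and `hadoop dfsadmin -report` are existing commands.

```python
def build_ssh_command(key_path, user, host, remote_command=None):
    """Build the argument list for logging in to the namenode with a .pem key."""
    command = ["ssh", "-i", key_path, "%s@%s" % (user, host)]
    if remote_command:
        # Run a single command on the namenode instead of an interactive shell.
        command.append(remote_command)
    return command


# Hypothetical key path, user and address; ask the namenode for a cluster report.
cmd = build_ssh_command("namenode.pem", "root", "192.168.1.10",
                        "hadoop dfsadmin -report")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.call` on any machine that has an SSH client and network access to the namenode.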
63. CLUSTER-ON-DEMAND:
• A virtual cluster is a group of physical or virtual machines configured
for a common purpose, with associated user accounts and storage
resources.
• Cluster-on-Demand (COD) is a system to enable rapid, automated, on-
the-fly partitioning of a physical cluster into multiple independent
virtual clusters.
• You no longer need to build your own compute cluster in order to
tackle your High Performance Computing projects.
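The partitioning idea behind Cluster-on-Demand can be illustrated with a small sketch. This is not the COD system itself: `partition_cluster`, the node names and the first-come, first-served rule are assumptions made for the example.

```python
def partition_cluster(nodes, requests):
    """Split a physical cluster into independent virtual clusters.

    nodes    -- list of physical node names
    requests -- list of (virtual cluster name, number of nodes wanted)
    Returns (allocation dict, list of nodes left unallocated).
    """
    free = list(nodes)
    allocation = {}
    for name, count in requests:
        if count > len(free):
            raise ValueError("not enough free nodes for %r" % name)
        # Hand out the next `count` free nodes to this virtual cluster.
        allocation[name] = free[:count]
        free = free[count:]
    return allocation, free


physical = ["node1", "node2", "node3", "node4", "node5"]
allocation, spare = partition_cluster(physical, [("hadoop", 3), ("test", 1)])
print(allocation)
print(spare)
```

Each virtual cluster receives its own set of nodes, and no node is shared between two virtual clusters.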
74. INNOVATION:
• Hadoop version 1 does not support replication of the namenode, although
this is very necessary for security purposes. This tool adds support for
namenode replication.
• Using an iOS device we can deploy a cluster and install the frameworks,
an option that is not currently available in the market.