CONTENT
1. Introduction
2. List of papers
3. Review process adopted
4. List of issues
5. List of solution approaches
6. Issue-wise review
7. Strengths and Weaknesses
8. Scope of our work
9. Conclusion
10. References
INTRODUCTION
Human beings now create 2.5 quintillion bytes of data per
day. The rate of data creation has increased so much that 90%
of the data in the world today has been created in the last two
years alone.
The term Big Data refers to large-scale information
management and analysis technologies that exceed the
capability of traditional data processing technologies.
The incorporation of Big Data is changing Business Intelligence
and Analytics by providing new tools and opportunities for
leveraging large quantities of structured and unstructured
data.
Big data analysis: the efficient and effective handling of large data.
LIST OF PAPERS
1) “Mobile Agent based New Framework for Improving Big Data Analysis” (2013)
2) “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data” (2013)
3) “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things” (2013)
4) “Road Traffic Big Data Collision Analysis Processing Framework” (2012)
5) “RUBA: Real-time Unstructured Big Data Analysis Framework” (2013)
6) “An Integrated Framework for Disaster Event Analysis in Big Data Environment” (2013)
7) “Large Imbalance Data Classification Based on MapReduce for Traffic Accident Prediction” (2014)
8) “Addressing Big Data Problem Using Hadoop and Map Reduce” (2012)
9) “Big R: Large-scale Analytics on Hadoop using R” (2014)
10) “High Performance and Fault Tolerant Distributed File System for Big Data Storage and Processing using Hadoop” (2014)
11) “Big Data Analysis Using Apache Hadoop” (2012)
12) “5Ws model for Big Data Analysis and Visualization” (2013)
13) “IRIS Recognition on Hadoop: A Biometrics System Implementation on Cloud Computing” (2012)
14) “Log Analysis in Cloud Computing Environment with Hadoop and Spark” (2011)
15) “Minimizing Big Data Problems using Cloud Computing Based on Hadoop Architecture” (2012)
16) “Big R: Large-scale Analytics on Hadoop using R” (2014)
17) “Access Security on Cloud Computing Implemented in Hadoop System” (2012)
18) “Big Data Analysis Using Apache Hadoop” (2012)
19) “Applying Hadoop’s MapReduce Framework on Clustering the GPS Signals through Cloud Computing” (2011)
20) “IRIS Recognition on Hadoop: A Biometrics System Implementation on Cloud Computing” (2012)
21) “Mass Log Data Processing and Mining Based on Hadoop and Cloud Computing” (2011)
22) “H2T: A simple Hadoop-to-Twister Translator for Cloud Computing” (2012)
23) “An In-depth Study of Map Reduce in Cloud Environment” (2014)
24) “Optimizing Multiway Joins in a Map-Reduce Environment” (2012)
25) “Comparing Map-Reduce and FREERIDE for Data-Intensive Applications” (2013)
REVIEW PROCESS ADOPTED
• The review process has five stages:
1. Stage 0
2. Stage 1
3. Stage 2
4. Stage 3
5. Stage 3+
Stage 0 – “Get a feel”
In this stage, we collect the source material: conference research papers.
Stage 1 – “Get the picture”
We sketch an overall picture of our research area from the collected material.
Stage 2 – “Get the detail”
We record the key information for each paper, such as title, issue and solution approach, and determine what we are looking for and where to find it.
Stage 3 – “Evaluate the detail”
We examine each solution approach in detail: algorithm, methodology, mathematical explanation and assumptions.
Stage 3+ – “Synthesize”
We synthesize the review (topic, issue, solution approach, mathematical explanation of the solution approach, type of research) and identify alternative approaches.
LIST OF ISSUES
The papers address three issues, listed below:
Paper no.                            Issue
1, 2, 3, 12, 13, 14, 15, 16          Big data analysis
4, 6, 7, 17, 18, 19, 20              Real-time big data analysis using Hadoop in cloud computing
5, 8, 9, 10, 11, 21, 22, 23, 24, 25  Classification of big data using tools and frameworks
LIST OF SOLUTION APPROACHES
Paper no.: 1, 2, 3, 12, 13, 14, 15, 16
Issue: Big data analysis
Solutions:
1) MapReduce Agent Mobility (MRAM), used to overcome the drawbacks of Hadoop.
2) A new plug-in system, PuntStore, with pLSM (Punt Log Structured Merge Tree), which improves read and write throughput in NoSQL databases. COLA (Cache Oblivious Look-ahead Array) is also used for efficient insertion and range queries.
3) “IOT-StatisticDB”, a statistical database cluster mechanism that can support complicated statistical queries through PostgreSQL 8.2.4.
12) A 5Ws model to analyze big data attributes, patterns and the densities between data.
Paper no.: 4, 6, 7, 17, 18, 19, 20
Issue: Real-time big data analysis using Hadoop in cloud computing
Solutions:
4) The Road Traffic Big Data Collision Analysis Processing Framework, which proposes a distributed CEP that dynamically distributes the event-processing load for road traffic events.
6) An integrated framework using Co-occurring Theory and a Markov chain approach to find probabilities.
7) The Hadoop framework and a sampling method for removing the imbalance in data.
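The sampling idea listed for paper 7 can be illustrated with a small sketch. The source does not say which sampling method the paper uses, so this example assumes plain random undersampling of the majority class; the function name `undersample`, the `label_of` callback and the fixed seed are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict


def undersample(records, label_of, seed=0):
    """Randomly drop records so every class is as small as the rarest one."""
    # Group the records by their class label.
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)

    # The rarest class sets the target size for all classes.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced = []
    for group in by_label.values():
        # Sample without replacement from each class.
        balanced.extend(rng.sample(group, target))
    return balanced
```

For example, calling `undersample(records, lambda r: r[0])` on 3 "accident" records and 97 "no_accident" records returns 6 records, 3 per class. In a Hadoop setting the grouping step would typically be distributed, which this single-process sketch does not attempt.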
Paper no.: 5, 8, 9, 10, 11, 21, 22, 23, 24, 25
Issue: Classification of big data using tools and frameworks
Solutions:
• Hadoop Distributed File System (HDFS), Hadoop cluster and the MapReduce programming framework
• Visual clustering analysis
• RUBA (Real-time Unstructured Big Data Analysis) framework
• Apache Hadoop
Issue-Wise Findings:
Issue 1: Big data analysis
• Worked to improve big data analysis and overcome the drawbacks of Hadoop.
• Designed and developed MapReduce Agent Mobility (MRAM), which is based on the Java Agent Development Framework (JADE).
• Discussed a few research works on big data analysis using Hadoop and stated Hadoop's drawbacks in performance and reliability for big data analysis.
• Designed and developed a new plug-in system, PuntStore, with the pLSM (Punt Log Structured Merge Tree) index engine to provide scalable and efficient index services for real-time data analysis.
• pLSM can satisfy the need to perform index probes in write-optimized systems.
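To make the idea of index probes in a write-optimized structure concrete, here is a minimal sketch of a generic log-structured merge (LSM) index. It is not the pLSM design from the paper; the class name `TinyLSM`, the `memtable_limit` threshold and the compaction policy are assumptions chosen to keep the example short.

```python
import bisect


class TinyLSM:
    """Toy LSM-style index: an in-memory buffer plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # recent writes, unsorted
        self.runs = []                        # sorted runs, newest first

    def put(self, key, value):
        # Writes only touch memory, which keeps the write path cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the buffer into a sorted, immutable run.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # An index probe checks the newest data first, then each older run.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for run in reversed(self.runs):  # oldest first
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The sketch shows the trade-off that motivates the index engine: writes are fast because they go to memory, while a probe may have to search several runs until compaction merges them.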
Issue 2: Real-time big data analysis using Hadoop in cloud computing
• Worked to solve the road traffic collision problem for big data analysis and processing.
• Tested the proposed framework on road traffic data from a 45-mile section of the I-880N freeway in CA, USA, integrating freeway traffic big data and collision data over a ten-year period (1 TB in size) to obtain the collision probability.
• Worked on real-time analysis and dynamic modification in unstructured big data analysis.
• Noted that the number of compute nodes becomes insufficient as the number of map tasks increases with growing dataset size.
• Hadoop lets users write distributed software easily even if they know nothing about the underlying infrastructure.
• Applied a Markov chain with transition probabilities to the random variables of cubes and used the result to find the probability of disaster events.
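The Markov chain step in the last bullet can be sketched as follows. The states and transition probabilities here are hypothetical values chosen only for illustration; they are not taken from the reviewed paper, and the paper's cube-based random variables are not modeled.

```python
def step(distribution, transition):
    """Advance a probability distribution by one Markov-chain step."""
    n = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def distribution_after(initial, transition, steps):
    """Return the state distribution after the given number of steps."""
    current = list(initial)
    for _ in range(steps):
        current = step(current, transition)
    return current


# Hypothetical states and probabilities, chosen only for illustration.
STATES = ["normal", "warning", "disaster"]
TRANSITION = [
    [0.90, 0.10, 0.00],  # from normal
    [0.30, 0.50, 0.20],  # from warning
    [0.00, 0.40, 0.60],  # from disaster
]

start = [1.0, 0.0, 0.0]  # the system starts in the "normal" state
after_two = distribution_after(start, TRANSITION, 2)
p_disaster = after_two[STATES.index("disaster")]
```

With these assumed numbers, the only two-step path into "disaster" is normal → warning → disaster, so `p_disaster` is about 0.1 × 0.2 = 0.02.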
Issue 3: Classification of big data using tools and frameworks
• Worked to investigate database-kernel-level, parallel statistical analysis techniques for massive sensor sampling data in the Internet of Things.
• Statistical analysis of sensor sampling data, as supported by the General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things (“IOT-StatisticDB”), is one of the most important procedures in IoT systems for transforming “data” into “knowledge”.
• Designed and developed a 5Ws model to analyze big data attributes, patterns and the densities between data.
• Hadoop Distributed File System (HDFS) and Hadoop cluster.
• MapReduce programming framework.
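The MapReduce programming model named above can be illustrated with a single-process word count. This is only a sketch of the map, shuffle and reduce phases in plain Python; it does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in a single process."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `run_job(["big data", "Big analysis"])` counts "big" twice and the other two words once. On a real cluster the map and reduce calls run on different nodes and the shuffle moves data between them, which is the part a framework such as Hadoop handles.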
STRENGTHS
• Solves the problem of a failing centralized master node and improves the fault tolerance of the system in Hadoop.
• MRAM increases data analysis performance compared to Hadoop.
• Replaces MySQL with NoSQL, increasing read and write throughput and making searching, insertion and deletion in the database easier.
• Provides parallel statistical analysis techniques for massive sensor sampling data in the Internet of Things.
• Solves the problem of sampling sensor data in parallel and distributed systems.
• Provides information about big data patterns and visualization through the 5Ws model.
• The 5Ws model and its applications can reveal attackers' locations or IP addresses.
• Many kinds of real-time big data analysis can be done using Hadoop clustering techniques.
• Hadoop and HBase techniques can be used to analyze real-time road traffic collision data.
• CEP analysis can be used to analyze unstructured big data such as CCTV data and process it in a distributed system.
• Information about the current situation of a disaster event can be obtained.
WEAKNESSES
• Event analysis methods cannot be applied to obtain faster and more reliable insight from real-time data.
• MRAM is based on the Java Agent Development Framework (JADE), which makes it more complex to develop.
• pLSM NoSQL requires more storage space and memory to operate.
• It is not easy to apply statistical analysis methods to unstructured data in a parallel and distributed environment.
• Obtaining useful traffic data from loop detectors is quite difficult.
SCOPE OF OUR WORK
• Further work can be done on Hadoop techniques such as MapReduce, HDFS and HBase to process distributed data using the MRAM framework.
• We can apply the RUBA framework to the fields of U-city, U-plant and ITS.
• In future, we can use the 5Ws model by deploying the density classification in more areas and on more data sets, and use Gapminder’s visualization techniques.
• We can improve the current disaster event analysis methods for faster and more reliable insight.
• Future work will focus on performance evaluation and modeling of Hadoop data-intensive applications on cloud platforms such as Amazon Elastic Compute Cloud (EC2).
CONCLUSION
We have reviewed 25 research papers published from 2011 to 2014 on Big Data analysis. The review process consists of five stages (Stage 0 to Stage 3+). We found three main issues in the field of Big Data: big data analysis tools, classification of big data using tools and frameworks, and real-time big data analysis.
After examining the solution approaches, we concluded that Big Data analysis is the main area in which future work can be done. Of the many solution approaches we found, MapReduce Agent Mobility (MRAM), PuntStore with pLSM (Punt Log Structured Merge Tree), the “IOT-StatisticDB” statistical database cluster mechanism and visual clustering analysis are the most promising because of their advantages and properties.